# Introduction

This notebook analyses the Android app market by comparing thousands of apps in the Google Play Store.

# About the Dataset of Google Play Store Apps & Reviews

**Data Source:** <br>
App and review data was scraped from the Google Play Store by Lavanya Gupta in 2018. The original files are listed [here](https://www.kaggle.com/lava18/google-play-store-apps).

# Import Statements

In [101]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


# Notebook Presentation

In [None]:
pd.options.display.float_format = '{:,.2f}'.format

# Read the Dataset

In [3]:
df_apps = pd.read_csv('apps.csv')

# Data Cleaning

In [4]:
df_apps.shape

(10841, 12)

In [5]:
df_apps.sample(5)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres,Last_Updated,Android_Ver
502,DZ Puzzle,FAMILY,,14,47.0,10,Paid,$0.99,Everyone,Puzzle,"April 22, 2017",2.3 and up
1264,EG India,LIFESTYLE,5.0,3,4.0,100,Free,0,Everyone,Lifestyle,"July 29, 2018",4.0.3 and up
8724,ABCmouse.com,FAMILY,4.3,50887,91.0,5000000,Free,0,Everyone,Education;Education,"July 3, 2018",4.4 and up
5833,BF Combat: Genesis,GAME,4.0,3322,82.0,100000,Free,0,Mature 17+,Action,"March 2, 2017",2.3 and up
3210,Elemental Galaxy Dx - Match3,FAMILY,4.0,127,14.0,10000,Free,0,Everyone,Puzzle,"March 11, 2014",2.2 and up


In [6]:
df_apps.columns

Index(['App', 'Category', 'Rating', 'Reviews', 'Size_MBs', 'Installs', 'Type',
       'Price', 'Content_Rating', 'Genres', 'Last_Updated', 'Android_Ver'],
      dtype='object')

### Drop Unused Columns
 

In [7]:
df_apps.drop(['Last_Updated', "Android_Ver"], axis=1, inplace=True)

In [8]:
df_apps.columns

Index(['App', 'Category', 'Rating', 'Reviews', 'Size_MBs', 'Installs', 'Type',
       'Price', 'Content_Rating', 'Genres'],
      dtype='object')

### Find and Remove NaN values in Ratings

**Challenge**: How many rows have a NaN (not-a-number) value in the `Rating` column? Create a DataFrame called `df_apps_clean` that does not include these rows.

In [9]:
len(df_apps)

10841

In [10]:
df_apps_clean = df_apps.dropna(subset=["Rating"])


In [11]:
len(df_apps_clean)

9367
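
The lengths above answer the challenge only indirectly: 10841 - 9367 = 1474 rows have no rating. A minimal sketch of counting them directly, assuming a made-up toy frame (`toy` and its values are illustrative, not taken from `apps.csv`):

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_apps; the values are invented for illustration.
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.5, np.nan, 3.9, np.nan],
})

# isna() marks the missing ratings; summing the booleans counts them.
nan_count = toy["Rating"].isna().sum()

# dropna(subset=...) only inspects the Rating column when deciding what to drop.
toy_clean = toy.dropna(subset=["Rating"])

print(nan_count)                   # 2
print(len(toy) - len(toy_clean))   # 2
```

On the real data, `df_apps["Rating"].isna().sum()` should match the difference between the two lengths.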

### Remove Duplicates



In [12]:
df_apps_clean.duplicated().sum()


476

In [13]:
df_apps_clean[df_apps_clean["App"] == "Instagram"].duplicated().sum()


1

In [14]:
# Reassign the result instead of using inplace=True, which triggers a
# "value is trying to be set on a copy of a slice" warning on this frame.
df_apps_clean = df_apps_clean.drop_duplicates(subset=["App", "Type", "Price"])


In [15]:
len(df_apps_clean)

8199
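
Note that `duplicated()` with no arguments flagged only 476 rows, yet de-duplicating on `App`, `Type` and `Price` removed 9367 - 8199 = 1168 rows. The difference comes from entries that share a name but differ in some other column, such as the review count. A small sketch of that effect on invented data (the `toy` frame is hypothetical):

```python
import pandas as pd

# Three "Insta" rows: two are identical, the third differs only in Reviews.
toy = pd.DataFrame({
    "App": ["Insta", "Insta", "Insta", "Maps"],
    "Type": ["Free", "Free", "Free", "Free"],
    "Price": ["0", "0", "0", "0"],
    "Reviews": [100, 100, 105, 7],
})

# Whole-row comparison: only the exact copy is flagged.
full_row_dupes = toy.duplicated().sum()

# Key-based comparison: every repeat of (App, Type, Price) is flagged.
key_dupes = toy.duplicated(subset=["App", "Type", "Price"]).sum()

# Keeps the first row of each (App, Type, Price) group.
deduped = toy.drop_duplicates(subset=["App", "Type", "Price"])

print(full_row_dupes, key_dupes, len(deduped))  # 1 2 2
```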

# Highest Rated Apps

In [16]:
ratings = df_apps_clean.sort_values("Rating", ascending=False)
ratings.head(10)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
21,KBA-EZ Health Guide,MEDICAL,5.0,4,25.0,1,Free,0,Everyone,Medical
1230,Sway Medical,MEDICAL,5.0,3,22.0,100,Free,0,Everyone,Medical
1227,AJ Men's Grooming,LIFESTYLE,5.0,2,22.0,100,Free,0,Everyone,Lifestyle
1224,FK Dedinje BGD,SPORTS,5.0,36,2.6,100,Free,0,Everyone,Sports
1223,CB VIDEO VISION,PHOTOGRAPHY,5.0,13,2.6,100,Free,0,Everyone,Photography
1222,"Beacon Baptist Jupiter, FL",LIFESTYLE,5.0,14,2.6,100,Free,0,Everyone,Lifestyle
1214,BV Mobile Apps,PRODUCTIVITY,5.0,3,4.8,100,Free,0,Everyone,Productivity
2680,Florida Wildflowers,FAMILY,5.0,5,69.0,1000,Free,0,Everyone,Education
1206,ADS-B Driver,TOOLS,5.0,2,6.3,100,Paid,$1.99,Everyone,Tools
2750,"Superheroes, Marvel, DC, Comics, TV, Movies News",COMICS,5.0,34,12.0,5000,Free,0,Everyone,Comics


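*These 5.0 ratings rest on very few reviews (2 to 36 in the rows above), so the ranking says little about which apps are actually best.*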
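
One possible way to act on this note is to require a minimum number of reviews before ranking by rating. The sketch below uses invented data, and the cut-off `MIN_REVIEWS` is a hypothetical value chosen for illustration, not something the notebook defines:

```python
import pandas as pd

# Invented apps: two perfect scores backed by almost no reviews.
toy = pd.DataFrame({
    "App": ["Tiny", "Small", "Big", "Huge"],
    "Rating": [5.0, 5.0, 4.6, 4.4],
    "Reviews": [2, 14, 50_000, 2_000_000],
})

MIN_REVIEWS = 1_000  # hypothetical threshold, tune to taste

# Keep only apps with enough reviews, then rank by rating.
well_reviewed = toy[toy["Reviews"] >= MIN_REVIEWS]
top_rated = well_reviewed.sort_values("Rating", ascending=False)

print(top_rated["App"].tolist())  # ['Big', 'Huge']
```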

# Largest Apps in terms of Size (MBs)

In [17]:
largest_apps = df_apps_clean.sort_values("Size_MBs", ascending=False)
largest_apps.head(10)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
9942,Talking Babsy Baby: Baby Games,LIFESTYLE,4.0,140995,100.0,10000000,Free,0,Everyone,Lifestyle;Pretend Play
10687,Hungry Shark Evolution,GAME,4.5,6074334,100.0,100000000,Free,0,Teen,Arcade
9943,Miami crime simulator,GAME,4.0,254518,100.0,10000000,Free,0,Mature 17+,Action
9944,Gangster Town: Vice District,FAMILY,4.3,65146,100.0,10000000,Free,0,Mature 17+,Simulation
3144,Vi Trainer,HEALTH_AND_FITNESS,3.6,124,100.0,5000,Free,0,Everyone,Health & Fitness
9945,Ultimate Tennis,SPORTS,4.3,183004,100.0,10000000,Free,0,Everyone,Sports
7926,Post Bank,FINANCE,4.5,60449,100.0,1000000,Free,0,Everyone,Finance
7927,The Walking Dead: Our World,GAME,4.0,22435,100.0,1000000,Free,0,Teen,Action
7928,Stickman Legends: Shadow Wars,GAME,4.4,38419,100.0,1000000,Paid,$0.99,Everyone 10+,Action
4176,Car Crash III Beam DH Real Damage Simulator 2018,GAME,3.6,151,100.0,10000,Free,0,Everyone,Racing


# Apps with Most Reviews

In [139]:
reviews = df_apps_clean.sort_values("Reviews", ascending=False)
reviews.head(10)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres,Revenue_Estimate
10805,Facebook,SOCIAL,4.1,78158306,5.3,1000000000,Free,0.0,Teen,Social,0.0
10785,WhatsApp Messenger,COMMUNICATION,4.4,69119316,3.5,1000000000,Free,0.0,Everyone,Communication,0.0
10806,Instagram,SOCIAL,4.5,66577313,5.3,1000000000,Free,0.0,Teen,Social,0.0
10784,Messenger – Text and Video Chat for Free,COMMUNICATION,4.0,56642847,3.5,1000000000,Free,0.0,Everyone,Communication,0.0
10650,Clash of Clans,GAME,4.6,44891723,98.0,100000000,Free,0.0,Everyone 10+,Strategy,0.0
10744,Clean Master- Space Cleaner & Antivirus,TOOLS,4.7,42916526,3.4,500000000,Free,0.0,Everyone,Tools,0.0
10835,Subway Surfers,GAME,4.5,27722264,76.0,1000000000,Free,0.0,Everyone 10+,Arcade,0.0
10828,YouTube,VIDEO_PLAYERS,4.3,25655305,4.65,1000000000,Free,0.0,Teen,Video Players & Editors,0.0
10746,"Security Master - Antivirus, VPN, AppLock, Boo...",TOOLS,4.7,24900999,3.4,500000000,Free,0.0,Everyone,Tools,0.0
10584,Clash Royale,GAME,4.6,23133508,97.0,100000000,Free,0.0,Everyone 10+,Strategy,0.0


In [19]:
head_reviews = reviews.head(50)
head_reviews[head_reviews["Type"] == "Paid"].count()

App               0
Category          0
Rating            0
Reviews           0
Size_MBs          0
Installs          0
Type              0
Price             0
Content_Rating    0
Genres            0
dtype: int64

# Content Ratings

In [20]:
ratings = df_apps_clean.Content_Rating.value_counts()
ratings

Everyone           6621
Teen                912
Mature 17+          357
Everyone 10+        305
Adults only 18+       3
Unrated               1
Name: Content_Rating, dtype: int64

In [21]:
# `labels` expects a dict that renames columns, so the index is passed via `names` only.
fig = px.pie(names=ratings.index, values=ratings.values, title='Content Ratings', hole=0.4)
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()


# Examine the Number of Installs


In [22]:
print("Apps with at least a billion installations: ", df_apps_clean.Installs.value_counts().loc["1,000,000,000"])
print("Number of apps with only 1 installation: ", df_apps_clean.Installs.value_counts().loc["1"])

Apps with at least a billion installations:  20
Number of apps with only 1 installation:  3


# Most Expensive Apps & Calculate Sales Revenue Estimate



In [23]:
# Strip the dollar signs from the price text. With regex=False, "$" is treated
# as a literal character rather than as a regex anchor.
df_apps_clean.Price = df_apps_clean.Price.str.replace("$", "", regex=False)

df_apps_clean.Price = pd.to_numeric(df_apps_clean.Price)
most_expensive_apps = df_apps_clean.sort_values("Price", ascending=False)
most_expensive_apps.head(20)



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy




Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
3946,I'm Rich - Trump Edition,LIFESTYLE,3.6,275,7.3,10000,Paid,400.0,Everyone,Lifestyle
2461,I AM RICH PRO PLUS,FINANCE,4.0,36,41.0,1000,Paid,399.99,Everyone,Finance
4606,I Am Rich Premium,FINANCE,4.1,1867,4.7,50000,Paid,399.99,Everyone,Finance
3145,I am rich(premium),FINANCE,3.5,472,0.94,5000,Paid,399.99,Everyone,Finance
3554,💎 I'm rich,LIFESTYLE,3.8,718,26.0,10000,Paid,399.99,Everyone,Lifestyle
5765,I am rich,LIFESTYLE,3.8,3547,1.8,100000,Paid,399.99,Everyone,Lifestyle
1946,I am rich (Most expensive app),FINANCE,4.1,129,2.7,1000,Paid,399.99,Teen,Finance
2775,I Am Rich Pro,FAMILY,4.4,201,2.7,5000,Paid,399.99,Everyone,Entertainment
3221,I am Rich Plus,FAMILY,4.0,856,8.7,10000,Paid,399.99,Everyone,Entertainment
3114,I am Rich,FINANCE,4.3,180,3.8,5000,Paid,399.99,Everyone,Finance
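
The "A value is trying to be set on a copy of a slice" warning printed by this cell comes from assigning to a frame that pandas regards as derived from another one. A hedged sketch of one common way around it: take an explicit `.copy()` before modifying columns. The `toy` and `paid` frames below are invented for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["Free App", "Cheap App", "Pricey App"],
    "Price": ["0", "$0.99", "$399.99"],
})

# Filtering returns a derived frame; .copy() makes it independent,
# so the column assignment below is unambiguous.
paid = toy[toy["Price"] != "0"].copy()

# regex=False treats "$" as a literal character, not an end-of-string anchor.
paid["Price"] = pd.to_numeric(paid["Price"].str.replace("$", "", regex=False))

print(paid["Price"].tolist())  # [0.99, 399.99]
```

In the notebook itself, the equivalent step would be adding `.copy()` where `df_apps_clean` is first created; whether that silences every warning depends on the pandas version in use.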


### The most expensive apps sub $250

In [24]:
expensive_under_250 = most_expensive_apps[most_expensive_apps["Price"] < 250]
expensive_under_250.head(20)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
2281,Vargo Anesthesia Mega App,MEDICAL,4.6,92,32.0,1000,Paid,79.99,Everyone,Medical
1407,LTC AS Legal,MEDICAL,4.0,6,1.3,100,Paid,39.99,Everyone,Medical
2629,I am Rich Person,LIFESTYLE,4.2,134,1.8,1000,Paid,37.99,Everyone,Lifestyle
2481,A Manual of Acupuncture,MEDICAL,3.5,214,68.0,1000,Paid,33.99,Everyone,Medical
4264,Golfshot Plus: Golf GPS,SPORTS,4.1,3387,25.0,50000,Paid,29.99,Everyone,Sports
504,AP Art History Flashcards,FAMILY,5.0,1,96.0,10,Paid,29.99,Mature 17+,Education
2207,EMT PASS,MEDICAL,3.4,51,2.4,1000,Paid,29.99,Everyone,Medical
2463,PTA Content Master,MEDICAL,4.2,64,41.0,1000,Paid,29.99,Everyone,Medical
3241,"Muscle Premium - Human Anatomy, Kinesiology, B...",MEDICAL,4.2,168,25.0,10000,Paid,24.99,Everyone,Medical
4772,Human Anatomy Atlas 2018: Complete 3D Human Body,MEDICAL,4.5,2921,25.0,100000,Paid,24.99,Everyone,Medical


### Highest Grossing Paid Apps (ballpark estimate)

In [25]:
expensive_under_250.Installs = expensive_under_250.Installs.astype('str').str.replace(",", '')
expensive_under_250.Installs = pd.to_numeric(expensive_under_250.Installs)

expensive_under_250["Revenue_Estimate"] = expensive_under_250.Installs.mul(expensive_under_250.Price)
expensive_under_250.sort_values("Revenue_Estimate", ascending=False).head(10)



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy




Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres,Revenue_Estimate
9220,Minecraft,FAMILY,4.5,2376564,19.0,10000000,Paid,6.99,Everyone 10+,Arcade;Action & Adventure,69900000.0
8825,Hitman Sniper,GAME,4.6,408292,29.0,10000000,Paid,0.99,Mature 17+,Action,9900000.0
7151,Grand Theft Auto: San Andreas,GAME,4.4,348962,26.0,1000000,Paid,6.99,Mature 17+,Action,6990000.0
7477,Facetune - For Free,PHOTOGRAPHY,4.4,49553,48.0,1000000,Paid,5.99,Everyone,Photography,5990000.0
7977,Sleep as Android Unlock,LIFESTYLE,4.5,23966,0.85,1000000,Paid,5.99,Everyone,Lifestyle,5990000.0
6594,DraStic DS Emulator,GAME,4.6,87766,12.0,1000000,Paid,4.99,Everyone,Action,4990000.0
6082,Weather Live,WEATHER,4.5,76593,4.75,500000,Paid,5.99,Everyone,Weather,2995000.0
7954,Bloons TD 5,FAMILY,4.6,190086,94.0,1000000,Paid,2.99,Everyone,Strategy,2990000.0
6856,Threema,COMMUNICATION,4.5,51110,3.5,1000000,Paid,2.99,Everyone,Communication,2990000.0
6746,Card Wars - Adventure Time,FAMILY,4.3,129603,23.0,1000000,Paid,2.99,Everyone 10+,Card;Action & Adventure,2990000.0
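
The estimate above is simply installs multiplied by the current price, so it assumes every install was a purchase at that price and ignores discounts and price changes. A minimal sketch of the same calculation on invented data, written with `assign` so that no derived frame is modified in place (all names and values in `toy` are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["Game", "Tool", "Joke"],
    "Installs": ["10,000,000", "1,000", "5,000"],
    "Price": [6.99, 29.99, 399.99],
})

# Remove thousands separators, then convert the text to numbers.
installs = pd.to_numeric(toy["Installs"].str.replace(",", "", regex=False))

# Work on new frames rather than mutating a filtered slice.
under_250 = toy.assign(Installs=installs)
under_250 = under_250[under_250["Price"] < 250]
under_250 = under_250.assign(
    Revenue_Estimate=under_250["Installs"] * under_250["Price"]
)

print(under_250.sort_values("Revenue_Estimate", ascending=False))
```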


# Analysing App Categories

In [26]:
df_apps_clean.Category.nunique()

33

In [27]:
top10_category = df_apps_clean.Category.value_counts()
top10_category

FAMILY                 1610
GAME                    910
TOOLS                   719
FINANCE                 302
LIFESTYLE               302
PRODUCTIVITY            301
PERSONALIZATION         298
MEDICAL                 292
PHOTOGRAPHY             263
BUSINESS                262
SPORTS                  260
COMMUNICATION           257
HEALTH_AND_FITNESS      243
NEWS_AND_MAGAZINES      204
SOCIAL                  203
TRAVEL_AND_LOCAL        187
SHOPPING                180
BOOKS_AND_REFERENCE     169
VIDEO_PLAYERS           148
DATING                  134
EDUCATION               118
MAPS_AND_NAVIGATION     118
ENTERTAINMENT           102
FOOD_AND_DRINK           94
AUTO_AND_VEHICLES        73
WEATHER                  72
LIBRARIES_AND_DEMO       64
HOUSE_AND_HOME           62
ART_AND_DESIGN           61
COMICS                   54
PARENTING                50
EVENTS                   45
BEAUTY                   42
Name: Category, dtype: int64

In [28]:
bar = px.bar(x = top10_category.index[:10],
             y= top10_category.values[:10])
bar.show()

### Highest Competition (Number of Apps)

In [None]:
df_apps_clean.Installs = df_apps_clean.Installs.astype('str').str.replace(",", '')
df_apps_clean.Installs = pd.to_numeric(df_apps_clean.Installs)
most_installed = df_apps_clean.groupby('Category').agg({"Installs": pd.Series.sum})
ten_most_installed = most_installed.sort_values("Installs", ascending=False)[:10]
# Re-sort ascending so the largest bar ends up at the top of the horizontal chart.
ten_most_installed = ten_most_installed.sort_values("Installs", ascending=True)
installed_bar = px.bar(x=ten_most_installed.Installs, y=ten_most_installed.index, orientation='h', title="Most Installed Apps")
installed_bar.update_layout(xaxis_title="Number of Downloads", yaxis_title="Category")
installed_bar.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy




### Most Popular Categories (Highest Downloads)

In [49]:
most_installed

Category,Installs
ART_AND_DESIGN,114233100
AUTO_AND_VEHICLES,53129800
BEAUTY,26916200
BOOKS_AND_REFERENCE,1665791655
BUSINESS,692018120
COMICS,44931100
COMMUNICATION,11039241530
DATING,140912410
EDUCATION,352852000
ENTERTAINMENT,2113660000


In [62]:
installs_merge = pd.concat([top10_category, most_installed], axis=1)

In [75]:
scatter = px.scatter(data_frame=installs_merge,
                     x='Category',
                     y="Installs",
                     size="Category",
                     hover_name=installs_merge.index,
                     color="Installs",
                     title="Category Concentration")

scatter.update_layout(xaxis_title="Number of Apps (Lower=More Concentrated)",
                      yaxis_title="Installs",
                      yaxis=dict(type='log'))
scatter.show()
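
The scatter plot relies on the `value_counts()` result being named `Category`, which is why the column called `Category` actually holds the number of apps. Newer pandas versions may name that Series differently, so a more explicit alternative is sketched below. It builds both measures in one `groupby` with named aggregation; the column names `App_Count` and `Total_Installs` and the data are assumptions made for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["g1", "g2", "t1", "g3", "t2", "w1"],
    "Category": ["GAME", "GAME", "TOOLS", "GAME", "TOOLS", "WEATHER"],
    "Installs": [1_000, 5_000, 100, 10_000, 50, 500],
})

# One row per category: how many apps compete, and how many installs they share.
summary = toy.groupby("Category").agg(
    App_Count=("App", "count"),
    Total_Installs=("Installs", "sum"),
)

print(summary)
```

With such a frame, the chart could use `x="App_Count"`, `y="Total_Installs"` and `size="App_Count"`, which keeps the axis meaning visible in the code.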

# Top Genres

In [76]:
df_apps_clean.head(10)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
21,KBA-EZ Health Guide,MEDICAL,5.0,4,25.0,1,Free,0.0,Everyone,Medical
28,Ra Ga Ba,GAME,5.0,2,20.0,1,Paid,1.49,Everyone,Arcade
47,Mu.F.O.,GAME,5.0,2,16.0,1,Paid,0.99,Everyone,Arcade
82,Brick Breaker BR,GAME,5.0,7,19.0,5,Free,0.0,Everyone,Arcade
99,Anatomy & Physiology Vocabulary Exam Review App,MEDICAL,5.0,1,4.6,5,Free,0.0,Everyone,Medical
114,FK Atlantas,SPORTS,1.5,2,26.0,5,Free,0.0,Everyone,Sports
126,Tablet Reminder,MEDICAL,5.0,4,2.5,5,Free,0.0,Everyone,Medical
128,CQ ESPM,BUSINESS,5.0,2,3.4,5,Free,0.0,Everyone,Business
141,Clinic Doctor EHr,MEDICAL,5.0,2,7.1,5,Free,0.0,Everyone,Medical
151,EB Cash Collections,BUSINESS,5.0,1,4.3,5,Free,0.0,Everyone,Business


In [91]:
genres_num = df_apps_clean.Genres.str.split(';', expand=True).stack()
genres_counted = genres_num.value_counts()
genres_counted

Tools                      719
Education                  587
Entertainment              502
Action                     304
Lifestyle                  303
Finance                    302
Productivity               301
Personalization            298
Medical                    292
Sports                     270
Photography                263
Business                   262
Communication              258
Health & Fitness           245
Casual                     216
News & Magazines           204
Social                     203
Simulation                 200
Travel & Local             187
Arcade                     185
Shopping                   180
Books & Reference          171
Video Players & Editors    150
Dating                     134
Puzzle                     124
Maps & Navigation          118
Role Playing               111
Racing                     103
Action & Adventure          96
Strategy                    95
Food & Drink                94
Educational                 93
...
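
To see what the split-and-stack step does, here is a small sketch on invented genre strings. It also shows `explode` as an alternative that should give the same counts. Note that a value such as `Education;Education` (which does appear in the dataset) contributes two counts for a single app:

```python
import pandas as pd

# Invented genre strings in the same "A;B" format as the Genres column.
genres = pd.Series([
    "Puzzle",
    "Education;Education",
    "Arcade;Action & Adventure",
    "Arcade",
])

# expand=True spreads the parts over columns; stack() folds them back
# into one long Series and drops the empty (None) slots.
stacked = genres.str.split(";", expand=True).stack()
counts = stacked.value_counts()

# Alternative: keep lists in each cell and explode them into rows.
exploded_counts = genres.str.split(";").explode().value_counts()

print(counts.to_dict())
```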

In [98]:
genres_bar = px.bar(x=genres_counted.index[:15],
                    y=genres_counted.values[:15],
                    title='Top Genres',
                    hover_name=genres_counted.index[:15],
                    color=genres_counted.values[:15],
                    color_continuous_scale='Agsunset')
genres_bar.show()

# Free vs. Paid Apps per Category

In [100]:
df_free_vs_paid = df_apps_clean.groupby(["Category", "Type"], as_index=False).agg({'App': pd.Series.count})
df_free_vs_paid


Unnamed: 0,Category,Type,App
0,ART_AND_DESIGN,Free,58
1,ART_AND_DESIGN,Paid,3
2,AUTO_AND_VEHICLES,Free,72
3,AUTO_AND_VEHICLES,Paid,1
4,BEAUTY,Free,42
...,...,...,...
56,TRAVEL_AND_LOCAL,Paid,8
57,VIDEO_PLAYERS,Free,144
58,VIDEO_PLAYERS,Paid,4
59,WEATHER,Free,65
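
The grouped result is in long format, with one row per category and type. If a side-by-side view is easier to read, it can be pivoted into one row per category. This is only a sketch on invented data; the `wide_counts` name is hypothetical:

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["a", "b", "c", "d", "e"],
    "Category": ["GAME", "GAME", "GAME", "TOOLS", "TOOLS"],
    "Type": ["Free", "Paid", "Free", "Free", "Free"],
})

# Same aggregation as in the notebook: count apps per (Category, Type).
long_counts = toy.groupby(["Category", "Type"], as_index=False).agg({"App": "count"})

# Pivot to one row per category; missing combinations become 0.
wide_counts = (
    long_counts.pivot(index="Category", columns="Type", values="App")
    .fillna(0)
    .astype(int)
)

print(wide_counts)
```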


In [115]:
df_free_vs_paid.sort_values(by="App", ascending=False, inplace=True)
free_vs_paid_bar = px.bar(data_frame=df_free_vs_paid,
                          x='Category',
                          y='App',
                          color='Type',
                          barmode='group',
                          title='Free vs Paid Apps by Category'
                          )
free_vs_paid_bar.update_layout(xaxis_title="Category of App",
                               yaxis_title="Number of Apps",
                               yaxis=dict(type='log'),
                               xaxis=dict(tickangle=90))
free_vs_paid_bar.show()

# Lost Downloads for Paid Apps


In [126]:
giving_up_bar = px.box(data_frame=df_apps_clean,
                       y="Installs",
                       x="Type",
                       color="Type",
                       notched=True,
                       points='all',
                       title="How Many Downloads are Paid Apps Giving Up")
giving_up_bar.update_layout(yaxis=dict(type='log'), width=800,
                            height=600)
giving_up_bar.show()

In [137]:
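# If Installs was already converted to numbers in an earlier cell, these two
# lines change nothing; they keep this cell runnable on its own.
df_apps_clean.Installs = df_apps_clean.Installs.astype('str').str.replace(",", '')
df_apps_clean.Installs = pd.to_numeric(df_apps_clean.Installs)

# Multiplying by expensive_under_250.Price aligns on the index, so apps priced
# at $250 or more get NaN here and drop out of the box plot.
df_apps_clean["Revenue_Estimate"] = df_apps_clean.Installs.mul(expensive_under_250.Price)

df_paid_apps = df_apps_clean[df_apps_clean['Type'] == 'Paid']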
box = px.box(df_paid_apps, 
             x='Category', 
             y='Revenue_Estimate',
             title='How Much Can Paid Apps Earn?')
 
box.update_layout(xaxis_title='Category',
                  yaxis_title='Paid App Ballpark Revenue',
                  xaxis={'categoryorder':'min ascending', 
                         'tickangle': 90},
                  yaxis=dict(type='log'),
                  width=1200,
                  height=800)
 
 
box.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
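
The revenue column in this cell multiplies `df_apps_clean.Installs` by the `Price` column of a different, smaller frame (`expensive_under_250`). pandas aligns the two Series on their index, so rows missing from the smaller frame come out as NaN. A small sketch of that behaviour on invented data (the `apps` frame and its index labels are hypothetical):

```python
import pandas as pd

apps = pd.DataFrame(
    {"Installs": [1_000, 5_000, 10_000], "Price": [2.99, 399.99, 0.0]},
    index=[10, 11, 12],
)

# Row 11 is filtered out because its price is not below 250.
under_250 = apps[apps["Price"] < 250]

# Alignment happens on index labels: label 11 has no partner, so it becomes NaN.
aligned = apps["Installs"].mul(under_250["Price"])

print(aligned)
```

If excluding the very expensive apps is not intended, multiplying by the frame's own `Price` column avoids the NaN values.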




