## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the global mobile operating system market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>. </p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

In [36]:
import pandas as pd
apps=pd.read_csv('apps.csv')
print(apps.head(5))
# Print the total number of apps
print('Total number of apps in the dataset = ',apps.shape[0])


                                                 App        Category  Rating  \
0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   
1                                Coloring book moana  ART_AND_DESIGN     3.9   
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   
3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   
4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   

   Reviews  Size     Installs  Type  Price      Last Updated  
0      159  19.0      10,000+  Free    0.0   January 7, 2018  
1      967  14.0     500,000+  Free    0.0  January 15, 2018  
2    87510   8.7   5,000,000+  Free    0.0    August 1, 2018  
3   215644  25.0  50,000,000+  Free    0.0      June 8, 2018  
4      967   2.8     100,000+  Free    0.0     June 20, 2018  
Total number of apps in the dataset =  9659


In [37]:
apps.shape

(9659, 9)

In [38]:
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   App           9659 non-null   object 
 1   Category      9659 non-null   object 
 2   Rating        8196 non-null   float64
 3   Reviews       9659 non-null   int64  
 4   Size          8432 non-null   float64
 5   Installs      9659 non-null   object 
 6   Type          9659 non-null   object 
 7   Price         9659 non-null   float64
 8   Last Updated  9659 non-null   object 
dtypes: float64(3), int64(1), object(5)
memory usage: 679.3+ KB


In [39]:
# Characters to remove (thousands separators and the trailing plus sign)
chars_to_remove = [',', '+']
# Column names to clean
cols_to_clean = ['Installs']

# Loop over each column in cols_to_clean
for col in cols_to_clean:
    # Loop over each character in chars_to_remove
    for char in chars_to_remove:
        # Replace the character with an empty string
        apps[col] = apps[col].apply(lambda x: x.replace(char, ''))
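
The same kind of cleanup can also be written without explicit loops. The following is a minimal sketch that strips both commas and plus signs in one pass with `Series.str.replace`. The `toy` frame and its values are made up for illustration and are not read from `apps.csv`.

```python
import pandas as pd

# Toy stand-in for the Installs column (illustrative values, not the real data)
toy = pd.DataFrame({"Installs": ["10,000+", "500,000+", "5,000,000+"]})

# Inside a character class, both ',' and '+' are matched literally,
# so a single regex removes every occurrence of either character
cleaned = toy["Installs"].str.replace(r"[,+]", "", regex=True)

print(cleaned.tolist())
```

The result is still a column of strings; only the unwanted characters are gone.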

In [40]:
apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,10000,Free,0.0,"January 7, 2018"
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018"
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,5000000,Free,0.0,"August 1, 2018"
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,50000000,Free,0.0,"June 8, 2018"
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,100000,Free,0.0,"June 20, 2018"


In [41]:
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   App           9659 non-null   object 
 1   Category      9659 non-null   object 
 2   Rating        8196 non-null   float64
 3   Reviews       9659 non-null   int64  
 4   Size          8432 non-null   float64
 5   Installs      9659 non-null   object 
 6   Type          9659 non-null   object 
 7   Price         9659 non-null   float64
 8   Last Updated  9659 non-null   object 
dtypes: float64(3), int64(1), object(5)
memory usage: 679.3+ KB
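
The summary above still reports `Installs` as `object`, because removing characters does not change the column type. As a hedged sketch of a possible next step (not something the notebook does), `pd.to_numeric` can turn the cleaned strings into numbers. The `toy` frame below is hypothetical.

```python
import pandas as pd

# Toy column of already-cleaned install counts, still stored as strings
toy = pd.DataFrame({"Installs": ["10000", "500000", "5000000"]})

# Parse the strings into a numeric dtype; a malformed value would raise an error
toy["Installs"] = pd.to_numeric(toy["Installs"])

print(toy["Installs"].dtype)
print(toy["Installs"].sum())
```

Once the column is numeric, aggregations such as sums or medians of installs become possible.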


In [43]:
app_category_info=apps.pivot_table(values=['Price','Rating'], index='Category')
app_category_info.head()

Category,Price,Rating
ART_AND_DESIGN,0.093281,4.357377
AUTO_AND_VEHICLES,0.158471,4.190411
BEAUTY,0.0,4.278571
BOOKS_AND_REFERENCE,0.539505,4.34497
BUSINESS,0.417357,4.098479
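
`pivot_table` aggregates with the mean by default, which is why the `Price` and `Rating` columns above are per-category averages. The sketch below, on a made-up `toy` frame, compares it with an explicit `groupby(...).mean()` and shows that missing ratings are skipped rather than counted as zero.

```python
import pandas as pd

# Toy data: category A has one missing rating
toy = pd.DataFrame({
    "Category": ["A", "A", "B"],
    "Price": [0.0, 2.0, 1.0],
    "Rating": [4.0, None, 3.0],
})

# Default aggfunc of pivot_table is the mean
by_pivot = toy.pivot_table(values=["Price", "Rating"], index="Category")

# The same averages written as an explicit group-by
by_group = toy.groupby("Category")[["Price", "Rating"]].mean()

print(by_pivot)
print(by_group)
```

In both results the average rating of category A is computed from the single non-missing value.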


In [51]:
app_category_info['Number of apps']=apps['Category'].value_counts()
app_category_info.head()

Category,Price,Rating,Number of apps
ART_AND_DESIGN,0.093281,4.357377,64
AUTO_AND_VEHICLES,0.158471,4.190411,85
BEAUTY,0.0,4.278571,53
BOOKS_AND_REFERENCE,0.539505,4.34497,222
BUSINESS,0.417357,4.098479,420
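
Assigning the result of `value_counts()` as a new column works because pandas aligns on index labels, not on row order. A small sketch with hypothetical data illustrates this: the counts come back sorted by frequency, yet each one lands on the matching category row.

```python
import pandas as pd

toy = pd.DataFrame({
    "Category": ["B", "A", "B", "B"],
    "Price": [1.0, 0.0, 3.0, 2.0],
})

# Per-category mean price; the index is sorted alphabetically (A, B)
summary = toy.pivot_table(values="Price", index="Category")

# Counts are ordered by frequency (B first), but indexed by category label
counts = toy["Category"].value_counts()

# Assignment matches rows by label, so the differing order does not matter
summary["Number of apps"] = counts

print(summary)
```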


In [56]:
# Move Category out of the index and into a regular column
app_category_info.reset_index(level=0, inplace=True)
app_category_info.head()

Unnamed: 0,index,Category,Price,Rating,Number of apps
0,0,ART_AND_DESIGN,0.093281,4.357377,64
1,1,AUTO_AND_VEHICLES,0.158471,4.190411,85
2,2,BEAUTY,0.0,4.278571,53
3,3,BOOKS_AND_REFERENCE,0.539505,4.34497,222
4,4,BUSINESS,0.417357,4.098479,420


In [57]:
app_category_info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   index           33 non-null     int64  
 1   Category        33 non-null     object 
 2   Price           33 non-null     float64
 3   Rating          33 non-null     float64
 4   Number of apps  33 non-null     int64  
dtypes: float64(2), int64(2), object(1)
memory usage: 1.4+ KB


In [88]:
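# Drop the stray 'index' column, which appears when reset_index is run more than once;
# errors='ignore' keeps this cell from failing when that column is absent
app_category_info.drop(columns='index', errors='ignore', inplace=True)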
app_category_info.head(5)


Unnamed: 0,Category,Average price,Average rating,Number of apps
0,ART_AND_DESIGN,0.093281,4.357377,64
1,AUTO_AND_VEHICLES,0.158471,4.190411,85
2,BEAUTY,0.0,4.278571,53
3,BOOKS_AND_REFERENCE,0.539505,4.34497,222
4,BUSINESS,0.417357,4.098479,420


In [97]:
# Rename the columns
app_category_info.columns=['Category','Average price','Average rating','Number of apps']

app_category_info.head()

Unnamed: 0,Category,Average price,Average rating,Number of apps
0,ART_AND_DESIGN,0.093281,4.357377,64
1,AUTO_AND_VEHICLES,0.158471,4.190411,85
2,BEAUTY,0.0,4.278571,53
3,BOOKS_AND_REFERENCE,0.539505,4.34497,222
4,BUSINESS,0.417357,4.098479,420


In [99]:
app_category_info=app_category_info[['Category','Number of apps','Average price','Average rating']]
app_category_info.head()

Unnamed: 0,Category,Number of apps,Average price,Average rating
0,ART_AND_DESIGN,64,0.093281,4.357377
1,AUTO_AND_VEHICLES,85,0.158471,4.190411
2,BEAUTY,53,0.0,4.278571
3,BOOKS_AND_REFERENCE,222,0.539505,4.34497
4,BUSINESS,420,0.417357,4.098479
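
The pivot, count, reset, rename and reorder steps above could plausibly be collapsed into one `groupby` with named aggregation. This is an alternative sketch on a made-up `toy` frame, not the approach the notebook takes. The dictionary unpacking is only needed because the output column names contain spaces.

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["a1", "a2", "b1"],
    "Category": ["A", "A", "B"],
    "Price": [0.0, 2.0, 1.0],
    "Rating": [4.0, 5.0, 3.0],
})

# Each entry maps an output column name to (input column, aggregation)
summary = (
    toy.groupby("Category")
    .agg(**{
        "Number of apps": ("App", "count"),
        "Average price": ("Price", "mean"),
        "Average rating": ("Rating", "mean"),
    })
    .reset_index()  # turn Category back into a regular column
)

print(summary)
```

The columns come out already named and in the order in which they are listed.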


In [102]:
import pandas as pd
import numpy as np
user_reviews=pd.read_csv('user_reviews.csv')
user_reviews.head()

Unnamed: 0,App,Review,Sentiment Category,Sentiment Score
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25
2,10 Best Foods for You,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4
4,10 Best Foods for You,Best idea us,Positive,1.0
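
Row 2 above has no review text and no sentiment values. A minimal sketch, using a hypothetical `toy` frame, of how such rows can be counted and how they affect an average: `mean()` skips missing values by default, and `dropna` removes the rows explicitly if that is preferred.

```python
import pandas as pd

# Toy reviews for one app; the second row is empty, as in the sample above
toy = pd.DataFrame({
    "App": ["X", "X", "X"],
    "Review": ["Works great", None, "Best idea"],
    "Sentiment Score": [0.4, None, 1.0],
})

# Number of reviews without a sentiment score
print(toy["Sentiment Score"].isna().sum())

# The mean ignores missing scores rather than treating them as zero
print(toy["Sentiment Score"].mean())

# Optionally keep only rows that actually contain review text
with_text = toy.dropna(subset=["Review"])
print(len(with_text))
```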


In [113]:
finance_apps=apps[(apps['Category']=='FINANCE')&(apps['Type']=='Free')]
finance_apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
837,K PLUS,FINANCE,4.4,124424,,10000000,Free,0.0,"June 26, 2018"
838,ING Banking,FINANCE,4.4,39041,,1000000,Free,0.0,"August 3, 2018"
839,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018"
840,The postal bank,FINANCE,3.7,36718,,5000000,Free,0.0,"July 16, 2018"
841,KTB Netbank,FINANCE,3.8,42644,19.0,5000000,Free,0.0,"June 28, 2018"
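
The filter above combines two conditions with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `==` in Python. One way to make this easier to read, sketched here on made-up data, is to name each boolean mask before combining them.

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["p", "q", "r"],
    "Category": ["FINANCE", "FINANCE", "GAME"],
    "Type": ["Free", "Paid", "Free"],
})

# Each comparison yields a boolean Series with one value per row
is_finance = toy["Category"] == "FINANCE"
is_free = toy["Type"] == "Free"

# Element-wise AND keeps only rows where both conditions hold
free_finance = toy[is_finance & is_free]

print(free_finance["App"].tolist())
```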


In [116]:
finance_apps_with_sentiment=pd.merge(left=finance_apps,right=user_reviews,on='App')
finance_apps_with_sentiment.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated,Review,Sentiment Category,Sentiment Score
0,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","Forget paying app, designed make fail payments...",Negative,-0.5
1,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","It's working expected, talking best bank Mexic...",Positive,0.4
2,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",It has many problems with Android 8.1. You can...,Positive,0.25
3,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","I changed my phone to a Xiaomi Redmi Note 5, t...",Positive,0.175
4,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",In her eagerness to make her look pretty with ...,Negative,-0.158333
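
`pd.merge` performs an inner join by default, so apps without any matching review disappear from the result, and each app row is repeated once per review. The sketch below shows both effects on hypothetical frames `apps_toy` and `reviews_toy`.

```python
import pandas as pd

# App "q" has no reviews; review app "z" is not in the apps table
apps_toy = pd.DataFrame({"App": ["p", "q"], "Category": ["FINANCE", "FINANCE"]})
reviews_toy = pd.DataFrame({
    "App": ["p", "p", "z"],
    "Sentiment Score": [0.5, -0.5, 1.0],
})

# Default how="inner": only App values present in both frames survive
merged = pd.merge(left=apps_toy, right=reviews_toy, on="App")

print(merged)
```

Passing `how="left"` instead would keep app "q" with missing sentiment values.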


In [117]:
finance_apps_with_sentiment=finance_apps_with_sentiment.drop(columns=['Rating','Reviews','Size','Installs','Type','Price','Last Updated'])
finance_apps_with_sentiment.head()

Unnamed: 0,App,Category,Review,Sentiment Category,Sentiment Score
0,Citibanamex Movil,FINANCE,"Forget paying app, designed make fail payments...",Negative,-0.5
1,Citibanamex Movil,FINANCE,"It's working expected, talking best bank Mexic...",Positive,0.4
2,Citibanamex Movil,FINANCE,It has many problems with Android 8.1. You can...,Positive,0.25
3,Citibanamex Movil,FINANCE,"I changed my phone to a Xiaomi Redmi Note 5, t...",Positive,0.175
4,Citibanamex Movil,FINANCE,In her eagerness to make her look pretty with ...,Negative,-0.158333


In [118]:
app_sentiment=finance_apps_with_sentiment.groupby('App').agg({'Sentiment Score':'mean'})
app_sentiment=app_sentiment.sort_values('Sentiment Score', ascending=False)
app_sentiment.head()

App,Sentiment Score
BBVA Spain,0.515086
Associated Credit Union Mobile,0.388093
BankMobile Vibe App,0.353455
A+ Mobile,0.329592
Current debit card and app made for teens,0.327258


In [120]:
app_sentiment.reset_index(level=0, inplace=True)
top_10_user_feedback=app_sentiment[:10][['App','Sentiment Score']]
top_10_user_feedback

Unnamed: 0,App,Sentiment Score
0,BBVA Spain,0.515086
1,Associated Credit Union Mobile,0.388093
2,BankMobile Vibe App,0.353455
3,A+ Mobile,0.329592
4,Current debit card and app made for teens,0.327258
5,BZWBK24 mobile,0.326883
6,"Even - organize your money, get paid early",0.283929
7,Credit Karma,0.270052
8,Fortune City - A Finance App,0.266966
9,Branch,0.26423
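
The group, sort, reset and slice steps that produce this ranking can also be expressed with `nlargest`. This is an alternative sketch on made-up data, not the notebook's own code. It takes the top 2 rather than the top 10 so that the toy example stays small.

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["p", "p", "q", "r"],
    "Sentiment Score": [0.5, 0.1, 0.9, -0.2],
})

top = (
    # as_index=False keeps App as a regular column in the result
    toy.groupby("App", as_index=False)["Sentiment Score"].mean()
    # Keep the rows with the highest average score, best first
    .nlargest(2, "Sentiment Score")
    .reset_index(drop=True)
)

print(top)
```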
