## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from the business standpoint. Specifically, Android is expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>. </p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparision between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i>[most helpful first](https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/)</i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the `App` column of the `apps.csv` file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies between [-1,1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

### Project Tasks:
>* Read the apps.csv file and clean the Installs column to convert it into integer data type. Save your answer as a DataFrame `apps`. Going forward, you will do all your analysis on the apps DataFrame.
>* Find the number of apps in each category, the average price, and the average rating. Save your answer as a DataFrame `app_category_info`. Your should rename the four columns as: Category, Number of apps, Average price, Average rating.
>* Find the top 10 free FINANCE apps having the highest average sentiment score. Save your answer as a DataFrame `top_10_user_feedback`. Your answer should have exactly 10 rows and two columns named: App and Sentiment Score, where the average Sentiment Score is sorted from highest to lowest.

In [21]:
import pandas as pd
import numpy as np

In [22]:
apps = pd.read_csv('apps.csv')
apps.head()

Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [23]:
char_to_remove = ['+', ',', '$']
col = ['Installs', 'Price']
for col in col:
    for char in char_to_remove:
        apps[col] = apps[col].apply(lambda x: x.replace(char, '') )

In [24]:
apps.head()

Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,10000,Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,5000000,Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,50000000,Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,100000,Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [26]:
apps['Installs'] = apps.Installs.astype(int)
apps['Price'] = apps.Installs.astype(int)

In [27]:
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      9659 non-null   int64  
 1   App             9659 non-null   object 
 2   Category        9659 non-null   object 
 3   Rating          8196 non-null   float64
 4   Reviews         9659 non-null   int64  
 5   Size            8432 non-null   float64
 6   Installs        9659 non-null   int32  
 7   Type            9659 non-null   object 
 8   Price           9659 non-null   int32  
 9   Content Rating  9659 non-null   object 
 10  Genres          9659 non-null   object 
 11  Last Updated    9659 non-null   object 
 12  Current Ver     9651 non-null   object 
 13  Android Ver     9657 non-null   object 
dtypes: float64(2), int32(2), int64(2), object(8)
memory usage: 981.1+ KB


# app_category_info


### The hard way

In [28]:
num_apps_category = apps['Category'].value_counts().sort_values(ascending=False)
num_apps_category.head()

FAMILY      1832
GAME         959
TOOLS        827
BUSINESS     420
MEDICAL      395
Name: Category, dtype: int64

In [29]:
num_apps_category_sorted_index = num_apps_category.sort_index()
num_apps_category_sorted_index.head()

ART_AND_DESIGN          64
AUTO_AND_VEHICLES       85
BEAUTY                  53
BOOKS_AND_REFERENCE    222
BUSINESS               420
Name: Category, dtype: int64

In [30]:
avg_price_category = apps.groupby('Category')['Price'].mean()
avg_price_category.sort_values(ascending=False).head()

Category
COMMUNICATION    3.504215e+07
VIDEO_PLAYERS    2.409143e+07
SOCIAL           2.296179e+07
ENTERTAINMENT    2.072216e+07
PHOTOGRAPHY      1.654501e+07
Name: Price, dtype: float64

In [31]:
avg_price_category_sorted_index = avg_price_category.sort_index()
avg_price_category_sorted_index.head()

Category
ART_AND_DESIGN         1.786533e+06
AUTO_AND_VEHICLES      6.250613e+05
BEAUTY                 5.131519e+05
BOOKS_AND_REFERENCE    7.504367e+06
BUSINESS               1.659916e+06
Name: Price, dtype: float64

In [32]:
avg_rating_category = apps.groupby('Category')['Rating'].mean()
avg_rating_category.sort_values(ascending=False).head()

Category
EVENTS                 4.435556
EDUCATION              4.364407
ART_AND_DESIGN         4.357377
BOOKS_AND_REFERENCE    4.344970
PERSONALIZATION        4.332215
Name: Rating, dtype: float64

In [33]:
avg_rating_category_sorted_index = avg_rating_category.sort_index()
avg_rating_category_sorted_index.head()

Category
ART_AND_DESIGN         4.357377
AUTO_AND_VEHICLES      4.190411
BEAUTY                 4.278571
BOOKS_AND_REFERENCE    4.344970
BUSINESS               4.098479
Name: Rating, dtype: float64

### app_category_info DataFrame

In [34]:
app_category_info = pd.DataFrame({'Category':num_apps_category_sorted_index.index, 
                                  'Number of apps': num_apps_category_sorted_index.values, 
                                  'Average price': avg_price_category_sorted_index.values,
                                  'Average rating':avg_rating_category_sorted_index.values})

app_category_info.head()

Unnamed: 0,Category,Number of apps,Average price,Average rating
0,ART_AND_DESIGN,64,1786533.0,4.357377
1,AUTO_AND_VEHICLES,85,625061.3,4.190411
2,BEAUTY,53,513151.9,4.278571
3,BOOKS_AND_REFERENCE,222,7504367.0,4.34497
4,BUSINESS,420,1659916.0,4.098479


### Using Groupby

In [35]:
app_category_info = apps.groupby('Category').agg({'App': 'count', 
                                                  'Price': 'mean', 
                                                  'Rating': 'mean'})
app_category_info.head()

Unnamed: 0_level_0,App,Price,Rating
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ART_AND_DESIGN,64,1786533.0,4.357377
AUTO_AND_VEHICLES,85,625061.3,4.190411
BEAUTY,53,513151.9,4.278571
BOOKS_AND_REFERENCE,222,7504367.0,4.34497
BUSINESS,420,1659916.0,4.098479


In [36]:
app_category_info = app_category_info.rename(columns={'App': 'Number of apps', 'Price': 'Average price', 'Rating': 'Average rating'})
app_category_info.head()

Unnamed: 0_level_0,Number of apps,Average price,Average rating
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ART_AND_DESIGN,64,1786533.0,4.357377
AUTO_AND_VEHICLES,85,625061.3,4.190411
BEAUTY,53,513151.9,4.278571
BOOKS_AND_REFERENCE,222,7504367.0,4.34497
BUSINESS,420,1659916.0,4.098479


# top_10_user_feedback

In [38]:
user_review = pd.read_csv('user_reviews.csv')
user_review.head()

Unnamed: 0,App,Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0,0.533333
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25,0.288462
2,10 Best Foods for You,,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4,0.875
4,10 Best Foods for You,Best idea us,Positive,1.0,0.3


In [39]:
merged_df = apps.merge(user_review, on='App')
merged_df.head()

Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,500000,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,A kid's excessive ads. The types ads allowed a...,Negative,-0.25,1.0
1,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,500000,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,It bad >:(,Negative,-0.725,0.833333
2,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,500000,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,like,Neutral,0.0,0.0
3,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,500000,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,,,,
4,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,500000,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,I love colors inspyering,Positive,0.5,0.6


In [40]:
finance_apps = merged_df[merged_df.Category.isin(['FINANCE'])]
finance_apps.head(2)

Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
14112,1050,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,5000000,Everyone,Finance,"July 27, 2018",20.1.0,5.0 and up,"Forget paying app, designed make fail payments...",Negative,-0.5,0.3
14113,1050,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,5000000,Everyone,Finance,"July 27, 2018",20.1.0,5.0 and up,"It's working expected, talking best bank Mexic...",Positive,0.4,0.45


In [41]:
free_finance_apps = finance_apps[finance_apps['Type'] == 'Free']

In [42]:
free_finance_apps.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2200 entries, 14112 to 60235
Data columns (total 18 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Unnamed: 0              2200 non-null   int64  
 1   App                     2200 non-null   object 
 2   Category                2200 non-null   object 
 3   Rating                  2200 non-null   float64
 4   Reviews                 2200 non-null   int64  
 5   Size                    1600 non-null   float64
 6   Installs                2200 non-null   int32  
 7   Type                    2200 non-null   object 
 8   Price                   2200 non-null   int32  
 9   Content Rating          2200 non-null   object 
 10  Genres                  2200 non-null   object 
 11  Last Updated            2200 non-null   object 
 12  Current Ver             2200 non-null   object 
 13  Android Ver             2200 non-null   object 
 14  Review                  1435 non-nu

In [47]:
app_sentiment_score = free_finance_apps.groupby('App').agg({'Sentiment_Polarity': 'mean'})

app_sentiment_score.head()

Unnamed: 0_level_0,Sentiment_Polarity
App,Unnamed: 1_level_1
A+ Mobile,0.329592
ACE Elite,0.252171
Acorns - Invest Spare Change,0.046667
Amex Mobile,0.175666
Associated Credit Union Mobile,0.388093


In [52]:
app_sentiment_score_sorted = app_sentiment_score.sort_values(by = 'Sentiment_Polarity', ascending=False)
app_sentiment_score_sorted.head()

Unnamed: 0_level_0,Sentiment_Polarity
App,Unnamed: 1_level_1
BBVA Spain,0.515086
Associated Credit Union Mobile,0.388093
BankMobile Vibe App,0.353455
A+ Mobile,0.329592
Current debit card and app made for teens,0.327258


In [51]:
top_10_user_feedback = app_sentiment_score_sorted.head(10)
top_10_user_feedback

Unnamed: 0_level_0,Sentiment_Polarity
App,Unnamed: 1_level_1
BBVA Spain,0.515086
Associated Credit Union Mobile,0.388093
BankMobile Vibe App,0.353455
A+ Mobile,0.329592
Current debit card and app made for teens,0.327258
BZWBK24 mobile,0.326883
"Even - organize your money, get paid early",0.283929
Credit Karma,0.270052
Fortune City - A Finance App,0.266966
Branch,0.26423


` Note`

This is the **Unguided** version of The Android App Market Analysis on DataCamp.

**DONE ON:** 22.09.2021