## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps growing as an operating system and has captured more than 74% of the global mobile operating system market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>.</p>
<p>The Google Play Store apps data has enormous potential to support data-driven business decisions. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review, in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>
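
Before working with the review data, it can help to confirm that it matches the description above. The sketch below is one possible sanity check, not part of the original project: `check_reviews` is a hypothetical helper, and the toy frame stands in for `datasets/user_reviews.csv` with made-up values. Missing values are skipped on purpose, since the real file may contain reviews without a score.

```python
import pandas as pd

# Toy stand-in for datasets/user_reviews.csv (values are invented for illustration).
reviews = pd.DataFrame({
    "App": ["Toy Bank", "Toy Bank", "Toy Wallet"],
    "Review": ["Works well", None, "Crashes often"],
    "Sentiment Category": ["Positive", None, "Negative"],
    "Sentiment Score": [0.6, None, -0.4],
})

def check_reviews(df):
    """Hypothetical helper: return a list of problems; an empty list means all checks passed."""
    problems = []
    expected = ["App", "Review", "Sentiment Category", "Sentiment Score"]
    missing = [col for col in expected if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems

    # Scores should lie in [-1, 1]; NaN scores are ignored here.
    scores = df["Sentiment Score"].dropna()
    if not scores.between(-1, 1).all():
        problems.append("Sentiment Score outside [-1, 1]")

    # Labels should be one of the three documented categories.
    labels = set(df["Sentiment Category"].dropna().unique())
    unexpected = labels - {"Positive", "Negative", "Neutral"}
    if unexpected:
        problems.append(f"unexpected sentiment labels: {sorted(unexpected)}")
    return problems

problems = check_reviews(reviews)
print(problems)
```

An empty list means the frame is consistent with the column descriptions. Running the same helper on the real file would be the natural next step.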

In [22]:
import pandas as pd
import numpy as np

In [23]:
apps = pd.read_csv('datasets/apps.csv')
apps = apps.drop_duplicates()
print(apps.sample(5))

                                          App        Category  Rating  \
6757                                 CS-Touch          FAMILY     3.9   
11    Name Art Photo Editor - Focus n Filters  ART_AND_DESIGN     4.4   
135         Step By Step Hairstyles For Women          BEAUTY     4.1   
5918                             Bx Access 4d           TOOLS     NaN   
6977                         Cx File Explorer           TOOLS     4.7   

      Reviews  Size    Installs  Type  Price    Last Updated  
6757      337  22.0     10,000+  Free    0.0    May 11, 2018  
11       8788  12.0  1,000,000+  Free    0.0   July 31, 2018  
135        66   2.9     10,000+  Free    0.0   April 5, 2018  
5918        1   1.9        100+  Free    0.0  April 29, 2016  
6977      175   4.3     10,000+  Free    0.0   July 28, 2018  


In [24]:
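# Clean the Installs column: strip '+' and ',' so values such as '1,000,000+' can be cast to integers
chars_to_remove = ['+', ',']

for char in chars_to_remove:
    # Literal (non-regex) replacement, because '+' is a regex metacharacter
    apps['Installs'] = apps['Installs'].str.replace(char, '', regex=False)

apps['Installs'] = apps['Installs'].astype('int')
print(apps['Installs'].dtype)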

int64
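
The loop above removes one character per pass. As an alternative, a single regular-expression character class can strip both characters at once. This is a minimal sketch on a toy Series whose values mimic the `Installs` format seen in the sample output; it is not the project's reference solution.

```python
import pandas as pd

# Toy values in the same format as the Installs column.
installs = pd.Series(["10,000+", "1,000,000+", "100+"], name="Installs")

# The character class [+,] matches either a plus sign or a comma.
cleaned = installs.str.replace(r"[+,]", "", regex=True).astype("int64")
print(cleaned.tolist())
```

Casting to `"int64"` explicitly avoids depending on the platform's default integer width.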


In [25]:
# Method 1: compute the app count, average price, and average rating per category separately
num_apps_in_category = apps['Category'].value_counts()
avg_app_price = apps.groupby('Category')['Price'].mean()
avg_app_rating = apps.groupby('Category')['Rating'].mean()

In [26]:
app_category_info = pd.concat([num_apps_in_category, avg_app_price, avg_app_rating], axis=1)
app_category_info = app_category_info.reset_index()
app_category_info.columns = ['Category', 'Number of apps', 'Average price', 'Average rating']

In [27]:
app_category_info.head()

              Category  Number of apps  Average price  Average rating
0       ART_AND_DESIGN              64       0.093281        4.357377
1    AUTO_AND_VEHICLES              85       0.158471        4.190411
2               BEAUTY              53       0.000000        4.278571
3  BOOKS_AND_REFERENCE             222       0.539505        4.344970
4             BUSINESS             420       0.417357        4.098479


In [28]:
# Method 2: compute all three statistics in a single groupby/agg call
app_category_info = apps.groupby('Category').agg({'App': 'count', 'Price': 'mean', 'Rating': 'mean'})
app_category_info = app_category_info.rename(columns={"App": "Number of apps", "Price": "Average price", "Rating": "Average rating"})

In [29]:
app_category_info.head()

                     Number of apps  Average price  Average rating
Category
ART_AND_DESIGN                   64       0.093281        4.357377
AUTO_AND_VEHICLES                85       0.158471        4.190411
BEAUTY                           53       0.000000        4.278571
BOOKS_AND_REFERENCE             222       0.539505        4.344970
BUSINESS                        420       0.417357        4.098479
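
Method 2 aggregates first and renames afterwards. pandas also supports named aggregation, which sets the output column names inside the `agg` call itself. The sketch below uses a toy frame with invented values, and the output names `num_apps`, `avg_price`, and `avg_rating` are illustrative choices rather than names from the project.

```python
import pandas as pd

# Toy stand-in for the apps table (values are invented for illustration).
apps_toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Category": ["FINANCE", "FINANCE", "BEAUTY", "BEAUTY"],
    "Price": [0.0, 2.0, 0.0, 0.0],
    "Rating": [4.0, None, 4.5, 3.5],
})

# Named aggregation: output_name=(input_column, aggregation_function).
summary = (
    apps_toy.groupby("Category")
    .agg(
        num_apps=("App", "count"),
        avg_price=("Price", "mean"),
        avg_rating=("Rating", "mean"),
    )
    .reset_index()
)
print(summary)
```

Note that `mean` skips missing ratings, so a category's average rating is computed only over the apps that have one.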


In [30]:
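# Find the top 10 free finance apps by average sentiment score:
# join the reviews to the apps and keep only reviews that have a score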
reviews_df = pd.read_csv('datasets/user_reviews.csv')
merged_df = apps.merge(reviews_df, on='App')
merged_df = merged_df.dropna(subset=['Sentiment Score'])

In [31]:
free_finance_apps = merged_df.query('Type == "Free" and Category == "FINANCE"')
app_sentiment_score = free_finance_apps.groupby('App').agg({'Sentiment Score':'mean'})

In [32]:
top_10_user_feedback = app_sentiment_score.sort_values('Sentiment Score', ascending=False).head(10)

In [33]:
top_10_user_feedback

                                            Sentiment Score
App
BBVA Spain                                         0.515086
Associated Credit Union Mobile                     0.388093
BankMobile Vibe App                                0.353455
A+ Mobile                                          0.329592
Current debit card and app made for teens          0.327258
BZWBK24 mobile                                     0.326883
Even - organize your money, get paid early         0.283929
Credit Karma                                       0.270052
Fortune City - A Finance App                       0.266966
Branch                                             0.264230
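
An average sentiment score says nothing about how many reviews it is based on, and a mean over very few reviews can be noisy. One way to check this, sketched below on a toy frame with invented values, is to compute the review count next to the mean. The `min_reviews` threshold is a hypothetical parameter and not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the merged apps/reviews table (values are invented for illustration).
merged_toy = pd.DataFrame({
    "App": ["Bank A", "Bank A", "Bank A", "Bank B", "Game C"],
    "Category": ["FINANCE", "FINANCE", "FINANCE", "FINANCE", "GAME"],
    "Type": ["Free", "Free", "Free", "Free", "Free"],
    "Sentiment Score": [0.2, 0.4, 0.6, 0.9, 1.0],
})

free_finance = merged_toy.query('Type == "Free" and Category == "FINANCE"')

# Mean score and number of scored reviews per app, best mean first.
ranking = (
    free_finance.groupby("App")["Sentiment Score"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)

min_reviews = 2  # hypothetical threshold, chosen only for this example
reliable = ranking[ranking["count"] >= min_reviews]
print(ranking)
print(reliable)
```

In this toy data, "Bank B" ranks first on a single review and drops out once the threshold is applied. Whether such a filter is appropriate for the real data depends on the question being asked.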
