## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>. </p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

In [1]:
import pandas as pd

In [2]:
apps = pd.read_csv('datasets/apps.csv')
reviews = pd.read_csv('datasets/user_reviews.csv')

In [3]:
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
App             9659 non-null object
Category        9659 non-null object
Rating          8196 non-null float64
Reviews         9659 non-null int64
Size            8432 non-null float64
Installs        9659 non-null object
Type            9659 non-null object
Price           9659 non-null float64
Last Updated    9659 non-null object
dtypes: float64(3), int64(1), object(5)
memory usage: 679.2+ KB


In [4]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64295 entries, 0 to 64294
Data columns (total 4 columns):
App                   64295 non-null object
Review                37427 non-null object
Sentiment Category    37432 non-null object
Sentiment Score       37432 non-null float64
dtypes: float64(1), object(3)
memory usage: 2.0+ MB


In [5]:
reviews.head()

Unnamed: 0,App,Review,Sentiment Category,Sentiment Score
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25
2,10 Best Foods for You,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4
4,10 Best Foods for You,Best idea us,Positive,1.0


In [6]:
apps.sample(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
6828,CT CONNECT,PRODUCTIVITY,,2,28.0,50+,Free,0.0,"August 8, 2017"
683,EasyBib: Citation Generator,EDUCATION,3.5,1405,7.3,"100,000+",Free,0.0,"March 29, 2018"
9011,CNY Slots : Gong Xi Fa Cai 发财机,GAME,3.6,33,71.0,"5,000+",Free,0.0,"June 6, 2017"
1534,Soul Knight,GAME,4.7,292164,59.0,"10,000,000+",Free,0.0,"August 1, 2018"
7437,Insave-Download for Instagram,PHOTOGRAPHY,4.3,46242,5.2,"1,000,000+",Free,0.0,"September 20, 2016"
8655,"ES File Explorer & Manager, Locker Xplorer 2018",TOOLS,3.5,11,3.0,"1,000+",Free,0.0,"March 17, 2018"
7661,Dr. Panda Farm,FAMILY,4.1,265,46.0,"10,000+",Paid,2.99,"July 21, 2016"
2206,Makeup Photo Editor: Makeup Camera & Makeup Ed...,PHOTOGRAPHY,4.4,10525,25.0,"1,000,000+",Free,0.0,"July 27, 2018"
9245,FH Wallet,FINANCE,,0,9.9,1+,Free,0.0,"July 26, 2018"
2840,Family Album Mitene: Private Photo & Video Sha...,PARENTING,4.7,34336,,"1,000,000+",Free,0.0,"July 31, 2018"


In [9]:
# Removing special characters from the 'Installs' column.

chars_to_replace = [',', '+']

for char in chars_to_replace:
    apps['Installs'] = apps['Installs'].str.replace(char, '', regex=False)
apps['Installs'] = pd.to_numeric(apps['Installs'], errors='raise')
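
The loop above can also be collapsed into a single regular-expression replacement. The following is a minimal sketch on a small made-up Series; the name `installs_raw` and its values are illustrative assumptions, not rows read from `datasets/apps.csv`.

```python
import pandas as pd

# Toy values mimicking the raw 'Installs' strings (illustrative only).
installs_raw = pd.Series(['50+', '100,000+', '10,000,000+', '1+'])

# Strip every comma and plus sign in one pass, then convert to integers.
# errors='raise' makes any leftover non-numeric value fail loudly.
installs_clean = pd.to_numeric(
    installs_raw.str.replace(r'[,+]', '', regex=True), errors='raise'
)

print(installs_clean.tolist())
print(installs_clean.dtype)
```

Calling `apps.info()` after the conversion is a quick way to confirm that `Installs` is no longer an `object` column.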

In [7]:
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
App             9659 non-null object
Category        9659 non-null object
Rating          8196 non-null float64
Reviews         9659 non-null int64
Size            8432 non-null float64
Installs        9659 non-null object
Type            9659 non-null object
Price           9659 non-null float64
Last Updated    9659 non-null object
dtypes: float64(3), int64(1), object(5)
memory usage: 679.2+ KB


In [8]:
apps.sample(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
154,Offline English Dictionary,BOOKS_AND_REFERENCE,4.2,860,13.0,"100,000+",Free,0.0,"July 17, 2018"
226,UPS Mobile,BUSINESS,3.9,23243,,"5,000,000+",Free,0.0,"June 25, 2018"
2300,Fotor Photo Editor - Photo Collage & Photo Eff...,PHOTOGRAPHY,4.5,597068,,"10,000,000+",Free,0.0,"July 9, 2018"
4960,Avios for Android,TRAVEL_AND_LOCAL,2.5,862,30.0,"100,000+",Free,0.0,"April 10, 2018"
5875,BW COMPANY FINDER,BUSINESS,4.6,48,7.4,"1,000+",Free,0.0,"August 21, 2017"
8343,Stickman Warriors Heroes 2,GAME,4.4,13714,41.0,"1,000,000+",Free,0.0,"April 15, 2017"
462,"Chat Rooms, Avatars, Date - Galaxy",DATING,4.3,135418,,"10,000,000+",Free,0.0,"July 7, 2018"
7143,DB BAHN,TRAVEL_AND_LOCAL,,4,8.0,500+,Free,0.0,"December 19, 2017"
2113,Blibli.com Online Shopping,SHOPPING,4.2,171584,12.0,"10,000,000+",Free,0.0,"July 26, 2018"
8026,The Simpsons™: Tapped Out,FAMILY,4.3,636995,49.0,"10,000,000+",Free,0.0,"July 31, 2018"


In [231]:
'''
date = pd.to_datetime(apps['Last Updated'], format = '%B %d, %Y')
apps['Last Updated'] = date
'''
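
The disabled cell above parses `Last Updated` with the format string `'%B %d, %Y'`. As a hedged sketch of what that conversion does, here it is applied to a few date strings copied from the sample output; the Series name `last_updated_raw` is an assumption made for illustration, and month-name parsing assumes an English locale.

```python
import pandas as pd

# A few 'Last Updated' strings in the same layout as the dataset.
last_updated_raw = pd.Series(['August 8, 2017', 'March 29, 2018', 'June 6, 2017'])

# %B = full month name, %d = day of month, %Y = four-digit year.
last_updated = pd.to_datetime(last_updated_raw, format='%B %d, %Y')

print(last_updated.dtype)
print(last_updated.dt.year.tolist())
```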

In [10]:
# a = app_category_info['Category'] = apps.groupby('Category')['Category'].count()
no_apps_df = apps.groupby('Category')['App'].count()
avg_price_df = apps.groupby('Category')['Price'].agg('mean')
avg_rating_df = apps.groupby('Category')['Rating'].agg('mean')

In [11]:
app_category_info = pd.DataFrame(data = {
                     'Number of apps':no_apps_df, 
                     'Average price':avg_price_df, 
                     'Average rating':avg_rating_df})

In [12]:
app_category_info.reset_index()

Unnamed: 0,Category,Number of apps,Average price,Average rating
0,ART_AND_DESIGN,64,0.093281,4.357377
1,AUTO_AND_VEHICLES,85,0.158471,4.190411
2,BEAUTY,53,0.0,4.278571
3,BOOKS_AND_REFERENCE,222,0.539505,4.34497
4,BUSINESS,420,0.417357,4.098479
5,COMICS,56,0.0,4.181481
6,COMMUNICATION,315,0.263937,4.121484
7,DATING,171,0.160468,3.970149
8,EDUCATION,119,0.150924,4.364407
9,ENTERTAINMENT,102,0.078235,4.135294


In [27]:
app_category_info['Number of apps'].sum()

9659

In [13]:
# DATAFRAME CREATED, #2 DONE

apps.sample(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
2263,HD Camera Ultra,PHOTOGRAPHY,4.3,462152,1.5,10000000,Free,0.0,"October 17, 2015"
7145,"10,000 Quotes DB (Premium)",BOOKS_AND_REFERENCE,4.1,70,3.5,500,Paid,0.99,"August 30, 2013"
2786,"Polaris Office - Word, Docs, Sheets, Slide, PDF",PRODUCTIVITY,4.3,549900,60.0,10000000,Free,0.0,"July 18, 2018"
5107,Sexy Hot Detector Prank,FAMILY,3.9,17067,2.7,5000000,Free,0.0,"February 13, 2018"
5976,SegPlay Mobile Paint by Number,FAMILY,3.7,1478,43.0,100000,Free,0.0,"April 22, 2017"
206,Call Blocker,BUSINESS,4.6,188841,3.2,5000000,Free,0.0,"June 21, 2018"
1291,Safeway,LIFESTYLE,4.3,33572,37.0,1000000,Free,0.0,"August 2, 2018"
6557,Pokémon TV,FAMILY,4.2,117461,,5000000,Free,0.0,"June 29, 2018"
2128,Rossmann PL,SHOPPING,4.0,15867,,5000000,Free,0.0,"August 6, 2018"
2855,Baby Care & Tracker,PARENTING,4.1,319,11.0,100000,Free,0.0,"July 12, 2018"



There are 345 finance apps in total, 328 of which are free.

In [14]:
score_apps_table = apps.merge(reviews, on='App', how='inner')
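
An inner join keeps only apps that appear in both tables, so apps without sampled reviews silently drop out. The sketch below uses two made-up frames (`toy_apps` and `toy_reviews` are assumptions) and an outer join with `indicator=True` to show which rows an inner join would discard.

```python
import pandas as pd

# Illustrative stand-ins for the apps and reviews tables.
toy_apps = pd.DataFrame({
    'App': ['A', 'B', 'C'],
    'Category': ['FINANCE', 'GAME', 'TOOLS'],
})
toy_reviews = pd.DataFrame({
    'App': ['A', 'A', 'B', 'D'],
    'Sentiment Score': [0.5, -0.1, 0.2, 0.9],
})

# Inner join: one row per (app, review) pair present in both tables.
inner = toy_apps.merge(toy_reviews, on='App', how='inner')

# Outer join with an indicator column to see what the inner join drops.
outer = toy_apps.merge(toy_reviews, on='App', how='outer', indicator=True)

print(len(inner))
print(outer['_merge'].value_counts())
```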

In [21]:
score_apps_table.sample(5)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated,Review,Sentiment Category,Sentiment Score
55224,ColorNote Notepad Notes,PRODUCTIVITY,4.6,2401017,,100000000,Free,0.0,"June 27, 2018",A useful particularly forgetful people like me...,Positive,0.191667
56490,HTC Gallery,VIDEO_PLAYERS,4.1,45744,,10000000,Free,0.0,"June 3, 2016",,,
27706,Cooking Fever,GAME,4.5,3197865,82.0,100000000,Free,0.0,"July 12, 2018",Why drawback game??? I upgraded 100% upgrading...,Negative,-0.4
47818,Cheapflights – Flight Search,TRAVEL_AND_LOCAL,4.4,47780,19.0,5000000,Free,0.0,"July 31, 2018",Very nice,Positive,0.78
1452,Filters for Selfie,BEAUTY,4.3,8572,25.0,1000000,Free,0.0,"May 10, 2018",Disgusting.... I think good reviews paid.... !...,Negative,-0.0625


In [28]:
finance_scores_table = score_apps_table[(score_apps_table['Category'] == 'FINANCE') & (score_apps_table['Type'] == 'Free')]

In [29]:
finance_scores_table

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated,Review,Sentiment Category,Sentiment Score
14112,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","Forget paying app, designed make fail payments...",Negative,-0.500000
14113,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","It's working expected, talking best bank Mexic...",Positive,0.400000
14114,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",It has many problems with Android 8.1. You can...,Positive,0.250000
14115,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","I changed my phone to a Xiaomi Redmi Note 5, t...",Positive,0.175000
14116,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",In her eagerness to make her look pretty with ...,Negative,-0.158333
14117,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",I can not activate my mobile Netkey because it...,Positive,0.116667
14118,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",The mobile netkey does not work on Android 8.1...,Neutral,0.000000
14119,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",The new update is frozen on the home screen wi...,Positive,0.168182
14120,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","Almost everything works well, only that the tr...",Neutral,0.000000
14121,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","In my case after the update stopped working, t...",Positive,0.243750


In [33]:
top_10_user_feedback = finance_scores_table.groupby('App')['Sentiment Score'].agg('mean')

In [36]:
# Number of free finance apps that have at least one sampled review.
top_10_user_feedback.shape

(46,)

In [32]:
# Mean sentiment score per free finance app, highest first.
finance_scores_table.groupby('App')['Sentiment Score'].agg('mean').sort_values(ascending = False)

In [246]:
# Keep the 10 free finance apps with the highest mean sentiment score.
top_10_user_feedback = pd.DataFrame(finance_scores_table.groupby('App')['Sentiment Score'].agg('mean').sort_values(ascending=False)[:10])

In [None]:
top_10_user_feedback
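
A compact alternative for a "top N by mean score" table is `Series.nlargest`. The following is a sketch on made-up data; `toy_scores` and the cut-off of 2 are assumptions chosen to keep the example small.

```python
import pandas as pd

# Illustrative merged table: one row per review, some scores missing.
toy_scores = pd.DataFrame({
    'App': ['A', 'A', 'B', 'B', 'C'],
    'Sentiment Score': [0.5, 0.1, 0.9, None, -0.2],
})

# Mean score per app; missing scores are ignored by mean().
mean_scores = toy_scores.groupby('App')['Sentiment Score'].mean()

# Take the highest-scoring apps and return them as a tidy DataFrame.
top_2 = mean_scores.nlargest(2).to_frame().reset_index()

print(top_2)
```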