## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the global mobile operating system market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>.</p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from the Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>
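Because the `App` column in `user_reviews.csv` matches the `App` column in `apps.csv`, the two tables can be joined on it. The following is a minimal sketch of that idea on toy data, not the approach this notebook necessarily takes: the frames `apps_demo` and `reviews_demo` and all of their values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for apps.csv and user_reviews.csv (all values are invented)
apps_demo = pd.DataFrame({
    "App": ["Alpha", "Beta"],
    "Category": ["FINANCE", "COMICS"],
})
reviews_demo = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta"],
    "Sentiment Score": [0.5, -0.25, 1.0],
})

# Attach each review to its app's metadata via the shared "App" column
merged_demo = reviews_demo.merge(apps_demo, on="App", how="inner")

# Average sentiment score per category
mean_score = merged_demo.groupby("Category")["Sentiment Score"].mean()
print(mean_score)
```

An inner join keeps only reviews whose app also appears in the apps table; whether that is the right choice depends on the question being answered.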

In [34]:
import pandas as pd

In [35]:
apps_original = pd.read_csv('datasets/apps.csv')
reviews_original = pd.read_csv('datasets/user_reviews.csv')

In [36]:
apps_original.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
App             9659 non-null object
Category        9659 non-null object
Rating          8196 non-null float64
Reviews         9659 non-null int64
Size            8432 non-null float64
Installs        9659 non-null object
Type            9659 non-null object
Price           9659 non-null float64
Last Updated    9659 non-null object
dtypes: float64(3), int64(1), object(5)
memory usage: 679.2+ KB


In [37]:
reviews_original.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64295 entries, 0 to 64294
Data columns (total 4 columns):
App                   64295 non-null object
Review                37427 non-null object
Sentiment Category    37432 non-null object
Sentiment Score       37432 non-null float64
dtypes: float64(1), object(3)
memory usage: 2.0+ MB
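
The `info()` output above shows that only 37,427 of the 64,295 rows contain review text, so missing values will need some handling before any sentiment analysis. One possible way to count and drop them is sketched below on toy data; `reviews_demo` and its values are invented, and dropping rows is an assumption here rather than a step taken in this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for user_reviews.csv with one empty review (values are invented)
reviews_demo = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta"],
    "Review": ["Works great", np.nan, "Best idea"],
    "Sentiment Score": [0.4, np.nan, 1.0],
})

# Number of missing values in each column
missing_per_column = reviews_demo.isna().sum()
print(missing_per_column)

# Keep only the rows that actually have review text
reviews_clean = reviews_demo.dropna(subset=["Review"])
print(len(reviews_clean))
```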


In [39]:
reviews_original.head()

Unnamed: 0,App,Review,Sentiment Category,Sentiment Score
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25
2,10 Best Foods for You,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4
4,10 Best Foods for You,Best idea us,Positive,1.0


In [40]:
apps_original.sample(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
1710,THE aMAZEing Labyrinth,FAMILY,3.9,1615,1.2,"10,000+",Paid,4.99,"February 9, 2016"
377,Full Screen Caller ID,COMMUNICATION,4.2,104990,10.0,"5,000,000+",Free,0.0,"May 15, 2018"
3904,Zombie Avengers:(Dreamsky)Stickman War Z,GAME,4.3,13604,96.0,"1,000,000+",Paid,0.99,"June 26, 2018"
2184,TouchNote: Cards & Gifts,PHOTOGRAPHY,4.1,19232,28.0,"1,000,000+",Free,0.0,"August 6, 2018"
6935,Morse Decoder for Ham Radio,COMMUNICATION,3.7,166,2.0,"5,000+",Paid,4.99,"March 14, 2017"
8039,Bejeweled Blitz,FAMILY,4.2,222664,98.0,"10,000,000+",Free,0.0,"August 1, 2018"
8356,"Period Tracker, Pregnancy Calculator & Calendar 🌸",HEALTH_AND_FITNESS,,0,,"10,000+",Free,0.0,"August 1, 2018"
82,Supervision service,AUTO_AND_VEHICLES,4.0,2155,15.0,"500,000+",Free,0.0,"July 30, 2018"
4130,Blurfoto : Auto blur photo background & DSLR f...,PHOTOGRAPHY,4.3,2215,12.0,"500,000+",Free,0.0,"July 27, 2018"
4141,AG Screen Recorder,TOOLS,3.7,7,2.7,"1,000+",Free,0.0,"April 3, 2018"


In [41]:
apps = apps_original.copy()

In [43]:
chars_to_replace = [',', '+']

for char in chars_to_replace:
    apps['Installs'] = apps['Installs'].str.replace(char, '', regex=False)
apps['Installs'] = pd.to_numeric(apps['Installs'], errors='raise')
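
To see what the cell above does in isolation, here is the same cleaning applied to a small hand-made Series. The name `installs_demo` is hypothetical; the sample strings follow the format shown in the `Installs` column.

```python
import pandas as pd

# A few strings in the same format as the raw Installs column
installs_demo = pd.Series(["10,000+", "5,000,000+", "1,000+"])

# Strip the thousands separator and the trailing plus sign.
# regex=False treats each character literally; "+" is a regex metacharacter.
for char in [",", "+"]:
    installs_demo = installs_demo.str.replace(char, "", regex=False)

# Convert the cleaned strings to integers; errors="raise" fails loudly
# if any value still contains non-numeric characters
installs_demo = pd.to_numeric(installs_demo, errors="raise")
print(installs_demo.tolist())
```

Using `errors='raise'` rather than `errors='coerce'` means an unexpected value stops the conversion instead of silently becoming `NaN`.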

In [44]:
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
App             9659 non-null object
Category        9659 non-null object
Rating          8196 non-null float64
Reviews         9659 non-null int64
Size            8432 non-null float64
Installs        9659 non-null int64
Type            9659 non-null object
Price           9659 non-null float64
Last Updated    9659 non-null object
dtypes: float64(3), int64(2), object(4)
memory usage: 679.2+ KB


In [45]:
apps.sample(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
9477,Wallpapers FN Herstal FNP 9,PERSONALIZATION,,0,11.0,1,Free,0.0,"April 12, 2018"
4577,SNOW - AR Camera,PHOTOGRAPHY,4.3,1017237,,50000000,Free,0.0,"July 30, 2018"
783,Hamilton — The Official App,ENTERTAINMENT,4.5,1575,43.0,100000,Free,0.0,"July 13, 2018"
5436,B L Enterprises,FINANCE,,2,6.9,10,Free,0.0,"May 11, 2018"
1448,Run Sausage Run!,GAME,4.4,275447,,10000000,Free,0.0,"August 1, 2018"
9124,Sports Lite,SPORTS,4.4,16,4.9,1000,Free,0.0,"July 31, 2018"
9353,FK Oleksandria,SPORTS,,0,26.0,10,Free,0.0,"February 14, 2018"
1594,Draw.ly - Color by Number Pixel Art Coloring,FAMILY,4.4,18616,10.0,1000000,Free,0.0,"August 5, 2018"
955,DELISH KITCHEN - FREE recipe movies make food ...,FOOD_AND_DRINK,4.6,32997,13.0,1000000,Free,0.0,"August 4, 2018"
5328,BMH-BJ Congregation,LIFESTYLE,,0,5.1,10,Free,0.0,"June 29, 2018"


In [46]:
date = pd.to_datetime(apps['Last Updated'], format='%B %d, %Y')

In [48]:
print(apps.head())

                                                 App        Category  Rating  \
0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   
1                                Coloring book moana  ART_AND_DESIGN     3.9   
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   
3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   
4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   

   Reviews  Size  Installs  Type  Price      Last Updated  
0      159  19.0     10000  Free    0.0   January 7, 2018  
1      967  14.0    500000  Free    0.0  January 15, 2018  
2    87510   8.7   5000000  Free    0.0    August 1, 2018  
3   215644  25.0  50000000  Free    0.0      June 8, 2018  
4      967   2.8    100000  Free    0.0     June 20, 2018  


In [49]:
apps['Last Updated'] = date
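
As a small illustration of what the datetime conversion makes possible, the sketch below parses a few date strings with the same format string and computes how many days each lies before the most recent one. The names `dates_demo`, `parsed_demo` and `days_behind` are hypothetical, and the day-difference calculation is an example use, not part of this notebook.

```python
import pandas as pd

# Date strings in the same "Month day, Year" format as the Last Updated column
dates_demo = pd.Series(["January 7, 2018", "June 20, 2018", "August 1, 2018"])

# %B = full month name, %d = day of month, %Y = four-digit year
parsed_demo = pd.to_datetime(dates_demo, format="%B %d, %Y")
print(parsed_demo.dtype)

# With real datetimes, date arithmetic becomes straightforward:
# days between each date and the latest date in the Series
days_behind = (parsed_demo.max() - parsed_demo).dt.days
print(days_behind.tolist())
```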

In [52]:
print(apps.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
App             9659 non-null object
Category        9659 non-null object
Rating          8196 non-null float64
Reviews         9659 non-null int64
Size            8432 non-null float64
Installs        9659 non-null int64
Type            9659 non-null object
Price           9659 non-null float64
Last Updated    9659 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(3), int64(2), object(3)
memory usage: 679.2+ KB
None


In [53]:
print(apps.sample(10))

                          App   Category  Rating  Reviews  Size  Installs  \
4287       Blackjack aj Poker       GAME     3.8        8   4.2      1000   
8308  Survival: Prison Escape       GAME     4.0   127810  33.0  10000000   
1808                 CareZone    MEDICAL     4.4    27524   NaN   1000000   
6695                CricQuick     SPORTS     5.0       17   1.5        50   
2522       iWnn IME for Nexus      TOOLS     3.2     2394   NaN   5000000   
6276                  CG Jobs     FAMILY     5.0        8  14.0        10   
7863                  DW VMAX      TOOLS     3.4      843   8.5    100000   
5270        Sisense Mobile BI   BUSINESS     4.9       23  28.0      1000   
843             Nedbank Money    FINANCE     4.2     6076  32.0    500000   
3150                  B-Dubs®  LIFESTYLE     3.1     3042   NaN    500000   

      Type  Price Last Updated  
4287  Free    0.0   2016-10-19  
8308  Free    0.0   2018-08-01  
1808  Free    0.0   2018-07-30  
6695  Free    0.0   