## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>. </p>
<p>The Google Play Store apps data has enormous potential to support data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

In [2]:
# Use this cell to begin your analysis, and add as many as you would like!
import pandas as pd
import numpy as np

In [3]:
# Read the apps dataset into a DataFrame
apps = pd.read_csv("datasets/apps.csv")

In [4]:
print(apps.head())

                                                 App        Category  Rating  \
0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   
1                                Coloring book moana  ART_AND_DESIGN     3.9   
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   
3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   
4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   

   Reviews  Size     Installs  Type  Price      Last Updated  
0      159  19.0      10,000+  Free    0.0   January 7, 2018  
1      967  14.0     500,000+  Free    0.0  January 15, 2018  
2    87510   8.7   5,000,000+  Free    0.0    August 1, 2018  
3   215644  25.0  50,000,000+  Free    0.0      June 8, 2018  
4      967   2.8     100,000+  Free    0.0     June 20, 2018  


In [5]:
# Check the data type of the 'Installs' column
print(apps['Installs'].dtype)

object


In [6]:
# Remove '+' and ',' characters from the 'Installs' column and convert to integer
apps['Installs'] = apps['Installs'].str.replace('[+,]', '', regex=True).astype(int)
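
The conversion above raises an error if any value still contains non-numeric text after the `+` and `,` characters are stripped. A more defensive variant is sketched here on a small made-up Series (the `"Free"` entry is a hypothetical bad value, not something confirmed to be in `apps.csv`): `pd.to_numeric` with `errors="coerce"` turns unparseable entries into missing values, and the nullable `Int64` dtype keeps the column integer-typed.

```python
import pandas as pd

# Hypothetical sample mimicking raw 'Installs' strings; "Free" stands in for a malformed row
raw_installs = pd.Series(["10,000+", "500,000+", "5,000,000+", "Free"])

# Strip '+' and ',' and coerce anything that is still non-numeric to NaN
cleaned = pd.to_numeric(
    raw_installs.str.replace(r"[+,]", "", regex=True),
    errors="coerce",
)

# The nullable integer dtype keeps missing values without falling back to float
cleaned = cleaned.astype("Int64")

print(cleaned)
print("Unparseable rows:", int(cleaned.isna().sum()))
```

Whether coercing or failing loudly is preferable depends on the analysis; coercion is convenient for exploration, while a hard failure makes data problems harder to miss.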

In [7]:
print(apps.head())

                                                 App        Category  Rating  \
0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   
1                                Coloring book moana  ART_AND_DESIGN     3.9   
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   
3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   
4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   

   Reviews  Size  Installs  Type  Price      Last Updated  
0      159  19.0     10000  Free    0.0   January 7, 2018  
1      967  14.0    500000  Free    0.0  January 15, 2018  
2    87510   8.7   5000000  Free    0.0    August 1, 2018  
3   215644  25.0  50000000  Free    0.0      June 8, 2018  
4      967   2.8    100000  Free    0.0     June 20, 2018  


In [8]:
# Group by 'Category' and calculate the count, mean of 'Price', and mean of 'Rating'
app_category_info = apps.groupby('Category').agg({
    'App': 'count',
    'Price': 'mean',
    'Rating': 'mean'
}).reset_index()

# Rename the columns
app_category_info.columns = ['Category', 'Number of apps', 'Average price', 'Average rating']
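
The same summary can also be written with pandas named aggregation, which sets the output column names inside `agg` so there is no separate renaming step that depends on column order. This is a sketch on a made-up four-row frame (`toy_apps` and its values are illustrative, not taken from the dataset).

```python
import pandas as pd

# Illustrative data only; the real notebook uses the `apps` DataFrame
toy_apps = pd.DataFrame({
    "App": ["a", "b", "c", "d"],
    "Category": ["FINANCE", "FINANCE", "BEAUTY", "BEAUTY"],
    "Price": [0.0, 2.0, 0.0, 0.0],
    "Rating": [4.0, None, 4.5, 3.5],
})

# Named aggregation: output name -> (input column, aggregation function).
# The dict is unpacked with ** because the output names contain spaces.
summary = (
    toy_apps.groupby("Category")
    .agg(**{
        "Number of apps": ("App", "count"),
        "Average price": ("Price", "mean"),
        "Average rating": ("Rating", "mean"),
    })
    .reset_index()
)

print(summary)
```

Note that `mean` ignores missing ratings, so a category's average rating is computed only over the apps that have one.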

In [9]:
print(app_category_info.head())

              Category  Number of apps  Average price  Average rating
0       ART_AND_DESIGN              64       0.093281        4.357377
1    AUTO_AND_VEHICLES              85       0.158471        4.190411
2               BEAUTY              53       0.000000        4.278571
3  BOOKS_AND_REFERENCE             222       0.539505        4.344970
4             BUSINESS             420       0.417357        4.098479


In [10]:
# Read the user reviews dataset (stored in the variable `finance`)
finance = pd.read_csv('datasets/user_reviews.csv')

In [11]:
print(finance.head())

                     App                                             Review  \
0  10 Best Foods for You  I like eat delicious food. That's I'm cooking ...   
1  10 Best Foods for You    This help eating healthy exercise regular basis   
2  10 Best Foods for You                                                NaN   
3  10 Best Foods for You         Works great especially going grocery store   
4  10 Best Foods for You                                       Best idea us   

  Sentiment Category  Sentiment Score  
0           Positive             1.00  
1           Positive             0.25  
2                NaN              NaN  
3           Positive             0.40  
4           Positive             1.00  


In [12]:
# Inner join: keep only apps that appear in both the apps data and the user reviews
apps_finance = pd.merge(apps, finance, on='App', how='inner')
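
An inner merge of one row per app against many reviews per app repeats each app's details once per matching review and drops apps or reviews that have no partner. The sketch below shows this on made-up frames (`toy_apps` and `toy_reviews` are illustrative). The optional `validate="one_to_many"` argument makes pandas raise an error if the left-hand keys are not unique; whether that holds for the real `apps.csv` is an assumption to check, not something shown here.

```python
import pandas as pd

# Illustrative data only
toy_apps = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Category": ["FINANCE", "FINANCE", "BEAUTY"],
})
toy_reviews = pd.DataFrame({
    "App": ["A", "A", "B", "Z"],
    "Sentiment Score": [0.5, -0.1, 0.3, 0.9],
})

# Inner join on 'App'; validate checks that each app appears once on the left side
merged = pd.merge(toy_apps, toy_reviews, on="App", how="inner", validate="one_to_many")

# 'A' appears twice (two reviews), 'B' once; 'C' (no reviews) and 'Z' (no app row) are dropped
print(merged)
```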

In [13]:
print(apps_finance.head())

                   App        Category  Rating  Reviews  Size  Installs  Type  \
0  Coloring book moana  ART_AND_DESIGN     3.9      967  14.0    500000  Free   
1  Coloring book moana  ART_AND_DESIGN     3.9      967  14.0    500000  Free   
2  Coloring book moana  ART_AND_DESIGN     3.9      967  14.0    500000  Free   
3  Coloring book moana  ART_AND_DESIGN     3.9      967  14.0    500000  Free   
4  Coloring book moana  ART_AND_DESIGN     3.9      967  14.0    500000  Free   

   Price      Last Updated                                             Review  \
0    0.0  January 15, 2018  A kid's excessive ads. The types ads allowed a...   
1    0.0  January 15, 2018                                         It bad >:(   
2    0.0  January 15, 2018                                               like   
3    0.0  January 15, 2018                                                NaN   
4    0.0  January 15, 2018                           I love colors inspyering   

  Sentiment Category  Sent

In [14]:
# Keep only the reviews of free apps in the FINANCE category
new_apps_finance = apps_finance[(apps_finance['Type'] == 'Free') & (apps_finance['Category'] == 'FINANCE')]

In [15]:
# Group by 'App' and calculate the average Sentiment Score
average_sentiment_scores = new_apps_finance.groupby('App')['Sentiment Score'].mean()
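
Some reviews have no sentiment score (`NaN`), and the group mean skips them. An average may therefore rest on only a handful of scored reviews. A minimal sketch on made-up data (`toy` is illustrative) reports the number of scored reviews next to each mean so that thinly supported averages are easy to spot.

```python
import numpy as np
import pandas as pd

# Illustrative data only: app 'B' has reviews but none with a score
toy = pd.DataFrame({
    "App": ["A", "A", "A", "B", "B"],
    "Sentiment Score": [1.0, np.nan, 0.5, np.nan, np.nan],
})

# 'mean' skips NaN values; 'count' gives the number of non-missing scores per app
stats = toy.groupby("App")["Sentiment Score"].agg(["mean", "count"])

print(stats)
```

An app whose reviews are all unscored gets a mean of `NaN` and a count of 0.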

In [16]:
print(average_sentiment_scores.head())

App
A+ Mobile                         0.329592
ACE Elite                         0.252171
Acorns - Invest Spare Change      0.046667
Amex Mobile                       0.175666
Associated Credit Union Mobile    0.388093
Name: Sentiment Score, dtype: float64


In [17]:
# Sort by average Sentiment Score in descending order and select the top 10 apps
top_10_user_feedback = average_sentiment_scores.reset_index().nlargest(10, 'Sentiment Score')

In [18]:
print(top_10_user_feedback)

                                           App  Sentiment Score
6                                   BBVA Spain         0.515086
4               Associated Credit Union Mobile         0.388093
9                          BankMobile Vibe App         0.353455
0                                    A+ Mobile         0.329592
29   Current debit card and app made for teens         0.327258
7                               BZWBK24 mobile         0.326883
34  Even - organize your money, get paid early         0.283929
26                                Credit Karma         0.270052
39                Fortune City - A Finance App         0.266966
15                                      Branch         0.264230


In [19]:
# Cross-check: build the same top 10 with an explicit descending sort followed by head(10)
top_10_user_feedback_test = average_sentiment_scores.reset_index().sort_values(by='Sentiment Score', ascending=False).head(10)
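
Rather than comparing the two printed tables by eye, the agreement between `nlargest` and a descending sort followed by `head` can be checked programmatically. The sketch below uses made-up scores without ties (`scores` and the app names are illustrative); with tied scores the two approaches are not guaranteed to return rows in the same order.

```python
import pandas as pd

# Illustrative average scores per app, no ties
scores = pd.Series(
    {"app_a": 0.2, "app_b": 0.9, "app_c": 0.5, "app_d": 0.7},
    name="Sentiment Score",
)
scores.index.name = "App"
df = scores.reset_index()

# Two routes to the top 2 rows
top_nlargest = df.nlargest(2, "Sentiment Score")
top_sorted = df.sort_values(by="Sentiment Score", ascending=False).head(2)

# DataFrame.equals compares values, index, and dtypes
print(top_nlargest.equals(top_sorted))
```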

In [20]:
print(top_10_user_feedback_test)

                                           App  Sentiment Score
6                                   BBVA Spain         0.515086
4               Associated Credit Union Mobile         0.388093
9                          BankMobile Vibe App         0.353455
0                                    A+ Mobile         0.329592
29   Current debit card and app made for teens         0.327258
7                               BZWBK24 mobile         0.326883
34  Even - organize your money, get paid early         0.283929
26                                Credit Karma         0.270052
39                Fortune City - A Finance App         0.266966
15                                      Branch         0.264230
