## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>. </p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

In [1]:
#importing important libraries
import pandas as pd
import numpy as np

In [2]:
#importing the dataset csv file
apps=pd.read_csv('apps.csv')

#showing the first few rows
apps.head() 

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,"10,000+",Free,0.0,"January 7, 2018"
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,"500,000+",Free,0.0,"January 15, 2018"
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,"5,000,000+",Free,0.0,"August 1, 2018"
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,"50,000,000+",Free,0.0,"June 8, 2018"
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,"100,000+",Free,0.0,"June 20, 2018"


In [3]:
#view some information about variables in the dataset
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   App           9659 non-null   object 
 1   Category      9659 non-null   object 
 2   Rating        8196 non-null   float64
 3   Reviews       9659 non-null   int64  
 4   Size          8432 non-null   float64
 5   Installs      9659 non-null   object 
 6   Type          9659 non-null   object 
 7   Price         9659 non-null   float64
 8   Last Updated  9659 non-null   object 
dtypes: float64(3), int64(1), object(5)
memory usage: 679.3+ KB


#### It looks like there are some missing values in the dataset. Let's see how many.

In [4]:
apps.isnull().sum() #counting the number of missing values in each variable

App                0
Category           0
Rating          1463
Reviews            0
Size            1227
Installs           0
Type               0
Price              0
Last Updated       0
dtype: int64

In [5]:
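#filling missing rating values with the mean rating
#(assigning the result back is more reliable than calling fillna with inplace=True on a selected column)
apps['Rating'] = apps['Rating'].fillna(apps['Rating'].mean())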

In [6]:
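#filling missing size values with the mean size
#(assigning the result back is more reliable than calling fillna with inplace=True on a selected column)
apps['Size'] = apps['Size'].fillna(apps['Size'].mean())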

In [7]:
# checking again
apps.isnull().sum()

App             0
Category        0
Rating          0
Reviews         0
Size            0
Installs        0
Type            0
Price           0
Last Updated    0
dtype: int64
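
The mean imputation above can be checked on a small scale. The following is a minimal sketch on a toy frame with made-up values (`toy` and `was_missing` are hypothetical names, not part of the notebook). It fills each gap with the column mean and keeps a record of which cells were originally missing, which can be useful later because imputed values pull any per-category average toward the overall mean.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one gap in each numeric column
toy = pd.DataFrame({
    "Rating": [4.0, np.nan, 5.0],
    "Size": [10.0, 20.0, np.nan],
})

# Remember which cells were missing before imputing
was_missing = toy.isnull()

# Fill each column with its own mean and assign the result back
for col in ["Rating", "Size"]:
    toy[col] = toy[col].fillna(toy[col].mean())

# Total number of missing cells left after imputation
remaining_nulls = int(toy.isnull().sum().sum())
print(toy)
print("remaining nulls:", remaining_nulls)
```

Whether the mean is the right fill value depends on the question being asked; the median or dropping the rows are common alternatives.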

In [8]:
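# cleaning the 'Installs' column, which is stored as text such as "10,000+"
# regex=False makes pandas treat ',' and '+' as literal characters ('+' is a regex metacharacter)
apps['Installs'] = apps['Installs'].str.replace(',', '', regex=False) #removing commas in the 'Installs' column
apps['Installs'] = apps['Installs'].str.replace('+', '', regex=False) #removing plus signs in the 'Installs' column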



In [9]:
#changing the 'Installs' column data type from string to integer
apps=apps.astype({'Installs': 'int64'})

In [10]:
apps.Installs.dtype

dtype('int64')
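
As a quick check of the cleaning steps, here is a minimal sketch that runs the same idea on three of the install strings shown in the preview of `apps`. The names `raw` and `cleaned` are illustrative only.

```python
import pandas as pd

# Install counts as they appear in the raw data
raw = pd.Series(["10,000+", "500,000+", "5,000,000+"])

# regex=False treats ',' and '+' as literal characters
cleaned = (
    raw.str.replace(",", "", regex=False)
       .str.replace("+", "", regex=False)
       .astype("int64")
)
print(cleaned.tolist())
```

Chaining the two replacements and the cast keeps the transformation in one expression, so the column never sits in a half-cleaned state.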

In [11]:
#counting the number of apps in each category
app_category=apps.groupby('Category')[['App']].count()

#sorting the categories in descending order by the number of apps in each category
app_category.sort_values('App', ascending = False) 

Category,App
FAMILY,1832
GAME,959
TOOLS,827
BUSINESS,420
MEDICAL,395
PERSONALIZATION,376
PRODUCTIVITY,374
LIFESTYLE,369
FINANCE,345
SPORTS,325


In [12]:
#getting the mean price and rating of each category
Price_Rating_category=apps.groupby('Category')[['Price','Rating']].mean()
Price_Rating_category.sort_values('Rating', ascending = False) 

Category,Price,Rating
EDUCATION,0.150924,4.3628
EVENTS,1.718594,4.357682
ART_AND_DESIGN,0.093281,4.348746
BOOKS_AND_REFERENCE,0.539505,4.303972
PERSONALIZATION,0.400213,4.299237
PARENTING,0.159667,4.278874
BEAUTY,0.0,4.256711
GAME,0.296465,4.243736
WEATHER,0.41038,4.23687
SOCIAL,0.06682,4.236137


In [13]:
#instantiating a new data frame to hold the per-category summary
app_category_info=pd.DataFrame() 

In [14]:
#adding number of apps in each category to the new data frame
app_category_info['Number of apps']=app_category['App'] 

In [15]:
#adding mean price in each category to the new data frame
app_category_info['Average price']=Price_Rating_category['Price']

In [16]:
#adding the mean rating of the apps in each category to the new data frame
app_category_info['Average rating']=Price_Rating_category['Rating']

In [17]:
app_category_info

Category,Number of apps,Average price,Average rating
ART_AND_DESIGN,64,0.093281,4.348746
AUTO_AND_VEHICLES,85,0.158471,4.187987
BEAUTY,53,0.0,4.256711
BOOKS_AND_REFERENCE,222,0.539505,4.303972
BUSINESS,420,0.417357,4.126427
COMICS,56,0.0,4.181187
COMMUNICATION,315,0.263937,4.131179
DATING,171,0.160468,4.014094
EDUCATION,119,0.150924,4.3628
ENTERTAINMENT,102,0.078235,4.135294


In [18]:
#resetting the index
app_category_info.reset_index(level=0, inplace=True)


In [19]:
app_category_info.head()

Unnamed: 0,Category,Number of apps,Average price,Average rating
0,ART_AND_DESIGN,64,0.093281,4.348746
1,AUTO_AND_VEHICLES,85,0.158471,4.187987
2,BEAUTY,53,0.0,4.256711
3,BOOKS_AND_REFERENCE,222,0.539505,4.303972
4,BUSINESS,420,0.417357,4.126427
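
The per-category summary built step by step above can also be produced in a single `groupby` call. The sketch below uses named aggregation (available in pandas 0.25 and later) on a toy frame with made-up values. The output column names `number_of_apps`, `average_price` and `average_rating` are hypothetical stand-ins for the notebook's column labels, since keyword arguments cannot contain spaces.

```python
import pandas as pd

# Toy data (hypothetical values) with the same columns the summary uses
toy = pd.DataFrame({
    "App": ["a", "b", "c", "d"],
    "Category": ["GAME", "GAME", "FINANCE", "FINANCE"],
    "Price": [0.0, 2.0, 0.0, 0.0],
    "Rating": [4.0, 5.0, 3.0, 4.0],
})

# One pass: count apps and average price and rating per category
summary = (
    toy.groupby("Category")
       .agg(
           number_of_apps=("App", "count"),
           average_price=("Price", "mean"),
           average_rating=("Rating", "mean"),
       )
       .reset_index()  # turn the Category index back into a column
)
print(summary)
```

This avoids the intermediate frames and the separate `reset_index` cell, at the cost of being slightly less step-by-step to read.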


In [20]:
#reading a new dataset to merge it with the existing one
user_reviews=pd.read_csv('user_reviews.csv')

In [21]:
#merging apps dataframe with user_reviews
apps_users=apps.merge(user_reviews,on='App')
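
`merge` performs an inner join by default, so the result has one row per matching review and silently drops apps that have no reviews. A minimal sketch with toy frames (`apps_toy` and `reviews_toy` are hypothetical, with made-up values) shows the effect on row counts:

```python
import pandas as pd

# Three apps, but only "A" and "B" have reviews
apps_toy = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Type": ["Free", "Free", "Paid"],
})
reviews_toy = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Sentiment Score": [1.0, 0.5, -0.2],
})

# Inner join on App (how="inner" is the default)
merged = apps_toy.merge(reviews_toy, on="App")

n_rows = merged.shape[0]          # one row per review
n_apps = merged["App"].nunique()  # distinct apps that have reviews
print(n_rows, n_apps)
```

After such a merge, `shape[0]` therefore counts reviews rather than apps; `nunique()` on the `App` column gives the number of distinct apps.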

In [22]:
#querying the reviews of free finance apps
apps_free_finance=apps_users[(apps_users['Type']=='Free')&(apps_users['Category']=='FINANCE')]

#number of review rows for free finance apps (one row per review, not per app)
apps_free_finance.shape[0]

2200

In [23]:
#sorting the free finance reviews by sentiment score (descending), then by App (ascending)
apps_users_srt=apps_free_finance.sort_values(['Sentiment Score','App'],ascending=[False,True])

In [24]:
apps_users_srt.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated,Review,Sentiment Category,Sentiment Score
58882,A+ Mobile,FINANCE,3.9,730,6.3,10000,Free,0.0,"June 26, 2018",This best bank ever. Don't take word stop A+ F...,Positive,1.0
58930,A+ Mobile,FINANCE,3.9,730,6.3,10000,Free,0.0,"June 26, 2018",AWESOME!!! Thank You.,Positive,1.0
58937,A+ Mobile,FINANCE,3.9,730,6.3,10000,Free,0.0,"June 26, 2018",APFCU greatest !!!,Positive,1.0
58938,A+ Mobile,FINANCE,3.9,730,6.3,10000,Free,0.0,"June 26, 2018",LOVE IT!!!!,Positive,1.0
58951,A+ Mobile,FINANCE,3.9,730,6.3,10000,Free,0.0,"June 26, 2018",Awesome,Positive,1.0


In [25]:
# getting mean sentiment score for each app
apps_users_group=apps_users_srt.groupby('App')[['Sentiment Score']].mean()


In [26]:
#sorting apps_users_group
apps_users_group_srt=apps_users_group.sort_values(['Sentiment Score','App'],
                                                  ascending=[False,True])

In [27]:
#getting the top 10 free finance apps by mean sentiment score
top_10_user_feedback=apps_users_group_srt[:10]
top_10_user_feedback


App,Sentiment Score
BBVA Spain,0.515086
Associated Credit Union Mobile,0.388093
BankMobile Vibe App,0.353455
A+ Mobile,0.329592
Current debit card and app made for teens,0.327258
BZWBK24 mobile,0.326883
"Even - organize your money, get paid early",0.283929
Credit Karma,0.270052
Fortune City - A Finance App,0.266966
Branch,0.26423
