# Team 2 - Google Play Store

![](https://www.brandnol.com/wp-content/uploads/2019/04/Google-Play-Store-Search.jpg)

_For more information about the dataset, read [here](https://www.kaggle.com/lava18/google-play-store-apps)._

## Your tasks
- Name your team!
- Read the source and do some quick research to understand more about the dataset and its topic
- Clean the data
- Perform Exploratory Data Analysis on the dataset
- Analyze the data more deeply and extract insights
- Visualize your analysis on Google Data Studio
- Present your work in front of the class and guests next Monday

## Submission Guide
- Create a GitHub repository for your project
- Upload the dataset (.csv file) and the Jupyter Notebook to your GitHub repository. In the Jupyter Notebook, **include the link to your Google Data Studio report**.
- Submit your work through this [Google Form](https://forms.gle/oxtXpGfS8JapVj3V8).

## Tips for Data Cleaning, Manipulation & Visualization
- Our tips for data cleaning, manipulation, and visualization are collected [here](https://hackmd.io/cBNV7E6TT2WMliQC-GTw1A).

_____________________________

## Some Hints for This Dataset:
- There are lots of null values. How should we handle them?
- The `Installs` and `Size` columns have some strange values. Can you identify them?
- Values in the `Size` column use different units: `M` and `k`. And what about the value `Varies with device`?
- The `Price` column is not stored in the right data type
- And more...
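
One possible way to approach the `Size` hint, as a minimal sketch. The helper `size_to_mb` is hypothetical (it is not part of the dataset or of pandas). It assumes that `M` means megabytes, `k` means kilobytes, 1 MB = 1024 kB, and that `Varies with device` should become a missing value.

```python
import math

def size_to_mb(value):
    """Convert a size string such as '19M' or '512k' to megabytes.

    Hypothetical helper: returns NaN for 'Varies with device' and for
    anything else that does not look like '<number>M' or '<number>k'.
    """
    text = str(value).strip()
    if not text:
        return math.nan
    unit = text[-1]
    try:
        number = float(text[:-1])
    except ValueError:
        return math.nan
    if unit == 'M':
        return number
    if unit == 'k':
        return number / 1024  # assumption: 1 MB = 1024 kB
    return math.nan

print(size_to_mb('19M'))
print(size_to_mb('512k'))
print(size_to_mb('Varies with device'))
```

On the notebook's DataFrame this could be applied with `store['Size'].apply(size_to_mb)`.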


In [0]:
# Imports for data handling, plotting, and missing-data inspection

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 

import warnings
warnings.filterwarnings('ignore')
sns.set_style('whitegrid')

import missingno as msno # missing data visualization module for Python
import pandas_profiling
import gc
import datetime
import re

In [0]:
link = './google-play-store.csv'
store = pd.read_csv(link)

## Cleaning data

In [3]:
store.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
App               10841 non-null object
Category          10841 non-null object
Rating            9367 non-null float64
Reviews           10841 non-null object
Size              10841 non-null object
Installs          10841 non-null object
Type              10840 non-null object
Price             10841 non-null object
Content Rating    10840 non-null object
Genres            10841 non-null object
Last Updated      10841 non-null object
Current Ver       10833 non-null object
Android Ver       10838 non-null object
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


In [4]:
store.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [5]:
store.isna().sum().sort_values(ascending=False)

Rating            1474
Current Ver          8
Android Ver          3
Content Rating       1
Type                 1
Last Updated         0
Genres               0
Price                0
Installs             0
Size                 0
Reviews              0
Category             0
App                  0
dtype: int64

In [6]:
# Note: `Installs` is still a string column here, so this sort is lexicographic, not numeric
store[store.isna().any(axis=1)].sort_values(by="Installs", ascending=False).head(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0M,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,
1559,Young Speeches,LIBRARIES_AND_DEMO,,2221,2.4M,"500,000+",Free,0,Everyone,Libraries & Demo,"January 8, 2017",1.1,2.3 and up
6322,Virtual DJ Sound Mixer,TOOLS,4.2,4010,8.7M,"500,000+",Free,0,Everyone,Tools,"May 10, 2017",,4.0 and up
8062,Racing CX,GAME,,4,20M,500+,Free,0,Everyone,Racing,"June 6, 2017",1,2.2 and up
7332,Weather Data CH,WEATHER,,15,Varies with device,500+,Paid,$2.99,Everyone,Weather,"August 9, 2016",Varies with device,Varies with device
9515,Sanu Ek Pal Chain - Raid,TOOLS,,1,2.6M,500+,Free,0,Everyone,Tools,"May 12, 2018",2.0,4.4W and up
7312,Best CG Photography,FAMILY,,1,2.5M,500+,Free,0,Unrated,Entertainment,"June 24, 2015",5.2,3.0 and up
4564,Tutorials for R Programming Offline,FAMILY,,2,4.4M,500+,Free,0,Everyone,Education,"December 4, 2017",1.0,4.0 and up
9448,EJ.by,NEWS_AND_MAGAZINES,,10,2.3M,500+,Free,0,Everyone,News & Magazines,"October 27, 2015",1.2,4.0.3 and up
7371,Valmet CI Tool,BUSINESS,,1,3.7M,500+,Free,0,Everyone,Business,"March 13, 2018",0.0.6,4.0 and up
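
Row 10472 above looks shifted by one column: its `Category` holds `1.9` and its `Rating` holds `19.0`, which is outside the 1 to 5 range of Play Store ratings. A minimal sketch for flagging such rows, using a tiny made-up frame in place of `store`. The uppercase-with-underscores pattern for valid categories is an assumption based on the values shown in this notebook.

```python
import pandas as pd

# Toy data standing in for `store`; the second row mimics the shifted record.
sample = pd.DataFrame({
    'App': ['Coloring book moana', 'Life Made WI-Fi Touchscreen Photo Frame'],
    'Category': ['ART_AND_DESIGN', '1.9'],
    'Rating': [3.9, 19.0],
})

# Assumption: valid categories are uppercase words joined by underscores.
bad_category = ~sample['Category'].str.match(r'^[A-Z_]+$')
# Ratings outside 1..5 are treated as suspicious.
bad_rating = ~sample['Rating'].between(1, 5)

suspect = sample[bad_category | bad_rating]
print(suspect)
```

Note that `between` returns `False` for missing values, so on the real data rows with a null `Rating` would be flagged as well unless they are handled first.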


In [7]:
# remove NaN values
store.dropna(inplace=True)
store.isna().sum().sort_values(ascending=False)

Android Ver       0
Current Ver       0
Last Updated      0
Genres            0
Content Rating    0
Price             0
Type              0
Installs          0
Size              0
Reviews           0
Rating            0
Category          0
App               0
dtype: int64
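
Dropping rows is the simplest option, but almost all of the missing values are in `Rating` (1,474 rows). An alternative, sketched here on toy data, is to drop only the rows that are missing a rarely-null column and fill `Rating` with the per-category median. Whether imputing ratings is appropriate depends on the analysis; this is an illustration, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data standing in for `store`.
sample = pd.DataFrame({
    'Category': ['TOOLS', 'TOOLS', 'TOOLS', 'GAME', 'GAME'],
    'Rating': [4.0, np.nan, 4.4, 3.5, np.nan],
    'Type': ['Free', 'Free', None, 'Paid', 'Free'],
})

# Drop rows only where a rarely-missing column is null.
filled = sample.dropna(subset=['Type']).copy()

# Fill missing ratings with the median rating of the same category.
category_median = filled.groupby('Category')['Rating'].transform('median')
filled['Rating'] = filled['Rating'].fillna(category_median)
print(filled)
```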

In [8]:
store['Installs'].unique()

array(['10,000+', '500,000+', '5,000,000+', '50,000,000+', '100,000+',
       '50,000+', '1,000,000+', '10,000,000+', '5,000+', '100,000,000+',
       '1,000,000,000+', '1,000+', '500,000,000+', '100+', '500+', '10+',
       '5+', '50+', '1+'], dtype=object)

In [0]:
# Keep only the digits and convert review counts to integers
store['Reviews'] = store['Reviews'].apply(lambda x: int(re.sub("[^0-9]", "", x)))

In [0]:
# Strip ',' and '+' (any non-digit) and convert install counts to integers
store['Installs'] = store['Installs'].apply(lambda x: int(re.sub("[^0-9]", "", x)))

In [10]:
store['Installs'].unique()

array([     10000,     500000,    5000000,   50000000,     100000,
            50000,    1000000,   10000000,       5000,  100000000,
       1000000000,       1000,  500000000,        100,        500,
               10,          5,         50,          1])
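
The `apply` call above relies on every value containing at least one digit. A value such as `Free` (as in row 10472 earlier, which was removed by `dropna`) would reduce to an empty string and make `int('')` raise a `ValueError`. As a hedged alternative, a vectorized version with `pd.to_numeric(..., errors='coerce')` turns anything non-numeric into `NaN` instead of raising. The sample values are made up for illustration.

```python
import pandas as pd

# Toy values standing in for store['Installs']; 'Free' mimics a malformed row.
installs = pd.Series(['10,000+', '1,000,000+', '5+', 'Free'])

# Strip thousands separators and the trailing '+', then coerce to numbers.
cleaned = installs.str.replace(r'[,+]', '', regex=True)
numeric = pd.to_numeric(cleaned, errors='coerce')
print(numeric)
```

Because of the `NaN`, the result is a float column; it can be cast to an integer type once the malformed rows are dropped or fixed.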

In [11]:
store['Price'].unique()

array(['0', '$4.99', '$3.99', '$6.99', '$7.99', '$5.99', '$2.99', '$3.49',
       '$1.99', '$9.99', '$7.49', '$0.99', '$9.00', '$5.49', '$10.00',
       '$24.99', '$11.99', '$79.99', '$16.99', '$14.99', '$29.99',
       '$12.99', '$2.49', '$10.99', '$1.50', '$19.99', '$15.99', '$33.99',
       '$39.99', '$3.95', '$4.49', '$1.70', '$8.99', '$1.49', '$3.88',
       '$399.99', '$17.99', '$400.00', '$3.02', '$1.76', '$4.84', '$4.77',
       '$1.61', '$2.50', '$1.59', '$6.49', '$1.29', '$299.99', '$379.99',
       '$37.99', '$18.99', '$389.99', '$8.49', '$1.75', '$14.00', '$2.00',
       '$3.08', '$2.59', '$19.40', '$3.90', '$4.59', '$15.46', '$3.04',
       '$13.99', '$4.29', '$3.28', '$4.60', '$1.00', '$2.95', '$2.90',
       '$1.97', '$2.56', '$1.20'], dtype=object)

In [0]:
# Strip the '$' sign (anything except digits and '.') and convert prices to floats
store['Price'] = store['Price'].apply(lambda x: float(re.sub(r"[^0-9.]", "", x)))

In [14]:
store['Price'].unique()

array([  0.  ,   4.99,   3.99,   6.99,   7.99,   5.99,   2.99,   3.49,
         1.99,   9.99,   7.49,   0.99,   9.  ,   5.49,  10.  ,  24.99,
        11.99,  79.99,  16.99,  14.99,  29.99,  12.99,   2.49,  10.99,
         1.5 ,  19.99,  15.99,  33.99,  39.99,   3.95,   4.49,   1.7 ,
         8.99,   1.49,   3.88, 399.99,  17.99, 400.  ,   3.02,   1.76,
         4.84,   4.77,   1.61,   2.5 ,   1.59,   6.49,   1.29, 299.99,
       379.99,  37.99,  18.99, 389.99,   8.49,   1.75,  14.  ,   2.  ,
         3.08,   2.59,  19.4 ,   3.9 ,   4.59,  15.46,   3.04,  13.99,
         4.29,   3.28,   4.6 ,   1.  ,   2.95,   2.9 ,   1.97,   2.56,
         1.2 ])

In [15]:
for column in store.columns:
    print(column, "\n", store[column].unique(), "\n\n")

App 
 ['Photo Editor & Candy Camera & Grid & ScrapBook' 'Coloring book moana'
 'U Launcher Lite – FREE Live Cool Themes, Hide Apps' ...
 'Fr. Mike Schmitz Audio Teachings' 'The SCP Foundation DB fr nn5n'
 'iHoroscope - 2018 Daily Horoscope & Astrology'] 


Category 
 ['ART_AND_DESIGN' 'AUTO_AND_VEHICLES' 'BEAUTY' 'BOOKS_AND_REFERENCE'
 'BUSINESS' 'COMICS' 'COMMUNICATION' 'DATING' 'EDUCATION' 'ENTERTAINMENT'
 'EVENTS' 'FINANCE' 'FOOD_AND_DRINK' 'HEALTH_AND_FITNESS' 'HOUSE_AND_HOME'
 'LIBRARIES_AND_DEMO' 'LIFESTYLE' 'GAME' 'FAMILY' 'MEDICAL' 'SOCIAL'
 'SHOPPING' 'PHOTOGRAPHY' 'SPORTS' 'TRAVEL_AND_LOCAL' 'TOOLS'
 'PERSONALIZATION' 'PRODUCTIVITY' 'PARENTING' 'WEATHER' 'VIDEO_PLAYERS'
 'NEWS_AND_MAGAZINES' 'MAPS_AND_NAVIGATION'] 


Rating 
 [4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.2 4.6 4.  4.8 4.9 3.6 3.7 3.2 3.3 3.4 3.5
 3.1 5.  2.6 3.  1.9 2.5 2.8 2.7 1.  2.9 2.3 2.2 1.7 2.  1.8 2.4 1.6 2.1
 1.4 1.5 1.2] 


Reviews 
 ['159' '967' '87510' ... '603' '1195' '398307'] 


Size 
 ['19M' '14M' '8.7M' '25

In [16]:
store.describe()

Unnamed: 0,Rating,Installs,Price
count,9360.0,9360.0,9360.0
mean,4.191838,17908750.0,0.961279
std,0.515263,91266370.0,15.82164
min,1.0,1.0,0.0
25%,4.0,10000.0,0.0
50%,4.3,500000.0,0.0
75%,4.5,5000000.0,0.0
max,5.0,1000000000.0,400.0


In [23]:
store[store['Category'] == 'BEAUTY'].sort_values(by='Installs', ascending=False).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
117,Beauty Camera - Selfie Camera,BEAUTY,4.0,113715,Varies with device,10000000,Free,0.0,Everyone,Beauty,"August 3, 2017",Varies with device,Varies with device
7021,Best Hairstyles step by step,BEAUTY,4.5,45452,9.2M,5000000,Free,0.0,Everyone,Beauty,"July 19, 2018",1.25,4.0 and up
107,Ulta Beauty,BEAUTY,4.7,42050,Varies with device,1000000,Free,0.0,Everyone,Beauty,"June 5, 2018",5.4,5.0 and up
119,Mirror Camera (Mirror + Selfie Camera),BEAUTY,4.1,9315,2.6M,1000000,Free,0.0,Everyone,Beauty,"November 21, 2017",1.4.2,4.0 and up
102,Mirror - Zoom & Exposure -,BEAUTY,3.9,32090,Varies with device,1000000,Free,0.0,Everyone,Beauty,"October 24, 2016",Varies with device,Varies with device


In [28]:
store[store['Category'] == 'BEAUTY']['Price'].unique()

array([0.])

**We observe that all apps in the 'BEAUTY' category in this dataset are free.**
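
The same check can be run for every category at once. A minimal sketch on made-up data, assuming `Price` has already been converted to a float as in the cells above:

```python
import pandas as pd

# Toy data standing in for `store`.
sample = pd.DataFrame({
    'Category': ['BEAUTY', 'BEAUTY', 'GAME', 'GAME'],
    'Price': [0.0, 0.0, 0.0, 4.99],
})

# Share of free apps (Price == 0) within each category.
free_share = (sample['Price'] == 0).groupby(sample['Category']).mean()

# Categories in which every app is free.
all_free = free_share[free_share == 1.0].index.tolist()
print(free_share)
print(all_free)
```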

In [31]:
store.sort_values(by='Content Rating', ascending=True).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
3043,DraftKings - Daily Fantasy Sports,SPORTS,4.5,50017,41M,1000000,Free,0.0,Adults only 18+,Sports,"July 24, 2018",3.21.324,4.4 and up
298,Manga Master - Best manga & comic reader,COMICS,4.6,24005,4.9M,500000,Free,0.0,Adults only 18+,Comics,"July 4, 2018",1.1.7.0,4.1 and up
6424,Manga Books,COMICS,3.8,7326,Varies with device,500000,Free,0.0,Adults only 18+,Comics,"August 3, 2018",Varies with device,Varies with device
6454,BM Online OEC Verification,TOOLS,4.1,783,4.2M,100000,Free,0.0,Everyone,Tools,"August 5, 2016",1.0.1,4.0.3 and up
6453,BM Wallet,FINANCE,4.2,798,29M,50000,Free,0.0,Everyone,Finance,"January 3, 2018",1.0.46,4.0 and up


In [33]:
store[store['Content Rating'] == 'Adults only 18+']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
298,Manga Master - Best manga & comic reader,COMICS,4.6,24005,4.9M,500000,Free,0.0,Adults only 18+,Comics,"July 4, 2018",1.1.7.0,4.1 and up
3043,DraftKings - Daily Fantasy Sports,SPORTS,4.5,50017,41M,1000000,Free,0.0,Adults only 18+,Sports,"July 24, 2018",3.21.324,4.4 and up
6424,Manga Books,COMICS,3.8,7326,Varies with device,500000,Free,0.0,Adults only 18+,Comics,"August 3, 2018",Varies with device,Varies with device


In [36]:
store[store['Content Rating'] == 'Mature 17+'].groupby(by=['App'], as_index=False).max().sort_values(by='Installs', ascending=False).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
313,Twitter,NEWS_AND_MAGAZINES,4.3,11667403,Varies with device,500000000,Free,0.0,Mature 17+,News & Magazines,"July 30, 2018",Varies with device,Varies with device
292,Tango - Live Video Broadcast,SOCIAL,4.3,3806669,Varies with device,100000000,Free,0.0,Mature 17+,Social,"August 1, 2018",Varies with device,Varies with device
220,Modern Combat 5: eSports FPS,GAME,4.3,2903386,58M,100000000,Free,0.0,Mature 17+,Action,"July 24, 2018",3.2.1c,4.0 and up
271,Sniper 3D Gun Shooter: Free Shooting Games - FPS,GAME,4.6,7674252,Varies with device,100000000,Free,0.0,Mature 17+,Action,"August 2, 2018",Varies with device,Varies with device
295,Telegram,COMMUNICATION,4.4,3128611,Varies with device,100000000,Free,0.0,Mature 17+,Communication,"July 27, 2018",Varies with device,Varies with device
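
One caveat: `groupby('App').max()` takes the maximum of each column independently, so if an app appears more than once, a result row can combine values from different source rows. A sketch of an alternative that keeps one whole row per app (the one with the most reviews), shown on made-up data and assuming `Reviews` is already numeric:

```python
import pandas as pd

# Toy data standing in for `store`; 'App A' appears twice.
sample = pd.DataFrame({
    'App': ['App A', 'App A', 'App B'],
    'Reviews': [100, 250, 40],
    'Rating': [4.5, 4.1, 3.9],
})

# Keep, for each app, the single row with the highest review count.
deduped = (
    sample.sort_values('Reviews', ascending=False)
          .drop_duplicates(subset='App', keep='first')
          .sort_index()
)
print(deduped)
```

Here `App A` keeps the rating 4.1 that belongs to its 250-review row, whereas a column-wise `max()` would pair 250 reviews with the 4.5 rating from the other row.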


In [48]:
store[store['Reviews'] >= 1000].groupby(by=['App'], as_index=False).max().sort_values(by=['Rating', 'Reviews'], ascending=False).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
2324,JW Library,BOOKS_AND_REFERENCE,4.9,922752,Varies with device,10000000,Free,0.0,Everyone,Books & Reference,"June 15, 2018",Varies with device,Varies with device
3199,Period Tracker,MEDICAL,4.9,325738,Varies with device,10000000,Free,0.0,Everyone,Medical,"July 9, 2018",Varies with device,Varies with device
3710,Six Pack in 30 Days - Abs Workout,HEALTH_AND_FITNESS,4.9,272337,13M,10000000,Free,0.0,Everyone,Health & Fitness,"June 21, 2018",1.0.2,4.2 and up
4125,Tickets + PDA 2018 Exam,AUTO_AND_VEHICLES,4.9,197136,38M,1000000,Free,0.0,Everyone,Auto & Vehicles,"July 15, 2018",8.31,4.1 and up
2490,"Learn Japanese, Korean, Chinese Offline & Free",EDUCATION,4.9,133136,26M,1000000,Free,0.0,Everyone,Education;Education,"July 20, 2018",2.16.11.10,4.2 and up


In [49]:
store[store['Reviews'] >= 1000].groupby(by=['App'], as_index=False).max().sort_values(by=['Rating', 'Reviews'], ascending=True).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
1389,EB Mobile,FAMILY,1.7,1172,5.6M,10000,Free,0.0,Everyone,Education,"October 9, 2017",1.1.2,4.1 and up
3745,Smart-AC Universal Remote Free,FAMILY,1.8,3270,1.8M,500000,Free,0.0,Everyone,Entertainment,"August 18, 2015",1.0,2.2 and up
336,AppFinder by AppTap,TOOLS,2.0,2221,4.9M,5000000,Free,0.0,Everyone,Tools,"October 3, 2017",1.8.2.7,5.0 and up
846,Candy simply-Fi,LIFESTYLE,2.1,2390,35M,100000,Free,0.0,Everyone,Lifestyle,"July 17, 2018",1.8.4.7,4.4 and up
3167,PS4 Second Screen,FAMILY,2.4,11773,3.3M,1000000,Free,0.0,Everyone,Entertainment,"June 13, 2018",18.6.2,4.1 and up


In [53]:
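# numeric_only=True averages only numeric columns; recent pandas versions raise on text columns otherwise
store.groupby(by=['Installs'], as_index=False).mean(numeric_only=True).sort_values(by=['Reviews', 'Rating'], ascending=False).head(5)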

Unnamed: 0,Installs,Rating,Reviews,Price
18,1000000000,4.258621,21336180.0,0.0
17,500000000,4.35,9957384.0,0.0
16,100000000,4.411491,4671250.0,0.0
15,50000000,4.351211,1232242.0,0.0
14,10000000,4.313419,362529.6,0.011957
