# Exploratory Data Analysis Googleplaystore

## Googleplaystore Dataset

The Googleplaystore dataset consists of:
1. Googleplaystore Apps: googleplaystore
2. Googleplaystore user reviews: googleplaystore_user_reviews

Both datasets are CSV files.

## Importing Libraries & Dataset

In [None]:
# Import library
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import numpy as np
import matplotlib.ticker as ticker
import pandas as pd

import plotly
# connected=True loads plotly.js from the online CDN instead of embedding it in the notebook
plotly.offline.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.figure_factory as ff

# Import data
dfGapps = pd.read_csv('dataset/googleplaystore.csv')
dfGUserReview = pd.read_csv('dataset/googleplaystore_user_reviews.csv')


## Googleplaystore Apps

Googleplaystore Apps is the dataset describing the applications available on Googleplaystore. It consists of:

In [None]:
# Columns and their data types
dfGapps.dtypes

In [None]:
# Number of rows and columns
dfGapps.shape

In [None]:
# Display the first rows of the dataset
dfGapps.head()

## Googleplaystore User Review

Googleplaystore User Review is the dataset containing the reviews that users wrote about the applications on Googleplaystore. It consists of:

In [None]:
# Columns and their data types
dfGUserReview.dtypes

In [None]:
# Number of rows and columns
dfGUserReview.shape

In [None]:
# Display the first rows of the dataset
dfGUserReview.head()

## Counting the Reviews per App in dfGUserReview

In [None]:
count = dfGUserReview['App'].value_counts() 
print(count) 

### Cleaning Empty Data from the Googleplaystore User Review Dataset

The Googleplaystore User Review dataset contains many rows whose values are empty (NaN) and carry no information worth keeping,
so those rows are dropped to lighten further processing.

In [None]:
# Create a working copy to process
dfNaN = dfGUserReview.copy()

# Show the number of rows before dropping
print("\nNumber of rows before dropping null rows: ", len(dfNaN))

# Keep only the rows with at least 4 non-NaN values (thresh=4); all other rows are dropped.
# 'how' is omitted because recent pandas versions do not allow combining it with 'thresh'.
dfNaNDrop = dfNaN.dropna(axis=0, thresh=4)

# Compare the working copy before and after the drop
print("Previous dataframe length:", len(dfNaN), "\nDataframe length after dropping:",
       len(dfNaNDrop), "\nNumber of rows with fewer than 4 non-NaN values: ",
       (len(dfNaN)-len(dfNaNDrop)))

# Assign the cleaned data back to dfGUserReview
dfGUserReview = dfNaNDrop

# Reset the index of dfGUserReview
dfGUserReview = dfGUserReview.reset_index(drop=True)
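
To make the effect of `thresh` concrete, here is a minimal sketch on a made-up frame. The frame `toy`, its rows, and its five column names are assumptions for illustration and are not taken from the dataset. With `thresh=4`, a row is kept only when at least four of its values are non-NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with five columns (names and values are assumed)
toy = pd.DataFrame({
    'App': ['A', 'B', 'C'],
    'Translated_Review': ['Great app', np.nan, 'Works fine'],
    'Sentiment': ['Positive', np.nan, np.nan],
    'Sentiment_Polarity': [0.8, np.nan, 0.1],
    'Sentiment_Subjectivity': [0.6, np.nan, 0.3],
})

# Keep rows that have at least 4 non-NaN values
kept = toy.dropna(axis=0, thresh=4)

# Row 'A' has 5 non-NaN values and row 'C' has 4, so both stay; row 'B' has only 1
print(kept['App'].tolist())
```

Note that `thresh` counts the values that are present, not the values that are missing.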

## Finding Unique Values in the Googleplaystore Apps Dataset

The Googleplaystore dataset contains unique values that can be displayed to get an overview of what each column holds.
The unique values of each column are shown below:

### Category

In [None]:
# Unique values of the Category column
dfGapps['Category'].unique()

The Category column contains a value that does not fit the rest of the data, namely '1.9', so the row holding that value has to be located.

In [None]:
dfGapps[dfGapps['Category'] == '1.9']

In that row the data has shifted, so the values no longer line up with their columns. To fix this, the shift method in pandas is used.

In [None]:
# Use the shift method to move the data one column to the right
dfGapps.iloc[10472,1:] = dfGapps.iloc[10472,1:].shift(1)
dfGapps[10470:10475]
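
As a small illustration of what `shift(1)` does to a single row, the sketch below uses a made-up three-value Series. The name `row` and its values are assumptions for illustration. Every value moves one position to the right and the first slot becomes NaN.

```python
import pandas as pd

# Hypothetical row whose values sit one column too far to the left
row = pd.Series(['1.9', '19', '3.0M'], index=['Category', 'Rating', 'Reviews'])

# shift(1) moves each value one position to the right; the first slot is filled with NaN
fixed = row.shift(1)
print(fixed)
```

The last value falls off the end of the Series, which is acceptable only when the final column of the misaligned row is empty.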

According to the Play Store, the app "Life Made WI-Fi Touchscreen Photo Frame" belongs to the Lifestyle category.

In [None]:
# Set the Category of the app "Life Made WI-Fi Touchscreen Photo Frame" to LIFESTYLE
dfGapps.iloc[10472,1] = 'LIFESTYLE'
dfGapps[10470:10475]

In [None]:
# Re-check the data types of the dfGapps columns
dfGapps.dtypes

In [None]:
# Convert the Rating & Reviews columns back to numeric types
dfGapps['Rating'] = pd.to_numeric(dfGapps['Rating'], errors='coerce')
dfGapps['Reviews'] = pd.to_numeric(dfGapps['Reviews'], errors='coerce')

In [None]:
# Build a dict holding the unique Category values
data_cat = {'Category':dfGapps['Category'].unique()}

# Load the category data into a dataframe
dfCat = pd.DataFrame(data_cat)
dfCat.head(10)

The Googleplaystore dataset contains many apps across various categories. Counting the number of apps in each category reveals the most popular category on Googleplaystore, measured by number of apps.

In [None]:
dfGapps.groupby('Category')['App'].count()
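
As a small illustration of how counting apps per category points to the largest category, the sketch below uses four made-up rows. The frame `apps` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows standing in for the apps dataset
apps = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Category': ['GAME', 'FAMILY', 'GAME', 'TOOLS'],
})

# Count apps per category and sort so the largest category comes first
counts = apps.groupby('Category')['App'].count().sort_values(ascending=False)
most_common = counts.index[0]
print(most_common, counts[most_common])
```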

The result above can be visualized as a bar chart:

In [None]:
plt.figure(figsize=(12,12))
sns.countplot(y='Category',data = dfGapps)
plt.title("Number of Apps per Category")

### Rating

In [None]:
# Unique values of the Rating column
dfGapps['Rating'].unique()

The 'Rating' column has to be numeric before it can be sorted, so it is converted with pd.to_numeric.

In [None]:
# Convert the data type of the Rating column
dfGapps['Rating'] = pd.to_numeric(dfGapps['Rating'],errors='coerce')
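
The behavior of `errors='coerce'` can be seen on a tiny made-up Series. The name `raw` and its values are assumptions for illustration. Strings that parse become numbers, and anything that does not parse becomes NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical mix of parsable and unparsable strings
raw = pd.Series(['4.1', '19', 'not a number'])

# Unparsable entries are turned into NaN rather than raising an exception
nums = pd.to_numeric(raw, errors='coerce')
print(nums)
```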

In [None]:
# Show the apps sorted by rating from highest to lowest
dfRat = dfGapps
dfRat.sort_values(by='Rating',ascending=False)[['App','Rating','Category']].head()

### Installs

In [None]:
# Unique values of the Installs column
dfGapps['Installs'].unique()

With the unique values of the Installs column known, the gap between one install milestone and the next becomes visible.

In [None]:
# Add an 'Installs temp' column that expresses each install milestone in thousands, e.g. '10,000+' -> '10K+'
def installs_to_thousands(val):
    # Strip the thousands separators and the trailing '+', then divide by 1000
    thousands = float(str(val).replace(',', '').replace('+', '')) / 1000
    # Drop trailing zeros only, so 10.0 becomes '10' while 0.001 stays '0.001'
    return f"{thousands:.3f}".rstrip('0').rstrip('.') + 'K+'

dfGapps.insert(6, "Installs temp", dfGapps['Installs'].apply(installs_to_thousands), True)
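
To compare milestones numerically, the install labels can be parsed into integers. The following is a minimal sketch, assuming labels shaped like '10,000+'. `parse_installs` is a hypothetical helper and the example labels are illustrative; neither is part of the notebook.

```python
def parse_installs(label):
    """Turn an install label such as '10,000+' into the integer 10000."""
    return int(str(label).replace(',', '').replace('+', ''))

# Example milestone labels (illustrative values)
milestones = ['100+', '1,000+', '10,000+', '1,000,000+']
values = [parse_installs(m) for m in milestones]

# Ratio between each milestone and the one before it
ratios = [b / a for a, b in zip(values, values[1:])]
print(values)
print(ratios)
```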

### Size

In [None]:
# Replace 'Varies with device' with NaN and convert the Size values to megabytes
def size_to_mb(val):
    text = str(val).replace(',', '')
    if 'Varies with device' in text:
        return np.nan
    if 'M' in text:
        # Already in megabytes: drop the unit
        return float(text.replace('M', ''))
    if 'k' in text:
        # Convert kilobytes to megabytes by dividing by 1000
        return float(text.replace('k', '')) / 1000
    return float(text)

dfGapps['Size'] = dfGapps['Size'].apply(size_to_mb)


### Type

In [None]:
# Unique values of the Type column
dfGapps['Type'].unique()

### Price

In [None]:
# Unique values of the Price column
dfGapps['Price'].unique()

Knowing the price of each app on Googleplaystore makes it possible to list the 20 most expensive apps. First, however, the price strings must be converted to numbers with the following currency converter:

In [None]:
def convert_currency(val):
    """Strip thousands separators and the dollar sign, then return the price as a float."""
    new_val = str(val).replace(',','').replace('$', '')
    return float(new_val)

In [None]:
# Convert the Price column of dfGapps to float with convert_currency
dfGapps['Price'] = dfGapps['Price'].apply(convert_currency)
# Create a separate copy of dfGapps for the price analysis
dfPrice = dfGapps.copy()

In [None]:
# Show the 20 most expensive apps on Googleplaystore
dfPrice.sort_values(by='Price',ascending=False)[['App','Price','Category']].head(20)
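
The same price clean-up can also be sketched with vectorized string methods instead of a per-value function. This is an alternative under assumptions, not the notebook's method: the Series `prices` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical price strings
prices = pd.Series(['0', '$4.99', '$399.99', '$1,200.00'])

# Strip the currency symbol and thousands separators, then convert to float
cleaned = pd.to_numeric(
    prices.str.replace('$', '', regex=False).str.replace(',', '', regex=False)
)

# The two most expensive entries, highest first
top = cleaned.sort_values(ascending=False).head(2)
print(top)
```

Passing `regex=False` matters here because `$` is a special character in regular expressions.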

### Content Rating

In [None]:
# Unique values of the Content Rating column
dfGapps['Content Rating'].unique()

### Genres

In [None]:
# Unique values of the Genres column
dfGapps['Genres'].unique()

### Android Ver

In [None]:
# Unique values of the Android Ver column
dfGapps['Android Ver'].unique()

## Visualizing Free & Paid Apps in a Bar Plot

In [None]:
type_group = dfGapps.groupby('Type').App.count()
type_group

In [None]:
plt.figure(figsize=(12,10))
ax = sns.countplot(x='Type',data = dfGapps, order=['Free','Paid'])
plt.title("Free vs. Paid Apps")
plt.xlabel('App Type')

# Store the number of rows in dfGapps as ncount
ncount = len(dfGapps)

# Create a twin axis
ax2 = ax.twinx()

# Swap the positions of the count axis and the frequency axis
ax2.yaxis.tick_left()
ax.yaxis.tick_right()

# Swap the positions of the count and frequency labels
ax.yaxis.set_label_position('right')
ax2.yaxis.set_label_position('left')

ax2.set_ylabel('Frequency [%]')

# Annotate each bar with its share of all rows
for p in ax.patches:
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y), 
            ha='center', va='bottom')

# Use a LinearLocator so the count axis gets 11 evenly spaced ticks
ax.yaxis.set_major_locator(ticker.LinearLocator(11))

# Set the frequency range to 0 - 100 and the count range to 0 - ncount
ax2.set_ylim(0,100)
ax.set_ylim(0,ncount)

# Use a MultipleLocator so the frequency ticks are 10 apart
ax2.yaxis.set_major_locator(ticker.MultipleLocator(10))

# Turn off the grid on ax2 so that only the gridlines of ax are drawn
ax2.grid(False)
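
The percentages annotated on the bars can also be computed directly from the data. A minimal sketch, assuming a made-up Series `types` that stands in for the Type column:

```python
import pandas as pd

# Hypothetical Type values
types = pd.Series(['Free', 'Free', 'Free', 'Paid'])

# Share of each type as a percentage of all rows
share = types.value_counts(normalize=True) * 100
print(share)
```

`value_counts` skips missing values by default, so these shares are relative to the non-missing rows, whereas the plot above divides by the full row count.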

## Rating Distribution Histograms

In [None]:
dfGapps["Rating"].plot(kind="hist",color="blue",bins=42,grid=True,alpha=0.65,label="Rating",figsize=(15,10))
plt.legend()
plt.xlabel("Rating")
plt.title("Rating Distribution of All Apps")

In [None]:
type_count = dfGapps.groupby('Type')

In [None]:
free_count_group = type_count.get_group('Free')
paid_count_group = type_count.get_group('Paid')

In [None]:
fig, axes = plt.subplots(nrows=2,ncols=1,figsize=(12,12))

free_count_group["Rating"].plot(kind="hist",color="blue",bins=42,grid=True,alpha=0.65,label="Rating",ax=axes[0])
axes[0].legend()
axes[0].set_xlabel('Rating')
axes[0].set_title("Rating Distribution of Free Apps")

paid_count_group["Rating"].plot(kind="hist",color="blue",bins=42,grid=True,alpha=0.65,label="Rating",ax=axes[1])
axes[1].legend()
axes[1].set_xlabel("Rating")
axes[1].set_title("Rating Distribution of Paid Apps")

In [None]:
count_cat = dfGapps.groupby('Category')

In [None]:
count_cat0 = count_cat.get_group('ART_AND_DESIGN')
count_cat1 = count_cat.get_group('AUTO_AND_VEHICLES')
count_cat2 = count_cat.get_group('BEAUTY')
count_cat3 = count_cat.get_group('BOOKS_AND_REFERENCE')
count_cat4 = count_cat.get_group('BUSINESS')
count_cat5 = count_cat.get_group('COMICS')
count_cat6 = count_cat.get_group('COMMUNICATION')
count_cat7 = count_cat.get_group('DATING')
count_cat8 = count_cat.get_group('EDUCATION')
count_cat9 = count_cat.get_group('ENTERTAINMENT')
count_cat10 = count_cat.get_group('EVENTS')
count_cat11 = count_cat.get_group('FAMILY')
count_cat12 = count_cat.get_group('FINANCE')
count_cat13 = count_cat.get_group('FOOD_AND_DRINK')
count_cat14 = count_cat.get_group('GAME')
count_cat15 = count_cat.get_group('HEALTH_AND_FITNESS')
count_cat16 = count_cat.get_group('HOUSE_AND_HOME')
count_cat17 = count_cat.get_group('LIBRARIES_AND_DEMO')
count_cat18 = count_cat.get_group('LIFESTYLE')
count_cat19 = count_cat.get_group('MAPS_AND_NAVIGATION')
count_cat20 = count_cat.get_group('MEDICAL')
count_cat21 = count_cat.get_group('NEWS_AND_MAGAZINES')
count_cat22 = count_cat.get_group('PARENTING')
count_cat23 = count_cat.get_group('PERSONALIZATION')
count_cat24 = count_cat.get_group('PHOTOGRAPHY')
count_cat25 = count_cat.get_group('PRODUCTIVITY')
count_cat26 = count_cat.get_group('SHOPPING')
count_cat27 = count_cat.get_group('SOCIAL')
count_cat28 = count_cat.get_group('SPORTS')
count_cat29 = count_cat.get_group('TOOLS')
count_cat30 = count_cat.get_group('TRAVEL_AND_LOCAL')
count_cat31 = count_cat.get_group('VIDEO_PLAYERS')
count_cat32 = count_cat.get_group('WEATHER')

In [None]:
# Plot one rating histogram per category on a 7x5 grid.
# Iterating over the groupby object keeps every title in sync with the data it describes.
fig, axes = plt.subplots(nrows=7,ncols=5,figsize=(25,35))
axes_flat = axes.flatten()

for ax, (category, group) in zip(axes_flat, count_cat):
    group["Rating"].plot(kind="hist",color="blue",bins=10,grid=True,alpha=0.65,label="Rating",ax=ax)
    ax.set_xlim(1.0,5.0)
    ax.set_ylim(0,800)
    ax.legend()
    ax.set_xlabel('Rating')
    ax.set_title("Category " + category)

# Remove the unused axes at the end of the grid
for ax in axes_flat[len(count_cat):]:
    ax.remove()

### Install Count Category

In [None]:
# Plot the install-count distribution of every category on an 11x3 grid
fig, axes = plt.subplots(nrows=11,ncols=3,figsize=(29,90))

for ax, (category, group) in zip(axes.flatten(), count_cat):
    sns.countplot(x='Installs temp',data = group,ax=ax)
    ax.yaxis.set_major_locator(ticker.LinearLocator(11))
    ax.set_xticklabels(ax.get_xticklabels(), rotation=30)
    ax.set_ylabel('Frequency')
    ax.set_title("Install Count " + category)
    ax.set_ylim(0,300)

### Size Vs Rating

In [None]:
# Size vs Rating (x and y are passed as keywords, as required by recent seaborn versions)
sns.set_style("darkgrid")
g = sns.jointplot(x='Size', y='Rating', data=dfGapps)
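
To put a number next to the joint plot, the linear correlation between size and rating can be computed. A minimal sketch, assuming a made-up frame `toy` with illustrative values; it makes no claim about the correlation in the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes (in MB) and ratings, including one missing size
toy = pd.DataFrame({
    'Size': [1.0, 2.0, np.nan, 4.0],
    'Rating': [4.0, 4.5, 3.9, 4.4],
})

# Series.corr uses only the rows where both values are present
r = toy['Size'].corr(toy['Rating'])

# Same result when the incomplete rows are dropped explicitly first
r_dropped = toy.dropna().corr().loc['Size', 'Rating']
print(r, r_dropped)
```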