# Google Play Store Dataset

In [None]:
import pandas as pd
df = pd.read_csv("googleplaystore.csv")
df

## Chapter 1
Data exploration using roughly 10K Google Play Store records.

We will learn pandas and use it for data exploration.

In [None]:
df.info()

In [None]:
df.head()

In [None]:
df.head(n=15)

In [None]:
df.tail()

In [None]:
df['App'] # selecting a column by name reminds me of a Python dictionary lookup

In [None]:
df['Installs'] # the output reminds me of a Python list

In [None]:
df['Category'].describe() # summary of a text column: count, unique, top, freq

In [None]:
df['Rating'].value_counts() # how many apps share each rating value

In [None]:
df["Installs"].unique()

In [None]:
df["Installs"].value_counts()

### Pandas Plots

In [None]:
# magic function that renders the figure in a notebook 
# (instead of displaying a dump of the figure object).
%matplotlib inline

In [None]:
df["Category"].value_counts()

In [None]:
df["Category"].value_counts().plot(kind='bar', figsize=(10, 10))

In [None]:
# checking Rating
df["Rating"].plot(kind='box', figsize=(10, 5))

### Select some rows by...

In [None]:
df.iloc[10001:10010] # rows selected by position, sliced like a Python list

In [None]:
# how many rows, how many columns
df.shape

In [None]:
df[df.Rating >= 19.0] # filtering

In [None]:
df[df.Installs == '1,000,000,000+']
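The two selection styles used above, by position with `iloc` and by condition with a boolean mask, can be compared on a small example. This is a minimal sketch on made-up toy data (the `toy` frame and its values are assumptions, not rows from `googleplaystore.csv`). It also shows that conditions can be combined with `&`, as long as each one is wrapped in parentheses.

```python
import pandas as pd

# Toy data for illustration only, not taken from googleplaystore.csv
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.7, 3.9, 4.8, 4.1],
    "Installs": ["1,000+", "1,000,000,000+", "1,000,000,000+", "500+"],
})

# Select by position: rows 1 and 2 (the end of the slice is excluded)
by_position = toy.iloc[1:3]

# Select by condition: Installs is text, so == compares strings exactly
mask = (toy["Installs"] == "1,000,000,000+") & (toy["Rating"] >= 4.5)
by_condition = toy[mask]

print(by_position["App"].tolist())
print(by_condition["App"].tolist())
```

Because `Installs` holds text such as `'1,000,000,000+'`, comparisons like `>=` would compare strings character by character rather than numerically, so only exact matches are reliable here.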

## Chapter 2
We will learn how to...

* remove duplicates
* reduce columns
* filter out rows that match certain rules
* remove NaN (what is NaN?)
* sorting

Derive your conclusion for App Profiling.
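To answer the question in the list above: NaN ("not a number") is the marker pandas uses for a missing value, for example an app that has no rating yet. The sketch below uses made-up toy data (the `toy` frame is an assumption, not part of the dataset) to show how NaN behaves and how to detect it.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only, not taken from googleplaystore.csv
toy = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Rating": [4.5, np.nan, 3.9],
})

# NaN is never equal to anything, not even to itself,
# so `Rating == np.nan` cannot be used to find missing values
nan_equals_itself = np.nan == np.nan

# isna() marks the missing entries; notna() marks the present ones
missing_mask = toy["Rating"].isna()
rated = toy[toy["Rating"].notna()]

print(nan_equals_itself)
print(missing_mask.tolist())
print(rated["App"].tolist())
```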

In [None]:
df[df.App == 'ROBLOX'] # filtering

### Remove Duplicates

In [None]:
# keep only the columns of interest, then drop rows that are exact duplicates
columns_to_be_selected = ['App', 'Category', 'Genres', 'Rating', 'Reviews', 'Price']
cleaned = df[columns_to_be_selected]

cleaned = cleaned.drop_duplicates()
len(cleaned)

10356

In [None]:
cleaned[cleaned.App == 'ROBLOX'] # filtering

### Sorting Columns

In [None]:
# sorting a plain Python list
family = [12, 3, 4, 6, 7, 9]
print(sorted(family))  # smallest to largest (ASCENDING)
for item in sorted(family, reverse=True):  # largest to smallest (DESCENDING)
    print(item)

In [None]:
cleaned.info()

In [None]:
cleaned.sort_values(by='Rating', ascending=False)

In [None]:
cleaned = cleaned.sort_values(by=['Reviews', 'Rating'], ascending=False)
cleaned.head(n=20)

In [None]:
# remove rows for free apps (Price is stored as text, so compare with '0')
cleaned = cleaned[cleaned.Price != '0']
cleaned

In [None]:
# remove rows where Rating is NaN (missing)
cleaned = cleaned[cleaned.Rating.notna()]
cleaned

In [None]:
# select Top 20
cleaned.head(n=20)

### Data Error

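A Rating of 19.0 is outside the Play Store's 1 to 5 scale, so the row that carries it (position 10472) is a data error.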

In [None]:
df.iloc[10472]

In [None]:
#df = df.drop(df.index[10472]) # uncomment to drop the row with the data error
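Instead of hard-coding the row position, the bad row can be found by checking the rating range. This is a minimal sketch on made-up toy data, assuming valid ratings lie between 1 and 5. The `toy` frame and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only, not taken from googleplaystore.csv
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.2, 19.0, np.nan, 5.0],
})

# Assumption: valid ratings lie between 1 and 5.
# Comparisons with NaN are False, so unrated apps are not flagged here.
out_of_range = (toy["Rating"] < 1) | (toy["Rating"] > 5)
bad_labels = toy.index[out_of_range]

# drop() returns a new DataFrame without the flagged rows
fixed = toy.drop(bad_labels)

print(bad_labels.tolist())
print(fixed["App"].tolist())
```

Unrated apps (NaN) are kept on purpose, since a missing rating is a different problem from an impossible one.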

### Interesting App Names

* Find letter count in each App Name
* Find word count in each App Name
* Add NEW columns to Pandas DataFrame

In [None]:
# app title
# step1: letter count in app title (spaces are not counted)
# step2: word count in app title (string.split() method)

letter_count = []
word_count = []

for item in df['App']:
    # count every character in the title except spaces
    counter = 0
    for letter in item:
        if letter != ' ':
            counter = counter + 1

    # split() breaks the title on whitespace, giving one entry per word
    words = item.split()
    letter_count.append(counter)
    word_count.append(len(words))

In [None]:
print(len(letter_count))

10841
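The same counts can also be computed without an explicit loop by using the pandas string methods under `.str`. This is a sketch of an alternative, not a replacement for the loop above. It runs on a small `titles` Series (a stand-in for `df['App']`) that holds two titles from the dataset.

```python
import pandas as pd

# Small stand-in for df['App']
titles = pd.Series(["Coloring book moana", "Sketch - Draw & Paint"])

# Remove the spaces, then measure the length of what is left
letter_cnt = titles.str.replace(" ", "", regex=False).str.len()

# split() with no arguments splits on runs of whitespace
word_cnt = titles.str.split().str.len()

print(letter_cnt.tolist())  # [17, 17]
print(word_cnt.tolist())    # [3, 5]
```

On the full DataFrame, the results could be assigned directly, for example `df['App_letterCnt'] = df['App'].str.replace(' ', '', regex=False).str.len()`.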


### Add NEW Columns to DataFrame

In [None]:
df['App_letterCnt'] = letter_count
df['App_wordCnt'] = word_count

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  int64  
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10841 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10839 non-null  object 
 13  App_letterCnt   10841 non-null  int64  
 14  App_wordCnt     10841 non-null  int64  
dtypes: float64(1), int64(3), object(11)
memory usage: 1.2+ MB


In [None]:
df.head()

| | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver | App_letterCnt | App_wordCnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up | 38 | 9 |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | 17 | 3 |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up | 41 | 10 |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up | 17 | 5 |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up | 31 | 7 |


## Chapter 3

* output the top 20 apps to a CSV file
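One possible way to do this is with `DataFrame.to_csv`. This is a minimal sketch on made-up toy data. The `apps` frame stands in for the cleaned, sorted DataFrame, and the filename in the final comment is hypothetical. The example writes to an in-memory buffer so the result can be inspected without creating a file.

```python
import io
import pandas as pd

# Toy stand-in for the cleaned DataFrame, for illustration only
apps = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Rating": [4.1, 4.8, 4.5],
    "Reviews": [100, 300, 200],
})

# Sort by Reviews, highest first, and keep the top rows (2 here, 20 in the notebook)
top = apps.sort_values(by="Reviews", ascending=False).head(n=2)

# index=False leaves the row labels out of the output
buffer = io.StringIO()
top.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file instead (hypothetical filename):
# top.to_csv("top20_apps.csv", index=False)
```

Passing `index=False` also avoids an extra `Unnamed: 0` column appearing when the file is later read back with `pd.read_csv`.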