# Pandas Mini Project

## Dataset Description

- App — Application Name
- Category — Category to which the application belongs
- Rating — User Rating
- Reviews — Number of User Reviews about the application
- Size — Application Size
- Installs — Number of application downloads/installs by users
- Type — Paid or Free application
- Price — Application Price
- Content Rating — Age group the application is targeted at
- Genres — Application's affiliation with multiple genres
- Last Updated — Date of the application's last update in the Play Store
- Current Ver — Current version of the application in the Play Store
- Android Ver — Minimum required Android version

## Tasks

### Task 1.

Load the data and store the first 3 rows and the last 3 rows of the dataset in the variables `data_head` and `data_tail`.

In [1]:
import pandas as pd

In [2]:
playstore = pd.read_csv('05_data_mini_project.csv')

In [3]:
playstore.head()

Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [4]:
data_head = playstore.head(3)
data_tail = playstore.tail(3)
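
The `Unnamed: 0` column in the preview above is what pandas typically produces when a CSV was saved together with its index and then read back as plain data. A minimal sketch of how `index_col=0` avoids that, assuming a small in-memory CSV in place of the real file (`csv_text`, `raw`, and `sample` are illustrative names, and the rows are a shortened copy of the preview):

```python
import io
import pandas as pd

# Tiny stand-in for the real file: the first, unnamed column is a saved index.
csv_text = """,App,Rating
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1
1,Coloring book moana,3.9
3,Sketch - Draw & Paint,4.5
4,Pixel Draw - Number Art Coloring Book,4.3
"""

# Without index_col the saved index becomes a data column named "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the index instead.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

sample_head = sample.head(2)  # first 2 rows
sample_tail = sample.tail(2)  # last 2 rows
print(raw.columns.tolist())
print(sample.columns.tolist())
```

Whether to drop the column in the real dataset is a judgment call; the rest of this notebook keeps it.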

### Task 2. 

Store the number of rows and the number of columns in the variables `n_row` and `n_col`.

In [5]:
n_row = playstore.shape[0]
n_col = playstore.shape[1]
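
Since `shape` is a `(rows, columns)` tuple, both values can also be unpacked in one statement. A small sketch on a made-up frame (`toy`, `n_row_toy`, and `n_col_toy` are illustrative names):

```python
import pandas as pd

# Made-up data: 3 rows, 2 columns.
toy = pd.DataFrame({"App": ["a", "b", "c"], "Rating": [4.1, 3.9, None]})

# Tuple unpacking assigns the row count and the column count together.
n_row_toy, n_col_toy = toy.shape
print(n_row_toy, n_col_toy)
```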

### Task 3. 

Find the number of unique apps in the dataset.

In [6]:
playstore.App.nunique()

9659
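
One detail worth knowing about `nunique()` is that it skips missing values unless told otherwise. A minimal sketch on a made-up series (the app names are invented for illustration):

```python
import pandas as pd

# Made-up app names with one duplicate and one missing value.
apps = pd.Series(["Maps", "Maps", "Notes", None, "Camera"])

n_unique = apps.nunique()                      # missing values are ignored by default
n_unique_with_na = apps.nunique(dropna=False)  # count the missing value as its own entry
counts = apps.value_counts()                   # how often each name occurs
print(n_unique, n_unique_with_na)
```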

### Task 4. 

Find the number of missing values in the `Rating` column.

In [7]:
rating_missing = playstore['Rating'].isna().sum()
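
The same `isna()` pattern extends to the whole frame, and taking the mean instead of the sum gives the share of missing values. A short sketch on made-up data (`toy` is an illustrative name):

```python
import pandas as pd

# Made-up data with gaps in both columns.
toy = pd.DataFrame({
    "Rating": [4.1, None, 3.9, None],
    "Type": ["Free", "Paid", None, "Free"],
})

missing_per_column = toy.isna().sum()        # count of missing values per column
missing_share = toy["Rating"].isna().mean()  # fraction of missing ratings
print(missing_per_column)
print(missing_share)
```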

### Task 5. 

Make a new dataset consisting of rows 1-3, 6-8, and 16-19, restricted to the columns `App`, `Size`, `Genres`, and `Current Ver`.

In [8]:
first_three = playstore[['App', 'Size', 'Genres', 'Current Ver']].head(3)

In [9]:
second_part = playstore[['App', 'Size', 'Genres', 'Current Ver']].iloc[5:8]

In [10]:
third_part = playstore[['App', 'Size', 'Genres', 'Current Ver']].iloc[15:19]

In [11]:
first_three

Unnamed: 0,App,Size,Genres,Current Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,19M,Art & Design,1.0.0
1,Coloring book moana,14M,Art & Design;Pretend Play,2.0.0
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",8.7M,Art & Design,1.2.4


In [12]:
second_part

Unnamed: 0,App,Size,Genres,Current Ver
5,Paper flowers instructions,5.6M,Art & Design,1.0
6,Smoke Effect Photo Maker - Smoke Editor,19M,Art & Design,1.1
7,Infinite Painter,29M,Art & Design,6.1.61.1


In [13]:
third_part

Unnamed: 0,App,Size,Genres,Current Ver
15,Learn To Draw Kawaii Characters,2.7M,Art & Design,
16,Photo Designer - Write your name with shapes,5.5M,Art & Design,3.1
17,350 Diy Room Decor Ideas,17M,Art & Design,1.0
18,FlipaClip - Cartoon animation,39M,Art & Design,2.2.5


In [14]:
merged_data = pd.concat([first_three, second_part, third_part])

In [15]:
merged_data

Unnamed: 0,App,Size,Genres,Current Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,19M,Art & Design,1.0.0
1,Coloring book moana,14M,Art & Design;Pretend Play,2.0.0
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",8.7M,Art & Design,1.2.4
5,Paper flowers instructions,5.6M,Art & Design,1.0
6,Smoke Effect Photo Maker - Smoke Editor,19M,Art & Design,1.1
7,Infinite Painter,29M,Art & Design,6.1.61.1
15,Learn To Draw Kawaii Characters,2.7M,Art & Design,
16,Photo Designer - Write your name with shapes,5.5M,Art & Design,3.1
17,350 Diy Room Decor Ideas,17M,Art & Design,1.0
18,FlipaClip - Cartoon animation,39M,Art & Design,2.2.5


In [16]:
merged_data.to_csv('05_merged_data.csv', sep=',')
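
As an alternative to building three pieces and concatenating them, the row positions can be collected once and passed to a single `iloc` call. A sketch using NumPy's `np.r_` on a made-up 20-row frame (`toy`, `positions`, and `subset` are illustrative names):

```python
import numpy as np
import pandas as pd

# Made-up frame with 20 rows so the positions below exist.
toy = pd.DataFrame({"App": [f"app_{i}" for i in range(20)], "Size": range(20)})

# np.r_ joins the slices into one array of zero-based positions:
# 0-2, 5-7 and 15-18 (slice ends are exclusive, as with iloc).
positions = np.r_[0:3, 5:8, 15:19]
subset = toy.iloc[positions]
print(subset.index.tolist())
```

The result keeps the original index labels, just like `pd.concat` of the three slices.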

### Task 6. 

Drop duplicate apps based on the `App` column and save the result in a new dataframe. Do not forget to reset the index.

In [17]:
unique_playstore = playstore.drop_duplicates('App').reset_index(drop=True)
unique_playstore.tail(3)

Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
9656,10837,Parkinson Exercices FR,MEDICAL,,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up
9657,10838,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device
9658,10839,iHoroscope - 2018 Daily Horoscope & Astrology,LIFESTYLE,4.5,398307,19M,"10,000,000+",Free,0,Everyone,Lifestyle,"July 25, 2018",Varies with device,Varies with device
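
By default `drop_duplicates` keeps the first occurrence of each value; the `keep` parameter changes that. A small sketch on made-up data showing both choices (`toy`, `first_kept`, and `last_kept` are illustrative names):

```python
import pandas as pd

# Made-up data: "Maps" appears twice with different review counts.
toy = pd.DataFrame({
    "App": ["Maps", "Notes", "Maps", "Camera"],
    "Reviews": [10, 5, 12, 7],
})

# keep="first" is the default: the earlier "Maps" row survives.
first_kept = toy.drop_duplicates("App").reset_index(drop=True)

# keep="last" retains the later "Maps" row instead.
last_kept = toy.drop_duplicates("App", keep="last").reset_index(drop=True)

print(first_kept)
print(last_kept)
```

`reset_index(drop=True)` renumbers the rows from 0 and discards the old labels rather than adding them as a column.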


### Task 7. 

Rename the columns so that they are all lowercase and every space is replaced with an underscore.

In [18]:
cols = playstore.columns.to_list()

In [19]:
# Lowercase each name and replace spaces with underscores
cols_new = [col.lower().replace(' ', '_') for col in cols]

In [20]:
playstore.columns = cols_new

In [21]:
playstore.head(3)

Unnamed: 0,unnamed:_0,app,category,rating,reviews,size,installs,type,price,content_rating,genres,last_updated,current_ver,android_ver
0,0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
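
The same renaming can be done without an explicit loop through the `.str` accessor on the column index. A minimal sketch on an empty made-up frame (`toy` is an illustrative name; only its column names matter here):

```python
import pandas as pd

# Only the column names matter for this example.
toy = pd.DataFrame(columns=["App", "Content Rating", "Last Updated"])

# Vectorized string methods work on the column Index as well.
toy.columns = toy.columns.str.lower().str.replace(" ", "_", regex=False)
print(toy.columns.tolist())
```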


### Task 8. 

Find the proportion of free apps in the app store. 

In [22]:
# Price is still a string column at this point, so free apps are the rows equal to '0'
free_apps = (unique_playstore['Price'] == '0').sum()
round(free_apps / unique_playstore.shape[0], 2)

0.92
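
A proportion can also be read directly from a boolean mask or from normalized value counts. A sketch on made-up data, assuming a `Type` column with the values `Free` and `Paid` as in the dataset description (`toy`, `free_share`, and `shares` are illustrative names):

```python
import pandas as pd

# Made-up data: three free apps and one paid app.
toy = pd.DataFrame({"Type": ["Free", "Free", "Paid", "Free"]})

# The mean of a boolean mask is the fraction of True values.
free_share = (toy["Type"] == "Free").mean()

# normalize=True returns shares instead of counts for every category.
shares = toy["Type"].value_counts(normalize=True)
print(free_share)
print(shares)
```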

### Task 9. 

Keep only the apps in the EDUCATION category that have more than 1000 reviews.

In [23]:
education_playstore = playstore[(playstore['category'] == 'EDUCATION') & (playstore['reviews'] > 1000)].reset_index(drop=True)
education_playstore.tail(3)

Unnamed: 0,unnamed:_0,app,category,rating,reviews,size,installs,type,price,content_rating,genres,last_updated,current_ver,android_ver
132,850,Blinkist - Nonfiction Books,EDUCATION,4.1,16103,13M,"1,000,000+",Free,0,Everyone,Education,"July 31, 2018",5.7.1,4.1 and up
133,853,Toca Life: City,EDUCATION,4.7,31085,24M,"500,000+",Paid,$3.99,Everyone,Education;Pretend Play,"July 6, 2018",1.5-play,4.4 and up
134,854,Toca Life: Hospital,EDUCATION,4.7,3528,24M,"100,000+",Paid,$3.99,Everyone,Education;Pretend Play,"June 12, 2018",1.1.1-play,4.4 and up
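
For filters with several conditions, `DataFrame.query` can be easier to read than chained boolean masks. A minimal sketch on made-up data (`toy` and `filtered` are illustrative names):

```python
import pandas as pd

# Made-up data with the two columns used by the filter.
toy = pd.DataFrame({
    "category": ["EDUCATION", "EDUCATION", "GAME"],
    "reviews": [1500, 200, 5000],
})

# The query string combines both conditions with `and`.
filtered = toy.query("category == 'EDUCATION' and reviews > 1000").reset_index(drop=True)
print(filtered)
```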


### Task 10. 

Change the data type of the `price` column from object to float.

In [24]:
# regex=False treats '$' as a literal character rather than a regex end-of-string anchor
playstore['price'] = playstore['price'].str.replace('$', '', regex=False)


In [25]:
playstore['price'] = playstore['price'].astype('float')

In [26]:
playstore.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10840 entries, 0 to 10839
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   unnamed:_0      10840 non-null  int64  
 1   app             10840 non-null  object 
 2   category        10840 non-null  object 
 3   rating          9366 non-null   float64
 4   reviews         10840 non-null  int64  
 5   size            10840 non-null  object 
 6   installs        10840 non-null  object 
 7   type            10839 non-null  object 
 8   price           10840 non-null  float64
 9   content_rating  10840 non-null  object 
 10  genres          10840 non-null  object 
 11  last_updated    10840 non-null  object 
 12  current_ver     10832 non-null  object 
 13  android_ver     10838 non-null  object 
dtypes: float64(2), int64(2), object(10)
memory usage: 1.2+ MB
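
The cleaning and conversion steps can be sketched on a small made-up series. `pd.to_numeric` is used here as an alternative to `astype('float')`; its `errors="coerce"` option turns unparseable strings into `NaN` instead of raising, which may or may not be what you want for real data (`prices`, `as_float`, and `coerced` are illustrative names):

```python
import pandas as pd

# Made-up prices in the same format as the dataset: "0" or "$x.xx".
prices = pd.Series(["0", "$3.99", "$0.99", "0"])

# Strip the literal dollar sign, then convert the strings to numbers.
cleaned = prices.str.replace("$", "", regex=False)
as_float = pd.to_numeric(cleaned)

# With errors="coerce", a value that is not a number becomes NaN.
coerced = pd.to_numeric(pd.Series(["1.5", "n/a"]), errors="coerce")

print(as_float.tolist())
print(coerced.tolist())
```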


### Task 11. 

Create a pivot table showing the mean price, mean rating, and mean number of reviews by category and type (Free or Paid).

In [27]:
pivot_table = playstore.drop_duplicates('app').pivot_table(
    index=['category', 'type'], 
    values=['price', 'rating', 'reviews'], 
    aggfunc={'price': 'mean', 'rating': 'mean', 'reviews': 'mean'}
)

In [28]:
pivot_table = pivot_table.round({'price': 2, 'rating': 1, 'reviews': 2})

In [29]:
pivot_table.columns = ['mean_price', 'mean_rating', 'mean_reviews']

In [30]:
pivot_table.to_csv('05_pivoted_table.csv', sep=',')

In [31]:
pivot_table

category,type,mean_price,mean_rating,mean_reviews
ART_AND_DESIGN,Free,0.00,4.3,23230.11
ART_AND_DESIGN,Paid,1.99,4.7,722.00
AUTO_AND_VEHICLES,Free,0.00,4.2,14140.28
AUTO_AND_VEHICLES,Paid,4.49,4.6,1387.67
BEAUTY,Free,0.00,4.3,7476.23
...,...,...,...,...
TRAVEL_AND_LOCAL,Paid,4.16,4.1,1506.08
VIDEO_PLAYERS,Free,0.00,4.0,424347.18
VIDEO_PLAYERS,Paid,2.62,4.1,3341.75
WEATHER,Free,0.00,4.2,171249.62
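
An equivalent summary can be built with `groupby` and named aggregation, which sets the output column names in the same step instead of assigning `columns` afterwards. A sketch on made-up data (`toy` and `summary` are illustrative names, and the numbers are invented):

```python
import pandas as pd

# Made-up data with the columns used in the pivot table.
toy = pd.DataFrame({
    "category": ["GAME", "GAME", "GAME", "TOOLS"],
    "type": ["Free", "Free", "Paid", "Free"],
    "price": [0.0, 0.0, 2.0, 0.0],
    "rating": [4.0, 5.0, 3.0, 4.5],
    "reviews": [100, 300, 50, 10],
})

# Each keyword is an output column: name=(source column, aggregation).
summary = toy.groupby(["category", "type"]).agg(
    mean_price=("price", "mean"),
    mean_rating=("rating", "mean"),
    mean_reviews=("reviews", "mean"),
)
print(summary)
```

Naming the columns explicitly avoids relying on the alphabetical order in which `pivot_table` returns them.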
