
# Google Play Store EDA

## Assignment:
1. Download the dataset
2. Extract the data from the zip file
3. Read the data into Python with pandas
4. Check each column
5. Define each column in Markdown style



# 📊 Google Play Store Dataset - Complete Exploratory Data Analysis (EDA)

This notebook provides a full walkthrough of Exploratory Data Analysis on the **Google Play Store dataset** using the `ydata-profiling` package. The goal is to understand the dataset's structure, quality, and patterns through automated profiling.

---

## 📘 Step 1: Install Required Libraries

We begin by making sure the necessary Python libraries are installed (using pip if they are not) and then importing them.

- `pandas` — used for data manipulation
- `ydata-profiling` — used to generate the automated data profiling report
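
Before running the imports, it can help to confirm that both packages are present in the current environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the module names are the import names used in this notebook.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names used in this notebook; the pip package for the second one
# is `ydata-profiling` (with a hyphen).
required = ["pandas", "ydata_profiling"]
missing = missing_packages(required)

if missing:
    print("Install these first, for example with pip:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

If anything is reported as missing, installing it with pip and rerunning the check should be enough before moving on.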



In [1]:
import pandas as pd
import ydata_profiling as yd



## 📘 Step 2: Load the Dataset

We will load the dataset from a CSV file using `pandas.read_csv()`. It is important to use the correct path and file name. On Windows, make sure to use double backslashes `\\` in the file path.
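
To make the path advice concrete, here is a small sketch of three equivalent ways to write the Windows path used in this notebook. The variable names are illustrative only; the path itself is the one from the loading cell.

```python
from pathlib import PureWindowsPath

# Escaped backslashes, as recommended above.
escaped = 'F:\\EDA PLAY STORE\\googleplaystore.csv'

# A raw string avoids the escaping altogether.
raw = r'F:\EDA PLAY STORE\googleplaystore.csv'

# pathlib also accepts forward slashes and renders them as backslashes.
built = PureWindowsPath('F:/EDA PLAY STORE/googleplaystore.csv')

print(escaped == raw)      # both spellings produce the same string
print(str(built) == raw)   # and so does the pathlib version
```

Either string can be passed to `pd.read_csv()` unchanged.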



In [2]:
df = pd.read_csv('F:\\EDA PLAY STORE\\googleplaystore.csv')  # double backslashes so nothing is read as an escape sequence

## 📘 Step 3: View Basic Dataset Information

To get an overview of the data, we will:
- View the first few rows using `.head()`
- Check the data types and null values with `.info()`
- Review summary statistics with `.describe()`

This step helps us understand the general structure of the dataset.
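
As a rough illustration of the three calls listed above, the sketch below runs them on a tiny stand-in frame. The three rows are copied from the dataset preview in this notebook; the names `sample`, `buffer`, and `summary` are illustrative and not part of the original analysis.

```python
import io

import pandas as pd

# A tiny stand-in frame with three rows copied from the dataset preview.
sample = pd.DataFrame({
    "App": [
        "Photo Editor & Candy Camera & Grid & ScrapBook",
        "Coloring book moana",
        "Sketch - Draw & Paint",
    ],
    "Category": ["ART_AND_DESIGN"] * 3,
    "Rating": [4.1, 3.9, 4.5],
})

print(sample.head(2))  # first two rows

# info() is a method: it prints its summary and returns None,
# so the text is captured in a buffer here.
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# describe() summarises numeric columns only by default.
summary = sample.describe()
print(summary)
```

Note that `info` needs its parentheses; without them the notebook only echoes the method object instead of the column summary.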

In [11]:
df.sample(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
2072,Thomas & Friends: Race On!,FAMILY,4.1,29319,26M,"5,000,000+",Free,0,Everyone,Casual;Action & Adventure,"May 18, 2018",2.4,4.4 and up
6656,Launcher Oreo 8.1,PERSONALIZATION,4.5,13466,3.4M,"500,000+",Free,0,Everyone,Personalization,"June 19, 2018",1.9,5.0 and up
7300,CG Districts,SOCIAL,3.8,14,Varies with device,"1,000+",Free,0,Everyone,Social,"May 3, 2018",Varies with device,Varies with device
4434,Drink-O-Tron The Drinking Game,GAME,4.1,140,45M,"50,000+",Free,0,Mature 17+,Card,"May 31, 2017",1.64,4.0.3 and up
802,Babbel – Learn Spanish,EDUCATION,4.4,54798,11M,"1,000,000+",Free,0,Everyone,Education,"July 30, 2018",20.7.2,4.4 and up
5811,Axe Man,GAME,3.7,53,14M,"1,000+",Free,0,Everyone,Adventure,"February 23, 2015",3.0,2.3.3 and up
10443,Signal Info,TOOLS,4.6,424,3.5M,"10,000+",Free,0,Everyone,Tools,"December 19, 2017",0.11,6.0 and up
8478,DK Eyewitness Audio Walks,TRAVEL_AND_LOCAL,2.3,9,70M,"1,000+",Free,0,Everyone,Travel & Local,"April 6, 2018",1.0.1.8,4.2 and up
4986,Alchemy Classic Ad Free,FAMILY,4.6,20178,9.0M,"100,000+",Free,0,Everyone,Puzzle,"May 26, 2014",1.7.3,1.6 and up
4242,"Fame Boom for Real Followers, Likes",SOCIAL,4.7,896118,6.6M,"5,000,000+",Free,0,Everyone,Social,"October 9, 2017",1.3.0,4.1 and up


In [12]:
df.head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [13]:
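df.info()  # call the method; the output below came from the bare attribute `df.info`, which only echoes the DataFrame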

<bound method DataFrame.info of                                                      App             Category  \
0         Photo Editor & Candy Camera & Grid & ScrapBook       ART_AND_DESIGN   
1                                    Coloring book moana       ART_AND_DESIGN   
2      U Launcher Lite – FREE Live Cool Themes, Hide ...       ART_AND_DESIGN   
3                                  Sketch - Draw & Paint       ART_AND_DESIGN   
4                  Pixel Draw - Number Art Coloring Book       ART_AND_DESIGN   
...                                                  ...                  ...   
10836                                   Sya9a Maroc - FR               FAMILY   
10837                   Fr. Mike Schmitz Audio Teachings               FAMILY   
10838                             Parkinson Exercices FR              MEDICAL   
10839                      The SCP Foundation DB fr nn5n  BOOKS_AND_REFERENCE   
10840      iHoroscope - 2018 Daily Horoscope & Astrology            LIFESTYLE

In [14]:
df.describe()

Unnamed: 0,Rating
count,9367.0
mean,4.193338
std,0.537431
min,1.0
25%,4.0
50%,4.3
75%,4.5
max,19.0
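
The summary above reports a maximum `Rating` of 19.0, while Play Store ratings run from 1 to 5, so at least one row looks malformed. A minimal sketch for flagging such values follows; the helper `out_of_range` and the sample list are illustrative, not part of the original notebook.

```python
import math


def out_of_range(ratings, low=1.0, high=5.0):
    """Return (position, value) pairs for ratings outside [low, high]; NaN is skipped."""
    flagged = []
    for position, value in enumerate(ratings):
        if isinstance(value, float) and math.isnan(value):
            continue  # missing ratings are a separate question
        if not low <= value <= high:
            flagged.append((position, value))
    return flagged


# Illustrative values: the min, median, and max from the summary, plus one missing rating.
print(out_of_range([1.0, 4.3, 19.0, float("nan")]))
```

On the real frame, a comparable filter would be `df[(df['Rating'] < 1) | (df['Rating'] > 5)]`, which returns the offending rows for inspection.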


## 📘 Step 4: Generate the Profiling Report

We will use `ProfileReport()` from `ydata_profiling` to automatically generate an EDA report. This report will include:

- Column summaries
- Distribution plots
- Correlation analysis
- Missing value detection
- Data type insights
- Alerts and warnings

---


We will export the generated report to an HTML file. This allows you to:
- Share the report
- Open and view it in any browser
- Save it for future reference

The file will be saved with the name `google_play_store_profile_report.html` in your current working directory.
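
To see where that file lands, the sketch below resolves the report name against the current working directory using only the standard library. The variable name `report_path` is an assumption for illustration.

```python
from pathlib import Path

# File name taken from the text above, resolved against the current working directory.
report_path = Path.cwd() / "google_play_store_profile_report.html"

print(report_path)            # absolute location of the report
print(report_path.exists())   # True only once the export cell has been run
print(report_path.as_uri())   # a file:// URL that a browser can open
```

Once the report exists, `webbrowser.open(report_path.as_uri())` from the standard library is one way to open it in the default browser.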


In [8]:
profile = yd.ProfileReport(df, title="Google Play Store Data Profiling Report", explorative=True)
profile.to_file("google_play_store_profile_report.html")



In [15]:
df['Size'].value_counts()

Varies with device    1695
11M                    198
12M                    196
14M                    194
13M                    191
                      ... 
902k                     1
44k                      1
862k                     1
41k                      1
190k                     1
Name: Size, Length: 462, dtype: int64

In [16]:
df['Size'].isnull().sum()

0
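
Although `Size` has no nulls, the counts above show that it is stored as text, with `M` and `k` suffixes and a `Varies with device` placeholder. One possible way to turn it into a number is sketched below. The helper `parse_size_mb` is hypothetical, and treating 1M as 1024k is an assumption, since the dataset does not state its units precisely.

```python
def parse_size_mb(text):
    """Convert '19M' or '902k' to megabytes; return None for anything else."""
    text = text.strip()
    units = {"M": 1.0, "k": 1.0 / 1024}  # assumption: 1M = 1024k
    if text and text[-1] in units:
        try:
            return float(text[:-1]) * units[text[-1]]
        except ValueError:
            return None  # suffix present but no usable number in front of it
    return None  # e.g. 'Varies with device'


# Values taken from the value_counts output above.
for raw_size in ["11M", "902k", "Varies with device"]:
    print(raw_size, "->", parse_size_mb(raw_size))
```

Applied to the frame, this could look like `df['Size'].map(parse_size_mb)`, which would leave the `Varies with device` rows as missing values rather than hiding them.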

In [17]:
df['Installs'].value_counts()

1,000,000+        1579
10,000,000+       1252
100,000+          1169
10,000+           1054
1,000+             907
5,000,000+         752
100+               719
500,000+           539
50,000+            479
5,000+             477
100,000,000+       409
10+                386
500+               330
50,000,000+        289
50+                205
5+                  82
500,000,000+        72
1+                  67
1,000,000,000+      58
0+                  14
Free                 1
0                    1
Name: Installs, dtype: int64
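
The `Installs` counts are also text: bucketed lower bounds such as `1,000+`, plus a stray `Free` entry that suggests a misaligned row. A minimal sketch of a converter follows; `parse_installs` is a hypothetical helper, not something the original notebook defines.

```python
def parse_installs(text):
    """Convert '1,000,000+' to 1000000; return None if no number remains."""
    cleaned = text.replace(",", "").rstrip("+").strip()
    return int(cleaned) if cleaned.isdigit() else None


# Values taken from the value_counts output above.
for raw_installs in ["1,000,000,000+", "0+", "0", "Free"]:
    print(raw_installs, "->", parse_installs(raw_installs))
```

The resulting integers are still lower bounds (`1,000+` means at least 1,000 installs), so they are better suited to ordering and grouping than to exact arithmetic.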

In [22]:
df['Price'].value_counts()

0          10040
$0.99        148
$2.99        129
$1.99         73
$4.99         72
           ...  
$299.99        1
$1.50          1
$2.95          1
$15.46         1
$74.99         1
Name: Price, Length: 93, dtype: int64
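
`Price` mixes a plain `0` for free apps with dollar-prefixed strings such as `$0.99`. A small sketch for converting both forms to numbers is shown below; `parse_price` is a hypothetical helper, and returning `None` for anything non-numeric is a design choice made here, not something the dataset prescribes.

```python
def parse_price(text):
    """Convert '$2.99' to 2.99 and '0' to 0.0; return None if the text is not numeric."""
    try:
        return float(text.strip().lstrip("$"))
    except ValueError:
        return None


# Values taken from the value_counts output above, plus one non-numeric example.
for raw_price in ["0", "$0.99", "$299.99", "Everyone"]:
    print(raw_price, "->", parse_price(raw_price))
```

As with the other columns, `df['Price'].map(parse_price)` would be one way to apply it, after which `.describe()` can summarise prices numerically.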

# Without ydata Profiling