<a href="https://colab.research.google.com/github/shakirsayeed/PlayStore_DataAnalysis/blob/main/EDA_Project_Work_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Play Store App Review Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Member  -** Syed Shakir Sayeed


# **Project Summary -**

The "Play Store App Review Analysis using Exploratory Data Analysis - EDA" project focuses on extracting valuable insights and patterns from app reviews available on the Google Play Store. The project aims to uncover trends, sentiments, and user feedback related to various mobile applications. By employing exploratory data analysis techniques, the project seeks to provide a comprehensive understanding of user sentiments, popular features, and potential areas for improvement for app developers.

Steps involved in developing the project:

1. Data Collection
2. Data Cleaning and Preprocessing
3. Implementing Exploratory Data Analysis
4. Data Visualization

# **GitHub Link -**

https://github.com/shakirsayeed/PlayStore_DataAnalysis.git

# **Problem Statement**


1. What are the top categories on the Play Store?
2. Are the majority of apps paid or free?
3. How important is an app's rating?
4. Which audience categories should an app target?
5. Which category has the most installs?
6. How does the number of apps vary by genre?
7. How does the last update affect the rating?
8. How are ratings affected when an app is paid?
9. How are reviews and ratings correlated?
10. What is the percentage share of each review sentiment?
11. Does the last update date have an effect on the rating?
12. How are paid and free app updates distributed across the months?


#### **Define Your Business Objective?**

## <b><i>The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market.</i></b>

## <b><i>Each app has values for category, rating, size, and more. Another dataset contains customer reviews of the Android apps.</i></b>

## <b><i>Explore and analyze the data to discover key factors responsible for app engagement and success.</i></b>

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
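
As a rough illustration of the "UBM" rule, the sketch below computes one univariate, one bivariate, and one multivariate summary with pandas. The toy frame and its values are made up for illustration; only the column names mirror the Play Store data.

```python
import pandas as pd

# Toy data, purely illustrative (not taken from the real dataset).
toy_df = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "TOOLS"],
    "Type": ["Free", "Paid", "Free", "Free"],
    "Rating": [4.5, 4.0, 3.5, 4.5],
})

# U - Univariate: distribution of a single variable.
category_counts = toy_df["Category"].value_counts()

# B - Bivariate (numerical vs categorical): mean rating per app type.
rating_by_type = toy_df.groupby("Type")["Rating"].mean()

# M - Multivariate: mean rating broken down by category and type together.
rating_by_category_type = toy_df.pivot_table(
    index="Category", columns="Type", values="Rating", aggfunc="mean"
)

print(category_counts)
print(rating_by_type)
print(rating_by_category_type)
```

Each of these summaries maps naturally onto a chart type, for example a count plot, a grouped bar or box plot, and a heatmap respectively.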





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd # used for Data Processing
import numpy as np # used to access builtin Numerical Methods and Functions
import matplotlib.pyplot as plt
import seaborn as sns # Used for Data Visualization
import warnings
from datetime import datetime

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')


In [None]:
psd_path="/content/drive/MyDrive/DS_Notes/EDA_Project/Dataset/Play_Store_Data.csv"
urd_path="/content/drive/MyDrive/DS_Notes/EDA_Project/Dataset/User_Reviews.csv"
psd_df=pd.read_csv(psd_path)
urd_df=pd.read_csv(urd_path)
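
Since the guidelines ask for exception handling, a guarded loader could look roughly like the sketch below. `load_csv` is a hypothetical helper introduced here for illustration; it is not part of the original notebook.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


def load_csv(path: str) -> Optional[pd.DataFrame]:
    """Hypothetical helper: read a CSV file, returning None if it cannot be loaded."""
    file_path = Path(path)
    if not file_path.is_file():
        # Typical cause in Colab: Drive is not mounted or the path is mistyped.
        print(f"File not found: {file_path}")
        return None
    try:
        return pd.read_csv(file_path)
    except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        print(f"Could not parse {file_path}: {exc}")
        return None
```

With such a helper, a call like `load_csv(psd_path)` reports a missing or unreadable file instead of stopping the notebook with a traceback, and the caller can check for `None` before continuing.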


### Dataset First View

In [None]:
# Dataset First Look of PlayStore Data
dataview_playstore= pd.concat([psd_df.head(),psd_df.tail()])
dataview_playstore

In [None]:
# Dataset First Look of UserReview Data
dataview_user_review= pd.concat([urd_df.head(),urd_df.tail()])
dataview_user_review

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count of PlayStore Data
print(psd_df.columns)
rows, cols = psd_df.shape  # shape is a (rows, columns) tuple
print(f"The dataset has {rows} rows and {cols} columns")

In [None]:
# Dataset Rows & Columns count of UserReview Data
print(urd_df.columns)
rows, cols = urd_df.shape  # shape is a (rows, columns) tuple
print(f"The dataset has {rows} rows and {cols} columns")

### Dataset Information

In [None]:
# Dataset Info PlayStore Data
print("=*=*=*=*=*=*=*=*=*Data information=*=*=*=*=*=*=*=*=*=*=*=*=")
psd_df.info()
print("=*=*=*=*=*=*=*=*=*Data Describe=*=*=*=*=*=*=*=*=*=*=*=*=")
psd_df.describe(include='all')

In [None]:
# Dataset Info UserReview Data
print("=*=*=*=*=*=*=*=*=*Data information=*=*=*=*=*=*=*=*=*=*=*=*=")
urd_df.info()
print("=*=*=*=*=*=*=*=*=*Data Describe=*=*=*=*=*=*=*=*=*=*=*=*=")
urd_df.describe()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count PlayStore Data
def Playstore_data():
  # Build a per-column summary of the PlayStore data
  temp=pd.DataFrame(index=psd_df.columns)
  temp['Datatypes']=psd_df.dtypes
  temp['not null Values']=psd_df.count()
  temp['null Values']=psd_df.isnull().sum()
  temp['% ratio of Null Values']=psd_df.isnull().mean()*100  # fraction -> percentage
  temp['Unique Values']=psd_df.nunique()
  # Number of fully duplicated rows in the dataset (same value on every line)
  temp["Duplicate Values"]=psd_df.duplicated().sum()
  return temp
Playstore_data()

In [None]:
# Dataset Duplicate Value Count UserReview Data
def User_review_data():
  # Build a per-column summary of the UserReview data
  temp=pd.DataFrame(index=urd_df.columns)
  temp['Datatypes']=urd_df.dtypes
  temp['not null Values']=urd_df.count()
  temp['null Values']=urd_df.isnull().sum()
  temp['% ratio of Null Values']=urd_df.isnull().mean()*100  # fraction -> percentage
  temp['Unique Values']=urd_df.nunique()
  # sum() counts the duplicated rows; count() would return the total number of rows
  temp["Duplicate Values"]=urd_df.duplicated().sum()
  return temp
User_review_data()
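
A quick check on a toy frame illustrates how the result of `duplicated()` is usually summarized: `.sum()` counts the duplicated rows, whereas `.count()` returns the total number of rows. The frame and its values are made up for illustration.

```python
import pandas as pd

# Toy frame (illustrative data only): the second and third rows are identical.
toy_df = pd.DataFrame({
    "App": ["A", "B", "B"],
    "Sentiment": ["Positive", "Negative", "Negative"],
})

duplicate_mask = toy_df.duplicated()  # True for each row that repeats an earlier row
n_rows = duplicate_mask.count()       # counts every non-null entry, i.e. all rows
n_duplicates = duplicate_mask.sum()   # counts only the True entries

print(f"rows: {n_rows}, duplicated rows: {n_duplicates}")
```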

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count PlayStore Data
print(psd_df.isnull().sum())

In [None]:
# Missing Values/Null Values Count UserReview Data
print(urd_df.isnull().sum())

In [None]:
# Visualizing the missing values PlayStore
sns.heatmap(psd_df.isnull(), cbar=False)

In [None]:
# Visualizing the missing values UserReview
sns.heatmap(urd_df.isnull(),cbar=False)

### What did you know about your dataset?

In the Play Store dataset above, we have seen that:


1.   Rating has 1474 missing values
2.   Type has 1 missing value
3.   Content Rating has 1 missing value
4.   Current Ver has 8 missing values
5.   Android Ver has 3 missing values

These columns contain missing values, so the gaps must be handled before the dataset can be analyzed reliably.
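
One possible way to handle such gaps is sketched below on a toy frame. The choices here (median fill for `Rating`, dropping rows for sparsely missing columns) are assumptions made for illustration, not necessarily the approach this project takes, and the values are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-in for psd_df (values are illustrative, not from the real dataset).
toy_df = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.0, np.nan, 4.5, 3.5],
    "Type": ["Free", "Paid", None, "Free"],
})

cleaned = toy_df.copy()
# Assumption: the median is a reasonable fill for a column with many gaps.
cleaned["Rating"] = cleaned["Rating"].fillna(cleaned["Rating"].median())
# Assumption: columns with only a handful of gaps can simply lose those rows.
cleaned = cleaned.dropna(subset=["Type"]).reset_index(drop=True)

print(cleaned.isnull().sum())
```

The median is less sensitive to extreme values than the mean, which matters if the column contains outliers.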



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns of PlayStore Data
psd_df.columns


The 13 columns in the Play Store dataset are described below:
1. **App** - The name of the application, with a short description.
2. **Category** - The category the application belongs to.
3. **Rating** - The average rating the app has received from its users.
4. **Reviews** - The total number of users who have reviewed the application.
5. **Size** - The storage space the application occupies on the mobile phone.
6. **Installs** - The total number of downloads of the application.
7. **Type** - Whether the app is free or paid.
8. **Price** - The price payable to install the app. For free apps, the price is zero.
9. **Content Rating** - Whether or not the app is suitable for all age groups.
10. **Genres** - The other categories to which the application belongs.
11. **Last Updated** - The date when the application was last updated.
12. **Current Ver** - The current version of the application.
13. **Android Ver** - The Android version that supports the application.
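
Several of these columns are typically stored as text and need converting before analysis. The sketch below shows one way to do that on toy rows. The string formats used (`"10,000+"`, `"$4.99"`, `"January 7, 2018"`) are assumptions about how the raw CSV stores these values and should be checked against the actual data.

```python
import pandas as pd

# Toy rows; the string formats are assumptions about the raw CSV.
toy_df = pd.DataFrame({
    "Reviews": ["159", "967"],
    "Installs": ["10,000+", "500,000+"],
    "Price": ["0", "$4.99"],
    "Last Updated": ["January 7, 2018", "August 1, 2018"],
})

converted = toy_df.copy()
converted["Reviews"] = pd.to_numeric(converted["Reviews"], errors="coerce")
# Strip the thousands separators and the trailing "+" before converting.
converted["Installs"] = pd.to_numeric(
    converted["Installs"].str.replace(r"[,+]", "", regex=True), errors="coerce"
)
# Strip the leading currency symbol.
converted["Price"] = pd.to_numeric(
    converted["Price"].str.replace("$", "", regex=False), errors="coerce"
)
# Unparseable dates become NaT instead of raising an error.
converted["Last Updated"] = pd.to_datetime(
    converted["Last Updated"], format="%B %d, %Y", errors="coerce"
)

print(converted.dtypes)
```

Using `errors="coerce"` turns malformed entries into missing values, so they surface in the null counts rather than stopping the notebook.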

In [None]:
# Dataset Describe
psd_df.describe()
# psd_df.describe(include='all')# It is used to display  statistical information of all the columns

### Variables Description

The statistical summary above describes the numeric `Rating` column:


*   **Count:** The number of non-null values is 9367
*   **Mean:** The mean value of the column is 4.193338
*   **Standard Deviation:** The standard deviation of the column is 0.537431
*   **Minimum Value:** The minimum value of the column is 1.0
*   **25% value:** The 25th percentile of the column is 4.0
*   **50% value:** The 50th percentile of the column is 4.3
*   **75% value:** The 75th percentile of the column is 4.5
*   **Maximum Value:** The maximum value of the column is 19.0
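
The maximum of 19 lies outside the 1 to 5 scale that Play Store ratings use, which suggests at least one malformed row. A minimal sketch for flagging such rows, shown on a made-up toy frame, could be:

```python
import pandas as pd

# Toy frame (illustrative values only).
toy_df = pd.DataFrame({"App": ["A", "B", "C"], "Rating": [4.1, 19.0, None]})

# Keep rows whose rating is present but falls outside the 1-5 scale.
# The notna() check is needed because between() returns False for missing values.
rating = toy_df["Rating"]
out_of_range = toy_df[rating.notna() & ~rating.between(1, 5)]

print(out_of_range)
```

Rows flagged this way can then be inspected individually before deciding whether to correct or drop them.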


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in psd_df.columns.to_list():
  print("Unique values in" ,i, "is",psd_df[i].nunique())

Among all columns, App has the most unique values, around 9660.

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***