<a href="https://colab.research.google.com/github/sakshantG/Play-Store-App-Review-Analysis/blob/main/Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Play Store Analysis



##### **Project Type**    - Analysis of Play Store
##### **Contribution**    - Team
##### **Team Member 1 -** Sakshant Gongal
##### **Team Member 2 -** Dhawal Khandait


# **Project Summary -**

Play Store, also branded as the Google Play Store and formerly Android Market, is a digital distribution service operated and developed by Google. Applications are available through Play Store either free of charge or at a cost. They can be downloaded directly on an Android device through the proprietary Play Store mobile app.

We are provided with two datasets: one contains information about the apps, and the other consists of user reviews and their sentiments about the apps. Our goal is to analyze the datasets and visualize the trends and relationships between app features. An app developer may come across many questions while developing an app, and our study will help answer them. Our analysis is divided into three phases: understanding the data, data preparation, and data visualization.

This will involve various steps:

*  **Loading the dataset as dataframe**
*  **Cleaning and preparing the data**
*  **Extracting essential data from the dataset**
*  **Exploratory analysis and visualizations**
*  **Conclusion** 

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. For each chart, make sure the following questions are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

### Dataset Loading

In [None]:
# Load Dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# loading the User Review dataset as pandas dataframe
User_reviews_df = pd.read_csv("/content/drive/MyDrive/Play Store Analysis/User Reviews.csv")

In [None]:
# loading the Play Store app dataset as pandas dataframe
Play_store_df = pd.read_csv("/content/drive/MyDrive/Play Store Analysis/Play Store Data.csv")

### Dataset First View

In [None]:
# Dataset First Look
# Fetching the First 5 rows
Play_store_df.head()

In [None]:
# Fetching the last 5 rows
Play_store_df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
Play_store_df.shape

### Dataset Information

In [None]:
# Dataset Info
Play_store_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# Number of rows that exactly repeat an earlier row
Play_store_df.duplicated().sum()

## **Insight of the data set**

The dataset has 13 columns, which describe the parameters of each app. Let's look at each column:

* **App** - name of the app
* **Category** - category of the app
* **Rating** - app's rating by the users, out of 5
* **Reviews** - number of the app's reviews
* **Size** - size of the app
* **Installs** - number of installs of the app
* **Type** - whether the app is free or paid
* **Price** - price of the app in 
* **Content Rating** - target audience of the app
* **Genres** - genre of the app
* **Last Updated** - date the app was last updated
* **Current Ver** - current version of the app
* **Android Ver** - android version required to run the app

After getting familiar with the dataset, we should prepare our data and look for missing and duplicate values. Detecting and treating or removing such values makes the analysis more efficient and limits errors and inaccuracies.

## **Data Preparation**
Data preparation is the process of cleaning and transforming raw data prior to processing and analysis. It is the most important step. Good data preparation allows for efficient analysis and limits the errors and inaccuracies that can occur during processing.
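
As one illustration of such a transformation, the sketch below converts text columns to numbers. It assumes, without confirmation from this notebook, that `Installs` is stored as text such as `10,000+` and `Price` as text such as `$4.99`; the toy rows are hypothetical and only stand in for `Play_store_df`.

```python
import pandas as pd

# Hypothetical toy rows; the string formats are assumptions, not taken from the dataset.
toy = pd.DataFrame({
    "Installs": ["10,000+", "500+", "1,000,000+"],
    "Price": ["0", "$4.99", "$0.99"],
})

# Strip the thousands separators and the trailing '+' before casting to integers.
toy["Installs"] = (
    toy["Installs"]
    .str.replace(",", "", regex=False)
    .str.replace("+", "", regex=False)
    .astype(int)
)

# Drop the currency symbol before casting to floats.
toy["Price"] = toy["Price"].str.replace("$", "", regex=False).astype(float)

print(toy.dtypes)
```

Passing `regex=False` treats `+` and `$` as literal characters rather than regular-expression symbols.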

# **Handling Missing data**

Cleaning up the data is traditionally the most time-consuming part of the data preparation process, but it’s crucial for removing faulty data and filling in gaps. Missing values are caused by incomplete data. It is important to handle missing values effectively, as they can lead to inaccurate inferences and conclusions.

We start by checking for duplicates and missing values.
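
A duplicate check can compare whole rows or only a key column, and the two give different counts. The sketch below shows the difference on hypothetical toy data (the app names and review counts are made up); it is an illustration, not the notebook's actual result.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration.
toy = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta", "Beta"],
    "Reviews": [100, 100, 50, 75],
})

# Rows that repeat an earlier row in every column.
full_row_dupes = toy.duplicated().sum()

# Rows that repeat an earlier app name, even if other columns differ.
name_dupes = toy.duplicated(subset="App").sum()

# Keep the first occurrence of each app name.
deduped = toy.drop_duplicates(subset="App", keep="first")

print(full_row_dupes, name_dupes, len(deduped))
```

Dropping on the `App` column alone is the stricter choice: it also removes rows where the same app appears with different review counts.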

**Duplicate Values**

In [None]:
# checking for duplicates
Play_store_df['App'].duplicated().any()

In [None]:
# dropping the duplicates, keeping the first row for each app name
Play_store_df.drop_duplicates(subset="App", inplace=True)

In [None]:
# Shape of dataset after dropping duplicates
Play_store_df.shape

In [None]:
# Checking the columns which have missing values
missing_value_of_each_column= Play_store_df.columns[Play_store_df.isna().any()]
Play_store_df[missing_value_of_each_column].isnull().sum()

### What did you know about your dataset?

Looking at the number of missing values above, we notice that the Rating column has a lot of missing values. The other columns also have missing values, but fewer than 10 each.
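
Counts are easier to compare alongside percentages. The following is a minimal sketch on hypothetical toy data (not the real dataset) that lists only the columns with gaps, largest first.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing value.
toy = pd.DataFrame({
    "Rating": [4.5, None, 3.9, None],
    "Type": ["Free", "Paid", None, "Free"],
    "Category": ["TOOLS", "GAME", "GAME", "TOOLS"],
})

# Count of missing values and share of rows affected, per column.
missing_count = toy.isna().sum()
missing_share = toy.isna().mean() * 100

# Keep only columns that have gaps, largest first.
summary = pd.DataFrame({"missing": missing_count, "percent": missing_share})
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)

print(summary)
```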

Let us analyze the features one by one so we can figure out why the data is missing. This is the point at which data science relies on judgment as much as on code. It can be frustrating, especially if you're new to the field and don't have a lot of experience. To deal with missing values, we'll need to use our intuition to figure out why each value is missing.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
Play_store_df.columns

In [None]:
# Dataset Describe
Play_store_df.describe()

### Variables Description 

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
Play_store_df['App'].unique()

#### Let's start with the **Type** column

In [None]:
# looking at the rows with a missing value in the Type column
Play_store_df[Play_store_df['Type'].isna()]

There is only one missing value. This value is probably missing because it was not recorded, rather than because it doesn't exist. So it makes sense to try to infer what it should be rather than leaving it as NaN.

After cross-checking the dataset, we find that the app's price is 0, which means the missing value should be 'Free', so we fill it in accordingly.
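
The same cross-check can be expressed in code rather than done by eye. This is a sketch on hypothetical toy rows, assuming `Price` is stored as text that may carry a leading `$`; it fills only the missing `Type` values and leaves recorded ones untouched.

```python
import pandas as pd

# Hypothetical toy rows; the price format is an assumption.
toy = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Type": ["Free", None, "Paid"],
    "Price": ["0", "0", "$2.99"],
})

# Numeric price, removing a leading '$' if present.
price = toy["Price"].str.replace("$", "", regex=False).astype(float)

# Infer the type from the price, then use it only where Type is missing.
inferred = price.gt(0).map({True: "Paid", False: "Free"})
toy["Type"] = toy["Type"].fillna(inferred)

print(toy)
```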

In [None]:
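# filling the missing data in the Type column
# (assigning back avoids the chained inplace call, which newer pandas versions warn about)
Play_store_df['Type'] = Play_store_df['Type'].fillna('Free')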

In [None]:
# looking at missing value in column Content Rating
Play_store_df[Play_store_df['Content Rating'].isna()]

Looking at the missing value, we cannot conclude why this data is missing. It seems that the values in the row are recorded in the wrong columns: starting from the Category column, each column holds the value that belongs to the next column. This means the Category value itself is missing, and since we cannot work out what it should be, the better option is to drop the row.
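
A shifted row of this kind can also be flagged programmatically. The sketch below assumes the shift leaves a number in `Category`, which a real category label would not be; the toy rows and the value `1.9` are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; the second row imitates a shifted record.
toy = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Category": ["TOOLS", "1.9", "GAME"],
    "Content Rating": ["Everyone", None, "Teen"],
})

# A category label should not parse as a number; errors="coerce" turns non-numbers into NaN.
looks_numeric = pd.to_numeric(toy["Category"], errors="coerce").notna()

shifted_rows = toy[looks_numeric]
cleaned = toy[~looks_numeric].reset_index(drop=True)

print(shifted_rows)
```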

In [None]:
# dropping the row with the missing value
Play_store_df.dropna(subset= ['Content Rating'], inplace= True)

Sometimes a dataset contains columns that we never use; in such cases, dropping them is the simplest solution. Here, the columns **Android Ver** and **Current Ver** are not relevant to our analysis, so we drop them.

In [None]:
# dropping unnecessary columns
Play_store_df.drop(['Android Ver', 'Current Ver'], axis = 1, inplace = True)

Now let's look at the missing values in the Rating column. There are 1463 missing values. These ratings are given by users, so we cannot predict values to fill them in.
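
Even though individual ratings cannot be recovered, it can help to quantify the gap and compare the mean with the median before deciding what to do. The sketch below uses hypothetical toy ratings, not the real column, and makes no claim about how the notebook ultimately treats these values.

```python
import pandas as pd

# Hypothetical toy ratings; None marks a missing rating.
toy = pd.DataFrame({"Rating": [4.5, 4.0, None, 3.5, None, 5.0, 1.0]})

# Share of rows without a rating.
missing_share = toy["Rating"].isna().mean()

# Both statistics skip missing values by default.
mean_rating = toy["Rating"].mean()
median_rating = toy["Rating"].median()

print(f"missing: {missing_share:.1%}, mean: {mean_rating:.2f}, median: {median_rating:.2f}")
```

A mean that sits well below the median suggests a few low ratings are pulling it down, which is worth checking against the distribution plot.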

Let's see the distribution of the ratings.

In [None]:
# plot distribution of rating (histplot replaces the deprecated distplot)
sns.histplot(Play_store_df['Rating'].dropna(), kde=True, stat='density')

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.


### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***