<a href="https://colab.research.google.com/github/pravin2072/Play_store_app/blob/main/18_4_23Play_store_App_review_EDA_Submission_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Play Store App Data Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Team (D_a_t_a{Scientist})
##### **Team Member 1 -** Pravin Wedpathak
##### **Team Member 2 -** Shivprasad Sawant
##### **Team Member 3 -** Anmol Paswal
##### **Team Member 4 -** Shubham Gotphode



# **Project Summary -**

The Play Store is a popular digital distribution platform for the Android operating system, offering millions of applications to users worldwide. App data analysis of the Play Store involves examining an app's performance based on factors such as user reviews, downloads and ratings.

One of the essential components of Play Store app data analysis is understanding user behavior patterns. This includes examining the popularity of specific app categories and the frequency of app downloads. Analyzing user feedback and reviews is also crucial in assessing an app's performance.

The first step in the project will be to clean and preprocess the data. This will involve removing any missing or incorrect values, converting data types, and dealing with outliers. Once the data has been cleaned, we can move on to the exploratory data analysis phase.

During the EDA, we will explore the relationships between different variables in the dataset, such as the relationship between app category and rating, or between price and number of downloads. We will also visualize the data using graphs and charts to help us better understand the trends and patterns in the data.

Overall, this project will provide valuable insights into the trends and patterns of apps on the Google Play Store. These insights can be used by app developers to inform their decisions around app development and marketing.

# **GitHub Link -**

https://github.com/pravin2072/Play_store_app

# **Problem Statement**


**Business Problem Overview**

The Play Store is the source for downloading and updating applications on Android, and almost all Android users use it. An Android application developer or company wants to build an application and make it a hit on the Play Store, but does not know which parameters to build the app around.

In this project we analyse which parameters an app developer or app development company should focus on so that their apps succeed in capturing the Android market.

#### **Define Your Business Objective?**

Find insightful factors which would make apps successful on Playstore.

# ***Let's Begin !***

## ***1. Knowing Data***

### Importing Libraries

In [None]:
# Import Libraries
import numpy as np # importing numpy library
import pandas as pd # importing pandas library
import matplotlib.pyplot as plt # importing matplot library for visualisations
%matplotlib inline
import seaborn as sns #importing seaborn library for visualisations
import plotly.express as px #importing plotly library for visualisations
from datetime import datetime
from matplotlib.ticker import FormatStrFormatter # for ticks formatting

### Dataset Loading

In [None]:
from google.colab import drive #mounted drive 
drive.mount('/content/drive')

In [None]:
df=pd.read_csv('/content/drive/MyDrive/Play Store Data (1).csv')
review_df=pd.read_csv('/content/drive/MyDrive/User Reviews (1).csv')

### Dataset First View

In [None]:
df.head() #reading dataframe of playstore app data

In [None]:
review_df.head() #reading user review dataframe

### Dataset Rows & Columns count

In [None]:
# df Rows & Columns count
df.shape 

In [None]:
# review_df Rows & Columns count
review_df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

In [None]:
review_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df[df.duplicated()])

In [None]:
# Dataset Duplicate Value Count
len(review_df[review_df.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count in df
df.isnull().sum()

In [None]:
# Missing Values/Null Values Count in review df
review_df.isnull().sum()

In [None]:
# Visualizing the missing values of df
null_visualisation=df.isnull().sum()
null_visualisation.plot(kind='bar')

In [None]:
# Visualizing the missing values of review_df
na_visualisation=review_df.isnull().sum()
na_visualisation.plot(kind='bar')

### What did you know about your dataset?

The dataset covers apps available on the Play Store, and the goal is to extract insights that answer one question: how can we make an app a hit on the Play Store?

There are two datasets: one contains app details and the other contains user reviews. Consider the app dataset first. It has 10841 rows and 13 columns, of which 483 rows are duplicates. The Rating, Type, Content Rating, Current Ver and Android Ver columns contain null values. Apart from the Rating column, every column has data type object.

The review dataset has 64295 rows and 5 columns; 33616 rows are duplicates and 26868 rows contain null values.
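The counts quoted above come from a handful of pandas calls. The sketch below reproduces them on a tiny made-up table (the rows and the name `toy` are illustrative assumptions, not values from the Play Store data), so the expected numbers are easy to verify by eye.

```python
import numpy as np
import pandas as pd

# Made-up toy rows for illustration only; not taken from the Play Store data
toy = pd.DataFrame({
    "App": ["A", "A", "B", "C"],
    "Rating": [4.1, 4.1, np.nan, 3.9],
    "Type": ["Free", "Free", "Paid", None],
})

n_rows, n_cols = toy.shape                                # rows and columns
n_duplicate_rows = int(toy.duplicated().sum())            # fully identical rows after the first occurrence
nulls_per_column = toy.isnull().sum()                     # null count for each column
rows_with_any_null = int(toy.isnull().any(axis=1).sum())  # rows that contain at least one null

print(n_rows, n_cols, n_duplicate_rows, rows_with_any_null)
print(nulls_per_column)
```

Note that `duplicated()` only flags rows where every column matches, whereas the wrangling step later drops duplicates on the `App` column alone, so the two counts can differ.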

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns of df
df.columns

In [None]:
# Dataset Columns of review_df
review_df.columns

In [None]:
# Dataset Describe of df
df.describe() # only rating column is in numerical format so describe shows only results of rating column

In [None]:
# Dataset Describe of review_df
review_df.describe()

### Variables Description 

* **App:** Name of the application
* **Category:** Category to which the application belongs, viz. games, weather, etc.
* **Rating:** Average user rating out of five
* **Reviews:** Number of reviews given by app users
* **Size:** Size of the app in megabytes or kilobytes
* **Installs:** Number of times the app has been downloaded and installed
* **Type:** Type of app, free or paid
* **Price:** Price of the paid app in dollars
* **Content Rating:** Maturity level of the content in the app
* **Genres:** Name of the genre to which the app belongs
* **Last Updated:** Date of the latest updated version of the app
* **Current Ver:** Version number of the app according to the developer
* **Android Ver:** Version of Android on which the app will work


* **App:** Name of the app in the review dataset
* **Translated_Review:** Text of the user's review, translated into English
* **Sentiment:** Sentiment class of the review: Positive, Negative or Neutral
* **Sentiment_Polarity:** Sentiment score from -1 to 1, where -1 is a totally negative review and 1 is a totally positive review
* **Sentiment_Subjectivity:** Score from 0 to 1 indicating how subjective (opinion-based rather than factual) the review is

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df['App'].unique()

In [None]:
df['Category'].unique()

In [None]:
df['Rating'].unique()

In [None]:
df['Reviews'].unique()

In [None]:
df['Size'].unique()

In [None]:
df['Installs'].unique()

In [None]:
df['Type'].unique()

In [None]:
df['Price'].unique()

In [None]:
df['Content Rating'].unique()

In [None]:
df['Genres'].unique()

In [None]:
df['Last Updated'].unique()

In [None]:
df['Current Ver'].unique()

In [None]:
df['Android Ver'].unique()

In [None]:
review_df['App'].unique()

In [None]:
review_df['Translated_Review'].unique()

In [None]:
review_df['Sentiment'].unique()

In [None]:
review_df['Sentiment_Polarity'].unique()

In [None]:
review_df['Sentiment_Subjectivity'].unique()

## ***3.Data Wrangling***

### Data Wrangling Code

#### Making a copy of the dataframe

In [None]:
# Write your code to make your dataset analysis ready.
df1=df.copy() #making a copy of dataframe so that initial dataframe remains intact

#### Dropping duplicates

In [None]:
df1.drop_duplicates(subset='App',keep='first',inplace=True,ignore_index=True)

#### Changing data type from object to numeric data type

In [None]:
df1.iloc[9300,:] #this row has valid data, but every value after 'App' sits one column too far to the left

In [None]:
#row 9300: the App name is correct and the remaining values need to be shifted one column to the right
for i in range(12,1,-1):
  df1.iloc[9300,i]=df1.iloc[9300,i-1]
#the value left in 'Category' is numeric, which cannot be a valid category; information found online places the app in the lifestyle category
df1.iloc[9300,1]='LIFESTYLE'

In [None]:
df1['Rating']=pd.to_numeric(df1['Rating'])    # rating column has been converted into numeric data

In [None]:
df1['Reviews']=pd.to_numeric(df1['Reviews'])    # reviews column has been converted into numeric data

In [None]:
#to convert the 'Size' column to numeric, strip the unit suffix: 'M' (megabytes) or 'k' (kilobytes)
#1 megabyte = 1024 kilobytes, so megabytes is used as the unit for the whole column
#'Size' has no null values; entries look like '19M', '8.7M', '201k' or 'Varies with device'
import re   #used by the conversion function defined in a later cell
def size_to_megabytes(size):
  size=str(size).strip()
  if size[-1] in ('M','m'):    #megabytes: keep the decimal point, e.g. '8.7M' -> 8.7
    return float(size[:-1])
  elif size[-1] in ('K','k'):    #kilobytes: divide by 1024 to convert into megabytes
    return float(size[:-1])/1024
  else:    #'Varies with device': store nan so that it can be replaced by the median later
    return np.nan
df1['Size']=df1['Size'].apply(size_to_megabytes)

In [None]:
#Defining a function that strips every non-digit character (e.g. ',' and '+') and returns an integer
def converting_object_to_numerical(row):
  a=re.sub(r'\D','',str(row))
  b=int(a)
  return b

In [None]:
df1['Installs']=df1['Installs'].apply(converting_object_to_numerical)   #converted Installs into numerical data type

In [None]:
#Price can contain a decimal point (e.g. '$4.99'), so only the '$' sign is removed before converting to float
df1['Price']=pd.to_numeric(df1['Price'].astype(str).str.replace('$','',regex=False))  #converted Price into numerical data type

In [None]:
df1['Last Updated']=pd.to_datetime(df1['Last Updated'])    #converted last updated into datetime object

#### replacing nan values

In [None]:
df1=df1.fillna({'Rating':df1['Rating'].median(),'Size':df1['Size'].median()})    # replacing nan values of rating and size column by the median of respective column

In [None]:
df1.iloc[8028,:]# here we can see that type has Nan value but Price says type is 'Free'

In [None]:
#'type' column has one nan value at row index=8028 ,the price is zero which means Type='Free'
df1.loc[8028,'Type']='Free'

In [None]:
# The 'Current Ver' and 'Android Ver' columns contain version data and nan values, but they are not used in any visualisation.
# Whether an app is paid or free can also be read from 'Price'; 'Type' is kept for convenience.
# The 'Genres' column has one nan value at row index=9300.

#### Data manipulation

In [None]:
df1.info()

In [None]:
#1)What is the Category of Apps having 1 billion+ installs?
#'1,000,000,000+' is the highest install bucket; there are 20 apps in it
df2=df1.loc[df1['Installs']==1000000000,'Category'].copy()   #copy so that later edits do not trigger a SettingWithCopyWarning

In [None]:
#2)What is the Category of Apps in the 500 million+ install bucket (500 million to 1 billion)?
df3=df1.loc[df1['Installs']==500000000,'Category'].copy()

In [None]:
#3)What is the Genre of Apps having 1 billion+ installs?
df4=df1.loc[df1['Installs']==1000000000,'Genres'].copy()

In [None]:
#4)which are 10 most reviewed apps?
df5=df1.copy().loc[:,['App','Reviews']]

In [None]:
#5)Analyzing the Numerical Data columns and deriving correlation between them via heatmap
# df1 is used directly

In [None]:
#6)Correlation between Ratings and Reviews ?
df6=df1.copy().loc[:,['Rating','Reviews']]

In [None]:
#zoomed plot between 1 star to 3.5 star ratings and 20k reviews
df7=df1.copy().loc[df1.copy()['Rating']<3.5,['Rating','Reviews']]

In [None]:
#7)What should be the name of the App?
df8=df1.copy().loc[:,['App','Installs']]
df8['Character_length']=df8['App'].str.len()   #number of characters in each app name

In [None]:
#8)Distribution of categories in terms of installs
#there are 33 categories in total.
df9=df1.copy().loc[:,['Category','Installs']]

In [None]:
#9)what should be the size of app based on popularity?
df10=df1.copy().loc[:,['Size','Installs']]

In [None]:
#10)Should the App be free or paid?
df11=df1.copy().loc[:,['Type','Installs']]

In [None]:
#11)What should be content rating of the App
df12=df1.copy().loc[:,['Content Rating','Installs']]

### What all manipulations have you done and insights you found?

* Dropped duplicate apps and kept the row that appeared first.
* Changed the data type of the numeric columns from object to numeric.
* Manually corrected the data at row 9300 and row 8028.
* Converted the Size column, which mixed kilobytes and megabytes, into megabytes only.
* Created a dataframe for each question.
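As a cross-check on the type conversions listed above, here is a minimal vectorised sketch. It assumes raw strings shaped like `'8.7M'`, `'512k'`, `'1,000+'` and `'$4.99'`; the helper name `size_to_mb_vectorised` and the toy values are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def size_to_mb_vectorised(size: pd.Series) -> pd.Series:
    """Convert strings such as '8.7M' or '512k' to megabytes; anything else becomes NaN."""
    s = size.astype(str).str.strip()
    number = pd.to_numeric(s.str[:-1], errors="coerce")   # numeric part, decimal point preserved
    unit = s.str[-1].str.upper()                          # last character is the unit
    factor = unit.map({"M": 1.0, "K": 1.0 / 1024})        # unknown units map to NaN
    return number * factor

# Made-up values for illustration only
toy_sizes = pd.Series(["19M", "8.7M", "512k", "Varies with device"])
toy_installs = pd.Series(["1,000+", "500,000,000+"])
toy_prices = pd.Series(["0", "$4.99"])

size_mb = size_to_mb_vectorised(toy_sizes)
installs = toy_installs.str.replace(r"[+,]", "", regex=True).astype("int64")
prices = toy_prices.str.replace("$", "", regex=False).astype(float)

print(size_mb.tolist(), installs.tolist(), prices.tolist())
```

The point of slicing off only the last character, rather than deleting every non-digit, is that the decimal point survives: `'8.7M'` stays 8.7 and `'$4.99'` stays 4.99.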

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

We have to develop insights that will help a developer build apps. Based on the available data, we have to answer 5 questions.
* 1)What should be the name of the app?
* 2)What should be the Category of the app?
* 3)What should be the size of the app?
* 4)Should the app be free or paid?
* 5)What should be the content rating?

### What is the Category of Apps having more than 1 billion Installs?

#### Chart - 1

In [None]:
#Chart - 1 visualization code
fig,ax = plt.subplots()
df2.value_counts().plot(ax=ax,kind='bar',x='Category',y='Installs')
ax.set_ylabel('Number of Apps')
ax.set_xlabel('Categories')
ax.set_title('Number of Apps in Categories having more than One billion Installs')

##### 1. Why did you pick the specific chart?

* The data to be compared was categorical and discrete.
* Also the subgroups or categories were countable.

So using a bar chart was the best choice.


##### 2. What is/are the insight(s) found from the chart?

* Out of 33 Categories only **11** Categories have at least one App with 1 billion+ Installs.
* The **Communication** Category has the most Apps (6) with 1 billion+ Installs.
* The **Social** Category has 3 Apps, and the **travel and local** and **video_players** Categories have 2 Apps each with 1 billion+ Installs.


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

If the App to be developed belongs to a Category such as **communication or social**, there is a chance that it could reach 1 billion+ Installs.
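The counting behind this chart can be sketched on a small made-up table. The sketch assumes that `Installs` stores the lower bound of each install bucket (so `1_000_000_000` stands for "1,000,000,000+"); the rows below are invented for illustration.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Category": ["COMMUNICATION", "COMMUNICATION", "SOCIAL", "TOOLS", "GAME"],
    "Installs": [1_000_000_000, 1_000_000_000, 1_000_000_000, 500_000_000, 500_000_000],
})

# Select the top install bucket, then count how many apps each category contributes
top_bucket = toy.loc[toy["Installs"] == 1_000_000_000, "Category"].value_counts()
print(top_bucket)
```

Because installs are recorded as buckets rather than exact counts, an equality test selects one bucket; a `>=` comparison would be needed to combine several buckets.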

### What is the Category of Apps having Installs between 500 million and 1 billion?

#### Chart - 2

In [None]:
#Chart - 2 visualization code
fig,ax = plt.subplots()
df3.value_counts().plot(ax=ax,kind='bar',x='Category',y='Installs')
ax.set_ylabel('Number of Apps')
ax.set_xlabel('Categories')
ax.set_title('Number of Apps in Categories having Installs between 500 million and 1 billion')

##### 1. Why did you pick the specific chart?

* The data to be compared was categorical and discrete.
* Also the subgroups or categories were countable.

So using a bar chart was the best choice.

##### 2. What is/are the insight(s) found from the chart?

* Out of 33 Categories only **8** Categories have at least one App in the 500 million to 1 billion Installs range.
* The **Communication** and **Tools** categories have **5** apps each; similarly, the **Games** and **Productivity** categories have **4** apps each in the 500 million to 1 billion Installs range.
* 7 Categories (**Communication, Social, Video Players, Game, Tools, Productivity, News and Magazines**) are common to both groups, viz. 1 billion+ and 500 million to 1 billion Installs.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* To get more than 500 million Installs, the App to be developed should belong to the **Communication, Social, Video Players, Game, Tools, Productivity, News and Magazines** Categories.
* An App developed in the **Communication** category has the best chance of reaching 500 million+ Installs.

### What is the Genre of Apps having more than 1 billion Installs?

#### Chart - 3

In [None]:
#Chart - 3 visualization code
fig,ax = plt.subplots()
df4.value_counts().plot(ax=ax,kind='bar')
ax.set_ylabel('Number of Apps')
ax.set_xlabel('Genres')
ax.set_title('Number of Apps in Genres having more than 1 billion Installs')

##### 1. Why did you pick the specific chart?

* The data to be compared was categorical and discrete.
* Also the subgroups or categories were countable.

So using a bar chart was the best choice.

##### 2. What is/are the insight(s) found from the chart?

* Out of 118 Genres only **11** Genres have at least one App with 1 billion+ Installs.
* The **Communication** Genre has the most Apps (6) with 1 billion+ Installs.
* The **Social** Genre has 3 Apps, and the **travel and local** and **video_players and Editors** Genres have 2 Apps each with 1 billion+ Installs.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

If the App to be developed is from Genres like **communication** or **social** there are chances that it might get 1 billion + Installs.

### Which are 10 most reviewed apps?

#### Chart - 4

In [None]:
# Chart - 4 visualization code
fig,ax = plt.subplots()
df5=df1.copy().loc[:,['App','Reviews']]
df5.nlargest(10,'Reviews').plot(ax=ax,kind='bar',x='App',y='Reviews')
ax.set_ylabel('Number of Reviews')
ax.set_xlabel('Name of Apps')
ax.set_title('10 most reviewed Apps')
#ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))

##### 1. Why did you pick the specific chart?

* The data to be compared was discrete.
* Also the number of Apps were countable and few.

So using a bar chart was the best choice.

##### 2. What is/are the insight(s) found from the chart?

* **Facebook**, **WhatsApp** and **Instagram** are the top 3 most reviewed Apps.
* The top 10 contains 4 **Communication**, 3 **Games**, 2 **Tools** and 1 **Social** App.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

These Apps are the most reviewed by their users, so a developer building an app with similar features should plan for a constant stream of updates.

### Analyzing the Numerical Data columns and deriving correlation between them via heatmap

#### Chart - 5

In [None]:
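# Chart - 5 visualization code
# numeric_only=True restricts the correlation to numeric columns (needed in pandas 2.0+, where text columns raise an error)
sns.heatmap(df1.corr(numeric_only=True), annot = True, linewidths=1.0, fmt=".3f")
plt.title("Heatmap for numerical columns", size=15)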

##### 1. Why did you pick the specific chart?

* Correlation was required to be found between columns

##### 2. What is/are the insight(s) found from the chart?

* The Installs and Reviews columns have a moderately strong correlation of 0.625.
* The other columns do not show any significant correlation with each other.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* As the number of Installs increases, the number of Reviews also increases, so responding to reviews and releasing frequent app updates become necessary.
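The heatmap is built from a Pearson correlation matrix over the numeric columns. A minimal sketch on made-up numbers (not the project data) shows how a single coefficient such as the Installs and Reviews value is read from that matrix; `select_dtypes` is used here as one way of dropping text columns that works across pandas versions.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D", "E"],
    "Installs": [1_000, 10_000, 100_000, 1_000_000, 10_000_000],
    "Reviews": [12, 150, 900, 20_000, 310_000],
    "Rating": [4.5, 3.9, 4.2, 4.0, 4.4],
})

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = toy.select_dtypes("number").corr()
installs_reviews_r = corr.loc["Installs", "Reviews"]

print(corr.round(3))
```

A coefficient close to 1 in the toy table only reflects how the numbers were chosen; it says nothing about the real dataset, and a correlation of this kind does not by itself show that installs cause reviews.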

### Is there any correlation between Ratings and Reviews?

#### Chart - 6

In [None]:
# Chart - 6 visualization code
fig,(ax1,ax2) = plt.subplots(1,2)
df6.plot(ax=ax1,kind='scatter',y='Reviews',x='Rating',figsize=(15,8),alpha=0.2)
ax1.set_ylabel('Number of Reviews')
ax1.set_xlabel('Ratings')
ax1.set_title('Relation between Ratings and Reviews')
ax1.xaxis.set_ticks(np.arange(1,5.25,0.25))
df7.loc[df7['Reviews']<20000,['Rating','Reviews']].plot(ax=ax2,kind='scatter',y='Reviews',x='Rating',alpha=0.2)
ax2.set_ylabel('Number of Reviews')
ax2.set_xlabel('Ratings')
ax2.set_title('Relation between Ratings<3.5 and Reviews<20 thousand')
fig.tight_layout() # keeps a proper gap between the two subplots

##### 1. Why did you pick the specific chart?

* The data was to be correlated between two variables.
* Each App corresponds to one dot 

Hence to showcase all points scatter plot was the best choice.

##### 2. What is/are the insight(s) found from the chart?

* Most Apps have Ratings between 3.5 and 5.0 and fewer than 10 million Reviews.
* The second subplot, a zoomed-in view of the first, shows that very few apps combine a low Rating (< 3.0) with a large number of Reviews.
* Apps with an exact 5-star Rating also do not have more than 10 million Reviews.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.


*   Once the App is deployed on the Play Store, it should maintain a Rating between 3.5 and 4.75.
*   Some Apps have both low Ratings and few Reviews, which suggests that if an App is not good enough, users do not bother to rate or review it.





### What should be the name of the App?

#### Chart - 7

In [None]:
fx,(ax)=plt.subplots(figsize=(15,5))
sns.boxplot(data=df8,x='Installs',y='Character_length',ax=ax)
ax.yaxis.set_ticks(np.arange(0,210,10))
plt.grid()
plt.xticks(rotation='vertical')

##### 1. Why did you pick the specific chart?

* It was required to find range in character lengths.
* Also we needed to compare different subgroups in number of installs.

To do both things together it was necessary to go for a box plot.

##### 2. What is/are the insight(s) found from the chart?

* All subgroups have interquartile ranges between 10 and 30 letters.
* In all subgroups the minimum is 1 letter and the maximum is 50 letters.
* No subgroup above 500,000 (0.5 million) installs has outliers above 50 letters.
* The median character length is 21 letters.


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* The name of the App to be developed should be shorter than 30 characters.
* At most, it could be 50 characters long.
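The quartiles that the box plot draws can also be computed directly. The following sketch uses invented app names and two install buckets (all values are assumptions for illustration) to show the name-length summary per bucket.

```python
import pandas as pd

# Made-up app names and install buckets for illustration only
toy = pd.DataFrame({
    "App": ["Maps", "Photo Editor Pro", "Chat",
            "Super Long Flashlight Torch Widget", "Notes", "Weather Live"],
    "Installs": [1_000, 1_000, 1_000, 1_000_000, 1_000_000, 1_000_000],
})

# Length of each app name in characters
toy["Character_length"] = toy["App"].str.len()

# First quartile, median and third quartile of name length per install bucket
summary = (
    toy.groupby("Installs")["Character_length"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
print(summary)
```

Each row of `summary` corresponds to one box in the plot: the 0.25 and 0.75 columns are the box edges and the 0.5 column is the median line.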

### What should be the Category of the App?

#### Chart - 8

In [None]:
# Chart - 8 visualization code
fig,(ax) = plt.subplots()
df9.groupby(by='Category').mean().plot(kind='bar',figsize=(15,10),ax=ax)
ax.yaxis.set_ticks(np.arange(0,3.5*10**(7)+0.25*10**(7),0.25*10**(7)))
plt.grid()
plt.xticks(rotation='vertical')

##### 1. Why did you pick the specific chart?

* Data was to be compared by category.
* Each category had a discrete value.

So using a bar plot was the best choice.

##### 2. What is/are the insight(s) found from the chart?

* **Communication**, **Video_players** and **Social** are the top three categories by average Installs.



##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Chart 1, Chart 2, Chart 4 and the chart above all show **COMMUNICATION** standing out as the most promising category in which to develop an App and build a good user base.

### What should be the Size of the App?

#### Chart - 9

In [None]:
# Chart - 9 visualization code
fx,(ax)=plt.subplots(figsize=(15,5))
sns.boxplot(data=df10,x='Installs',y='Size',ax=ax)
ax.yaxis.set_ticks(np.arange(0,110,10))
plt.grid()
plt.xticks(rotation='vertical')

##### 1. Why did you pick the specific chart?

* It was required to find range in size of App.
* Also we needed to compare different subgroups in number of installs.

To do both things together it was necessary to go for a box plot.

##### 2. What is/are the insight(s) found from the chart?

* Almost all install subgroups have interquartile ranges between 20 and 55 megabytes.
* The median is 33 megabytes.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The App to be developed should be in the size range of 20 to 55 MB; the smaller the size, the better.

### Should the App be Free or Paid?

#### Chart - 10

In [None]:
# Chart - 10 visualization code
fx,(ax)=plt.subplots(figsize=(10,8))
df11.groupby(by='Type').count().plot(kind='pie',x='Type',y='Installs',ax=ax,startangle=45,explode=(0,0.1),autopct='%1.1f%%',shadow=True)

##### 1. Why did you pick the specific chart?

* Data to be compared had only two categories, and the values were discrete.
* Data was required to be compared in percentages.

So a pie plot was the best option.

##### 2. What is/are the insight(s) found from the chart?

* Only 7.8 percent of the Apps are paid apps.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The App to be developed should be free, as 92.2% of the Apps on the Play Store are free.
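The pie chart above is built with `count()`, so its slices measure the share of apps of each Type. A share of installs would need a `sum()` instead. The sketch below, on made-up rows, shows how far the two measures can differ.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Type": ["Free", "Free", "Free", "Paid"],
    "Installs": [1_000_000, 500_000, 10_000, 1_000],
})

# Share of apps: how many rows fall under each Type
share_of_apps = toy["Type"].value_counts(normalize=True) * 100

# Share of installs: how much of the total install volume each Type carries
share_of_installs = toy.groupby("Type")["Installs"].sum() / toy["Installs"].sum() * 100

print(share_of_apps.round(1))
print(share_of_installs.round(2))
```

In this toy table paid apps are 25% of the apps but well under 1% of the installs, which is why it matters to state which of the two a percentage refers to.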

#### Chart - 11

In [None]:
# Chart - 11 visualization code
amount =df1.groupby(['Type','Category'],as_index=False)['App'].count()
px.sunburst(amount, values='App', path=['Type','Category'], title='Amount Of Apps in Paid and Free Category', color='Category')

##### 1. Why did you pick the specific chart?

* Hierarchical data was to be plotted.
* Each level of hierarchy was again subdivided into categories.
* Also scrolling over the plot we get more data about the specific category.

##### 2. What is/are the insight(s) found from the chart?

* Family, Medical and Game are the top three categories among paid apps.
* Family, Game and Tools are the top three categories among free apps.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* If a paid app is to be made, it should be in the Family, Medical or Game category.

### What should be the content Rating?

#### Chart - 12

In [None]:
# Chart - 12 visualization code
fx,(ax)=plt.subplots(figsize=(10,8))
df12.groupby(by='Content Rating').sum().plot(kind='pie',x='Content Rating',y='Installs',ax=ax,startangle=45,autopct='%1.1f%%',shadow=True)

##### 1. Why did you pick the specific chart?

* Data to be compared had only a few categories, and the values were discrete.
* Data was required to be compared in percentages.

##### 2. What is/are the insight(s) found from the chart?

* Apps with an **Everyone** or **Teen** content rating account for 91.4 percent (69.5% + 21.9%) of the Installs.


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The App to be developed should have a content rating of Everyone or Teen.



## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

For the app to become a hit, it should be developed within the following constraints:
* The name of the app should be at most 50 characters; the shorter, the better.
* The app should be made in the communication category.
* The size of the app should be less than 55 MB; the smaller, the better.
* The app should be free.
* The content rating should be Everyone or Teen.



# **Conclusion**

* Some Apps have both low Ratings and few Reviews, which suggests that if an App is not good enough, users do not bother to rate or review it. In short, apps with low ratings do not have many reviews either.
* To get more installs, the name should be short, the size should be small, the category should be communication or at least social, the content rating should be Everyone or Teen, and the app should not be paid.
* The more reviews an app gets, the more updates it requires.
* Categories other than communication and social also contain some hit apps.
* Installs and Reviews are the only two columns that are noticeably correlated; the other columns are not closely related to each other.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***