<a href="https://colab.research.google.com/github/palbharti151/Capstone_Project_EDA/blob/main/Individual_Play_Store_App_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **Play Store App Review Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
 **Name - Pal Bharti**


# **Project Summary -**

The Play Store App EDA (Exploratory Data Analysis) project is an in-depth analysis of the Google Play Store app dataset. The primary objective of this project is to gain insights into the various aspects of apps available on the Play Store and understand the factors that contribute to their popularity and success.


The project begins with data collection, where a comprehensive dataset containing information about different apps, such as their categories, ratings, reviews, sizes, and download counts, is gathered. This dataset serves as the foundation for the subsequent analysis.



During the exploratory data analysis phase, various statistical and visual techniques are employed to understand the distribution, trends, and patterns within the dataset. Key metrics such as the most popular app categories, average ratings, and the relationship between app size and download counts are examined. Moreover, the project aims to uncover any outliers, missing values, or data quality issues that may impact the analysis.



Overall, the Play Store App EDA project aims to provide valuable insights into the Play Store app ecosystem. The findings can be utilized by app developers, marketers, and stakeholders to make informed decisions regarding app development, marketing strategies, and monetization.






# **GitHub Link -**

https://github.com/palbharti151/Capstone_Project_EDA/blob/72cedc50bc1971703fbf8f7f94836fe57d020d1e/Individual_Play_Store_App_EDA.ipynb

# **Problem Statement**


The Play Store App EDA (Exploratory Data Analysis) project addresses the need for comprehensive insights into the Google Play Store app ecosystem. The availability of millions of apps on the Play Store presents app developers, marketers, and stakeholders with numerous challenges when it comes to understanding user preferences, identifying successful app features, and making data-driven decisions.

The problem this project aims to solve is the lack of a comprehensive analysis of the Play Store app dataset. While there is abundant data available, there is a need to extract meaningful insights and patterns from it. The project focuses on addressing the following key questions:

1. App Categorization
2. App Ratings and Reviews
3. App Size and Download Counts
4. App Features and Popularity


By addressing these questions through exploratory data analysis, this project aims to provide actionable insights that can guide app developers, marketers, and stakeholders in making informed decisions. The analysis will enable them to understand user preferences, identify successful app attributes, and develop effective strategies for app development, marketing, and monetization.


#### **Define Your Business Objective?**

The business objective of the Play Store App EDA (Exploratory Data Analysis) project is to leverage the insights gained from the analysis to drive informed decision-making and strategic planning within the app development and marketing ecosystem. The primary goals of this project are as follows:

* App Development Strategy
* Marketing and User Acquisition
* Monetization Strategies
* Competitive Analysis


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception handling, production-grade code, and deployment-ready code are a plus; students who provide them will be awarded additional credits. 
     
     These additional credits give an advantage during Star Student selection.
       
             [ Note: - Deployment-ready code means the whole .ipynb notebook can be executed in one go
                       without a single error being logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. For each chart, answer the questions in the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns  
from datetime import datetime

# plotly
import plotly 
plotly.offline.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import warnings

#sns.set(font_scale=1.5)
warnings.filterwarnings("ignore")

### Dataset Loading

In [None]:
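# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

# Reading the datasets
# Adjust these paths to wherever the CSV files are stored
# (for example, a folder inside /content/drive/MyDrive after mounting)
play_store = pd.read_csv("Play Store Data.csv")
user_review = pd.read_csv("User Reviews.csv")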
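The general guidelines reward exception handling. One way to apply that to the loading step is to wrap `pd.read_csv` so that a wrong path produces a clear message instead of a traceback. This is a minimal sketch; the helper name `load_csv` is hypothetical and not part of the original notebook.

```python
import pandas as pd

def load_csv(path):
    """Read a CSV file; print a message and return None if the file is missing."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}. Check the path after mounting Google Drive.")
        return None

# Usage (paths as in the cell above; adjust them to your Drive layout):
# play_store = load_csv("Play Store Data.csv")
# user_review = load_csv("User Reviews.csv")
```

Returning `None` keeps the notebook running so that later cells can check whether the data actually loaded.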



### Dataset First View

In [None]:
# Dataset First Look
play_store.head()


In [None]:
# Dataset First Look
user_review.head()


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

play_store.shape

In [None]:
# Dataset Rows & Columns count

user_review.shape

### Dataset Information

In [None]:
# Dataset Info
play_store.info()

In [None]:
# Dataset Info
user_review.info()

#### Duplicate Values

In [None]:
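# Dataset Duplicate Value Count

# Rows that are identical in every column
duplicate_values = play_store.duplicated().sum()
# Rows whose App name already appeared earlier (same app listed more than once)
duplicate_app_names = play_store['App'].duplicated().sum()
print("Fully duplicated rows:", duplicate_values)
print("Repeated app names:", duplicate_app_names)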


In [None]:
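# Dataset Duplicate Value Count

# Rows that are identical in every column
duplicate_values1 = user_review.duplicated().sum()
# Rows whose App name already appeared earlier (several reviews per app)
duplicate_app_names1 = user_review['App'].duplicated().sum()
print("Fully duplicated rows:", duplicate_values1)
print("Repeated app names:", duplicate_app_names1)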


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

print(play_store.isnull().sum())

In [None]:
# Missing Values/Null Values Count

print(user_review.isnull().sum())

In [None]:
# Visualizing the missing values

# Checking Null Value of play store by plotting Heatmap
sns.heatmap(play_store.isnull(), cbar=False, cmap='viridis')

In [None]:
# Visualizing the missing values

# Checking Null Value of User Review by plotting Heatmap

sns.heatmap(user_review.isnull(), cbar=False, cmap='viridis')

### What did you know about your dataset?

In this Play Store app review analysis, the data comes from two files: the Play Store dataset, which describes the apps themselves, and the User Reviews dataset, which contains user reviews of those apps. Together they include the following key features:

* App Information
* User Reviews
* App Version
* User Information
* App Metadata

These are general features that can be found in a Play Store app review dataset, but the specific attributes and structure may vary depending on the source and scope of the dataset.

When analyzing this dataset, we can explore various aspects such as the distribution of ratings, sentiment analysis of user reviews, trends in review sentiment over time, most commonly mentioned keywords or topics in reviews, and any correlations between app features (such as app size, category) and user reviews.

 * Additionally, the Play Store dataset has 10,841 rows and 13 columns, and the User Reviews dataset has 64,295 rows and 5 columns.

* The Play Store dataset contains 483 fully duplicated rows, and the User Reviews dataset contains 33,616.
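The two duplicate measures answer different questions: `DataFrame.duplicated()` flags rows that are identical in every column, while `df['App'].duplicated()` flags any repeat of an app name, even when other columns differ. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up data: "Maps" appears twice with identical rows,
# "Chat" appears twice with different Reviews values
toy = pd.DataFrame({
    'App': ['Maps', 'Maps', 'Chat', 'Chat'],
    'Reviews': [100, 100, 50, 60],
})

# Rows identical in every column
full_row_dups = toy.duplicated().sum()
# Rows whose App name already appeared earlier
app_name_dups = toy['App'].duplicated().sum()

print(full_row_dups, app_name_dups)  # 1 2
```

For the User Reviews file, repeated app names are expected (each app has many reviews), so only fully duplicated rows indicate redundant data.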


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
play_store.columns

In [None]:
# Dataset Columns

user_review.columns

In [None]:
# Dataset Describe
play_store.describe(include='all')

In [None]:
# Dataset Describe
user_review.describe(include='all')

### Variables Description 

**Variables Description of Play Store csv**

* App name : The name or title of the app.
* Category : The category the app belongs to (e.g. FAMILY, GAME, TOOLS).
* Rating : The app's overall user rating, on a scale of 1 to 5 stars.

* Reviews : The number of user reviews the app has received.
* Size : The size of the app in terms of storage space (e.g. "19M"; some apps list "Varies with device").
* Installs : The approximate number of times the app has been installed, given as a bin such as "10,000+".
* Type : Whether the app is Free or Paid.
* Price : The price of the app (e.g. "$4.99"; "0" for free apps).
* Content Rating : The audience the app is rated for (e.g. Everyone, Teen, Mature 17+).

* Genres : The genre(s) the app belongs to, a finer-grained classification than Category based on functionality or content.

* Last Updated : The date the app was last updated on the Play Store.
* Current Version : Current version of the app available on the Play Store.
* Android Version : Android operating system version required to run the app.
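Several of these columns (Size, Installs, Price) are stored as text, so they must be converted before any numeric analysis. A minimal sketch of the conversions, assuming the text formats described above (sizes ending in "M" or "k", install bins like "10,000+", prices like "$4.99") and treating 1 MB as 1024 kB; the helper names are hypothetical and the sample values are made up.

```python
import numpy as np
import pandas as pd

def size_to_mb(size):
    """Convert a Size string ending in 'M' or 'k' to megabytes; anything else becomes NaN."""
    if isinstance(size, str) and size.endswith('M'):
        return float(size[:-1])
    if isinstance(size, str) and size.endswith('k'):
        return float(size[:-1]) / 1024  # assumes 1 MB = 1024 kB
    return np.nan  # e.g. "Varies with device"

def installs_to_int(installs):
    """Convert an Installs bin such as '10,000+' to its lower bound, 10000."""
    return int(installs.replace(',', '').replace('+', ''))

def price_to_float(price):
    """Convert a Price string such as '$4.99' (or '0' for free apps) to a float."""
    return float(price.replace('$', ''))

# Made-up sample values in the same text formats as the raw columns
sample = pd.DataFrame({
    'Size': ['19M', '512k', 'Varies with device'],
    'Installs': ['10,000+', '500+', '1,000,000,000+'],
    'Price': ['0', '$4.99', '$399.99'],
})
sample['Size_MB'] = sample['Size'].apply(size_to_mb)
sample['Installs_n'] = sample['Installs'].apply(installs_to_int)
sample['Price_usd'] = sample['Price'].apply(price_to_float)
print(sample[['Size_MB', 'Installs_n', 'Price_usd']])
```

Because Installs is binned, the converted value is the lower bound of each bin, not an exact install count.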


**Variables Description of User Review csv**

* App: The name or title of the app for which the review was provided.

* Sentiment: The sentiment label associated with the review (Positive, Negative or Neutral).

* Translated_Review: The user's review text, translated into English.

* Sentiment_Polarity: The polarity score of the review, ranging from -1 (most negative) to +1 (most positive).

* Sentiment_Subjectivity: The subjectivity score of the review, ranging from 0 (fully objective) to 1 (fully subjective).
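If the Sentiment label is derived from the sign of Sentiment_Polarity (an assumption; the dataset does not document how the labels were produced), the mapping would look like the sketch below. The function name `polarity_to_label` is hypothetical and the scores are made up.

```python
import pandas as pd

def polarity_to_label(polarity):
    """Hypothetical rule: label a polarity score in [-1, 1] by its sign."""
    if polarity > 0:
        return 'Positive'
    if polarity < 0:
        return 'Negative'
    return 'Neutral'

# Made-up polarity scores
reviews = pd.DataFrame({'Sentiment_Polarity': [0.8, -0.25, 0.0]})
reviews['Derived_Sentiment'] = reviews['Sentiment_Polarity'].apply(polarity_to_label)
print(reviews)
```

Applying `polarity_to_label` to `user_review['Sentiment_Polarity']` and comparing the result with `user_review['Sentiment']` is a quick way to test whether the assumption holds for the real data.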

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in play_store.columns.tolist():
  print(i,"is",play_store[i].nunique())

In [None]:
# Check Unique Values for each variable.
for i in user_review.columns.tolist():
  print(i,"is",user_review[i].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
dataset = play_store.copy()
dataset1 = user_review.copy()


In [None]:
# Fill missing ratings with the mean rating (Play Store dataset)

dataset['Rating'] = dataset['Rating'].fillna(dataset['Rating'].mean())
dataset['Rating'].isnull().sum()

In [None]:
# Drop rows with missing values (User Review dataset)

dataset1 = dataset1.dropna()
dataset1.isnull().sum()

In [None]:
# Drop duplicate rows, then keep one row per app (Play Store dataset)

dataset.drop_duplicates(inplace = True)
dataset.drop_duplicates(subset='App',inplace = True)
dataset.duplicated().value_counts()

In [None]:
# Drop duplicate rows, then keep only the first review of each app (User Review dataset)

dataset1.drop_duplicates(inplace = True)
dataset1.drop_duplicates(subset='App',inplace = True)
dataset1.duplicated().value_counts()

In [None]:
# Drop all null values 

dataset.dropna(inplace = True)
dataset1.dropna(inplace = True)
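Dropping nulls and duplicates does not catch rows whose values have shifted into the wrong columns, such as a Rating above 5 or a Type other than Free/Paid. A quick validity check can surface them. The sketch below runs on made-up data; the function name `find_malformed_rows` is hypothetical, and the valid ranges come from the variable descriptions (Rating from 1 to 5, Type Free or Paid).

```python
import pandas as pd

def find_malformed_rows(df):
    """Return rows whose Rating is outside 1 to 5 or whose Type is not Free/Paid."""
    # Missing ratings are not treated as malformed here
    bad_rating = df['Rating'].notna() & ~df['Rating'].between(1, 5)
    bad_type = ~df['Type'].isin(['Free', 'Paid'])
    return df[bad_rating | bad_type]

# Made-up data: app B has a shifted row
toy = pd.DataFrame({
    'App': ['A', 'B', 'C'],
    'Rating': [4.1, 19.0, None],
    'Type': ['Free', '0', 'Paid'],
})
print(find_malformed_rows(toy))
```

On the real data, calling `find_malformed_rows(play_store)` before cleaning would list any such rows for inspection.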



 **1**. **Find the top 3 categories with the most apps**





In [None]:
# Find the top 3 categories with the most apps

top_category_app = dataset['Category'].value_counts().head(3)
print("Top 3 categories by number of apps:")
print(top_category_app)

**2. Identify the % distribution of app types**

In [None]:
# Find the % distribution of app types (Free or Paid)

type_counts = play_store['Type'].value_counts()

total_apps = type_counts.sum()
free_apps = type_counts['Free']
paid_apps = type_counts['Paid']

per_free = (free_apps / total_apps) * 100
per_paid = (paid_apps / total_apps) * 100

print(f"{per_free:.2f}% Free apps")
print(f"{per_paid:.2f}% Paid apps")
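pandas can compute these percentages in one step: `value_counts(normalize=True)` returns each value's share of the non-null total. A minimal sketch on a made-up Type column:

```python
import pandas as pd

# Made-up Type values
types = pd.Series(['Free', 'Free', 'Free', 'Paid'])

# normalize=True returns proportions instead of counts
share = types.value_counts(normalize=True) * 100
print(share.round(1))
```

Like the cell above, this ignores missing values; pass `dropna=False` to include them in the total.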


**3. Find mean of App Ratings**

In [None]:
# Find the mean of app ratings

ratings = dataset['Rating']
average_rating = ratings.mean()
print("Average of App Rating:", average_rating)

**4. Count the reviews with Neutral sentiment**




In [None]:
# Count of Neutral Sentiment

Neutral_apps_count = user_review[user_review['Sentiment'] == 'Neutral']['Sentiment'].count()
Neutral_apps_count


**5. Count the reviews with Positive sentiment**

In [None]:
# Count of Positive Sentiment

Positive_apps_count = user_review[user_review['Sentiment'] == 'Positive']['Sentiment'].count()
Positive_apps_count


**6. What are the top 10 applications based on rating?**

In [None]:
# The top 10 applications based on ratings
# Use the cleaned dataset: the raw data has repeated app entries and a malformed row
# whose Rating lies outside the 1 to 5 scale
best_reviewed_apps = dataset.sort_values(by='Rating', ascending=False)

print("Top 10 highest rated applications are:")
for app_name in best_reviewed_apps['App'].head(10):
    print(app_name)
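Many apps can share the same top rating, so the order among tied apps after sorting on Rating alone is arbitrary. One option, sketched on made-up data, is to break ties by review count with a multi-column `sort_values`. This assumes Reviews has already been converted to a number, and tie-breaking by reviews is a suggested refinement rather than the notebook's original method.

```python
import pandas as pd

# Made-up apps: three share a perfect 5.0 rating
apps = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Rating': [5.0, 5.0, 4.9, 5.0],
    'Reviews': [3, 1200, 90000, 40],
})

# Highest rating first; among equal ratings, the app with more reviews first
ranked = apps.sort_values(by=['Rating', 'Reviews'], ascending=[False, False])
print(ranked['App'].tolist())  # ['B', 'D', 'A', 'C']
```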


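**7. What are the 5 most expensive applications?**

In [None]:
# Most expensive applications
# Price is stored as text such as "$4.99", so convert it to a number before sorting;
# values that cannot be parsed become NaN and are sorted last
prices = pd.to_numeric(play_store['Price'].str.replace('$', '', regex=False), errors='coerce')
df4 = play_store.assign(Price_num=prices).drop_duplicates(subset='App')
df4 = df4.sort_values(by='Price_num', ascending=False)
print("The top 5 most expensive apps are:")
for app_name in df4['App'].head(5):
    print(app_name)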


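**8. What are the top 5 most installed free applications?**

In [None]:
# Top 5 most installed free applications
# Installs is stored as text such as "10,000+", so convert it to a number before sorting
df2 = play_store[play_store['Type'] == 'Free'].drop_duplicates(subset='App')
installs = pd.to_numeric(df2['Installs'].str.replace('[,+]', '', regex=True), errors='coerce')
df2 = df2.assign(Installs_num=installs).sort_values(by='Installs_num', ascending=False)

print("Top 5 most installed free applications are:")
for app_name in df2['App'].head(5):
    print(app_name)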


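**9. What are the top 5 most downloaded paid applications?**

In [None]:
# Top 5 most downloaded paid applications
# Installs is stored as text such as "10,000+", so convert it to a number before sorting
df1 = play_store[play_store['Type'] == 'Paid'].drop_duplicates(subset='App')
installs = pd.to_numeric(df1['Installs'].str.replace('[,+]', '', regex=True), errors='coerce')
df1 = df1.assign(Installs_num=installs).sort_values(by='Installs_num', ascending=False)

print("Top 5 most installed paid applications are:")
for app_name in df1['App'].head(5):
    print(app_name)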
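The Installs and Price columns are strings, so sorting them directly ranks values alphabetically rather than numerically. The made-up example below shows the difference for two paid apps, then ranks them numerically with `nlargest` after parsing the install bins.

```python
import pandas as pd

# Made-up apps with install bins stored as text
apps = pd.DataFrame({
    'App': ['A', 'B', 'C'],
    'Type': ['Paid', 'Paid', 'Free'],
    'Installs': ['500+', '1,000+', '5,000,000+'],
})
paid = apps[apps['Type'] == 'Paid']

# Sorting the raw strings is alphabetical: "500+" sorts above "1,000+"
by_text = paid.sort_values('Installs', ascending=False)['App'].tolist()

# Parse "1,000+" -> 1000 so the ranking is numeric
installs_n = (paid['Installs']
              .str.replace(',', '', regex=False)
              .str.replace('+', '', regex=False)
              .astype(int))
by_number = paid.assign(Installs_n=installs_n).nlargest(5, 'Installs_n')['App'].tolist()

print(by_text, by_number)  # ['A', 'B'] ['B', 'A']
```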


## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 -  Pie Chart

In [None]:
# Pie Chart - Showing distribution of various application categories
category = play_store.groupby(['Category'])['Category'].count()

# Remove the "1.9" category, which comes from a malformed row in the raw data
category = category.drop('1.9', errors='ignore')

# Plot pie chart with various categories
plt.figure(figsize=(12, 12))
plt.pie(category.values, labels=category.index, autopct='%1.1f%%')
plt.title('Distribution of Application Categories')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pie chart to show the distribution of application categories because pie charts are commonly used to represent parts of a whole or the distribution of a categorical variable. They are effective at displaying relative proportions or percentages.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how apps are distributed across categories. The distribution is easy to read even for someone unfamiliar with the data, and it shows at a glance which categories are crowded and which are sparse, which is useful for app development decisions.

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The pie chart shows which categories are crowded and which are less saturated. Businesses can use this to spot less competitive categories and untapped opportunities; investing in such categories can support business growth and expansion.

By analyzing the distribution of application categories, businesses can assess the diversity and balance of their product portfolio. They can identify any gaps or over-reliance on specific categories and make informed decisions to optimize their offerings. A well-balanced and diverse product portfolio can mitigate risks and ensure long-term sustainability.

#### Chart - 2 - Bar Graph

In [None]:
# Bar Graph - Application Type Distribution
# Keep only valid Type values; .copy() avoids modifying a view of the original frame
play_store = play_store[play_store['Type'].isin(['Free', 'Paid'])].copy()

# Convert 'Type' column to a categorical variable
play_store['Type'] = play_store['Type'].astype('category')

# Plot the countplot
plt.figure(figsize=(10,5))
sns.countplot(data=play_store, x='Type')
plt.title('Type Distribution')
plt.ylabel('Number of Apps')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar chart because it allows a clear visual comparison between the two application types (Free and Paid). The length of each bar represents the number of applications of that type, making the distribution easy to compare. Bar charts are well suited to categorical data, where each bar represents a distinct category.



##### 2. What is/are the insight(s) found from the chart?

The bar chart allows for a clear comparison of the proportions or counts of Free and Paid applications. The lengths of the bars provide a visual representation of the relative distribution between the two categories. The chart can reveal the potential revenue sources for app developers or businesses. The proportion of Paid applications compared to Free applications can indicate the potential market for generating revenue through app purchases or subscriptions.

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The insights from the bar chart can guide businesses in optimizing their revenue strategies. By understanding the distribution between Free and Paid applications, businesses can align their pricing models, identify opportunities for upselling or offering premium features, and potentially increase their revenue streams. The chart can help businesses make informed decisions about monetizing their applications. They can assess the market demand for paid applications and determine whether it is viable to develop and launch paid apps.

#### Chart - 3 - Pair plot

In [None]:
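# Pair plot - relationships between Rating, Size, Installs, Reviews and Price

# Size is text ending in "M" or "k", or "Varies with device"; convert it to MB
def size_to_mb(size):
    if isinstance(size, str) and size.endswith('M'):
        return float(size[:-1])
    if isinstance(size, str) and size.endswith('k'):
        return float(size[:-1]) / 1024
    return np.nan  # e.g. "Varies with device"

# Installs such as "10,000+" -> 10000; log1p keeps apps with 0 installs finite
installs = play_store['Installs'].str.replace('[,+]', '', regex=True)
pair_df = pd.DataFrame({
    'Rating': play_store['Rating'],
    'Size': play_store['Size'].apply(size_to_mb),
    'Installs': np.log1p(pd.to_numeric(installs, errors='coerce')),
    # Reviews is stored as text; log10(x + 1) keeps apps with 0 reviews finite
    'Reviews': np.log10(pd.to_numeric(play_store['Reviews'], errors='coerce') + 1),
    # Price such as "$4.99" -> 4.99
    'Price': pd.to_numeric(play_store['Price'].str.replace('$', '', regex=False), errors='coerce'),
    'Type': play_store['Type'].astype(str),
})

p = sns.pairplot(pair_df, hue='Type')
p.fig.suptitle("Pairwise Plot - Rating, Size, Installs, Reviews, Price", x=0.5, y=1.0, fontsize=16)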



##### 1. Why did you pick the specific chart?

I picked the pair plot because it visualizes the relationships between multiple variables in a single figure. It shows at a glance which pairs of variables are related and whether the data forms any patterns or clusters.

The pair plot provides a matrix of scatter plots where each variable is plotted against every other variable. It helps us analyze the pairwise relationships and observe any correlations, trends, or clusters within the data.

##### 2. What is/are the insight(s) found from the chart?

The pair plot helps us identify any distinct clusters or patterns in the data. If we notice groups of data points that are tightly clustered together in some scatter plots, it suggests that these variables have a strong relationship and may contribute to forming separate clusters. This can be useful for identifying different categories or segments within the data. The pair plot provides a comprehensive view of the relationships between variables and allows us to identify correlations, clusters, outliers, and important features. These insights can help us understand the underlying patterns in the data and make informed decisions for further analysis or modeling.

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the pair plot can potentially help create a positive business impact by providing valuable information about the relationships between variables and the formation of clusters. Understanding the relationships between variables and identifying important features can aid in making informed business decisions. For example, if the pair plot reveals a strong positive correlation between the 'Rating' and 'Installs' variables, it suggests that improving the app's rating can potentially lead to increased installations, guiding the business to focus on improving user satisfaction and app quality.

#### Chart - 4 - Heatmap

In [None]:
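# Heatmap - correlation between Rating, Price, Reviews and Installs
cols = ['Rating', 'Price', 'Reviews', 'Installs']

# Price, Reviews and Installs are stored as text (e.g. "$4.99", "10,000+"),
# so convert them to numbers before computing correlations
numeric_df = pd.DataFrame({
    'Rating': play_store['Rating'],
    'Price': pd.to_numeric(play_store['Price'].str.replace('$', '', regex=False), errors='coerce'),
    'Reviews': pd.to_numeric(play_store['Reviews'], errors='coerce'),
    'Installs': pd.to_numeric(play_store['Installs'].str.replace('[,+]', '', regex=True), errors='coerce'),
})
correlation_matrix = numeric_df[cols].corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()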
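`DataFrame.corr()` uses Pearson correlation by default, which measures linear association and can be dominated by a few extreme values, as with install counts that span many orders of magnitude. Spearman rank correlation (`method='spearman'`) is a common alternative for skewed, monotonic data. A small sketch on made-up numbers:

```python
import pandas as pd

# Made-up, heavily skewed data: one app has vastly more installs than the rest
toy = pd.DataFrame({
    'Installs': [10, 100, 1000, 10000, 1000000000],
    'Reviews': [1, 5, 20, 300, 400],
})

pearson = toy.corr(method='pearson').loc['Installs', 'Reviews']
spearman = toy.corr(method='spearman').loc['Installs', 'Reviews']

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

Both columns rise together, so Spearman reports a perfect rank correlation, while Pearson is noticeably lower because the single very large install count dominates. Passing `method='spearman'` to the `corr()` call in the heatmap cell gives a rank-based version of the same matrix.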

##### 1. Why did you pick the specific chart?

The heatmap chart was chosen to visualize the correlation between rating, price, reviews and installs because it provides an intuitive and concise representation of the correlation matrix.


##### 2. What is/are the insight(s) found from the chart?

* There is a moderately positive correlation between rating and reviews. This suggests that apps with higher ratings tend to have more reviews.
* There is a weak positive correlation between rating and installs. Higher-rated apps generally have more installs, although the correlation is not very strong.
* There is a stronger positive correlation between reviews and installs. Apps with more reviews tend to have more installs.
* There is no significant correlation between price and rating, reviews, or installs. This suggests that the price of an app does not have a direct impact on its rating, reviews, or number of installs.



##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The heatmap does not reveal any negative correlations or insights that would directly lead to negative growth. Instead, it emphasizes the importance of focusing on factors other than price, such as quality, features, user experience, and positive reviews, to create a positive business impact.

#### Chart - 5 - Pie Circle Chart

In [None]:
# Pie Circle Chart - Percentage of apps belonging to each category in the playstore
plt.figure(figsize=(12,12))
plt.pie(play_store.Category.value_counts(), labels=play_store.Category.value_counts().index, autopct='%1.2f%%')
my_circle = plt.Circle((0,0), 0.3, color='white')  # Decrease the radius to 0.3 (adjust as needed)
p = plt.gcf()
p.gca().add_artist(my_circle)
plt.title('% of apps share in each Category', fontsize=10)
plt.show()
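With dozens of categories, many slices in the donut chart are very small and their labels overlap. One option is to keep the largest categories and merge the rest into an "Other" slice. A minimal sketch; `top_n_with_other` is a hypothetical helper and the counts are made up.

```python
import pandas as pd

def top_n_with_other(counts, n):
    """Keep the n largest entries of a count Series and sum the rest into 'Other'."""
    counts = counts.sort_values(ascending=False)
    top = counts.head(n)
    rest = counts.iloc[n:].sum()
    if rest > 0:
        top = pd.concat([top, pd.Series({'Other': rest})])
    return top

# Made-up category counts
counts = pd.Series({'FAMILY': 50, 'GAME': 30, 'TOOLS': 10, 'BEAUTY': 6, 'EVENTS': 4})
print(top_n_with_other(counts, 3))
```

The total is preserved, so percentages on the chart stay correct. `top_n_with_other(play_store['Category'].value_counts(), 10)` could be passed to `plt.pie` in place of the full counts.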


##### 1. Why did you pick the specific chart?

The specific chart chosen, which is a pie chart with a circle in the center, was selected to visualize the percentage of apps belonging to each category in the play store. The pie chart is effective in displaying the relative proportions of different categories, and the circle in the center helps to create a visually appealing and balanced representation of the data. 

##### 2. What is/are the insight(s) found from the chart?

The insight from the pie circle chart for the percentage of apps belonging to each category in the play store is the distribution of apps across different categories. The chart provides a visual representation of the proportion of apps in each category, allowing us to identify the categories with the highest and lowest percentages. By examining the chart, we can determine which categories have a larger presence in the play store and which categories are relatively less represented. This information can be useful for understanding the popularity and demand for different app categories and can potentially guide business decisions related to app development, marketing, and targeting specific categories.

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the pie circle chart showing the percentage of apps belonging to each category in the play store can potentially help create a positive business impact. By understanding the distribution of apps across categories, businesses can make informed decisions regarding app development, marketing strategies, and targeting specific categories that have a higher demand and popularity. They can focus their efforts on categories that have a larger share of apps and potentially a larger user base.



#### Chart - 6 - Scatterplot

In [None]:
# scatterplot - 6 scatterplot of sentiment polarity and sentiment subjectivity
plt.figure(figsize=(6,6))
sns.scatterplot(x=user_review['Sentiment_Subjectivity'], y=user_review['Sentiment_Polarity'],
                hue=user_review['Sentiment'], edgecolor='white', palette="inferno")
plt.title("Google Play Store Reviews Sentiment Analysis", fontsize=20)
plt.show()
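The scatterplot can be complemented by a numeric summary: the mean polarity and subjectivity for each sentiment label. A minimal sketch using `groupby` on made-up reviews:

```python
import pandas as pd

# Made-up reviews
reviews = pd.DataFrame({
    'Sentiment': ['Positive', 'Positive', 'Negative', 'Neutral'],
    'Sentiment_Polarity': [0.6, 0.4, -0.5, 0.0],
    'Sentiment_Subjectivity': [0.7, 0.5, 0.9, 0.1],
})

# Mean polarity and subjectivity per sentiment label
summary = reviews.groupby('Sentiment')[['Sentiment_Polarity', 'Sentiment_Subjectivity']].mean()
print(summary)
```

On the real data, the same `groupby` line works with `user_review` in place of `reviews`.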


##### 1. Why did you pick the specific chart?

I chose the scatterplot chart to visualize the relationship between sentiment polarity and sentiment subjectivity because it is effective in showing the distribution and correlation between two continuous variables. Scatterplots provide a clear way to observe patterns, clusters, or trends in the data, and they are particularly useful when comparing two numerical variables.



##### 2. What is/are the insight(s) found from the chart?

The scatterplot helps us visualize the distribution of sentiment polarity and subjectivity. We can see how data points are concentrated across the range of polarity and subjectivity values. By examining it, we can identify potential patterns or trends between the two; for example, whether highly subjective reviews tend to have extreme polarity (either very positive or very negative).

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the scatterplot of sentiment polarity and sentiment subjectivity can potentially help create a positive business impact. By analyzing sentiment polarity and subjectivity, businesses can gain valuable insights into customer sentiment towards their products or services. This understanding can help identify areas of strength and weakness, allowing businesses to make informed decisions for improving their offerings.

#### Chart - 7 - Line Chart

In [None]:
# Line Chart - 7 Number of apps by the year of their last update
play_store['Last Updated'] = pd.to_datetime(play_store['Last Updated'])

# Each app has only one "Last Updated" date, so this counts apps by the year of their latest update
play_store['Update Year'] = play_store['Last Updated'].dt.year

update_counts = play_store['Update Year'].value_counts().sort_index()

plt.figure(figsize=(10, 6))
plt.plot(update_counts.index, update_counts, marker='o')
plt.xlabel('Year of Last Update')
plt.ylabel('Number of Apps')
plt.title('Number of Apps by Year of Last Update')
plt.show()



##### 1. Why did you pick the specific chart?

I picked a line chart because it shows how the number of apps changes across years. Note that each app contributes only its most recent update date, so the chart shows how recently apps were last updated rather than the total number of updates made in each year. The line makes it easy to see the overall pattern, spot increasing or decreasing trends, and compare years.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the general trend in when apps were last updated. It helps identify whether the number of apps last updated in a given year has been increasing, decreasing, or staying relatively stable.

The chart also shows year-to-year variation, helping identify specific years with significant spikes or drops.

The slope of the line indicates the direction of the trend: a positive slope means more apps were last updated in later years, while a negative slope indicates a decreasing trend.

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The chart shows how actively apps in the Play Store are maintained. If most apps were last updated recently, regular updates are the norm, and businesses can plan their own update cadence to keep pace with competitors.

Overall, the insights from the line chart can help businesses make data-driven decisions, optimize their app development strategies, and enhance customer satisfaction, leading to a positive business impact.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

**Based on the analysis of the provided visualizations and insights, there are several suggestions for the client to achieve their business objectives:**

* **Focus on Popular App Categories**: The pie chart showcasing the distribution of apps across various categories provides insights into the most popular categories in the Play Store. The client can consider developing and promoting apps in these categories to target a larger user base and increase potential downloads.

* **Emphasize Free Apps** : The bar chart comparing the distribution of free and paid apps indicates that free apps have a significantly higher presence in the Play Store. To maximize visibility and reach a broader audience, the client should consider offering free apps, while incorporating monetization strategies such as in-app advertisements or optional in-app purchases to generate revenue.

* **Leverage User Reviews and Sentiment Analysis** : The scatterplot of sentiment polarity and subjectivity shows how users feel about apps and how opinionated their reviews are. The client can apply the same analysis to feedback on their own apps, identify areas for improvement, and address negative sentiment. Incorporating user feedback into the app development process can lead to higher user satisfaction and more positive reviews, ultimately driving higher app ratings and more downloads.

* **Optimize App Updates**: The line chart displaying the distribution of app updates over the years highlights the importance of consistent and timely updates. The client should focus on providing regular updates to their apps, incorporating new features, addressing bugs, and improving performance. This approach demonstrates active development and responsiveness to user needs, leading to improved customer engagement, satisfaction, and retention.

* **Monitor Market Trends and Competition**: The line chart of app update years, together with the Free vs. Paid comparison, can help the client understand market trends and benchmark their performance against competitors. It is crucial to stay updated with industry developments, monitor competitor strategies, and continuously innovate to maintain a competitive edge in the Play Store.

* **Explore Strategic Partnerships and Collaborations**: Identify potential partnerships with complementary apps or brands that target a similar user base. Collaborating with other app developers or brands can lead to cross-promotion opportunities, expanding the reach of the client's apps and attracting new users.

* **Enhance App Store Optimization (ASO)**: Pay attention to optimizing the app's metadata, including app title, description, keywords, and screenshots. Conduct keyword research to identify relevant and high-volume search terms within the Play Store. By improving the app's visibility and discoverability through effective ASO strategies, the client can increase organic traffic and attract more potential users.

# **Conclusion**

**By analyzing the data closely, we have inferred a few of the observations.**

• Apps on the Google Play Store are of two types: Free and Paid.

• Apps are spread across many categories; the three largest are Family, Game and Tools.

• Applications receive both ratings and reviews, which can be used to compare them.

• Install counts are available for every application, so the most installed applications can be identified along with their category, rating, type, etc.

• Business, Game, Family and Tools have the highest number of applications.

• Free applications make up over 90% of all applications, with a total count of more than 8,000.

• Communication, Games and Tools are the most installed application categories.

• Most apps with ratings in the 4.0 to 4.7 range have a high number of reviews and installs. Price does not show a direct relationship with rating: prices fluctuate even among highly rated apps.

• Install counts and review counts are the most strongly correlated pair of variables: apps with many reviews also tend to have many downloads.

Thus, many factors play an important role in the usage and popularity of an application: its price, its ratings and reviews, and its number of installs, which in turn also influences users' decisions about which apps to install.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***