<a href="https://colab.research.google.com/github/lakshmi-rsl/Project1/blob/main/Shenbagalakshmi__EDA_Submission_8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Exploratory Data Analysis on Google Playstore App data






##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

The Play Store is a key player in digital markets, influencing the app ecosystem for both users and developers. Conducting an exploratory data analysis (EDA) on Play Store data is crucial for uncovering patterns, trends, and dynamics within this vast collection of mobile applications. This analysis aims to do three things: first, show how popular apps are across different categories; second, show which factors affect user reviews and ratings; and third, investigate the relationships between app features such as price, size, and user engagement. The analysis attempts to answer important research questions, such as identifying the most popular app categories and working out how many installations there are for each rating. Through correlation analysis, data visualisation, and descriptive statistics, the study aims to provide insight into how the dynamics of the Play Store ecosystem are changing. The expected results can guide developers, inform users, and aid strategic decision-making in the constantly changing field of mobile applications. We hope that this investigation will help to clarify the Play Store's current situation as well as its potential future directions and their implications for the various parties involved in the online market.






# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The Play Store has a significant impact on the app ecosystem in the ever-changing world of digital markets. It is therefore critical to identify subtle patterns and connections in the vast universe of mobile applications, which calls for a thorough exploratory data analysis (EDA). This analysis examines the relationships between app features, identifies factors that affect user reviews and ratings, and assesses the popularity of apps across categories. Key research questions, such as identifying preferred app categories and measuring installations per rating, are addressed through correlation analysis, data visualisation, and descriptive statistics. The resulting information could help consumers, developers, and strategists make informed decisions in the constantly changing field of mobile applications.






#### **Define Your Business Objective?**

To utilize exploratory data analysis findings to enhance app popularity, user satisfaction, and revenue streams, ensuring sustained success and competitiveness within the Play Store ecosystem.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')



In [None]:
df_app = pd.read_csv('/content/drive/My Drive/Project_Module2/Play Store Data.csv')

In [None]:
df_rev = pd.read_csv('/content/drive/My Drive/Project_Module2/User Reviews.csv')

### Dataset First View

In [None]:
# Dataset First Look
df_app.head()

In [None]:
df_rev.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df_app.shape

In [None]:
df_rev.shape

### Dataset Information

In [None]:
# Dataset Info
df_app.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df_app[df_app.duplicated()])

Dropping the duplicate Apps from the dataset

In [None]:
df_app.drop_duplicates(subset="App", inplace = True)

In [None]:
# Checking whether the duplicate apps were dropped from the dataset
df_app.duplicated(subset="App").sum()
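
The difference between counting fully identical rows and counting repeated app names can be illustrated on a small example. This is a minimal sketch on hypothetical toy data (the `toy` frame and its values are not from the Play Store dataset): the same app can appear twice with a different review count, so a full-row check can report fewer duplicates than a check on the `App` column alone.

```python
import pandas as pd

# Hypothetical toy data: "Alpha" is listed twice with different review counts,
# "Beta" is listed twice with identical values
toy = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta", "Beta"],
    "Reviews": [10, 12, 5, 5],
})

# Rows that are identical to an earlier row in every column
full_row_dupes = toy.duplicated().sum()

# Rows whose App name already appeared earlier, whatever the other columns hold
app_name_dupes = toy.duplicated(subset="App").sum()

print("Full-row duplicates:", full_row_dupes)
print("Duplicate app names:", app_name_dupes)

# Keep the first listing of each app, mirroring drop_duplicates(subset="App")
deduped = toy.drop_duplicates(subset="App", keep="first")
print(deduped)
```

Note that `keep="first"` retains whichever listing happens to come first, which is not necessarily the most recent one.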

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df_app.isnull().sum())

It is understood that the columns "Rating", "Type", "Content Rating", "Current Ver" and "Android Ver" have missing values, which are addressed in the later sections.


In [None]:
# Visualizing the missing values as bar chart
df_app.isnull().sum().plot(kind='bar')

In [None]:
#Visualizing the missing values using heatmap
sns.heatmap(df_app.isnull(),cbar=False)

### What did you know about your dataset?

The dataset consists of two CSV files: Play Store Data.csv and User Reviews.csv.

Play Store Data.csv has 10,841 observations and 13 variables describing the applications on Google Play.

User Reviews.csv has 64,295 observations and 5 variables covering the 100 most relevant reviews for each app and sentiment information for each review.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df_app.columns

In [None]:
df_rev.columns

In [None]:
# Dataset Describe
df_app.describe()

### Variables Description

**Play store data.csv**

**App**: Application name

**Category**: Category the app belongs to

**Rating**: Overall user rating of the app

**Reviews**: Number of user reviews for the app

**Size**: Size of the app

**Price**: Price of the app

**Installs**: Number of user downloads/installs for the app

**Type**: Paid or Free

**Content Rating**: Age group the app is targeted at - Children / Mature 21+ / Adult

**Genres**: An app can belong to multiple genres (apart from its main category)

**Last Updated**: Date of the last app update.

**Current Ver**: Current version of the app.

**Android Ver**: Minimum Android version required.

**User reviews.csv**

**App**: Name of app

**Translated_Review**: User review (Preprocessed and translated to English)

**Sentiment:** Positive/Negative/Neutral (Preprocessed)

**Sentiment_Polarity**: Sentiment polarity score (>0 - positive, <0 - negative)

**Sentiment_Subjectivity**: Numerical score indicating the subjectivity of the review

**Quick Check for Outliers**

**On studying the dataset further, we found a row with anomalous values. Let us find that row and remove it.**

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for "Rating"
df_app["Rating"].unique()

In [None]:
#Checking the unique value for "Size"
df_app["Size"].unique()

In [None]:
# Checking the unique value for the column "Installs"
df_app["Installs"].unique()

In [None]:
# Checking the unique value for the column "Price"
df_app["Price"].unique()

## 3. ***Data Wrangling***

**Rating column**

We have an app rating of 19 which is out of range and needs to be dropped

In [None]:
df_app[df_app["Rating"]==19]

As we can see, this entry has a Rating of 19.0, which is far above the maximum rating of 5.0. Also, the value in its Reviews column contains a letter, which no other entry does. Hence we remove this row to make our analysis easier.

**Dropping the row that has incorrect values for our features**

In [None]:
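# Look up the row with the out-of-range Rating instead of hard-coding its index label
index_to_drop = df_app[df_app["Rating"] == 19].index
df_app = df_app.drop(index = index_to_drop)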

In [None]:
# To check whether the row is dropped or not
df_app[df_app['Rating']==19]

**Installs Column**

**Now that we have dropped the row with a rating of 19, we need to remove '+' and ',' from 'Installs' and make the column numeric.**

In [None]:
# Remove the '+' and ',' characters and any surrounding spaces from "Installs"
df_app["Installs"] = (
    df_app["Installs"]
    .str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
    .str.strip()
)

In [None]:
# Converting "Installs" column to numeric
df_app["Installs"] = df_app["Installs"].astype(int)
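
As a hedged alternative to the steps above, the cleaning can be made tolerant of unexpected values. The sketch below uses a hypothetical sample series (`raw_installs`, including a stray `"Free"` entry added purely for illustration) and `pd.to_numeric` with `errors="coerce"`, which turns anything that is still non-numeric into `NaN` instead of raising an error.

```python
import pandas as pd

# Hypothetical sample of raw "Installs" labels, including one invalid entry
raw_installs = pd.Series(["10,000+", "500+", "1,000,000+", "0", "Free"])

# Remove '+' and ',' in a single pass using a regular expression
cleaned = raw_installs.str.replace(r"[+,]", "", regex=True)

# Convert to numbers; values that cannot be parsed become NaN
installs_numeric = pd.to_numeric(cleaned, errors="coerce")

print(installs_numeric)
print("Unparseable entries:", installs_numeric.isna().sum())
```

With this approach a malformed row shows up as a missing value that can be inspected later, rather than stopping the notebook part-way through.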

**Size Column**

**We need to remove 'M' and 'k' from the column and convert the values into bytes. We also need to replace the term "Varies with device" with a missing value.**

In [None]:
def convert_size(size):
  """Convert a size label such as '19M' or '201k' to bytes."""
  if isinstance(size, str):
    if "M" in size:
      # Megabytes to bytes
      return float(size.replace("M", "")) * 1024 * 1024
    elif "k" in size:
      # Kilobytes to bytes
      return float(size.replace("k", "")) * 1024
    elif "Varies with device" in size:
      # No fixed size is reported, so treat it as missing
      return np.nan
  return size

In [None]:
df_app["Size"]=df_app["Size"].apply(convert_size)
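
To see what the conversion does, it helps to run it on a few sample labels. This is a minimal sketch with a hypothetical helper `size_to_bytes` and made-up sample values; unlike the function above, it assumes that any string not ending in `M` or `k` should become `NaN`.

```python
import numpy as np
import pandas as pd

def size_to_bytes(size):
    """Convert a size label such as '19M' or '201k' to bytes (hypothetical helper)."""
    if not isinstance(size, str):
        return size  # already numeric or missing
    size = size.strip()
    if size.endswith("M"):
        return float(size[:-1]) * 1024 * 1024
    if size.endswith("k"):
        return float(size[:-1]) * 1024
    return np.nan  # e.g. "Varies with device"

# Made-up sample labels for illustration
sample = pd.Series(["19M", "201k", "Varies with device", "1.5M"])
converted = sample.apply(size_to_bytes)
print(converted)
```

Checking the suffix with `endswith` rather than `in` avoids accidentally matching an `M` or `k` that appears elsewhere in an unexpected label.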

**Now we have removed the anomalies from the three columns, viz. Rating, Installs and Size, and converted them to numeric. Let's check them.**

In [None]:
df_app.describe()

**Price Column**

**Now we need to remove the "$" symbol from the Price column and convert it to the float data type**

In [None]:
# Strip the "$" symbol from the price strings
df_app["Price"] = df_app["Price"].str.replace("$", "", regex=False)

In [None]:
df_app["Price"] = df_app["Price"].astype(float)

**Reviews Column**

In [None]:
# Converting the Reviews column to int data type
df_app["Reviews"] = df_app["Reviews"].astype(int)

**We have to check that all the corresponding columns have been converted into the required data types**

In [None]:
df_app.dtypes

In [None]:
df_app.describe()

**Now it is found that the columns Rating, Reviews, Size, Installs and Price are converted to numeric columns of required datatypes.**

**Dealing with the Missing Values**

In [None]:
df_app.isnull().sum()

**It is understood that there are 1463 null values in the Rating column, 1227 null values in the Size column, 1 null value in Type column, 8 null values in Current Ver column and finally 2 null values in Android Ver column**

In [None]:
# Let's remove the rows with missing values in the Current Ver, Android Ver and Type columns, as there are very few of them
df_app.dropna(subset=['Current Ver','Android Ver','Type'], inplace = True)

In [None]:
# Let's check whether the null values are dropped in the corresponding columns
df_app.isnull().sum()

**Now only two columns, Rating and Size, have null values. We can replace them with their mean and median values respectively.**

In [None]:
# Replacing null values in 'Rating' column with the mean of the column
# (assigning the result back avoids the chained inplace fillna that newer pandas versions warn about)
df_app['Rating'] = df_app['Rating'].fillna(df_app['Rating'].mean())

In [None]:
# Replacing null values in 'Size' column with the median of the column
df_app['Size'] = df_app['Size'].fillna(df_app['Size'].median())

In [None]:
#To check whether there are no null values in the dataset
df_app.isnull().sum()
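
Filling with a single global value is only one option. The sketch below, on hypothetical toy data, compares the global mean used above with a per-category median, and also records which rows were imputed so that later charts can exclude them if needed. The column `Rating_was_missing` is an illustrative name, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing rating per category
toy = pd.DataFrame({
    "Category": ["GAME", "GAME", "GAME", "TOOLS", "TOOLS", "TOOLS"],
    "Rating": [4.0, 5.0, np.nan, 3.0, np.nan, 4.0],
})

# Flag the rows that will be imputed before filling anything
toy["Rating_was_missing"] = toy["Rating"].isna()

# Option 1: global mean, as in the cells above
global_fill = toy["Rating"].fillna(toy["Rating"].mean())

# Option 2 (an alternative, not the notebook's method): median within each category
category_median = toy.groupby("Category")["Rating"].transform("median")
grouped_fill = toy["Rating"].fillna(category_median)

print(pd.DataFrame({"global_mean": global_fill, "category_median": grouped_fill}))
```

Which option is preferable depends on the analysis; imputed ratings cluster at a single value, which can distort distribution plots and correlations.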

In [None]:
df_app.to_csv('/content/drive/My Drive/playstoredata_cleaned.csv', index=False)

### What all manipulations have you done and insights you found?

**The columns Installs, Price, Rating and Size contained symbols and anomalous values. We identified and removed them, and converted the columns to integer and float data types as appropriate. In the Size column in particular, we converted the values to bytes to make them uniform. We replaced the missing values in the Rating column with the column mean and those in the Size column with the column median.**


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 - Pie Chart (1. How many apps are free and how many are paid?)

In [None]:
# Chart - 1 visualization code
apps_count = df_app["Type"].value_counts()
apps_count

In [None]:
plt.figure(figsize=(8, 8))
plt.pie(apps_count, labels=apps_count.index,autopct='%1.1f%%', startangle=90, colors=['skyblue', 'lightcoral'])
plt.title('Distribution of Free and Paid Apps')
plt.show()

##### 1. Why did you pick the specific chart?

The aim is to find the distribution of paid and free apps in the dataset. I chose a pie chart because it is effective at showing the percentage split between free and paid apps. It gives a comparative visualization that highlights their relative sizes. Moreover, it is straightforward and enhances readability.

##### 2. What is/are the insight(s) found from the chart?

It is found that free apps outnumber paid apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2 (2. What are the most popular app categories?)

In [None]:
# Chart - 2 visualization code
category_counts = df_app["Category"].value_counts()
most_popular_categories = category_counts.head(5)
print("The most popular app categories in Play Store app data are:")
print(most_popular_categories)

In [None]:
# Plotting
plt.figure(figsize=(10, 6))
most_popular_categories.plot(kind='barh', color='skyblue')
plt.xlabel('Number of Apps')
plt.ylabel('App Category')
plt.title('Most Popular App Categories in Play Store')
plt.gca().invert_yaxis()  # Invert y-axis to have the highest count at the top
plt.show()

##### 1. Why did you pick the specific chart?

I have chosen this bar chart to visualise the most popular app categories because it displays the categories clearly and makes it easy to compare them. The benefits of horizontal bar charts include clarity, readability, space efficiency, and efficient comparison.

##### 2. What is/are the insight(s) found from the chart?

We understand that the most popular categories are Family, Game, and Tools, followed by Business and Medical.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights into popular app categories provide businesses with strategic advantages, guiding decision-making and resource allocation. By identifying trends and market opportunities, businesses can tailor their offerings to meet consumer demand effectively. This understanding enables them to differentiate from competitors, maximize revenue potential, and optimize monetization strategies. Informed decisions based on popular categories lead to the development of high-quality apps that resonate with users, fostering customer engagement and loyalty. Overall, leveraging insights about app category popularity facilitates positive business impact by driving growth, market relevance, and competitive advantage in the dynamic app market landscape.

#### Chart - 3 (3. Is there a correlation between ratings and the number of installs?)

In [None]:
# Calculate the Pearson correlation coefficient
correlation_coefficient = np.corrcoef(df_app['Rating'], df_app['Installs'])[0, 1]
print(f"Pearson correlation coefficient: {correlation_coefficient:.3f}")

# Interpret the sign of the correlation coefficient (its magnitude indicates the strength)
if correlation_coefficient > 0:
    print("There is a positive correlation between ratings and the number of installs.")
elif correlation_coefficient < 0:
    print("There is a negative correlation between ratings and the number of installs.")
else:
    print("There is no correlation between ratings and the number of installs.")
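
The sign of the coefficient says nothing about how strong the relationship is, and install counts that span many orders of magnitude can distort a Pearson coefficient. The following is a minimal sketch on hypothetical toy values (not results from the dataset) that compares Pearson on raw installs, Pearson on log-scaled installs, and a Spearman rank correlation, computed here as the Pearson correlation of the ranks.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values: installs grow by orders of magnitude as ratings rise
toy = pd.DataFrame({
    "Rating": [3.0, 3.5, 4.0, 4.2, 4.5, 4.7],
    "Installs": [100, 1_000, 10_000, 50_000, 1_000_000, 100_000_000],
})

# Pearson on raw values: dominated by the single largest install count
pearson_raw = toy["Rating"].corr(toy["Installs"])

# Pearson after a log transform: less sensitive to extreme install counts
pearson_log = toy["Rating"].corr(np.log10(toy["Installs"] + 1))

# Spearman rank correlation, computed as Pearson on the ranks
spearman = toy["Rating"].rank().corr(toy["Installs"].rank())

print(f"Pearson (raw installs):   {pearson_raw:.3f}")
print(f"Pearson (log10 installs): {pearson_log:.3f}")
print(f"Spearman (ranks):         {spearman:.3f}")
```

Reporting the magnitude alongside the sign makes it clearer whether the relationship is strong enough to support a business conclusion.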

In [None]:
# Chart - 3 visualization code
# Plotting
plt.figure(figsize=(8, 6))
plt.scatter(df_app['Rating'], df_app['Installs'], color='skyblue', alpha=0.5)
plt.title('Correlation between Ratings and Installs')
plt.xlabel('Ratings')
plt.ylabel('Installs')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is chosen to visualize the correlation between ratings and installs due to its effectiveness in displaying the relationship between two continuous variables. It allows for the observation of patterns, trends, and outliers, facilitating the assessment of correlation strength and direction. Scatter plots also accommodate large datasets without clutter, enabling easy visualization of data density across the variable ranges. Their flexibility in customization further enhances clarity and understanding. Overall, scatter plots provide a comprehensive and intuitive method for exploring the relationship between ratings and installs, making them the ideal choice for this correlation analysis.

##### 2. What is/are the insight(s) found from the chart?

The correlation coefficient between the rating and the number of installs is positive, which suggests that higher-rated apps tend to attract more installations. The sign alone does not show how strong the relationship is, so this should be read as a tendency rather than a rule. This insight indicates that user satisfaction, as reflected in ratings, may play a role in driving app popularity and adoption. Businesses can focus on improving app quality and user experience to potentially increase installation numbers and overall success in the market.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, it will create a positive business impact. Insights from the positive correlation between ratings and installs enable businesses to prioritize user satisfaction and improve app quality, leading to increased adoption and retention. Leveraging this understanding allows for targeted marketing efforts, optimized monetization strategies, and a competitive edge in the app market, ultimately driving positive business impact and success.

#### Chart - 4 (4. Which is the highest rated app in each category?)

In [None]:
highest_rated_apps = df_app.loc[df_app.groupby('Category')['Rating'].idxmax()]
highest_rated_apps
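
One caveat with `idxmax` is that it returns only the first row per group when several apps share the top rating. The sketch below uses hypothetical toy data to show how to list all tied apps, and how an optional minimum-review filter changes the result. The threshold `MIN_REVIEWS = 100` is an arbitrary assumption for illustration.

```python
import pandas as pd

# Hypothetical toy data: two GAME apps tie at 5.0, one with very few reviews
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Category": ["GAME", "GAME", "TOOLS", "TOOLS"],
    "Rating": [5.0, 5.0, 4.8, 4.1],
    "Reviews": [3, 12000, 900, 50000],
})

# idxmax keeps only the first row per category, even when ratings tie
first_only = toy.loc[toy.groupby("Category")["Rating"].idxmax()]

# All apps that share the maximum rating within their category
is_top = toy["Rating"] == toy.groupby("Category")["Rating"].transform("max")
all_top = toy[is_top]

# Optional: require a minimum number of reviews before ranking (arbitrary threshold)
MIN_REVIEWS = 100
reliable = toy[toy["Reviews"] >= MIN_REVIEWS]
top_reliable = reliable.loc[reliable.groupby("Category")["Rating"].idxmax()]

print(first_only[["Category", "App", "Rating"]])
print(all_top[["Category", "App", "Rating"]])
print(top_reliable[["Category", "App", "Rating", "Reviews"]])
```

A rating based on a handful of reviews is less informative than the same rating based on thousands, which is why a review threshold can be worth considering.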

In [None]:
plt.figure(figsize=(10, 6))
sns.violinplot(x='Category', y='Rating', data=df_app)
plt.title('Distribution of Ratings Within Each Category')
plt.xlabel('Category')
plt.ylabel('Rating')
plt.xticks(rotation=90)  # Rotate category labels for better readability
plt.show()

##### 1. Why did you pick the specific chart?

A violin plot can be a suitable choice for visualizing the distribution of ratings within each category, which will be more useful for understanding the spread of ratings and identifying any potential multimodal distributions or outliers.


##### 2. What is/are the insight(s) found from the chart?

The highest-rated apps span a wide range of categories, including House & Home, Libraries & Demo, Lifestyle, Maps & Navigation, Medical, News & Magazines, Parenting, Personalization, Photography, Productivity, Shopping, Social, Sports, Tools, Travel & Local, Video Players & Editors, and Weather. This diversity suggests that high ratings are not limited to a specific genre or niche but can be found across various app categories. The fact that these apps received high ratings indicates that they likely offer quality content, useful features, or innovative solutions that resonate well with users.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Despite belonging to different categories, these apps have garnered significant user engagement, as indicated by the number of reviews and installs. The high ratings and positive user feedback for these apps suggest that they have successfully met users' expectations and provided a satisfactory experience. This positive sentiment can contribute to the app's reputation, user retention, and potential for future growth. Overall, leveraging the insights gained from identifying the highest-rated app categories can help businesses make informed decisions, optimize their app development and marketing strategies, and ultimately drive positive business outcomes and success in the competitive app market.

#### Chart - 5 (5. Which is the lowest rated app in each category?)

In [None]:
# Chart - 5 visualization code
lowest_rated_apps = df_app.loc[df_app.groupby('Category')['Rating'].idxmin()]

# Create a horizontal bar plot
plt.figure(figsize=(10, 6))
plt.barh(lowest_rated_apps['Category'], lowest_rated_apps['Rating'], color='skyblue')
plt.xlabel('Rating')
plt.ylabel('Category')
plt.title('Lowest Rated App in Each Category')
plt.show()

In [None]:
lowest_rated_apps

##### 1. Why did you pick the specific chart?

A horizontal bar plot allows for easy visual comparison of the lowest-rated apps across different categories. The horizontal orientation of the bars makes it simple to compare the ratings of the lowest-rated apps side by side. The length of each bar in the plot directly represents the rating of the lowest-rated app in the respective category. This clear representation enables viewers to quickly discern the relative ratings of the lowest-rated apps within each category.

##### 2. What is/are the insight(s) found from the chart?

The lowest-rated apps span a wide range of categories, including Art & Design, Auto & Vehicles, Beauty, Books & Reference, Business, Comics, Communication, Dating, Education, Entertainment, Events, Family, Finance, Food & Drink, Games, Health & Fitness, House & Home, Libraries & Demo, Lifestyle, Maps & Navigation, Medical, News & Magazines, Parenting, Personalization, Photography, Productivity, Shopping, Social, Sports, Tools, Travel & Local, Video Players & Editors, and Weather. This diversity indicates that low ratings are not limited to specific genres but can occur across various app categories. The low ratings reflect user dissatisfaction or disappointment with the overall app experience. They indicate that users did not find the apps sufficiently valuable, enjoyable, or functional to warrant higher ratings, leading to negative feedback and lower overall ratings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

By addressing shortcomings identified in low-rated apps, businesses can differentiate themselves from competitors and gain a competitive advantage in the market. Offering a superior user experience, unique features, or better customer support can help businesses stand out and attract users away from competitors.

In [None]:
summary_stats = df_app.groupby('Category')['Size'].describe()
summary_stats

#### Chart - 6 (6. What is the distribution of app sizes across different categories?)


In [None]:
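# Chart - 6 visualization code
# DataFrame.boxplot creates its own figure, so a separate plt.figure() call is not needed
df_app.boxplot(column='Size', by='Category', figsize=(12, 8), rot=90)
plt.suptitle('')  # Remove the automatic "Boxplot grouped by Category" super-title
plt.title('Distribution of App Sizes Across Different Categories')
plt.xlabel('Category')
plt.ylabel('App Size (in Bytes)')
plt.xticks(rotation=90)
plt.yscale('log')  # Sizes span several orders of magnitude
plt.show()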

##### 1. Why did you pick the specific chart?

Box plots offer a concise visual representation of data distribution, facilitating comparisons between groups or categories within a dataset. They condense large datasets, highlighting central tendency, spread, and outliers effectively. Particularly useful for handling large datasets, box plots provide insights into symmetry, skewness, and outlier detection. They aid in understanding key characteristics of data distributions, enabling easy identification of differences or similarities between groups. Overall, box plots serve as a valuable tool in exploratory data analysis, offering intuitive visualization of data distribution and aiding in the interpretation of complex datasets.

##### 2. What is/are the insight(s) found from the chart?

The typical size of apps in most categories is above 10^7 bytes (roughly 10 MB). It is also observed that the majority of apps fall between approximately 2 MB and 40 MB, so most apps are neither very large nor very small.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Utilizing insights that most apps fall between 2MB and 40MB, businesses can optimize app sizes for improved user experiences, cost efficiency, and performance. By ensuring apps are neither too large nor too small, companies can enhance download speeds, reduce data usage, and cater to users with limited storage or slower internet connections. This optimization fosters higher user satisfaction, retention rates, and potentially lower operational costs. Moreover, offering well-optimized apps provides a competitive advantage, attracting more users and establishing brand loyalty. Overall, leveraging these insights can lead to positive business impacts such as cost savings, enhanced user experiences, and a stronger market position.

#### Chart - 7 (7. Which category has the largest apps, and how are they rated?)

In [None]:
# Chart - 7 visualization code
# Filter apps with size greater than 40 MB
large_apps_df = df_app[df_app['Size'] > 40 * 1024 * 1024]
# Count the large apps in each category
large_apps_categories = large_apps_df['Category'].value_counts()
# Print the number of large-sized apps category-wise
print("Number of large-sized apps category-wise:")
for category, count in large_apps_categories.items():
    print(f"{category}: {count}")


In [None]:
# Create bar chart
plt.figure(figsize=(10, 6))
colors = ['skyblue', 'orange', 'green', 'red', 'purple']
large_apps_categories.plot(kind='bar', color=colors)

# Add labels and title
plt.xlabel('Categories')
plt.ylabel('Number of Large-Sized Apps')
plt.title('Number of Large-Sized Apps Category-wise')

# Rotate x-axis labels if needed
plt.xticks(rotation=45, ha='right')

# Display plot
plt.tight_layout()  # Adjust layout to prevent clipping of labels
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is straightforward and effective here. It directly compares the number of large-sized apps (over 40 MB) in each category, allowing quick visual comparison and easy identification of the category with the most large apps.

##### 2. What is/are the insight(s) found from the chart?

The larger-sized apps primarily fall into the Family and Game categories, and they tend to have favorable ratings. Both categories boast an average rating surpassing 4.5, suggesting that app quality remains high regardless of size. Despite their larger sizes, Family and Game category apps demonstrate consistent user satisfaction, as reflected by their consistently high ratings.
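
The bar chart shows counts only, so the rating part of the question needs a separate calculation. A minimal sketch of one way to check it is given below on hypothetical toy data (the values are made up and do not reproduce the figures quoted above): it counts the large apps per category and computes their mean rating in the same table.

```python
import pandas as pd

MB = 1024 * 1024  # bytes per megabyte, matching the conversion used for "Size"

# Hypothetical toy data with sizes in bytes
toy = pd.DataFrame({
    "Category": ["GAME", "GAME", "FAMILY", "FAMILY", "TOOLS"],
    "Size": [60 * MB, 90 * MB, 45 * MB, 10 * MB, 5 * MB],
    "Rating": [4.4, 4.0, 4.6, 3.9, 4.1],
})

# Keep only apps larger than 40 MB, as in the chart above
large = toy[toy["Size"] > 40 * MB]

# Number of large apps and their mean rating per category
summary = (
    large.groupby("Category")["Rating"]
    .agg(count="count", mean_rating="mean")
    .sort_values("count", ascending=False)
)
print(summary)
```

Applied to `df_app`, a table like this would show directly whether the average rating of large Family and Game apps supports the statement above.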

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights suggest that larger-sized apps in the Family and Game categories maintain high ratings, indicating user satisfaction irrespective of size. This means companies can focus on making these apps even better without worrying too much about making them smaller.

#### Chart - 8 (8. Are there specific genres that are more likely to have paid apps?)

In [None]:
# Chart - 8 visualization code
Paid_apps=df_app[df_app['Price']>0]
paid_apps_per_genre = Paid_apps['Genres'].value_counts()
# Print the result
print("Genres more likely to have paid apps:")
print(paid_apps_per_genre)


In [None]:
top_10_genres = paid_apps_per_genre.sort_values(ascending=False).head(10)

# Create a donut chart
plt.figure(figsize=(8, 8))
plt.pie(top_10_genres, labels=top_10_genres.index, autopct='%1.1f%%', wedgeprops=dict(width=0.4), startangle=90)

# Add a circle at the center to create a hole
center_circle = plt.Circle((0, 0), 0.6, color='white', linewidth=0)
plt.gca().add_artist(center_circle)

# Add a title
plt.title('Top 10 Genres More Likely to Have Paid Apps')

# Equal aspect ratio ensures that pie is drawn as a circle
plt.axis('equal')

# Display plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A donut chart is highly significant due to its ability to facilitate easy comparison of segments with the total count, placing emphasis on proportions rather than absolute values. Its visually pleasing design enhances viewer engagement, with the central hole providing a clear reference point for the overall data. The ring-like structure simplifies the distinction between segments while allowing space for additional information. Overall, donut charts offer an attractive and informative way to present data, fostering comprehension and engagement among viewers.

##### 2. What is/are the insight(s) found from the chart?

Medical genre (19.6%): Medical apps account for the largest share of paid apps among the top 10 genres, suggesting a potential demand for specialized medical applications or tools.

Personalization genre (18.9%): Similarly, the high share of paid apps in the Personalization genre suggests a willingness among users to pay for customizing their devices or personalizing their digital experiences.

Tools genre (18.2%): The Tools genre also shows a significant presence of paid apps, indicating a demand for utility and productivity-focused applications among users.

Each of the remaining genres accounts for roughly 5% to 7% of the paid apps shown in the chart.
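
The percentages in the donut chart are shares of the paid apps across the top 10 genres, which is not the same as how likely an app in a given genre is to be paid. A hedged sketch of the second measure follows, using hypothetical toy data and illustrative column names (`is_paid`, `paid_share`): two genres can have the same number of paid apps but very different paid shares.

```python
import pandas as pd

# Hypothetical toy data: both genres have one paid app, but Tools has more apps overall
toy = pd.DataFrame({
    "Genres": ["Medical", "Medical", "Tools", "Tools", "Tools", "Tools"],
    "Type": ["Paid", "Free", "Paid", "Free", "Free", "Free"],
})

# For each genre: total apps, number of paid apps, and fraction of apps that are paid
genre_stats = (
    toy.assign(is_paid=toy["Type"].eq("Paid"))
    .groupby("Genres")["is_paid"]
    .agg(total_apps="count", paid_apps="sum", paid_share="mean")
    .sort_values("paid_share", ascending=False)
)
print(genre_stats)
```

Large genres naturally contain more paid apps in absolute terms, so the paid share is the fairer basis for saying a genre is "more likely" to have paid apps.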

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Indeed, the insights gained from the donut chart percentages hold promise for positive business impact. The notable presence of paid apps within the Medical, Personalization, and Tools genres signifies lucrative opportunities. Developers and businesses can leverage this data to tailor their offerings, focusing on developing high-quality apps within these genres to cater to user demand. By capitalizing on these insights, businesses can enhance revenue streams, boost user engagement, and establish a competitive edge in the market, ultimately leading to a positive impact on their bottom line and overall business success. The insight that each of the remaining genres accounts for only about 5% to 7% of the paid apps may point to negative growth for businesses operating within those genres.

#### Chart - 9 (9. Do certain categories dominate the market in terms of the number of installs?)


In [None]:
# Chart - 9 visualization code
# Group by category and sum the installs
installs_per_category = df_app.groupby('Category')['Installs'].sum().sort_values(ascending=False)

# Identify dominant categories (e.g., top 5 categories)
top_categories = installs_per_category.head(5)

# Create a bar chart to visualize the distribution of installs across categories
plt.figure(figsize=(10, 6))
colors = ['skyblue', 'lightgreen', 'salmon', 'gold', 'lightcoral']
plt.bar(top_categories.index, top_categories.values, color=colors)
plt.yscale('log')
plt.xlabel('Category')
plt.ylabel('Total Installs')
plt.title('Top Categories Dominating the Market in Terms of Installs')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

# Optional: Print the total installs per category
print("Total Installs per Category:")
print(installs_per_category)


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

1. The surge in gaming, communication, and tools categories reflects what users crave: fun, connectivity, and productivity boosters.
2. Uncovering the dominant players in these fields aids in crafting our unique selling proposition amidst fierce competition.
3. Keep a lookout for burgeoning categories like productivity and social, as they may hold untapped potential for innovation and expansion.
4. Understanding which categories are in vogue guides our product positioning, ensuring resonance with evolving consumer preferences.
5. Armed with insights from top categories, we can steer our strategies intelligently, deploying resources effectively to maintain our edge in the dynamic market terrain.
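To put a number on how strongly the top categories dominate, the install totals can be converted into percentage shares and a running cumulative share. The following is a minimal sketch under stated assumptions: `toy_installs` is a hypothetical stand-in for `installs_per_category`, and its values are invented purely for illustration, not taken from the dataset.

```python
import pandas as pd

# Toy totals standing in for installs_per_category; the numbers are illustrative only.
toy_installs = pd.Series(
    {"GAME": 500, "COMMUNICATION": 300, "TOOLS": 100, "PRODUCTIVITY": 60, "SOCIAL": 40}
).sort_values(ascending=False)

# Share of all installs held by each category, in percent.
install_share = toy_installs / toy_installs.sum() * 100

# Running total of the shares: how much of the market the top-k categories cover.
cumulative_share = install_share.cumsum()

summary = pd.DataFrame(
    {"share_pct": install_share, "cumulative_pct": cumulative_share}
).round(1)
print(summary)
```

Reading the `cumulative_pct` column from the top shows how few categories are needed to cover most installs, which is a more direct measure of dominance than bar heights alone.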


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can indeed lead to a positive business impact by informing strategic decisions, enhancing market positioning, and fostering innovation. For instance, understanding consumer preferences and dominant market trends enables businesses to align their products and services with evolving demands, potentially increasing customer satisfaction and loyalty. Additionally, insights into emerging categories offer opportunities for expansion and diversification, contributing to revenue growth and market competitiveness.

Conversely, insights indicating low market share or declining demand in certain categories may lead to negative growth prospects. For instance, if analysis reveals dwindling consumer interest or heightened competition within specific sectors, businesses operating in those categories may face challenges in maintaining market relevance and sustaining profitability. This could necessitate strategic pivots, investments in product differentiation, or exploration of alternative revenue streams to mitigate the risks associated with negative growth trajectories.
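One rough way to look for the crowded or low-demand categories described above is to compare how many apps compete in a category with how many installs a typical app in it receives. The sketch below is only an illustration on toy data: the `App` column name, the numeric `Installs` column, and the choice of the overall median as a threshold are all assumptions.

```python
import pandas as pd

# Toy stand-in for df_app; 'App' is an assumed column name and 'Installs' is assumed numeric.
toy_apps = pd.DataFrame({
    "App":      ["a", "b", "c", "d", "e", "f"],
    "Category": ["GAME", "GAME", "TOOLS", "TOOLS", "TOOLS", "EVENTS"],
    "Installs": [1_000_000, 500_000, 10_000, 5_000, 15_000, 1_000],
})

# Per category: how many apps compete, and how many installs a typical app gets.
crowding = toy_apps.groupby("Category").agg(
    app_count=("App", "count"),
    median_installs=("Installs", "median"),
)

# Flag categories whose typical app falls below the overall median (illustrative threshold).
overall_median = toy_apps["Installs"].median()
crowding["below_overall_median"] = crowding["median_installs"] < overall_median

print(crowding.sort_values("median_installs"))
```

The median is used instead of the mean so that a handful of very large apps does not hide weak demand for the typical app in a category.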







#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***