<a href="https://colab.research.google.com/github/sunandakhatri21-hub/Play-Store-App-Review-Analysis2/blob/main/copy_of_capstone_project2_dataset2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Play Store App Review Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

**Objective**

The aim of this project is to analyze user reviews from the Google Play Store to understand customer sentiment and identify key factors influencing app ratings and popularity. This can help app developers improve user satisfaction, address concerns promptly, and enhance overall app performance.

**Dataset Overview**

The dataset contains the following key variables:

- App – Name of the application reviewed.

- Translated_Review – User review text translated into English.

- Sentiment – Categorical label of review sentiment (Positive, Negative, Neutral).

- Sentiment_Polarity – Numerical score from -1 (most negative) to 1 (most positive).

- Sentiment_Subjectivity – Numerical score from 0 (objective/factual) to 1 (subjective/opinionated).

**Methodology**

Data Cleaning

- Removed reviews with missing text or sentiment labels and checked for duplicate rows.

- Standardized text data for analysis.

Exploratory Data Analysis (EDA)

- Count of reviews by sentiment category.

- Distribution of polarity and subjectivity scores.

- Most frequently reviewed apps and their sentiment patterns.

Visualization

- Bar charts for sentiment distribution.

- Box/violin plots to compare sentiment polarity across apps.

- Scatter plot for subjectivity vs polarity to spot sentiment trends.

Insights Extraction

- Identified apps with the highest and lowest average sentiment polarity.

- Observed relationship between subjectivity and polarity in reviews.

- Highlighted patterns in neutral vs extreme opinions.

**Key Findings**

- Positive reviews dominate in most popular apps, but certain apps have higher negative sentiment percentages indicating quality or usability issues.

- Reviews with strongly positive or strongly negative polarity tend to have high subjectivity scores, suggesting emotional or opinion-heavy feedback.

- Neutral reviews sit at the midpoint of the polarity scale (0) and form the smallest sentiment group.

**Business Implications**

- For Developers: Focus on negative sentiment analysis to identify and fix common user complaints.

- For Marketing Teams: Highlight positive sentiment trends in promotional content.

- For Product Strategy: Use polarity-subjectivity patterns to distinguish between genuine feature requests (objective) and emotional feedback (subjective).

**Conclusion**

The sentiment analysis of Google Play Store reviews offers valuable insights into user perception and satisfaction. By continuously monitoring polarity and subjectivity metrics, developers and businesses can enhance app quality, improve user retention, and maintain a competitive advantage.






# **GitHub Link -**

https://github.com/sunandakhatri21-hub/Play-Store-App-Review-Analysis2

# **Problem Statement**



The Google Play Store hosts millions of mobile applications, and user reviews play a critical role in shaping an app’s reputation, influencing downloads, and guiding development priorities. However, with the large volume of reviews generated daily, it becomes challenging for developers and businesses to manually analyze user feedback to extract meaningful insights.

Without an automated sentiment analysis approach, valuable information about user satisfaction, feature requests, and pain points may go unnoticed. This can lead to missed opportunities for improvement, poor customer engagement, and decreased app ratings.

This project aims to analyze translated user reviews from the Google Play Store dataset to determine sentiment (positive, negative, neutral), measure sentiment polarity and subjectivity, and identify patterns that can help developers and stakeholders make data-driven decisions for enhancing app quality and user experience.

#### **Define Your Business Objective?**

**Business Objectives**
1. Understand User Sentiment

   - Analyze customer reviews to determine whether user feedback is predominantly positive, negative, or neutral.

2. Identify Improvement Areas

   - Pinpoint recurring complaints or negative feedback to guide app enhancement and bug-fixing priorities.

3. Enhance User Experience

   - Use sentiment polarity and subjectivity metrics to better understand user emotions and expectations, leading to improved app features and usability.

4. Support Marketing Strategy

   - Highlight positive sentiment trends for promotional campaigns and brand positioning.

5. Data-Driven Decision Making

   - Provide actionable insights that help developers, product managers, and business stakeholders make informed choices to improve app performance and retention rates.

6. Monitor App Reputation

   - Track changes in sentiment over time to assess the impact of updates, new features, or marketing campaigns.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.

     The additional credits will have advantages over other students during Star Student selection.

     [ Note: Deployment Ready Code is defined as the whole .ipynb notebook being executable in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4.   You may add as many charts as you want. Make sure the following format is answered for each and every chart.


```
# Chart visualization code
```


*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help creating a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5.   You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
# Load Dataset
data = pd.read_csv("/content/User Reviews (2).csv")

### Dataset First View

In [None]:
# Dataset First Look
data.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data.shape

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
data.duplicated().sum()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
data.isnull().sum()

In [None]:
# Visualizing the missing values
data.isnull().sum().plot(kind = 'bar', figsize= (10,5))
plt.title("Missing Values per Column")
plt.ylabel("Count of missing values")
plt.show()

### What did you know about your dataset?

Understanding the Google Play Store Review Dataset
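This dataset contains 64,295 rows of user reviews for various apps listed on the Google Play Store. Each review has been translated to English and analyzed for its sentiment and tone, although many rows have no review text or sentiment values (see the missing-value counts above). The dataset includes 5 columns: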

🧾 Column-wise Description:

- App: The name of the app being reviewed. This helps identify which app the review is about.

- Translated_Review: The actual user review, translated into English. This provides qualitative feedback from users.

- Sentiment: The general sentiment of the review (Positive, Negative, or Neutral), indicating how the user feels about the app.

- Sentiment_Polarity: A numerical score between -1 and 1, where -1 = strongly negative, 0 = neutral, and 1 = strongly positive. This gives more granularity than the "Sentiment" column.

- Sentiment_Subjectivity: A score between 0 and 1 showing how subjective the review is, where 0 = very objective (factual) and 1 = very subjective (opinion-based).
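Because both scores have documented ranges, a quick check can confirm that no value falls outside them. The sketch below assumes the column names listed above; `check_score_ranges` is a hypothetical helper, and the small DataFrame is toy data for illustration, not values from this dataset.

```python
import pandas as pd

def check_score_ranges(df: pd.DataFrame) -> dict:
    """Count values outside the documented ranges (hypothetical helper).

    Sentiment_Polarity should lie in [-1, 1] and Sentiment_Subjectivity in [0, 1].
    Missing values are ignored here; they are handled separately during wrangling.
    """
    polarity = df["Sentiment_Polarity"].dropna()
    subjectivity = df["Sentiment_Subjectivity"].dropna()
    return {
        # between() is inclusive on both ends by default
        "polarity_out_of_range": int((~polarity.between(-1, 1)).sum()),
        "subjectivity_out_of_range": int((~subjectivity.between(0, 1)).sum()),
    }

# Toy data for illustration only (one invalid value in each column)
toy = pd.DataFrame({
    "Sentiment_Polarity": [-1.0, 0.0, 0.5, 1.2, None],
    "Sentiment_Subjectivity": [0.0, 0.4, 1.0, -0.1, 0.7],
})
print(check_score_ranges(toy))  # {'polarity_out_of_range': 1, 'subjectivity_out_of_range': 1}
```

On the notebook data it would be called as `check_score_ranges(data)`; non-zero counts would point to corrupted rows rather than strong opinions.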

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns

In [None]:
# Dataset Describe
data.describe(include = 'all')

### Variables Description

- App – Name of the application reviewed.

- Translated_Review – User review text translated into English.

- Sentiment – Overall sentiment of the review (Positive, Negative, Neutral).

- Sentiment_Polarity – Numerical score indicating sentiment strength (-1 to 1).

- Sentiment_Subjectivity – Numerical score indicating subjectivity of the review (0 to 1).

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
data.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# step 1: Drop rows where Translated_Review or Sentiment is missing (dropna drops a row if any listed column is null).
data.dropna(subset = ['Translated_Review', 'Sentiment'], inplace = True)

In [None]:
data.head()

In [None]:
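# step 2: fill the missing values in Sentiment_Polarity and Sentiment_Subjectivity with the column mean.
# Assign the result back instead of using inplace=True on a column (chained assignment is deprecated in pandas).
data['Sentiment_Polarity'] = data['Sentiment_Polarity'].fillna(data['Sentiment_Polarity'].mean())
data['Sentiment_Subjectivity'] = data['Sentiment_Subjectivity'].fillna(data['Sentiment_Subjectivity'].mean())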

In [None]:
data.head(3)

In [None]:
# step 3 : strip surrounding whitespace and convert text columns to lower case
data['App'] = data['App'].str.strip().str.lower()
data['Translated_Review'] = data['Translated_Review'].str.strip().str.lower()
data['Sentiment'] = data['Sentiment'].str.strip().str.lower()


In [None]:
# step 4 : checking the outliers
# for this plotting a box plot
data.boxplot(column= 'Sentiment_Polarity')
plt.title('Sentiment_Polarity')
plt.show()

In [None]:
data.boxplot(column= 'Sentiment_Subjectivity')
plt.title('Sentiment_Subjectivity')
plt.show()

In [None]:
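# step 5 : Filter outliers
# First, check for impossible values: polarity must lie in [-1, 1] and subjectivity in [0, 1]
print("Out-of-range polarity values: ", (~data['Sentiment_Polarity'].between(-1, 1)).sum())
print("Out-of-range subjectivity values: ", (~data['Sentiment_Subjectivity'].between(0, 1)).sum())

# Calculate Q1, Q3 and the interquartile range (IQR) of polarity
Q1 = data['Sentiment_Polarity'].quantile(0.25)
Q3 = data['Sentiment_Polarity'].quantile(0.75)
IQR = Q3 - Q1

# Define outlier bounds (the box-plot whiskers): 1.5 * IQR beyond the quartiles
lower_whisker = Q1 - 1.5 * IQR
upper_whisker = Q3 + 1.5 * IQR

# Keep only rows whose polarity lies within the whiskers
data_no_outliers = data[(data['Sentiment_Polarity'] >= lower_whisker) & (data['Sentiment_Polarity'] <= upper_whisker)]

# Print how many rows were removed
print("Original rows: ", len(data))
print("Rows after removing outliers: ", len(data_no_outliers))
print("Rows removed: ", len(data) - len(data_no_outliers))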
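The IQR rule from step 5 can be wrapped in a small reusable function so the same bounds can be applied to any numeric column (for example, Sentiment_Subjectivity as well). This is a sketch: `iqr_bounds` and `filter_iqr` are hypothetical names, and the factor 1.5 mirrors the code above.

```python
import pandas as pd

def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple:
    """Return (lower, upper) whisker bounds: Q1 - k*IQR and Q3 + k*IQR (hypothetical helper)."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)

def filter_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose value in `column` lies inside the IQR bounds (inclusive)."""
    lower, upper = iqr_bounds(df[column], k)
    return df[df[column].between(lower, upper)]

# Toy example: the quartiles of [1, 2, 3, 4, 100] are 2 and 4, so IQR = 2
toy = pd.DataFrame({"score": [1, 2, 3, 4, 100]})
print(iqr_bounds(toy["score"]))                    # (-1.0, 7.0)
print(filter_iqr(toy, "score")["score"].tolist())  # [1, 2, 3, 4]
```

With the defaults, `filter_iqr(data, 'Sentiment_Polarity')` applies the same inclusive bounds as step 5. Because polarity is bounded in [-1, 1], values flagged by this rule are likely strong opinions rather than data errors, which is one reason to inspect them before dropping them.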


### What all manipulations have you done and insights you found?

Data Wrangling:

We performed data wrangling to prepare the dataset for analysis. This included dropping rows with missing Translated_Review or Sentiment values, filling any remaining nulls in Sentiment_Polarity and Sentiment_Subjectivity with the column mean, and standardizing text (stripping surrounding whitespace and converting to lower case). Duplicate rows were counted but not removed. Outliers in polarity were inspected with box plots and filtered with the IQR rule into `data_no_outliers`; the charts below use the full cleaned `data`.
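The wrangling steps above can be collected into one function so the same cleaning runs in a single call. This is a minimal sketch, not the code used for the charts: `clean_reviews` and its `drop_duplicates` flag are hypothetical, and dropping duplicates would change the review counts reported in later charts.

```python
import pandas as pd

TEXT_COLS = ["App", "Translated_Review", "Sentiment"]
SCORE_COLS = ["Sentiment_Polarity", "Sentiment_Subjectivity"]

def clean_reviews(df: pd.DataFrame, drop_duplicates: bool = False) -> pd.DataFrame:
    """Apply the notebook's wrangling steps to a copy of `df` (hypothetical helper)."""
    out = df.copy()
    # Step 1: drop rows with no review text or no sentiment label
    out = out.dropna(subset=["Translated_Review", "Sentiment"])
    # Step 2: fill remaining score gaps with the column mean (assignment, not inplace)
    for col in SCORE_COLS:
        out[col] = out[col].fillna(out[col].mean())
    # Step 3: strip surrounding whitespace and lower-case the text columns
    for col in TEXT_COLS:
        out[col] = out[col].str.strip().str.lower()
    # Optional: remove exact duplicate rows (the notebook itself only counts them)
    if drop_duplicates:
        out = out.drop_duplicates()
    return out.reset_index(drop=True)

# Toy rows for illustration only: three variants of one review plus a row with no text
raw = pd.DataFrame({
    "App": [" Duolingo", "Duolingo", "Duolingo ", "Helix Jump"],
    "Translated_Review": ["Great app", "great app", "Great app ", None],
    "Sentiment": ["Positive", "positive", "Positive", "Neutral"],
    "Sentiment_Polarity": [0.8, 0.8, 0.8, 0.0],
    "Sentiment_Subjectivity": [0.75, 0.75, None, 0.0],
})
print(len(clean_reviews(raw)))                        # 3
print(len(clean_reviews(raw, drop_duplicates=True)))  # 1
```

Normalizing before deduplicating means rows that differ only in case or surrounding spaces are treated as duplicates.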


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1   Countplot – Frequency of Sentiments (Univariate)

In [None]:
# Chart - 1 visualization
sns.countplot(x = 'Sentiment', data = data, palette = 'Set2')
plt.title('Frequency of sentiments')
plt.xlabel('Sentiments')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

We chose a countplot because:

- It is ideal for visualizing categorical data, like Sentiment (Positive, Negative, Neutral).

- It clearly shows the frequency (number of reviews) for each sentiment type.

- It's simple, yet effective for spotting patterns in sentiment distribution across customer reviews.

This chart helps answer:

   “What do users feel most often about the apps?”

##### 2. What is/are the insight(s) found from the chart?


✅ Positive sentiment is the most frequent — indicating overall user satisfaction.

⚠️ Negative sentiment still has a significant presence — suggesting pain points in the user experience.

🟡 Neutral sentiment is the smallest group — most users lean toward a clear opinion.

So, we see a dominant positive experience, but also noticeable dissatisfaction from a portion of users.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, absolutely. Here's how:

- Apps with high positive sentiment can be promoted more confidently in marketing.

- Apps or reviews with negative sentiment can be used to identify and fix product issues, improve UI/UX, or enhance customer support.

- Helps prioritize product updates based on what’s frustrating users.

Example: If many negative reviews mention crashes or bugs, fixing them directly improves user satisfaction and retention.

Are there any insights that lead to negative growth? Justify with specific reasons.
Yes — if negative sentiment is high or rising, it could indicate risk of negative growth, such as:

❌ Churn: Unhappy users are more likely to uninstall or stop using the app.

💬 Bad reputation: Poor sentiment spreads through reviews and reduces new user installs.

📉 Lower ratings: Negative reviews lower overall app store ratings, affecting visibility and trust.

Justification: If the company doesn’t act on user feedback reflected in negative sentiment, it can lead to bad reviews, low ratings, poor user experience, and revenue loss.

#### Chart - 2  Bar Chart: Top 10 Apps by Number of Reviews (Univariate)

In [None]:
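# Chart - 2 visualization code
# Count reviews per app and keep the 10 most-reviewed apps
top_apps = data['App'].value_counts().head(10)
top_apps.plot(kind='bar', figsize=(10, 6), color='blue')
plt.title('Top 10 Apps by Number of Reviews')
plt.xlabel('App')
plt.ylabel('Number of Reviews')
plt.xticks(rotation=45, ha='right')  # keep long app names readable
plt.show()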

##### 1. Why did you pick the specific chart?

We used a bar chart because:

- It's ideal for comparing the number of reviews across different app names.

- It clearly shows which apps are getting the most user engagement via reviews.

- It’s simple and direct — great for identifying top performers at a glance.

This helps answer:

   “Which apps are getting the most attention from users?”

##### 2. What is/are the insight(s) found from the chart?


- A small number of apps receive significantly more reviews than the rest.

- These top apps are likely more popular, widely used, or have more active user bases.

- Review count may reflect user engagement, both positive and negative.

For example, if “App A” has 5x more reviews than others, it could be a flagship app or experiencing rapid growth.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Here’s how:

📈 You can prioritize popular apps for investment, promotion, or premium features.

🔍 Analyze why top apps are successful (good UX? marketing?) and replicate that across other apps.

📢 Apps with high visibility (lots of reviews) can be promoted more — they already have strong user traction.

Strategic Action: Focus retention campaigns or ad spend on the top-performing apps.

🚨 Are there any insights that lead to negative growth? Justify with specific reason.
Potentially, yes. Here's how:

- If one of the top-reviewed apps has a large number of negative reviews, it could indicate a serious issue that users are loudly reporting.

- If a business relies too heavily on just a few popular apps, it risks revenue concentration — other apps may be underperforming or neglected.

- If the top apps receive reviews but no improvement in ratings or sentiment, the visibility might be driving dissatisfaction instead of growth.

Justification: High review volume without user satisfaction (seen via sentiment) might lead to bad PR, lower retention, and long-term brand damage.
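To check whether a heavily reviewed app is drawing mostly complaints, review volume can be paired with each app's sentiment mix. A minimal sketch, assuming the lower-cased labels produced in wrangling step 3; `sentiment_mix_for_top_apps` is a hypothetical helper and the toy rows are illustrative only.

```python
import pandas as pd

def sentiment_mix_for_top_apps(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Review count and sentiment shares (%) for the n most-reviewed apps (hypothetical helper)."""
    top = df["App"].value_counts().head(n).index
    subset = df[df["App"].isin(top)]
    # normalize="index" turns each app's row into proportions that sum to 1
    mix = pd.crosstab(subset["App"], subset["Sentiment"], normalize="index") * 100
    # Add the raw review count, aligned on the app name
    mix["reviews"] = subset["App"].value_counts()
    return mix.sort_values("reviews", ascending=False)

# Toy rows for illustration only
toy = pd.DataFrame({
    "App": ["a", "a", "a", "a", "b", "b", "c"],
    "Sentiment": ["positive", "negative", "negative", "negative", "positive", "positive", "neutral"],
})
print(sentiment_mix_for_top_apps(toy, n=2))
```

On the notebook data, `sentiment_mix_for_top_apps(data)` would show each top app's negative share next to its review count, so a popular app with a high complaint rate stands out.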

#### Chart - 3  Histogram of Sentiment_Polarity (Univariate)

In [None]:
# Chart - 3 visualization code
sns.histplot(data['Sentiment_Polarity'], bins=10, kde=True, color='green')  # kde=True overlays the smoothed density curve
plt.title('Distribution of sentiment polarity')
plt.xlabel('Sentiment polarity')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

We used a histogram with KDE because:

- It’s ideal for visualizing the distribution of a continuous numeric variable (Sentiment_Polarity).

- It helps us understand how user opinions are spread — are they more positive, negative, or neutral?

- The KDE curve shows the smoothed shape of the distribution — providing deeper insight than a basic histogram.

This chart answers:

“How is user sentiment polarity distributed across all reviews?”

##### 2. What is/are the insight(s) found from the chart?


- The polarity distribution is skewed toward positive values (more reviews around 0.1 to 1.0).

- There are fewer strongly negative reviews (polarity closer to -1.0).

- Many reviews are centered around neutral to mildly positive sentiment (close to 0.0 to 0.4).

- The peak in the KDE line shows where most review sentiments lie (e.g., around 0.2 or 0.3).

This means: users generally leave positive or slightly positive feedback on average.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, significantly. Here’s how:

- You learn that users are generally happy or satisfied, which is a positive brand indicator.

- It helps in benchmarking user satisfaction over time — track if sentiment improves after new updates.

- You can confidently promote your product with messaging like:

“Most users feel positively about our app!”

- Internally, you can use this distribution to train sentiment prediction models, validate review quality, or flag abnormal spikes in negativity.

🚨 Are there any insights that lead to negative growth? Justify with specific reason.
Yes — but conditionally:

- If there’s a significant spike in negative polarity (e.g., -0.8 to -1.0), that’s a warning signal.

- It indicates that some users are extremely dissatisfied, which can lead to:

     - Uninstalls

     - Bad word-of-mouth

     - Decline in ratings

Justification: If ignored, these users could impact your app’s overall reputation. It’s vital to analyze those negative reviews for recurring complaints.
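A spike of strongly negative reviews can be tracked as a number instead of read off the histogram. A minimal sketch; the cutoff of 0.8 is an illustrative assumption taken from the -0.8 to -1.0 range mentioned above, and `extreme_polarity_share` is a hypothetical helper.

```python
import pandas as pd

def extreme_polarity_share(polarity: pd.Series, cutoff: float = 0.8) -> dict:
    """Percentage of reviews at or beyond +/- cutoff (cutoff is an illustrative choice)."""
    polarity = polarity.dropna()
    return {
        "strongly_negative_pct": round(float((polarity <= -cutoff).mean() * 100), 1),
        "strongly_positive_pct": round(float((polarity >= cutoff).mean() * 100), 1),
    }

# Toy polarity values for illustration only
toy = pd.Series([-1.0, -0.9, -0.2, 0.0, 0.3, 0.5, 0.6, 0.9, 1.0, 0.1])
print(extreme_polarity_share(toy))  # {'strongly_negative_pct': 20.0, 'strongly_positive_pct': 20.0}
```

On the notebook data it would be called as `extreme_polarity_share(data['Sentiment_Polarity'])`; re-running it on future review exports would show whether the strongly negative share is growing.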

#### Chart - 4 Box Plot of Sentiment_Subjectivity (Univariate)

In [None]:
# Chart - 4 visualization code
# Horizontal box plot: the x-axis carries the subjectivity values (a box plot has no frequency axis)
sns.boxplot(x=data['Sentiment_Subjectivity'], color='green')
plt.title('Distribution of sentiment subjectivity')
plt.xlabel('Sentiment subjectivity')
plt.show()

##### 1. Why did you pick the specific chart?

We used a box plot because:

- It shows the spread and distribution of Sentiment_Subjectivity, a continuous numeric variable.

- It highlights key statistics: the median, the 1st and 3rd quartiles (the edges of the box), and whiskers that extend to the most extreme values within 1.5 × IQR of the box.

- Most importantly, it clearly identifies outliers, which are values far from the rest of the data.

This chart helps answer:
“Are there reviews that are highly objective or extremely subjective compared to the norm?”

##### 2. What is/are the insight(s) found from the chart?


- The median (middle line in the box) may fall around 0.5, meaning many reviews are moderately subjective.

- There's often a wide interquartile range (IQR) — reviewers express varied levels of opinion vs fact.

- Any dots outside the whiskers are outliers, showing extremely subjective reviews (values close to 1).

Interpretation: Most users give balanced opinions, but some reviews are very emotionally expressed (very subjective).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, here’s how:

- Understanding subjectivity helps evaluate review quality — highly subjective reviews may be less informative or more emotional.

- Platforms can flag overly subjective reviews for moderation, improving overall review reliability.

- Insights on subjectivity could improve sentiment models, helping businesses respond to feedback better.

- Business can focus on objective feedback for making product improvements.
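Separating lower-subjectivity reviews from highly subjective ones is a simple filter. The thresholds 0.3 and 0.8 below are hypothetical choices for illustration; the dataset does not define what counts as "objective" or "overly subjective". `subjectivity_group` is a hypothetical helper and the review texts are made up.

```python
import numpy as np
import pandas as pd

def subjectivity_group(subjectivity: pd.Series, low: float = 0.3, high: float = 0.8) -> pd.Series:
    """Label each review 'objective', 'mixed', or 'highly subjective' (illustrative thresholds)."""
    conditions = [subjectivity < low, subjectivity > high]
    choices = ["objective", "highly subjective"]
    # Rows matching neither condition fall back to the default label
    labels = np.select(conditions, choices, default="mixed")
    return pd.Series(labels, index=subjectivity.index, name="subjectivity_group")

# Toy rows for illustration only
toy = pd.DataFrame({
    "Translated_Review": ["crashes on login", "love it!!!", "okay overall", "worst app ever"],
    "Sentiment_Subjectivity": [0.0, 0.9, 0.5, 1.0],
})
toy["subjectivity_group"] = subjectivity_group(toy["Sentiment_Subjectivity"])
print(toy)
```

Filtering the notebook data to `subjectivity_group(data['Sentiment_Subjectivity']) == 'objective'` would give a smaller set of more factual reviews to mine for product issues.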

🚨 Are there any insights that lead to negative growth? Justify with specific reason.
Yes — possible concern:

- If the plot shows many reviews are extremely subjective (close to 1.0), it may mean:

   - Users are leaving emotion-driven feedback, not factual or constructive.

   - It may lead to bias in review analysis, affecting decision-making.

Justification: Business actions based on emotionally charged reviews (without substance) may mislead teams, hurting product direction or user trust.

#### Chart - 5 Histogram of Sentiment_Subjectivity (Univariate)

In [None]:
# Chart - 5 visualization code
sns.histplot(data['Sentiment_Subjectivity'], bins=10, kde=True, color='orange')  # kde=True overlays the smoothed density curve
plt.title('Distribution of sentiment subjectivity')
plt.xlabel('sentiment subjectivity')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

This histogram with a KDE curve was chosen because:

- It provides a clear view of how the Sentiment_Subjectivity values are distributed across all the reviews.

- Since subjectivity is a continuous variable ranging from 0 to 1, a histogram is ideal for understanding the frequency and spread of these values.

- The added KDE (Kernel Density Estimation) curve helps visualize the underlying distribution pattern more smoothly than just using bars.



##### 2. What is/are the insight(s) found from the chart?

- Most of the reviews have moderate to high subjectivity values (usually between 0.4 and 0.8).

- There are fewer reviews with very low subjectivity (i.e., close to 0), meaning very few users write fully objective or fact-based reviews.

- This implies that the majority of users share personal opinions, feelings, or experiences in their reviews rather than neutral descriptions.
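The share of reviews in a subjectivity band can be computed directly rather than estimated from the bar heights. A minimal sketch; the 0.4 to 0.8 band comes from the observation above, `share_in_range` is a hypothetical helper, and the values are toy data.

```python
import pandas as pd

def share_in_range(values: pd.Series, low: float, high: float) -> float:
    """Percentage of non-missing values within [low, high], rounded to one decimal."""
    values = values.dropna()
    return round(float(values.between(low, high).mean() * 100), 1)

# Toy subjectivity values for illustration only
toy = pd.Series([0.0, 0.1, 0.45, 0.5, 0.6, 0.7, 0.8, 0.95])
print(share_in_range(toy, 0.4, 0.8))  # 62.5
```

On the notebook data, `share_in_range(data['Sentiment_Subjectivity'], 0.4, 0.8)` would turn "most reviews" into an exact percentage.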

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅ Yes, the insights can help in several ways:

- Understanding that reviews are mostly subjective helps the business recognize that customer perception and emotion strongly influence the app ratings and feedback.

- This allows product teams and support teams to focus on emotional tone and user sentiment, improving the way they respond to feedback.

- Marketing and UX teams can use this to adjust communication strategies—ensuring they address user feelings and expectations rather than just functional details.

⚠️ Potentially, yes; there are some cautionary insights:

- If users are highly subjective, it means their feedback is more emotion-driven than fact-driven, which can be risky.

- A single negative experience (even minor) could lead to strongly negative reviews, impacting overall app rating.

- New users might form opinions based on emotional bias in reviews, possibly reducing trust in the app.

Justification:

For example, if one user had a slow loading experience and subjectively feels the app is “useless,” that sentiment—although not fully factual—could influence hundreds of potential users reading the review.

#### Chart - 6 Violin Plot: Polarity by Sentiment (Bivariate)

In [None]:
# Chart - 6 visualization code
sns.violinplot(x = 'Sentiment', y = 'Sentiment_Polarity', palette = 'cool', data = data)
plt.title('polarity by sentiment')
plt.xlabel('sentiment')
plt.ylabel('sentiment polarity')
plt.show()

##### 1. Why did you pick the specific chart?

A violin plot is ideal when comparing the distribution of a numerical variable (Sentiment_Polarity) across different categories (Sentiment).

- Unlike a boxplot, it also shows the density of values—how spread or concentrated they are.

- Here, we want to understand how sentiment polarity behaves for Positive, Neutral, and Negative sentiments.

So, the violin plot gives both:

- Shape of the distribution (like a KDE plot),

- Statistical summary (like a boxplot inside the violin).

##### 2. What is/are the insight(s) found from the chart?

- Positive sentiment reviews have polarity values mostly concentrated between 0.1 and 0.6, with some close to 1.0, meaning they express a moderate to strong positive opinion.

- Negative sentiment reviews have polarity values mostly between -1.0 and -0.1, indicating a strong to mild negative tone.

- Neutral sentiment reviews are flat at 0, indicating no polarity (as expected).

- The width of each violin shows how common a certain polarity range is within each sentiment type.

- Positive reviews have a wider and taller violin, suggesting a greater variety of positive opinions.

- Negative reviews are also varied but slightly more concentrated around -0.4 to -0.1.
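The violin plot's visual reading can be checked with a numeric summary of polarity per sentiment label. The sketch assumes the lower-cased labels produced in wrangling step 3; `polarity_by_label` is a hypothetical helper, and the toy rows are illustrative, not results from the dataset.

```python
import pandas as pd

def polarity_by_label(df: pd.DataFrame) -> pd.DataFrame:
    """Min, median, max, and count of Sentiment_Polarity for each sentiment label."""
    return (
        df.groupby("Sentiment")["Sentiment_Polarity"]
        .agg(["min", "median", "max", "count"])
        .sort_values("median")
    )

# Toy rows for illustration only
toy = pd.DataFrame({
    "Sentiment": ["negative", "negative", "neutral", "neutral", "positive", "positive", "positive"],
    "Sentiment_Polarity": [-0.9, -0.2, 0.0, 0.0, 0.1, 0.5, 1.0],
})
print(polarity_by_label(toy))
```

If, on the notebook data, the `positive` row showed a minimum below 0 (or the `negative` row a maximum above 0), that would point to labels that disagree with the polarity score.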

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅ Positive Business Impact:
- This plot validates the sentiment classification — polarity values align correctly with their sentiment labels.

- Product and customer teams can understand:

   - What range of polarity corresponds to each sentiment label.

   - If the NLP model or sentiment detection tool is working as expected.

- It helps in targeted response strategies:

   - Highly negative polarity (e.g., ≤ -0.6) → critical feedback → prioritize response.

   - Slightly negative → mild dissatisfaction → can be converted into loyalty.

   - Strong positive polarity → identify happy customers for testimonials or marketing.
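The response strategy above can be expressed as a rule that maps polarity to a priority tier. Only the -0.6 cutoff comes from the text; the 0.6 cutoff for "strong positive" and the tier names are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

def response_tier(polarity: pd.Series) -> pd.Series:
    """Map polarity to an illustrative response tier (only the -0.6 cutoff is from the text)."""
    conditions = [
        polarity <= -0.6,  # highly negative: critical feedback
        polarity < 0,      # slightly negative: mild dissatisfaction
        polarity >= 0.6,   # strong positive (assumed cutoff): testimonial candidates
    ]
    choices = ["prioritize response", "follow up", "marketing candidate"]
    # np.select takes the first matching condition, so the order above matters
    return pd.Series(np.select(conditions, choices, default="monitor"),
                     index=polarity.index, name="response_tier")

# Toy polarity values for illustration only
toy = pd.Series([-0.8, -0.3, 0.0, 0.4, 0.9])
print(response_tier(toy).tolist())
# ['prioritize response', 'follow up', 'monitor', 'monitor', 'marketing candidate']
```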

⚠️ Potential Negative Growth Insight:
- The density of negative polarity around -0.2 to -0.5 shows many reviews are not very harsh, but still dissatisfied.

- If not addressed, these moderately negative sentiments can pile up and affect overall ratings over time.

- Neutral reviews are relatively few and sit flat at polarity 0, so most feedback carries a clear positive or negative tone; this means ratings might be highly emotional and need careful handling.

#### Chart - 7  Scatter Plot: Subjectivity vs Polarity by Sentiment (Multivariate)

In [None]:
# Chart - 7 visualization code

sns.scatterplot(x='Sentiment_Subjectivity', y='Sentiment_Polarity', hue='Sentiment', data=data, alpha=0.6)

plt.title('Subjectivity VS Polarity')
plt.xlabel('Subjectivity')
plt.ylabel('Polarity')
plt.grid(True)
plt.show()



##### 1. Why did you pick the specific chart?

A scatter plot is ideal for visualizing the relationship between two numerical variables:

   - Sentiment_Subjectivity (X-axis)

   - Sentiment_Polarity (Y-axis)

- We also added color encoding (hue='Sentiment') to show the sentiment class (Positive, Neutral, Negative), making this a multivariate plot.

This chart helps answer:
➤ “Do subjective reviews tend to be more positive or negative?”
➤ “Where do neutral reviews fall on the subjectivity-polarity scale?”



##### 2. What is/are the insight(s) found from the chart?

Positive Reviews (Blue):

- Concentrated in the upper half (polarity > 0).

- Most are subjective (>0.4), and many reach strong positive polarity (~1.0).

- Shows users expressing positive emotions in personal/opinionated ways.

Negative Reviews (Green):

- Spread in the lower half (polarity < 0).

- Also highly subjective — clustered from 0.4 to 1.0.

- Indicates users are emotionally expressive when frustrated or dissatisfied.

Neutral Reviews (Orange):

- Almost all points lie at polarity = 0 (Y=0 line), regardless of subjectivity.

- Suggests neutral reviews have no emotional tone, even if subjective (e.g., "App is okay").

There’s no strong linear correlation between subjectivity and polarity, but:

- Strong polarity (±) is usually paired with high subjectivity.

- Objective reviews are rare and mostly neutral.
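The observation that strong polarity in either direction goes with high subjectivity, while polarity itself shows no strong linear trend, can be checked by correlating subjectivity with both polarity and its absolute value. A minimal sketch on toy values built to show that pattern (they are not dataset results); `subjectivity_correlations` is a hypothetical helper.

```python
import pandas as pd

def subjectivity_correlations(df: pd.DataFrame) -> dict:
    """Pearson correlation of subjectivity with polarity and with |polarity|."""
    subj = df["Sentiment_Subjectivity"]
    pol = df["Sentiment_Polarity"]
    return {
        "with_polarity": round(float(subj.corr(pol)), 2),
        "with_abs_polarity": round(float(subj.corr(pol.abs())), 2),
    }

# Toy data: opinions in both directions grow more subjective as they get stronger
toy = pd.DataFrame({
    "Sentiment_Polarity": [-1.0, -0.5, 0.0, 0.5, 1.0],
    "Sentiment_Subjectivity": [1.0, 0.5, 0.0, 0.5, 1.0],
})
print(subjectivity_correlations(toy))
```

On these toy values the first correlation is about 0 and the second is 1, which is the signature of a V-shaped relationship. On the notebook data you would call `subjectivity_correlations(data)`.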

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

💡 Positive Business Impact:
- Helps identify high-emotion customers — both happy and unhappy:

- Positive + high subjectivity → potential brand advocates or reviewers

- Negative + high subjectivity → need for quick support or retention efforts

This enables:

- Prioritized response strategy

- Better understanding of user tone vs. sentiment

- More accurate sentiment-driven feedback analysis

⚠️ Insights That Could Signal Negative Growth:
Concentration of highly subjective, negative reviews:

- These can amplify dissatisfaction and influence others’ perception.

- Even small bugs or delays could cause overly harsh feedback.

- If unaddressed, they could lead to loss of new users or bad app ratings.

Justification:

A subjective review like “Worst app ever! Never using it again!” with polarity near -1.0 will strongly impact new users' perception — even if the issue is fixable.

#### Chart - 8  Bar Plot: Average Sentiment_Polarity vs App (Bivariate)

In [None]:
# Chart - 8 visualization code
# Keep only the 10 most-reviewed apps
top_apps = data['App'].value_counts().head(10).index
data_top = data[data['App'].isin(top_apps)]

# Group by app and calculate average polarity
avg_polarity = data_top.groupby('App')['Sentiment_Polarity'].mean().reset_index()

# Plot
sns.barplot(x='App', y='Sentiment_Polarity', data=avg_polarity)
plt.xticks(rotation=45, ha='right')  # keep long app names readable
plt.title('Average Polarity for Top Apps')
plt.xlabel('App')
plt.ylabel('Average Polarity')
plt.show()


##### 1. Why did you pick the specific chart?

The bar chart was chosen because:

- It clearly compares numerical values (average sentiment polarity) across categorical variables (app names).

- Bar charts are effective in showing differences in averages between groups.

- This chart lets us quickly identify which apps have the most positive or neutral/negative feedback on average.

##### 2. What is/are the insight(s) found from the chart?

Here are the insights from the chart:

🔹 Positive Insights:
“10 Best Foods for You” has the highest average polarity, meaning reviews are very positive.

Apps like "Calorie Counter - Macros", "Candy Crush Saga", and "Duolingo" also show a generally positive reception.

🔹 Neutral/Negative Insights:
“8 Ball Pool” and “Angry Birds Classic” show very low or slightly negative polarity, indicating users gave neutral or mildly negative reviews.

Apps like “Bowmasters” and “Garena Free Fire” have relatively low polarity, suggesting mixed or average feedback.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

🔸 For Positive-Polarity Apps:
- Capitalize on Strengths: Apps like "10 Best Foods for You" can use this insight to market their positive user experience, potentially increasing installs.

- Identify Best Practices: Developers of other apps can study what these apps are doing right—UI, functionality, engagement—to replicate their success.

🔸 For Low-Polarity Apps:
- User Experience Improvements: Apps like “8 Ball Pool” or “Angry Birds Classic” can be analyzed to understand why sentiment is low—bugs, outdated design, ads, or user frustration.

- Feature Updates: Developers may introduce improvements based on feedback, which could boost ratings and retention.

**Are there any insights that lead to negative growth? Justify with specific reason.**

Yes — the apps with lower or near-zero polarity (like “8 Ball Pool” or “Angry Birds Classic”) may face:

🔻 Risks:
- Loss of Active Users: Poor or neutral sentiment may indicate user dissatisfaction, which could lead to uninstalls or reduced engagement.

- Negative Reviews Visibility: Bad reviews can hurt app store visibility, discouraging new users from downloading.

- Revenue Decline: Less engagement can also reduce ad impressions or in-app purchases.

✅ Justification:
An average Sentiment_Polarity near 0 or below means users are giving neutral or negative feedback. This is a clear signal to re-evaluate the app experience.



#### Chart - 9  Pie Chart : On sentiment (Univariate)

In [None]:
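# Chart - 9 visualization code
sentiment_share = data['Sentiment'].value_counts()
# Assign colors by label, not by position, so each sentiment keeps a consistent color
color_map = {'positive': 'lightgreen', 'neutral': 'lightblue', 'negative': 'lightcoral'}
colors = [color_map.get(str(label).lower(), 'lightgray') for label in sentiment_share.index]
sentiment_share.plot.pie(autopct='%1.1f%%', startangle=90, colors=colors)
plt.title('Sentiment Proportion')
plt.ylabel('')
plt.show()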


##### 1. Why did you pick the specific chart?

- The pie chart was chosen because it clearly shows the proportion of each sentiment type (positive, negative, neutral) in a visual and easy-to-interpret way.

- It’s an effective choice when the goal is to compare parts to the whole.

- The percentages make it simple to quickly see which sentiment dominates.

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
- Positive sentiment dominates, accounting for 64.1% of reviews.

- Negative sentiment accounts for 22.1%, showing that while most feedback is good, there is still a significant amount of dissatisfaction.

- Neutral sentiment makes up 13.8%, meaning a portion of users are indifferent or undecided.
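The percentages above are read from the `autopct` labels on the pie. A minimal sketch for getting the same shares as numbers, assuming `Sentiment` holds Positive/Negative/Neutral labels in any letter case; the helper name `sentiment_shares` and the toy rows are hypothetical, invented only to show the output shape. On the full dataset the three shares should add up to 100% (64.1 + 22.1 + 13.8 = 100.0).

```python
import pandas as pd


def sentiment_shares(df: pd.DataFrame) -> pd.Series:
    """Percentage of reviews in each Sentiment class, rounded to one decimal place."""
    # Normalize case so 'positive' and 'Positive' are counted together
    labels = df['Sentiment'].dropna().str.capitalize()
    return (labels.value_counts(normalize=True) * 100).round(1)


# Toy rows invented for illustration; in the notebook call sentiment_shares(data)
toy = pd.DataFrame({'Sentiment': ['Positive', 'positive', 'Negative', 'Neutral', None]})
shares = sentiment_shares(toy)
print(shares)
```

Missing labels are dropped before counting, so the shares describe labeled reviews only.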

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

✅ Yes, definitely.

- The high positive sentiment is a strong marketing point — companies can highlight this in promotions.

- It indicates a generally good customer experience, which can attract new users and improve retention.

- The presence of 22.1% negative feedback also provides a clear improvement opportunity — fixing common complaints can further boost the positive share.

**Are there any insights that lead to negative growth? Justify with a specific reason.**

⚠ Yes.

- The negative sentiment (22.1%) could hinder growth if not addressed.

- In app stores, a high proportion of negative reviews may reduce ratings, which can lower visibility in search results and discourage downloads.

- Neutral reviews (13.8%) also represent missed opportunities — converting them to positive through better features, usability, or customer support can help improve overall perception.

#### Chart - 10 Bar plot : Top Reviewed Apps (Bivariate)

In [None]:
top_reviewed = data['App'].value_counts().head(10)
sns.barplot(x=top_reviewed.index, y=top_reviewed.values)
plt.xticks(rotation=45)
plt.title('Top 10 Most Reviewed Apps')
plt.xlabel('App')
plt.ylabel('Number of Reviews')
plt.show()


##### 1. Why did you pick the specific chart?

- A bar chart is ideal for comparing the number of reviews across different apps.

- Since App is a categorical variable and review count is numerical, this format provides a clear, easy-to-read comparison.

- It helps identify the most engaged or popular apps based on review volume.



##### 2. What is/are the insight(s) found from the chart?

🔹 Popularity Ranking by Review Volume:
- “Bowmasters” stands out with the highest number of reviews, indicating high user engagement or install base.

- “Helix Jump” and “Angry Birds Classic” follow closely behind—these are well-known games likely to attract many users.

- Apps like “Calorie Counter - Macros” and “10 Best Foods for You” are among the lowest in the top 10—though still strong performers, they may cater to niche audiences.

🔹 Gaming Apps Dominate:
- A majority of apps in the top 10 are games (e.g., Bowmasters, Helix Jump, Angry Birds, Free Fire, 8 Ball Pool), highlighting that gaming apps tend to receive more user interaction and feedback.

🔹 Health and Utility Apps Are Present:
- Apps like “Calorie Counter - MyFitnessPal” and “Duolingo” also rank highly, showing that not just entertainment but functional apps with daily utility also generate user engagement.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

🔸 Positive Opportunities:
- A high review count suggests high engagement: apps with more reviews are likely reaching wider audiences. These insights can help teams:

    - Prioritize investment in these top apps for feature expansion, monetization strategies, or premium features.

    - Use their popularity to cross-promote other apps in the developer portfolio.

    - Benchmark successful apps to understand what drives user interaction (e.g., gamification, notifications, visual design).

- Review mining for top apps can reveal user expectations and drive future design improvements.

🔸 Improvement Potential:
- Lower-end apps in the top 10, like “10 Best Foods for You”, may still show great sentiment but have fewer reviews. Encouraging users to review could help with visibility.

- Negative feedback in high-volume apps (e.g., Bowmasters) can be amplified by the larger user base, so these apps require continuous review monitoring.

**Yes, potential risks exist even among highly reviewed apps:**

🔻 Risk Factors:
- High number of reviews ≠ high satisfaction. If apps like “Bowmasters” or “Helix Jump” have high review volume but low sentiment polarity (check previous chart), it could signal widespread user frustration.

- Visibility magnifies negativity. Apps with many users and reviews also have higher chances of receiving public criticism. If not addressed, these can harm brand reputation and reduce new installs.

- Gaming apps may churn faster: users might leave if updates aren’t frequent or monetization is too aggressive (e.g., too many ads or paywalls).

#### Chart - 11 Histogram : Sentiment_Polarity (Univariate)

In [None]:
sns.histplot(data['Sentiment_Polarity'], kde=True, bins=30, color='skyblue')
plt.title('Distribution of Sentiment Polarity')
plt.xlabel('Polarity')
plt.ylabel('Frequency')
plt.show()


##### 1. Why did you pick the specific chart?

- The histogram with KDE plot is ideal for visualizing the distribution of a single continuous variable, in this case, Sentiment_Polarity.

- This is a univariate plot, chosen to:

    - Understand the spread and shape of polarity values across all reviews.

    - Identify if reviews are mostly positive, negative, or neutral.

    - Detect skewness, peaks, and frequency patterns in sentiment strength.

##### 2. What is/are the insight(s) found from the chart?

🔹 Centered Near Zero:

- There’s a sharp peak around 0, which means many reviews have neutral sentiment polarity (neither strongly negative nor positive).

- This could mean a lot of users gave short or vague reviews, or mixed feedback.

🔹 Right Skewed Toward Positive:

- The distribution is right-skewed: positive polarity values (between 0 and 1.0) occur far more often than negative ones, and extreme negative values are rare.

- This suggests the dataset is generally tilted toward positive sentiment, supporting earlier pie chart findings (~64% positive).

🔹 Fewer Strongly Negative Reviews:

- There are fewer reviews with polarity near -1, indicating that very negative reviews are less common.

- This might indicate users are either relatively satisfied or hesitant to leave very harsh feedback.
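The three observations above (a spike at 0, a right skew, and few strongly negative reviews) can be checked numerically instead of by eye. A sketch, assuming `Sentiment_Polarity` is a float column in [-1, 1]; the helper name `polarity_profile`, the -0.5 cut-off for "strongly negative", and the toy values are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def polarity_profile(polarity: pd.Series, strong_negative: float = -0.5) -> dict:
    """Summarize the shape of a Sentiment_Polarity column."""
    p = polarity.dropna()
    return {
        'share_exactly_zero': (p == 0).mean(),       # reviews scored as perfectly neutral
        'share_positive': (p > 0).mean(),
        'share_negative': (p < 0).mean(),
        'share_strongly_negative': (p <= strong_negative).mean(),  # cut-off is an assumption
        'skewness': p.skew(),                        # > 0 means a longer right (positive) tail
    }


# Toy values invented for illustration; in the notebook pass data['Sentiment_Polarity']
toy = pd.Series([0.0, 0.0, 0.0, 0.2, 0.5, 0.8, 1.0, -0.1, -0.6, None])
profile = polarity_profile(toy)
for name, value in profile.items():
    print(f'{name}: {value:.3f}')
```

A large `share_exactly_zero` would support the "short or vague reviews" reading, and a positive `skewness` would confirm the right skew seen in the histogram.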



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

✅ Yes — here’s how:
- Product confidence: A large number of reviews near or above 0 shows that most users had neutral to positive experiences, which reinforces user trust.

- Data-driven marketing: This insight validates that the app experience is well-received — useful for app store promotion or user testimonials.

- Target neutral users: The large spike at 0 suggests there’s a big group of “meh” users. These can be targeted with nudges (onboarding tips, reminders, loyalty offers) to shift them into the positive segment.

Yes, potential risks include:

🔻 High Neutrality:
- A strong peak at polarity = 0 suggests many users are not emotionally engaged. This can indicate:

    - Reviews lacking enthusiasm

    - Users feeling indifferent

- If not addressed, it could translate into low retention or passive churn, since emotionally neutral users are less likely to advocate or return.

🔻 Missed feedback from negatives:

- Though there are fewer strongly negative reviews, they still exist. If left unaddressed, these can affect app store ratings or user retention.

#### Chart - 12  Count plot : Sentiment VS App (Bivariate)

In [None]:
top_apps = data['App'].value_counts().head(5).index
filtered = data[data['App'].isin(top_apps)]
sentiment_counts = filtered.groupby(['App', 'Sentiment']).size().unstack()

sentiment_counts.plot(kind='bar', stacked=False)
plt.title('Sentiment Count per Top 5 Apps')
plt.xlabel('App')
plt.ylabel('Number of Reviews')
plt.xticks(rotation=45)
plt.legend(title='Sentiment')
plt.show()




##### 1. Why did you pick the specific chart?

We used a grouped bar chart because:

- It allows clear comparison of multiple categories (positive, neutral, negative sentiments) across a group (top 5 apps).

- It’s intuitive and effective for visualizing discrete categorical data side by side.

- Grouped bars help highlight the variation in sentiment for each app, which is valuable for product or marketing decisions.

##### 2. What is/are the insight(s) found from the chart?

Several actionable insights are derived:

✅ Positive sentiment dominates for all top apps — a healthy sign of user satisfaction.

✅ Helix Jump and Duolingo have the most positive reviews, indicating good user experience and satisfaction.

⚠️ Angry Birds Classic has a high negative sentiment count — which is a red flag compared to others.

🚫 Neutral sentiment is minimal across all apps, meaning users tend to leave clearly polarized feedback.

📈 Apps like Calorie Counter and Bowmasters have strong positive-to-negative sentiment ratios, suggesting they are well-received.
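Raw counts favor apps with more reviews, so the positive-to-negative comparison above is easier to judge on normalized shares. A sketch, assuming the `App` and `Sentiment` columns used in the chart code; the helper `sentiment_mix`, the `pos_to_neg` column, and the toy rows are hypothetical.

```python
import pandas as pd


def sentiment_mix(df: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Sentiment shares and positive-to-negative ratio for the top_n most-reviewed apps."""
    top_apps = df['App'].value_counts().head(top_n).index
    subset = df[df['App'].isin(top_apps)]
    labels = subset['Sentiment'].str.capitalize()  # 'positive' -> 'Positive'
    counts = pd.crosstab(subset['App'], labels)
    # Row-normalized shares: each app's row sums to 1, so apps of different size compare fairly
    mix = counts.div(counts.sum(axis=1), axis=0)
    # Ratio is inf for an app with no negative reviews
    mix['pos_to_neg'] = counts.get('Positive', 0) / counts.get('Negative', 0)
    return mix.sort_values('pos_to_neg', ascending=False)


# Toy rows invented for illustration; in the notebook call sentiment_mix(data)
toy = pd.DataFrame({
    'App': ['A'] * 4 + ['B'] * 4 + ['C'] * 2,
    'Sentiment': ['Positive', 'Positive', 'Positive', 'Negative',
                  'Positive', 'Negative', 'Positive', 'Negative',
                  'Positive', 'Neutral'],
})
mix = sentiment_mix(toy)
print(mix.round(2))
```

Sorting by `pos_to_neg` puts the best-received apps first; apps with a high `Negative` share (such as the Angry Birds Classic case noted above) sink to the bottom.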

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, absolutely. Here's how:

🎯 Product teams can focus on maintaining features that drive positive reviews in apps like Helix Jump and Duolingo.

🛠️ Customer support or QA teams can investigate and fix issues in apps with higher negative sentiment (e.g., Angry Birds Classic).

📣 Marketing teams can highlight top-rated apps to increase user trust and app downloads.

📊 Sentiment insights help allocate resources for improvement and promotion effectively.

Yes, one key negative insight is:

⚠️ Angry Birds Classic has a notably high count of negative reviews despite being among the most-reviewed apps.

- Justification: This suggests that while the app is popular, many users are dissatisfied, which can harm long-term brand value and user retention.

- It may also reduce ratings and discourage new users from installing it, impacting business growth.



#### Chart - 13  WordCloud : Positive review (Univariate)

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Case-insensitive match so both 'Positive' and 'positive' labels are selected
positive_reviews = ' '.join(data[data['Sentiment'].str.lower() == 'positive']['Translated_Review'].dropna())

wordcloud = WordCloud(width=800, height=400, background_color='white').generate(positive_reviews)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud of Positive Reviews')
plt.show()


##### 1. Why did you pick the specific chart?

- A word cloud is ideal for visualizing text data—it helps quickly identify the most frequently used words in positive reviews.

- It provides a visual summary of customer sentiment without reading individual reviews.

- It’s an exploratory tool that’s useful early in analysis to spot recurring themes, feature mentions, or user emotions.

##### 2. What is/are the insight(s) found from the chart?

From the word cloud:

- Most frequent words: "great," "good," "love," "game," "time," "make," "work," "need".

- Users strongly appreciate the game aspect ("game," "play") and performance ("work," "fast," "easy").

- Emotionally positive words like "love," "great," "awesome," "thank" show genuine customer satisfaction.

- Some users also use suggestive terms like "need," "update," "fix," "add," "better", even in positive reviews—implying constructive feedback.
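Word size in a word cloud is only a rough visual cue. A sketch for listing the exact top words in positive reviews, assuming the `Sentiment` and `Translated_Review` columns. The short `STOP` word list, the tokenizing regex, and the helper name `top_words` are illustrative assumptions; `wordcloud.STOPWORDS` is a fuller stop-word list that could be swapped in.

```python
import re
from collections import Counter

import pandas as pd

# Small illustrative stop-word list (an assumption); wordcloud.STOPWORDS is a fuller option
STOP = {'the', 'a', 'an', 'and', 'is', 'it', 'to', 'i', 'this', 'of', 'for', 'in', 'but', 'my', 'very'}


def top_words(df: pd.DataFrame, sentiment: str = 'positive', n: int = 10) -> list:
    """Return the n most frequent non-stop-words in reviews with the given sentiment label."""
    mask = df['Sentiment'].str.lower() == sentiment.lower()
    counts = Counter()
    for review in df.loc[mask, 'Translated_Review'].dropna():
        tokens = re.findall(r"[a-z']+", review.lower())
        counts.update(t for t in tokens if t not in STOP)
    return counts.most_common(n)


# Toy rows invented for illustration; in the notebook call top_words(data)
toy = pd.DataFrame({
    'Sentiment': ['Positive', 'Positive', 'Positive', 'Negative'],
    'Translated_Review': ['Great game, love it', 'Love this game, great fun',
                          'Need an update to fix lag', 'Terrible, crashes'],
})
words = top_words(toy, n=5)
print(words)
```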

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Here’s how:

- The frequent use of "love," "great," "fun" indicates strong user engagement and satisfaction, which is key to retention and organic growth.

- Developers can continue investing in performance and game experience, as these are well appreciated.

- Positive themes can be used in marketing campaigns—e.g., “Loved by thousands of users”, “Great performance,” etc.

- Product managers can take note of repeated user wishes, like “update,” “need,” “fix,” and prioritize future improvements.

**Are there any insights that lead to negative growth? Justify with a specific reason.**

Even in a positive word cloud, certain repeated suggestive words raise flags:

- Words like "need," "update," "fix," "issue" indicate users have expectations for improvements.

- These aren’t complaints yet, but unattended needs could become negative sentiments over time.

- If users keep requesting the same features or fixes and don’t see progress, it may lead to frustration or churn.

So, while there’s no immediate negative growth signal, ignoring these hints may risk future dissatisfaction.
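One way to keep an eye on these requests is to measure how often they appear in positive reviews and re-run the check on each new batch of reviews. A sketch, assuming the same columns as the word cloud code; the tuple `REQUEST_WORDS` is taken from the terms named in this section, while `request_share` and the toy rows are hypothetical. Matching is whole-word, so "needs" does not count as "need"; loosen the pattern if that matters.

```python
import pandas as pd

# Request words named in the analysis above; extend the tuple as needed
REQUEST_WORDS = ('need', 'update', 'fix', 'add', 'better', 'issue')


def request_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of positive reviews mentioning each request word (whole word, case-insensitive)."""
    is_positive = df['Sentiment'].str.lower() == 'positive'
    positive = df.loc[is_positive, 'Translated_Review'].dropna()
    shares = {
        word: positive.str.contains(rf'\b{word}\b', case=False, regex=True).mean()
        for word in REQUEST_WORDS
    }
    return pd.Series(shares).sort_values(ascending=False)


# Toy rows invented for illustration; in the notebook call request_share(data)
toy = pd.DataFrame({
    'Sentiment': ['Positive', 'Positive', 'Positive', 'Negative'],
    'Translated_Review': ['Great app, please fix the login issue', 'Love it, need dark mode',
                          'Nice update', 'Fix this now'],
})
shares = request_share(toy)
print(shares)
```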

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Select only numeric columns
numeric_data = data.select_dtypes(include=['int64', 'float64'])

# Compute the correlation matrix
corr_matrix = numeric_data.corr()

# Plot the heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap of Numeric Features')
plt.show()

##### 1. Why did you pick the specific chart?

The correlation heatmap is ideal for:

- Identifying linear relationships between numeric variables.

- Detecting strong positive or negative correlations between the numeric sentiment scores (Sentiment_Polarity and Sentiment_Subjectivity).

- Reducing redundancy in features during feature selection for modeling.

##### 2. What is/are the insight(s) found from the chart?

Here is what the chart can and cannot show for this dataset:

**Columns included:**

- The dataset overview lists Sentiment_Polarity and Sentiment_Subjectivity as its numeric columns (plus any numeric column derived during preprocessing, such as Sentiment_Numeric), so the heatmap compares only these.

- Rating, Installs, Reviews, and Price are not among this dataset's variables, so relationships such as Reviews vs. Installs or Price vs. Installs cannot be read from this chart.

**Sentiment_Polarity vs. Sentiment_Subjectivity:**

- The coefficient shows whether more opinionated reviews also tend to be more positive or more negative in tone.

- As the pair plot below also shows, there is no strong linear relationship between the two, so each score adds separate information.
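The heatmap shows Pearson coefficients (the default of `DataFrame.corr`), which capture only linear association. A sketch that also computes a Spearman (rank) coefficient, which would pick up a monotonic but non-linear pattern that a scattered plot can hide. It assumes the two sentiment columns from the dataset overview; the helper name `polarity_subjectivity_corr` and the toy values are hypothetical.

```python
import pandas as pd


def polarity_subjectivity_corr(df: pd.DataFrame) -> dict:
    """Pearson (linear) and Spearman (rank-based, monotonic) correlation of the two sentiment scores."""
    pair = df[['Sentiment_Polarity', 'Sentiment_Subjectivity']].dropna()
    return {
        'pearson': pair.corr(method='pearson').iloc[0, 1],
        'spearman': pair.corr(method='spearman').iloc[0, 1],
    }


# Toy values: subjectivity = polarity ** 2, a monotonic but non-linear pattern
toy = pd.DataFrame({
    'Sentiment_Polarity': [0.0, 0.1, 0.2, 0.3, 1.0],
    'Sentiment_Subjectivity': [0.0, 0.01, 0.04, 0.09, 1.0],
})
corr = polarity_subjectivity_corr(toy)
print(corr)
```

In the toy data the ranks agree perfectly, so Spearman is 1 while Pearson stays below 1. If both coefficients are small on the real data, that supports the "no strong relationship" reading.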


#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
import seaborn as sns
import matplotlib.pyplot as plt

# Select the numeric columns
numeric_cols = ['Sentiment_Polarity', 'Sentiment_Subjectivity']
subset = data[numeric_cols].dropna()

# Plot the pair plot
sns.pairplot(subset, corner=True, diag_kind='kde', plot_kws={'alpha': 0.5})
plt.suptitle('Pair Plot of Sentiment Metrics', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

The pair plot is chosen because it visually displays pairwise relationships between numerical variables in a dataset. Since the dataset includes numerical sentiment metrics (Sentiment_Polarity, Sentiment_Subjectivity), this plot helps explore correlations, patterns, and distributions all at once. It's especially helpful to check:

- Clustering or trends,

- Strength of relationships, and

- Potential anomalies.



##### 2. What is/are the insight(s) found from the chart?

- Most Sentiment_Polarity values cluster around 0, indicating many neutral or mixed reviews.

- Sentiment_Subjectivity tends to concentrate near 0.5 and 1.0, suggesting reviews are either clearly subjective (personal opinion) or moderately subjective.

- No strong linear relationship is visible between polarity and subjectivity — the plot appears scattered.

- The diagonal KDE plots show that:

    - Polarity is skewed toward the positive side.

    - Subjectivity has two peaks (near 0.5 and 1.0), possibly indicating two common styles of reviews (moderately vs. strongly opinionated).
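The two subjectivity peaks described above can be located with a simple binned count rather than read off the KDE. A sketch, assuming `Sentiment_Subjectivity` lies in [0, 1]; ten equal-width bins, the helper name `subjectivity_peaks`, and the toy values are illustrative choices.

```python
import pandas as pd


def subjectivity_peaks(subjectivity: pd.Series, n_bins: int = 10, top: int = 2) -> pd.Series:
    """Counts for the `top` most populated equal-width subjectivity bins on [0, 1]."""
    edges = [i / n_bins for i in range(n_bins + 1)]
    # include_lowest=True keeps reviews with subjectivity exactly 0 in the first bin
    binned = pd.cut(subjectivity.dropna(), bins=edges, include_lowest=True)
    return binned.value_counts().head(top)


# Toy values invented for illustration; in the notebook pass data['Sentiment_Subjectivity']
toy = pd.Series([0.5, 0.5, 0.5, 0.55, 1.0, 1.0, 1.0, 0.2])
peaks = subjectivity_peaks(toy)
print(peaks)
```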

In [None]:
data.columns

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

1. Enhance Customer Experience Using Sentiment Analysis
Findings from data:

- The data contains sentiment labels (Positive, Neutral, Negative) and sentiment polarity/subjectivity scores.

- Most reviews carry a clear positive or negative label (64.1% Positive and 22.1% Negative), although many polarity scores sit close to 0.

Suggestions:

- Actively monitor negative reviews (using Sentiment_Numeric = -1 or low polarity scores).

- Set up alerts for dips in sentiment polarity, especially after updates.

- Add feedback forms in the app to capture issues before users go to the Play Store.

2. Improve App Features Based on Word Cloud Insights
Findings from Word Cloud:

- Frequent words in positive reviews: “great”, “love”, “good”, “easy”, “game”, “need”, “update” (the last two usually signal improvement requests rather than praise).

Suggestions:

- Highlight and retain these appreciated features.

- Build marketing campaigns around these strengths (e.g., "Enjoy our easy-to-use and fun app experience").

- Prioritize what users repeatedly praise (e.g., “work”, “free”) and act on the requests that appear even in positive reviews (e.g., “need”, “update”, “fix”).

3. Use Subjectivity to Understand Opinion Intensity
From Pair Plot and Heatmap:

- Some users give highly subjective opinions (close to 1.0), while others are more objective.

- No strong linear correlation between polarity and subjectivity — both provide unique insights.

Suggestions:

- Use high subjectivity reviews for user stories/testimonials.

- Treat objective but negative reviews as serious product issues.
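The two suggestions above amount to filters on the polarity and subjectivity scores. A sketch, assuming both score columns plus `App`. The cut-offs (subjectivity at least 0.7 and polarity at least 0.5 for testimonials; subjectivity at most 0.3 and polarity below 0 for product issues) and the helper name `split_reviews` are hypothetical and should be tuned by reading sample reviews.

```python
import pandas as pd

# Illustrative cut-offs (assumptions); tune them by reading sample reviews
TESTIMONIAL_MIN_SUBJECTIVITY = 0.7
TESTIMONIAL_MIN_POLARITY = 0.5
OBJECTIVE_MAX_SUBJECTIVITY = 0.3


def split_reviews(df: pd.DataFrame) -> tuple:
    """Return (testimonial candidates, objective negative reviews)."""
    pol, subj = df['Sentiment_Polarity'], df['Sentiment_Subjectivity']
    # Strongly opinionated and clearly positive: candidates for user stories
    testimonials = df[(subj >= TESTIMONIAL_MIN_SUBJECTIVITY) & (pol >= TESTIMONIAL_MIN_POLARITY)]
    # Matter-of-fact but negative: likely concrete product problems
    product_issues = df[(subj <= OBJECTIVE_MAX_SUBJECTIVITY) & (pol < 0)]
    return testimonials, product_issues


# Toy rows invented for illustration; in the notebook call split_reviews(data)
toy = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D', 'E'],
    'Sentiment_Polarity': [0.8, 0.6, -0.2, -0.7, 0.0],
    'Sentiment_Subjectivity': [0.9, 0.5, 0.1, 0.9, 0.0],
})
testimonials, product_issues = split_reviews(toy)
print(testimonials['App'].tolist(), product_issues['App'].tolist())
```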

4. Targeted Improvements for Specific Apps
Variable 'App':

- You can group reviews by app and calculate average sentiment polarity per app.

Suggestions:

- Identify underperforming apps (with low average polarity).

- Prioritize updates, bug fixes, and feature enhancements for these apps.

- For top-performing apps, maintain feature consistency and stability.
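A sketch of the per-app grouping described above, assuming the `App` and `Sentiment_Polarity` columns. Averages from a handful of reviews are noisy, so apps below a minimum review count are dropped. The default of 30, the helper name `app_sentiment_table`, and the toy rows are assumptions, not values from the analysis. Sorting ascending puts the lowest-polarity (potentially underperforming) apps first.

```python
import pandas as pd


def app_sentiment_table(df: pd.DataFrame, min_reviews: int = 30) -> pd.DataFrame:
    """Review count and mean polarity per app, lowest mean polarity first."""
    stats = (
        df.dropna(subset=['Sentiment_Polarity'])
          .groupby('App')['Sentiment_Polarity']
          .agg(reviews='count', mean_polarity='mean')
    )
    # Averages over very few reviews are noisy, so drop apps below the (assumed) minimum
    stats = stats[stats['reviews'] >= min_reviews]
    return stats.sort_values('mean_polarity')


# Toy rows invented for illustration; in the notebook call app_sentiment_table(data)
toy = pd.DataFrame({
    'App': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'Sentiment_Polarity': [0.5, 0.7, 0.9, -0.2, 0.0, 0.1, 1.0],
})
table = app_sentiment_table(toy, min_reviews=2)
print(table)
```

Keeping the `reviews` column next to the mean also makes it easy to spot high-volume apps with low polarity, the risk case raised under Chart 10.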

5. Increase Ratings Through Proactive User Engagement
Suggestions:

- Prompt users with positive polarity to rate the app after they leave a good review.

- Use email or in-app messages to re-engage users with neutral/negative sentiment and offer help or incentives.

| **Goal**                  | **Actionable Recommendation**                                      |
| ------------------------- | ------------------------------------------------------------------ |
| Improve User Satisfaction | Address negative reviews; fix app-specific bugs                    |
| Increase Engagement       | Promote top-rated features, build on what users love               |
| Enhance Ratings           | Encourage ratings from users with high polarity/positive sentiment |
| Reduce Churn              | Use feedback loops to respond quickly to negative sentiment        |
| Better App Prioritization | Use average sentiment by 'App' to decide which apps need attention |

# **Conclusion**

This project provided a detailed analysis of user reviews from the Google Play Store, focusing on key sentiment metrics to understand user satisfaction and identify areas for improvement.

🔍 Key Takeaways:
- Sentiment Distribution:

    - Most reviews are positive, indicating general user satisfaction.

    - A significant number of neutral and negative reviews highlight opportunities for improvement.

- Sentiment Polarity and Subjectivity:

    - Sentiment Polarity ranged from -1 (very negative) to +1 (very positive), with a spike around 0 reflecting many neutral or mixed reviews.

    - Subjectivity Scores revealed that many reviews are personal opinions rather than objective feedback, which helps in understanding user perception.

- Word Cloud Insights:

    - Frequent words in positive reviews, such as “great”, “good”, “love”, and “easy”, emphasize what users value most, while words like “update” point to requested improvements.

    - These insights can guide feature prioritization and marketing messaging.

- Correlation and Pair Plots:

    - No strong correlation between polarity and subjectivity, showing that both metrics offer distinct insights.

    - The pair plot suggests a broad distribution of review types, emphasizing the need for customized responses.

- App-Level Sentiment Tracking:

    - By analyzing sentiment per app, stakeholders can pinpoint underperforming apps and allocate resources efficiently.

The analysis provides a strong foundation for data-driven decision-making in app development, customer support, and marketing strategies. By leveraging sentiment analysis, the business can:

- Enhance customer satisfaction,

- Reduce churn through proactive support,

- Improve app store ratings,

- And ultimately drive positive business impact.

This project not only surfaces valuable user insights but also lays the groundwork for building a feedback loop between users and developers, ensuring continuous improvement and stronger user engagement.



