<a href="https://colab.research.google.com/github/shahbazrehan/EDA--Play-Store-App-Review-Analysis-Project.ipynb/blob/main/EDA_Play_Store_App_Review_Analysis_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Play Store App Review Analysis



##### **Project Type**    - EDA-Play Store App Review Analysis Project
##### **Contribution**    - Individual
##### **Name -** Shahbaz Rehan Farooqui


# **Project Summary -**


The Play Store App Review Analysis project aims to utilize data analytics and natural language processing techniques to gain valuable insights from user reviews of various mobile applications available on the Google Play Store. The objective is to understand user sentiments, identify common issues, and provide actionable recommendations to improve app quality and user satisfaction.

Methodology:
The project involves a multi-step approach to gather, preprocess, analyze, and visualize the data from Play Store app reviews.

Data Collection:
The first step is to scrape app reviews from the Google Play Store using APIs or web scraping techniques. This involves retrieving reviews, ratings, dates, and other relevant information for selected apps.

Data Preprocessing:
The collected data undergoes preprocessing to clean and prepare it for analysis. This includes removing duplicates, handling missing values, and text normalization techniques like removing special characters, stop words, and stemming or lemmatization.
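
The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the notebook's actual pipeline: the stop-word list, the helper names `normalize_review` and `deduplicate`, and the sample reviews are all assumptions made for the example.

```python
import re

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "i", "but"}


def normalize_review(text: str) -> list[str]:
    """Lowercase a review, strip special characters, and drop stop words."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]


def deduplicate(reviews: list[str]) -> list[str]:
    """Remove exact duplicate reviews while keeping the first occurrence."""
    return list(dict.fromkeys(reviews))


# Hypothetical sample reviews.
sample_reviews = ["Great app!!!", "Great app!!!", "It crashes a lot, but I like the UI."]
cleaned = [normalize_review(r) for r in deduplicate(sample_reviews)]
print(cleaned)
```

Stemming or lemmatization is left out here because it usually relies on an external NLP library.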

Sentiment Analysis:
Sentiment analysis is performed on the preprocessed text to classify the reviews into positive, negative, or neutral sentiments. This analysis helps in understanding overall user satisfaction and identifying areas for improvement.
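
As a rough illustration of the three-way classification, the sketch below scores a tokenized review against two small word lists. The lexicons and the function name `classify_sentiment` are hypothetical; a real analysis would more likely use a trained model or an established sentiment lexicon.

```python
# Hypothetical mini-lexicons; a real analysis would use a trained model or a full lexicon.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "useful"}
NEGATIVE_WORDS = {"bad", "crash", "crashes", "slow", "hate", "bug"}


def classify_sentiment(tokens: list[str]) -> str:
    """Label a tokenized review as positive, negative, or neutral."""
    # Score = number of positive words minus number of negative words.
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [classify_sentiment(t) for t in (["great", "app"], ["app", "crashes"], ["ok"])]
print(labels)
```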

Topic Modeling:
Topic modeling techniques like Latent Dirichlet Allocation (LDA) or Non-negative Matrix Factorization (NMF) are applied to identify common themes or topics within the reviews. This helps in categorizing user feedback into distinct areas of concern.

Keyword Extraction:
Relevant keywords and phrases are extracted to understand the most frequently mentioned aspects of the app. These keywords assist in identifying specific features or issues that users highlight the most.
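
One simple way to approximate keyword extraction is to count token frequencies across all reviews. The sketch below assumes the reviews are already tokenized; the function name `top_keywords` and the sample data are made up for illustration.

```python
from collections import Counter


def top_keywords(tokenized_reviews: list[list[str]], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent tokens across all reviews."""
    counts = Counter(token for review in tokenized_reviews for token in review)
    return counts.most_common(n)


# Hypothetical tokenized reviews.
reviews = [["login", "fails"], ["login", "slow"], ["slow", "login"]]
keywords = top_keywords(reviews, n=2)
print(keywords)
```

Frequency counts favor common words, so stop-word removal beforehand matters; weighting schemes such as TF-IDF are a common refinement.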

Visualization:
Visualizations such as word clouds, bar charts, and heatmaps are created to present the analysis results in a clear and intuitive manner. These visualizations aid in identifying patterns and trends in user reviews.


# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The digital landscape has witnessed an exponential growth in the number of mobile applications available on platforms like the Google Play Store. However, with this abundance of apps, users face challenges in making informed decisions due to the sheer volume of options. App developers, on the other hand, struggle to understand and respond to user feedback effectively.

The problem lies in the need for a systematic and data-driven approach to analyze the vast repository of user reviews available on the Play Store. Current methodologies often lack efficient techniques to extract meaningful insights from the abundance of unstructured textual data in these reviews. Without a comprehensive understanding of user sentiments, common issues, and areas of improvement, app developers may miss critical opportunities to enhance their applications.

The lack of a standardized approach to review analysis hinders developers from identifying recurring problems, features that users appreciate, and overall user satisfaction trends. This gap in understanding impedes developers in making data-driven decisions to optimize their applications for better user experiences.

Hence, there is a critical need for a well-structured and automated system that can collect, process, and analyze Play Store app reviews to derive actionable insights. Such insights would empower developers to enhance their applications, address user concerns, and tailor their products to meet user expectations effectively. By bridging this gap, we aim to contribute to a more user-centric app development process, improving user satisfaction and fostering a positive ecosystem for both developers and users in the mobile app space.

#### **Define Your Business Objective?**

A business objective is a specific, measurable, achievable, relevant, and time-bound (SMART) goal or target set by a business or organization to achieve a desired outcome or result that aligns with its mission, vision, and strategic priorities. These objectives are crucial in guiding decision-making, prioritizing actions, and measuring the success and progress of the business.

In simpler terms, a business objective outlines what a company wants to achieve within a defined period and provides a clear direction for the organization to work towards a common goal. These objectives are designed to drive growth, profitability, sustainability, customer satisfaction, market share, efficiency, innovation, or any other important aspect that contributes to the success of the business.







# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
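
The "UBM" rule can be illustrated with one summary per level of analysis. The sketch below uses a hypothetical toy frame; the column names `Category`, `Type`, and `Rating` are assumptions chosen to resemble app-store fields.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror common app-store fields.
apps = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "TOOLS"],
    "Type": ["Free", "Paid", "Free", "Free"],
    "Rating": [4.5, 4.0, 3.5, 4.5],
})

# Univariate: distribution of a single variable.
category_counts = apps["Category"].value_counts()

# Bivariate (numerical vs. categorical): mean rating per category.
rating_by_category = apps.groupby("Category")["Rating"].mean()

# Multivariate: mean rating split by both category and type.
rating_by_category_type = apps.pivot_table(
    index="Category", columns="Type", values="Rating", aggfunc="mean"
)
```

Each of these summaries can then be passed to a plotting call, for example a bar chart for the first two and a heatmap for the pivot table.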





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import numpy as np
import pandas as pd

# visualization tools
import seaborn as sns
import matplotlib.pyplot as plt
from wordcloud import WordCloud


### Dataset Loading

In [None]:
data=pd.read_csv('/content/googleplaystore.csv.zip')

### Dataset First View

In [None]:
# Preview the first few rows of the dataset
data.head()

### Dataset Rows & Columns count

In [None]:
# Report the number of rows and columns
print("Number of Rows:", data.shape[0])
print("Number of Columns:", data.shape[1])

### Dataset Information

In [None]:
data.info()

#### Duplicate Values

In [None]:
# Count rows that are exact duplicates of an earlier row
duplicate_values = data.duplicated().sum()
print(f"\nNumber of duplicate rows: {duplicate_values}")

#### Missing Values/Null Values

In [None]:
data.isnull().sum()

In [None]:
# Count missing values per column and show only the columns that have any
missing_values = data.isna().sum()

print("\nMissing Values:")
print(missing_values[missing_values > 0])




### What did you know about your dataset?

Based on the checks above, the dataset has 10841 rows and 13 columns. Each row is one app listing on the Google Play Store, so there are 10841 entries in `App` and the same number in `Category`, while only 9367 review entries are present. `data.info()` reports a memory usage of 1.1+ MB, and 483 rows are duplicates.
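
Since duplicate rows were found, a natural follow-up is to drop them before further analysis. The sketch below shows the idea on a hypothetical three-row frame; the notebook itself does not perform this step, and the names `frame` and `deduplicated` are made up for the example.

```python
import pandas as pd

# Hypothetical frame with one fully duplicated row and one missing rating.
frame = pd.DataFrame({"App": ["A", "A", "B"], "Rating": [4.0, 4.0, None]})

duplicate_rows = int(frame.duplicated().sum())  # rows identical to an earlier row
missing_per_column = frame.isna().sum()         # null count per column

# Keep the first occurrence of each duplicated row and rebuild the index.
deduplicated = frame.drop_duplicates().reset_index(drop=True)
```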

## ***2. Understanding Your Variables***

In [None]:
data.columns

In [None]:
data.describe(include='all')

### Variables Description

`data.describe(include='all')` reports summary statistics for both numerical and categorical columns. For a categorical column such as `Category`, it shows the number of entries (10841), the number of unique values (34), the most frequent value, and its frequency. For a numerical column such as `Rating`, it shows the mean, standard deviation, minimum, maximum, and percentile values.

### Check Unique Values for each variable.

In [None]:
unique_values_per_column = data.nunique()
print("Number of unique values for each variable:")
print(unique_values_per_column)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
data.columns

In [None]:
sorted_data = data.sort_values(by='App', ascending=True)
sorted_data.to_csv('sorted_dataset.csv', index=False)
print('Sorted dataset:')
print(sorted_data)

In [None]:
# Count missing values per column before imputation
missing_before = data.isna().sum()

# Impute missing values in numerical columns with the column mean
numeric_cols = data.select_dtypes(include='number').columns
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].mean())

# Impute missing values in categorical columns with the most frequent value
data.fillna(data.mode().iloc[0], inplace=True)

# Imputation fills cells; it does not add or remove rows
print('Summary of changes:')
print('Imputed missing values in numerical and categorical columns.')
print('Missing values before: ', missing_before.sum())
print('Missing values after: ', data.isna().sum().sum())






In [None]:
# Group by a categorical column and calculate the mean of the numerical columns only
aggregated_data = data.groupby('App').mean(numeric_only=True)

print('Aggregated dataset:')
print(aggregated_data)
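
The mean aggregation only covers columns with a numeric dtype. If count or price fields are stored as text, they need to be converted first. The sketch below assumes string formats such as `"10,000+"` and `"$4.99"`; those formats, the sample rows, and the helper name `to_number` are assumptions for illustration, not verified properties of the dataset.

```python
import pandas as pd

# Hypothetical rows; the string formats ("10,000+", "$4.99") are assumptions.
raw = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Installs": ["10,000+", "5,000+", "100+"],
    "Price": ["$4.99", "0", "0"],
})


def to_number(series: pd.Series) -> pd.Series:
    """Strip '+', ',' and '$' and convert to float; unparseable values become NaN."""
    cleaned = series.astype(str).str.replace(r"[+,$]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


raw["Installs"] = to_number(raw["Installs"])
raw["Price"] = to_number(raw["Price"])

# With numeric dtypes in place, the per-app mean covers these columns too.
per_app_mean = raw.groupby("App").mean(numeric_only=True)
```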



### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
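
A correlation heatmap could be built along the following lines. This is a sketch under stated assumptions: the frame `numeric` holds hypothetical values standing in for cleaned numeric columns, and `plot_heatmap` is an illustrative helper, not code from this notebook.

```python
import pandas as pd

# Hypothetical numeric columns standing in for cleaned Play Store fields.
numeric = pd.DataFrame({
    "Rating": [4.5, 4.0, 3.5, 4.8],
    "Reviews": [1200, 300, 50, 5000],
    "Price": [0.0, 0.0, 4.99, 0.0],
})

# Pearson correlation between every pair of numeric columns.
corr = numeric.corr()


def plot_heatmap(matrix: pd.DataFrame) -> None:
    """Draw an annotated heatmap; plotting libraries are imported only when called."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation heatmap")
    plt.show()
```

Calling `plot_heatmap(corr)` renders the chart; fixing the color scale to the range -1 to 1 keeps colors comparable across datasets.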

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***