<a href="https://colab.research.google.com/github/swatidixit18/flipkart-project/blob/main/Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Flipkart Customer Support Analysis (EDA)

##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** - Swati Dixit

# **Project Summary -**

The main goal of this project was to understand customer support data from Flipkart and figure out what types of issues are common, what kind of satisfaction customers have, and what patterns we can find.

I used Python libraries like pandas, seaborn, and matplotlib to explore the data. I checked things like which issues are most reported, how customers rated support, and if the number of complaints increased or decreased over time.

One of the key findings was that some issue types were very common and others were rare. Also, customer ratings were mixed, with both good and bad experiences. I also noticed seasonal trends — some months had more support tickets than others.

This analysis could help Flipkart identify pain points in their support system and improve overall customer satisfaction.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The objective of this project is to analyze customer support data from Flipkart to identify common customer issues, evaluate overall satisfaction levels, and uncover trends in customer service interactions. The dataset includes variables such as issue type, timestamp, customer rating, and satisfaction status.

By performing detailed Exploratory Data Analysis (EDA), the goal is to uncover insights that can help improve the customer support process. In addition, a classification model will be built to predict whether a customer is likely to be satisfied based on the support ticket data.

This project aims to provide actionable insights for the business and build a predictive solution that can proactively identify dissatisfaction, allowing the support team to intervene early.


#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart is followed by answers in the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns', None)

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv("Customer_support_data.csv")
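
The guidelines above ask for exception handling, so the load step could be wrapped as in the minimal sketch below. `load_dataset` and `df_checked` are hypothetical names introduced here for illustration; the file name is the one used in the cell above.

```python
import pandas as pd


def load_dataset(path):
    """Read a CSV into a DataFrame, returning None if it cannot be loaded."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        # The file is not in the working directory
        print(f"File not found: {path}")
    except pd.errors.EmptyDataError:
        # The file exists but contains no data to parse
        print(f"File is empty: {path}")
    return None


# Hypothetical usage: only continue if the file was actually read
df_checked = load_dataset("Customer_support_data.csv")
if df_checked is not None:
    print(df_checked.shape)
```

Returning `None` instead of raising keeps the notebook runnable in one go, at the cost of needing an explicit check before later cells use the data.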


### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
# Checking general info about the dataset
df.info()


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Checking for missing values
df.isnull().sum()


In [None]:
# Visualizing the missing values
df.isnull().sum().plot(kind='bar', title='Missing Values per Column')

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
# Basic statistics about numerical columns
df.describe()


### Variables Description

Answer Here

In [None]:
# Issue type distribution
plt.figure(figsize=(10,5))
sns.countplot(data=df, x='issue_type', order=df['issue_type'].value_counts().index)
plt.title('Issue Type Distribution')
plt.xticks(rotation=45)
plt.show()


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
from sklearn.preprocessing import LabelEncoder

# Convert timestamp to datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Extract useful date parts
df['month'] = df['timestamp'].dt.to_period('M')
df['day_of_week'] = df['timestamp'].dt.day_name()

# Check for duplicates
duplicates = df.duplicated().sum()
print("Duplicate rows:", duplicates)

# Drop duplicate rows if any
df.drop_duplicates(inplace=True)

# Check for missing values
print("Missing values:\n", df.isnull().sum())

# Fill missing values using forward fill
# (fillna(method='ffill') is deprecated in recent pandas versions)
df = df.ffill()

# Encode categorical columns, one encoder per column so each keeps its own classes_
# Note: after this step issue_type holds integer codes; issue_encoder.classes_ maps them back to names
issue_encoder = LabelEncoder()
df['issue_type'] = issue_encoder.fit_transform(df['issue_type'].astype(str))

# If satisfaction is categorical (Yes/No or text), encode it too
if df['satisfaction'].dtype == 'object':
    satisfaction_encoder = LabelEncoder()
    df['satisfaction'] = satisfaction_encoder.fit_transform(df['satisfaction'].astype(str))

# View final data structure
df.head()


### What all manipulations have you done and insights you found?

- Converted `timestamp` to datetime and created `month` and `day_of_week` columns.
- Removed duplicate rows to clean the data.
- Filled missing values using forward fill.
- Encoded categorical columns like `issue_type` and `satisfaction` using Label Encoding.

- Return and delivery issues are the most frequent.
- Ticket volume is higher on Mondays and during festive months.
- Some issue types consistently receive low ratings.
- Higher ratings are strongly linked to customer satisfaction.
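
To make the forward-fill and label-encoding steps listed above concrete, here is a small sketch on invented data. The `toy` frame and its values are assumptions for illustration only and are not taken from the Flipkart dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data with gaps, invented for illustration only
toy = pd.DataFrame({
    "issue_type": ["Returns", "Delivery", None, "Payment"],
    "customer_rating": [4.0, None, 2.0, 5.0],
})

# Forward fill copies the previous row's value into each gap
filled = toy.ffill()

# LabelEncoder assigns integer codes in alphabetical order of the labels
encoder = LabelEncoder()
filled["issue_code"] = encoder.fit_transform(filled["issue_type"])

# inverse_transform recovers the original names from the codes
labels = encoder.inverse_transform(filled["issue_code"])

print(filled)
print(list(encoder.classes_))
```

Forward fill assumes that the previous row is a sensible stand-in for the missing one, which may not hold if the rows are not ordered in a meaningful way. Keeping the fitted encoder around makes it possible to show issue names rather than integer codes on chart axes.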


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1  Issue Type Distribution (Bar Plot)

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(10,5))
sns.countplot(x='issue_type', data=df, order=df['issue_type'].value_counts().index)
plt.title("Issue Type Distribution")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To find out which types of customer support issues are most common.

##### 2. What is/are the insight(s) found from the chart?

Some issues (such as returns or delivery delays) occur far more often than others.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Flipkart can prioritize fixes for the most reported issue types to reduce future complaints.

#### Chart - 2 Customer Rating Distribution (Histogram)

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(6,4))
sns.histplot(df['customer_rating'], bins=10, kde=True)
plt.title("Customer Rating Distribution")
plt.show()


##### 1. Why did you pick the specific chart?

To visualize how satisfied customers generally are.

##### 2. What is/are the insight(s) found from the chart?

Ratings cluster around the middle of the scale: most users give medium ratings, and few give perfect or terrible scores.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It tells Flipkart how well the support team is performing on average and whether improvement is needed.
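
The shape of a rating distribution can also be checked numerically rather than only by eye. The sketch below uses invented ratings (not the real dataset) to show the share of responses at each rating value and the sample skewness.

```python
import pandas as pd

# Toy ratings, invented for illustration only
ratings = pd.Series([1, 2, 3, 3, 3, 4, 5], name="customer_rating")

# Share of responses at each rating value, in rating order
shares = ratings.value_counts(normalize=True).sort_index()

# Sample skewness: near 0 means roughly symmetric, negative means a longer
# tail toward low ratings, positive means a longer tail toward high ratings
skewness = ratings.skew()

print(shares)
print(f"skewness = {skewness:.2f}")
```

On the real data, the same two lines applied to `df['customer_rating']` would show whether "clustered in the middle" or "skewed" is the better description.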

#### Chart - 3 Monthly Ticket Trend (Line Chart)

In [None]:
# Chart - 3 visualization code
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['month'] = df['timestamp'].dt.to_period('M')
df.groupby('month').size().plot(marker='o', figsize=(10,4))
plt.title("Monthly Ticket Volume")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

To detect patterns in ticket volume over time.

##### 2. What is/are the insight(s) found from the chart?

Ticket volume increases during festive months or after major sales.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This helps Flipkart plan staffing for high-demand months and avoid backlogs.
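
For staffing decisions, the monthly counts behind the line chart can be turned into a peak month and a month-over-month change. This is a minimal sketch on invented timestamps; the dates are assumptions for illustration only.

```python
import pandas as pd

# Toy ticket timestamps, invented for illustration only
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-09-03", "2023-09-17", "2023-10-05",
        "2023-10-12", "2023-10-28", "2023-11-01",
    ])
})

# Tickets per calendar month
monthly = toy.groupby(toy["timestamp"].dt.to_period("M")).size()

# Month with the highest ticket volume
peak_month = monthly.idxmax()

# Percentage change versus the previous month (first month has no baseline)
pct_change = monthly.pct_change() * 100

print(monthly)
print("Peak month:", peak_month)
```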

#### Chart - 4 Ratings by Issue Type (Boxplot)

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10,5))
sns.boxplot(x='issue_type', y='customer_rating', data=df)
plt.xticks(rotation=45)
plt.title("Customer Ratings by Issue Type")
plt.show()


##### 1. Why did you pick the specific chart?

To see which issues result in the best and worst customer experiences.

##### 2. What is/are the insight(s) found from the chart?

Some issue types (such as payment problems) consistently get low ratings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This helps Flipkart identify issues that need urgent fixes to avoid negative reviews.

#### Chart - 5 Satisfaction Distribution (Pie Chart)

In [None]:
# Chart - 5 visualization code
df['satisfaction'].value_counts().plot.pie(autopct='%1.1f%%', startangle=90)
plt.title("Customer Satisfaction Distribution")
plt.ylabel("")
plt.show()


##### 1. Why did you pick the specific chart?

To quickly see the percentage of satisfied vs unsatisfied users.

##### 2. What is/are the insight(s) found from the chart?

For example, a split of 72% satisfied and 28% not satisfied would indicate decent performance with room for improvement.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It shows the overall effectiveness of the support team, which is a KPI for support leadership.



#### Chart - 6  Count of Tickets by Day of Week

In [None]:
# Chart - 6 visualization code
df['day_of_week'] = df['timestamp'].dt.day_name()
sns.countplot(x='day_of_week', data=df, order=['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])
plt.title("Tickets by Day of Week")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To see which weekdays have more customer issues.

##### 2. What is/are the insight(s) found from the chart?

Support tickets are highest on Mondays.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This helps plan weekly staffing and escalation protocols.
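
A weekly staffing table needs the counts in calendar order, including days with no tickets at all. The sketch below shows one way to get that with pandas on invented timestamps; the data is an assumption for illustration only.

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Toy ticket timestamps, invented for illustration only
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-01", "2024-01-03", "2024-01-07",
    ])
})

# Count tickets per weekday, then reindex so the result follows calendar
# order and days without tickets appear with a count of 0
counts = (
    toy["timestamp"].dt.day_name()
    .value_counts()
    .reindex(day_order, fill_value=0)
)

busiest_day = counts.idxmax()
print(counts)
print("Busiest day:", busiest_day)
```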

#### Chart - 7 Average Rating by Issue Type (Bar Plot)

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(10,5))
sns.barplot(x='issue_type', y='customer_rating', data=df)
plt.xticks(rotation=45)
plt.title("Average Rating by Issue Type")
plt.show()


##### 1. Why did you pick the specific chart?

To compare how each issue type affects customer satisfaction.

##### 2. What is/are the insight(s) found from the chart?

Returns may get high ratings; payment issues might not.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Focus training or automation where satisfaction is low.

#### Chart - 8 Scatter Plot - Rating vs Satisfaction

In [None]:
# Chart - 8 visualization code
plt.scatter(df['customer_rating'], df['satisfaction'], alpha=0.5)
plt.xlabel("Customer Rating")
plt.ylabel("Satisfaction")
plt.title("Satisfaction vs Rating")
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

To directly observe the relationship between numeric rating and binary satisfaction.

##### 2. What is/are the insight(s) found from the chart?

There is a clear positive trend: customers who give higher ratings are the ones marked as satisfied.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It confirms that rating can be a predictor of satisfaction.
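
The strength of the link between rating and satisfaction can be put into numbers as well. The sketch below uses invented values and assumes satisfaction is coded as 0/1; with a binary variable, the Pearson correlation computed here is the point-biserial correlation.

```python
import pandas as pd

# Toy data, invented for illustration only (satisfaction coded 0/1)
toy = pd.DataFrame({
    "customer_rating": [1, 2, 2, 3, 4, 4, 5, 5],
    "satisfaction":    [0, 0, 0, 0, 1, 1, 1, 1],
})

# Average rating within each satisfaction group
mean_by_group = toy.groupby("satisfaction")["customer_rating"].mean()

# Pearson correlation between the rating and the 0/1 satisfaction flag
corr = toy["customer_rating"].corr(toy["satisfaction"])

print(mean_by_group)
print(f"correlation = {corr:.2f}")
```

A large gap between the group means together with a correlation close to 1 would support using rating as a predictor; a weak correlation would suggest that other features matter more.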

#### Chart - 9 Stacked Bar – Issue Type vs Satisfaction

In [None]:
# Chart - 9 visualization code
cross_tab = pd.crosstab(df['issue_type'], df['satisfaction'], normalize='index')
cross_tab.plot(kind='bar', stacked=True, figsize=(10,5), colormap='Paired')
plt.title("Satisfaction Ratio per Issue Type")
plt.ylabel("Proportion")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To compare how different issues split between satisfied/unsatisfied.

##### 2. What is/are the insight(s) found from the chart?

Some issues lead to more dissatisfaction.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Target the issues with higher dissatisfaction for resolution first.
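
To decide which issues to target first, the proportions behind the stacked bar can be sorted by dissatisfaction rate. This is a minimal sketch on invented data, assuming satisfaction is coded as 0 (not satisfied) and 1 (satisfied).

```python
import pandas as pd

# Toy data, invented for illustration only (0 = not satisfied, 1 = satisfied)
toy = pd.DataFrame({
    "issue_type": ["Payment", "Payment", "Payment", "Returns", "Returns",
                   "Delivery", "Delivery", "Delivery", "Delivery"],
    "satisfaction": [0, 0, 1, 1, 1, 0, 1, 1, 1],
})

# Row-normalized crosstab: each row sums to 1 across satisfaction values
rates = pd.crosstab(toy["issue_type"], toy["satisfaction"], normalize="index")

# Column 0 holds the share of unsatisfied customers per issue type
dissatisfaction = rates[0].sort_values(ascending=False)

print(dissatisfaction)
```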




#### Chart - 10 KDE Plot - Rating by Satisfaction

In [None]:
# Chart - 10 visualization code
sns.kdeplot(data=df, x='customer_rating', hue='satisfaction', fill=True)
plt.title("Customer Rating Density by Satisfaction")
plt.show()


##### 1. Why did you pick the specific chart?

To understand rating distributions for satisfied vs unsatisfied users.

##### 2. What is/are the insight(s) found from the chart?

Satisfied users cluster around higher ratings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It confirms that rating is a reliable feedback metric.



#### Chart - 11 Average Rating by Month

In [None]:
# Chart - 11 visualization code
df.groupby('month')['customer_rating'].mean().plot(marker='o')
plt.title("Average Customer Rating by Month")
plt.xticks(rotation=45)
plt.ylabel("Avg Rating")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

To find rating trends over time.

##### 2. What is/are the insight(s) found from the chart?

Some months have lower ratings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It shows when customers are least happy, which is useful for campaign planning.

#### Chart - 12 Boxplot - Rating by Satisfaction

In [None]:
# Chart - 12 visualization code
sns.boxplot(x='satisfaction', y='customer_rating', data=df)
plt.title("Customer Rating by Satisfaction")
plt.show()


##### 1. Why did you pick the specific chart?

To explore the link between satisfaction and rating.

##### 2. What is/are the insight(s) found from the chart?

Satisfied users rate higher.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It validates that satisfaction is reflected in ratings.

#### Chart - 13 Violin Plot - Ratings by Issue


In [None]:
# Chart - 13 visualization code
sns.violinplot(x='issue_type', y='customer_rating', data=df)
plt.title("Rating Distribution per Issue")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To see the shape of the rating distribution, not just a summary.

##### 2. What is/are the insight(s) found from the chart?

Some issues have extreme ratings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

A violin plot is more informative than a boxplot when the spread of ratings matters.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10,6))
sns.heatmap(df.select_dtypes(include=np.number).corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()


##### 1. Why did you pick the specific chart?

To explore relationships between numeric variables.

##### 2. What is/are the insight(s) found from the chart?

High correlations indicate which variables may be redundant or helpful in ML.
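
Reading strong correlations off a heatmap gets harder as the number of columns grows, so the pairs can also be listed directly. The sketch below uses invented data; `response_hours` is a hypothetical column added for illustration, and the 0.8 threshold is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "customer_rating": [1, 2, 3, 4, 5],
    "satisfaction": [0, 0, 0, 1, 1],
    "response_hours": [30, 8, 19, 25, 12],  # hypothetical column
})

corr = toy.corr()

# Keep only the upper triangle so each pair appears once and the
# diagonal (every column with itself) is excluded
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to (column_a, column_b) -> correlation and filter by strength
pairs = upper.stack()
strong = pairs[pairs.abs() > 0.8]

print(strong)
```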

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Plot pairwise relationships between all numeric columns
sns.pairplot(df.select_dtypes(include=np.number))
plt.suptitle("Pair Plot of Numeric Features", y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

To visualize pairwise relationships between multiple numerical features in one grid.

##### 2. What is/are the insight(s) found from the chart?

Some variables may have visible clustering patterns or trends with satisfaction or rating.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

In [None]:
# Rating distribution
if 'customer_rating' in df.columns:
    plt.figure(figsize=(6,4))
    sns.histplot(df['customer_rating'], bins=10, kde=True)
    plt.title('Customer Rating Distribution')
    plt.show()


Answer Here.

# **Conclusion -**
- Most common issues are likely related to delays and returns.
- Customer ratings vary widely, with some users highly satisfied and others very dissatisfied.
- There’s a visible monthly trend in ticket volumes, suggesting seasonal variations.
- These insights can help Flipkart improve their support system efficiency.


### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***