# **Project Name**    -  📊 Flipkart Customer Support EDA 🚀



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### Team Member 1 - Meghashyam Parab


# **Project Summary -**

📊 **Flipkart Customer Support EDA** 🚀

Unlocking Insights from Customer Support Interactions

🔍 Project Overview

Flipkart, one of India's leading e-commerce giants, receives thousands of customer queries daily. Understanding customer support trends is crucial for improving response times, resolving issues effectively, and enhancing overall customer satisfaction. This Exploratory Data Analysis (EDA) project dives deep into Flipkart's customer support data to uncover meaningful insights.

-------

📌 Key Objectives

✅ Identify the most common customer complaints

✅ Analyze response times and resolution efficiency

✅ Detect sentiment trends in customer feedback

✅ Find peak hours for customer support activity

✅ Recommend data-driven solutions for improving support services


------

🔧 Technologies Used

🔹 Python 🐍 (Pandas, NumPy, Matplotlib, Seaborn)

🔹 Data Visualization 📊

🔹 Google Colab 📖

------

📈 Insights Discovered

🔹 Most Frequent Issues: Delivery delays, refund problems, payment failures

🔹 Peak Support Hours: Higher volume during sales events and weekends

🔹 Response Time Trends: Faster responses during off-peak hours ⏳

🔹 Sentiment Analysis: Customers express high frustration in refund-related tickets

# **GitHub Link -**

https://github.com/meghashyam123/Flipkart-Customer-Support-EDA

# **Problem Statement**


Customer satisfaction is a key driver of Flipkart’s success, and its customer support system plays a crucial role in maintaining a seamless shopping experience. However, with millions of customer interactions daily, identifying pain points, improving response efficiency, and enhancing resolution quality remain challenges.

This Exploratory Data Analysis (EDA) aims to uncover patterns, trends, and bottlenecks within Flipkart's customer support data by analyzing:

*   Customer Complaints & Queries – Identifying frequent issues, escalation trends, and resolution times.

*   Response & Resolution Time – Evaluating delays and their impact on customer satisfaction.

*   Sentiment Analysis – Understanding customer emotions in interactions to improve service tone and effectiveness.

*   Agent Performance – Assessing agent efficiency and workload distribution.

#### **Define Your Business Objective?**

Flipkart aims to enhance its customer support operations by leveraging data-driven insights to improve response efficiency, resolution effectiveness, and overall customer satisfaction. The key business objectives of this EDA are:

Enhance Customer Experience 🛍️

*   Identify recurring customer issues to proactively address pain points.

*   Improve first-response time and resolution efficiency to boost satisfaction.


Optimize Support Operations ⚙️

*   Analyze support ticket volume trends to optimize agent allocation.

*   Identify bottlenecks in customer interactions to streamline workflows.


Reduce Costs & Increase Efficiency 💰

*   Minimize unnecessary escalations by improving initial query resolutions.

*   Identify automation opportunities for repetitive queries using AI chatbots.

Improve Customer Retention & Loyalty ❤️


*   Analyze sentiment from customer interactions to enhance service quality.

*   Develop targeted strategies to improve customer engagement and retention.




# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure every chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

### Dataset Loading

In [None]:
# Load Dataset

flipkart_df = pd.read_csv("/content/Customer_support_data.csv")


### Dataset First View

In [None]:
# Dataset First Look

flipkart_df.head()


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

num_rows = flipkart_df.shape[0]  # Get the number of rows
num_cols = flipkart_df.shape[1]  # Get the number of columns

print("Number of rows:", num_rows)
print("Number of columns:", num_cols)

### Dataset Information

In [None]:
# Dataset Info

flipkart_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

duplicate_count = flipkart_df.duplicated().sum()
print("Number of duplicate rows:", duplicate_count)


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_values = flipkart_df.isnull().sum()
print(missing_values)

In [None]:
# Visualizing the missing values

import missingno as msno

# Visualizing the missing values
msno.matrix(flipkart_df)
plt.show()

In [None]:


# Visualizing the missing values
plt.figure(figsize=(10, 6))  # Adjust figure size as needed
sns.heatmap(flipkart_df.isnull(), cbar=False, cmap='viridis')  # Use a colormap for visualization
plt.title('Missing Values Heatmap')
plt.xlabel('Columns')
plt.ylabel('Rows')
plt.show()

### What did you know about your dataset?

The dataset contains 85,907 entries and 20 columns. Here are the key insights from the dataset structure:


*   Key Identifiers: Unique id, Order_id

*   Issue Tracking: Issue_reported at, issue_responded, Survey_response_Date

*   Customer Feedback: Customer Remarks, CSAT Score

*   Product & Order Details: Product_category, Item_price, order_date_time

*   Agent Performance: Agent_name, Supervisor, Manager, Tenure Bucket, Agent Shift

Missing Data:

*   Customer Remarks (66.5% missing)

*   Order_id (21.3% missing)

*   order_date_time (80% missing)

*   connected_handling_time (99.7% missing)
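The missing-data percentages above can be reproduced with a small helper. This is a sketch: `missing_summary` is a hypothetical name, and the toy frame below only stands in for `flipkart_df`; in the notebook you would call `missing_summary(flipkart_df)` directly.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, highest first."""
    missing_count = df.isna().sum()
    missing_pct = (missing_count / len(df) * 100).round(1)
    summary = pd.DataFrame({"missing_count": missing_count, "missing_pct": missing_pct})
    # Keep only columns that actually have gaps, sorted from most to least missing
    return summary[summary["missing_count"] > 0].sort_values("missing_pct", ascending=False)

# Toy frame for illustration only; in the notebook pass flipkart_df
toy = pd.DataFrame({
    "Customer Remarks": ["late", None, None, "ok"],
    "Order_id": ["a1", "a2", None, "a4"],
    "CSAT Score": [5, 4, 1, 5],
})
print(missing_summary(toy))
```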










## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

flipkart_df.columns



In [None]:
# Dataset Describe

flipkart_df.describe()

### Variables Description

**Dataset Overview**

Total Rows: 85,907

Total Columns: 20

----


**Key Columns**


Unique id: A unique identifier for each record.

channel_name: The communication channel (Inbound, Outcall, etc.).

category: The main issue type (Returns, Order Related, etc.).

Sub-category: More specific issue type (e.g., Reverse Pickup Enquiry).

Customer Remarks: Customer feedback (only 28,756 entries have this data).

Order_id: Unique order identifier (not available for all records).

order_date_time: The date and time the order was placed (only 17,214 records have this).

Issue_reported at: The date and time when the issue was raised.

issue_responded: The response date and time.

Survey_response_Date: Date when a survey response was recorded.

Customer_City: Location of the customer (only 17,079 records have this).

Product_category: Type of product associated with the issue.

Item_price: The price of the item in question.

connected_handling_time: Time taken to handle the issue (only 242 records have this).

Agent_name: The customer service agent handling the issue.

Supervisor & Manager: The supervisor and manager overseeing the agent.

Tenure Bucket: Agent experience level (e.g., On Job Training, >90 days, etc.).

Agent Shift: The shift during which the agent handled the issue (Morning, Evening, etc.).

CSAT Score: Customer Satisfaction Score (range: 1–5).


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.


for column in flipkart_df.columns:
    num_unique = flipkart_df[column].nunique()
    print(f"Column: {column}, Number of Unique Values: {num_unique}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Data Wrangling Code

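# 1. Handling Missing Values: Impute 'Item_price' with the mean
#    Assign the result back; fillna(inplace=True) on a selected column is chained assignment,
#    which pandas 2.2 warns about and which stops working under copy-on-write
flipkart_df['Item_price'] = flipkart_df['Item_price'].fillna(flipkart_df['Item_price'].mean())

# 2. Converting Data Types: Convert date/time columns to datetime objects
#    errors='coerce' turns any value that fails to parse into NaT instead of raising, so confirm the raw
#    layout (day-first vs month-first) first and pass format=... or dayfirst=True if the data needs it
date_time_columns = ['order_date_time', 'Issue_reported at', 'issue_responded', 'Survey_response_Date']
for column in date_time_columns:
    flipkart_df[column] = pd.to_datetime(flipkart_df[column], errors='coerce')  # Unparseable values become NaT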

# 3. Feature Engineering: Extract day of the week and hour from 'Issue_reported at'
flipkart_df['issue_reported_dayofweek'] = flipkart_df['Issue_reported at'].dt.dayofweek
flipkart_df['issue_reported_hour'] = flipkart_df['Issue_reported at'].dt.hour

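# 4. Handling Categorical Variables: one-hot encode 'category' while keeping the original column
#    (drop_first=True would silently remove one category from later counts such as Chart 8)
category_dummies = pd.get_dummies(flipkart_df['category'], prefix='category')
flipkart_df = pd.concat([flipkart_df, category_dummies], axis=1)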

# (Optional) 5. Data Scaling (if needed for specific algorithms later)
# from sklearn.preprocessing import StandardScaler
# scaler = StandardScaler()
# numerical_features = ['Item_price', 'connected_handling_time']
# flipkart_df[numerical_features] = scaler.fit_transform(flipkart_df[numerical_features])
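Because the conversion loop uses `errors='coerce'`, any timestamp pandas cannot parse silently becomes `NaT`. A minimal check, assuming the raw values are day-first strings such as `01/08/2023 11:55` (an assumption; verify it against `flipkart_df.head()` before relying on it), is to compare each raw column with its parsed version. The helper `coerced_to_nat` and the sample strings are illustrative only.

```python
import pandas as pd

def coerced_to_nat(raw: pd.Series, parsed: pd.Series) -> int:
    """Count values that were present in the raw column but became NaT after parsing."""
    return int((raw.notna() & parsed.isna()).sum())

# Hypothetical raw strings; the day-first layout is an assumption to verify on the real data
raw = pd.Series(["01/08/2023 11:55", "13/08/2023 09:10", "not a date", None])

# An explicit format removes the month-first/day-first ambiguity
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")
print(parsed)
print("Values coerced to NaT:", coerced_to_nat(raw, parsed))
```

In the notebook, copy each raw column before the loop (for example `raw = flipkart_df[column].copy()`) and call `coerced_to_nat(raw, flipkart_df[column])` afterwards; a large count points to a format mismatch rather than genuinely missing data.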

### What all manipulations have you done and insights you found?


📊 Flipkart Customer Support EDA: Insights Unboxed! 🎁

🔧 Data Magic & Tweaks:

✔ Cleaned up missing values & fixed messy formats 📄

✔ Created Response Time & Resolution Time metrics ⏳

✔ Extracted peak complaint days, hours, and pricing trends 📅💰

------

💡 Golden Insights:

🚀 Mondays & Fridays see the most complaints—weekend shopping rush?

⏰ 12 PM peak complaint hour—lunch break blues for agents?

👑 Premium customers (₹20K+ purchases) demand faster support or churn!

🌙 Night shift delays response times—time to wake up the team!

🏆 Some agents resolve faster & score higher on CSAT—train others with their methods!

-------



📈 Impact Check:

✅ Smarter staffing & training 🔥

✅ VIP customer priority handling 👑

⚠ Slow response & resolution? Fix or risk losing trust! 🚨

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:

# If 'Item_price' is numerical, use a histogram
plt.figure(figsize=(10, 6))
sns.histplot(flipkart_df['Item_price'], bins=20)  # Adjust bins as needed
plt.title('Distribution of Item Prices')
plt.xlabel('Item Price')
plt.ylabel('Frequency')
plt.gca().spines[['top', 'right']].set_visible(False)  # Remove top and right spines
plt.show()
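Because `Item_price` was mean-imputed during wrangling, every filled-in row lands in the single bin that contains the mean, which can exaggerate one bar of this histogram. One option, sketched below with a hypothetical helper `impute_mean_with_flag` and toy prices, is to record which rows were imputed before filling them and then plot only the observed prices.

```python
import numpy as np
import pandas as pd

def impute_mean_with_flag(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Mean-impute `column` and record which rows were filled in `<column>_imputed`."""
    out = df.copy()
    out[f"{column}_imputed"] = out[column].isna()  # flag first, before the gaps disappear
    out[column] = out[column].fillna(out[column].mean())
    return out

# Toy prices for illustration only
toy = pd.DataFrame({"Item_price": [199.0, np.nan, 25999.0, np.nan, 499.0]})
toy = impute_mean_with_flag(toy, "Item_price")

# Histogram input restricted to prices that were actually observed
observed_prices = toy.loc[~toy["Item_price_imputed"], "Item_price"]
print(toy)
print(observed_prices.tolist())
```

To use it here, the flag has to be created in the wrangling cell in place of the plain fill, e.g. `flipkart_df = impute_mean_with_flag(flipkart_df, 'Item_price')`, after which the histogram can take `flipkart_df.loc[~flipkart_df['Item_price_imputed'], 'Item_price']`.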



##### 1. Why did you pick the specific chart?

This histogram was chosen to analyze the distribution of item prices 📊.

Key Insights:

✅ Highly skewed distribution – most items are in the lower price range.

✅ Few expensive outliers – high-value items exist but are rare.

✅ Potential pricing strategy – adjusting price ranges or bundling lower-cost items could maximize sales.

##### 2. What is/are the insight(s) found from the chart?

Insights from the Chart 📊

1️⃣ Most Items Are Low-Priced – The majority of items fall within a low price range, suggesting affordability drives sales volume.

2️⃣ Highly Skewed Distribution – There are a few high-value items, but they occur very rarely, indicating a niche or premium market segment.

3️⃣ Outliers Exist at Higher Prices – Some items have significantly higher prices (e.g., above 25,000), which might cater to a luxury or specialized audience.
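The skew and the high-price outliers can be quantified instead of read off the chart. The sketch below uses Tukey's IQR rule (upper fence = Q3 + 1.5 × IQR), a standard convention swapped in here rather than a threshold taken from this analysis; `iqr_upper_fence` and the sample prices are illustrative.

```python
import pandas as pd

def iqr_upper_fence(values: pd.Series, k: float = 1.5) -> float:
    """Tukey's upper fence: Q3 + k * IQR. Points above it are flagged as high outliers."""
    q1, q3 = values.quantile([0.25, 0.75])
    return q3 + k * (q3 - q1)

# Toy prices for illustration only
prices = pd.Series([100, 150, 200, 250, 300, 350, 400, 30000], dtype=float)
fence = iqr_upper_fence(prices)
outliers = prices[prices > fence]
print(f"Skewness: {prices.skew():.2f}")  # positive value = long right tail
print(f"Upper fence: {fence:.1f}; outliers: {outliers.tolist()}")
```

On the notebook data it is safer to run this on observed prices only, since thousands of mean-imputed values stacked at a single price can distort the quartiles.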

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅ Positive Impact:

The insights pave the way for smarter pricing and inventory decisions. By recognizing that most sales come from low-cost items, businesses can focus on volume-driven revenue while selectively marketing premium products. Bundling or discounts could further boost sales and customer retention.

⚠️ Potential Negative Growth Risks:

A high reliance on low-priced items may lead to thin profit margins, making it harder to sustain long-term growth. The presence of high-priced outliers with low demand suggests either misaligned pricing or weak market traction. If not addressed, this could lead to excess inventory costs and reduced profitability.


#### Chart - 2

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# 1. Prepare Data: Group by relevant categories (e.g., day of the week, hour of the day)
# Assuming 'Issue_reported at' is a datetime column and 'category' is a categorical column
flipkart_df['Issue_reported_day'] = flipkart_df['Issue_reported at'].dt.dayofweek  # Day of the week (0=Monday, 6=Sunday)
flipkart_df['Issue_reported_hour'] = flipkart_df['Issue_reported at'].dt.hour

# 2. Create a Pivot Table for the Heatmap
heatmap_data = pd.pivot_table(flipkart_df, values='connected_handling_time', index='Issue_reported_hour', columns='Issue_reported_day', aggfunc='mean')

# 3. Plot the Heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(heatmap_data, cmap='viridis', annot=True, fmt=".1f", linewidths=.5, cbar_kws={'label': 'Average Handling Time'})
plt.title('Connected Handling Time Heatmap (by Day and Hour)')
plt.xlabel('Day of the Week (0=Monday, 6=Sunday)')
plt.ylabel('Hour of the Day')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the heatmap because it visually captures patterns across time and days, making it easy to spot trends in handling time fluctuations.

Why? It highlights peak congestion periods (like the 1 PM Thursday spike) and underutilized slots (low AHT periods).

Value? This helps businesses optimize staffing, reduce delays, and improve customer satisfaction efficiently.

Alternative? A line chart wouldn’t showcase density as effectively, and bar charts would be too cluttered for this multidimensional data.

##### 2. What is/are the insight(s) found from the chart?

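This heatmap visualizes average handling time (AHT) for customer interactions across different hours of the day and days of the week. In the viridis colormap, darker (purple/blue) cells represent lower AHT and brighter (green/yellow) cells represent higher AHT. Only 242 records have a connected_handling_time value, so many cells are empty and each filled cell may rest on just a handful of tickets; spikes such as the 1 PM Thursday peak should be read with that in mind.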
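Since so few records carry `connected_handling_time`, it helps to see how many tickets sit behind each heatmap cell before interpreting its color. A sketch using `groupby(...).agg(['mean', 'count'])` on toy data; the column names match the ones created in the Chart 2 cell, while the values and the `MIN_OBS` threshold are made up.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for flipkart_df; values are illustrative only
toy = pd.DataFrame({
    "Issue_reported_hour": [13, 13, 13, 9, 9],
    "Issue_reported_day": [3, 3, 3, 0, 1],
    "connected_handling_time": [600.0, 540.0, np.nan, 300.0, np.nan],
})

# Mean and number of non-null observations behind each (day, hour) cell
cell_stats = (
    toy.dropna(subset=["connected_handling_time"])
       .groupby(["Issue_reported_day", "Issue_reported_hour"])["connected_handling_time"]
       .agg(["mean", "count"])
)

# Hypothetical minimum sample size below which a cell's average is treated as unreliable
MIN_OBS = 5
sparse_cells = cell_stats[cell_stats["count"] < MIN_OBS]
print(cell_stats)
print(f"{len(sparse_cells)} of {len(cell_stats)} cells have fewer than {MIN_OBS} observations")
```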



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Absolutely! These insights fuel positive business impact 🚀 by optimizing staff allocation, reducing customer wait times, and improving service efficiency. Addressing peak-hour delays (like the 1 PM Thursday bottleneck) can enhance CSAT scores, boost customer retention, and improve operational efficiency.

However, inconsistent weekend coverage and high AHT spikes 📉 could hurt customer satisfaction and lead to negative growth. If customers experience delays due to limited weekend support, they may churn or escalate issues publicly.

#### Chart - 3

In [None]:
!pip install plotly

import plotly.express as px
import pandas as pd

# Assuming your DataFrame is named 'flipkart_df' and has a 'CSAT Score' column

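# 1. Calculate CSAT score counts, ordered from 1 to 5
csat_counts = flipkart_df['CSAT Score'].value_counts().sort_index().reset_index()
csat_counts.columns = ['CSAT Score', 'Count']

# Plotly reads numeric theta values as angles in degrees, which squeezes all bars between 1° and 5°;
# a string label turns the angular axis into evenly spaced categories
csat_counts['CSAT Label'] = csat_counts['CSAT Score'].astype(str)

# 2. Create the Circular Bar Plot (numeric 'color' keeps a continuous scale: purple = low, yellow = high)
fig = px.bar_polar(csat_counts, r="Count", theta="CSAT Label",
                   color="CSAT Score", color_continuous_scale=px.colors.sequential.Plasma,
                   template="plotly_dark",
                   title="CSAT Score Distribution - Circular Bar Plot")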

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, csat_counts['Count'].max() * 1.1]  # Adjust range if needed
        )
    ),
    showlegend=False  # You can set showlegend=True if you want a legend
)

fig.show()

##### 1. Why did you pick the specific chart?

This circular bar plot was chosen to visualize CSAT (Customer Satisfaction) score distribution in a unique way.

Here’s why this approach was considered:

📊 Why This Chart?

✅ Emphasizes Score Distribution – The color gradient effectively highlights variations in CSAT scores.

✅ Alternative to Traditional Bar Charts – It offers a fresh perspective compared to standard bar plots, making patterns visually appealing.

✅ Color Mapping for Better Insights – The color scale on the right helps in quickly understanding the score intensity (low scores in purple, high scores in yellow).

##### 2. What is/are the insight(s) found from the chart?

1️⃣ Most Customer Satisfaction Scores Are High (Near 5.0) 🔥 -
The bright yellow color represents the highest CSAT score (5), meaning most customers provided high satisfaction ratings.
This suggests a positive customer experience and strong service quality.

2️⃣ Lack of Low CSAT Scores - The chart does not show significant purple/blue regions, which represent low scores (1-2).
This indicates very few dissatisfied customers, reinforcing the company's high service standards.

3️⃣ Data Might Be Skewed - If the scores are clustered around 5, it's worth investigating whether feedback collection is biased or limited to only positive experiences.
Bars bunched at a single angle near 0° are not evidence of this on their own: Plotly treats numeric `theta` values as angles in degrees, so scores 1 to 5 land between 1° and 5° unless they are passed as string labels.
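Proportions are hard to read off a polar chart, so a plain table of counts and shares is a useful cross-check on the claim that low scores are rare. A sketch with a hypothetical helper `score_share` and toy scores; in the notebook call `score_share(flipkart_df['CSAT Score'])`.

```python
import pandas as pd

def score_share(scores: pd.Series) -> pd.DataFrame:
    """Count and percentage of each CSAT score, ordered from lowest to highest score."""
    counts = scores.value_counts().sort_index()
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_pct": share})

# Toy scores for illustration only
toy_scores = pd.Series([5, 5, 5, 4, 1, 5, 3, 5])
print(score_share(toy_scores))
```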





##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅ Positive: High CSAT scores mean strong customer satisfaction, leading to higher retention, referrals, and brand loyalty.

⚠️ Possible Concern: If there are no visible low scores, it’s important to ensure that negative feedback is not being filtered out or ignored.

#### Chart - 4

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Assuming flipkart_df is your DataFrame

# 1. Convert dayofweek numbers to day names for better labels
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# Convert 'issue_reported_dayofweek' to integers before mapping, handling NaNs:
# Replace NaN values with -1 (or any other integer that won't be used as a dayofweek)
flipkart_df['issue_reported_dayofweek'] = flipkart_df['issue_reported_dayofweek'].fillna(-1).astype(int)
flipkart_df['issue_reported_day_name'] = flipkart_df['issue_reported_dayofweek'].map(lambda x: day_names[x] if x != -1 else 'Unknown')


# 2. Create the plot using Seaborn; countplot with an explicit order keeps the days in Monday-to-Sunday order
plt.figure(figsize=(10, 6))  # Adjust figure size as needed
sns.countplot(data=flipkart_df, x='issue_reported_day_name', order=day_names + ['Unknown'],
              color='#4c72b0', edgecolor='black')
plt.title('Distribution of Issue Reports by Day of the Week', fontsize=16)
plt.xlabel('Day of the Week', fontsize=12)
plt.ylabel('Number of Issues Reported', fontsize=12)

# 3. Customize the plot for a professional look
plt.xticks(rotation=45, ha='right', fontsize=10)  # Rotate x-axis labels for readability
plt.yticks(fontsize=10)
plt.grid(axis='y', alpha=0.5)  # Add subtle gridlines
sns.despine(top=True, right=True)  # Remove top and right spines

# 4. Add annotations or insights if desired (optional)
# For example, you can add the total number of issues reported:
total_issues = len(flipkart_df)
plt.text(0.95, 0.95, f'Total Issues: {total_issues}', transform=plt.gca().transAxes, ha='right', va='top', fontsize=10)

plt.tight_layout()  # Adjust layout to prevent overlapping elements
plt.show()

##### 1. Why did you pick the specific chart?

I chose this bar chart because it effectively highlights the distribution of issue reports across different days of the week. Here’s why this was the best choice:

🎯 Why This Chart?

✅ Clear Comparison – A bar chart makes it easy to see the volume of reported issues on each day.

✅ Quickly Identifies Anomalies – The massive spike in the "Unknown" category is immediately noticeable, signaling a potential data issue.

✅ Better Decision-Making – Helps analyze workload trends and optimize agent scheduling based on issue patterns.

##### 2. What is/are the insight(s) found from the chart?

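1️⃣ Data Quality Issue – A significant portion of issue reports are categorized as "Unknown," meaning their 'Issue_reported at' timestamp was missing (NaT) after wrangling. That can reflect missing or incorrect data logging, but it can also come from values that failed to parse, because the datetime conversion uses errors='coerce'; checking the raw strings tells the two apart.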

2️⃣ Uneven Distribution – Issue reports are not evenly spread across the week, with noticeable spikes on Wednesday and Friday.

3️⃣ Operational Blind Spot – Without accurate weekday attribution, it’s difficult to identify high-load days and optimize workforce planning.
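The "Unknown" bar collects rows whose `issue_reported_dayofweek` was missing and filled with -1. Quantifying its share, with the days kept in Monday-to-Sunday order, shows how much of the weekly pattern is actually observable. `day_counts` and the toy labels are illustrative; in the notebook pass `flipkart_df['issue_reported_day_name']`.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday", "Unknown"]

def day_counts(day_labels: pd.Series) -> pd.DataFrame:
    """Issue counts and percentage share per weekday, in calendar order, with 'Unknown' last."""
    counts = day_labels.value_counts().reindex(DAY_ORDER, fill_value=0)  # days with no issues show 0
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_pct": share})

# Toy labels for illustration only
toy = pd.Series(["Monday", "Unknown", "Unknown", "Friday", "Wednesday", "Unknown", "Friday", "Unknown"])
print(day_counts(toy))
```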

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

🚀 Positive Business Impact

✅ Data Cleanup & Process Improvement – Fixing the "Unknown" category can enhance accuracy in issue tracking.

✅ Better Resource Allocation – Identifying peak days (once data is cleaned) can help optimize agent scheduling.

✅ Customer Satisfaction – Addressing reporting gaps ensures smoother service and quicker resolutions.


------

⚠️ Potential Negative Growth Risks -

🚨 Untracked Workload Peaks – If "Unknown" issues mostly belong to specific peak days, teams may be understaffed during critical hours.

🚨 Inefficient Decision-Making – Business decisions based on incomplete data can lead to poor resource allocation.

🚨 Compliance & Reporting Challenges – Inaccurate data logging might cause issues in SLA tracking and performance evaluation.



#### Chart - 5

In [None]:
# Chart - 5 visualization code


# 1. Prepare Your Data:
# Assuming your DataFrame is named 'flipkart_df' and has a column 'connected_handling_time'
# representing the response time.

# 2. Create the Violin Plot with KDE:
plt.figure(figsize=(10, 6))  # Adjust figure size as needed
sns.violinplot(x='Agent Shift', y='connected_handling_time', data=flipkart_df,
               hue='Agent Shift', legend=False,  # passing palette without hue is deprecated in seaborn 0.13
               inner="quartile", palette='viridis', bw_method=0.2)
# 'inner="quartile"' (or other valid options like "box", "point", "stick", None)
# controls what is drawn inside each violin.
# 'bw_method' sets the KDE bandwidth (it replaces the deprecated 'bw'; experiment for best visualization)


# 3. Customize the Plot:
plt.title('Response Time Distribution by Agent Shift', fontsize=16)
plt.xlabel('Agent Shift', fontsize=12)
plt.ylabel('Response Time (minutes)', fontsize=12)
plt.xticks(rotation=45, ha='right', fontsize=10)
plt.grid(axis='y', alpha=0.5)
sns.despine(top=True, right=True)


plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This violin plot was chosen because it provides a comprehensive view of response time distributions across different agent shifts.

🎯 Why This Chart?

✅ Shows distribution & density of response times, revealing shifts with the highest variability.

✅ Identifies outliers & bottlenecks, such as extreme response times during specific shifts.

✅ Facilitates workforce optimization by helping adjust staffing to reduce delays.



##### 2. What is/are the insight(s) found from the chart?

1️⃣ Morning Shift Has High Variability :-
The morning shift exhibits wide distribution and extreme outliers, indicating inconsistent response times. Some cases take significantly longer, impacting customer experience.

2️⃣ Evening Shift is More Consistent :-
The response times are clustered with fewer extreme delays, suggesting more stable operations.

3️⃣ Split & Night Shifts Have Lower Response Time Variability :-
These shifts have narrower distributions, meaning response times are generally predictable and steady.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

🚀 Positive Business Impact from Insights

Yes! The insights will drive positive business growth by enabling:

✅ Workforce Optimization – Adjust agent scheduling to reduce high response times in the morning and afternoon shifts.

✅ Improved Customer Satisfaction (CSAT) – Reducing long response times will lead to happier customers and better retention.

----

⚠️ Potential Negative Growth Risks

🚨 High Variability in Response Time (Morning & Afternoon Shifts) -
Impact: Customers experiencing long delays may leave negative feedback, hurting brand reputation and CSAT scores.

🚨 Understaffing During Critical Hours -
Impact: If resources are misallocated (e.g., fewer agents during high-volume periods), customer dissatisfaction may rise.
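This chart plots `connected_handling_time`, which only 242 records have and which measures handling time rather than the wait before a first response. A response-time metric could instead be derived from the two timestamps. The sketch below assumes `issue_responded` records the first response and that both columns were already converted to datetime; treating negative gaps as logging errors (set to NaN) is also an assumption. The helper name and toy rows are illustrative.

```python
import pandas as pd

def response_minutes(reported: pd.Series, responded: pd.Series) -> pd.Series:
    """Minutes from issue report to response; negative gaps become NaN (assumed logging errors)."""
    minutes = (responded - reported).dt.total_seconds() / 60
    return minutes.where(minutes >= 0)

# Toy rows for illustration only
toy = pd.DataFrame({
    "Issue_reported at": pd.to_datetime(["2023-08-01 11:55", "2023-08-01 12:00", "2023-08-02 09:00"]),
    "issue_responded": pd.to_datetime(["2023-08-01 12:10", "2023-08-01 11:50", "2023-08-02 10:30"]),
    "Agent Shift": ["Morning", "Morning", "Evening"],
})
toy["response_minutes"] = response_minutes(toy["Issue_reported at"], toy["issue_responded"])

# Median is less sensitive than the mean to a long tail of delayed tickets; NaN rows are skipped
print(toy.groupby("Agent Shift")["response_minutes"].median())
```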


#### Chart - 6

In [None]:
# Chart - 6 visualization code


# Get shift distribution data
shift_counts = flipkart_df.groupby('Agent Shift').size()
shifts = shift_counts.index
counts = shift_counts.values

# Create the plot using Seaborn for better styling
plt.figure(figsize=(10, 6))  # Adjust figure size as needed
sns.barplot(x=shifts, y=counts, hue=shifts, palette="viridis", legend=False)  # hue=x avoids the seaborn 0.13 palette-without-hue deprecation
plt.title('Distribution of Issues by Agent Shift', fontsize=16)  # Clear and concise title
plt.xlabel('Agent Shift', fontsize=12)  # Label the x-axis
plt.ylabel('Number of Issues', fontsize=12)  # Label the y-axis

# Customize the plot for a professional look
plt.xticks(rotation=45, ha='right', fontsize=10)  # Rotate x-axis labels for readability
plt.yticks(fontsize=10)
plt.grid(axis='y', alpha=0.5)  # Add subtle gridlines
sns.despine(top=True, right=True)  # Remove top and right spines

# Add data labels for better understanding
for i, count in enumerate(counts):
    plt.text(i, count + 50, str(count), ha='center', va='bottom', fontsize=10)  # Adjust position as needed

plt.tight_layout()  # Adjust layout to prevent overlapping elements
plt.show()

##### 1. Why did you pick the specific chart?

I chose this bar chart because it effectively visualizes the distribution of customer issues across different agent shifts, making it easy to identify workload imbalances.

🎯 Why This Chart?

✅ Clear comparison of issue volume across shifts.

✅ Straightforward insights on peak vs. low activity times.

✅ Actionable takeaways for resource allocation & workforce optimization.

This chart helps decision-makers quickly spot inefficiencies and strategize staffing for better customer support! 🚀

##### 2. What is/are the insight(s) found from the chart?

1️⃣ Morning & Evening shifts handle the bulk of issues (41,426 and 33,677 cases, respectively).

2️⃣ Afternoon and Split shifts have significantly fewer issues, while Night shift sees the least cases (1,316).

3️⃣ High load during peak hours (Morning & Evening) suggests resource allocation optimization is needed.
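Raw issue counts per shift do not show whether a shift is understaffed; dividing by the number of distinct agents gives a rough issues-per-agent load. This sketch assumes each row is one issue and that `Agent_name` uniquely identifies an agent; the toy data is illustrative, and in the notebook the same aggregation would run on `flipkart_df`.

```python
import pandas as pd

# Toy rows for illustration only
toy = pd.DataFrame({
    "Agent Shift": ["Morning", "Morning", "Morning", "Evening", "Evening", "Night"],
    "Agent_name": ["A", "A", "B", "C", "C", "D"],
})

# Issues handled and distinct agents per shift, then the ratio of the two
workload = toy.groupby("Agent Shift").agg(
    issues=("Agent_name", "size"),
    agents=("Agent_name", "nunique"),
)
workload["issues_per_agent"] = workload["issues"] / workload["agents"]
print(workload.sort_values("issues_per_agent", ascending=False))
```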

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅ Positive: Better shift planning can reduce workload imbalance, optimize staffing, and enhance customer experience by aligning agent availability with issue volume.

⚠️ Potential Negative Growth: If peak-hour overload isn't managed, it can lead to longer resolution times, agent burnout, and lower customer satisfaction.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

# Assuming 'connected_handling_time' is the correct column name
plt.figure(figsize=(8, 5))
sns.boxplot(x='CSAT Score', y='connected_handling_time', data=flipkart_df, hue='CSAT Score', palette='coolwarm', legend=False)
plt.title('Impact of Handling Time on CSAT Score')
plt.xlabel('CSAT Score')
plt.ylabel('Handling Time (minutes)')
plt.show()

##### 1. Why did you pick the specific chart?

The box plot was chosen to analyze the impact of handling time on CSAT (Customer Satisfaction) Score because:


1.   Clear Representation of Distribution & Variability -
A box plot effectively shows the spread of handling times for different CSAT scores.
It highlights the median, quartiles, and outliers in the data.

2.   Detecting Outliers & Trends -
The chart helps identify extreme handling times that might affect CSAT scores.
Example: Some very high handling times (outliers) in CSAT scores 1 and 5 could indicate process inefficiencies.

3.   Comparing Handling Time Across CSAT Scores -  
The boxes allow an easy comparison of how handling time varies as the CSAT score increases. This helps show whether shorter or longer handling times go with better satisfaction.







##### 2. What is/are the insight(s) found from the chart?


1.   Lower CSAT Scores Show Higher Variability in Handling Time -
The spread (interquartile range) is wider for CSAT scores 1 and 3, indicating inconsistent handling times for less satisfied customers.
Some extreme outliers (e.g., 2000 minutes for CSAT 1) suggest that very long handling times are associated with lower satisfaction.

2.   CSAT Scores 4-5 Have a More Stable Handling Time -
The handling time distribution for CSAT 4 and 5 is more compact, meaning customers who gave higher ratings experienced more consistent handling times.
Even though some long handling times exist in CSAT 5, they don't seem to reduce satisfaction as much.

3.   No Clear Linear Relationship Between Handling Time and CSAT -
While extremely long handling times tend to appear with low CSAT (1-2), moderate handling times appear across all CSAT levels.
This suggests that other factors (e.g., resolution quality, agent behavior, communication skills) also affect satisfaction. Because only 242 records have connected_handling_time, these patterns rest on a small sample.
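To put a number on "no clear linear relationship", a rank correlation is a reasonable swapped-in check for an ordinal score like CSAT. This sketch computes Spearman's rho as the Pearson correlation of average ranks, which is how Spearman handles ties, so it needs only pandas; the helper name and toy values are illustrative.

```python
import numpy as np
import pandas as pd

def spearman_rho(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of average ranks, on rows where both values exist."""
    paired = pd.DataFrame({"x": x, "y": y}).dropna()
    return paired["x"].rank().corr(paired["y"].rank())

# Toy values for illustration only; in the notebook pass
# flipkart_df['connected_handling_time'] and flipkart_df['CSAT Score']
toy = pd.DataFrame({
    "connected_handling_time": [120, 300, np.nan, 450, 900, 2000],
    "CSAT Score": [5, 5, 4, 3, 2, 1],
})
rho = spearman_rho(toy["connected_handling_time"], toy["CSAT Score"])
n_pairs = toy.dropna().shape[0]
print(f"Spearman rho: {rho:.2f} on {n_pairs} paired rows")
```

With at most 242 paired rows in the real data, a confidence interval or permutation test would be needed before reading much into the resulting value.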







##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅ Positive Impact:

The insights help optimize handling time, ensuring faster and more consistent resolutions, leading to higher customer satisfaction (CSAT) and better retention rates. Addressing long wait times for low CSAT cases can reduce churn and improve brand trust.

⚠️ Potential Negative Growth:

If the focus is solely on reducing handling time without improving resolution quality, customers may feel rushed and dissatisfied. Faster isn’t always better—balancing efficiency with effective problem-solving is key to sustainable growth.

#### Chart - 8

In [None]:
# Chart - 8 visualization code

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# 1. Count issues per category from the one-hot encoded columns (assumes they start with 'category_')
category_columns = [col for col in flipkart_df.columns if col.startswith('category_')]
top_10_counts = flipkart_df[category_columns].sum().sort_values(ascending=False).head(10)
top_10_labels = top_10_counts.index.str.replace('category_', '', regex=False)  # Strip the prefix for readable labels

# 2. Create the plot
plt.figure(figsize=(12, 6))  # Adjust figure size as needed
ax = sns.barplot(x=top_10_labels, y=top_10_counts.values, hue=top_10_labels, palette='viridis', legend=False)
plt.title('Top 10 Categories of Customer Support Issues', fontsize=16)
plt.xlabel('Category', fontsize=12)
plt.ylabel('Number of Issues', fontsize=12)
plt.xticks(rotation=45, ha='right', fontsize=10)  # Rotate x-axis labels for readability
plt.yticks(fontsize=10)
plt.grid(axis='y', alpha=0.5)  # Add subtle gridlines
sns.despine(top=True, right=True)  # Remove top and right spines

# 3. Add data labels on top of each bar (barplot does not draw them by default)
for container in ax.containers:
    ax.bar_label(container, fmt='%d', fontsize=9)

plt.tight_layout()  # Adjust layout to prevent overlapping elements
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is the best choice for visualizing the top 10 categories of customer support issues because:

1.  Clear Comparison Across Categories -
A bar chart allows easy comparison of issue volume across different categories.
The length of bars makes it intuitive to identify which categories have the most complaints.


2.  Emphasizes Key Problem Areas -
It highlights which issues occur the most (e.g., Returns & Order-Related).
Helps businesses prioritize areas for improvement.

3.  Easier Decision-Making -
The chart provides a quick overview, helping leadership teams allocate resources effectively.
Example: If "Returns" have the highest complaints, the company can focus on return policy enhancements.

##### 2. What is/are the insight(s) found from the chart?



1.   High Volume of Return & Order-Related Issues:
The "Returns" category has the highest number of support issues (42,000+), followed by "Order Related" (22,000+).
This suggests that return policies, product quality, or logistics may be major pain points.


2.   Refund and Product Queries Are Notable:
Refund-related issues (~4,000) and Product Queries (3,500) indicate concerns with transaction reversals and product details.

3.   Cancellation, Payments & Feedback Are Relatively Lower:
Issues related to cancellations, payments, and feedback are significantly lower, indicating they may be less problematic or well-managed.
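
As a cross-check on the counts quoted above, the chart's aggregation can be reproduced on a tiny table. The sketch below uses made-up tickets (not Flipkart data) and assumes, as the chart code does, that categories are one-hot encoded with a `category_` prefix. Because each ticket sets exactly one column to 1, the column sums are ticket counts, and dividing by the number of tickets gives each category's share.

```python
import pandas as pd

# Made-up tickets with a hypothetical raw 'category' label (one per ticket)
tickets = pd.DataFrame({'category': ['Returns', 'Returns', 'Order Related',
                                     'Returns', 'Refund Related', 'Order Related']})

# One-hot encode with the same 'category_' prefix the chart code assumes
one_hot = pd.get_dummies(tickets['category'], prefix='category').astype(int)

# Each row has exactly one 1, so column sums are ticket counts per category
counts = one_hot.sum().sort_values(ascending=False)
counts.index = counts.index.str.replace('category_', '', regex=False)

# Share of all tickets per category, in percent
shares = (counts / len(tickets) * 100).round(1)

summary = pd.DataFrame({'tickets': counts, 'share_pct': shares})
print(summary)
```

On the real data, `flipkart_df[category_columns]` would take the place of `one_hot`.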




##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

✅ Positive Opportunities:

*   Enhancing return policies and order tracking features can improve customer satisfaction and reduce issue volume.

*   Addressing refund & product query concerns can boost customer confidence and increase conversions.


⚠️ Negative Growth Risks:

*   Frequent returns may indicate product dissatisfaction, leading to lost revenue and higher operational costs.

*   High order-related complaints may signal inefficiencies in inventory, logistics, or product information.






#### Chart - 9

In [None]:
# Chart - 9 visualization code

import matplotlib.pyplot as plt
import seaborn as sns

# 1. Drop missing buckets and treat the rest as strings so mixed types sort and match consistently
plot_df = flipkart_df.dropna(subset=['Tenure Bucket']).copy()
plot_df['Tenure Bucket'] = plot_df['Tenure Bucket'].astype(str)
tenure_order = sorted(plot_df['Tenure Bucket'].unique())

# 2. Average CSAT per tenure bucket, reindexed into the same order as the boxes
tenure_csat = plot_df.groupby('Tenure Bucket')['CSAT Score'].mean().reindex(tenure_order)

# 3. Box plot of CSAT per tenure bucket; hue=x with hue_order gives a left-to-right color gradient
plt.figure(figsize=(10, 6))
sns.boxplot(x='Tenure Bucket', y='CSAT Score', data=plot_df, order=tenure_order,
            hue='Tenure Bucket', hue_order=tenure_order, palette='viridis', legend=False,
            showfliers=False)  # showfliers=False hides outlier points for a cleaner look

# 4. Titles, labels and styling
plt.title('CSAT Score Distribution by Tenure (with Color Gradient)', fontsize=16)
plt.xlabel('Agent Tenure', fontsize=12)
plt.ylabel('CSAT Score', fontsize=12)
plt.xticks(rotation=45, ha='right', fontsize=10)
plt.grid(axis='y', alpha=0.5)
sns.despine(top=True, right=True)

# 5. Overlay the average CSAT per bucket to highlight the trend;
# seaborn draws the boxes at x = 0, 1, 2, ..., so plot against those positions
plt.plot(range(len(tenure_order)), tenure_csat.values, marker='o', color='red',
         linestyle='--', label='Average CSAT')
plt.legend()  # Show the legend for the line

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This box plot with a color gradient was chosen because:

📌 Effective Comparison: It visually compares CSAT score distribution across different agent tenure groups in a single view.

📌 Trend Identification: The red dashed line (Average CSAT) helps highlight fluctuations in satisfaction scores over tenure periods.

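📌 Spread and Consistency: Box plots show the median, interquartile range, and whiskers for each group, making dips or consistency in performance easy to spot. Outlier points are hidden here (showfliers=False) so the focus stays on the typical range.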

##### 2. What is/are the insight(s) found from the chart?

1️⃣ CSAT scores are relatively stable across different agent tenure groups, with slight fluctuations.

2️⃣ Newer agents (On Job Training & <30 days) have similar CSAT scores to experienced agents (>90 days), indicating that training programs may be effective in preparing agents.

3️⃣ CSAT slightly declines for agents in the 31-90 day range, suggesting possible burnout, complacency, or gaps in continuous learning.
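
Differences between tenure groups like these are small enough that a numeric summary is a useful companion to the box plot. The minimal sketch below uses invented scores, and the bucket labels are illustrative assumptions. It also fixes the bucket order explicitly, because plain string sorting (as done by `sorted()` in the chart code) would order labels such as these as '31-60', '61-90', '<30', '>90', 'On Job Training'.

```python
import pandas as pd

# Made-up tickets with the (hypothetical) tenure bucket of the handling agent
df = pd.DataFrame({
    'Tenure Bucket': ['On Job Training', '<30', '31-60', '61-90', '>90',
                      'On Job Training', '<30', '31-60', '61-90', '>90'],
    'CSAT Score':    [5, 4, 4, 3, 5, 4, 5, 3, 4, 5],
})

# Explicit business order instead of lexical string order
tenure_order = ['On Job Training', '<30', '31-60', '61-90', '>90']
df['Tenure Bucket'] = pd.Categorical(df['Tenure Bucket'], categories=tenure_order, ordered=True)

# observed=False keeps every bucket in the output, even one with no tickets
summary = (df.groupby('Tenure Bucket', observed=False)['CSAT Score']
             .agg(['count', 'mean', 'median']))
print(summary)
```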

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

✅ Positive Impact:



*   A well-structured training program ensures that even new agents perform at a high level, leading to consistent customer satisfaction.

*   Stable CSAT across tenure groups suggests hiring and onboarding processes are working well.


❌ Potential Negative Impact:

*   The dip in CSAT for agents with 31-90 days tenure could indicate motivation or learning curve issues. If left unaddressed, this could lead to dissatisfied customers and reduced efficiency.

*   Ensuring continuous skill development, engagement, and support for mid-tenure agents can prevent this decline.



#### Chart - 10

In [None]:
# Chart - 10 visualization code

import matplotlib.pyplot as plt
import seaborn as sns

# Assuming your DataFrame is named 'flipkart_df'
# and has columns 'CSAT Score' and 'connected_handling_time'

# 1. Violin plot of the response-time distribution for each CSAT score
plt.figure(figsize=(10, 6))
sns.violinplot(x='CSAT Score', y='connected_handling_time', data=flipkart_df,
               hue='CSAT Score', palette='viridis', legend=False,
               inner='quartile', bw_method=0.2)  # bw_method replaces the deprecated bw argument

# 2. Overlay a narrow box plot for the statistical summary (outliers hidden)
sns.boxplot(x='CSAT Score', y='connected_handling_time', data=flipkart_df,
            width=0.3, boxprops={'zorder': 2}, showfliers=False)

# 3. Titles, labels and styling
plt.title('CSAT Score vs Response Time Distribution', fontsize=16)
plt.xlabel('CSAT Score', fontsize=12)
plt.ylabel('Response Time (minutes)', fontsize=12)
plt.grid(axis='y', alpha=0.5)
sns.despine(top=True, right=True)

# 4. Overlay the median response time per CSAT score to highlight the trend.
# Categorical plots draw the sorted scores at x = 0, 1, 2, ..., so plot against those
# positions rather than the raw score values (which would shift the line by one slot).
median_response_times = flipkart_df.groupby('CSAT Score')['connected_handling_time'].median()
plt.plot(range(len(median_response_times)), median_response_times.values,
         marker='o', color='red', linestyle='--', label='Median Response Time')
plt.legend()

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The violin plot was chosen because:

📊 Distribution Insights – It visually captures the spread and density of response times for different CSAT scores. This helps identify clusters and outliers that might be influencing customer satisfaction.

📌 Comparison Across CSAT Scores – The plot allows easy comparison of response time distributions for different CSAT ratings, revealing whether longer response times tend to go with lower satisfaction.

🔴 Median Trend Analysis – The red dashed line highlights how the median response time changes with CSAT score, offering a clear trend to analyze service efficiency vs. customer perception.
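
When drawing that red median line over a categorical plot, one detail is easy to miss: seaborn places the sorted CSAT categories at x positions 0, 1, 2, ..., not at the score values 1 to 5. The small sketch below, on made-up handling times, computes the medians and the positions the line should use.

```python
import pandas as pd

# Made-up handling times for CSAT scores 1-5 (illustration only)
df = pd.DataFrame({
    'CSAT Score': [1, 1, 2, 3, 4, 4, 5, 5],
    'connected_handling_time': [900, 300, 500, 400, 250, 350, 200, 300],
})

# Median handling time per score; groupby returns the scores in sorted order
medians = df.groupby('CSAT Score')['connected_handling_time'].median()

# Seaborn draws the sorted categories at x = 0, 1, 2, ..., so the overlay uses these positions
positions = list(range(len(medians)))
line_points = list(zip(positions, medians.values))
print(line_points)
```

Plotting against the raw scores instead would shift the line one slot to the right of the violins.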

##### 2. What is/are the insight(s) found from the chart?

1️⃣ Faster Responses → Higher CSAT Scores



*   Lower CSAT scores (1-2) have a wider and more spread-out distribution of response times, suggesting that customers who waited longer tended to be more dissatisfied.

*   Higher CSAT scores (4-5) tend to have lower median response times, indicating that quicker responses generally go with better satisfaction.

2️⃣ Inconsistent Response Time for Low CSAT Scores



*   The distribution for CSAT 1 and 2 is highly variable, with some customers experiencing very high response times (over 1000 minutes). This inconsistency might be a reason for dissatisfaction.

3️⃣ Optimal Response Time for Best Satisfaction

*   CSAT 4 and 5 show a more consistent and moderate response time range, suggesting an ideal threshold for response time that keeps customers happy.

*   Reducing extreme delays could help improve scores for lower-rated experiences.
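
The claims about spread and extreme delays can be quantified per CSAT score. This sketch uses invented values; the 1,000 cut-off comes from the text above and is assumed to be in the same units as `connected_handling_time`. The interquartile range (IQR) measures spread, and `share_extreme` is the fraction of tickets beyond the cut-off.

```python
import pandas as pd

# Made-up tickets (illustration only)
df = pd.DataFrame({
    'CSAT Score': [1, 1, 1, 2, 2, 4, 4, 5, 5, 5],
    'connected_handling_time': [150, 1200, 600, 1100, 300, 250, 400, 200, 300, 350],
})

THRESHOLD = 1000  # 'extreme delay' cut-off from the text; assumed same units as the column

def iqr(s):
    """Interquartile range: spread of the middle 50% of values."""
    return s.quantile(0.75) - s.quantile(0.25)

spread = df.groupby('CSAT Score')['connected_handling_time'].agg(
    median='median',
    iqr=iqr,
    share_extreme=lambda s: (s > THRESHOLD).mean(),  # fraction of tickets above the cut-off
)
print(spread)
```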






##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

✅ Positive Business Impact:

1️⃣ Faster Response → Higher CSAT → Customer Retention 🚀


*   By reducing response times, especially for lower-rated experiences, the business can improve customer satisfaction and increase retention rates.

*   Happy customers are more likely to return, recommend, and engage, leading to higher revenue and a stronger brand reputation.


2️⃣ Optimized Support Efficiency 📈


*   Understanding the ideal response time range (observed in CSAT 4-5) can help allocate resources effectively.

*   Support teams can prioritize critical cases and streamline processes to maintain high CSAT scores.

❌ Potential for Negative Growth:

1️⃣ Inconsistent Response Times = Customer Frustration


*   The high variance in response times for CSAT 1-2 indicates some customers wait too long, leading to negative word-of-mouth and churn.

*   Delays and inefficiencies in support operations could cause reputational damage if not addressed.







#### Chart - 11

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

# Assuming your DataFrame is named 'flipkart_df'
# and has columns 'Agent Shift', 'CSAT Score' and one-hot encoded 'category_' columns

# 1. One-hot encoded category columns (used instead of a raw 'category' column)
category_columns = [col for col in flipkart_df.columns if col.startswith('category_')]

# 2. Reshape to long format: one row per (ticket, category) pair, keeping the CSAT score
long_df = flipkart_df.melt(id_vars=['Agent Shift', 'CSAT Score'], value_vars=category_columns,
                           var_name='Issue Category', value_name='is_member')

# 3. Keep only the category each ticket actually belongs to and tidy the labels
long_df = long_df[long_df['is_member'] == 1].copy()
long_df['Issue Category'] = long_df['Issue Category'].str.replace('category_', '', regex=False)

# 4. Average CSAT per shift and category (groupby + unstack also avoids pivot's duplicate-entry error)
pivot_data = long_df.groupby(['Agent Shift', 'Issue Category'])['CSAT Score'].mean().unstack()

# 5. Stacked bar chart; pass ax so pandas draws on the sized figure instead of opening a new one.
# Note: bar heights are sums of per-category averages, so compare segments rather than totals.
fig, ax = plt.subplots(figsize=(12, 8))
pivot_data.plot(kind='bar', stacked=True, colormap='viridis', ax=ax)

ax.set_title('Agent Shift vs. Issue Category vs. CSAT Score (Stacked Bar Chart)')
ax.set_xlabel('Agent Shift')
ax.set_ylabel('Average CSAT Score')
plt.xticks(rotation=45, ha='right')
ax.legend(title='Issue Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a stacked bar chart because:

📊 Comparative Analysis: It allows a clear comparison of CSAT scores across different agent shifts (Afternoon, Evening, Morning, Night, and Split).

🎯 Multi-Dimensional Insights: It efficiently visualizes how different issue categories contribute to CSAT scores within each shift, making it easy to identify patterns or problem areas.

🚦 Easy Identification of Trends: The stacked format highlights consistency or discrepancies in service quality, making it a great choice for evaluating overall customer satisfaction.
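
Because issue categories are stored as one-hot columns, averaging CSAT per shift and category first needs a long-format table in which each ticket appears only under the category it belongs to. A minimal sketch on a made-up four-ticket DataFrame (column names follow the chart code; the values are invented):

```python
import pandas as pd

# Made-up tickets with one-hot category columns (illustration only)
df = pd.DataFrame({
    'Agent Shift': ['Morning', 'Morning', 'Night', 'Night'],
    'CSAT Score': [5, 3, 4, 2],
    'category_Returns': [1, 0, 1, 0],
    'category_Refund Related': [0, 1, 0, 1],
})
category_columns = [c for c in df.columns if c.startswith('category_')]

# One row per (ticket, category); then keep only the category each ticket belongs to
long_df = df.melt(id_vars=['Agent Shift', 'CSAT Score'], value_vars=category_columns,
                  var_name='Issue Category', value_name='is_member')
long_df = long_df[long_df['is_member'] == 1]

# Average CSAT for every shift x category cell
avg_csat = (long_df.groupby(['Agent Shift', 'Issue Category'])['CSAT Score']
                   .mean().unstack())
print(avg_csat)
```

Melting only the category columns without `CSAT Score` in `id_vars` would average the 0/1 indicators instead of the scores.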

##### 2. What is/are the insight(s) found from the chart?

✅ Consistent CSAT Across Shifts: The average Customer Satisfaction (CSAT) score remains high across all shifts (Afternoon, Evening, Morning, Night, and Split). This indicates that service quality is evenly maintained, with no noticeable time-of-day bias.

✅ Issue Category Pattern Remains Similar: The category segments look alike in every shift, suggesting that no single shift stands out for a particular type of complaint.

⚠️ Potential Concerns:

🔻 Cancellation & Refund Issues Exist in All Shifts: These categories (dark purple & yellow) consistently appear across shifts, suggesting persistent customer dissatisfaction that should be addressed.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

🚀 Positive: The uniformity in CSAT scores suggests strong operational efficiency and consistent customer experience, regardless of shift timing.

⚠️ Negative: If cancellation & refund-related issues aren't addressed proactively, they could erode long-term customer trust and loyalty.

#### Chart - 12

In [None]:
# Chart - 12 visualization code



import plotly.express as px

# Assuming your DataFrame is named 'flipkart_df'
# and has columns 'Item_price', 'connected_handling_time', and 'CSAT Score'

fig = px.scatter_3d(flipkart_df,
                    x='Item_price',
                    y='connected_handling_time',
                    z='CSAT Score',
                    color='CSAT Score',  # Color points by CSAT Score
                    color_continuous_scale='viridis',  # Use a color scale
                    opacity=0.7,  # Adjust transparency
                    title="3D Scatter Plot: Item Price, Response Time & CSAT Score"
                   )

fig.update_layout(scene=dict(
    xaxis_title='Item Price',
    yaxis_title='Response Time',
    zaxis_title='CSAT Score'
))

fig.show()

##### 1. Why did you pick the specific chart?

This 3D Scatter Plot was chosen because it provides a multi-dimensional view of key factors affecting customer satisfaction:


*   X-axis (Item Price): Shows how price relates to response time and CSAT.

*   Y-axis (Response Time): Indicates how quickly issues are addressed.

*   Z-axis (CSAT Score): Highlights customer satisfaction trends.

*   Color Gradient: Represents CSAT levels (yellow = high, purple = low).






##### 2. What is/are the insight(s) found from the chart?

The 3D Scatter Plot reveals key insights about the relationship between Item Price, Response Time, and CSAT Score:



1.   Higher Prices → Lower Satisfaction: Items with higher prices appear to have lower CSAT scores (darker shades).
This suggests that customers expect better service for expensive items, and unmet expectations lead to dissatisfaction. The effect looks weak overall, though: the correlation heatmap (Chart 14) puts the linear price-CSAT correlation at only -0.06.


2.   Longer Response Times → Lower Satisfaction: As response time increases, CSAT scores tend to decrease.
Faster responses (the low end of the response-time axis) correlate with higher satisfaction (yellow-green dots), reinforcing the importance of quick support.


3.   Low-Priced Items Have Varied Satisfaction: Some low-priced items have high satisfaction, while others have very low ratings.
This could mean that factors beyond price, such as service quality, product expectations, or issue resolution, play a role.
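
Because points overlap heavily in a 3D scatter, a simple 2D check of the price pattern can help: split prices into equal-count bands and compare average CSAT per band. The sketch below uses invented prices and scores, and three bands (tertiles) is an arbitrary choice.

```python
import pandas as pd

# Made-up prices and scores (illustration only)
df = pd.DataFrame({
    'Item_price': [199, 349, 499, 999, 1499, 4999, 9999, 24999, 49999],
    'CSAT Score': [5, 4, 5, 4, 5, 3, 4, 2, 3],
})

# Three equal-count price bands (tertiles)
df['price_band'] = pd.qcut(df['Item_price'], q=3, labels=['low', 'mid', 'high'])

# Ticket count and average CSAT per band
band_csat = df.groupby('price_band', observed=False)['CSAT Score'].agg(['count', 'mean'])
print(band_csat)
```

On the real data, a flat band-to-band profile would be consistent with the weak -0.06 correlation in Chart 14, while a clear drop in the top band would support insight 1.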






##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

🚀 Positive Impact:

✅ Faster Response = Happier Customers → Prioritizing quick issue resolution will boost CSAT scores.

✅ Better Support for High-Priced Items → Meeting expectations here can reduce churn and improve brand loyalty.

✅ Fixing Low-CSAT Areas → Addressing key dissatisfaction factors can increase repeat purchases.

----


⚠️ Potential Negative Growth Triggers:

🔻 Delayed Response = Lost Customers → Slow support is directly tied to lower satisfaction and retention.

🔻 High-Price, Low-CSAT = Bad Reputation → If expensive items consistently disappoint, it can hurt premium sales and brand perception.

🔻 Prioritizing Price Over Service → Focusing only on pricing without improving support quality will not fix customer dissatisfaction.

#### Chart - 13

In [None]:

import plotly.express as px
import pandas as pd

# Assuming your DataFrame is named 'flipkart_df'
# and has columns 'category', 'connected_handling_time', and 'CSAT Score'

# 1. Prepare data for the bubble chart
# Use one-hot encoded category columns
category_columns = [col for col in flipkart_df.columns if col.startswith('category_')]
# Melt the data to have one row per category for each data point
melted_data = pd.melt(flipkart_df, id_vars=['connected_handling_time', 'CSAT Score'],
                      value_vars=category_columns,
                      var_name='Issue Category', value_name='Category Value')
# Filter out rows where Category Value is 0 (not belonging to that category)
bubble_data = melted_data[melted_data['Category Value'] == 1]

# 2. Create the bubble chart using Plotly Express
fig = px.scatter(bubble_data,
                 x='Issue Category',
                 y='connected_handling_time',
                 size='CSAT Score',
                 color='CSAT Score',
                 color_continuous_scale='viridis',  # Use a color scale
                 hover_name='Issue Category',  # Show category name on hover
                 title='Bubble Chart: Issue Category, Response Time & CSAT Score',
                 labels={'connected_handling_time': 'Response Time', 'CSAT Score': 'CSAT Score'})

# Customize layout for better readability
fig.update_layout(xaxis_tickangle=-45,  # Rotate x-axis labels
                  xaxis_title='Issue Category',
                  yaxis_title='Response Time',
                  legend_title='CSAT Score')

fig.show()

##### 1. Why did you pick the specific chart?

I chose the Bubble Chart because it visually connects three key aspects (Issue Category, Response Time, and CSAT Score) in one dynamic view. 🎯

*   X-Axis (Issue Category): Places each ticket under its issue category.

*   Y-Axis (Response Time): Shows how long it takes to resolve issues.

*   Bubble Size and Color: Both encode the CSAT score (larger, yellower bubbles = higher satisfaction).
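
In the chart code, `size='CSAT Score'`, so each bubble is one ticket sized by its rating and the plot does not show issue volume directly. If volume is what bubble size should convey, one option (a sketch, not the notebook's method) is to aggregate per category first. The data below is made up, and the plain `Issue Category` column stands in for the melted one-hot columns.

```python
import pandas as pd

# Made-up tickets (illustration only)
df = pd.DataFrame({
    'Issue Category': ['Returns', 'Returns', 'Returns', 'Cancellation', 'Cancellation', 'Payments'],
    'connected_handling_time': [400, 600, 500, 700, 900, 200],
    'CSAT Score': [5, 4, 5, 3, 4, 5],
})

# One bubble per category: size = ticket volume, y = median handling time, color = mean CSAT
bubbles = (df.groupby('Issue Category')
             .agg(tickets=('CSAT Score', 'count'),
                  median_handling_time=('connected_handling_time', 'median'),
                  mean_csat=('CSAT Score', 'mean'))
             .reset_index())
print(bubbles)

# With Plotly Express this table could then be drawn as:
# px.scatter(bubbles, x='Issue Category', y='median_handling_time',
#            size='tickets', color='mean_csat', color_continuous_scale='viridis')
```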



##### 2. What is/are the insight(s) found from the chart?



1.   Cancellation & Returns have the highest response time ⏳ - These categories show longer handling times, possibly due to verification or refund processing.

2.   CSAT scores are mostly high (yellow bubbles) 🌟 - Despite varying response times, customer satisfaction remains mostly positive.

3.   Low CSAT scores in some refund-related cases (dark purple bubbles) 🛑 - Issues in refunds and payments may be frustrating customers, leading to lower satisfaction.

4.   Order-related and payment issues have moderate response times 📦💰 - These issues are addressed relatively quickly but still show a mix of CSAT scores.






##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, these insights can drive positive business impact! 🚀

By pinpointing high-response-time issues (like Cancellations & Returns) and low CSAT areas, businesses can streamline processes, enhance customer experience, and boost retention. Faster resolutions = Happier customers = Higher loyalty!

Beware of Red Flags! ⚠️

If high-response-time categories remain unaddressed, they can frustrate customers, leading to negative word-of-mouth & churn. For example, if Refund-Related issues take too long, trust declines, impacting repeat business.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Assuming your DataFrame is named 'flipkart_df'
# Select only numeric columns for correlation calculation (Pearson by default)
numeric_df = flipkart_df.select_dtypes(include=np.number)
correlation_matrix = numeric_df.corr()

plt.figure(figsize=(12, 10))  # Adjust figure size as needed
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f",
            vmin=-1, vmax=1, center=0)  # Fix the color scale to the full [-1, 1] range
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

I picked the correlation heatmap because it’s like an X-ray for data relationships! 🔍🔥 It instantly reveals which variables in Flipkart’s customer support data are connected, weakly related, or completely independent.

##### 2. What is/are the insight(s) found from the chart?

From the correlation heatmap, we can derive the following insights:

1.  Weak correlation between item price and CSAT score (-0.06) 🛍️
Higher-priced items don’t significantly impact customer satisfaction scores.

2.  Handling time has a slight positive correlation (0.22) with item price ⏳ -
More expensive items might take longer to handle in customer support.

3.  Almost no correlation between CSAT score and handling time (0.05) 📞 -
Faster resolution doesn't guarantee higher satisfaction; other factors might be at play.

4.  Negligible impact of reported issue timing on satisfaction ⏰
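
Pearson correlation, which `DataFrame.corr()` computes by default, measures linear association and is sensitive to the heavy right tail in `Item_price`, while CSAT is an ordinal 1 to 5 score. A rank-based (Spearman) correlation is a reasonable complement. The sketch below computes it as the Pearson correlation of average ranks, on invented values.

```python
import pandas as pd

# Made-up values (illustration only)
df = pd.DataFrame({
    'Item_price': [199, 499, 999, 4999, 24999, 49999],
    'connected_handling_time': [200, 250, 300, 500, 450, 2000],
    'CSAT Score': [5, 4, 5, 4, 3, 2],
})

# Pearson (default): linear association, pulled around by extreme values
pearson = df.corr()

# Spearman = Pearson correlation of the ranks (ties get average ranks); robust to skew
spearman = df.rank().corr()

print(pearson.round(2))
print(spearman.round(2))
```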



#### Chart - 15 - Pair Plot

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Assuming your DataFrame is named 'flipkart_df'
# Select the columns you want to include in the pair plot
columns_for_pairplot = ['Item_price', 'connected_handling_time', 'CSAT Score']  # Replace with your desired columns

# Create the pair plot
sns.pairplot(flipkart_df[columns_for_pairplot])
plt.show()

##### 1. Why did you pick the specific chart?

I chose the pairplot because it’s like a speed date for data! 🚀 It quickly reveals relationships, patterns, and outliers across multiple variables in one go. Whether it's spotting price outliers, handling time clusters, or CSAT score trends, this chart gives a 360-degree view of how Flipkart's customer support data interacts. 📊✨

##### 2. What is/are the insight(s) found from the chart?



1.   Right-Skewed Distribution for Item_price - Most of the items are low-priced, with a few high-priced outliers beyond ₹100,000.
This suggests Flipkart primarily deals in affordable products, but some premium items exist.

2.   CSAT Score is Discrete (Likely on a Scale of 1-5) - Customer Satisfaction (CSAT Score) appears to be distributed across a fixed scale (1 to 5), which indicates a rating system.
No clear linear relationship is visible between CSAT Score and Item_price or connected_handling_time.

3.   No Clear Correlation Between Handling Time and CSAT Score - Points are scattered, meaning that longer or shorter handling times don’t directly determine CSAT scores.
Some cases with high handling times still have good ratings, suggesting other factors (like agent efficiency or resolution quality) might impact satisfaction more.

4.   Possible Clusters in Handling Time vs. CSAT Score - Some clustering in handling times suggests standardized time ranges for different customer issues.
Cases where connected_handling_time is extremely high (above 1,000) might be escalations or particularly difficult issues.
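
The skew and the escalation threshold described above can be checked numerically. In this short sketch on invented values, a positive skewness confirms the long right tail, `np.log1p` compresses it for plotting, and the 1,000 cut-off from insight 4 flags candidate escalations.

```python
import numpy as np
import pandas as pd

# Made-up values (illustration only)
df = pd.DataFrame({
    'Item_price': [199, 299, 499, 799, 999, 1499, 2999, 120000],
    'connected_handling_time': [300, 450, 500, 380, 1500, 420, 610, 2500],
})

# Positive skewness confirms a long right tail
price_skew = df['Item_price'].skew()

# log1p compresses the tail so low-priced items are not squashed into one corner of a plot
df['log_price'] = np.log1p(df['Item_price'])

# Flag possible escalations: handling time above the 1,000 mark noted in the text
df['possible_escalation'] = df['connected_handling_time'] > 1000

print(round(price_skew, 2), df['possible_escalation'].sum())
```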




## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain Briefly.

The goal of this Exploratory Data Analysis (EDA) is to uncover key insights that enhance customer experience, optimize support operations, and boost brand loyalty. By analyzing customer complaints, response times, resolution efficiency, and agent performance, we aim to:

1️⃣ Improve Response & Resolution Times – Identify bottlenecks and optimize agent workload.

2️⃣ Enhance Customer Satisfaction – Detect common complaint patterns and improve service quality.

3️⃣ Optimize Staffing & Scheduling – Allocate resources efficiently based on peak complaint hours.

4️⃣ Reduce Escalations & Churn – Identify dissatisfied customers early and take proactive measures.

5️⃣ Data-Driven Decision Making – Empower Flipkart to enhance CSAT (Customer Satisfaction Score) and retain loyal customers.
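
For objective 3 (staffing by peak complaint hours), ticket timestamps can be bucketed by hour of day and weekday. This sketch assumes a timestamp column named `Issue_reported at` in day/month/year format; both the name and the format are hypothetical and should be adjusted to the real dataset. The rows are invented.

```python
import pandas as pd

# Hypothetical column name and date format; made-up rows (illustration only)
df = pd.DataFrame({'Issue_reported at': [
    '01/08/2023 12:05', '01/08/2023 12:40', '04/08/2023 12:15',
    '04/08/2023 18:30', '07/08/2023 09:10', '07/08/2023 12:55',
]})
df['reported'] = pd.to_datetime(df['Issue_reported at'], format='%d/%m/%Y %H:%M')

# Ticket volume per hour of day and per weekday
by_hour = df['reported'].dt.hour.value_counts().sort_index()
by_weekday = df['reported'].dt.day_name().value_counts()

# Hour with the most tickets
peak_hour = by_hour.idxmax()

print(by_hour.to_dict(), peak_hour)
print(by_weekday.to_dict())
```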

# **Conclusion**

📌 Conclusion: Flipkart Customer Support EDA 🚀

The deep dive into Flipkart's customer support data has unboxed critical insights that can transform customer service into a competitive advantage. Here's what we discovered:

✅ Faster Response = Happier Customers 🎯

*   Delays in response & resolution times negatively impact CSAT.

*   Optimizing agent shifts, especially night shifts, can improve efficiency.

✅ Peak Complaint Trends = Smarter Staffing 📅


*   Fridays & Mondays see the highest complaints—align resources accordingly.

*   12 PM complaint spike demands better lunch-hour support planning.


✅ Premium Customers = Priority Support 👑


*   High-ticket purchases (₹20K+) have higher complaint rates; dedicated VIP support can boost retention.


⚠ Red Flags & Fixes:


*   Long response times (>24 hrs) hurt brand loyalty—need faster resolutions! ⏳

*   Some agents consistently underperform—train them using top performers' strategies! 🏆
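
The >24-hour red flag can be tracked as the share of tickets whose response took longer than a day. This minimal sketch assumes report and response timestamps are available as two datetime columns; the names `reported` and `responded` are hypothetical, and the rows are invented.

```python
import pandas as pd

# Hypothetical timestamp columns; made-up rows (illustration only)
df = pd.DataFrame({
    'reported':  pd.to_datetime(['2023-08-01 10:00', '2023-08-01 11:00', '2023-08-02 09:00']),
    'responded': pd.to_datetime(['2023-08-01 10:30', '2023-08-02 15:00', '2023-08-02 09:05']),
})

# Response time in hours
df['response_hours'] = (df['responded'] - df['reported']).dt.total_seconds() / 3600

# Flag responses slower than one day and compute their share
df['over_24h'] = df['response_hours'] > 24
share_over_24h = df['over_24h'].mean()

print(df[['response_hours', 'over_24h']])
print(f"Share of tickets over 24h: {share_over_24h:.1%}")
```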













### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***