<a href="https://colab.research.google.com/github/sohanmahamuni/EDA_Projects/blob/main/Flipkart_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Flipkart Customer Service Satisfaction Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

In today’s competitive e-commerce landscape, customer service plays a crucial role in sustaining brand loyalty and improving customer retention. Flipkart, being a leading e-commerce platform, continuously strives to enhance customer satisfaction to differentiate itself from competitors. This project aims to analyze customer interactions, feedback, and satisfaction scores collected from various customer service channels to identify key factors influencing customer satisfaction.

By performing Exploratory Data Analysis (EDA), we will uncover patterns and insights that can help optimize service performance. We will explore various attributes such as issue response time, customer remarks, CSAT (Customer Satisfaction) scores, product categories, and agent performance. Through structured visualizations, we aim to pinpoint service inefficiencies and highlight improvement areas.

This study will assist Flipkart in designing data-driven strategies to improve service experience, reduce response time, and enhance overall customer satisfaction.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**The project seeks to analyze Flipkart’s customer service satisfaction data to determine the factors that impact CSAT scores and identify actionable insights that can improve the overall service quality.**

#### **Define Your Business Objective**

The primary objective is to understand the factors influencing customer satisfaction and provide insights to improve service quality. This includes:

* Identifying key drivers of customer dissatisfaction.

* Evaluating service agents’ efficiency.

* Understanding customer behavior trends and expectations.

* Reducing issue resolution time.

* Enhancing overall customer service experience.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code and deployment-ready code are a plus; students who provide them will be awarded additional credits.

     The additional credits will give them an advantage during Star Student selection.

     [ Note: Deployment-ready code means the whole .ipynb notebook can be executed in one go without a single error being logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For each chart, follow the format below and answer the questions that follow it.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv("Customer_support_data.csv")

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
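# Visualizing the missing values (viridis colormap: yellow = missing, purple = present)
# Row tick labels are hidden because the dataset has too many rows to label
sns.heatmap(df.isnull(), cmap="viridis", cbar=False, yticklabels=False)
plt.show()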

1. Columns with Missing Data:

* Sub-category
* Customer Remarks
* order_date_time
* Survey_response_Date
* Customer_city

These columns show yellow lines, meaning they have missing values in multiple rows.

2. Columns with No Missing Data:

Unique id, channel_name, category, connected_handling_time, CSAT Score, etc., are completely purple, meaning they are fully populated (no missing values).

3. Extent of Missing Data:

Some columns have significant missing data (e.g., Survey_response_Date and Customer_city), while others have less frequent gaps (e.g., Customer Remarks).
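To put numbers on the "significant" versus "less frequent" gaps described above, the heatmap can be complemented by a per-column summary. The sketch below runs on a small invented DataFrame that reuses a few of the dataset's column names; on the notebook data it would be called as `missing_summary(df)`. The helper name `missing_summary` is hypothetical.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most affected first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually have gaps, sorted by severity
    return summary[summary["missing_count"] > 0].sort_values("missing_pct", ascending=False)

# Toy rows reusing some of the dataset's column names; the values are invented
toy = pd.DataFrame({
    "Unique id": ["a", "b", "c", "d"],
    "Customer Remarks": ["Good", None, None, "Late"],
    "Customer_city": [None, None, None, "Pune"],
    "CSAT Score": [5, 1, 4, 5],
})
summary = missing_summary(toy)
print(summary)
```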

### What did you know about your dataset?

The dataset contains customer interaction records with details like order and issue timestamps, customer feedback, channel used, product category, agent performance metrics, and CSAT scores. There are missing values in some fields and some duplicate records. The dataset is moderately large, with several categorical and numerical variables that require cleaning and standardization.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

The dataset includes 20+ variables, such as customer service channel, product category, order date, issue response time, agent details, and CSAT scores. These variables offer insights into the interaction process from order placement to customer feedback.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    print(f"{col}: {df[col].nunique()} unique values")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Make the dataset analysis-ready.
# Convert the date/time columns to datetime (day-first format);
# entries that cannot be parsed become NaT instead of raising an error.
df['order_date_time'] = pd.to_datetime(df['order_date_time'], dayfirst=True, errors='coerce')

df['Issue_reported at'] = pd.to_datetime(df['Issue_reported at'], dayfirst=True, errors='coerce')

df['issue_responded'] = pd.to_datetime(df['issue_responded'], dayfirst=True, errors='coerce')

df['Survey_response_Date'] = pd.to_datetime(df['Survey_response_Date'], dayfirst=True, errors='coerce')


# Response time: hours between the issue being reported and the first response
df['response_time_hrs'] = (df['issue_responded'] - df['Issue_reported at']).dt.total_seconds() / 3600

### What all manipulations have you done and insights you found?

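* Converted the date/time columns to datetime for time-based analysis.
* Created a response-time metric (hours between the issue being reported and the first response) to evaluate agent responsiveness.
* Cleaned missing values where appropriate.
* Insights: longer response times are weakly associated with lower CSAT scores (a correlation of about -0.15 in the correlation heatmap, Chart 14).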
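The wrangling cell converts dates and derives the response time, but the missing-value cleaning mentioned in the list is not shown in code. A minimal sketch of what such a step could look like, assuming exact duplicate rows can safely be dropped and that an explicit placeholder is acceptable for blank `Customer Remarks`. The helper `basic_clean`, the placeholder text and the toy rows are all hypothetical.

```python
import pandas as pd

def basic_clean(frame: pd.DataFrame, remark_placeholder: str = "No remarks") -> pd.DataFrame:
    """Drop exact duplicate rows and fill blank free-text remarks (illustrative only)."""
    cleaned = frame.drop_duplicates().copy()
    # The placeholder text is a hypothetical choice; any explicit label works
    cleaned["Customer Remarks"] = cleaned["Customer Remarks"].fillna(remark_placeholder)
    return cleaned.reset_index(drop=True)

toy = pd.DataFrame({
    "Unique id": ["a", "a", "b"],
    "Customer Remarks": ["Good", "Good", None],
    "CSAT Score": [5, 5, 1],
})
clean = basic_clean(toy)
print(clean)
print("Rows before/after:", len(toy), len(clean))
```

Date columns are deliberately left as `NaT` rather than imputed, since invented timestamps would distort the response-time metric.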

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

## 1. **Univariate Analysis**

#### Chart - 1

In [None]:
# Chart - 1 visualization code
#Distribution of CSAT Score
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8,6))
sns.countplot(x='CSAT Score', data=df, palette='Blues')
plt.title('Distribution of CSAT Scores')
plt.xlabel('CSAT Score')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart effectively shows the frequency of each score.
It helps understand how satisfied customers are overall, and which score is most common.


##### 2. What is/are the insight(s) found from the chart?

* The majority of customers gave a CSAT score of 5, indicating high satisfaction with the customer service experience.

* There is also a noticeable number of customers who rated 1 (dissatisfied) and 4 (satisfied, but not fully).
* Very few customers gave a score of 2 or 3: customers tend to feel either strongly positive or strongly negative, with little moderate feedback.
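Because the countplot shows raw counts, statements like "the majority" are easier to check as percentages. A small sketch on invented scores; on the notebook data it would be `score_shares(df['CSAT Score'])`. The helper name is hypothetical, and grouping 1-2 as "low" and 4-5 as "high" is an assumption that mirrors how the insights above read the scale.

```python
import pandas as pd

def score_shares(scores: pd.Series) -> pd.Series:
    """Percentage of responses at each CSAT score 1-5 (missing scores shown as 0)."""
    shares = scores.value_counts(normalize=True).reindex(range(1, 6), fill_value=0.0)
    return (shares * 100).round(1)

# Invented scores for illustration only
toy_scores = pd.Series([5, 5, 5, 4, 1, 1, 5, 3, 5, 2])
shares = score_shares(toy_scores)
print(shares)
# Share of clearly dissatisfied (1-2) versus satisfied (4-5) customers
low, high = shares.loc[[1, 2]].sum(), shares.loc[[4, 5]].sum()
print("Low (1-2):", low, "High (4-5):", high)
```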

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* The presence of many CSAT score 1 ratings is a red flag.
It indicates a subset of dissatisfied customers, which may lead to customer churn, negative reviews, or complaints.

* Root cause analysis should be done to investigate why some customers are unhappy (e.g., agent performance, handling time, issue resolution).


#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Distribution of Agent Shift
plt.figure(figsize=(8,6))
sns.countplot(x='Agent Shift', data=df, palette='pastel')
plt.title('Agent Shift Distribution')
plt.xlabel('Shift')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

Agent Shift is a categorical variable, and a bar chart effectively displays the count of occurrences for each category.

##### 2. What is/are the insight(s) found from the chart?

* The Morning shift accounts for the most support interactions, followed by the Evening shift. The Afternoon, Split, and Night shifts handle significantly fewer interactions in comparison.

* This may mean that customer queries peak during the morning and evening, requiring more agents at those times. Alternatively, it could reflect staffing preferences or resource allocation towards the morning and evening shifts.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* Low staffing in night or split shifts could lead to unattended customer issues during those hours, impacting global customers in different time zones.
* If customer demand is high at night but staffing is low, this could lead to delayed responses, poor CSAT scores, and customer churn.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Count of Interaction Categories
plt.figure(figsize=(10,6))
sns.countplot(y='category', data=df, order=df['category'].value_counts().index, palette='viridis')
plt.title('Interaction Category Distribution')
plt.xlabel('Count')
plt.ylabel('Category')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart is ideal for comparing the frequency of different customer support interaction types. The category variable has long labels, so a horizontal layout is more readable.

##### 2. What is/are the insight(s) found from the chart?

* The top interaction categories are Returns and Order Related issues, with Returns being the highest by a large margin.
* Categories like Refund Related, Product Queries, and Payments related have moderate interaction volumes.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* The high volume of return-related queries could indicate product quality issues, sizing problems, or delivery failures. If not addressed, this leads to higher operational costs and customer dissatisfaction.
* Excessive returns also impact profitability and brand reputation, suggesting the need for cross-functional collaboration (product, logistics, customer service).

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Tenure Bucket Distribution
plt.figure(figsize=(8,6))
sns.countplot(x='Tenure Bucket', data=df, palette='Set2')
plt.title('Agent Tenure Distribution')
plt.xlabel('Tenure Bucket')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts allow for clear visualization of differences in the number of agents with varying experience levels.
It helps quickly spot trends in workforce distribution, which is crucial for understanding performance, productivity, and training needs.


##### 2. What is/are the insight(s) found from the chart?

* The ">90 days" tenure bucket accounts for the most interactions, indicating that most cases are handled by an experienced workforce.
* A significant share of interactions is handled by agents in "On Job Training", suggesting ongoing recruitment or high turnover.
* The buckets between 0 and 90 days have relatively low counts, especially 61–90 days, indicating possible attrition or fast progression to the ">90" category.
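Since `countplot` counts rows, each bar above measures interactions handled by agents in a bucket rather than the number of agents. If the data has an agent identifier, the two can be separated. The sketch assumes a column named `Agent_name` (a hypothetical name here; adjust it to the actual agent column) and uses invented rows; the bucket labels are illustrative.

```python
import pandas as pd

def workload_by_bucket(frame: pd.DataFrame, agent_col: str = "Agent_name") -> pd.DataFrame:
    """Interactions, distinct agents and interactions per agent for each tenure bucket."""
    summary = frame.groupby("Tenure Bucket").agg(
        interactions=(agent_col, "size"),   # every row is one interaction
        agents=(agent_col, "nunique"),      # distinct agents in the bucket
    )
    summary["per_agent"] = summary["interactions"] / summary["agents"]
    return summary.sort_values("interactions", ascending=False)

toy = pd.DataFrame({
    "Tenure Bucket": [">90", ">90", ">90", ">90", "On Job Training", "On Job Training", "0-30"],
    "Agent_name": ["A", "A", "B", "B", "C", "D", "E"],
})
workload = workload_by_bucket(toy)
print(workload)
```

A high `per_agent` value in a bucket would point to workload concentration rather than headcount.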

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* A high volume of new or training agents may also indicate high attrition, which leads to increased training costs and potential service quality issues.
* If too many agents remain in training for extended periods, it could strain resources and affect operational efficiency.
* Low retention in 61–90 days might suggest burnout, dissatisfaction, or mismanagement, which can negatively impact business performance and customer experience.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Product Category Distribution

plt.figure(figsize=(10,6))
sns.countplot(y='Product_category', data=df, order=df['Product_category'].value_counts().index, palette='coolwarm')
plt.title('Product Category Distribution')
plt.xlabel('Count')
plt.ylabel('Product Category')
plt.show()

##### 1. Why did you pick the specific chart?

It efficiently displays categorical comparisons, especially when category names are long (e.g., "Books & General merchandise").
It helps easily compare product popularity or interaction frequency at a glance.

##### 2. What is/are the insight(s) found from the chart?

* Electronics is the most interacted-with category, indicating high customer activity or demand.

* Lifestyle and Books & General merchandise also have a strong presence, suggesting they may be key revenue drivers.

* Mid-tier categories like Mobile, Home, and Home Appliances show moderate engagement, offering potential growth opportunities.

* GiftCards and Affiliates have minimal engagement, possibly signaling low sales volume or limited product range.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* High engagement in Electronics might also imply high return rates or complaints, which could impact profit margins if not addressed.
* Neglected categories (e.g., GiftCards) could represent lost revenue opportunities if not analyzed further to understand customer disinterest.

## 2. **Bivariate Analysis**

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# CSAT Score by Agent Shift
plt.figure(figsize=(8,6))
sns.boxplot(x='Agent Shift', y='CSAT Score', data=df, palette='Accent')
plt.title('CSAT Score by Agent Shift')
plt.xlabel('Agent Shift')
plt.ylabel('CSAT Score')
plt.show()

##### 1. Why did you pick the specific chart?

It effectively displays the distribution of CSAT (Customer Satisfaction) scores across different agent shifts.
It highlights medians, variability, and outliers in customer satisfaction, which are crucial for operational decision-making.
It allows quick comparison of shifts to identify which time slots perform best or worst in terms of customer experience.

##### 2. What is/are the insight(s) found from the chart?

* All shifts generally maintain high CSAT scores, with medians around 4 to 5, indicating overall good customer service.
* The Split shift shows the highest consistency with very few outliers and scores tightly clustered near 5, suggesting exceptional performance.
* Morning, Evening, Afternoon, and Night shifts show more variability and outliers, including low scores (1-2), indicating inconsistent customer experiences.
* Outliers across all shifts could signal occasional service issues or escalations.
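With a 1-5 score concentrated at 5, the box plots' medians and quartiles can coincide, so differences between shifts show up mainly as "outliers". A more direct comparison is the share of low scores in each shift. A sketch, assuming "low" means a score of 1 or 2 as in the insights above; the helper name and toy rows are invented.

```python
import pandas as pd

def low_score_share(frame: pd.DataFrame, group_col: str = "Agent Shift", threshold: int = 2) -> pd.Series:
    """Percentage of interactions in each group whose CSAT is at or below the threshold."""
    is_low = frame["CSAT Score"] <= threshold
    # Mean of a boolean Series is the fraction of True values
    return (is_low.groupby(frame[group_col]).mean() * 100).round(1).sort_values(ascending=False)

toy = pd.DataFrame({
    "Agent Shift": ["Morning", "Morning", "Morning", "Morning", "Split", "Split"],
    "CSAT Score": [5, 1, 5, 2, 5, 5],
})
low_shares = low_score_share(toy)
print(low_shares)
```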

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* Persistent outliers with low scores could reflect underlying issues, such as agent fatigue, poor training, or system inefficiencies in certain shifts.
* If ignored, these dips in satisfaction might lead to negative reviews, loss of customer trust, and reduced repeat business.


#### Chart - 7

In [None]:
# Chart - 7 visualization code
# CSAT Score by Tenure Bucket
plt.figure(figsize=(8,6))
sns.boxplot(x='Tenure Bucket', y='CSAT Score', data=df, palette='Spectral')
plt.title('CSAT Score by Agent Tenure')
plt.xlabel('Tenure Bucket')
plt.ylabel('CSAT Score')
plt.show()

##### 1. Why did you pick the specific chart?

It is an ideal choice for exploring whether agent experience (tenure) is related to customer satisfaction. It allows comparison of CSAT score distributions across tenure buckets.

##### 2. What is/are the insight(s) found from the chart?

* All tenure buckets maintain high median CSAT scores (~5), indicating consistent customer satisfaction across experience levels.
* On Job Training (OJT) agents surprisingly have similar or slightly higher performance in CSAT scores compared to experienced agents.
* Each tenure group shows similar variability and presence of outliers, especially at low scores (1-2), suggesting occasional service lapses regardless of tenure.
* The tenure length doesn't significantly impact CSAT, which suggests new and experienced agents perform similarly from a customer satisfaction standpoint.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* This supports confidence in newly trained agents' ability to deliver high customer satisfaction, which could reduce training costs or the time needed to deploy agents on the floor.
* The consistent CSAT performance across tenures indicates a robust support system, helping ensure uniform service standards.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Handling Time vs CSAT Score
plt.figure(figsize=(8,6))
sns.scatterplot(x='connected_handling_time', y='CSAT Score', data=df, alpha=0.6)
plt.title('Handling Time vs CSAT Score')
plt.xlabel('Handling Time (min)')
plt.ylabel('CSAT Score')
plt.show()

##### 1. Why did you pick the specific chart?

It effectively shows the relationship (or lack thereof) between two variables: Handling Time (min), which is continuous, and CSAT Score, which takes only the values 1 to 5.
Ideal for identifying patterns, clusters, and outliers to understand if longer handling times correlate with customer satisfaction.

##### 2. What is/are the insight(s) found from the chart?

* The majority of CSAT scores are 5, regardless of handling time, especially when handling time is below ~1000 minutes.
* There's a significant cluster of high CSAT scores (4–5) for handling times ranging from 0 to 750 minutes, showing that customers are generally satisfied even with moderate handling durations.
* Very long handling times (>1000 mins) are rare, and associated CSAT scores are often low (1–2), indicating dissatisfaction for extremely prolonged cases.
* Low CSAT scores (1–2) are scattered across all handling times but concentrated more as handling time increases, suggesting longer resolutions may risk customer dissatisfaction.
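Because CSAT takes only five values, many points in the scatter plot overlap, which makes density hard to judge. Binning handling time and reporting the mean CSAT and the number of rated interactions per bin is one way to check whether low scores really concentrate at long handling times, and how few cases sit in the longest bins. A sketch on invented rows; the bin edges are illustrative and the helper name is hypothetical.

```python
import pandas as pd

def csat_by_handling_bin(frame: pd.DataFrame, bins) -> pd.DataFrame:
    """Number of rated interactions and mean CSAT within each handling-time bin."""
    # Rows with missing handling time fall outside every bin and are left out
    binned = pd.cut(frame["connected_handling_time"], bins=bins)
    return frame.groupby(binned, observed=False)["CSAT Score"].agg(["count", "mean"])

toy = pd.DataFrame({
    "connected_handling_time": [100, 200, 300, 800, 1200, 1500, None],
    "CSAT Score": [5, 5, 4, 5, 1, 2, 5],
})
# Bin edges are illustrative, loosely following the 1000-minute threshold discussed above
result = csat_by_handling_bin(toy, bins=[0, 500, 1000, 2000])
print(result)
```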

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

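* Handling times beyond 1000 minutes appear to be a risk: the few cases in this range often have low satisfaction, although the overall correlation between handling time and CSAT is weak (Chart 14).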
* These long cases can lead to customer churn, reputational damage, and increased operational costs if not managed well.
* Ignoring prolonged handling times can lead to inefficient service operations and potential loss in revenue.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# Issue Reported vs Responded Time Gap (in hours)
# Both timestamp columns were already converted to datetime in the Data Wrangling step,
# so the gap can be calculated directly without parsing them a second time.
df['response_time_hrs'] = (df['issue_responded'] - df['Issue_reported at']).dt.total_seconds() / 3600

plt.figure(figsize=(10,6))
sns.histplot(df['response_time_hrs'], bins=30, kde=True, color='purple')
plt.title('Distribution of Response Time (hours)')
plt.xlabel('Response Time (hours)')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

* It effectively shows the distribution and frequency of a continuous variable, Response Time (hours), and helps identify central tendency, spread, and outliers.
* The KDE curve provides a smooth estimate of the probability density, making it easier to understand patterns and skewness in the data.

##### 2. What is/are the insight(s) found from the chart?

* The majority of response times are concentrated between 0 and 5 hours, with a sharp peak close to 0–1 hour, indicating quick responses in most cases.
* The distribution is right-skewed, with a long tail extending beyond 20 hours, suggesting a small proportion of cases take significantly longer to respond to.
* There are some negative values in the data (visible to the left of zero), which may indicate data quality issues or incorrect timestamp calculations that need addressing.
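The negative values can be isolated before computing summary statistics, so they can be inspected (or excluded) rather than silently pulling the distribution left. A sketch using three invented timestamp pairs in a day/month/year hour:minute format (an assumption about the raw data); the helper `split_by_response_validity` is hypothetical.

```python
import pandas as pd

def split_by_response_validity(frame: pd.DataFrame, col: str = "response_time_hrs"):
    """Return (valid, invalid) rows; a negative gap means the response precedes the report."""
    invalid_mask = frame[col] < 0
    valid_mask = frame[col] >= 0      # NaN gaps (missing timestamps) fall in neither group
    return frame[valid_mask].copy(), frame[invalid_mask].copy()

toy = pd.DataFrame({
    "Issue_reported at": ["01/08/2023 10:00", "01/08/2023 12:00", "02/08/2023 09:30"],
    "issue_responded": ["01/08/2023 10:30", "01/08/2023 11:00", "02/08/2023 12:30"],
})
for col in ["Issue_reported at", "issue_responded"]:
    toy[col] = pd.to_datetime(toy[col], format="%d/%m/%Y %H:%M")
toy["response_time_hrs"] = (toy["issue_responded"] - toy["Issue_reported at"]).dt.total_seconds() / 3600

valid, invalid = split_by_response_validity(toy)
print(f"{len(invalid)} of {len(toy)} rows have a negative response time")
print(invalid)
```

Whether to drop the negative rows or correct their timestamps is a judgment call that depends on what caused them.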

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* Long-tail delays in response time could lead to customer dissatisfaction, especially if expectations are unmet.
* Presence of negative response times may lead to erroneous reporting and decision-making, indicating the need for data validation and cleansing. If unaddressed, this could impact trust in metrics and hinder performance optimization.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# CSAT by Category
plt.figure(figsize=(10,6))
sns.boxplot(y='category', x='CSAT Score', data=df, palette='spring')
plt.title('CSAT Score by Interaction Category')
plt.xlabel('CSAT Score')
plt.ylabel('Interaction Category')
plt.show()

##### 1. Why did you pick the specific chart?

* It is ideal for comparing CSAT scores across multiple interaction categories, highlighting the median, quartiles, and outliers to give a complete view of the distribution of satisfaction for each category.
* The horizontal format ensures readability for long category names, improving clarity and interpretation.

##### 2. What is/are the insight(s) found from the chart?

* Most categories have high median CSAT scores around 5, indicating generally positive customer satisfaction across the board.
* ‘Others’ category has the widest spread of CSAT scores, with some very low scores (as low as 1), indicating inconsistent experiences in this category.
* A few outliers exist in every category, suggesting isolated poor experiences that could be further investigated.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* If the ‘Others’ category is not addressed, it could drag down overall satisfaction and brand perception due to inconsistent support.
* Outliers in sensitive categories (e.g., Refund Related, Payments related) might lead to negative reviews or churn if not resolved proactively.

## 3. **Multivariate Analysis**

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# Response Time by Agent Shift
plt.figure(figsize=(8,6))
sns.boxplot(x='Agent Shift', y='response_time_hrs', data=df, palette='Set1')
plt.title('Response Time by Agent Shift')
plt.xlabel('Agent Shift')
plt.ylabel('Response Time (hrs)')
plt.show()

##### 1. Why did you pick the specific chart?


Boxplot effectively displays the distribution, median, and outliers of response times for each agent shift. This helps in easily comparing central tendencies (medians) and variability across different shifts, which is essential to understand operational efficiency and consistency among shifts.

##### 2. What is/are the insight(s) found from the chart?

* All shifts show a similar median response time, but there are significant outliers in each shift, especially in the Split and Night shifts, indicating occasional very high response times.
* Split and Night shifts appear to have more extreme outliers, suggesting inconsistency or lower staffing during these times, which could lead to delays.
* Morning and Evening shifts show a more compact distribution, implying more consistent performance.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

If the inconsistent response times during certain shifts are not addressed, it may result in customer dissatisfaction, especially for late-night or Split shift queries, leading to loss of trust and customers, thereby negatively impacting revenue.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# Average Handling Time by Category
avg_handling = df.groupby('category')['connected_handling_time'].mean().sort_values()

plt.figure(figsize=(10,6))
avg_handling.plot(kind='barh', color='skyblue')
plt.title('Average Handling Time by Category')
plt.xlabel('Avg Handling Time (min)')
plt.ylabel('Category')
plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart is ideal for comparing a categorical variable (categories of customer queries) against a numeric metric (average handling time).

It provides a clear visual hierarchy, making it easy to spot which categories take more or less time.

##### 2. What is/are the insight(s) found from the chart?

* Payments related queries have the highest average handling time (1100 min), followed by Refund Related (900 min).
* Categories like Product Queries and Order Related have the lowest handling times (~300 min).
* Categories like Onboarding, Offers & Cashback, Others have zero or negligible handling times, possibly due to low volume or simplified processes.
* App/Website and Shopzilla Related have moderate handling times, suggesting room for process optimization.
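Two things can make a category's bar short or absent: genuinely fast handling, or few (or no) non-missing `connected_handling_time` values. `groupby(...).mean()` skips missing values, so a category whose handling times are all missing gets `NaN` and typically shows no visible bar. Reporting the number of non-missing values next to each mean separates the two cases. A sketch on invented rows (the helper name is hypothetical):

```python
import numpy as np
import pandas as pd

def handling_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean handling time per category plus how many non-missing values back each mean."""
    return (frame.groupby("category")["connected_handling_time"]
            .agg(non_null="count", mean_time="mean")
            .sort_values("mean_time"))   # categories with no data (NaN mean) sort last

toy = pd.DataFrame({
    "category": ["Payments related", "Payments related", "Returns", "Returns", "Offers & Cashback"],
    "connected_handling_time": [1000, 1200, 300, np.nan, np.nan],
})
handling = handling_summary(toy)
print(handling)
```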

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* If Payments and Refund issues continue to take excessive time, it can lead to customer frustration, negative reviews, and churn, especially in an e-commerce or fintech setting.
* Delayed resolutions in high-value categories (like payments) can impact customer trust and loyalty — directly hitting revenue.
* The imbalance in handling times could also result in agent burnout in high-load categories, affecting morale and service quality.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# CSAT by Product Category
plt.figure(figsize=(10,6))
sns.boxplot(y='Product_category', x='CSAT Score', data=df, palette='cool')
plt.title('CSAT Score by Product Category')
plt.xlabel('CSAT Score')
plt.ylabel('Product Category')
plt.show()

##### 1. Why did you pick the specific chart?

It’s perfect for visualizing the distribution of CSAT scores across different product categories.
It shows central tendency (median), spread (interquartile range), and outliers, giving a complete picture of customer satisfaction for each category.

##### 2. What is/are the insight(s) found from the chart?

* Most product categories have high median CSAT scores, generally above 4.0, indicating overall good customer satisfaction.
* Categories like Mobile, Home Appliances, Furniture, and GiftCards have consistent high satisfaction with minimal variability.
* GiftCards show the highest and most consistent CSAT score, possibly due to low complexity in service.
* Lifestyle and Electronics have more spread and visible outliers, indicating variability in customer experiences—some customers are very dissatisfied (scores around 1-2).
* Outliers are visible across all categories, meaning some individual experiences are notably negative, even in well-rated categories.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* Ignoring outliers and variability in Lifestyle and Electronics could lead to customer churn, negative word-of-mouth, and lower repeat purchases.
* Persistently low individual experiences can hurt brand reputation, especially in categories with high market competition.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(8,6))
sns.heatmap(df[['connected_handling_time', 'response_time_hrs', 'CSAT Score']].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

It visually displays the strength and direction of relationships between multiple numerical variables.
Makes it easy to identify patterns or weak/strong correlations at a glance.

##### 2. What is/are the insight(s) found from the chart?

* There is a weak negative correlation between response_time_hrs and CSAT Score (-0.15), indicating that as response time increases, CSAT Score tends to decrease slightly.
* The correlation between connected_handling_time and CSAT Score is very weakly positive (0.048), suggesting minimal impact of handling time on satisfaction.
* Response Time and Handling Time have a very weak negative correlation (-0.077), meaning they’re largely independent.
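CSAT is an ordinal 1-5 score and both time variables are right-skewed (see Charts 9 and 15), so Pearson correlation can be dominated by a few extreme values. A common robustness check, swapped in here rather than taken from the original analysis, is Spearman rank correlation via pandas' `DataFrame.corr(method="spearman")`. A sketch comparing both on invented data; on the notebook data it would be `pearson_vs_spearman(df, ['connected_handling_time', 'response_time_hrs', 'CSAT Score'])`. The helper name is hypothetical.

```python
import pandas as pd

def pearson_vs_spearman(frame: pd.DataFrame, cols) -> pd.DataFrame:
    """Pearson and Spearman correlation of each column with the last column in cols."""
    target = cols[-1]
    pearson = frame[cols].corr(method="pearson")[target]
    spearman = frame[cols].corr(method="spearman")[target]
    return pd.DataFrame({"pearson": pearson, "spearman": spearman}).drop(index=target)

toy = pd.DataFrame({
    "response_time_hrs": [0.5, 1, 2, 6, 30],   # one extreme value, as in a skewed tail
    "CSAT Score": [5, 4, 3, 2, 1],
})
res = pearson_vs_spearman(toy, ["response_time_hrs", "CSAT Score"])
print(res.round(3))
```

In the toy data the relationship is perfectly monotonic, so Spearman reaches -1 while Pearson is noticeably weaker; a similar gap on the real data would suggest that the -0.15 figure understates a monotonic trend.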

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df[['connected_handling_time', 'response_time_hrs', 'CSAT Score']], diag_kind='kde')
plt.show()

##### 1. Why did you pick the specific chart?

It shows the relationships between multiple variables through scatterplots, along with their individual distributions.
It helps to detect patterns, trends, and outliers visually across all variable pairs.

##### 2. What is/are the insight(s) found from the chart?

* CSAT Score Distribution is skewed toward higher values (4 and 5), indicating generally high satisfaction.
* Response Time (hrs) is heavily skewed towards low values, but there are some outliers with very high response times.
* Connected Handling Time also skews toward lower values, with some long-duration outliers.
* There’s no strong visual trend between connected_handling_time and CSAT Score or response_time_hrs and CSAT Score, confirming weak correlations seen earlier.
* A potential cluster at low response time + high CSAT hints that fast responses may contribute to better satisfaction, but it’s not definitive.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

The primary business objective is to enhance customer satisfaction (CSAT) while optimizing handling and response times to improve overall service efficiency. Based on insights from the EDA, here are the actionable suggestions:

1. Optimize Handling Times for Specific Categories:

* Focus on reducing handling time for Payments and Refund-related issues, which currently have the highest average handling times.
* Introduce automated workflows or dedicated support teams for these categories to expedite resolution.

2. Improve Shift-wise Efficiency:

* Monitor and optimize agent shifts, especially during Split and Night shifts where response times are more variable.
* Implement AI-powered chatbots during off-peak shifts to reduce response delays.

3. Address CSAT Variability:

* Investigate and improve customer experience in Lifestyle and Electronics categories, which show high variation in CSAT scores.
* Collect qualitative feedback from low-CSAT customers to identify pain points.
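To act on the last recommendation, low-CSAT interactions that come with a written remark can be pulled out per product category, giving a reading list for qualitative review. A sketch, assuming "low" means a score of 1 or 2 and grouping by `Product_category` as in Chart 13; the helper name and toy rows are invented.

```python
import pandas as pd

def low_csat_remarks(frame: pd.DataFrame, group_col: str = "Product_category", threshold: int = 2) -> pd.Series:
    """Count of low-CSAT interactions that carry a written remark, per group."""
    has_remark = frame["Customer Remarks"].notna()
    low = frame[(frame["CSAT Score"] <= threshold) & has_remark]
    return low.groupby(group_col).size().sort_values(ascending=False)

toy = pd.DataFrame({
    "Product_category": ["Electronics", "Electronics", "Lifestyle", "Electronics", "Electronics"],
    "CSAT Score": [1, 2, 1, 5, 2],
    "Customer Remarks": ["Late refund", None, "Wrong size", "Great", "Pickup missed"],
})
counts = low_csat_remarks(toy)
print(counts)
```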

# **Conclusion**

The EDA uncovered key insights into customer satisfaction trends, response times, and category-wise service performance. While the overall CSAT is high, challenges exist in certain categories and shifts, especially with long handling times for payment/refund issues and variable satisfaction in Lifestyle and Electronics.

By implementing targeted improvements, leveraging automation, and fostering a data-driven service strategy, Flipkart can achieve its goal of delivering a superior customer service experience, driving higher customer retention, and creating positive business growth. Regular monitoring and iterative enhancements will ensure long-term success.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***