# Customer Service & Customer Experience Analytics (Student Version)

This notebook supports the **Customer Service & Customer Experience** session using a ticket-level dataset.

Constraints for this topic:
Do not use OTIF, fill rate, or backorders.
Focus on service interaction dynamics and customer perception.

Deliverables you must produce:
A clean KPI table with 5 experience KPIs
Descriptive, Diagnostic, Predictive (system math), Prescriptive (decision rules) outputs
A short managerial summary based on evidence


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

pd.set_option("display.max_columns", 200)
pd.set_option("display.width", 200)


## 1) Load data

Dataset: `customer_service_experience_tickets.csv`

Expected grain:
One row = one customer ticket (issue reported + service handling outcomes).


In [None]:
# TODO: Load the dataset
path = "https://raw.githubusercontent.com/saikisri97/17_Hof_Lecture_Code_Pingo/refs/heads/main/Supply_Chain_Analytics/data/customer_service_experience_tickets.csv"
df = pd.read_csv(path)

# TODO: Parse time columns as datetimes
time_cols = ["Reported_Time", "First_Response_Time", "Resolution_Time"]
for c in time_cols:
    df[c] = pd.to_datetime(df[c])

df.head()


## 2) Data dictionary (write your own)

Task: Read the dataframe and understand what each column means.


In [None]:
# TODO: Read the DataFrame columns and understand what each one means
data_dict = pd.DataFrame([
    ("Ticket_ID", "Unique identifier of the customer service ticket"),
    ("Reported_Time", "When the customer issue enters the service system (arrival time)"),
    ("Issue_Type", "Customer-reported problem category that drives complexity and routing"),
    ("Channel", "Contact channel; influences responsiveness expectations and handling style"),
    ("Region", "Operational market; may reflect staffing coverage and language complexity"),
    ("Customer_Segment", "B2C vs SMB vs Enterprise; influences priority and handling depth"),
    ("Priority", "Service priority classification; should influence triage and escalation"),
    ("Backlog_At_Report", "Estimated queue size (open work) when the ticket arrives"),
    ("Response_Minutes", "Time from Reported_Time to First_Response_Time (speed of acknowledgment)"),
    ("Resolution_Minutes", "Time from Reported_Time to Resolution_Time (time in system)"),
    ("Escalation_Flag", "1 if ticket required higher tier handling; indicates complexity or failure"),
    ("First_Contact_Resolution_Flag", "1 if resolved without follow-up; a proxy for process effectiveness"),
    ("Repeat_Contact_Flag", "1 if the customer contacted again; indicates unresolved friction"),
    ("Customer_Effort_Score", "1–7 score; higher means more effort for customer"),
    ("CSAT_Score", "1–5 satisfaction score; outcome-oriented perception metric"),
    ("Sentiment_Score", "Numeric sentiment proxy (-1 to +1); early warning for churn risk"),
    ("Outcome", "Resolved/Partially/Unresolved; quality of closure"),
    ("Team", "Owning service team; used for diagnostics and capacity planning"),
], columns=["Column","Meaning"])
data_dict


## 3) Data quality checks

Minimum checks:
Missing values by column
Duplicate Ticket_ID
Logical checks:
First_Response_Time >= Reported_Time
Resolution_Time >= First_Response_Time
Response_Minutes and Resolution_Minutes non-negative


In [None]:
# TODO: Missing values
missing = df.isna().mean().sort_values(ascending=False)
missing.head(15)


In [None]:
# TODO: Duplicate Ticket_ID
df["Ticket_ID"].duplicated().sum()


In [None]:
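# TODO: Logical time checks
bad_response = (df["First_Response_Time"] < df["Reported_Time"]).sum()
bad_resolution = (df["Resolution_Time"] < df["First_Response_Time"]).sum()

# Durations should never be negative
neg_response = (df["Response_Minutes"] < 0).sum()
neg_resolution = (df["Resolution_Minutes"] < 0).sum()
bad_response, bad_resolution, neg_response, neg_resolution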
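
As a minimal sketch, the minimum checks listed for this section can be bundled into one report. The `toy` DataFrame and the `quality_report` helper below are hypothetical and use made-up values; only the column names come from the dataset.

```python
import pandas as pd

# Toy tickets (made-up values): one duplicate ID, one response logged before
# the report time, one negative duration, and one unresolved ticket.
toy = pd.DataFrame({
    "Ticket_ID": ["T1", "T2", "T2", "T4"],
    "Reported_Time": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 10:00", "2024-01-01 10:00", "2024-01-01 11:00"]),
    "First_Response_Time": pd.to_datetime(
        ["2024-01-01 09:10", "2024-01-01 09:50", "2024-01-01 10:05", "2024-01-01 11:30"]),
    "Resolution_Time": pd.to_datetime(
        ["2024-01-01 12:00", "2024-01-01 13:00", "2024-01-01 13:00", None]),
    "Response_Minutes": [10, -10, 5, 30],
    "Resolution_Minutes": [180, 180, 180, None],
})

def quality_report(frame):
    """Hypothetical helper: count rows that violate each basic check."""
    return pd.Series({
        "duplicate_ticket_ids": int(frame["Ticket_ID"].duplicated().sum()),
        "response_before_report": int(
            (frame["First_Response_Time"] < frame["Reported_Time"]).sum()),
        "resolution_before_response": int(
            (frame["Resolution_Time"] < frame["First_Response_Time"]).sum()),
        "negative_response_minutes": int((frame["Response_Minutes"] < 0).sum()),
        "negative_resolution_minutes": int((frame["Resolution_Minutes"] < 0).sum()),
        "rows_with_any_missing": int(frame.isna().any(axis=1).sum()),
    })

report = quality_report(toy)
print(report)
```

Comparisons against missing timestamps or missing durations evaluate to False, so unresolved tickets are not counted as logical violations; they show up in the missing-value count instead.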


## 4) KPI engineering (exactly 5 KPIs)

You must compute these 5 KPIs:

1) Average Response Time (minutes)
2) First Contact Resolution Rate
3) Mean Resolution Time (minutes)
4) Average Customer Effort Score (CES)
5) Complaint Recurrence Probability (repeat contact rate)

Output format:
A single table with KPI name, formula description, computed value, and managerial interpretation. Calculate the values and write what each value means.


In [None]:
# TODO: Compute KPI values
avg_response = None
fcr_rate = None
mean_resolution = None
avg_ces = None
repeat_rate = None

kpi_table = pd.DataFrame([
    {"KPI": "Average Response Time (min)", "Value": avg_response, "Interpretation": "Lower is better; sets perceived responsiveness and reduces anxiety early"},
    {"KPI": "First Contact Resolution Rate", "Value": fcr_rate, "Interpretation": ""},
    {"KPI": "Mean Resolution Time (min)", "Value": mean_resolution, "Interpretation": ""},
    {"KPI": "Average Customer Effort Score (1-7)", "Value": avg_ces, "Interpretation": ""},
    {"KPI": "Complaint Recurrence Probability", "Value": repeat_rate, "Interpretation": ""},
])

kpi_table
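
For orientation, here is a sketch of how the five KPIs reduce to column means, shown on a small made-up DataFrame (`toy`) rather than the real tickets. It assumes the flag columns are coded 0/1, so the mean of a flag is a rate. The `Formula` column is one possible way to cover the "formula description" part of the output format.

```python
import pandas as pd

# Made-up tickets; only the column names match the dataset.
toy = pd.DataFrame({
    "Response_Minutes": [10, 20, 30, 60],
    "Resolution_Minutes": [100, 200, 300, 400],
    "First_Contact_Resolution_Flag": [1, 0, 1, 1],
    "Customer_Effort_Score": [2, 5, 3, 4],
    "Repeat_Contact_Flag": [0, 1, 0, 0],
})

# Each KPI is the mean of one column; for 0/1 flags the mean is a share.
toy_kpis = pd.DataFrame([
    {"KPI": "Average Response Time (min)",
     "Formula": "mean(Response_Minutes)",
     "Value": toy["Response_Minutes"].mean()},
    {"KPI": "First Contact Resolution Rate",
     "Formula": "mean(First_Contact_Resolution_Flag)",
     "Value": toy["First_Contact_Resolution_Flag"].mean()},
    {"KPI": "Mean Resolution Time (min)",
     "Formula": "mean(Resolution_Minutes)",
     "Value": toy["Resolution_Minutes"].mean()},
    {"KPI": "Average Customer Effort Score (1-7)",
     "Formula": "mean(Customer_Effort_Score)",
     "Value": toy["Customer_Effort_Score"].mean()},
    {"KPI": "Complaint Recurrence Probability",
     "Formula": "mean(Repeat_Contact_Flag)",
     "Value": toy["Repeat_Contact_Flag"].mean()},
])

print(toy_kpis)
```

The same expressions should carry over to `df`, but check the missing-value results first: `mean()` skips missing entries, which changes the denominator of each rate.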


## 5) Descriptive analytics: what is happening?

Minimum outputs:
Ticket volume trend by week
Response time distribution (e.g., histogram)
Resolution time by channel or priority (box plot or grouped summary)


In [None]:
# TODO: Add Week column
df["Week"] = df[""].dt.to_period("W").astype(str)

# TODO: Ticket volume by week
weekly = df.groupby("").size().reset_index(name="Tickets")
weekly.head()


In [None]:
# TODO: Plot weekly tickets
plt.figure()
plt.plot(weekly["Week"], weekly[""])
plt.xticks(rotation=90)
plt.title("Ticket Volume by Week")
plt.xlabel("Week")
plt.ylabel("Tickets")
plt.show()


In [None]:
# TODO: Response time histogram
plt.figure()
plt.hist(df[""], bins=30)
plt.title("Response Time Distribution (minutes)")
plt.xlabel("Response_Minutes")
plt.ylabel("Count")
plt.show()
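
The third minimum output (resolution time by channel or priority) can be produced as a grouped summary. The sketch below uses a made-up `toy` DataFrame and also shows how `dt.to_period("W")` assigns tickets to weeks. Pandas weekly periods end on Sunday by default, so a Monday and the following Wednesday fall in the same week.

```python
import pandas as pd

# Made-up tickets; 2024-01-01 is a Monday.
toy = pd.DataFrame({
    "Reported_Time": pd.to_datetime(
        ["2024-01-01", "2024-01-03", "2024-01-08", "2024-01-09", "2024-01-10"]),
    "Channel": ["Email", "Chat", "Email", "Chat", "Chat"],
    "Resolution_Minutes": [300, 60, 500, 90, 120],
})

# Weekly ticket volume: label each ticket with its Monday-to-Sunday period.
toy["Week"] = toy["Reported_Time"].dt.to_period("W").astype(str)
weekly_toy = toy.groupby("Week").size().reset_index(name="Tickets")

# Resolution time by channel: the median is less sensitive to a few very long tickets.
res_by_channel = (toy.groupby("Channel")["Resolution_Minutes"]
                  .agg(["count", "median", "mean"])
                  .sort_values("median", ascending=False))

print(weekly_toy)
print(res_by_channel)
```

Reporting the count next to the median and mean helps avoid over-reading groups that contain only a handful of tickets.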


## 6) Diagnostic analytics: why is it happening?

Goal:
Identify which drivers explain slow response and low FCR.

Minimum analyses:
Response_Minutes by Channel, Priority
FCR rate by Issue_Type and by Team
A simple correlation view for numeric drivers (Backlog_At_Report, Response_Minutes, Resolution_Minutes, CES, CSAT)


In [None]:
# TODO: Response time by channel and priority (grouped summary)
resp_by = (df.groupby(["",""])["Response_Minutes"]
           .agg(["count","mean","median"])
           .reset_index()
           .sort_values(["Priority","mean"], ascending=[True, False]))
resp_by.head(15)

In [None]:
# TODO: FCR rate by issue type
fcr_by_issue = (df.groupby("")["First_Contact_Resolution_Flag"]
                 .mean()
                 .sort_values())
fcr_by_issue.head(10)
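
The minimum analyses also ask for FCR rate by Team. A possible shape for that view, sketched on made-up data (`toy`), keeps the ticket count next to the rate so that small teams are not over-interpreted. It assumes `First_Contact_Resolution_Flag` is coded 0/1.

```python
import pandas as pd

# Made-up tickets; team names are illustrative only.
toy = pd.DataFrame({
    "Team": ["Billing", "Billing", "Tech", "Tech", "Tech", "General"],
    "First_Contact_Resolution_Flag": [0, 1, 1, 1, 0, 1],
})

# Named aggregation: volume and FCR rate per team, lowest rate first.
fcr_by_team = (toy.groupby("Team")["First_Contact_Resolution_Flag"]
               .agg(tickets="count", fcr_rate="mean")
               .sort_values("fcr_rate"))

print(fcr_by_team)
```

A team with a perfect rate on a single ticket (as with "General" here) says little about process effectiveness, which is why the `tickets` column is worth keeping in the output.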


In [None]:
# TODO: Correlation table on numeric columns
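num_cols = ["Backlog_At_Report","Response_Minutes","Resolution_Minutes","Handle_Minutes","Customer_Effort_Score","CSAT_Score","Sentiment_Score"]
# Keep only columns that exist in df (Handle_Minutes is not listed in the data dictionary)
num_cols = [c for c in num_cols if c in df.columns]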
corr = df[num_cols].corr()
corr


## 7) Predictive (Little's Law): what happens if nothing changes?

No ML forecasting models.

Use system math:
Little’s Law: L = λW

Where:
L = average backlog / WIP in the service system
λ = arrival rate (tickets per hour)
W = average time in system (hours) from Reported_Time to Resolution_Time

Tasks:
Compute λ and W from data
Estimate implied L
Then run a scenario:
If arrivals increase by +15% (peak season), and capacity stays constant, what happens to W or L?
Explain in business terms.


In [None]:
# TODO: Compute arrival rate (tickets/hour) over the full period
# Observation window
t_min = df["Reported_Time"].min()
t_max = df["Reported_Time"].max()
obs_hours = (t_max - t_min).total_seconds() / 3600
# Hint: lambda = total tickets / total hours in observation window
lambda_per_hour = None

# TODO: Compute average time in system in hours
df["Time_In_System_Hours"] = (df["Resolution_Time"] - df["Reported_Time"]).dt.total_seconds() / 3600
W = df[""].mean()

# TODO: Little's Law implied backlog (calculate this)
L = None

lambda_per_hour, W, L


In [None]:
# TODO: Scenario: +15% arrivals, same capacity
lambda2 = None

# If L stays the same, what does W become?
W2 = None

# If W stays the same, what does L become?
L2 = None

(lambda2, W2, L2)
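
To make the two scenario questions concrete, here is a worked sketch with made-up numbers (20 tickets per hour, 6 hours in system). The function `littles_law_scenario` is a hypothetical helper, not part of the dataset or the course material; it only applies L = λW in both directions.

```python
import math

def littles_law_scenario(arrival_rate, time_in_system, growth):
    """Hypothetical helper: apply L = lambda * W before and after an arrival increase."""
    backlog = arrival_rate * time_in_system       # baseline L
    new_rate = arrival_rate * (1 + growth)        # lambda after growth
    return {
        "L_base": backlog,
        "lambda_new": new_rate,
        # If each ticket still spends W hours in the system, the backlog grows with arrivals.
        "L_if_W_constant": new_rate * time_in_system,
        # To hold the backlog at its baseline, time in system would have to shrink.
        "W_needed_to_hold_L": backlog / new_rate,
    }

# Made-up inputs: 20 tickets/hour, 6 hours in system, +15% arrivals
result = littles_law_scenario(arrival_rate=20.0, time_in_system=6.0, growth=0.15)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the baseline backlog is 120 tickets. Holding W at 6 hours pushes the backlog to about 138 tickets (+15%), while holding the backlog at 120 would require W to fall to about 5.2 hours (roughly 13% faster), which is not achievable without extra capacity or process changes. In a real queue running near capacity, W typically rises as well, so the +15% backlog figure is better read as a lower bound.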


## 8) Prescriptive: what should be done?

Read through 3 decision rules that can be implemented as policy triggers, such as:

Escalate if Priority is High/Critical AND Sentiment is negative AND Response_Minutes exceeds threshold
Route Billing disputes to Billing team when backlog is above threshold
Offer proactive update when Resolution_Minutes crosses a threshold (experience recovery)

Task:
Write the rules clearly. (Due to time constraints, they are already implemented for you below.)
Quantify thresholds from your own analysis (e.g., 75th percentile response time).
Then estimate how many tickets would be impacted by each rule.


In [None]:
# TODO: Choose thresholds using percentiles
# Series.quantile skips missing values; np.percentile would return NaN if any are present
p75_resp = df["Response_Minutes"].quantile(0.75)
p75_res = df["Resolution_Minutes"].quantile(0.75)
p75_resp, p75_res


In [None]:
# Rule 1: Escalate early for high-impact risk signals
# Condition: Priority in {High, Critical} AND Sentiment negative AND Response above 75th percentile
rule1 = (
    df["Priority"].isin(["High","Critical"]) &
    (df["Sentiment_Score"] < -0.25) &
    (df["Response_Minutes"] > p75_resp)
)
rule1_count = int(rule1.sum())

# Rule 2: Proactive status update trigger (experience recovery)
# Condition: Resolution time already beyond 75th percentile OR backlog high at report
rule2 = (
    (df["Resolution_Minutes"] > p75_res) |
    (df["Backlog_At_Report"] > np.percentile(df["Backlog_At_Report"], 80))
)
rule2_count = int(rule2.sum())

# Rule 3: Route-to-specialist policy
# Condition: Billing dispute OR Account access -> candidate for specialist routing
# Note: this counts all matching tickets; it does not check which Team currently owns them
rule3 = df["Issue_Type"].isin(["Billing dispute","Account access"])
rule3_count = int(rule3.sum())

pd.DataFrame([
    {"Policy Rule": "R1 Escalate early for High/Critical + negative sentiment + slow response",
     "Thresholds": f"Sentiment<-0.25 and Response>{int(p75_resp)} min",
     "Tickets impacted": rule1_count,
     "Managerial intent": "Protect churn risk by prioritizing empathy + speed for high-impact cases."},
    {"Policy Rule": "R2 Proactive update when resolution time/backlog is high",
     "Thresholds": f"Resolution>{int(p75_res)} min or Backlog>80th pct",
     "Tickets impacted": rule2_count,
     "Managerial intent": "Reduce customer anxiety and repeat contacts through proactive communication."},
    {"Policy Rule": "R3 Specialist routing for Billing disputes & Account access",
     "Thresholds": "Issue_Type in {Billing dispute, Account access}",
     "Tickets impacted": rule3_count,
     "Managerial intent": "Increase first-contact resolution by matching complexity to skill."},
])
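
Since the thresholds are a policy choice, it can help to see how the number of impacted tickets moves with the percentile. The sketch below repeats the Rule 1 condition on a made-up `toy` DataFrame for three candidate percentiles. The sentiment cut-off of -0.25 is taken from the rule above; the percentile grid is an assumption for illustration.

```python
import pandas as pd

# Made-up tickets; only the column names match the dataset.
toy = pd.DataFrame({
    "Priority": ["High", "Low", "Critical", "Medium", "High", "Critical"],
    "Sentiment_Score": [-0.6, 0.2, -0.4, -0.5, 0.1, -0.3],
    "Response_Minutes": [90, 15, 120, 100, 20, 60],
})

rows = []
for q in [0.25, 0.50, 0.75]:  # candidate percentiles (illustrative grid)
    threshold = toy["Response_Minutes"].quantile(q)
    # Same structure as Rule 1: high-impact priority, negative sentiment, slow response
    flagged = (
        toy["Priority"].isin(["High", "Critical"])
        & (toy["Sentiment_Score"] < -0.25)
        & (toy["Response_Minutes"] > threshold)
    )
    rows.append({
        "Percentile": q,
        "Response threshold (min)": threshold,
        "Tickets impacted": int(flagged.sum()),
        "Share of tickets": flagged.mean(),
    })

sensitivity = pd.DataFrame(rows)
print(sensitivity)
```

A stricter percentile flags fewer tickets, so the choice trades escalation workload against the risk of missing at-risk customers.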


## 9) Managerial wrap-up (short, evidence-based)

Write 8–12 lines:
What is happening
Why it is happening
What will happen in peak load if nothing changes (Little’s Law)
What 2–3 policies you recommend

Must reference your computed KPIs and one diagnostic finding.


_Write your wrap-up here._