# Executive Health Check: Support Operations (Chapter 1)
**Project Lead**: Senior Data Analyst
**Status**: Data Audit & Descriptive Baseline

## 1. The Strategic Context
Before implementing advanced AI solutions (see **Chapter 2: SLA Optimization**), we must first establish the "Ground Truth" of our Support Operations. 
In this **Executive Health Check (Chapter 1)**, we perform a forensic audit of the dataset to answer fundamental business questions:
1.  **Data Integrity**: Can we trust our logs? (Timestamps, Missing Data)
2.  **Volume Analysis**: What are people complaining about? (Product/Category Breakdown)
3.  **Customer Sentiment**: Are we keeping our customers happy? (CSAT Analysis)

### The Roadmap
-   **Chapter 1 (This Notebook)**: *Diagnosis & Data Health*.
-   **Chapter 2**: *The Cure (Predictive Modeling & Shift Optimization)*.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

sns.set_theme(style="whitegrid")
pd.set_option("display.max_columns", None)

## 2. Ingesting the Raw Logs

In [None]:
df = pd.read_csv('../data/customer_support_tickets.csv')
print(f"Dataset shape: {df.shape}")
df.head()

## 3. Forensic Data Audit
We need to confirm the data is reliable before making decisions.

-   **Red Flag 1**: Missing data in 'Resolution' and 'Customer Satisfaction Rating' suggests that many tickets are still open or that data capture failed.
-   **Red Flag 2**: Timestamps are stored as strings and need standardization.

In [None]:
# Check for missing values and data types
df.info()
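
To tell "still open" apart from "data capture failed" (Red Flag 1), it may help to break the missing share down by ticket status. The sketch below is one way to do that. The helper name `missing_share_by_group` and the toy frame are hypothetical and only illustrate the idea; in the notebook the helper would be called on `df` instead.

```python
import pandas as pd


def missing_share_by_group(
    frame: pd.DataFrame, group_col: str, value_cols: list[str]
) -> pd.DataFrame:
    """Return the fraction of missing values in each value column, per group."""
    return frame[value_cols].isna().groupby(frame[group_col]).mean()


# Toy frame for illustration only; the values are made up, not taken from the dataset.
toy = pd.DataFrame({
    "Ticket Status": ["Open", "Open", "Closed", "Closed", "Pending Customer Response"],
    "Resolution": [None, None, "Reset password", "Replaced unit", None],
    "Customer Satisfaction Rating": [None, None, 4.0, 2.0, None],
})

share = missing_share_by_group(
    toy, "Ticket Status", ["Resolution", "Customer Satisfaction Rating"]
)
print(share)
```

If the missing share is close to 1.0 for open tickets and close to 0.0 for closed ones, the nulls are structural rather than a capture failure.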

In [None]:
# Check for duplicates
print(f"Duplicate rows: {df.duplicated().sum()}")

# Check for unique values in categorical columns
cat_cols = ['Ticket Type', 'Ticket Status', 'Ticket Priority', 'Ticket Channel', 'Product Purchased']
for col in cat_cols:
    print(f"\nUnique values in {col}:")
    print(df[col].unique())
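
Scanning `unique()` output by eye can miss labels that differ only by case or stray whitespace. As a minimal sketch, assuming such variants should count as the same category, the hypothetical helper `label_variants` below groups raw labels that collapse to one value after normalization. The toy series is illustrative, not real data.

```python
import pandas as pd


def label_variants(series: pd.Series) -> dict[str, list[str]]:
    """Group raw labels that become identical after strip + casefold."""
    raw = series.dropna().astype(str)
    normalized = raw.str.strip().str.casefold()
    groups = raw.groupby(normalized).unique()
    # Keep only normalized labels that map back to more than one raw spelling.
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


# Toy series for illustration only; not taken from the dataset.
toy = pd.Series(["Email", "email ", "Chat", "Phone", "chat", None])
variants = label_variants(toy)
print(variants)
```

An empty dictionary would mean the column has no such near-duplicate labels.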

### Data Hygiene Actions
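1. **Imputation**: We fill missing CSAT scores with the median so that nulls do not break downstream steps. Median filling leaves the median unchanged but compresses the distribution, so CSAT averages should be read with the imputed share in mind.
2. **Type Casting**: We convert string dates to datetime objects for temporal analysis.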

In [None]:
# Convert date columns; errors='coerce' turns unparseable values into NaT
date_cols = ['Date of Purchase', 'First Response Time', 'Time to Resolution']
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

# Fill missing values: median for CSAT, a placeholder label for Resolution
csat_col = 'Customer Satisfaction Rating'
df[csat_col] = df[csat_col].fillna(df[csat_col].median())
df['Resolution'] = df['Resolution'].fillna("No resolution recorded")

df.info()
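
Both hygiene steps above can hide problems: `errors='coerce'` silently converts bad timestamps to `NaT`, and median filling makes imputed ratings look like real ones. The sketch below shows one possible way to keep an audit trail for each. The helper names `to_datetime_with_audit` and `impute_median_with_flag`, the `(imputed)` column suffix, and the toy frame are all hypothetical.

```python
import pandas as pd


def to_datetime_with_audit(series: pd.Series) -> tuple[pd.Series, int]:
    """Parse to datetime and count non-null inputs that failed to parse."""
    parsed = pd.to_datetime(series, errors="coerce")
    failed = int((series.notna() & parsed.isna()).sum())
    return parsed, failed


def impute_median_with_flag(frame: pd.DataFrame, col: str) -> pd.DataFrame:
    """Median-fill `col` and record which rows were originally missing."""
    out = frame.copy()
    out[f"{col} (imputed)"] = out[col].isna()
    out[col] = out[col].fillna(out[col].median())
    return out


# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({
    "First Response Time": ["2023-06-01 12:15:36", "not a date", None],
    "Customer Satisfaction Rating": [1.0, None, 5.0],
})

parsed, failed = to_datetime_with_audit(toy["First Response Time"])
clean = impute_median_with_flag(toy, "Customer Satisfaction Rating")
print(f"Unparseable timestamps: {failed}")
print(clean)
```

With the flag column in place, any later CSAT summary can be computed on observed ratings only by filtering out rows where the flag is `True`.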

## 4. Volume Analysis: What drives our workload?
Understanding the "Why" behind ticket volume is critical for staffing.

**Question**: Is there a dominant Ticket Type or Priority level?

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Ticket Type', hue='Ticket Priority', palette='viridis')
plt.title('Board View: Where is the pressure coming from?')
plt.xticks(rotation=45)
plt.show()
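
A chart shows the shape of the workload, but staffing discussions usually need the numbers behind it. As a sketch, `pd.crosstab` can produce the counts and the within-type priority mix for the same two columns. The toy frame below is illustrative only; in the notebook the same calls would take `df['Ticket Type']` and `df['Ticket Priority']`.

```python
import pandas as pd

# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({
    "Ticket Type": [
        "Refund request", "Refund request",
        "Technical issue", "Technical issue", "Technical issue",
    ],
    "Ticket Priority": ["High", "Low", "High", "High", "Critical"],
})

# Raw counts with row and column totals.
counts = pd.crosstab(
    toy["Ticket Type"], toy["Ticket Priority"],
    margins=True, margins_name="Total",
)

# Share of each priority within a ticket type (each row sums to 1).
row_share = pd.crosstab(
    toy["Ticket Type"], toy["Ticket Priority"], normalize="index"
).round(2)

print(counts)
print(row_share)
```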

## 5. Summary Statistics Baseline
This establishes our current operating baseline. Any future optimizations (Chapter 2) will be measured against these numbers.

In [None]:
display(df.describe(include='all'))

## 6. Chapter 1 Executive Conclusion
### Findings:
1.  **Data Quality**: The dataset required cleaning, particularly in timestamps and null handling for open tickets. We have stabilized it for analysis.
2.  **Workload Drivers**: We see significant volume in specific categories (visualized above).
3.  **Baseline Established**: We now have a clean dataset ready for advanced SLA analysis.

### Next Steps (Transition to Chapter 2)
With the data health confirmed, we move to **Chapter 2: Support Operations & SLA Optimization** to predict breaches and optimize shifts.