# Optimizing IT Support Team Performance Using Analytics
## Milestone 1: Data Preparation & Feature Engineering

This notebook loads, cleans, and prepares customer support ticket data for performance analysis.



In [9]:
import pandas as pd
import numpy as np



## Dataset Overview

The dataset contains 8,469 customer support ticket records with 17 columns, including ticket priority, first response time, time to resolution, and customer satisfaction rating.


In [10]:
df = pd.read_csv("/content/customer_support_tickets.csv")
df.head()


,Ticket ID,Customer Name,Customer Email,Customer Age,Customer Gender,Product Purchased,Date of Purchase,Ticket Type,Ticket Subject,Ticket Description,Ticket Status,Resolution,Ticket Priority,Ticket Channel,First Response Time,Time to Resolution,Customer Satisfaction Rating
0,1,Marisa Obrien,carrollallison@example.com,32,Other,GoPro Hero,2021-03-22,Technical issue,Product setup,I'm having an issue with the {product_purchase...,Pending Customer Response,,Critical,Social media,2023-06-01 12:15:36,,
1,2,Jessica Rios,clarkeashley@example.com,42,Female,LG Smart TV,2021-05-22,Technical issue,Peripheral compatibility,I'm having an issue with the {product_purchase...,Pending Customer Response,,Critical,Chat,2023-06-01 16:45:38,,
2,3,Christopher Robbins,gonzalestracy@example.com,48,Other,Dell XPS,2020-07-14,Technical issue,Network problem,I'm facing a problem with my {product_purchase...,Closed,Case maybe show recently my computer follow.,Low,Social media,2023-06-01 11:14:38,2023-06-01 18:05:38,3.0
3,4,Christina Dillon,bradleyolson@example.org,27,Female,Microsoft Office,2020-11-13,Billing inquiry,Account access,I'm having an issue with the {product_purchase...,Closed,Try capital clearly never color toward story.,Low,Social media,2023-06-01 07:29:40,2023-06-01 01:57:40,3.0
4,5,Alexander Carroll,bradleymark@example.com,67,Female,Autodesk AutoCAD,2020-02-04,Billing inquiry,Data loss,I'm having an issue with the {product_purchase...,Closed,West decision evidence bit.,Low,Email,2023-06-01 00:12:42,2023-06-01 19:53:42,1.0


In [11]:
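# A bare expression is displayed only when it is the last line of a cell,
# so the shape is printed explicitly.
print(df.shape)
df.info()
df.columns


(8469, 17)
<class 'pandas.core.frame.DataFrame'>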
RangeIndex: 8469 entries, 0 to 8468
Data columns (total 17 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Ticket ID                     8469 non-null   int64  
 1   Customer Name                 8469 non-null   object 
 2   Customer Email                8469 non-null   object 
 3   Customer Age                  8469 non-null   int64  
 4   Customer Gender               8469 non-null   object 
 5   Product Purchased             8469 non-null   object 
 6   Date of Purchase              8469 non-null   object 
 7   Ticket Type                   8469 non-null   object 
 8   Ticket Subject                8469 non-null   object 
 9   Ticket Description            8469 non-null   object 
 10  Ticket Status                 8469 non-null   object 
 11  Resolution                    2769 non-null   object 
 12  Ticket Priority               8469 non-null   object 
 13  Tic

Index(['Ticket ID', 'Customer Name', 'Customer Email', 'Customer Age',
       'Customer Gender', 'Product Purchased', 'Date of Purchase',
       'Ticket Type', 'Ticket Subject', 'Ticket Description', 'Ticket Status',
       'Resolution', 'Ticket Priority', 'Ticket Channel',
       'First Response Time', 'Time to Resolution',
       'Customer Satisfaction Rating'],
      dtype='object')

In [12]:
df.isnull().sum()


Column,Missing values
Ticket ID,0
Customer Name,0
Customer Email,0
Customer Age,0
Customer Gender,0
Product Purchased,0
Date of Purchase,0
Ticket Type,0
Ticket Subject,0
Ticket Description,0
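
A raw count can hide how much of a column is affected. The following is a minimal sketch, run on a small made-up frame rather than the ticket file, of reporting the missing share per column alongside the count. The helper name `missing_summary` is hypothetical.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and shares for columns that have gaps."""
    summary = pd.DataFrame({
        "missing": frame.isnull().sum(),
        "share": frame.isnull().mean(),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("share", ascending=False)


# Toy data for illustration only; these are not rows from the ticket file.
toy = pd.DataFrame({
    "Ticket ID": [1, 2, 3, 4],
    "Resolution": [None, None, "Replaced cable", None],
})
report = missing_summary(toy)
print(report)
```

Applied to the ticket data, the same call would be `missing_summary(df)`.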


In [13]:
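# Only the final expression in a cell is displayed, so the output below shows
# the Ticket Type counts; wrap the first two calls in print() to see them as well.
df['Ticket Priority'].value_counts()
df['Ticket Status'].value_counts()
df['Ticket Type'].value_counts()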


Ticket Type,count
Refund request,1752
Technical issue,1747
Cancellation request,1695
Product inquiry,1641
Billing inquiry,1634


In [14]:
# Drop customer-identifying columns, which the later steps in this notebook do not use.
df = df.drop(columns=[
    'Customer Name',
    'Customer Email',
    'Customer Age',
    'Customer Gender'
])


## Datetime Processing

Timestamp columns are converted into datetime format to enable accurate calculation of resolution duration and other time-based performance metrics.


In [15]:
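# Parse both timestamp columns; missing or unparseable values become NaT.
df['Time to Resolution'] = pd.to_datetime(
    df['Time to Resolution'],
    errors='coerce'
)

df['First Response Time'] = pd.to_datetime(
    df['First Response Time'],
    errors='coerce'
)

# Confirm both columns are now datetime64 (a bare expression mid-cell is not displayed).
print(df[['First Response Time', 'Time to Resolution']].dtypes)

# Hours between first response and resolution. Some rows have a resolution
# timestamp earlier than the first response, which yields negative values.
df['Resolution_Duration_Hours'] = (
    df['Time to Resolution'] - df['First Response Time']
).dt.total_seconds() / 3600

# Tickets without a resolution timestamp have no duration; fill them with the median.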
df['Resolution_Duration_Hours'] = df['Resolution_Duration_Hours'].fillna(
    df['Resolution_Duration_Hours'].median()
)
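
Before imputing, it can help to count how many durations are missing or negative, because a resolution timestamp earlier than the first response produces a negative value. The following is a minimal sketch using timestamps copied from the preview rows above; the flag column names `duration_missing` and `duration_negative` are illustrative assumptions, not part of the notebook's pipeline.

```python
import pandas as pd

# Four tickets taken from the preview: one unresolved, three closed.
sample = pd.DataFrame({
    "First Response Time": [
        "2023-06-01 12:15:36",
        "2023-06-01 11:14:38",
        "2023-06-01 07:29:40",
        "2023-06-01 00:12:42",
    ],
    "Time to Resolution": [
        None,
        "2023-06-01 18:05:38",
        "2023-06-01 01:57:40",
        "2023-06-01 19:53:42",
    ],
})
for col in ["First Response Time", "Time to Resolution"]:
    sample[col] = pd.to_datetime(sample[col], errors="coerce")

hours = (
    sample["Time to Resolution"] - sample["First Response Time"]
).dt.total_seconds() / 3600

# Record data-quality flags before any value is overwritten by imputation.
sample["duration_missing"] = hours.isna()
sample["duration_negative"] = hours < 0

print(sample[["duration_missing", "duration_negative"]].sum())
```

Keeping the flags as columns lets later analysis distinguish measured durations from imputed or suspect ones.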


In [16]:
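# NOTE: the dataset has no ticket-creation timestamp, so this column is the
# offset (in hours) from the earliest first response in the data, not a true
# time-to-first-response.
df['First_Response_Duration_Hours'] = (
    df['First Response Time'] - df['First Response Time'].min()
).dt.total_seconds() / 3600

# Ratio of resolution duration to the offset computed above.
df['Resolution_Efficiency'] = (
    df['Resolution_Duration_Hours'] / df['First_Response_Duration_Hours']
)

# A zero denominator (the earliest first response) gives an infinite ratio; treat it as missing.
df['Resolution_Efficiency'] = df['Resolution_Efficiency'].replace(
    [np.inf, -np.inf], np.nan
)

# Fill the remaining gaps with the median ratio.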
df['Resolution_Efficiency'] = df['Resolution_Efficiency'].fillna(
    df['Resolution_Efficiency'].median()
)
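
The ratio is undefined wherever the denominator is zero, which is the source of the infinite values replaced above. As a sketch of an alternative, the denominator can be masked before dividing so that no infinities are created in the first place. `safe_ratio` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two series, yielding NaN wherever the denominator is zero."""
    # where() keeps values that satisfy the condition and sets the rest to NaN.
    return numerator / denominator.where(denominator != 0)


num = pd.Series([2.0, 4.0, 0.0])
den = pd.Series([0.0, 2.0, 0.0])

naive = num / den              # contains inf (2/0) and NaN (0/0)
masked = safe_ratio(num, den)  # contains NaN in both zero-denominator rows

print(masked.tolist())
```

Either approach leaves NaN values that still need a deliberate imputation choice, such as the median fill used in this notebook.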


In [17]:
# Encode priority as an ordinal score (higher means more urgent).
priority_map = {
    'Low': 1,
    'Medium': 2,
    'High': 3,
    'Critical': 4
}

df['Priority_Score'] = df['Ticket Priority'].map(priority_map)
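
`Series.map` returns NaN for any label that is missing from the dictionary, so a misspelled or unexpected priority would silently become a missing score. The following is a minimal sketch of surfacing such labels; the `'Urgent'` value is invented for the demonstration.

```python
import pandas as pd

priority_map = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}

# Toy series for illustration; 'Urgent' is deliberately absent from the map.
priorities = pd.Series(["Low", "Critical", "Urgent", "Medium"])
scores = priorities.map(priority_map)

# Labels that failed to map show up as NaN in the scores.
unmapped = priorities[scores.isna()].unique().tolist()
if unmapped:
    print(f"Unmapped priority labels: {unmapped}")
```

In the notebook, the same check would use `df['Ticket Priority']` and `df['Priority_Score']`.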


In [18]:
df.isnull().sum()


Column,Missing values
Ticket ID,0
Product Purchased,0
Date of Purchase,0
Ticket Type,0
Ticket Subject,0
Ticket Description,0
Ticket Status,0
Resolution,5700
Ticket Priority,0
Ticket Channel,0


## Feature Engineering and Final Output

Four performance-related features were engineered: `Resolution_Duration_Hours`, `First_Response_Duration_Hours`, `Resolution_Efficiency`, and `Priority_Score`. The cleaned dataset is saved for use in subsequent milestones.


In [19]:
df.to_csv("/content/cleaned_customer_support_tickets.csv", index=False)
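
CSV stores timestamps as text, so a later notebook that reads the file back gets string columns unless it asks for parsing. The following is a minimal sketch of the round trip, using an in-memory buffer instead of the `/content` path and a two-row frame built from timestamps in the preview above.

```python
import io

import pandas as pd

demo = pd.DataFrame({
    "Ticket ID": [1, 2],
    "First Response Time": pd.to_datetime(
        ["2023-06-01 12:15:36", "2023-06-01 16:45:38"]
    ),
})

# Write to an in-memory buffer so the sketch does not touch the file system.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)

# Reading without parse_dates leaves the timestamp column as text.
buffer.seek(0)
plain = pd.read_csv(buffer)

# Passing parse_dates restores the datetime dtype on load.
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["First Response Time"])

print(plain["First Response Time"].dtype, parsed["First Response Time"].dtype)
```

For the saved file, the equivalent would be passing `parse_dates=['First Response Time', 'Time to Resolution']` to `pd.read_csv`.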
