## **DV - Optimizing IT Support Team Performance Using Analytics (Supportlytics)**

### Import required libraries

In [None]:
import numpy as np
import pandas as pd

### Load the dataset

In [6]:
from google.colab import files
uploaded = files.upload()
df = pd.read_csv('ITSM_Dataset.csv')
df.head()

Saving ITSM_Dataset.csv to ITSM_Dataset.csv


Unnamed: 0,Status,Ticket ID,Priority,Source,Topic,Agent Group,Agent Name,Created time,Expected SLA to resolve,Expected SLA to first response,...,Resolution time,SLA For Resolution,Close time,Agent interactions,Survey results,Product group,Support Level,Country,Latitude,Longitude
0,Closed,TCKT-100000,High,Email,General Inquiry,Security,Khalid Al-Salem,2024-07-04 12:42:00,2024-07-04 14:42:00,2024-07-04 13:12:00,...,2024-07-04 14:30:00,Met,2024-07-04 14:32:00,5,Neutral,Cloud,L3,Oman,25.1856,50.9447
1,Closed,TCKT-100001,High,Chat,Network Issue,Customer Service,Ahmed Al-Sabah,2024-05-23 20:03:00,2024-05-23 22:03:00,2024-05-23 20:33:00,...,2024-05-23 22:00:00,Met,2024-05-23 22:05:00,4,Dissatisfied,Cloud,L2,Qatar,23.2741,55.3867
2,In Progress,TCKT-100002,Low,Phone,General Inquiry,Development,Mohammed Al-Mansoori,2024-04-13 20:51:00,2024-04-14 00:51:00,2024-04-13 21:51:00,...,2024-04-14 00:47:00,Met,2024-04-14 00:51:00,3,Dissatisfied,Software,L1,Bahrain,23.6264,50.1302
3,Resolved,TCKT-100003,Critical,Chat,Access Request,Development,Mohammed Al-Khalifa,2024-05-13 12:50:00,2024-05-13 13:50:00,2024-05-13 13:00:00,...,2024-05-13 13:48:00,Met,2024-05-13 13:53:00,5,Dissatisfied,Network,L2,Kuwait,25.0736,54.8437
4,Closed,TCKT-100004,Critical,Portal,Hardware Failure,Customer Service,Hassan Al-Nasser,2024-06-19 22:51:00,2024-06-19 23:51:00,2024-06-19 23:01:00,...,2024-06-19 23:49:00,Met,2024-06-19 23:54:00,4,Neutral,Hardware,L3,Qatar,24.7362,51.4839


### Check dataset structure

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 22 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   Status                          100000 non-null  object 
 1   Ticket ID                       100000 non-null  object 
 2   Priority                        100000 non-null  object 
 3   Source                          100000 non-null  object 
 4   Topic                           100000 non-null  object 
 5   Agent Group                     100000 non-null  object 
 6   Agent Name                      100000 non-null  object 
 7   Created time                    100000 non-null  object 
 8   Expected SLA to resolve         100000 non-null  object 
 9   Expected SLA to first response  100000 non-null  object 
 10  First response time             100000 non-null  object 
 11  SLA For first response          100000 non-null  object 
 12  Resolution time  

### Check for any missing values

In [8]:
df.isnull().sum()

Unnamed: 0,0
Status,0
Ticket ID,0
Priority,0
Source,0
Topic,0
Agent Group,0
Agent Name,0
Created time,0
Expected SLA to resolve,0
Expected SLA to first response,0


No null values found.

### Remove unwanted columns

SLA compliance is not used in this study, so we drop the SLA-related columns.

In [9]:
df.drop(columns=["Expected SLA to resolve", "Expected SLA to first response",
    "SLA For first response", "SLA For Resolution"],
    inplace=True)

### Convert date columns to datetime dtype

In [10]:
time_cols = ["Created time", "First response time", "Resolution time",
             "Close time"]

for col in time_cols:
    df[col] = pd.to_datetime(df[col])
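
The conversion above relies on pandas inferring the timestamp layout. As a minimal sketch, assuming every timestamp follows the `YYYY-MM-DD HH:MM:SS` layout seen in the preview, an explicit `format` together with `errors="coerce"` makes malformed values visible as `NaT` instead of raising. The `sample` frame and the `n_bad` name are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical sample standing in for one timestamp column of the ticket data
sample = pd.DataFrame({
    "Created time": ["2024-07-04 12:42:00", "2024-05-23 20:03:00", "not a date"],
})

# An explicit format skips format inference; errors="coerce" turns
# values that do not match the format into NaT instead of raising
parsed = pd.to_datetime(
    sample["Created time"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)

# Count the values that could not be parsed
n_bad = int(parsed.isna().sum())
print(parsed.dtype)
print(f"Unparseable values: {n_bad}")
```

If the count is non-zero on the real data, the offending rows can be inspected before deciding whether to drop or repair them.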

### Initial Ticket Distribution

In [12]:
print("Ticket distribution by Type")
print(df['Topic'].value_counts())

Ticket distribution by Type
Topic
General Inquiry     20254
Network Issue       20053
Hardware Failure    20027
Access Request      19923
Software Bug        19743
Name: count, dtype: int64


In [13]:
print("Ticket distribution by Priority")
print(df['Priority'].value_counts())

Ticket distribution by Priority
Priority
Medium      25117
Critical    25045
Low         25014
High        24824
Name: count, dtype: int64


In [14]:
print("Ticket distribution by Category")
print(df['Product group'].value_counts())

Ticket distribution by Category
Product group
Hardware    25087
Cloud       25029
Network     25023
Software    24861
Name: count, dtype: int64


In [15]:
print("Ticket Distribution by Source")
print(df['Source'].value_counts())

Ticket Distribution by Source
Source
Chat      25140
Portal    25025
Phone     24972
Email     24863
Name: count, dtype: int64
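
The four distribution cells above repeat the same pattern. One possible way to condense them, and to show shares alongside raw counts, is a small helper built on `value_counts(normalize=True)`. The `distribution` function and the `sample` frame are hypothetical names used only for this sketch.

```python
import pandas as pd

# Hypothetical sample with two of the categorical columns from the dataset
sample = pd.DataFrame({
    "Priority": ["High", "High", "Low", "Critical"],
    "Source": ["Email", "Chat", "Phone", "Chat"],
})

def distribution(frame, column):
    """Return counts and percentage shares for one categorical column."""
    counts = frame[column].value_counts()
    shares = frame[column].value_counts(normalize=True).mul(100).round(2)
    # Both Series are indexed by category, so they align row by row
    return pd.DataFrame({"count": counts, "percent": shares})

for col in ["Priority", "Source"]:
    print(f"Ticket distribution by {col}")
    print(distribution(sample, col), end="\n\n")
```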


### Feature Engineering

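Add three new columns:
1. Resolution_Duration: hours from `Created time` to `Resolution time`
2. First_Response_Duration: minutes from `Created time` to `First response time`
3. Priority_Score: ordinal encoding of `Priority` (Low = 1, Medium = 2, High = 3, Critical = 4)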

In [22]:
df["Resolution_Duration"] = ((df["Resolution time"] - df["Created time"]).dt.total_seconds() / 3600).round(2)

df["First_Response_Duration"] = ((df["First response time"] - df["Created time"]).dt.total_seconds() / 60).round(2)

priority_map = {'Low': 1, 'Medium': 2, 'High': 3, 'Critical': 4}
df['Priority_Score'] = df['Priority'].map(priority_map)

Verify new features:

In [23]:
df[["Created time", "First response time", "Resolution time","Close time",
"Resolution_Duration", "First_Response_Duration", "Priority_Score"]].head()

Unnamed: 0,Created time,First response time,Resolution time,Close time,Resolution_Duration,First_Response_Duration,Priority_Score
0,2024-07-04 12:42:00,2024-07-04 13:02:00,2024-07-04 14:30:00,2024-07-04 14:32:00,1.8,20.0,3
1,2024-05-23 20:03:00,2024-05-23 20:25:00,2024-05-23 22:00:00,2024-05-23 22:05:00,1.95,22.0,3
2,2024-04-13 20:51:00,2024-04-13 21:41:00,2024-04-14 00:47:00,2024-04-14 00:51:00,3.93,50.0,1
3,2024-05-13 12:50:00,2024-05-13 12:50:00,2024-05-13 13:48:00,2024-05-13 13:53:00,0.97,0.0,4
4,2024-06-19 22:51:00,2024-06-19 23:00:00,2024-06-19 23:49:00,2024-06-19 23:54:00,0.97,9.0,4
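
Beyond inspecting the first rows, two quick checks can catch silent problems in the engineered columns: `Series.map` leaves `NaN` for any priority label missing from the dictionary, and a resolution timestamp earlier than the creation timestamp yields a negative duration. The sketch below runs both checks on a hypothetical `sample` frame; the label `"Urgent"` and the out-of-order timestamps are invented to trigger each check.

```python
import pandas as pd

# Hypothetical sample: "Urgent" is not in the map, and the last row
# is resolved before it is created
sample = pd.DataFrame({
    "Priority": ["High", "Low", "Urgent"],
    "Created time": pd.to_datetime(
        ["2024-07-04 12:42:00", "2024-04-13 20:51:00", "2024-05-13 12:50:00"]
    ),
    "Resolution time": pd.to_datetime(
        ["2024-07-04 14:30:00", "2024-04-14 00:47:00", "2024-05-13 12:40:00"]
    ),
})

priority_map = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}
sample["Priority_Score"] = sample["Priority"].map(priority_map)
sample["Resolution_Duration"] = (
    (sample["Resolution time"] - sample["Created time"]).dt.total_seconds() / 3600
).round(2)

# Labels that the map did not cover end up as NaN in Priority_Score
unmapped = sample.loc[sample["Priority_Score"].isna(), "Priority"].unique().tolist()

# Negative durations point to inconsistent timestamps
negative = int((sample["Resolution_Duration"] < 0).sum())

print(f"Unmapped priority labels: {unmapped}")
print(f"Rows with negative resolution duration: {negative}")
```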


#### Explore 'Resolution_Duration' column values

In [30]:
df["Resolution_Duration"].describe()

Unnamed: 0,Resolution_Duration
count,100000.0
mean,2.334558
std,1.122957
min,0.67
25%,1.0
50%,2.67
75%,3.67
max,4.0
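
The summary above pools all tickets. Since the median (2.67) sits above the mean (2.33), it may be worth checking whether the duration differs by group. A minimal sketch of a per-priority breakdown follows; the `sample` values are hypothetical and do not reproduce the dataset's actual figures.

```python
import pandas as pd

# Hypothetical sample with the engineered duration column
sample = pd.DataFrame({
    "Priority": ["Critical", "Critical", "High", "Low", "Low"],
    "Resolution_Duration": [0.97, 0.97, 1.8, 3.93, 3.5],
})

# Per-priority summary statistics of the resolution duration (hours)
summary = (
    sample.groupby("Priority")["Resolution_Duration"]
    .agg(["count", "mean", "median", "min", "max"])
    .round(2)
)
print(summary)
```

The same pattern applies to any other grouping column, such as `Agent Group` or `Support Level`.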


### Save processed data

In [29]:
df.to_csv('cleaned_ITSM_data.csv', index=False)
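
CSV files store timestamps as plain text, so the datetime dtype set earlier is not preserved when the cleaned file is read back. A minimal sketch of the round trip, using an in-memory buffer and a hypothetical one-row `sample` frame in place of the real file, shows how `parse_dates` restores it:

```python
import io
import pandas as pd

# Hypothetical one-row frame with a datetime column
sample = pd.DataFrame({
    "Ticket ID": ["TCKT-100000"],
    "Created time": pd.to_datetime(["2024-07-04 12:42:00"]),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Without parse_dates the column comes back as text
buffer.seek(0)
plain = pd.read_csv(buffer)

# With parse_dates the column is converted back to datetime
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["Created time"])

print(plain["Created time"].dtype)
print(restored["Created time"].dtype)
```

When reloading `cleaned_ITSM_data.csv`, the four time columns can be passed to `parse_dates` in the same way.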