In [1]:
import pandas as pd

# Load the cleaned dataset
df = pd.read_csv('technical_support_data_cleaned.csv')

# Re-convert time columns to datetime objects

time_cols = [
    'Created time', 'Expected SLA to resolve', 'Expected SLA to first response',
    'First response time', 'Resolution time', 'Close time'
]
for col in time_cols:
    df[col] = pd.to_datetime(df[col])

print("Cleaned data loaded successfully for Feature Engineering!")
df.info()  # Prints the summary itself; check dtypes to confirm datetime conversion

Cleaned data loaded successfully for Feature Engineering!
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2330 entries, 0 to 2329
Data columns (total 22 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   Status                          2330 non-null   object        
 1   Ticket ID                       2330 non-null   int64         
 2   Priority                        2330 non-null   object        
 3   Source                          2330 non-null   object        
 4   Topic                           2330 non-null   object        
 5   Agent Group                     2330 non-null   object        
 6   Agent Name                      2330 non-null   object        
 7   Created time                    2330 non-null   datetime64[ns]
 8   Expected SLA to resolve         2330 non-null   datetime64[ns]
 9   Expected SLA to first response  2330 non-null   datetime64[ns]
 10  First response

The main objectives for the "Optimizing Customer Support Operations at SwiftConnect Telecom" project are to:
- Analyze support efficiency (response and resolution times).
- Understand patterns in ticket volume (e.g., busiest days/hours).
- Evaluate SLA compliance.
- Correlate operational metrics with customer satisfaction.

- The raw timestamp columns (Created time, First response time, etc.) hold dates and times, but they do not directly answer questions such as "how long did it take?" or "what day of the week was this ticket created?".
- Feature engineering derives these duration metrics and temporal features from the timestamps.

Key steps: 
- Create several new features, including:
    - Duration Metrics:
        > Actual First Response Duration: Time taken from ticket creation to the first response.

        > Actual Resolution Duration: Time taken from ticket creation to issue resolution.

        > Actual Ticket Lifecycle Duration: Total time from ticket creation to its final closure.
    - SLA Compliance:
        > Convert the 'within sla' / 'sla violated' flags into a numerical format (1 for within SLA, 0 for violated) for easier calculation and aggregation.
    - Time-Based Features: Extract components from datetime columns:
        > Day of the week
        > Hour of the day
        > Month/Quarter


In [None]:
# Create "Actual First Response Duration" and "Actual First Response Duration Minutes" features

# Calculate the duration (this will be a Timedelta object)
df['Actual_First_Response_Duration'] = df['First response time'] - df['Created time']

# Convert the Timedelta to minutes
df['Actual_First_Response_Duration_Minutes'] = df['Actual_First_Response_Duration'].dt.total_seconds() / 60


In [3]:
# Create "Actual Resolution Duration" and "Actual Resolution Duration Minutes" features

# Calculate the duration
df['Actual_Resolution_Duration'] = df['Resolution time'] - df['Created time']

# Convert the Timedelta to minutes
df['Actual_Resolution_Duration_Minutes'] = df['Actual_Resolution_Duration'].dt.total_seconds() / 60

In [None]:
# Create "Actual Ticket Lifecycle Duration" and "Actual Ticket Lifecycle Duration Minutes" features

# Calculate the duration
df['Actual_Ticket_Lifecycle_Duration'] = df['Close time'] - df['Created time']

# Convert the Timedelta to minutes
df['Actual_Ticket_Lifecycle_Duration_Minutes'] = df['Actual_Ticket_Lifecycle_Duration'].dt.total_seconds() / 60

Successfully engineered three duration metrics into the dataset:

- Actual_First_Response_Duration_Minutes
- Actual_Resolution_Duration_Minutes
- Actual_Ticket_Lifecycle_Duration_Minutes

These features are central to the analysis of support efficiency and SLA compliance.
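
Before relying on the duration columns, it can help to check them for missing or negative values and to compare them against the SLA deadline columns. The following is a minimal sketch on a hypothetical toy frame: the column names mirror the dataset, but the values are made up, and the helper columns `First_Response_Minutes`, `SLA_Allowed_Minutes`, and `Met_Deadline` are illustrative names only. Whether the dataset's own SLA flags were derived this way is an assumption, so treat the comparison as a cross-check rather than a definition.

```python
import pandas as pd

# Hypothetical toy tickets; column names mirror the dataset, values are invented.
toy = pd.DataFrame({
    "Created time": pd.to_datetime(
        ["2023-01-02 09:00", "2023-01-02 10:00", "2023-01-03 14:30"]
    ),
    "First response time": pd.to_datetime(
        ["2023-01-02 09:20", "2023-01-02 11:30", None]  # None becomes NaT
    ),
    "Expected SLA to first response": pd.to_datetime(
        ["2023-01-02 09:30", "2023-01-02 10:30", "2023-01-03 15:00"]
    ),
})

# Actual time to first response, in minutes (NaN when no response was recorded).
toy["First_Response_Minutes"] = (
    toy["First response time"] - toy["Created time"]
).dt.total_seconds() / 60

# Time allowed by the SLA deadline, in minutes.
toy["SLA_Allowed_Minutes"] = (
    toy["Expected SLA to first response"] - toy["Created time"]
).dt.total_seconds() / 60

# Rows worth reviewing: missing durations or durations below zero.
suspect = toy["First_Response_Minutes"].isna() | (toy["First_Response_Minutes"] < 0)

# Assumed rule: the deadline is met when the actual duration does not exceed
# the allowed duration. Comparisons against NaN evaluate to False.
toy["Met_Deadline"] = toy["First_Response_Minutes"] <= toy["SLA_Allowed_Minutes"]

print(toy[["First_Response_Minutes", "SLA_Allowed_Minutes", "Met_Deadline"]])
print("Rows to review:", int(suspect.sum()))
```

A ticket with no recorded response shows up both as a row to review and as a missed deadline, so the two checks are best read together.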

In [5]:
# Create Time-based features

# Extract Day_of_Week and Hour_of_Day from Created time
# Extract day of the week (0=Monday, 6=Sunday)
df['Created_Day_of_Week_Num'] = df['Created time'].dt.dayofweek
# Extract day of the week name
df['Created_Day_of_Week_Name'] = df['Created time'].dt.day_name()
# Extract hour of the day
df['Created_Hour_of_Day'] = df['Created time'].dt.hour


# Extract Month and Year from Created time
# Extract numerical month
df['Created_Month_Num'] = df['Created time'].dt.month
# Extract month name
df['Created_Month_Name'] = df['Created time'].dt.month_name()
# Extract year
df['Created_Year'] = df['Created time'].dt.year

# Extract temporal patterns related to when tickets are responded to and resolved
# Extract Day_of_Week_Name and Hour_of_Day for First response time and Resolution time
# For First Response Time
df['FirstResponse_Day_of_Week_Name'] = df['First response time'].dt.day_name()
df['FirstResponse_Hour_of_Day'] = df['First response time'].dt.hour

# For Resolution Time
df['Resolution_Day_of_Week_Name'] = df['Resolution time'].dt.day_name()
df['Resolution_Hour_of_Day'] = df['Resolution time'].dt.hour




- Successfully added time-based features to the dataset!
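
One way the day and hour features could be used to find the busiest periods is a day-by-hour count table. This is a minimal sketch on hypothetical timestamps; the `day_order` list and the `volume` name are illustrative. Day names sort alphabetically by default, so the table is reindexed into calendar order.

```python
import pandas as pd

# Hypothetical creation timestamps (2023-01-02 is a Monday).
toy = pd.DataFrame({
    "Created time": pd.to_datetime([
        "2023-01-02 09:15",  # Monday
        "2023-01-02 09:45",  # Monday
        "2023-01-03 14:05",  # Tuesday
        "2023-01-08 09:30",  # Sunday
    ])
})
toy["Created_Day_of_Week_Name"] = toy["Created time"].dt.day_name()
toy["Created_Hour_of_Day"] = toy["Created time"].dt.hour

day_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Count tickets per (day, hour), pivot hours into columns, and put the
# days in calendar order; days with no tickets get a row of zeros.
volume = (
    toy.groupby(["Created_Day_of_Week_Name", "Created_Hour_of_Day"])
    .size()
    .unstack(fill_value=0)
    .reindex(day_order, fill_value=0)
)
print(volume)
```

The same table can feed a heatmap later in the analysis.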

In [10]:
# Converting categorical compliance flags into a numerical format
# The columns 'SLA For first response' and 'SLA For Resolution' are currently categorical ('within sla', 'sla violated').
# Converting them into binary flags (1 for within SLA, 0 for violated) makes them ready to use in calculations, aggregations, and modeling.
# Note: any value other than 'within sla', including a missing value, is mapped to 0.

import numpy as np

# For First Response SLA
df['First_Response_SLA_Met'] = np.where(df['SLA For first response'] == 'within sla', 1, 0)

# For Resolution SLA
df['Resolution_SLA_Met'] = np.where(df['SLA For Resolution'] == 'within sla', 1, 0)


Successfully completed the Feature Engineering phase!
The cleaned data has been transformed into a rich set of analytical features, including:
- Duration Metrics:
    > Actual_First_Response_Duration_Minutes

    > Actual_Resolution_Duration_Minutes

    > Actual_Ticket_Lifecycle_Duration_Minutes

- Time-Based Features (from Created time, First response time, Resolution time):
    > Created_Day_of_Week_Num, Created_Day_of_Week_Name, Created_Hour_of_Day, Created_Month_Num, Created_Month_Name, Created_Year

    > FirstResponse_Day_of_Week_Name, FirstResponse_Hour_of_Day

    > Resolution_Day_of_Week_Name, Resolution_Hour_of_Day

- Numerical SLA Compliance Flags:
    > First_Response_SLA_Met
    > Resolution_SLA_Met
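
Because the SLA flags are 0/1, their mean is the compliance rate. The sketch below shows this on hypothetical data, along with an alternative mapping that keeps missing values missing and tolerates differences in case or whitespace. The group labels and flag values are invented, and this `map`-based approach is a variation for illustration, not the method used in this notebook.

```python
import pandas as pd

# Hypothetical tickets; group labels and flag values are invented.
toy = pd.DataFrame({
    "Agent Group": ["Group A", "Group A", "Group B", "Group B"],
    "SLA For first response": ["within sla", "sla violated", "Within SLA ", None],
})

# Normalize case and whitespace, then map labels to 1/0.
# Unmapped or missing labels stay missing (<NA>) rather than becoming 0.
normalized = toy["SLA For first response"].str.strip().str.lower()
toy["First_Response_SLA_Met"] = (
    normalized.map({"within sla": 1, "sla violated": 0}).astype("Int64")
)

# The mean of a 0/1 flag is the share of tickets within SLA;
# missing flags are skipped by default.
compliance = toy.groupby("Agent Group")["First_Response_SLA_Met"].mean()
print(compliance)
```

With this mapping a ticket with no SLA label is excluded from the rate, whereas a mapping that sends every non-matching value to 0 counts it as a violation. Which behavior is right depends on what a missing label means in the source system.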


In [11]:
# Save engineered dataset

output_file_path_engineered = 'technical_support_data_engineered.csv'
df.to_csv(output_file_path_engineered, index=False)
print(f"Engineered dataset saved successfully to: {output_file_path_engineered}")

Engineered dataset saved successfully to: technical_support_data_engineered.csv
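
CSV stores only text, so the datetime and Timedelta dtypes are not preserved in the saved file and must be restored on load. The following is a minimal sketch of such a reload, using an in-memory buffer and a hypothetical one-row frame in place of the real file.

```python
import io

import pandas as pd

# Hypothetical one-row frame standing in for the engineered dataset.
toy = pd.DataFrame({
    "Created time": pd.to_datetime(["2023-01-02 09:00"]),
    "First response time": pd.to_datetime(["2023-01-02 09:20"]),
})
toy["Actual_First_Response_Duration"] = (
    toy["First response time"] - toy["Created time"]
)

# Round-trip through CSV text (a buffer is used here instead of a file on disk).
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# Restore datetime columns while reading, then convert the duration column
# back from its text form (e.g. "0 days 00:20:00") to Timedelta.
reloaded = pd.read_csv(buffer, parse_dates=["Created time", "First response time"])
reloaded["Actual_First_Response_Duration"] = pd.to_timedelta(
    reloaded["Actual_First_Response_Duration"]
)
print(reloaded.dtypes)
```

The `*_Minutes` columns are plain floats and reload without any conversion, which is one reason to prefer them in later analysis steps.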
