# Data Description & Context


In the support process, incoming incidents are analyzed and assessed by the organization's support teams in order to fulfill each request. In many organizations, better allocation and more effective use of these valuable support resources translates directly into substantial cost savings.

Currently, incidents are created by various stakeholders (Business Users, IT Users and Monitoring Tools) within the IT Service Management tool and are assigned to Service Desk teams (L1 / L2 teams). These teams review each incident for correct categorization and priority, then carry out an initial diagnosis to see whether they can resolve it. Around 54% of the incidents are resolved by L1 / L2 teams. If L1 / L2 cannot resolve an incident, they escalate / assign the ticket to Functional teams from Applications and Infrastructure (L3 teams). Some incidents are assigned directly to L3 teams by either Monitoring Tools or Callers / Requestors. L3 teams carry out a detailed diagnosis and resolve the incidents. Around 56% of incidents are resolved by Functional / L3 teams. If vendor support is needed, they reach out to the vendor to close the incident.

L1 / L2 teams need to spend time reviewing Standard Operating Procedures (SOPs) before assigning tickets to Functional teams: at least ~25-30% of incidents need an SOP review before assignment, and each review takes about 15 minutes. As a result, at least ~1 FTE of effort is spent on incident assignment to L3 teams alone.
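The SOP-review figures above can be turned into a rough effort estimate. The sketch below is illustrative only: the 25-30% share and the 15 minutes per review come from the text, while the monthly incident volume (2,400) and the productive hours per FTE (160) are assumptions that are not part of the source.

```python
def sop_review_fte(incidents_per_month, review_share, minutes_per_review=15,
                   productive_hours_per_fte=160):
    """Estimate the FTEs spent on SOP review before ticket assignment.

    incidents_per_month and productive_hours_per_fte are hypothetical inputs;
    review_share (0.25-0.30) and minutes_per_review (15) come from the text.
    """
    review_hours = incidents_per_month * review_share * minutes_per_review / 60
    return review_hours / productive_hours_per_fte


# Hypothetical volume of 2,400 incidents per month
low = sop_review_fte(2400, 0.25)
high = sop_review_fte(2400, 0.30)
print(f"Estimated SOP review effort: {low:.2f} to {high:.2f} FTE")
```

With these assumed inputs the estimate lands close to the ~1 FTE quoted above; a different incident volume would scale the result proportionally.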

While L1 / L2 teams assign incidents to functional groups, there have been many instances of incidents being assigned to the wrong group. Around 25% of incidents are wrongly assigned to functional teams, and those teams then need additional effort to re-assign them to the right group. In the meantime, some incidents sit in the queue and are not addressed in a timely manner, resulting in poor customer service. AI techniques that classify incidents to the right functional group can help organizations reduce resolution time and free support staff to focus on more productive tasks.


# Domain

Information Technology

# Project Description

In this capstone project, the goal is to build a classifier that can classify the tickets by analyzing text.
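As a rough illustration of what classifying tickets from their text can look like, here is a minimal baseline sketch using TF-IDF features and logistic regression from scikit-learn. The toy tickets, their label assignments, and the choice of model are assumptions made for illustration; this is not the model built in this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy tickets (made up, loosely modelled on short descriptions in this dataset)
texts = [
    "password reset",
    "windows account locked",
    "erp account locked",
    "emails not coming in",
    "outlook emails missing",
    "mail not received in outlook",
]
# Hypothetical label assignments for the toy tickets
labels = ["GRP_0", "GRP_0", "GRP_0", "GRP_29", "GRP_29", "GRP_29"]

# Turn text into TF-IDF vectors, then fit a linear classifier on top
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

predictions = model.predict([
    "account locked after password reset",
    "emails not coming in from mail",
])
print(list(predictions))
```

A simple baseline like this is mainly useful as a reference point against which more advanced models can be compared.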

# Project Objectives

The objectives of the project are to:
- Learn how to use different classification models.
- Use transfer learning to take advantage of pre-built models.
- Learn to set optimizers, loss functions, epochs, learning rate, batch size, checkpointing, early stopping, etc.
- Read research papers in the given domain to learn about advanced models for the given problem.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")


sns.set_style(style='darkgrid')

In [2]:
!cd

C:\Users\Rony\AIML\13_Capstone Project AIML\NLP - Project 1


In [3]:
df = pd.read_excel('input_data.xlsx', sheet_name=0)
df.head()

Unnamed: 0,Short description,Description,Caller,Assignment group
0,login issue,-verified user details.(employee# & manager na...,spxjnwir pjlcoqds,GRP_0
1,outlook,\r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail...,hmjdrvpb komuaywn,GRP_0
2,cant log in to vpn,\r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail...,eylqgodm ybqkwiam,GRP_0
3,unable to access hr_tool page,unable to access hr_tool page,xbkucsvz gcpydteq,GRP_0
4,skype error,skype error,owlgqjme qhcozdfx,GRP_0


# Basic EDA

In [4]:
# Find the shape of the data, data type of individual columns
df.info()  #info about the data

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8500 entries, 0 to 8499
Data columns (total 4 columns):
Short description    8492 non-null object
Description          8499 non-null object
Caller               8500 non-null object
Assignment group     8500 non-null object
dtypes: object(4)
memory usage: 265.8+ KB


In [5]:
# Checking the presence of missing values
df.isnull().values.any()

True

In [6]:
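# Count null (True) vs non-null (False) entries per column.
# Series.value_counts() replaces the deprecated top-level pd.value_counts.
df.isna().apply(lambda col: col.value_counts())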

Unnamed: 0,Short description,Description,Caller,Assignment group
False,8492,8499,8500.0,8500.0
True,8,1,,


In [7]:
# Total number of missing values
df.isnull().sum().sum()

9

- There are 4 columns in total
    - One target column - 'Assignment group'
    - 3 predictor variables
    - None of the columns have numeric values
- There are no null values in the columns 'Caller' and 'Assignment group'
- 'Short description' has 8 null values and 'Description' has 1, for 9 in total
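The observations above can also be read from a per-column count, which is often easier to scan than the True/False table. A minimal sketch on a made-up toy frame (the values are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the ticket data (values are made up)
toy = pd.DataFrame({
    "Short description": ["login issue", np.nan, "skype error"],
    "Description": ["cannot log in", "vpn fails", np.nan],
    "Assignment group": ["GRP_0", "GRP_0", "GRP_0"],
})

missing_per_column = toy.isnull().sum()            # number of NaN values per column
rows_with_missing = toy[toy.isnull().any(axis=1)]  # rows with at least one NaN

print(missing_per_column)
print(rows_with_missing.index.tolist())
```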

In [8]:
# Provides the total count, the unique count, the most frequent value and its frequency for each column
df.describe().T

Unnamed: 0,count,unique,top,freq
Short description,8492,7481,password reset,38
Description,8499,7817,the,56
Caller,8500,2950,bpctwhsn kzqsbmtp,810
Assignment group,8500,74,GRP_0,3976
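The `describe()` output shows that GRP_0 is the most frequent target, with 3976 of 8500 rows, so the 74 classes are far from balanced. The following sketch shows one way to inspect class shares; the toy labels are made up, and only the 3976 / 8500 figure comes from the output above.

```python
import pandas as pd

# Toy labels (made up) to show the call; the real column is df['Assignment group']
groups = pd.Series(["GRP_0", "GRP_0", "GRP_0", "GRP_29", "GRP_62"],
                   name="Assignment group")

# Relative frequency of each class, sorted from most to least common
share = groups.value_counts(normalize=True)
print(share)

# Share of GRP_0 in the real data, from the describe() output
grp0_share = 3976 / 8500
print(f"GRP_0 share in the real data: {grp0_share:.1%}")
```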


In [9]:
# Drop the 'Caller' column; it is not used as a predictor
df.drop(columns=['Caller'], inplace=True)

In [10]:
df

Unnamed: 0,Short description,Description,Assignment group
0,login issue,-verified user details.(employee# & manager na...,GRP_0
1,outlook,\r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail...,GRP_0
2,cant log in to vpn,\r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail...,GRP_0
3,unable to access hr_tool page,unable to access hr_tool page,GRP_0
4,skype error,skype error,GRP_0
...,...,...,...
8495,emails not coming in from zz mail,\r\n\r\nreceived from: avglmrts.vhqmtiua@gmail...,GRP_29
8496,telephony_software issue,telephony_software issue,GRP_0
8497,vip2: windows password reset for tifpdchb pedx...,vip2: windows password reset for tifpdchb pedx...,GRP_0
8498,machine nÃ£o estÃ¡ funcionando,i am unable to access the machine utilities to...,GRP_62


### Find and remove exact duplicates

In [11]:
# Select duplicate rows except first occurrence based on all columns
duplicateRowsDF = df[df.duplicated()]
print("Duplicate Rows except first occurrence based on all columns are :")
duplicateRowsDF

Duplicate Rows except first occurrence based on all columns are :


Unnamed: 0,Short description,Description,Assignment group
51,call for ecwtrjnq jpecxuty,call for ecwtrjnq jpecxuty,GRP_0
81,erp SID_34 account locked,erp SID_34 account locked,GRP_0
123,unable to display expense report,unable to display expense report,GRP_0
157,ess password reset,ess password reset,GRP_0
229,call for ecwtrjnq jpecxuty,call for ecwtrjnq jpecxuty,GRP_0
...,...,...,...
8424,windows account lockout,windows account lockout,GRP_0
8450,unable to connect to wifi,unable to connect to wifi,GRP_0
8451,password reset erp SID_34,password reset erp SID_34,GRP_0
8458,windows account locked,windows account locked,GRP_0


In [12]:
duplicateRowsDF[duplicateRowsDF['Short description'] == 'call for ecwtrjnq jpecxuty']

Unnamed: 0,Short description,Description,Assignment group
51,call for ecwtrjnq jpecxuty,call for ecwtrjnq jpecxuty,GRP_0
229,call for ecwtrjnq jpecxuty,call for ecwtrjnq jpecxuty,GRP_0
2714,call for ecwtrjnq jpecxuty,call for ecwtrjnq jpecxuty,GRP_0
3085,call for ecwtrjnq jpecxuty,call for ecwtrjnq jpecxuty,GRP_0
3219,call for ecwtrjnq jpecxuty,call for ecwtrjnq jpecxuty,GRP_0
4303,call for ecwtrjnq jpecxuty,call for ecwtrjnq jpecxuty,GRP_0


In [13]:
duplicateRowsDF['Short description'].value_counts()

windows password reset                                              28
password reset                                                      25
account locked in ad                                                22
windows account locked                                              22
erp SID_34 account unlock                                           17
                                                                    ..
unable to login to skype                                             1
cisco access point is not working.                                   1
blank call // loud noise // gso                                      1
job SID_38hotf failed in job_scheduler at: 09/01/2016 22:20:00       1
unable to connect to company secure                                  1
Name: Short description, Length: 166, dtype: int64

In [14]:
duplicateRowsDF[duplicateRowsDF['Short description'] == 'outlook not working']

Unnamed: 0,Short description,Description,Assignment group
1019,outlook not working,outlook not working,GRP_0
5941,outlook not working,outlook not working,GRP_0


In [15]:
# Drop exact duplicate rows, keeping the first occurrence
df.drop_duplicates(inplace=True)

In [16]:
# Check to confirm if duplicate rows except first occurrence based on all columns are removed
df[df.duplicated()]

Unnamed: 0,Short description,Description,Assignment group


In [17]:
df.info()  #info about the data

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7909 entries, 0 to 8499
Data columns (total 3 columns):
Short description    7904 non-null object
Description          7908 non-null object
Assignment group     7909 non-null object
dtypes: object(3)
memory usage: 247.2+ KB
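Comparing the two `df.info()` outputs, the frame went from 8500 to 7909 entries, so 591 exact duplicates were dropped. The sketch below, on a made-up toy frame, shows how `duplicated()` and `drop_duplicates()` count and remove such rows.

```python
import pandas as pd

# Toy frame (made up): rows 0, 2 and 3 are exact copies of each other
toy = pd.DataFrame({
    "Short description": ["skype error", "password reset", "skype error", "skype error"],
    "Description": ["skype error", "password reset", "skype error", "skype error"],
    "Assignment group": ["GRP_0", "GRP_0", "GRP_0", "GRP_0"],
})

n_later_copies = int(toy.duplicated().sum())          # repeats after the first occurrence
n_all_copies = int(toy.duplicated(keep=False).sum())  # every row that has an exact twin
deduped = toy.drop_duplicates()                       # new frame, first occurrence kept

print(n_later_copies, n_all_copies, len(deduped))
print(deduped.index.tolist())
```

Note that `drop_duplicates()` keeps the original index labels, which is why the index above still runs from 0 to 8499 with only 7909 entries. After this step, positional access (`.iloc`) and label access no longer refer to the same rows.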


### Handling Missing Values

In [18]:
# Null vs non-null counts per column after removing duplicates
df.isna().apply(lambda col: col.value_counts())

Unnamed: 0,Short description,Description,Assignment group
False,7904,7908,7909.0
True,5,1,


In [19]:
df[df['Short description'].isnull()]

Unnamed: 0,Short description,Description,Assignment group
2604,,\r\n\r\nreceived from: ohdrnswl.rezuibdt@gmail...,GRP_34
3383,,\r\n-connected to the user system using teamvi...,GRP_0
3906,,-user unable tologin to vpn.\r\n-connected to...,GRP_0
3924,,name:wvqgbdhm fwchqjor\nlanguage:\nbrowser:mic...,GRP_0
4341,,\r\n\r\nreceived from: eqmuniov.ehxkcbgj@gmail...,GRP_0


In [20]:
df[df['Description'].isnull()]

Unnamed: 0,Short description,Description,Assignment group
4395,i am locked out of skype,,GRP_0


In [21]:
print(list(df[df['Short description'].isna()].index))
print(list(df[df['Description'].isna()].index))

[2604, 3383, 3906, 3924, 4341]
[4395]


In [23]:
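# Replace missing text with a single space. Assign the result back instead of
# calling fillna(inplace=True) on a column selection, which newer pandas may not apply to df.
df['Short description'] = df['Short description'].fillna(' ')
df['Description'] = df['Description'].fillna(' ')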

In [24]:
print(list(df[df['Short description'].isna()].index))
print(list(df[df['Description'].isna()].index))

[]
[]


In [25]:
df[df['Short description'].str.contains('skype error')]

Unnamed: 0,Short description,Description,Assignment group
4,skype error,skype error,GRP_0
285,skype error while logging in,skype error while logging in,GRP_0
3392,skype error : getting skype certificate error,skype error : getting skype certificate error,GRP_0
4813,skype error,skype error,GRP_0


In [26]:
df.head()

Unnamed: 0,Short description,Description,Assignment group
0,login issue,-verified user details.(employee# & manager na...,GRP_0
1,outlook,\r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail...,GRP_0
2,cant log in to vpn,\r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail...,GRP_0
3,unable to access hr_tool page,unable to access hr_tool page,GRP_0
4,skype error,skype error,GRP_0


In [27]:
df.iloc[[1]]

Unnamed: 0,Short description,Description,Assignment group
1,outlook,\r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail...,GRP_0


In [28]:
df['Description'][1]

'\r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail.com\r\n\r\nhello team,\r\n\r\nmy meetings/skype meetings etc are not appearing in my outlook calendar, can somebody please advise how to correct this?\r\n\r\nkind '

In [29]:
# Replace Windows line breaks (\r\n) with a space in every column
df.replace({r'\r\n': ' '}, regex=True, inplace=True)

In [30]:
df['Description'][1]

'  received from: hmjdrvpb.komuaywn@gmail.com  hello team,  my meetings/skype meetings etc are not appearing in my outlook calendar, can somebody please advise how to correct this?  kind '
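The replacement above only targets `\r\n`, so it leaves double spaces behind, and bare `\n` characters (as in descriptions of the form `name:...\nlanguage:...`) are untouched. A possible alternative, sketched here on made-up strings, is to collapse every run of whitespace into a single space and trim the ends. This is a suggestion, not the cleaning step used in this notebook.

```python
import pandas as pd

# Made-up examples that mimic the two line-break styles seen in the data
raw = pd.Series([
    "\r\n\r\nreceived from: user@example.com\r\n\r\nhello team,\r\n\r\nplease advise",
    "name:abc\nlanguage:\nbrowser:microsoft internet explorer",
])

# Collapse every run of whitespace (\r, \n, tabs, spaces) into one space, then trim
clean = raw.str.replace(r"\s+", " ", regex=True).str.strip()
print(clean.tolist())
```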

In [32]:
# Save a checkpoint of the data after the first cleaning pass
df.to_excel("input_data_clean_1.xlsx", sheet_name='Sheet1')

In [38]:
df['Short description'][3903]

'ç”µè„‘æ—\xa0æ³•è¿žæŽ¥å…¬å…±ç›˜ï¼Œè¯·å¸®æˆ‘è½¬ç»™å°\x8fè´º'

In [39]:
# Drop every non-ASCII character (this also removes mis-encoded non-English text)
df['Short description'] = df['Short description'].str.encode('ascii', 'ignore').str.decode('ascii')

In [56]:
df.shape

(7909, 3)

In [64]:
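# Positional rows 3680-3688; their index labels (3900-3912) differ because
# the dropped duplicates left gaps in the index
df.iloc[3680:3689, :]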

Unnamed: 0,Short description,Description,Assignment group
3900,problems with ap vpn,received from: elixsfvu.pxwbjofl@gmail.com ...,GRP_0
3901,kis documents will not generate because the st...,kis documents will not generate because the st...,GRP_18
3902,wrong unit price on inwarehouse_tool,received from: ovhtgsxd.dcqhnrmy@gmail.com ...,GRP_13
3903,,ç”µè„‘æ— æ³•è¿žæŽ¥å…¬å…±ç›˜ï¼Œè¯·å¸®æˆ‘è½¬ç»™å...,GRP_30
3904,usa pc companyst-apc-01 in the pvd area cannot...,usa pc companyst-apc-01 in the pvd area needs ...,GRP_3
3906,,-user unable tologin to vpn. -connected to th...,GRP_0
3907,i am not able to log into my vpn. when i am tr...,name:mehrugshy\nlanguage:\nbrowser:microsoft i...,GRP_0
3911,vpn connectivity,received from: gmrxwqlf.vzacdmbj@gmail.com ...,GRP_0
3912,referencing ticket ticket_no1499477. customer ...,please revisit ticket ticket_no1499477 and rev...,GRP_20


In [67]:
df['Description'][3903]

'ç”µè„‘æ—\xa0æ³•è¿žæŽ¥å…¬å…±ç›˜ï¼Œè¯·å¸®æˆ‘è½¬ç»™å°\x8fè´º'

In [68]:
# Apply the same non-ASCII filter to the long description
df['Description'] = df['Description'].str.encode('ascii', 'ignore').str.decode('ascii')

In [69]:
df['Description'][3903]

''

In [70]:
df.iloc[3680:3689,:]

Unnamed: 0,Short description,Description,Assignment group
3900,problems with ap vpn,received from: elixsfvu.pxwbjofl@gmail.com ...,GRP_0
3901,kis documents will not generate because the st...,kis documents will not generate because the st...,GRP_18
3902,wrong unit price on inwarehouse_tool,received from: ovhtgsxd.dcqhnrmy@gmail.com ...,GRP_13
3903,,,GRP_30
3904,usa pc companyst-apc-01 in the pvd area cannot...,usa pc companyst-apc-01 in the pvd area needs ...,GRP_3
3906,,-user unable tologin to vpn. -connected to th...,GRP_0
3907,i am not able to log into my vpn. when i am tr...,name:mehrugshy\nlanguage:\nbrowser:microsoft i...,GRP_0
3911,vpn connectivity,received from: gmrxwqlf.vzacdmbj@gmail.com ...,GRP_0
3912,referencing ticket ticket_no1499477. customer ...,please revisit ticket ticket_no1499477 and rev...,GRP_20


In [71]:
# Final null check after filling and ASCII filtering
df.isna().apply(lambda col: col.value_counts())

Unnamed: 0,Short description,Description,Assignment group
False,7909,7909,7909
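The null check reports no missing values, but `isna()` does not treat empty strings as missing: row 3903 above now has an empty 'Short description' and an empty 'Description' because all of its text was non-ASCII. The sketch below shows one way to flag such rows; the toy frame, including the Chinese ticket text, is made up for illustration.

```python
import pandas as pd

# Toy frame (made up): the second ticket contains only non-ASCII text
toy = pd.DataFrame({
    "Short description": ["vpn connectivity", "电脑无法连接"],
    "Description": ["cannot connect to vpn", "电脑无法连接公共盘"],
})

# Same ASCII-only filter as in the cells above, applied to every column
stripped = toy.apply(lambda col: col.str.encode("ascii", "ignore").str.decode("ascii"))

# isna() finds nothing, yet the second ticket has lost all of its text
n_null = int(stripped.isna().sum().sum())
is_blank = stripped.apply(lambda col: col.str.strip().eq("")).all(axis=1)

print(n_null, is_blank.tolist())
```

Rows flagged this way carry no text for a classifier to learn from, so it may be worth deciding explicitly whether to drop them or to handle the original text differently.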


In [72]:
# Save a checkpoint of the data after non-ASCII characters were removed
df.to_excel("input_data_clean_2.xlsx", sheet_name='Sheet1')