# ITOps Analytics

## UC: Self-Service Help Desk


### Problem Statement / Business Objective:

We are in a typical IT help desk setting and have the following information available:
a) A dataset consisting of a list of FAQ articles and a corresponding set of questions.
b) Each FAQ article can have multiple questions associated with it.

The objective is to build an NLP model that takes a new user question and finds the closest question in the dataset. We then pick the corresponding FAQ article and return it to the user. Because the search is AI-enabled, the user does not have to hunt for the appropriate and relevant article.

### Benefits:

a) Ability to automate the search for appropriate FAQ articles in a large repository.

b) Reduction in maintenance cost.

c) Use of AI-enabled methods to improve automation capability.

d) Reduction in human/manual error.


In [1]:
#Install all related packages/libraries. 
#If you find additional packages missing, please install them in your virtual environment.
import sys
import os
#!conda install --yes --prefix {sys.prefix} pandas tensorflow scikit-learn gensim

In [2]:
cwd = os.getcwd()
cwd

'C:\\Users\\kamalakanta.mishra\\Desktop\\Kamal\\MyStuff\\ITOps_Analytics\\NotebookFiles'

### 1. Building a Document Vector

In [3]:
from collections import defaultdict
from gensim import corpora
from gensim.parsing.preprocessing import remove_stopwords
import numpy as np
import pandas as pd

#Read the input CSV into a Pandas dataframe
helpdesk_data = pd.read_csv("helpdesk_dataset.csv")

print("HelpDesk Data: ")  
print(helpdesk_data.head())

HelpDesk Data: 
                                            Question  \
0              My Mac does not boot, what can I do ?   
1                Can Mac Air get infected by a Virus   
2   My Mac is having boot problems, how do I fix it?   
3                 Do I need an anti virus on my Mac?   
4  I have trouble connecting my monitor to my Mac...   

                   LinkToAnswer  
0  http://faq/mac-does-not-boot  
1     http://faq/mac-book-virus  
2  http://faq/mac-does-not-boot  
3     http://faq/mac-book-virus  
4  http://faq/mac-monitor-setup  


In [4]:
#Extract the Question column from dataset
documents = helpdesk_data["Question"]

#Function to perform data cleansing of the document
def process_document(document):

    #Convert to lower case, remove stopwords, then strip the "?" character
    #Note: other punctuation such as "," and "." stays attached to the token
    cleaned_document = remove_stopwords(document.lower()).replace("?","")  
    return cleaned_document.split()

#Create a document vector (a list of cleaned tokens) for every question
doc_vectors=[process_document(document)
             for document in documents]


In [5]:
#Print the document and the corresponding document vector to compare
print(documents[1])
print(doc_vectors[1])

Can Mac Air get infected by a Virus
['mac', 'air', 'infected', 'virus']


### 2. Creating the LSI Model

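LSI (Latent Semantic Indexing) projects bag-of-words vectors into a lower-dimensional topic space using a truncated singular value decomposition, so that questions using related terms end up close together.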

In [6]:
#Create the dictionary based on document vectors
dictionary = corpora.Dictionary(doc_vectors)

print("Dictionary created :")
dictionary.token2id


Dictionary created :


{'boot,': 0,
 'mac': 1,
 'air': 2,
 'infected': 3,
 'virus': 4,
 'boot': 5,
 'fix': 6,
 'having': 7,
 'it': 8,
 'problems,': 9,
 'anti': 10,
 'need': 11,
 'connecting': 12,
 'help': 13,
 'mac.': 14,
 'monitor': 15,
 'trouble': 16,
 'boots,': 17,
 'error': 18,
 'shows': 19,
 'software': 20,
 'unsupporterd': 21,
 'connected': 22,
 'proper': 23,
 'resolution': 24,
 'flicker': 25,
 'monitor.': 26,
 'hdmi': 27,
 'use': 28,
 'connect': 29,
 'monitors': 30,
 'windows': 31,
 'machine': 32,
 'machine.': 33,
 'linux': 34}
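
The dictionary above holds `boot,` and `boot` (and likewise `mac.` and `mac`) as separate tokens, because `process_document` strips only the `?` character. The token `it` also survives, most likely because `it?` is not recognised as a stopword before the `?` is removed. A minimal sketch of a stricter tokenizer follows, using only the standard library. The name `clean_tokens` and the tiny `DEMO_STOPWORDS` set are illustrative assumptions and not part of the notebook; a fuller stopword list could be passed in instead.

```python
import string

# Hypothetical, deliberately tiny stopword set used only for this demo.
DEMO_STOPWORDS = frozenset(
    {"my", "is", "how", "do", "i", "it", "a", "by", "can", "get"}
)

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tokens(text, stopwords=DEMO_STOPWORDS):
    """Lower-case, strip punctuation, split on whitespace, drop stopwords."""
    no_punct = text.lower().translate(_PUNCT_TABLE)
    return [token for token in no_punct.split() if token not in stopwords]


print(clean_tokens("My Mac is having boot problems, how do I fix it?"))
# ['mac', 'having', 'boot', 'problems', 'fix']
```

Stripping punctuation before the stopword check is the design choice that matters here. Swapping such a tokenizer into the notebook would change the dictionary and therefore every score printed in the later cells.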

In [8]:
#Convert each document vector into a bag-of-words corpus,
#using the token identifiers stored in the dictionary
corpus = [dictionary.doc2bow(doc_vector) 
          for doc_vector in doc_vectors]

#Review the corpus generated
print(doc_vectors[1])
print(corpus[1])

# Each word is mapped to a (token id, count) tuple

['mac', 'air', 'infected', 'virus']
[(1, 1), (2, 1), (3, 1), (4, 1)]
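
To make the bag-of-words step concrete, here is a rough standard-library sketch of the mapping that `doc2bow` performs: count each known token and emit `(token id, count)` pairs sorted by id. The helper name `to_bow` is hypothetical, and the `token2id` mapping is just a four-entry subset of the dictionary printed earlier.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs; unknown tokens are skipped."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Subset of the dictionary shown earlier in the notebook.
token2id = {"mac": 1, "air": 2, "infected": 3, "virus": 4}

print(to_bow(["mac", "air", "infected", "virus"], token2id))
# [(1, 1), (2, 1), (3, 1), (4, 1)]
```

Tokens that are not in the dictionary simply drop out, which is why a new question containing unseen words is represented only by the words the corpus already knows.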


In [10]:
#Now let's build a similarity index

#First, Build the LSI Model
from gensim import models,similarities

#Create the model
lsi = models.LsiModel(corpus, id2word=dictionary)

#Create a similarity Index
index = similarities.MatrixSimilarity(lsi[corpus])

for doc_sims in index:
    print(doc_sims)

# For n documents we get an n x n matrix: row i holds the similarity of document i to every document

[ 1.0000000e+00  3.5355341e-01  2.8867510e-01  3.5355341e-01
 -2.3960363e-08  2.8867507e-01 -2.1823926e-08 -7.7285724e-09
  3.5355344e-01  4.0824834e-01  5.0000000e-01  7.4620088e-11
 -8.1208427e-09  5.7024478e-09 -1.6374955e-08  4.0824831e-01
  1.0141667e-08 -1.6796031e-09 -3.0654340e-10 -1.4191170e-08]
[ 3.5355341e-01  9.9999994e-01  2.0412414e-01  4.9999994e-01
 -1.5189915e-08  2.0412417e-01 -5.8887246e-09  2.8867513e-01
  2.5000003e-01  2.8867516e-01  7.2798940e-09  5.7735032e-01
 -1.3715472e-08  2.2360681e-01 -2.6795817e-08  3.0326026e-09
  5.7735032e-01 -1.9818462e-08  2.2360682e-01 -2.7807575e-08]
[ 2.8867510e-01  2.0412414e-01  1.0000000e+00  2.0412415e-01
  3.0523812e-09  1.6666669e-01  3.0860674e-01 -3.0340328e-09
  2.0412417e-01  2.3570229e-01 -6.0323577e-09  1.7852827e-08
  8.3333331e-01  1.1366384e-08 -1.2746543e-08 -8.1723837e-09
 -1.3646855e-08  8.3333337e-01 -1.6639047e-08 -3.4284398e-08]
[ 3.53553414e-01  4.99999940e-01  2.04124153e-01  9.99999940e-01
  4.98656005e-10 
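
`MatrixSimilarity` scores pairs of vectors by cosine similarity, which is why the diagonal entries above are (up to rounding) 1. The sketch below shows that computation in plain Python. The function names are illustrative, and the two dense vectors are an assumption: they encode the first two questions as counts over dictionary ids 0 to 4, based on the tokens listed in the dictionary.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """n x n matrix whose entry [i][j] compares vector i with vector j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Assumed dense counts over dictionary ids 0..4 for the first two questions.
doc0 = [1, 1, 0, 0, 0]  # 'boot,', 'mac'
doc1 = [0, 1, 1, 1, 1]  # 'mac', 'air', 'infected', 'virus'

matrix = similarity_matrix([doc0, doc1])
print(round(matrix[0][1], 4))
# 0.3536
```

This value agrees with the second entry of the first row in the printed matrix (3.5355341e-01), which suggests that on a corpus this small the LSI space keeps the bag-of-words cosine between indexed documents largely intact.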

### 3. Recommending FAQs

In [11]:
# This is a sample question asked by the end user
question = "I have boot problems in my Mac"

#Pre Process the Question 
question_corpus = dictionary.doc2bow(process_document(question))
print("Question translated to :", question_corpus)

#Create an LSI Representation
vec_lsi = lsi[question_corpus]  

#Find similarity of the question with existing documents
sims = index[vec_lsi]  
print("Similarity scores :",list(enumerate(sims)))

# The output lists a similarity score between this question and each document in the dataset
# 1st number: the document ID
# 2nd number: the similarity score
# The higher the score, the more closely the question matches that document


Question translated to : [(1, 1), (5, 1)]
Similarity scores : [(0, 0.6626272), (1, 0.46854818), (2, 0.765136), (3, 0.46854818), (4, 9.313226e-09), (5, 0.382568), (6, 2.9802322e-08), (7, -1.4901161e-08), (8, 0.46854818), (9, 0.54103285), (10, 0.0), (11, 1.4901161e-08), (12, 0.382568), (13, 0.0), (14, -7.450581e-09), (15, 7.450581e-09), (16, -7.450581e-09), (17, 0.382568), (18, -7.450581e-09), (19, -2.2351742e-08)]


In [12]:
#Find the corresponding FAQ Link

#Sort the similarity scores in descending order and get the document indexes
matches=np.argsort(sims)[::-1] 
print("Sorted Document index :", matches)

print("\n", "-"*60, "\n")
for i in matches:
    print(sims[i], " -> ", helpdesk_data.iloc[i]["Question"])

print("\n", "-"*60, "\n")
print("Recommended FAQ :" , helpdesk_data.iloc[matches[0]]["LinkToAnswer"] )

Sorted Document index : [ 2  0  9  8  1  3 12 17  5  6 11  4 15 13 10 16 18 14  7 19]

 ------------------------------------------------------------ 

0.765136  ->  My Mac is having boot problems, how do I fix it?
0.6626272  ->  My Mac does not boot, what can I do ?
0.54103285  ->  Can I connect two monitors to my Mac?
0.46854818  ->  Can I use a HDMI monitor with my Mac?
0.46854818  ->  Can Mac Air get infected by a Virus
0.46854818  ->  Do I need an anti virus on my Mac?
0.382568  ->  My Windows is having boot problems, how do I fix it?
0.382568  ->  My Linux is having boot problems, how do I fix it?
0.382568  ->  When my Mac boots, it shows an unsupporterd software error
2.9802322e-08  ->  My Monitor does not show in proper resolution when connected to my Mac. How do I fix it?
1.4901161e-08  ->  Can Windows get infected by a Virus
9.313226e-09  ->  I have trouble connecting my monitor to my Mac. Can you please help?
7.450581e-09  ->  My Linux machine does not boot, what can I do ?
0
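
The cell above always returns the top-ranked link, even when the best score is low, and several questions can point to the same article. One possible refinement is sketched below: keep only scores above a cut-off and return each FAQ link once. The function name `recommend_links`, the `min_score` value of 0.5 and the `top_k` limit are assumptions for illustration, not tuned settings. The sample scores and links are the first five rows shown in the notebook output.

```python
def recommend_links(sims, links, min_score=0.5, top_k=3):
    """Return up to top_k distinct (link, score) pairs scoring at least min_score."""
    ranked = sorted(zip(sims, links), key=lambda pair: pair[0], reverse=True)
    seen = set()
    results = []
    for score, link in ranked:
        if score < min_score:
            break  # remaining scores are lower still
        if link not in seen:
            seen.add(link)
            results.append((link, float(score)))
        if len(results) == top_k:
            break
    return results


# First five similarity scores and links from the output above.
sims = [0.6626272, 0.46854818, 0.765136, 0.46854818, 9.313226e-09]
links = [
    "http://faq/mac-does-not-boot",
    "http://faq/mac-book-virus",
    "http://faq/mac-does-not-boot",
    "http://faq/mac-book-virus",
    "http://faq/mac-monitor-setup",
]

print(recommend_links(sims, links))
# [('http://faq/mac-does-not-boot', 0.765136)]
```

An empty result means no question in the dataset cleared the cut-off, which a help desk front end could treat as a signal to route the user to a human agent.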