<a href="https://colab.research.google.com/github/pallavi-allada/UtteranceClassification/blob/main/src/Inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inference

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
import os
import time
import pickle
import numpy as np
import re
import string

import gensim
from gensim.models import Word2Vec

In [3]:
ROOT_DIR = "/content/drive/MyDrive/Colab Notebooks/Cognizer"
DATA_DIR = "data"
MODEL_DIR = "models"

WORD2VEC_BIN = "GoogleNews-vectors-negative300.bin"

Class for inference
The class loads the two models needed for inference: the pretrained word2vec embeddings and our trained classifier. KNN is the default classifier; to use a different one, pass its name when instantiating UtteranceClassifier.

In [14]:
class UtteranceClassifier:

  def __init__(self, num_features, classifier="KNN"):
    self.modeldict = {"KNN":"knnpickle_file",
                      "RandomForest":"rfpickle_file",
                      "GradientBoosting":"gbpickle_file"}
    
    model_file = self.modeldict[classifier]
    print(model_file)
    self.num_features = num_features
    self.model = gensim.models.KeyedVectors.load_word2vec_format(os.path.join(ROOT_DIR,MODEL_DIR,WORD2VEC_BIN), binary=True, limit=500000)
    # Load the trained classifier; the context manager closes the file handle.
    with open(os.path.join(ROOT_DIR,MODEL_DIR,model_file), 'rb') as f:
      self.loaded_model = pickle.load(f)

  def clean(self,sentence):
    words = [re.sub('[%s]' % re.escape(string.punctuation), '', word) for word in sentence.split()]
    return [word for word in words if len(word)>0]

  def word2vec_representation(self,doc_words):
    # Sum the word vectors of all in-vocabulary words into a (1, num_features) array.
    word2vec_rep = np.zeros((1, self.num_features))
    for word in doc_words:
        try:
            word2vec_rep+=self.model[word]
        except KeyError:
            # Out-of-vocabulary words contribute nothing to the sum.
            pass
    return word2vec_rep

  def gettag(self,sentence):
    w2v1 = self.word2vec_representation(self.clean(sentence))
    return self.loaded_model.predict(w2v1)
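
The cleaning and vector-summing steps of the class can be tried without downloading the GoogleNews vectors. The following is a minimal sketch under stated assumptions: `toy_vectors` is a made-up 3-dimensional embedding table standing in for the real 300-dimensional model, and `sum_of_vectors` is a hypothetical standalone version of `word2vec_representation`.

```python
import re
import string

import numpy as np

# Hypothetical 3-dimensional embeddings standing in for the 300-dimensional
# GoogleNews vectors; the values are made up for illustration.
toy_vectors = {
    "show": np.array([1.0, 0.0, 0.0]),
    "email": np.array([0.0, 2.0, 0.0]),
}

def clean(sentence):
    # Strip punctuation from each whitespace-separated token, then drop empty tokens.
    pattern = "[%s]" % re.escape(string.punctuation)
    words = [re.sub(pattern, "", word) for word in sentence.split()]
    return [word for word in words if word]

def sum_of_vectors(words, vectors, num_features):
    # Sum the vectors of known words; unknown words contribute nothing.
    rep = np.zeros((1, num_features))
    for word in words:
        if word in vectors:
            rep += vectors[word]
    return rep

tokens = clean("show me all e-mail, please!")
features = sum_of_vectors(tokens, toy_vectors, num_features=3)
print(tokens)    # ['show', 'me', 'all', 'email', 'please']
print(features)  # [[1. 2. 0.]]
```

Note that "e-mail" becomes "email" because the hyphen is removed as punctuation, and that the representation is a sum rather than an average, so longer sentences produce vectors with larger magnitude.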
    

In [15]:
#convert label to tag
TAGS = {0: "Contract", 1: "Email", 2: "Calendar", 3: "Contact", 4: "Document", 5: "Employee"}

def label2tag(lbl):
  # lbl may be a scalar or the one-element array returned by predict();
  # any label outside 0-5 maps to "Keyword".
  return TAGS.get(np.ravel(lbl)[0].item(), "Keyword")


To load a different model, set the classifier string to "KNN", "RandomForest", or "GradientBoosting". The default is "KNN".
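
An unsupported classifier string would surface as a bare KeyError from the dictionary lookup in the constructor. As a rough sketch of one way to make that failure easier to read, the hypothetical helper `resolve_model_file` below (not part of the notebook) validates the name before building the path. The mapping is copied from `UtteranceClassifier.__init__`.

```python
import os

# Mapping copied from UtteranceClassifier.__init__.
MODEL_FILES = {
    "KNN": "knnpickle_file",
    "RandomForest": "rfpickle_file",
    "GradientBoosting": "gbpickle_file",
}

def resolve_model_file(classifier, model_dir):
    # Hypothetical helper: fail early with a readable message
    # instead of a bare KeyError from the dictionary lookup.
    if classifier not in MODEL_FILES:
        raise ValueError(
            "Unknown classifier %r; expected one of %s"
            % (classifier, sorted(MODEL_FILES))
        )
    return os.path.join(model_dir, MODEL_FILES[classifier])

# Prints the joined path, e.g. models/rfpickle_file on POSIX systems.
print(resolve_model_file("RandomForest", "models"))
```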

In [16]:
classifier = UtteranceClassifier(num_features = 300, classifier = "KNN")

knnpickle_file


In [None]:
sentence = "show me all e-mail address of Pallavi"
print("User text is-----", sentence)
start_time = time.time()
prediction = classifier.gettag(sentence)
print("Tag is-----", label2tag(prediction))
print("Inference time----- %s seconds ---" % (time.time() - start_time))

User text is----- show me all e-mail address of Pallavi
Tag is----- Contact
Inference time----- 0.010127782821655273 seconds ---
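
A single timed call, as above, can be noisy. A minimal sketch of averaging the latency over repeated calls follows, using `time.perf_counter`, which is intended for measuring short durations. `average_latency` and `dummy_predict` are hypothetical names introduced here; in the notebook, `classifier.gettag` would take the place of the stand-in predictor.

```python
import time

def average_latency(predict, sentence, repeats=100):
    # Call predict(sentence) `repeats` times and return the mean
    # wall-clock time per call, in seconds.
    start = time.perf_counter()
    for _ in range(repeats):
        predict(sentence)
    return (time.perf_counter() - start) / repeats

# Stand-in predictor so the sketch runs on its own;
# in the notebook this would be classifier.gettag.
def dummy_predict(sentence):
    return len(sentence.split())

latency = average_latency(dummy_predict, "show me all e-mail address of Pallavi")
print("Average inference time: %.6f seconds" % latency)
```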
