<a href="https://colab.research.google.com/github/kumaramardeep342/Colab-Work/blob/main/Ed_AI__DL_M13%2BM14_Intent_Detection_LTSM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intent Classification using LSTM

## Summary
- Problem Statement
- About the Dataset
- Load the Dataset
- Import the required libraries
- Pre-process the dataset
- Training the Model
- Predict
- Model Evaluation

## Problem Statement
Intent classification is the automated association of text with a specific intention. For example, suppose you email an airline with the text 'Can you please cancel my ticket with PNR 123456'. The customer's intent here is 'Cancellation of Air Ticket'.

The goal of this use case is to introduce the concept of intent classification and to show how an LSTM can be used to solve it.

## About the Dataset

The ATIS (Airline Travel Information System) data is a corpus of natural-language queries that the general public uses to book flight tickets and to enquire about flight timings, prices, etc.

The dataset has two columns. The first column is the target, i.e. the intent of the customer, which is the output we will be classifying (it is loaded as `intent` in the code below). The second column is the text of the user's flight-related query (loaded as `Text`).

## Load the Dataset

In [1]:
! pip install -q opendatasets
import opendatasets as od
od.download('https://www.kaggle.com/datasets/hassanamin/atis-airlinetravelinformationsystem')

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: sjagkoo7
Your Kaggle Key: ··········
Dataset URL: https://www.kaggle.com/datasets/hassanamin/atis-airlinetravelinformationsystem
Downloading atis-airlinetravelinformationsystem.zip to ./atis-airlinetravelinformationsystem


100%|██████████| 139k/139k [00:00<00:00, 611kB/s]







## Import libraries

In [2]:
#Mount the Google Drive
# from google.colab import drive
# drive.mount('/content/drive')

#enable table format
from google.colab import data_table
data_table.enable_dataframe_formatter()

#disable table format
# from google.colab import data_table
# data_table.disable_dataframe_formatter()

# processing
import pandas  as pd
import polars as pl
import numpy as np

#visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly as py


from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

#nltk
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

#tensorflow
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.layers import BatchNormalization, Dropout, Input
from tensorflow.keras.regularizers import l2
from tensorflow.keras.initializers import he_uniform, glorot_uniform
from tensorflow.keras.activations import relu, softmax
from tensorflow.keras.losses import CategoricalCrossentropy as cce
from tensorflow.keras.metrics import AUC
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam


nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('punkt_tab')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

## Pre-process the dataset

We will be doing the following preprocessing steps to get the desired format of the data.

- Perform One Hot Encoding on the target variable.
- Convert the text into lower case.
- Tokenize the words.
- Remove punctuation and stop words.
- Lemmatize the tokens.
- Convert texts into sequences.
- Pad the sequences.
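
The text-cleaning steps in the list above (lower-casing, tokenizing, dropping stop words and punctuation) can be sketched in plain Python. This is only an illustration under simplifying assumptions: whitespace splitting stands in for `nltk.word_tokenize`, and the small `STOP_WORDS` set is a made-up stand-in for NLTK's English stop-word list.

```python
# Hypothetical, tiny stand-in for NLTK's English stop-word list (assumption).
STOP_WORDS = {"i", "to", "from", "at", "and", "in", "the", "am", "what", "are"}

def clean_text(text):
    """Lower-case, split on whitespace, drop stop words and non-alphanumeric tokens."""
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # str.isalnum() is False for punctuation-only tokens such as "," or "!".
    return [t for t in tokens if t.isalnum()]

cleaned = clean_text("I want to fly from Boston at 838 am , please !")
print(cleaned)
```

Note that `isalnum()` also drops tokens that mix letters and punctuation (for example `o'clock`), which is the same filter the notebook applies after tokenizing.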

In [3]:
# Read the Dataset
atis = pd.read_csv('/content/atis-airlinetravelinformationsystem/atis_intents.csv',names=['intent','Text'],header = None)
atis.head(2)

Unnamed: 0,intent,Text
0,atis_flight,i want to fly from boston at 838 am and arriv...
1,atis_flight,what flights are available from pittsburgh to...


In [4]:
atis.intent.value_counts()

intent,count
atis_flight,3666
atis_airfare,423
atis_ground_service,255
atis_airline,157
atis_abbreviation,147
atis_aircraft,81
atis_flight_time,54
atis_quantity,51
atis_flight#atis_airfare,21
atis_airport,20


In [5]:
# Replace combined or rare labels with the most relevant single intent

# atis_flight#atis_airfare -- atis_airfare
# atis_airline#atis_flight_no -- atis_flight_no
# atis_ground_service#atis_ground_fare -- atis_ground_fare
# atis_airfare#atis_flight_time -- atis_flight_time
# atis_cheapest -- atis_airfare
# atis_aircraft#atis_flight#atis_flight_no -- atis_flight_no

# NOTE: the first value, ' atis_airfare', carries a leading space, so those rows stay a
# separate class from 'atis_airfare' in the counts and reports below; drop the space to merge them.
rep_dict = {'atis_flight#atis_airfare': ' atis_airfare',
            'atis_airline#atis_flight_no': 'atis_flight_no',
            'atis_ground_service#atis_ground_fare': 'atis_ground_fare',
            'atis_airfare#atis_flight_time': 'atis_flight_time',
            'atis_cheapest': 'atis_airfare',
            'atis_aircraft#atis_flight#atis_flight_no': 'atis_flight_no'}

In [6]:
# Apply each replacement to the intent column
for key,value in rep_dict.items():
  atis.intent = atis.intent.str.replace(key,value)
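
`str.replace` works on substrings, so the result depends on the order of the keys and on any stray whitespace in the values. A possible alternative is to map labels by exact match. The sketch below shows the idea in plain Python; `label_map` is a subset of `rep_dict` written without extra whitespace, and `merge_label` and the sample `labels` list are hypothetical names and data used only for illustration.

```python
# Subset of the replacement table, keyed by the exact combined label.
label_map = {
    "atis_flight#atis_airfare": "atis_airfare",
    "atis_cheapest": "atis_airfare",
    "atis_airfare#atis_flight_time": "atis_flight_time",
}

def merge_label(label):
    """Strip whitespace, then return the mapped label on an exact match, else the label itself."""
    label = label.strip()
    return label_map.get(label, label)

# Made-up sample labels, including one with a stray leading space.
labels = ["atis_flight", "atis_flight#atis_airfare", "atis_cheapest", " atis_airfare"]
merged = [merge_label(label) for label in labels]
print(merged)
```

With exact matching and stripping, labels that differ only by whitespace end up in the same class.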

In [7]:
atis.intent.value_counts()

intent,count
atis_flight,3666
atis_airfare,424
atis_ground_service,255
atis_airline,157
atis_abbreviation,147
atis_aircraft,81
atis_flight_time,55
atis_quantity,51
atis_airfare,21
atis_airport,20


In [8]:
# Perform One Hot Encoding on the target variable.
encode_intent = OneHotEncoder().fit(np.array(atis.intent).reshape(-1,1)) #We perform one hot encoding on the target variable to convert into a matrix of 0s and 1s.

In [9]:
encode_intent

In [10]:
intent_encoded =  encode_intent.transform(np.array(atis.intent).reshape(-1,1)).toarray()

In [11]:
intent_encoded

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
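
To make the one-hot encoding of the target concrete, here is a minimal pure-Python sketch. It is not how `OneHotEncoder` is implemented; the helper names `fit_categories`, `one_hot`, and `decode` are made up for illustration. Decoding picks the column with the largest value, so the same function can also turn a row of predicted probabilities back into a label.

```python
def fit_categories(labels):
    """Sorted unique labels; a label's position in this list is its one-hot column."""
    return sorted(set(labels))

def one_hot(labels, categories):
    """Encode each label as a row with a single 1.0 in its category's column."""
    index = {c: i for i, c in enumerate(categories)}
    return [[1.0 if index[label] == j else 0.0 for j in range(len(categories))]
            for label in labels]

def decode(rows, categories):
    """Map each row back to the category whose column holds the largest value."""
    return [categories[max(range(len(row)), key=row.__getitem__)] for row in rows]

labels = ["atis_flight", "atis_airfare", "atis_flight", "atis_airline"]
categories = fit_categories(labels)
encoded = one_hot(labels, categories)
print(categories)
print(encoded)
print(decode([[0.1, 0.2, 0.7]], categories))  # works on probabilities too
```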

In [12]:
# Convert the text into lower case.
atis["Text"]= atis.Text.map(lambda l: l.lower())

In [13]:
atis.head(2)

Unnamed: 0,intent,Text
0,atis_flight,i want to fly from boston at 838 am and arriv...
1,atis_flight,what flights are available from pittsburgh to...


In [14]:
# Tokenize the words.
atis["Text"] = atis["Text"].apply(nltk.word_tokenize)

In [15]:
atis.head(2)

Unnamed: 0,intent,Text
0,atis_flight,"[i, want, to, fly, from, boston, at, 838, am, ..."
1,atis_flight,"[what, flights, are, available, from, pittsbur..."


In [16]:
# Remove punctuation and stop words.
stop_words = set(stopwords.words('english'))
atis['Text'] = atis['Text'].apply(lambda x: [item for item in x if item not in stop_words])

In [17]:
atis['Text'] = atis['Text'].apply(lambda x: [item for item in x if item.isalnum()])

In [18]:
atis.head(2)

Unnamed: 0,intent,Text
0,atis_flight,"[want, fly, boston, 838, arrive, denver, 1110,..."
1,atis_flight,"[flights, available, pittsburgh, baltimore, th..."


In [19]:
# Lemmatize each token (reduce it to its dictionary form, e.g. "flights" -> "flight").
lemmatizer = WordNetLemmatizer()
atis['Text'] = atis['Text'].apply(lambda x: [lemmatizer.lemmatize(y) for y in x])

In [20]:
atis.head(2)

Unnamed: 0,intent,Text
0,atis_flight,"[want, fly, boston, 838, arrive, denver, 1110,..."
1,atis_flight,"[flight, available, pittsburgh, baltimore, thu..."


In [21]:
# Join the tokens back into one string per row; the Keras Tokenizer below converts these strings into sequences.
atis['Text'] = atis['Text'].apply(lambda x: ' '.join(x))

In [22]:
atis.head(2)

Unnamed: 0,intent,Text
0,atis_flight,want fly boston 838 arrive denver 1110 morning
1,atis_flight,flight available pittsburgh baltimore thursday...


In [23]:
# We use Tokenizer from tensorflow.keras.preprocessing.text library
num_words=10000
text_tokenizer= Tokenizer(num_words)
text_tokenizer.fit_on_texts(atis.Text) #fit_on_texts - creates the vocabulary index based on word frequency.

tokenized_atis_data= text_tokenizer.texts_to_sequences(atis.Text) #Converting texts to sequences
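
The idea behind `fit_on_texts` and `texts_to_sequences` can be sketched with the standard library. This is a simplified illustration, not the Keras implementation: it ignores `num_words`, lower-casing, and punctuation filtering, and `build_word_index` is a hypothetical helper name. The key point is that more frequent words receive smaller indices, with 0 left free for padding.

```python
from collections import Counter

def build_word_index(texts):
    """Give index 1 to the most frequent word, 2 to the next, and so on."""
    counts = Counter(word for text in texts for word in text.split())
    # sorted() is stable, so ties keep their first-seen order.
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}

def texts_to_sequences(texts, word_index):
    """Replace each known word by its integer index."""
    return [[word_index[w] for w in text.split() if w in word_index] for text in texts]

texts = ["flight boston denver", "flight denver", "flight fare"]
word_index = build_word_index(texts)
sequences = texts_to_sequences(texts, word_index)
print(word_index)
print(sequences)
```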

In [24]:
tokenized_atis_data

[[42, 21, 2, 316, 52, 5, 411, 19],
 [1, 35, 8, 10, 47, 19],
 [255, 99, 4, 6, 412, 1, 17, 16],
 [31, 212, 140, 79],
 [32, 27, 15, 8, 11, 182, 119],
 [33, 1, 96, 167, 129],
 [162, 97, 163, 1, 98, 9],
 [3, 1, 8, 100, 101, 47],
 [1, 2, 16],
 [162, 26, 30, 35, 5],
 [3, 1, 9, 4, 6],
 [3, 1, 4, 80, 67, 43, 86],
 [40, 79],
 [31, 1, 2, 227],
 [1, 10, 109, 20],
 [3, 23, 29, 15, 2, 5],
 [3, 26, 30, 5],
 [1, 5, 8, 17, 109, 20, 131, 20],
 [33, 56, 1, 53, 17, 10, 9, 9, 2, 2, 10],
 [18, 38, 1, 2, 8, 47, 81, 213],
 [22, 13, 21, 5, 8, 54, 12],
 [3, 1, 4, 80, 67],
 [18, 14, 23, 29, 1, 54, 5, 10],
 [162, 189, 163, 37, 12],
 [13, 56, 107, 5, 8, 7],
 [13, 147, 1, 7, 5],
 [12, 168, 5, 8, 7],
 [3, 1, 2, 8, 24, 81, 213, 45, 2, 185, 20],
 [7, 26, 30],
 [207, 33, 134, 9, 2, 34, 65],
 [3, 31, 32, 27, 15, 10, 9],
 [3, 1, 41, 46, 57, 169, 12, 135, 413],
 [1, 41, 46, 62],
 [18, 58, 1, 63, 11, 4, 6, 38, 1, 178],
 [33, 1, 9, 4, 6],
 [15, 1, 8, 11],
 [3, 12, 23, 29, 1],
 [1, 137, 190, 61, 64],
 [13, 300, 1, 8, 7],
 [3

In [25]:
#We use pad_sequences from tensorflow.keras.preprocessing.sequence library
atis_data= pad_sequences(tokenized_atis_data, maxlen= 20, padding= "pre")

In [26]:
atis_data

array([[  0,   0,   0, ...,   5, 411,  19],
       [  0,   0,   0, ...,  10,  47,  19],
       [  0,   0,   0, ...,   1,  17,  16],
       ...,
       [  0,   0,   0, ...,  12,  21,   5],
       [  0,   0,   0, ...,   6,  60,   5],
       [  0,   0,   0, ...,   5,   4,   6]], dtype=int32)
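
What `padding="pre"` does can be shown with a few lines of plain Python. The sketch assumes the Keras default of also truncating from the front when a sequence is longer than `maxlen`; `pad_pre` is a hypothetical helper, not a Keras function, and it expects `maxlen` to be at least 1.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad with `value`; longer sequences keep only their last `maxlen` items."""
    padded = []
    for seq in sequences:
        trimmed = seq[-maxlen:]
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded

padded = pad_pre([[1, 35, 8], [3, 1, 4, 80, 67, 43, 86]], maxlen=5)
print(padded)
```

Pre-padding places the real tokens at the end of each row, so the last LSTM time steps always see actual words rather than padding.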

In [27]:
def transform_matrix(data, tokenizer):
    # Three-dimensional shape: samples, steps, and number of unique words.
    output_shape_mat = (data.shape[0],
                        data.shape[1],
                        len(tokenizer.word_index))
    results_data = np.zeros(output_shape_mat)  # All-zero array with the given dimensions.

    # One-hot encode every step: word index k (1-based) sets column k-1 to 1.
    # Padding (index 0) maps to column -1, i.e. the last column.
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            results_data[i, j, data[i, j] - 1] = 1
    return results_data

trans_matrix_atis = transform_matrix(atis_data, text_tokenizer)  # This is the matrix on which the LSTM model is applied.
trans_matrix_atis


array([[[0., 0., 0., ..., 0., 0., 1.],
        [0., 0., 0., ..., 0., 0., 1.],
        [0., 0., 0., ..., 0., 0., 1.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 1.],
        [0., 0., 0., ..., 0., 0., 1.],
        [0., 0., 0., ..., 0., 0., 1.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 1.],
        [0., 0., 0., ..., 0., 0., 1.],
        [0., 0., 0., ..., 0., 0., 1.],
        ...,
        [1., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       ...,

       [[0., 0., 0., ..., 0., 0., 1.],
        [0., 0., 0., ..., 0., 0., 1.],
        [0., 0., 0., ..., 0., 0., 1.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0.
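
In `transform_matrix`, a padding value of 0 becomes index `0 - 1 = -1`, which NumPy reads as the last column; this is why the padded steps in the output above end in `1.`. A variant that leaves padded steps all-zero and avoids the Python double loop might look like the sketch below. `one_hot_steps` is a hypothetical name, and because it changes the model inputs, results would differ from the run recorded in this notebook.

```python
import numpy as np

def one_hot_steps(padded, vocab_size):
    """Return shape (samples, steps, vocab_size); padding steps (index 0) stay all-zero."""
    padded = np.asarray(padded)
    out = np.zeros(padded.shape + (vocab_size,), dtype=np.float32)
    rows, cols = np.nonzero(padded)  # positions that hold a real word
    out[rows, cols, padded[rows, cols] - 1] = 1.0
    return out

padded = [[0, 0, 2], [0, 1, 3]]
encoded = one_hot_steps(padded, vocab_size=3)
print(encoded.shape)
print(encoded[0])
```

Using `float32` instead of the default `float64` also halves the memory of this dense array, which grows with samples × steps × vocabulary size.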

## Build Model Using Long Short-Term Memory (LSTM)

In [28]:
# split dataset into train and test dataset
trans_matrix_train,trans_matrix_test,train_target_encoded,test_target_encoded = train_test_split(trans_matrix_atis, intent_encoded, test_size=0.2, random_state=42)
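
The intent counts shown earlier are heavily imbalanced (3666 `atis_flight` rows against 20 `atis_airport` rows), so a purely random split can leave a rare class with very few test rows. One possible alternative, not used in this notebook, is a stratified split that samples each class separately. The sketch below assumes a plain Python list of labels; `stratified_split` is a hypothetical helper and the label list is made-up data.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) so each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = ["atis_flight"] * 20 + ["atis_airfare"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

The returned index lists could then be used to slice both the input matrix and the encoded targets.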

## Build LSTM Model

In [29]:
class lstm_model_class:
    def __init__(self):
        self.final_model = None

    def build_lstm_model(self, input_dimensions, op_shape, num_steps, dropout_rate, kernel_reg, bias_reg):
        # Input layer: each sample is num_steps time steps of one-hot vectors of size input_dimensions.
        ip_layer = Input(shape=(num_steps, input_dimensions))

        # LSTM layer with num_steps memory units.
        lstm_model = LSTM(units=num_steps)(ip_layer)
        # Fully connected layer. he_uniform draws weights uniformly from
        # [-limit, limit] with limit = sqrt(6 / fan_in).
        dense_layer_1 = Dense(op_shape, kernel_initializer=he_uniform(),
                              bias_initializer="zeros",
                              kernel_regularizer=l2(kernel_reg),
                              bias_regularizer=l2(bias_reg))(lstm_model)

        # BatchNormalization -> ReLU -> Dropout branch on top of the dense layer.
        int_layer = BatchNormalization()(dense_layer_1)
        int_layer = relu(int_layer)
        int_layer = Dropout(rate=dropout_rate)(int_layer)  # Zeroes a fraction dropout_rate of its inputs during training.
        # Output layer. glorot_uniform draws weights uniformly from
        # [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)).
        # NOTE: this layer takes dense_layer_1 as input, so the int_layer branch above
        # is not part of the final model; pass int_layer instead to include it.
        output_1 = Dense(op_shape, kernel_initializer=glorot_uniform(),
                         bias_initializer="zeros",
                         kernel_regularizer=l2(kernel_reg),
                         bias_regularizer=l2(bias_reg))(dense_layer_1)
        output_1 = BatchNormalization()(output_1)  # Normalize and scale the output activations.
        final_output = softmax(output_1, axis=1)  # Class probabilities over the intents.

        loss_func = cce()  # Multi-class problem, so categorical cross-entropy is the loss function.
        perf_metrics = AUC()  # Performance metric: area under the curve.
        optimizer = Adam()
        self.final_model = Model(inputs=[ip_layer], outputs=[final_output])  # Build the model from input to output.
        self.final_model.compile(optimizer=optimizer, loss=loss_func, metrics=[perf_metrics])

    def train_lstm_model(self, x, y, valid_split, ep):
        # Fit the model, holding out a valid_split fraction of the data for validation.
        self.final_model.fit(x, y, validation_split=valid_split, epochs=ep)

    def predict_lstm_model(self, x):
        # Return the predicted class probabilities for x.
        return self.final_model.predict(x)
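
For reference, both initializers used in `build_lstm_model` draw weights from a uniform distribution on `[-limit, limit]`; only the formula for `limit` differs. The sketch below computes the two limits as documented for Keras. The fan sizes in the example are arbitrary illustrative values, not the layer sizes of this model.

```python
import math

def he_uniform_limit(fan_in):
    """He uniform: limit = sqrt(6 / fan_in)."""
    return math.sqrt(6.0 / fan_in)

def glorot_uniform_limit(fan_in, fan_out):
    """Glorot (Xavier) uniform: limit = sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))

# Hypothetical layer with 24 inputs and 24 outputs.
he_limit = he_uniform_limit(24)
glorot_limit = glorot_uniform_limit(24, 24)
print(he_limit, glorot_limit)
```

He initialization depends only on the number of inputs and is commonly paired with ReLU activations, while Glorot initialization balances the input and output sizes.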


In [30]:
steps = trans_matrix_train.shape[1]  # Number of time steps per sample (the padded sequence length).
input_dim = trans_matrix_train.shape[2]  # Input dimension: the number of unique words in the vocabulary.
output_shape = train_target_encoded.shape[1]  # Output size: one unit per intent class in the target variable.
final_model = lstm_model_class()
final_model.build_lstm_model(input_dimensions=input_dim,
                             op_shape=output_shape,
                             num_steps=steps,
                             dropout_rate=0.5,  # Each input to the Dropout layer is zeroed with probability 0.5 during training.
                             bias_reg=0.3,  # L2 penalty factor on the bias terms.
                             kernel_reg=0.3)  # L2 penalty factor on the kernel weights (excluding biases).
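
The `kernel_reg` and `bias_reg` values are L2 penalty factors: for each regularized tensor, Keras adds `factor * sum(w ** 2)` to the training loss. A minimal sketch of that term, using made-up weights:

```python
def l2_penalty(weights, factor):
    """L2 regularization term: factor times the sum of squared weights."""
    return factor * sum(w * w for w in weights)

# Hypothetical weights, only to show the arithmetic.
penalty = l2_penalty([0.5, -1.0, 2.0], factor=0.3)
print(penalty)
```

Because these penalties are included in the reported loss, a factor as large as 0.3 is one plausible reason why the loss in the training log starts well above typical cross-entropy values and falls as the weights shrink.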

## Train, Predict & Evaluate the Model

In [31]:
final_model.train_lstm_model(trans_matrix_train, train_target_encoded,
           0.2, 60) #Model takes train data, train target variable, validation split(here it is 80:20) and number of epochs.

Epoch 1/60
100/100 ━━━━━━━━━━━━━━━━━━━━ 18s 64ms/step - auc: 0.6289 - loss: 17.1640 - val_auc: 0.9583 - val_loss: 11.6465
Epoch 2/60
100/100 ━━━━━━━━━━━━━━━━━━━━ 9s 53ms/step - auc: 0.8899 - loss: 9.9284 - val_auc: 0.9755 - val_loss: 7.1332
Epoch 3/60
100/100 ━━━━━━━━━━━━━━━━━━━━ 6s 55ms/step - auc: 0.9560 - loss: 5.8599 - val_auc: 0.8947 - val_loss: 4.8922
Epoch 4/60
100/100 ━━━━━━━━━━━━━━━━━━━━ 7s 19ms/step - auc: 0.9718 - loss: 3.6437 - val_auc: 0.9300 - val_loss: 3.3105
Epoch 5/60
100/100 ━━━━━━━━━━━━━━━━━━━━ 2s 18ms/step - auc: 0.9759 - loss: 2.5016 - val_auc: 0.9816 - val_loss: 2.1177
Epoch 6/60
100/100 ━━━━━━━━━━━━━━━━━━━━ 2s 18ms/step - auc: 0.9845 - loss: 1.8215 - val_auc: 0.9862 - val_loss: 1.5523
Epoch 7/60
100/100 ━━━━━━━━━━━━━━━━━━━━ 2s 18

In [33]:
# Predict on the training matrix and check the performance
pred_train = encode_intent.inverse_transform(final_model.predict_lstm_model(trans_matrix_train))
train_target = encode_intent.inverse_transform(train_target_encoded)
print(classification_report(train_target, pred_train))  # Print the classification report

125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step
                     precision    recall  f1-score   support

       atis_airfare       0.00      0.00      0.00        16
  atis_abbreviation       0.90      0.98      0.94       127
      atis_aircraft       0.86      0.97      0.91        66
       atis_airfare       0.97      0.99      0.98       338
       atis_airline       0.95      0.98      0.97       120
       atis_airport       0.00      0.00      0.00        17
      atis_capacity       0.00      0.00      0.00        14
          atis_city       0.00      0.00      0.00        13
      atis_distance       0.41      0.37      0.39        19
        atis_flight       0.99      1.00      1.00      2914
     atis_flight_no       0.00      0.00      0.00        10
   atis_flight_time       0.61      0.89      0.72        47
   atis_ground_fare       0.33      0.17      0.22        18
atis_ground_service       0.93      0.97      0.95       213
         



In [34]:
# Predict on the test data
pred_test = encode_intent.inverse_transform(final_model.predict_lstm_model(trans_matrix_test))
test_target = encode_intent.inverse_transform(test_target_encoded)
print(classification_report(test_target, pred_test))  # Print the classification report

32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step
                     precision    recall  f1-score   support

       atis_airfare       0.00      0.00      0.00         5
  atis_abbreviation       0.72      0.90      0.80        20
      atis_aircraft       0.88      1.00      0.94        15
       atis_airfare       0.92      1.00      0.96        86
       atis_airline       0.95      0.97      0.96        37
       atis_airport       0.00      0.00      0.00         3
      atis_capacity       0.00      0.00      0.00         2
          atis_city       0.00      0.00      0.00         6
      atis_distance       0.00      0.00      0.00         1
        atis_flight       0.99      0.99      0.99       752
     atis_flight_no       0.00      0.00      0.00         5
   atis_flight_time       0.40      0.75      0.52         8
   atis_ground_fare       0.00      0.00      0.00         1
atis_ground_service       1.00      0.93      0.96        42
          a
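
To read the reports above, it helps to see how the per-class numbers are computed. The sketch below is a plain-Python illustration with made-up labels, not scikit-learn's implementation; `per_class_scores` is a hypothetical helper. It also shows why a class the model never predicts gets 0.00 everywhere: with no true positives, precision, recall, and F1 are all reported as zero.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, using 0.0 when a ratio is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up example: the rare "city" row is swallowed by the majority class.
y_true = ["flight", "flight", "flight", "city", "airfare"]
y_pred = ["flight", "flight", "flight", "flight", "airfare"]
flight_scores = per_class_scores(y_true, y_pred, "flight")
city_scores = per_class_scores(y_true, y_pred, "city")
print(flight_scores, city_scores)
```

In the reports, the classes with 0.00 scores are the ones with the smallest support, which is consistent with the model defaulting to the larger classes for those rows.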

