# Introduction
In this Colab notebook, we will build an Intent Analysis model using deep learning. Intent Analysis is a core component of chatbots and virtual assistants: it lets the system work out the purpose behind a user's message. For instance, a user might ask a chatbot to cancel an order, track a shipment, or request a refund, and the chatbot has to recognize which of these the user wants before it can respond appropriately.

We will use the public Bitext sample customer-service dataset, which contains thousands of user messages (utterances) paired with intent labels and comes already split into training, validation, and test files. We will tokenize and pad the text, encode the labels, and build a deep learning model with the TensorFlow Keras API. We will then evaluate the model on the held-out test set and save the trained model, tokenizer, and label encoder for later use in the chatbot.

By the end of this Colab notebook, you should be able to:

- Understand the basics of Intent Analysis
- Prepare text data for model training
- Build and train a deep learning model for Intent Analysis using Keras
- Evaluate the model's performance on a test set
- Save the trained model, tokenizer, and label encoder for future use in the chatbot

In [52]:
import pandas as pd

# Assumes Google Drive is already mounted at /content/drive
data_dir = "/content/drive/MyDrive/Colab Notebooks/Project/Review/intent analysis"
train_path = f"{data_dir}/Bitext_Sample_Customer_Service_Training_Dataset.csv"
valid_path = f"{data_dir}/Bitext_Sample_Customer_Service_Validation_Dataset.csv"
test_path = f"{data_dir}/Bitext_Sample_Customer_Service_Testing_Dataset.csv"

df_train = pd.read_csv(train_path)
df_val = pd.read_csv(valid_path)
df_test = pd.read_csv(test_path)


In [53]:
df_train.head()

|   | utterance | intent | category | tags |
|---|-----------|--------|----------|------|
| 0 | would it be possible to cancel the order I made? | cancel_order | ORDER | BIP |
| 1 | cancelling order | cancel_order | ORDER | BK |
| 2 | I need assistance canceling the last order I h... | cancel_order | ORDER | B |
| 3 | problem with canceling the order I made | cancel_order | ORDER | B |
| 4 | I don't know how to cancel the order I made | cancel_order | ORDER | B |


In [54]:
df_train.columns

Index(['utterance', 'intent', 'category', 'tags'], dtype='object')

## Importing the libraries

In [55]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense





In [56]:
# Split data into X and y
X_train = df_train["utterance"].values
y_train = df_train["intent"].values
X_test = df_test["utterance"].values
y_test = df_test["intent"].values
X_val = df_val["utterance"].values
y_val = df_val["intent"].values

# Tokenize text
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)
X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)
X_val = tokenizer.texts_to_sequences(X_val)

# Pad sequences
max_len = 100
X_train = pad_sequences(X_train, maxlen=max_len)
X_test = pad_sequences(X_test, maxlen=max_len)
X_val = pad_sequences(X_val, maxlen=max_len)

# Encode categorical labels
label_encoder = LabelEncoder()
y_train = label_encoder.fit_transform(y_train)
y_test = label_encoder.transform(y_test)
y_val = label_encoder.transform(y_val)

# Define model
model = Sequential()
model.add(Embedding(input_dim=len(tokenizer.word_index)+1, output_dim=32, input_length=max_len))
model.add(Flatten())
model.add(Dense(16, activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(len(label_encoder.classes_), activation="softmax"))

# Compile model
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Train model
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, batch_size=64)

# Evaluate model on test set
loss, acc = model.evaluate(X_test, y_test)
print("Test loss:", loss)
print("Test accuracy:", acc)

Epoch 1/50
...
Epoch 50/50
Test loss: 0.38723069429397583
Test accuracy: 0.9767726063728333
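To make the preprocessing step concrete, here is a simplified pure-Python imitation of what `Tokenizer` and `pad_sequences` do with their default arguments: lowercasing, turning punctuation into spaces, assigning frequency-ranked word indices that start at 1 (0 is reserved for padding), and `'pre'` padding and truncation. It is a sketch for illustration, not the Keras source; the helper names `split_words`, `fit_word_index`, `texts_to_sequences` and `pad` and the toy sentences are hypothetical. Because the notebook creates `Tokenizer()` without an `oov_token`, words never seen in training are silently dropped, which the example shows with a misspelled word.

```python
from collections import Counter

# Keras' default `filters` string; each of these characters becomes a space
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_TO_SPACE = str.maketrans(FILTERS, " " * len(FILTERS))


def split_words(text):
    """Lowercase, turn filtered punctuation into spaces, split on whitespace."""
    return text.lower().translate(_TO_SPACE).split()


def fit_word_index(texts):
    """Index words by descending frequency, starting at 1 (0 is kept for padding)."""
    counts = Counter()
    for text in texts:
        counts.update(split_words(text))
    # most_common() sorts by count; words with equal counts keep first-seen order
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}


def texts_to_sequences(texts, word_index):
    """Map each known word to its index; unknown words are silently dropped."""
    return [[word_index[w] for w in split_words(t) if w in word_index] for t in texts]


def pad(sequences, maxlen):
    """Keep the last `maxlen` ids and left-pad with 0 (the 'pre' defaults)."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


train = ["cancel my order", "track my order", "I want a refund"]
word_index = fit_word_index(train)
seqs = texts_to_sequences(["how can i cancle my order?"], word_index)

print(word_index)  # {'my': 1, 'order': 2, 'cancel': 3, 'track': 4, 'i': 5, 'want': 6, 'a': 7, 'refund': 8}
print(seqs)        # [[5, 1, 2]]  ("how", "can" and "cancle" are not in the vocabulary)
print(pad(seqs, maxlen=6))  # [[0, 0, 0, 5, 1, 2]]
```

Passing `oov_token="<OOV>"` to `Tokenizer` would instead map unseen words to a reserved index, so a misspelling leaves a trace in the sequence rather than vanishing.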

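The layer sizes chosen in the model cell fix the number of trainable weights, and that count can be worked out by hand for the Embedding, Flatten, Dense(16), Dense(8), softmax stack. The function below is a small sketch of that arithmetic; `vocab_size` and `num_classes` are placeholders (the notebook's real values are `len(tokenizer.word_index)` and `len(label_encoder.classes_)`, which are not printed here), and the sizes in the usage line are hypothetical.

```python
def count_params(vocab_size, num_classes, max_len=100, embed_dim=32, hidden=(16, 8)):
    """Trainable parameters of an Embedding -> Flatten -> Dense stack."""
    # Embedding input_dim is len(word_index) + 1 because index 0 is padding
    counts = {"embedding": (vocab_size + 1) * embed_dim}
    # Flatten has no weights; it simply outputs max_len * embed_dim values
    fan_in = max_len * embed_dim
    for i, units in enumerate(hidden):
        counts[f"dense_{i}"] = fan_in * units + units  # weight matrix + bias vector
        fan_in = units
    counts["output"] = fan_in * num_classes + num_classes  # softmax layer
    return counts


# Hypothetical sizes, for illustration only
counts = count_params(vocab_size=1000, num_classes=10)
print(counts)  # {'embedding': 32032, 'dense_0': 51216, 'dense_1': 136, 'output': 90}
print(sum(counts.values()))  # 83474
```

Because `Flatten` ties the first Dense layer's input width to `max_len * 32 = 3200`, that layer alone holds 51,216 weights whatever the vocabulary size; a pooling layer such as `GlobalAveragePooling1D` would cut it to `32 * 16 + 16 = 528`. The figures can be compared with the output of `model.summary()`.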

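The model is compiled with `sparse_categorical_crossentropy`, which takes the integer class ids produced by `LabelEncoder` directly instead of one-hot vectors. For each example the loss is the negative log of the probability the softmax layer assigns to the true class, and the reported test loss is an average of that quantity over the test set. A minimal pure-Python sketch follows; it omits the small epsilon clipping Keras applies for numerical stability, and the toy scores and labels are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities, as the final Dense layer's activation does."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability of the true class); labels are integer ids."""
    return sum(-math.log(p[y]) for y, p in zip(labels, probs)) / len(labels)


probs = [softmax([2.0, 0.5, 0.1]), [0.1, 0.1, 0.8]]
labels = [0, 2]  # integer class ids, like the output of LabelEncoder
print(probs[0])
print(sparse_categorical_crossentropy(labels, probs))
```

A confident correct prediction contributes almost nothing to the loss, while a confident wrong one contributes a large term, so a handful of mistakes can dominate the average even when accuracy is high.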
In [57]:
# Tokenize and pad the raw test utterances (this recreates the X_test built above, so the cell also runs on its own)
X_test = tokenizer.texts_to_sequences(df_test["utterance"].values)
X_test = pad_sequences(X_test, maxlen=max_len)

# Predict on test data
y_pred = model.predict(X_test)

# Decode categorical labels
y_pred = np.argmax(y_pred, axis=1)
y_pred = label_encoder.inverse_transform(y_pred)

# Print predicted labels
print(y_pred)




['cancel_order' 'cancel_order' 'cancel_order' 'cancel_order'
 'cancel_order' 'cancel_order' 'cancel_order' 'cancel_order'
 'cancel_order' 'cancel_order' 'cancel_order' 'cancel_order'
 'cancel_order' 'cancel_order' 'cancel_order' 'cancel_order'
 'cancel_order' 'cancel_order' 'cancel_order' 'cancel_order'
 'cancel_order' 'cancel_order' 'cancel_order' 'cancel_order'
 'cancel_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_order' 'change_order' 'change_order' 'change_order'
 'change_shipping_addres
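Decoding works because `LabelEncoder` stores its classes in sorted order and encodes each label as its position in that list, so the index picked by `np.argmax` over a row of probabilities maps straight back to an intent name through `inverse_transform`. The sketch below mimics that round trip in pure Python; the helper names `fit_label_encoder` and `argmax` and the toy labels and probabilities are hypothetical. Note that scikit-learn's `LabelEncoder.transform` raises a `ValueError` for labels not seen during `fit`, so the validation and test splits may only contain intents that also appear in training.

```python
def fit_label_encoder(labels):
    """Sorted unique labels plus a label -> id map (LabelEncoder also sorts classes_)."""
    classes = sorted(set(labels))
    return classes, {label: i for i, label in enumerate(classes)}


def argmax(row):
    """Index of the largest value; ties go to the first index, as with np.argmax."""
    return max(range(len(row)), key=row.__getitem__)


labels = ["track_refund", "cancel_order", "change_order", "cancel_order"]
classes, to_id = fit_label_encoder(labels)
encoded = [to_id[label] for label in labels]

# One row of class probabilities per utterance, as returned by model.predict
probs = [[0.1, 0.7, 0.2], [0.05, 0.05, 0.9]]
decoded = [classes[argmax(row)] for row in probs]

print(classes)  # ['cancel_order', 'change_order', 'track_refund']
print(encoded)  # [2, 0, 1, 0]
print(decoded)  # ['change_order', 'track_refund']
```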

In [58]:
len(y_pred)

818

In [59]:
df_test.shape

(818, 4)

In [60]:
results_df = pd.DataFrame({"Actual Intent": df_test['intent'], "Predicted Intent": y_pred})
print(results_df)


    Actual Intent Predicted Intent
0    cancel_order     cancel_order
1    cancel_order     cancel_order
2    cancel_order     cancel_order
3    cancel_order     cancel_order
4    cancel_order     cancel_order
..            ...              ...
813  track_refund     track_refund
814  track_refund     track_refund
815  track_refund     track_refund
816  track_refund     track_refund
817  track_refund     track_refund

[818 rows x 2 columns]


In [61]:
values_match = results_df['Actual Intent'] == results_df['Predicted Intent']

# print the results
print(values_match)

0      True
1      True
2      True
3      True
4      True
       ... 
813    True
814    True
815    True
816    True
817    True
Length: 818, dtype: bool


In [62]:
# Count correct and incorrect predictions with a vectorized comparison
matches = results_df["Actual Intent"] == results_df["Predicted Intent"]
true_count = int(matches.sum())
false_count = len(matches) - true_count

print("Number of correct predictions:", true_count)
print("Number of incorrect predictions:", false_count)


Number of correct predictions: 799
Number of incorrect predictions: 19
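These counts agree with the Keras evaluation: 799 / 818 ≈ 0.9768, the test accuracy reported after training. A single accuracy figure does not say which intents the 19 errors come from, so a per-intent breakdown and a tally of (actual, predicted) confusions can help. The sketch below is pure Python with hypothetical helper names (`per_intent_accuracy`, `confusions`) and toy data.

```python
from collections import Counter


def per_intent_accuracy(actual, predicted):
    """Fraction of utterances of each true intent that were predicted correctly."""
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {intent: correct[intent] / totals[intent] for intent in totals}


def confusions(actual, predicted):
    """Count each (actual, predicted) pair among the mistakes."""
    return Counter((a, p) for a, p in zip(actual, predicted) if a != p)


# Overall accuracy is correct / total
print(round(799 / 818, 4))  # 0.9768

actual = ["cancel_order", "cancel_order", "track_refund", "track_refund"]
predicted = ["cancel_order", "change_order", "track_refund", "track_refund"]
print(per_intent_accuracy(actual, predicted))  # {'cancel_order': 0.5, 'track_refund': 1.0}
print(confusions(actual, predicted))           # Counter({('cancel_order', 'change_order'): 1})
```

In the notebook, the same functions can be called on `results_df["Actual Intent"]` and `results_df["Predicted Intent"]`, since iterating a pandas Series yields its values.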


In [63]:
# Save model to file
model.save('intent_classifier.h5')


In [64]:
import os

print(os.getcwd())  # print current working directory

# list contents of current working directory
for file in os.listdir():
    print(file)


/content
.config
intent_classifier.h5
drive
label_encoder.pkl
tokenizer.pkl
sample_data


In [65]:
import pickle

# Save tokenizer
with open("tokenizer.pkl", "wb") as f:
    pickle.dump(tokenizer, f)

# Save label encoder
with open("label_encoder.pkl", "wb") as f:
    pickle.dump(label_encoder, f)
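The inference code has to pad with the same `max_len` that was used in training, but only the tokenizer and label encoder are saved here. One possible variation, which is a hypothetical alternative and not what this notebook does, is to pickle every preprocessing object, including `max_len`, into a single file. The sketch uses plain dict and list stand-ins and a temporary directory so it runs on its own; `save_bundle`, `load_bundle` and the file name `preprocessing.pkl` are illustrative names.

```python
import os
import pickle
import tempfile


def save_bundle(path, **objects):
    """Pickle several named objects into one file."""
    with open(path, "wb") as f:
        pickle.dump(objects, f)


def load_bundle(path):
    """Load the dict written by save_bundle. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Plain stand-ins for the real tokenizer and label encoder objects
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "preprocessing.pkl")
    save_bundle(path, word_index={"order": 1}, classes=["cancel_order"], max_len=100)
    restored = load_bundle(path)

print(restored)  # {'word_index': {'order': 1}, 'classes': ['cancel_order'], 'max_len': 100}
```

In the notebook this would be `save_bundle("preprocessing.pkl", tokenizer=tokenizer, label_encoder=label_encoder, max_len=max_len)`, and the inference cell could then read `max_len` from the bundle instead of repeating the number.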


In [66]:
# Import necessary libraries
import pickle

import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load saved model
model = load_model("/content/intent_classifier.h5")

# Load tokenizer
with open("/content/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)

# Load label encoder
with open("/content/label_encoder.pkl", "rb") as f:
    label_encoder = pickle.load(f)

# Define function to make predictions
def predict_intent(utterance):
    # Tokenize text and pad sequence
    seq = tokenizer.texts_to_sequences([utterance])
    padded_seq = pad_sequences(seq, maxlen=100)  # must match max_len used during training

    # Make prediction
    pred = model.predict(padded_seq)[0]

    # Get predicted label and confidence score
    label = label_encoder.inverse_transform([np.argmax(pred)])[0]
    confidence = round(np.max(pred), 4)

    # Return prediction results
    return label, confidence

# Test function with sample utterance
utterance = "What's the trending product today?"
utterance1 = "how can i cancle product?"

label, confidence = predict_intent(utterance)
label1, confidence1 = predict_intent(utterance1)

print(f"Predicted intent: {label}, Confidence: {confidence}")
print(f"Predicted intent: {label1}, Confidence: {confidence1}")



Predicted intent: switch_account, Confidence: 0.5774000287055969
Predicted intent: switch_account, Confidence: 0.9961000084877014
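Both sample utterances were mapped to `switch_account`, the out-of-domain question about trending products with a confidence of only about 0.58. Because a softmax classifier always spreads its probability over the known intents, a common guard in chatbots is to treat low-confidence predictions as "not understood" and route them to a fallback reply. The sketch below assumes a hypothetical threshold of 0.7 and a hypothetical `"fallback"` label; in practice the threshold would be tuned on the validation set. It would not catch the misspelled second utterance, which the model scored at 0.9961, so confidence alone is not a reliable out-of-domain detector.

```python
def route_intent(probs, classes, threshold=0.7, fallback="fallback"):
    """Return the top intent, or `fallback` when its probability is below `threshold`."""
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = float(probs[best])
    if confidence < threshold:
        return fallback, confidence
    return classes[best], confidence


classes = ["cancel_order", "switch_account", "track_refund"]
print(route_intent([0.20, 0.58, 0.22], classes))     # ('fallback', 0.58)
print(route_intent([0.001, 0.996, 0.003], classes))  # ('switch_account', 0.996)
```

With the notebook's `predict_intent`, the same check is a one-line comparison on the returned `confidence`.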
