# Hotel Booking Chatbot
## Core NLP: Deep Learning Intent Classifier - LSTM

This component implements the Intent Classification module, designed to categorize user queries into one of 10 predefined hotel booking intents.

- Model: Sequential LSTM Network (Long Short-Term Memory).
- Frameworks: Keras and TensorFlow.
- Output: Includes detailed performance metrics (F1 Score, Precision, Recall) and necessary files exported for platform deployment (e.g., Streamlit).

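An LSTM suits this text classification task because it reads each query as an ordered sequence of words. On the 100-sample dataset used here, the trained model reaches 0.75 accuracy on the 20-sample test split (see STEP 5).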

In [None]:
# install necessary libraries
!pip install scikit-learn
!pip install pandas
!pip install tensorflow

## STEP 1: Setup and Configuration

This cell imports the necessary libraries and defines the configuration parameters.

In [1]:
import pandas as pd
import numpy as np
import joblib 

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, accuracy_score

# TensorFlow/Keras libraries
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout

# --- Configuration Parameters ---
MAX_SEQUENCE_LENGTH = 20  # Max number of words to consider per sentence
EMBEDDING_DIM = 100       # Dimension for the learned word embeddings
EPOCHS = 40               # Number of training epochs
BATCH_SIZE = 32           # Number of samples per gradient update
TEST_SIZE = 0.2           # 20% of data for testing
RANDOM_STATE = 42         # For reproducible results

# --- A. CHATBOT RESPONSE LOOKUP TABLE ---
# This dictionary maps the predicted intent name (string) to a fixed response.
RESPONSE_DICT = {
    "ask_room_price": "Our standard room price is 150 MYR per night, and a deluxe room is 250 MYR. Which room type would you like to inquire about?",
    "ask_availability": "Could you please provide the check-in and check-out dates? I can check real-time room availability for you.",
    "ask_facilities": "We offer free Wi-Fi, complimentary parking, an indoor swimming pool, and a 24-hour gym.",
    "ask_location": "Our hotel is situated in the city center, close to the central station and major shopping areas. You can find the full address on our website.",
    "ask_checkin_time": "Our standard check-in time is 3:00 PM. Please contact the front desk if you require early check-in.",
    "ask_checkout_time": "Please ensure you check out before 12:00 PM (noon). Late check-outs may incur an additional charge.",
    "ask_booking": "You can make a reservation through our official website, by calling our booking hotline, or via major online travel platforms.",
    "ask_cancellation": "Our cancellation policy depends on your booking type. Generally, cancellation is free if done 24 hours in advance.",
    "greeting": "Hello! I am happy to assist you. How may I help you with your booking or answer your questions?",
    "goodbye": "Thank you for reaching out! Have a wonderful day. Feel free to contact me if you have any other questions.",
    "default": "I apologize, but I currently cannot understand your request. Could you please try rephrasing your question?" # Default response for unrecognized intents
}
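
Later cells pick the intent with `np.argmax`, which always returns one of the trained classes, so a plain dictionary lookup will rarely reach the `default` entry on its own. One possible way to route uncertain queries to it is a confidence threshold on the softmax output. The sketch below is an illustration only: the helper name `choose_intent` and the threshold of 0.6 are assumptions, not part of the original notebook.

```python
def choose_intent(probabilities, class_names, threshold=0.6):
    """Return the most likely intent, or "default" if the model is unsure.

    `threshold` is a hypothetical cut-off; it would need tuning on held-out data.
    """
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_index] < threshold:
        return "default"
    return class_names[best_index]


# Toy softmax outputs over three of the intents (values are made up).
classes = ["ask_booking", "ask_cancellation", "greeting"]
confident = choose_intent([0.05, 0.90, 0.05], classes)
unsure = choose_intent([0.40, 0.35, 0.25], classes)
print(confident, unsure)
```

The returned string can be passed straight to `RESPONSE_DICT[...]`, since `"default"` is already a key in the table.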

## STEP 2: Data Loading and Preprocessing

Load the data, encode the intent labels as numerical IDs, and split the data into training and testing sets.

In [2]:
# Load the dataset
df = pd.read_csv('dataset.csv')

# Initialize LabelEncoder
le = LabelEncoder()
df['label'] = le.fit_transform(df['intent'])
num_classes = len(le.classes_)

# Separate features (X) and labels (y)
X = df['text']
y = df['label']

# Split data: 80% train, 20% test (stratified for balanced classes)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
)

print(f"Number of unique intent classes: {num_classes}")
print(f"Training samples: {len(X_train)}, Testing samples: {len(X_test)}")

Number of unique intent classes: 10
Training samples: 80, Testing samples: 20
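
`LabelEncoder` assigns integer IDs to the sorted unique labels, which is why the classification report in STEP 5 lists the intents alphabetically. The pure-Python sketch below imitates that mapping on a few of the intent names. The helper names `fit_labels` and `transform_labels` are hypothetical stand-ins for what `le.fit_transform` does, written only to make the mapping visible.

```python
def fit_labels(labels):
    """Return the sorted unique labels, analogous to LabelEncoder.classes_."""
    return sorted(set(labels))


def transform_labels(labels, classes):
    """Replace each label with its position in the sorted class list."""
    lookup = {name: idx for idx, name in enumerate(classes)}
    return [lookup[name] for name in labels]


intents = ["greeting", "ask_booking", "goodbye", "ask_booking"]
classes = fit_labels(intents)
ids = transform_labels(intents, classes)
decoded = [classes[i] for i in ids]  # analogous to le.inverse_transform

print(classes)
print(ids)
```

Because the split uses `stratify=y`, the 20 test samples are spread evenly across the 10 intents, which matches the support of 2 per class shown in STEP 5.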


## STEP 3: Tokenization and Data Preparation for DL

Convert the text into integer sequences (tokens), pad them to a fixed length, and convert the labels to one-hot vectors.

In [3]:
# 1. Initialize and fit Tokenizer
tokenizer = Tokenizer(num_words=None, oov_token="<unk>") 
tokenizer.fit_on_texts(X_train)

# 2. Convert text to sequences of integers
X_train_sequences = tokenizer.texts_to_sequences(X_train)
X_test_sequences = tokenizer.texts_to_sequences(X_test)
word_index = tokenizer.word_index
VOCAB_SIZE = len(word_index) + 1 # Vocabulary size

# 3. Padding sequences (standardize input size)
X_train_padded = pad_sequences(X_train_sequences, maxlen=MAX_SEQUENCE_LENGTH, padding='post', truncating='post')
X_test_padded = pad_sequences(X_test_sequences, maxlen=MAX_SEQUENCE_LENGTH, padding='post', truncating='post')

# 4. One-Hot Encode the labels
y_train_one_hot = to_categorical(y_train, num_classes=num_classes)
y_test_one_hot = to_categorical(y_test, num_classes=num_classes)

print(f"Vocabulary Size: {VOCAB_SIZE}")
print(f"Padded Training Data Shape: {X_train_padded.shape}")

Vocabulary Size: 124
Padded Training Data Shape: (80, 20)
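
To make the tokenize-then-pad step concrete, here is a simplified pure-Python imitation of it. It is a sketch, not the Keras implementation: the helpers `tokenize`, `build_word_index`, `encode`, and `pad_post` are hypothetical names, and the word splitting is cruder than what `Tokenizer` does. It follows the same conventions as the cell above: index 0 is reserved for padding, index 1 for the `<unk>` token, and sequences are padded and truncated at the end (`'post'`).

```python
import re
from collections import Counter

OOV_TOKEN = "<unk>"


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_index(texts):
    """Index 1 is the OOV token; remaining words follow, most frequent first."""
    counts = Counter(word for text in texts for word in tokenize(text))
    index = {OOV_TOKEN: 1}
    for position, (word, _) in enumerate(counts.most_common(), start=2):
        index[word] = position
    return index


def encode(text, word_index):
    """Map each word to its index, falling back to the OOV index."""
    return [word_index.get(word, word_index[OOV_TOKEN]) for word in tokenize(text)]


def pad_post(sequence, maxlen):
    """Keep the first `maxlen` items and right-pad with zeros."""
    return (sequence + [0] * maxlen)[:maxlen]


train_texts = ["Book a room", "Cancel my room"]  # toy data, not from dataset.csv
word_index = build_word_index(train_texts)
vocab_size = len(word_index) + 1  # +1 for the padding index 0

sequence = encode("Cancel a suite", word_index)  # "suite" was never seen
padded = pad_post(sequence, maxlen=5)
print(word_index)
print(sequence, padded, vocab_size)
```

Fitting the index on the training texts only, as the notebook does, means any word that appears solely in the test set is mapped to `<unk>`.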


## STEP 4: Build, Compile and Train the LSTM Model

Define the Embedding + LSTM architecture and start training.
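
The trainable-parameter counts for this architecture can be worked out by hand from the configuration, which gives a cross-check against `model.summary()` once the model is built. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias) and plugs in the values used in this notebook: a vocabulary size of 124, `EMBEDDING_DIM = 100`, 128 LSTM units, and 10 classes.

```python
vocab_size = 124      # VOCAB_SIZE reported in STEP 3
embedding_dim = 100   # EMBEDDING_DIM
lstm_units = 128      # units in the LSTM layer
num_classes = 10      # number of intents

# Embedding: one vector of size embedding_dim per vocabulary index.
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)

# Dense output: one weight per (LSTM unit, class) pair plus one bias per class.
dense_params = lstm_units * num_classes + num_classes

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

Dropout adds no parameters. The total is large relative to the 80 training samples, which is one reason a regularizer such as the `Dropout(0.5)` layer is worth having here.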

In [4]:
# Define the LSTM Model
model = Sequential([
    # Embedding Layer: Maps word indices to dense vectors
    Embedding(input_dim=VOCAB_SIZE, 
              output_dim=EMBEDDING_DIM, 
              input_length=MAX_SEQUENCE_LENGTH,
              name='embedding_layer'),
    
    # LSTM Layer: Captures sequential and long-term dependencies
    LSTM(units=128, 
         return_sequences=False, # Output only the final state for classification
         name='lstm_layer'),
    
    # Dropout: Regularization
    Dropout(0.5),
    
    # Dense Output Layer: Softmax for multi-class prediction
    Dense(units=num_classes, activation='softmax', name='output_layer')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print("--- Model Structure and Training Started ---")
model.summary()

# Train the model
history = model.fit(X_train_padded, y_train_one_hot,
                    epochs=EPOCHS,
                    batch_size=BATCH_SIZE,
                    validation_split=0.1,
                    verbose=1)

--- Model Structure and Training Started ---




Epoch 1/40
3/3 ━━━━━━━━━━━━━━━━━━━━ 3s 208ms/step - accuracy: 0.0972 - loss: 2.3047 - val_accuracy: 0.0000e+00 - val_loss: 2.3179
Epoch 2/40
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.1250 - loss: 2.2961 - val_accuracy: 0.0000e+00 - val_loss: 2.3283
Epoch 3/40
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.1667 - loss: 2.3008 - val_accuracy: 0.0000e+00 - val_loss: 2.3306
Epoch 4/40
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.0833 - loss: 2.3123 - val_accuracy: 0.1250 - val_loss: 2.3281
Epoch 5/40
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.0833 - loss: 2.3032 - val_accuracy: 0.1250 - val_loss: 2.3274
Epoch 6/40
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.0972 - loss: 2.2913 - val_accuracy: 0.1250 - val_loss: 2.3291
Epoch 7/40
3/3 ━━━━━

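Several numbers in the log above can be checked by hand. With `validation_split=0.1`, Keras holds out the last 10% of the 80 training samples, leaving 72 for fitting, so a batch size of 32 gives 3 steps per epoch (the `3/3` in the log). A classifier that assigns equal probability to all 10 intents has a cross-entropy loss of ln(10), which is close to the first-epoch loss of about 2.30. A minimal sketch of the arithmetic:

```python
import math

train_samples = 80        # training samples reported in STEP 2
validation_split = 0.1
batch_size = 32           # BATCH_SIZE
num_classes = 10

val_samples = round(train_samples * validation_split)  # held out for validation
fit_samples = train_samples - val_samples              # actually used for fitting
steps_per_epoch = math.ceil(fit_samples / batch_size)  # batches per epoch

chance_loss = math.log(num_classes)  # loss of a uniform guess over the classes
one_val_hit = 1 / val_samples        # val_accuracy when one validation sample is right

print(val_samples, fit_samples, steps_per_epoch)
print(round(chance_loss, 4), one_val_hit)
```

The `val_accuracy` of 0.1250 in epochs 4 to 6 therefore corresponds to a single correct prediction out of 8 validation samples.
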
## STEP 5: Evaluate Intent Recognition Performance (Accuracy, F1, Precision, Recall)

Calculate and display the detailed metrics required for the assignment.

In [5]:
# Evaluate the model on the test set
loss, accuracy = model.evaluate(X_test_padded, y_test_one_hot, verbose=0)

# Make predictions on the test set
y_pred_probs = model.predict(X_test_padded)
y_pred = np.argmax(y_pred_probs, axis=1) # Convert probabilities to class IDs
y_true = y_test.values                   # Get true class IDs

# Get intent names for the report
intent_names = le.classes_


print(f"\n--- Intent Recognition Performance Evaluation ---")
print(f"Test Accuracy (Accuracy of Intent Recognition): {accuracy:.4f}")
print(f"Test Loss: {loss:.4f}")

# Print F1 Score, Precision, and Recall report
print("\n--- Detailed Classification Report (F1, Precision, Recall) ---")
print(classification_report(y_true, y_pred, target_names=intent_names, zero_division=0))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 219ms/step

--- Intent Recognition Performance Evaluation ---
Test Accuracy (Accuracy of Intent Recognition): 0.7500
Test Loss: 1.3146

--- Detailed Classification Report (F1, Precision, Recall) ---
                   precision    recall  f1-score   support

 ask_availability       0.00      0.00      0.00         2
      ask_booking       1.00      1.00      1.00         2
 ask_cancellation       0.67      1.00      0.80         2
 ask_checkin_time       1.00      1.00      1.00         2
ask_checkout_time       1.00      1.00      1.00         2
   ask_facilities       0.00      0.00      0.00         2
     ask_location       1.00      1.00      1.00         2
   ask_room_price       1.00      1.00      1.00         2
          goodbye       0.67      1.00      0.80         2
         greeting       1.00      0.50      0.67         2

         accuracy                           0.75        20
        macro avg       0.73 
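
Each F1 score in the report is the harmonic mean of that row's precision and recall. As a worked example, the sketch below rebuilds two rows from confusion counts that are consistent with the report, where each intent has a support of 2. The counts are inferred from the rounded figures rather than taken from the notebook, and the helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# ask_cancellation: both true samples found, plus one wrong prediction.
cancellation = precision_recall_f1(tp=2, fp=1, fn=0)
# greeting: one of the two true samples found, no wrong predictions.
greeting = precision_recall_f1(tp=1, fp=0, fn=1)
# A class that is never predicted correctly scores 0.0 everywhere.
missed = precision_recall_f1(tp=0, fp=0, fn=2)

print([round(v, 2) for v in cancellation])  # [0.67, 1.0, 0.8]
print([round(v, 2) for v in greeting])      # [1.0, 0.5, 0.67]
print(missed)                               # (0.0, 0.0, 0.0)
```

Returning 0.0 for an undefined ratio mirrors the `zero_division=0` argument passed to `classification_report` above.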

## STEP 6: Model Export for Streamlit Deployment

Export the Keras model and preprocessing components (Tokenizer, LabelEncoder) to the current working directory using .h5 and .joblib.

In [8]:
# 1. Export Keras Model (.h5 format)
model_path = 'lstm_intent_model.h5'
model.save(model_path)
print(f"\n[EXPORT SUCCESS] Keras Model saved to: {model_path}")

# 2. Export Tokenizer (using joblib)
tokenizer_path = 'tokenizer.joblibLSTM'
joblib.dump(tokenizer, tokenizer_path)
print(f"[EXPORT SUCCESS] Tokenizer saved to: {tokenizer_path}")

# 3. Export LabelEncoder (using joblib)
label_encoder_path = 'label_encoder.joblib'
joblib.dump(le, label_encoder_path)
print(f"[EXPORT SUCCESS] LabelEncoder saved to: {label_encoder_path}")

print("\n--- Deployment Files Ready in Current Directory ---")




[EXPORT SUCCESS] Keras Model saved to: lstm_intent_model.h5
[EXPORT SUCCESS] Tokenizer saved to: tokenizer.joblibLSTM
[EXPORT SUCCESS] LabelEncoder saved to: label_encoder.joblib

--- Deployment Files Ready in Current Directory ---


## STEP 7: Prediction Example

Use the trained model, tokenizer, and label encoder (the same objects exported in STEP 6) to demonstrate how the model predicts the intent for new, unseen user queries.

In [6]:
print("\n--- 7. Prediction Example - Includes Chatbot Response ---")

# Example user queries
new_text = [
    "I need to cancel my room now", # Expected: ask_cancellation
    "What is the cheapest room?",    # Expected: ask_room_price
    "Hello",                         # Expected: greeting
    "Where is the hotel located?"    # Expected: ask_location
]

# Step 1: Preprocess the new text using the trained Tokenizer and Padding
new_sequences = tokenizer.texts_to_sequences(new_text)
new_padded = pad_sequences(new_sequences, maxlen=MAX_SEQUENCE_LENGTH, padding='post', truncating='post')

# Step 2: Make predictions
predictions = model.predict(new_padded)

# Step 3: Find the intent ID with the highest probability
predicted_intent_id = np.argmax(predictions, axis=1)

# Step 4: Convert the predicted ID back to the intent name using the LabelEncoder
predicted_intent_name = le.inverse_transform(predicted_intent_id)

for query, intent in zip(new_text, predicted_intent_name):
    # Retrieve the response based on the predicted intent
    # Use .get() with 'default' key as a fallback for robustness
    response = RESPONSE_DICT.get(intent, RESPONSE_DICT['default'])
    
    print(f"\n--- USER QUERY: {query}")
    print(f"--- PREDICTED INTENT: {intent}")
    print(f"--- CHATBOT RESPONSE: {response}")


--- 7. Prediction Example - Includes Chatbot Response ---
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 207ms/step

--- USER QUERY: I need to cancel my room now
--- PREDICTED INTENT: ask_booking
--- CHATBOT RESPONSE: You can make a reservation through our official website, by calling our booking hotline, or via major online travel platforms.

--- USER QUERY: What is the cheapest room?
--- PREDICTED INTENT: ask_room_price
--- CHATBOT RESPONSE: Our standard room price is 150 MYR per night, and a deluxe room is 250 MYR. Which room type would you like to inquire about?

--- USER QUERY: Hello
--- PREDICTED INTENT: greeting
--- CHATBOT RESPONSE: Hello! I am happy to assist you. How may I help you with your booking or answer your questions?

--- USER QUERY: Where is the hotel located?
--- PREDICTED INTENT: ask_location
--- CHATBOT RESPONSE: Our hotel is situated in the city center, close to the central station and major shopping areas. You can find the full address on our web