In [29]:
import pandas as pd
import numpy as np
import pickle
import os
import matplotlib.pyplot as plt
import warnings

import tensorflow as tf
from tensorflow import keras
from HODL import PositionalEmbedding, TransformerEncoder

tf.random.set_seed(42)
warnings.simplefilter(action="ignore", category=FutureWarning)

Data Preprocessing

Let's begin by extracting the data from the ATIS dataset and turning it into a form that we can use in our Deep Learning models.

The ATIS dataset is a standard benchmark dataset widely used to build models for intent classification and slot filling tasks (we will explain all this shortly). You can find a very detailed explanation here.

We will begin by loading the data, which is already partitioned into a training set and a test set.

In [30]:
# Read in the training and testing data
# YOUR CODE HERE
df_train = pd.read_csv('/content/atis_train_data.csv')
df_test = pd.read_csv('/content/atis_test_data.csv')


Let's visualize this in a DataFrame. Below we display one example query for each intent class.

The first column of the DataFrame below contains the actual query that was asked. The second column indicates the intent (flight, flight time, etc.), and the last column contains the slot-filling labels, one per word of the query.

In [31]:
# Build a small dataframe "df_small" holding the first training example of each intent, so you can visualize the training inputs
# YOUR CODE HERE
pd.set_option('display.max_colwidth', None)
df_small = pd.DataFrame(columns=['query','intent','slot filling'])
for j, intent in enumerate(df_train.intent.unique()):
    # Take the first row whose intent matches
    df_small.loc[j] = df_train[df_train.intent == intent].iloc[0]
df_small

Unnamed: 0,query,intent,slot filling
0,i want to fly from boston at 838 am and arrive in denver at 1110 in the morning,flight,O O O O O B-fromloc.city_name O B-depart_time.time I-depart_time.time O O O B-toloc.city_name O B-arrive_time.time O O B-arrive_time.period_of_day
1,what is the arrival time in san francisco for the 755 am flight leaving washington,flight_time,O O O B-flight_time I-flight_time O B-fromloc.city_name I-fromloc.city_name O O B-depart_time.time I-depart_time.time O O B-fromloc.city_name
2,cheapest airfare from tacoma to orlando,airfare,B-cost_relative O O B-fromloc.city_name O B-toloc.city_name
3,what kind of aircraft is used on a flight from cleveland to dallas,aircraft,O O O O O O O O O O B-fromloc.city_name O B-toloc.city_name
4,what kind of ground transportation is available in denver,ground_service,O O O O O O O O B-city_name
5,what 's the airport at orlando,airport,O O O O O B-city_name
6,which airline serves denver pittsburgh and atlanta,airline,O O O B-fromloc.city_name B-fromloc.city_name O B-fromloc.city_name
7,how far is it from orlando airport to orlando,distance,O O O O O B-fromloc.airport_name I-fromloc.airport_name O B-toloc.city_name
8,what is fare code h,abbreviation,O O O O B-fare_basis_code
9,how much does the limousine service cost within pittsburgh,ground_fare,O O O O B-transport_type O O O B-city_name


Let's see how many different types of "intent" are present in the data.

In [32]:
# Get the counts for all intents in the training set and assign them to "intent_counts" to be printed

# YOUR CODE HERE
intent_counts = df_train['intent'].value_counts()
print(intent_counts)

intent
flight                        3666
airfare                        423
ground_service                 255
airline                        157
abbreviation                   147
aircraft                        81
flight_time                     54
quantity                        51
flight+airfare                  21
airport                         20
distance                        20
city                            19
ground_fare                     18
capacity                        16
flight_no                       12
meal                             6
restriction                      6
airline+flight_no                2
ground_service+ground_fare       1
airfare+flight_time              1
cheapest                         1
aircraft+flight+flight_no        1
Name: count, dtype: int64
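
The counts above are heavily skewed toward a single intent. As a rough illustration, the sketch below copies the printed counts into a plain dictionary (so it runs without the CSV files) and computes the share of the majority class and the number of intents with fewer than 10 examples. The threshold of 10 is an arbitrary choice made for this example.

```python
# Counts copied from the value_counts() output above.
intent_counts = {
    "flight": 3666, "airfare": 423, "ground_service": 255, "airline": 157,
    "abbreviation": 147, "aircraft": 81, "flight_time": 54, "quantity": 51,
    "flight+airfare": 21, "airport": 20, "distance": 20, "city": 19,
    "ground_fare": 18, "capacity": 16, "flight_no": 12, "meal": 6,
    "restriction": 6, "airline+flight_no": 2,
    "ground_service+ground_fare": 1, "airfare+flight_time": 1,
    "cheapest": 1, "aircraft+flight+flight_no": 1,
}

total = sum(intent_counts.values())
flight_share = intent_counts["flight"] / total

# Intents with very few examples (threshold chosen for illustration only).
rare = [name for name, n in intent_counts.items() if n < 10]

print(f"{len(intent_counts)} intents, {total} training queries")
print(f"share of 'flight': {flight_share:.1%}")
print(f"intents with fewer than 10 examples: {len(rare)}")
```

A model that always predicted "flight" would already be right on roughly three quarters of the training queries, which is worth keeping in mind when reading any intent accuracy number.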


In [13]:
# Extract query_data_train, intent_data_train, slot_data_train,
#    query_data_test, intent_data_test, slot_data_test from the train and test dataframes

# YOUR CODE HERE
query_data_train = df_train['query'].values
intent_data_train = df_train['intent'].values
slot_data_train = df_train['slot filling'].values

query_data_test = df_test['query'].values
intent_data_test = df_test['intent'].values
slot_data_test = df_test['slot filling'].values

We briefly mentioned the difference between slot filling and intent in the introduction, but it is worth going into more detail.

As an example, let’s consider the user query “i want to fly from boston at 838 am and arrive in denver at 1110 in the morning”. The model should classify this user query as “flight” intent. It should also parse the query and identify and fill all the slots necessary for understanding it. Although the words “i”, “want”, “to”, “fly”, “from”, “at”, “and”, “arrive”, “in”, “the” contribute to understanding the context of the intent, the model should correctly label the entities needed to fulfill the user’s goal of taking a flight. These are “boston” as departure city (B-fromloc.city_name), “838 am” as departure time (B-depart_time.time I-depart_time.time), “denver” as destination city (B-toloc.city_name), “1110” as arrival time (B-arrive_time.time) and “morning” as arrival period of day (B-arrive_time.period_of_day). The 123 slot categories are shown below.
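
To make the labelling scheme concrete: each word gets exactly one tag, where O means "not part of a slot", a B- prefix marks the first word of a slot and an I- prefix marks a continuation of the same slot. The helper below, `extract_slots`, is a hypothetical name introduced for this illustration (it is not part of the notebook or of HODL). It is a minimal sketch that groups the tags of the example query back into slot values, and it simply ignores an I- tag that does not continue the current slot.

```python
def extract_slots(query, tags):
    """Group B-/I-/O tagged words into (slot_name, text) pairs."""
    slots = []
    current_name, current_words = None, []
    for word, tag in zip(query.split(), tags.split()):
        if tag.startswith("B-"):
            # A new slot starts: close the previous one, if any.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # Continuation of the slot that is currently open.
            current_words.append(word)
        else:
            # "O" (or a stray I- tag): close any open slot.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
    if current_name is not None:
        slots.append((current_name, " ".join(current_words)))
    return slots


query = "i want to fly from boston at 838 am and arrive in denver at 1110 in the morning"
tags = ("O O O O O B-fromloc.city_name O B-depart_time.time I-depart_time.time "
        "O O O B-toloc.city_name O B-arrive_time.time O O B-arrive_time.period_of_day")

slots = extract_slots(query, tags)
for name, text in slots:
    print(f"{name:28s} {text}")
```

Note how "838 am" comes back as a single departure-time value because its second word carries the I- tag.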

In [33]:
# Gather all the unique slots and put them in the set called "unique_slots"

# YOUR CODE HERE
unique_slots = set()
for s in slot_data_train:
  unique_slots = unique_slots.union(set(s.split()))
unique_slots


{'B-aircraft_code',
 'B-airline_code',
 'B-airline_name',
 'B-airport_code',
 'B-airport_name',
 'B-arrive_date.date_relative',
 'B-arrive_date.day_name',
 'B-arrive_date.day_number',
 'B-arrive_date.month_name',
 'B-arrive_date.today_relative',
 'B-arrive_time.end_time',
 'B-arrive_time.period_mod',
 'B-arrive_time.period_of_day',
 'B-arrive_time.start_time',
 'B-arrive_time.time',
 'B-arrive_time.time_relative',
 'B-city_name',
 'B-class_type',
 'B-connect',
 'B-cost_relative',
 'B-day_name',
 'B-day_number',
 'B-days_code',
 'B-depart_date.date_relative',
 'B-depart_date.day_name',
 'B-depart_date.day_number',
 'B-depart_date.month_name',
 'B-depart_date.today_relative',
 'B-depart_date.year',
 'B-depart_time.end_time',
 'B-depart_time.period_mod',
 'B-depart_time.period_of_day',
 'B-depart_time.start_time',
 'B-depart_time.time',
 'B-depart_time.time_relative',
 'B-economy',
 'B-fare_amount',
 'B-fare_basis_code',
 'B-flight_days',
 'B-flight_mod',
 'B-flight_number',
 'B-flight_st

In [34]:
len(unique_slots)

123

123 slot categories!!

Transformers
Explain the attention mechanism
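
In short, attention lets every token in the query build its representation as a weighted average of all tokens in the query, where the weights depend on how well the token's "query" vector matches the other tokens' "key" vectors. The standard formulation is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The NumPy sketch below shows a single attention head on random data. The sizes and the random projection matrices are made up for illustration, and this is not the code inside HODL.py. Multi-head attention runs several such heads in parallel and concatenates their outputs.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ v, weights


rng = np.random.default_rng(42)
seq_len, d_model = 4, 8                  # toy sizes, chosen for illustration
x = rng.normal(size=(seq_len, d_model))  # stand-in for 4 embedded tokens

# Random projections standing in for the learned W_q, W_k, W_v matrices.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = scaled_dot_product_attention(x @ w_q, x @ w_k, x @ w_v)
print(output.shape)          # one output vector per token
print(weights.sum(axis=-1))  # every row of attention weights sums to 1
```
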
Encoder Model
Because the code for the transformer encoder architecture is a bit complicated to write, we have packaged it for you. This means that you can import it directly from our own "library" (in the same way you import Keras layers).

Import TransformerEncoder and PositionalEmbedding from the 'HODL' file in the current directory.

Hint: Take a look at the HODL.py file on the left-sidebar menu.
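
Attention on its own does not know the order of the words, so a positional embedding is added to the token embedding before the encoder. The NumPy sketch below shows the usual idea of learned positional embeddings: one lookup table for tokens, one for positions, summed element-wise. This is an assumption about what a layer like PositionalEmbedding does in general; check HODL.py for the actual implementation. All sizes and values here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sequence_length, vocab_size, embed_dim = 6, 10, 4   # toy sizes

# Random tables standing in for the learned embedding weights.
token_table = rng.normal(size=(vocab_size, embed_dim))
position_table = rng.normal(size=(sequence_length, embed_dim))

token_ids = np.array([3, 7, 7, 0, 0, 0])  # a padded toy query; 0 is padding
positions = np.arange(sequence_length)    # 0, 1, ..., sequence_length - 1

# Each position gets its token vector plus the vector for that position.
embedded = token_table[token_ids] + position_table[positions]

print(embedded.shape)
# The same token (id 7) at positions 1 and 2 ends up with different vectors.
print(np.allclose(embedded[1], embedded[2]))
```
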

In [36]:
# Import the transformer encoder and the positional embedding

# YOUR CODE HERE
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, Embedding, Dense, Dropout, MultiHeadAttention, LayerNormalization
from tensorflow.keras import Model, Sequential
from HODL import PositionalEmbedding, TransformerEncoder

In [37]:
query_data_train[:5]

array([' i want to fly from boston at 838 am and arrive in denver at 1110 in the morning ',
       ' what flights are available from pittsburgh to baltimore on thursday morning ',
       ' what is the arrival time in san francisco for the 755 am flight leaving washington ',
       ' cheapest airfare from tacoma to orlando ',
       ' round trip fares from pittsburgh to philadelphia under 1000 dollars '],
      dtype=object)

In [38]:
slot_data_train[:5]

array([' O O O O O B-fromloc.city_name O B-depart_time.time I-depart_time.time O O O B-toloc.city_name O B-arrive_time.time O O B-arrive_time.period_of_day ',
       ' O O O O O B-fromloc.city_name O B-toloc.city_name O B-depart_date.day_name B-depart_time.period_of_day ',
       ' O O O B-flight_time I-flight_time O B-fromloc.city_name I-fromloc.city_name O O B-depart_time.time I-depart_time.time O O B-fromloc.city_name ',
       ' B-cost_relative O O B-fromloc.city_name O B-toloc.city_name ',
       ' B-round_trip I-round_trip O O B-fromloc.city_name O B-toloc.city_name B-cost_relative B-fare_amount I-fare_amount '],
      dtype=object)

In [41]:
# Set the max_query_length to 30 to start

# YOUR CODE HERE
max_query_length = 30

# Textvec of slots. Assign this to "text_vectorization_slots"
# YOUR CODE HERE
text_vectorization_slots = keras.layers.TextVectorization(
    output_sequence_length=max_query_length,
    standardize=None
)

# Adapt the slot training data
# YOUR CODE HERE
text_vectorization_slots.adapt(slot_data_train)

# Assign the number of slots to slot_vocab_size
# YOUR CODE HERE
slot_vocab_size = text_vectorization_slots.vocabulary_size()

# Get the "target_train" and "target_test" data
# YOUR CODE HERE
target_train = text_vectorization_slots(slot_data_train)
target_test = text_vectorization_slots(slot_data_test)

# Get the text_vectorization_query
# YOUR CODE HERE
text_vectorization_query = keras.layers.TextVectorization(
    output_sequence_length=max_query_length
)


# Adapt the query train data
# YOUR CODE HERE
text_vectorization_query.adapt(query_data_train)

# Assign the number of unique query words
# YOUR CODE HERE
query_vocab_size = text_vectorization_query.vocabulary_size()

# Create source_train and source_test
# YOUR CODE HERE
source_train = text_vectorization_query(query_data_train)
source_test = text_vectorization_query(query_data_test)
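
To see what the vectorization step produces, here is a pure-Python sketch of the same idea: build a vocabulary from the training strings, map each token to an integer, then pad or truncate to a fixed length. It follows the Keras convention of reserving index 0 for padding and index 1 for unknown tokens, but `build_vocabulary` and `vectorize` are hypothetical helpers written for this illustration, and the order of equally frequent tokens may differ from what TextVectorization produces.

```python
from collections import Counter


def build_vocabulary(texts):
    """Most frequent tokens first, after the padding and unknown entries."""
    counts = Counter(token for text in texts for token in text.split())
    return ["", "[UNK]"] + [token for token, _ in counts.most_common()]


def vectorize(text, vocabulary, sequence_length):
    """Map tokens to ids, truncate to sequence_length, pad with 0."""
    index = {token: i for i, token in enumerate(vocabulary)}
    ids = [index.get(token, 1) for token in text.split()][:sequence_length]
    return ids + [0] * (sequence_length - len(ids))


toy_slots = [
    "B-cost_relative O O B-fromloc.city_name O B-toloc.city_name",
    "O O O O O B-city_name",
]
vocab = build_vocabulary(toy_slots)
print(vocab[:3])
print(vectorize("O O B-city_name B-unseen_slot", vocab, 6))
```

The trailing zeros are the padding positions. This is why the accuracy function later in the notebook filters out label 0 before comparing predictions with targets.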


In [42]:

# Create the 4 model params
# YOUR CODE HERE
embedding_dim = 512
encoder_units = 64
units = 128
num_heads = 5


# Embedding and Masking
# Create the inputs, embedding, and x
# YOUR CODE HERE
inputs = keras.Input(shape=(max_query_length,))
embedding = PositionalEmbedding(max_query_length, query_vocab_size, embedding_dim)
x = embedding(inputs)


# Transformer Encoding
# Create encoder_out
# YOUR CODE HERE
encoder_out = TransformerEncoder(embedding_dim, encoder_units, num_heads)(x)

# Classifier
# add layers to x and define the outputs
# YOUR CODE HERE
x = keras.layers.Dense(units, activation='relu')(encoder_out)
x = keras.layers.Dropout(0.5)(x)
outputs = keras.layers.Dense(slot_vocab_size, activation="softmax")(x)


### finally apply the inputs and outputs to your model
model = keras.Model(inputs, outputs)
model.summary()

ValueError: A KerasTensor cannot be used as input to a TensorFlow function. A KerasTensor is a symbolic placeholder for a shape and dtype, used when constructing Keras Functional models or Keras Functions. You can only use it as input to a Keras layer or a Keras operation (from the namespaces `keras.layers` and `keras.operations`). You are likely doing something like:

```
x = Input(...)
...
tf_fn(x)  # Invalid.
```

What you should do instead is wrap `tf_fn` in a layer:

```
class MyLayer(Layer):
    def call(self, x):
        return tf_fn(x)

x = MyLayer()(x)
```


In [43]:
# Compile your model
# YOUR CODE HERE
model.compile(
    optimizer='adam',  # You can also use other optimizers like 'sgd', 'rmsprop', etc.
    loss='sparse_categorical_crossentropy',  # Or 'categorical_crossentropy' depending on your labels
    metrics=['accuracy']  # You can track other metrics like 'precision', 'recall', etc.
)



NameError: name 'model' is not defined

In [44]:
# Set the batch size to 64 and epochs to 10 to start
# YOUR CODE HERE
batch_size = 64
epochs = 10

# Fit the model
# YOUR CODE HERE
model.fit(
    x=source_train,               # Vectorized query data (input)
    y=target_train,               # Vectorized slot data (target)
    batch_size=batch_size,        # Batch size set to 64
    epochs=epochs,                # Train for 10 epochs
    validation_data=(source_test, target_test)  # Validation data
)

In [45]:
# define slot_filling_accuracy function
# YOUR CODE HERE
def slot_filling_accuracy(actual, predicted, only_slots=False):
    not_padding = np.not_equal(actual, 0)  # Filter out padding tokens

    if only_slots:
        non_slot_token = text_vectorization_slots(['O']).numpy()[0, 0]
        slots = np.not_equal(actual, non_slot_token)
        correct_predictions = np.equal(actual, predicted)[not_padding & slots]
    else:
        correct_predictions = np.equal(actual, predicted)[not_padding]

    # Fraction of the selected tokens that were predicted correctly
    return np.mean(correct_predictions)


# Get the predicted data
# YOUR CODE HERE
predicted = np.argmax(model.predict(source_test), axis=-1).reshape(-1)

# Get the actual data
# YOUR CODE HERE
actual = target_test.numpy().reshape(-1)

# Now get the accuracy "acc" and slot accuracy "acc_slots"
# YOUR CODE HERE
acc = slot_filling_accuracy(actual, predicted, only_slots=False)
acc_slots = slot_filling_accuracy(actual, predicted, only_slots=True)


print(f'Accuracy = {acc:.3f}')
print(f'Accuracy on slots = {acc_slots:.3f}')

NameError: name 'model' is not defined

Now we get 92% accuracy on the slots and 97% accuracy in general. This is so much better!!
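
The two metrics can differ a lot because most words in a query are tagged O, and those are comparatively easy to get right. The toy example below uses made-up label ids (0 for padding, 2 for O, anything else for a real slot) to show how the two masks change the result. It mirrors the logic of slot_filling_accuracy but runs without the model or the vectorizer.

```python
import numpy as np

PAD, O_TAG = 0, 2  # hypothetical ids, chosen only for this example

# One padded toy sequence: five real tokens followed by three padding slots.
actual    = np.array([2, 2, 5, 2, 6, 0, 0, 0])
predicted = np.array([2, 2, 5, 2, 7, 2, 2, 2])

not_padding = actual != PAD   # ignore positions that are only padding
is_slot = actual != O_TAG     # positions whose true label is not "O"

correct = actual == predicted
acc = np.mean(correct[not_padding])
acc_slots = np.mean(correct[not_padding & is_slot])

print(f"Accuracy = {acc:.3f}")
print(f"Accuracy on slots = {acc_slots:.3f}")
```

Here four of the five real tokens are correct, but only one of the two true slots is, so the slot-only number is the stricter of the two.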

Let's see some examples:

In [None]:
# Define the predict_slots_query() function which takes in a query
# YOUR CODE HERE
def predict_slots_query(query):
    # Vectorize the input query
    vectorized_query = text_vectorization_query([query])  # Assuming text_vectorization_query is defined

    # Predict using the model
    predicted_slots = model.predict(vectorized_query)

    # Convert predictions to slot labels
    predicted_slots = np.argmax(predicted_slots, axis=-1)[0]  # Get the slot predictions

    # Decode the slots back to their string labels
    slot_labels = text_vectorization_slots.get_vocabulary()
    decoded_slots = [slot_labels[i] for i in predicted_slots]

    # Pair each word in the query with its corresponding slot
    words = query.split()
    return list(zip(words, decoded_slots))

examples = [
            'from los angeles',
            'to los angeles',
            'from boston',
            'to boston',
            'cheapest flight from boston to los angeles tomorrow',
            'what is the airport at orlando',
            'what are the air restrictions on flights from pittsburgh to atlanta for the airfare of 416 dollars',
            'flight from boston to santiago',
            'flight boston to santiago'
]

for e in examples:
    print(e)
    print(predict_slots_query(e))
    print()

Even though 'Santiago' is not a city that is present in the training data set, the model is still capable of recognizing it as a destination city name just from context! This is the power of the attention mechanism of transformers.

Can we get even better accuracy if we train for longer? Let's try!

In [None]:
# Try to change parameters to get your accuracy up
# YOUR CODE HERE
epochs = 20  # assumed value for illustration; anything above the initial 10 trains for longer

# Fit your model
# YOUR CODE HERE
model.fit(
    x=source_train,               # Vectorized query data (input)
    y=target_train,               # Vectorized slot data (target)
    batch_size=batch_size,        # Batch size set to 64
    epochs=epochs,                # Number of additional training epochs
    validation_data=(source_test, target_test)  # Validation data
)

In [None]:
# Define slot_filling_accuracy
# YOUR CODE HERE
def slot_filling_accuracy(actual, predicted, only_slots=False):
    not_padding = np.not_equal(actual, 0)  # Filter out padding tokens

    if only_slots:
        # Id that the slot vectorizer assigns to the "O" (no slot) label
        non_slot_token = text_vectorization_slots(['O']).numpy()[0, 0]
        slots = np.not_equal(actual, non_slot_token)
        correct_predictions = np.equal(actual, predicted)[not_padding & slots]
    else:
        correct_predictions = np.equal(actual, predicted)[not_padding]

    # Fraction of the selected tokens that were predicted correctly
    return np.mean(correct_predictions)


predicted = np.argmax(model.predict(source_test), axis=-1).reshape(-1)
actual = target_test.numpy().reshape(-1)

acc = slot_filling_accuracy(actual, predicted, only_slots=False)
acc_slots = slot_filling_accuracy(actual, predicted, only_slots=True)

print(f'Accuracy = {acc:.3f}')
print(f'Accuracy on slots = {acc_slots:.3f}')