# NLP Architect - Intent Extraction tutorial

Let's start by importing the classes and helper functions we need.

In [1]:
import numpy as np
from nlp_architect.models.intent_extraction import MultiTaskIntentModel
from nlp_architect.data.intent_datasets import SNIPS
from nlp_architect.utils.embedding import load_word_embeddings
from nlp_architect.utils.metrics import get_conll_scores
from nlp_architect.utils.generic import one_hot

from tensorflow.keras.utils import to_categorical



---
## Preparing the data
The first step is to download the dataset into a folder and load the data into memory
using the `SNIPS` data loader.

### SNIPS NLU Benchmark dataset

The SNIPS dataset has 7 intent types:
- ‘Add to playlist’
- ‘Rate book’
- ‘Check weather’
- ‘Play music’
- ‘Book restaurant’
- ‘Search event’
- ‘Search art’

It has 73 slot label types (including `B-` and `I-` prefixed labels), and the train/test set sizes are ~14000/700.

More info: [here](https://github.com/snipsco/nlu-benchmark)

(The terms and conditions of the data set license apply. Intel does not grant any rights to the data files)

Git clone the repository with the dataset:
```
git clone https://github.com/snipsco/nlu-benchmark.git
```

Point the source of the dataset to `nlu-benchmark/2017-06-custom-intent-engines/` 

In [2]:
sentence_length = 50
word_length = 12

In [3]:
dataset_path = 'nlu-benchmark/2017-06-custom-intent-engines/'
dataset = SNIPS(path=dataset_path,
                sentence_length=sentence_length,
                word_length=word_length)

Once the dataset is loaded, we can extract the ready-made `train` and `test` sets. Each set is a tuple of 4 elements:
- Words (`train_x` and `test_x`)
- Word character representation (`train_c` and `test_c`)
- Intent type (`train_i` and `test_i`)
- Token slot tags (`train_y` and `test_y`)

In [4]:
train_x, train_c, train_i, train_y = dataset.train_set
test_x, test_c, test_i, test_y = dataset.test_set

In [5]:
train_x.shape, train_c.shape, train_i.shape, train_y.shape

((13784, 50), (13784, 50, 12), (13784,), (13784, 50))
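The shapes above follow from padding every sentence to `sentence_length` tokens and every token to `word_length` characters. As a rough illustration only, the sketch below builds a character-level array for one toy sentence. The helper `encode_chars`, the toy `char_to_id` vocabulary, and the choice of `0` for padding and unknown characters are assumptions made for this example, not the `SNIPS` loader's actual implementation.

```python
import numpy as np

def encode_chars(tokens, char_to_id, sentence_length, word_length):
    """Encode tokens as a (sentence_length, word_length) int array.

    Hypothetical helper: positions beyond the sentence or the word stay 0,
    and tokens longer than word_length are truncated.
    """
    out = np.zeros((sentence_length, word_length), dtype=np.int32)
    for t, token in enumerate(tokens[:sentence_length]):
        for c, ch in enumerate(token[:word_length]):
            out[t, c] = char_to_id.get(ch, 0)  # 0 for unknown characters (assumption)
    return out

tokens = ["check", "weather"]
# Toy character vocabulary; ids start at 1 so that 0 can mean "padding".
char_to_id = {ch: i + 1 for i, ch in enumerate(sorted(set("".join(tokens))))}

chars = encode_chars(tokens, char_to_id, sentence_length=5, word_length=6)
print(chars.shape)
```

Stacking one such array per sentence gives a 3D array with the same layout as `train_c`: (number of sentences, `sentence_length`, `word_length`).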

Sentences are encoded in a sparse int representation (the str->int vocabularies are stored in the dataset object) as NumPy arrays.
Let's look at the sentence at index 5544, translate it back to strings so we can read it, and look at the encoded label tags.

In [6]:
train_x[5544]

array([1261,  103,    6, 5286, 5295,  263, 2228, 3331, 3310,    5, 6370,
       5325, 2523, 1250,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0], dtype=int32)

In [7]:
[dataset.word_vocab.id_to_word(i) for i in train_x[5544] if dataset.word_vocab.id_to_word(i) is not None]

['what',
 'is',
 'the',
 'weather',
 'forecast',
 'for',
 'four',
 'pm',
 'close',
 'to',
 'stretch',
 'point',
 'state',
 'park']
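The decoding step above can be sketched without the dataset object. This minimal example assumes a toy `id_to_word` dictionary and that id `0` is used only for padding; the real vocabulary object and its `id_to_word` method may behave differently.

```python
import numpy as np

# Toy vocabulary; id 0 is reserved for padding (an assumption for this sketch).
id_to_word = {1: "what", 2: "is", 3: "the", 4: "weather"}

def decode(ids, id_to_word):
    """Map integer ids back to tokens, skipping ids that have no vocabulary entry."""
    return [id_to_word[i] for i in ids.tolist() if i in id_to_word]

sentence_length = 8
encoded = np.zeros(sentence_length, dtype=np.int32)
encoded[:4] = [1, 2, 3, 4]  # right-padded with zeros, like the array shown earlier

decoded = decode(encoded, id_to_word)
print(decoded)
```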

In [8]:
[dataset.tags_vocab.id_to_word(i) for i in train_y[5544] if dataset.tags_vocab.id_to_word(i) is not None]

['O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'B-timeRange',
 'I-timeRange',
 'B-spatial_relation',
 'O',
 'B-geographic_poi',
 'I-geographic_poi',
 'I-geographic_poi',
 'I-geographic_poi']
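The tags follow the `B-`/`I-`/`O` scheme: `B-` opens a slot, `I-` continues it, and `O` marks tokens outside any slot. A small sketch of how such tags can be grouped into slot values is shown below; `bio_to_spans` is a hypothetical helper written for this tutorial, not part of NLP Architect.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (slot_label, text) pairs.

    Hypothetical helper: an I- tag that does not continue the currently
    open slot is treated like O and ignored.
    """
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans

tokens = ["what", "is", "the", "weather", "forecast", "for", "four", "pm",
          "close", "to", "stretch", "point", "state", "park"]
tags = ["O", "O", "O", "O", "O", "O", "B-timeRange", "I-timeRange",
        "B-spatial_relation", "O", "B-geographic_poi", "I-geographic_poi",
        "I-geographic_poi", "I-geographic_poi"]

spans = bio_to_spans(tokens, tags)
print(spans)
```

For the sentence at index 5544 this yields three slots: a `timeRange` ("four pm"), a `spatial_relation` ("close") and a `geographic_poi` ("stretch point state park").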

---
### External word embedding

Now it's time to load the external word embedding model.
We'll use the `load_word_embeddings` function, which reads the file and loads the word vectors into NumPy arrays.
Once done, we'll create a 2D array with a row for each word in our dataset's word lexicon. We'll save it in `embedding_mat` and use it later to initialize the word embedding layer.

You can download the GloVe word embedding models from [here](https://nlp.stanford.edu/projects/glove/).

(The terms and conditions of the data set license apply. Intel does not grant any rights to the data files)


In [9]:
from nlp_architect.utils.embedding import get_embedding_matrix

embedding_path = 'glove.6B.100d.txt'
embedding_size = 100

embedding_model, _ = load_word_embeddings(embedding_path)
embedding_mat = get_embedding_matrix(embedding_model, dataset.word_vocab)

embedding_mat.shape

(11975, 100)
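Conceptually, the embedding matrix has one row per word id, filled with that word's pre-trained vector when one exists. The sketch below illustrates the idea with toy data; `build_embedding_matrix` is a hypothetical helper, and leaving missing words as zero vectors is an assumption that may not match what `get_embedding_matrix` does internally.

```python
import numpy as np

def build_embedding_matrix(embeddings, word_to_id, dim):
    """Return a (max_id + 1, dim) matrix; row i holds the vector of the word with id i.

    Hypothetical helper: words without a pre-trained vector keep an all-zero row.
    """
    n_rows = max(word_to_id.values()) + 1
    matrix = np.zeros((n_rows, dim), dtype=np.float32)
    for word, idx in word_to_id.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix

# Toy pre-trained vectors and vocabulary (id 0 is left for padding).
embeddings = {
    "weather": np.ones(4, dtype=np.float32),
    "park": np.full(4, 2.0, dtype=np.float32),
}
word_to_id = {"weather": 1, "park": 2, "snips": 3}

toy_matrix = build_embedding_matrix(embeddings, word_to_id, dim=4)
print(toy_matrix.shape)
```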

---
## Building the network

Now for the fun part. Let's start by defining the parameters of the network we're going to build, such as the size of the LSTM layers' hidden state, the number of output labels and intents to predict, and the size of the character embedding vectors.

The network topology is shown in the following diagram.

### High level topology

![image.png](attachment:image.png)

This network is defined in the `nlp_architect.models.intent_extraction` package as `MultiTaskIntentModel`.

We first convert the slot labels and intent classifications into one-hot encoding.

In [10]:
test_y = to_categorical(test_y, dataset.label_vocab_size)
train_y = to_categorical(train_y, dataset.label_vocab_size)
train_i = one_hot(train_i, len(dataset.intents_vocab))
test_i = one_hot(test_i, len(dataset.intents_vocab))
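To make the effect of this conversion concrete, here is a NumPy-only sketch of one-hot encoding for both a 1D array of intent ids and a 2D array of per-token tag ids. `one_hot_encode` is a hypothetical stand-in used for illustration; it is not the `one_hot` or `to_categorical` implementation.

```python
import numpy as np

def one_hot_encode(ids, num_classes):
    """Append a one-hot axis of size num_classes to an integer array."""
    ids = np.asarray(ids)
    return np.eye(num_classes, dtype=np.float32)[ids]

intents = np.array([0, 2, 1])            # one intent id per sentence
tags = np.array([[1, 2, 0], [3, 0, 0]])  # one tag id per token, 0 = padding

intent_1hot = one_hot_encode(intents, num_classes=3)
tag_1hot = one_hot_encode(tags, num_classes=4)
print(intent_1hot.shape, tag_1hot.shape)
```

The intent labels gain one axis (sentences, intents) and the slot labels gain one axis (sentences, tokens, tags), which matches the two output heads of the network.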

We define the input and output data sources

In [11]:
train_inputs = [train_x, train_c]
train_outs = [train_i, train_y]
test_inputs = [test_x, test_c]
test_outs = [test_i, test_y]

We instantiate the model object and build the network with the defined parameters.

In [16]:
model = MultiTaskIntentModel()
model.build(dataset.word_len,
            dataset.label_vocab_size,
            dataset.intent_size,
            dataset.word_vocab_size-1,
            dataset.char_vocab_size,
            word_emb_dims=embedding_size,
            tagger_lstm_dims=100,
            dropout=0.2)
model.load_embedding_weights(embedding_mat)



## Training the network
We've got a model; now it's time to train the network.

We define the batch size and the number of epochs to run.

In [17]:
batch_size = 32
no_epochs = 5

# train the model
model.fit(train_inputs, train_outs,
          batch_size=batch_size,
          epochs=no_epochs,
          validation=(test_inputs, test_outs))

Train on 13784 samples, validate on 700 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


## Testing and evaluating the network
Great! We have a trained model, so let's check how well it performs.

First, we need to run all the test data through the network and get the network's predictions. Once done, we can use `get_conll_scores` to get the CONLLEVAL benchmark results on the test data (precision, recall and F1, broken down per label type).

In [18]:
predictions = model.predict([test_x, test_c], batch_size=batch_size)

In [19]:
predictions[0].shape, predictions[1].shape

((700, 9), (700, 50, 75))
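The two outputs are score arrays: one row of intent scores per sentence, and one row of tag scores per token. Turning them into class ids is an `argmax` over the last axis. The sketch below uses random arrays with the same trailing dimensions as above, purely to show the shapes involved; it does not use the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the model outputs (4 sentences instead of 700).
intent_scores = rng.random((4, 9))     # (sentences, intent classes)
tag_scores = rng.random((4, 50, 75))   # (sentences, tokens, tag classes)

intent_ids = intent_scores.argmax(axis=1)  # one intent id per sentence
tag_ids = tag_scores.argmax(axis=2)        # one tag id per token

print(intent_ids.shape, tag_ids.shape)
```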

In [20]:
eval = get_conll_scores(predictions, test_y,
                        {v: k for k, v in dataset.tags_vocab.vocab.items()})

Overall performance

In [21]:
print(eval)

                            precision    recall  f1-score   support

               best_rating      0.981     1.000     0.990        51
                 timeRange      0.920     0.945     0.933       110
           restaurant_type      0.924     0.984     0.953        62
          spatial_relation      0.957     0.985     0.971        68
              rating_value      0.990     1.000     0.995       100
                    artist      0.884     0.908     0.896       109
               object_name      0.924     0.967     0.945       151
                     state      0.943     0.980     0.962        51
      object_location_type      1.000     1.000     1.000        20
                   service      1.000     0.974     0.987        39
                  facility      1.000     1.000     1.000         7
                   cuisine      1.000     0.909     0.952        11
               object_type      1.000     0.994     0.997       156
               rating_unit      1.000     1.000

Per label performance breakdown

Intent classification accuracy

In [22]:
from sklearn.metrics import accuracy_score
predicted_intents = predictions[0].argmax(1)
truth_intents = test_i.argmax(1)
accuracy_score(truth_intents, predicted_intents)

0.9871428571428571

Using the 300-dimensional GloVe word embedding model and training for 50+ epochs should produce a model with:

- Intent detection: >99 F1
- Slot label classification: >95 F1
---

In [26]:
model.save("mtryfoss_model")

In [28]:
import pickle

with open("mtryfoss_model_info", "wb") as fp:
    info = {
        "type": "mtl",
        "tags_vocab": dataset.tags_vocab.vocab,
        "word_vocab": dataset.word_vocab.vocab,
        "char_vocab": dataset.char_vocab.vocab,
        "intent_vocab": dataset.intents_vocab.vocab,
    }
    pickle.dump(info, fp)
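The pickled dictionary is what lets us map predicted ids back to strings later, without reloading the dataset. As a minimal sketch of the round trip, the example below writes a toy `info` dictionary to a temporary file, reads it back, and inverts the tag vocabulary. The toy vocabularies and the temporary path are assumptions for illustration; restoring the network weights themselves is not shown.

```python
import os
import pickle
import tempfile

# Toy stand-in for the saved info dictionary (str -> int vocabularies).
info = {
    "type": "mtl",
    "tags_vocab": {"O": 1, "B-timeRange": 2, "I-timeRange": 3},
    "intent_vocab": {"GetWeather": 1, "PlayMusic": 2},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_info")
    with open(path, "wb") as fp:
        pickle.dump(info, fp)
    with open(path, "rb") as fp:
        loaded = pickle.load(fp)

# Invert the str -> int mapping so predicted ids can be turned into labels.
id_to_tag = {idx: tag for tag, idx in loaded["tags_vocab"].items()}
print(id_to_tag[2])
```

Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from an untrusted file.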