In [2]:
import nltk

# LR selects the engine's logistic-regression intent classifier
from utils.nlu_engine import (
    NLUEngine, DataUtils, IntentMatcher, EntityExtractor, LR
)

# Example of intent classification and entity extraction with the NLUEngine class
This short example notebook shows how to use the NLU engine for:

* Intent classification
* Entity extraction

Load the dataset. This example uses the cleaned, annotated home-domain dataset, but you can load any dataset in the same format.

In [3]:
nlu_data_df = DataUtils.load_data(
    'data/NLU-Data-Home-Domain-Annotated-All-Cleaned.csv'
)

## Intent classification: example of a single utterance

An utterance can be labeled either by its intent or by its domain (also called scenario or skill). In this example we label utterances by domain, so the classifier predicts a domain for each utterance.

In [4]:
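# Domain (scenario) label of each utterance
domains = nlu_data_df.scenario.values

# Train a logistic-regression domain classifier; the engine also returns the
# fitted TF-IDF vectorizer needed to featurize new utterances
LR_domain_classifier_model, tfidf_vectorizer = NLUEngine.train_intent_classifier(
    data_df_path=nlu_data_df,
    labels_to_predict='domain',
    classifier=LR
)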


Training LogisticRegression(random_state=0, solver='liblinear')
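
Because `train_intent_classifier` returns a fitted `tfidf_vectorizer` together with the model, utterances are evidently converted to TF-IDF features before classification. The stdlib-only sketch below shows what that step computes on a toy corpus. It assumes defaults like those of scikit-learn's `TfidfVectorizer` (lowercasing, tokens of two or more word characters, smoothed idf, L2-normalized rows); the helpers `fit_idf` and `transform` are hypothetical, and the engine's actual vectorizer settings may differ.

```python
import math
import re
from collections import Counter

# Assumed tokenization, modeled on scikit-learn's default token pattern:
# lowercase text, keep tokens of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def fit_idf(corpus):
    """Return {term: idf} using the smoothed formula ln((1 + n) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def transform(text, idf):
    """Map one utterance to a sparse, L2-normalized {term: weight} vector."""
    counts = Counter(t for t in tokenize(text) if t in idf)  # unseen terms are dropped
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights


# Hypothetical toy corpus for illustration only
corpus = [
    "turn off the kitchen lights",
    "switch on the bedroom lights",
    "wake me up at seven am",
]
idf = fit_idf(corpus)
vec = transform("turn off the kitchen lights", idf)
print(vec)
```

Words shared by many utterances, such as "the", end up with lower weights than distinctive words such as "kitchen", which helps a linear classifier focus on informative terms.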


Example: predict the domain label of a single utterance.

In [5]:
utterance = "turn off the kitchen lights"

print(IntentMatcher.predict_label(
    LR_domain_classifier_model, tfidf_vectorizer, utterance))

iot
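
The training output above shows a scikit-learn `LogisticRegression` with the `liblinear` solver, which handles several classes one-vs-rest: one binary classifier per label, and prediction picks the label whose classifier scores highest. The sketch below illustrates that idea in plain Python. It is a simplification, not the engine's code: the toy training data and the helpers `train_one_vs_rest` and `predict` are hypothetical, the features are binary bag-of-words rather than TF-IDF, and there is no regularization (scikit-learn applies L2 regularization by default).

```python
import math

# Hypothetical toy training data (utterance, domain); the notebook trains on the CSV.
TRAIN = [
    ("turn off the kitchen lights", "iot"),
    ("switch on the bedroom lights", "iot"),
    ("wake me up at seven am", "alarm"),
    ("set an alarm for six pm", "alarm"),
    ("play some jazz music", "music"),
    ("play my workout playlist", "music"),
]


def bow_features(text):
    """Binary bag-of-words features: the set of lowercased words."""
    return set(text.lower().split())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_one_vs_rest(data, lr=0.5, epochs=200):
    """Fit one binary logistic regression per label by batch gradient descent on log loss."""
    labels = sorted({label for _, label in data})
    xs = [bow_features(text) for text, _ in data]
    models = {}
    for label in labels:
        ys = [1.0 if y == label else 0.0 for _, y in data]  # this label vs. the rest
        weights, bias = {}, 0.0
        for _ in range(epochs):
            grad_w, grad_b = {}, 0.0
            for x, y in zip(xs, ys):
                error = y - sigmoid(bias + sum(weights.get(f, 0.0) for f in x))
                grad_b += error
                for f in x:
                    grad_w[f] = grad_w.get(f, 0.0) + error
            bias += lr * grad_b / len(xs)
            for f, g in grad_w.items():
                weights[f] = weights.get(f, 0.0) + lr * g / len(xs)
        models[label] = (weights, bias)
    return models


def predict(models, text):
    """Return the label whose binary classifier gives the highest linear score."""
    x = bow_features(text)
    scores = {
        label: bias + sum(weights.get(f, 0.0) for f in x)
        for label, (weights, bias) in models.items()
    }
    return max(scores, key=scores.get)


models = train_one_vs_rest(TRAIN)
print(predict(models, "turn on the kitchen lights"))  # -> iot
```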


## Entity extraction

Entity extraction could be improved substantially with better features, and contributions here are welcome. CRF features similar to those used by Snips, such as Brown clustering, would probably work better.
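
The engine's actual CRF features are not shown in this notebook. As a possible starting point for the improvement suggested above, here is a hypothetical per-token feature function in the list-of-dicts format that CRF libraries such as sklearn-crfsuite accept, including Brown-cluster prefix features. The cluster paths are made up for illustration; real ones would come from running Brown clustering on a large unlabeled corpus. `word2features` and `sent2features` are hypothetical names, not part of the engine.

```python
# Hypothetical Brown-cluster paths (word -> bit string in the cluster tree).
# Words with a common prefix ("five", "seven") sit in the same cluster subtree.
BROWN_CLUSTERS = {"five": "0110100", "seven": "0110101", "pm": "1011001"}


def word2features(tokens, i, clusters=BROWN_CLUSTERS):
    """Build a CRF feature dict for tokens[i] from the token and its neighbors."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:].lower(),
        "word.isdigit": word.isdigit(),
        "word.istitle": word.istitle(),
    }
    # Brown-cluster path prefixes group words at coarser levels of granularity
    path = clusters.get(word.lower())
    if path:
        feats["brown.prefix4"] = path[:4]
        feats["brown.path"] = path
    # Context features from the neighboring tokens
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of utterance
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of utterance
    return feats


def sent2features(tokens, clusters=BROWN_CLUSTERS):
    """Return one feature dict per token of the utterance."""
    return [word2features(tokens, i, clusters) for i in range(len(tokens))]


token_features = sent2features("wake me up at five pm".split())
print(token_features[4])
```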

Entity extraction requires NLTK's Punkt tokenizer models, so the next cell downloads them if they are missing.

In [6]:
# Download the Punkt tokenizer models only if they are not already installed
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')

### Example: Extracting entities from an utterance

In [7]:
# Train the CRF entity tagger on the annotated utterances
crf_model = NLUEngine.train_entity_classifier(data_df=nlu_data_df)

Training entity classifier


In [8]:
utterance = 'wake me up at five pm this week'

`EntityExtractor.get_entity_tags` returns a list of (entity type, token) pairs for the tokens of the utterance that the model tags as entities.


In [9]:
EntityExtractor.get_entity_tags(utterance, crf_model)

[('time', 'five'), ('time', 'pm'), ('date', 'this'), ('date', 'week')]
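
The output above has the shape you would get by pairing each token with its predicted label and dropping tokens that belong to no entity. A minimal sketch of that step, assuming the CRF emits one label per token with `O` for non-entity tokens (the notebook does not show the raw labels) and using whitespace splitting in place of the NLTK tokenizer; `entity_tag_pairs` and the label list are hypothetical.

```python
def entity_tag_pairs(tokens, labels, outside="O"):
    """Keep (label, token) pairs for tokens not labeled as outside any entity."""
    return [(label, token) for token, label in zip(tokens, labels) if label != outside]


tokens = "wake me up at five pm this week".split()
# Hypothetical per-token CRF output; "O" marks tokens outside any entity.
labels = ["O", "O", "O", "O", "time", "time", "date", "date"]
print(entity_tag_pairs(tokens, labels))
# [('time', 'five'), ('time', 'pm'), ('date', 'this'), ('date', 'week')]
```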

`NLUEngine.create_entity_tagged_utterance` returns the whole utterance with each entity span annotated inline as `[type : words]`.


In [10]:
entity_tagged_utterance = NLUEngine.create_entity_tagged_utterance(
    utterance, crf_model)

entity_tagged_utterance

'wake me up at [time : five pm] [date : this week]'
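
The tagged string above can be reproduced by merging runs of consecutive tokens that share a label. The sketch below assumes per-token labels with `O` marking tokens outside any entity (the engine's internal representation is not shown here) and uses whitespace splitting for tokenization; `tag_utterance` is a hypothetical helper, not part of the engine.

```python
from itertools import groupby


def tag_utterance(tokens, labels, outside="O"):
    """Merge runs of tokens sharing an entity label into '[label : words]' spans."""
    parts = []
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        words = " ".join(token for token, _ in run)
        parts.append(words if label == outside else f"[{label} : {words}]")
    return " ".join(parts)


tokens = "wake me up at five pm this week".split()
# Hypothetical per-token labels for the example utterance
labels = ["O", "O", "O", "O", "time", "time", "date", "date"]
print(tag_utterance(tokens, labels))
# wake me up at [time : five pm] [date : this week]
```

Merging by label alone joins two adjacent entities of the same type into a single span; tagging schemes such as BIO, which mark the first token of each entity, avoid that ambiguity.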