## Natural Language Understanding 

___

Rasa NLU is an open-source natural language processing tool for intent classification, response retrieval and
entity extraction in chatbots.

In this section, you will enable your bot to understand user inputs by building a Rasa NLU model.

This model will take unstructured user inputs and extract structured data in the form of intents and entities:

   - **Intent** - a label that represents the overall intention of the user's input
   - **Entities** - important details that the bot should extract and use to steer the conversation

For example, taking a sentence like

> "I am looking for a Mexican restaurant in the center of town"

and returning structured data like

```
{
  "intent": "search_restaurant",
  "entities": {
    "cuisine": "Mexican",
    "location": "center"
  }
}
```

*<strong>Before you start writing code, run the following cell so that you can run asynchronous Rasa code in Jupyter Notebooks. This is needed because Jupyter Notebooks already run their own event loop.</strong>*

In [10]:
import nest_asyncio

nest_asyncio.apply()
print("Event loop ready.")

Event loop ready.
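
To see why this step is needed, here is a minimal sketch in plain `asyncio` (no Rasa involved). It reproduces the error raised when code tries to start a blocking run on an event loop that is already running, which is the situation inside a notebook. The function names `inner` and `outer` are illustrative only.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running loop here, just like a notebook cell.
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        # A blocking run on an already running loop is rejected by asyncio.
        loop.run_until_complete(coro)
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(err)
    return "no error"


message = asyncio.run(outer())
print(message)
```

`nest_asyncio.apply()` patches the event loop so that such nested runs are allowed.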


### Step 1: Creating the NLU training data
To train the NLU model you will need some labeled training data. Rasa NLU training data samples consist of the following components:

- an intent label, introduced by a header line of the form `## intent:<intent_name>`
- examples of text inputs which correspond to that label
- entities, annotated inline in the format `[entity_value](entity_label)`
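
To make the entity annotation format concrete, the sketch below strips the `[entity_value](entity_label)` markup from one example and records where each entity sits in the plain text. This is not Rasa's own parser; `parse_example` and `ENTITY_PATTERN` are hypothetical names used only for illustration.

```python
import re

# Matches [entity_value](entity_label)
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_example(line):
    """Return the plain text and a list of entity annotations."""
    text_parts = []
    entities = []
    cursor = 0  # position in the annotated line
    offset = 0  # length of the plain text built so far
    for match in ENTITY_PATTERN.finditer(line):
        before = line[cursor:match.start()]
        text_parts.append(before)
        offset += len(before)
        value, label = match.group(1), match.group(2)
        entities.append(
            {"start": offset, "end": offset + len(value), "value": value, "entity": label}
        )
        text_parts.append(value)
        offset += len(value)
        cursor = match.end()
    text_parts.append(line[cursor:])
    return "".join(text_parts), entities


text, entities = parse_example(
    "Which talks today are relevant to [developers](relevant_audience)?"
)
print(text)
print(entities)
```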

We will start by generating some training data examples by hand.

In [1]:
nlu_md = """

## intent:greet
- hey
- hello there
- hi
- hello there
- good morning
- good evening
- moin
- hey there
- let's go
- hey dude
- goodmorning
- goodevening
- good afternoon

## intent:goodbye
- cu
- good by
- cee you later
- good night
- good afternoon
- bye
- goodbye
- have a nice day
- see you around
- bye bye
- see you later

## intent:recommend_session
- What presentation would you recommend to [data scientists](relevant_audience)?
- Which talks are relevant to people in [Machine Learning](relevant_audience) field?
- I work as a [product manager](relevant_audience). What sessions would you recommend for me to attend today?
- Are the any talks you could recommend to [machine learning](relevant_audience) folks to attend tomorrow?
- Which talks today are relevant to [developers](relevant_audience)?

## intent:speaker
- Who is the speaker?
- And who's presenting?
- What's the name of the presenter?
- Who's presenting?
- Who's speaking?
- The name of the speaker?

## intent:length
- How long is the session?
- And what's the length of this?
- How long is this session?
- Can you tell me how long this session is?
- Is the session long?

## intent:abstract
- Show me the abstract
- Can you give me more details about this talk?
- Is there a description of this presetnation?
- Can you show me an abstract of this talk?
- Show me the abstract, please
- Can you show me the summary of the talk?
- What this talk will be about?

## intent:thanks
- Thank you.
- very useful. thank you so much!
- Thanks
- thanks a lot
- thank you so much

## intent:inform
- to [Data Scientists](relevant_audience)
- relevant to [machine learning engineers](relevant_audience)
- for [product](relevant_audience) people
"""

%store nlu_md > data/nlu.md

Writing 'nlu_md' (str) to file 'data/nlu.md'.
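
Before training, it can help to check how many examples each intent has, since very unbalanced intents are harder to learn. The following is a rough sketch that counts examples per `## intent:` section; `count_examples` is a hypothetical helper and `sample_md` is a shortened stand-in for the data above.

```python
from collections import Counter

sample_md = """
## intent:greet
- hey
- hello there

## intent:goodbye
- bye
- see you later
- good night
"""


def count_examples(markdown_text):
    """Count the '- example' lines under each '## intent:<name>' header."""
    counts = Counter()
    current_intent = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        if line.startswith("## intent:"):
            current_intent = line[len("## intent:"):].strip()
        elif line.startswith("##"):
            current_intent = None  # some other kind of section
        elif line.startswith("- ") and current_intent is not None:
            counts[current_intent] += 1
    return counts


counts = count_examples(sample_md)
for intent, n in counts.most_common():
    print(f"{intent}: {n}")
```

In the notebook you could pass `nlu_md` instead of `sample_md`.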


### Step 2: Designing the training pipeline

Once the training data is ready, we can define the NLU model by constructing a processing pipeline. The pipeline defines how structured data is extracted from unstructured user inputs: how sentences are tokenized, which intent classifier is used, which entity extraction model is used, and so on. The components of a pipeline are trained one after another; each can take inputs from the components defined before it and pass information on to the ones that follow.

In [3]:
configuration = """
language: "en"

pipeline:
- name: "WhitespaceTokenizer"        # splits the sentence into tokens
- name: "CRFEntityExtractor"         # conditional random field entity extractor, trained on the annotated examples
- name: "CountVectorsFeaturizer"     # transforms the sentence into a bag-of-words vector representation
- name: "EmbeddingIntentClassifier"  # intent classifier
""" 

%store configuration > config.yml

Writing 'configuration' (str) to file 'config.yml'.
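
The way pipeline components build on each other can be sketched with two toy components that share a message dictionary. This is only an illustration of the chaining idea, not how Rasa implements its components; the class names and the `process` method are assumptions made for this sketch.

```python
from collections import Counter


class ToyWhitespaceTokenizer:
    def process(self, message):
        # Adds a list of lowercased tokens to the shared message.
        message["tokens"] = message["text"].lower().split()


class ToyCountFeaturizer:
    def process(self, message):
        # Relies on the tokens produced by the previous component.
        message["features"] = Counter(message["tokens"])


def run_pipeline(components, text):
    """Run each component in order on the same message dictionary."""
    message = {"text": text}
    for component in components:
        component.process(message)
    return message


result = run_pipeline(
    [ToyWhitespaceTokenizer(), ToyCountFeaturizer()], "hello hello there"
)
print(result["tokens"])
print(dict(result["features"]))
```

Swapping the order of the two components would fail, because the featurizer needs the tokens. The same ordering constraint applies to the entries in `config.yml`.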


### Step 3: Training the first Rasa NLU Model
Now, we're going to train the NLU model to recognise user inputs, so that when you send a message like "hello" to your bot, it will recognise this as a "greet" intent. Let's define the training function:

In [7]:
from rasa.nlu.training_data import load_data
from rasa.nlu.config import RasaNLUModelConfig
from rasa.nlu.model import Trainer
from rasa.nlu import config

def train_nlu_model():
    # load the NLU training samples
    training_data = load_data("data/nlu.md")

    # create a trainer for the pipeline defined in config.yml
    trainer = Trainer(config.load("config.yml"))

    # train the model; the returned interpreter can parse new messages
    interpreter = trainer.train(training_data)

    # store the trained model on disk for future use
    model_directory = trainer.persist("./models/current", fixed_model_name="nlu")
    
    return interpreter, model_directory

Finally, let's train the model using the previously defined data and model configuration:

In [8]:
# The first run emits some TensorFlow deprecation warnings; they can be ignored
interpreter, model_directory = train_nlu_model()

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])



The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.







INFO:tensorflow:Entry Point [tensor2tensor.envs.tic_tac_toe_env:TicTacToeEnv] registered with id [T2TEnv-TicTacToeEnv-v0]






Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the 


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where



Epochs: 100%|█████████████████████████████████████████████████| 300/300 [00:13<00:00, 22.00it/s, loss=0.572, acc=1.000]





### Step 4: Testing the model

We have trained the first version of our NLU model! Let's test it on various inputs:

In [11]:
import logging, io, json, warnings
logging.basicConfig(level="INFO")
warnings.filterwarnings('ignore')

def pprint(o):
    # small helper function to make dict dumps a bit prettier
    print(json.dumps(o, indent=2))

# change the input message to try your own inputs
input_message = "What talks would you recommend to data scientists?"
pprint(interpreter.parse(input_message))
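
The parsed result is a dictionary, so the predicted intent and the extracted entities can be read from it directly. The sketch below assumes the usual Rasa 1.x result layout (`intent` with `name` and `confidence`, plus a list of `entities`). The `sample_result` values are made up for illustration and are not real model output, and `summarise` is a hypothetical helper.

```python
sample_text = "What talks would you recommend to data scientists?"
start = sample_text.index("data scientists")

# Hand-written stand-in for interpreter.parse(...) output; the numbers are invented.
sample_result = {
    "intent": {"name": "recommend_session", "confidence": 0.93},
    "entities": [
        {
            "start": start,
            "end": start + len("data scientists"),
            "value": "data scientists",
            "entity": "relevant_audience",
        }
    ],
    "text": sample_text,
}


def summarise(result):
    """Pull out the intent name, its confidence and an entity lookup."""
    intent = result.get("intent") or {}
    entities = {e["entity"]: e["value"] for e in result.get("entities", [])}
    return intent.get("name"), intent.get("confidence", 0.0), entities


name, confidence, entities = summarise(sample_result)
print(f"intent={name} confidence={confidence:.2f} entities={entities}")
```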

### Step 5: Handling out-of-scope inputs
When dealing with conversational AI, out-of-scope user inputs are a very common challenge. These inputs are user requests which have nothing to do with the chatbot's job. While it's very challenging to provide a sensible answer to each out-of-scope input, it's important to enable your chatbot to identify such inputs and guide the user back to the conversation.

First, let's enable our assistant to identify out-of-scope inputs. To do that, we will add a new intent called `out-of-scope` to our training dataset and provide some corresponding inputs:

In [12]:
nlu_md = """


## intent:greet
- hey
- hello there
- hi
- hello there
- good morning
- good evening
- moin
- hey there
- let's go
- hey dude
- goodmorning
- goodevening
- good afternoon

## intent:goodbye
- cu
- good by
- cee you later
- good night
- good afternoon
- bye
- goodbye
- have a nice day
- see you around
- bye bye
- see you later

## intent:recommend_session
- What presentation would you recommend to [data scientists](relevant_audience)?
- Which talks are relevant to people in [Machine Learning](relevant_audience) field?
- I work as a [product manager](relevant_audience). What sessions would you recommend for me to attend today?
- Are the any talks you could recommend to [machine learning](relevant_audience) folks to attend tomorrow?
- Which talks today are relevant to [developers](relevant_audience)?

## intent:speaker
- Who is the speaker?
- And who's presenting?
- What's the name of the presenter?
- Who's presenting?
- Who's speaking?
- The name of the speaker?

## intent:length
- How long is the session?
- And what's the length of this?
- How long is this session?
- Can you tell me how long this session is?
- Is the session long?

## intent:abstract
- Show me the abstract
- Can you give me more details about this talk?
- Is there a description of this presetnation?
- Can you show me an abstract of this talk?
- Show me the abstract, please
- Can you show me the summary of the talk?
- What this talk will be about?

## intent:thanks
- Thank you.
- very useful. thank you so much!
- Thanks
- thanks a lot
- thank you so much

## intent:inform
- to [Data Scientists](relevant_audience)
- relevant to [machine learning engineers](relevant_audience)
- for [product](relevant_audience) people

## intent:out-of-scope
- I want pizza
- please help with my ice cream it's dripping
- no wait go back i want a dripping ice cream but a cone that catches it so you can drink the ice cream later
- i want a non dripping ice cream
- hey little mama let em whisper in your ear
- someone call the police i think the bot died
- show me a picture of a chicken
- neither
- I want french cuisine
- i am hungry
- restaurants
- restaurant
- you're a loser lmao
- can i be shown a gluten free restaurant
- i don't care!!!!
- i do not care how are you
- again?
- oh wait i gave you my work email address can i change it?
- hang on let me find it
- stop it, i do not care!!!
- really? you're so touchy?
- how come?
- I changed my mind
- what?
- did i break you

"""

%store nlu_md > data/nlu.md

Writing 'nlu_md' (str) to file 'data/nlu.md'.


Let's retrain the model and see how it deals with out-of-scope inputs now:

In [14]:
interpreter, model_directory = train_nlu_model()

In [21]:
input_message = "I want pizza"
pprint(interpreter.parse(input_message))
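
Once the model can label out-of-scope inputs, the assistant still needs a rule for guiding the user back to the conversation. Here is a minimal sketch of such a rule, assuming the parse result carries `intent.name` and `intent.confidence`. The threshold of 0.6, the reply texts and the `choose_reply` helper are all assumptions made for illustration.

```python
FALLBACK_TEXT = "Sorry, I can only help with questions about the conference sessions."


def choose_reply(result, threshold=0.6):
    """Return a fallback for out-of-scope or low-confidence predictions."""
    intent = result.get("intent") or {}
    name = intent.get("name")
    confidence = intent.get("confidence", 0.0)
    if name is None or name == "out-of-scope" or confidence < threshold:
        return FALLBACK_TEXT
    return f"Handling intent '{name}'"


# Hand-written example results; the confidence values are invented.
print(choose_reply({"intent": {"name": "out-of-scope", "confidence": 0.91}}))
print(choose_reply({"intent": {"name": "speaker", "confidence": 0.88}}))
print(choose_reply({"intent": {"name": "speaker", "confidence": 0.31}}))
```

Checking the confidence as well as the intent name also catches inputs that the model assigns to a regular intent without being sure about it.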

### Step 6: Adding synonyms
Synonyms are a useful Rasa NLU feature that maps different extracted entity values to the same canonical value. They are used when extracted values have to be normalised so that they can be used to query a database or make an API call. In our example, the occupation of the relevant audience is a good candidate for synonyms because users can express the same occupation in a variety of ways (for example, Machine Learning and ML). Let's update our training examples with synonyms.

In [22]:
nlu_md = """

## intent:greet
- hey
- hello there
- hi
- hello there
- good morning
- good evening
- moin
- hey there
- let's go
- hey dude
- goodmorning
- goodevening
- good afternoon

## intent:goodbye
- cu
- good by
- cee you later
- good night
- good afternoon
- bye
- goodbye
- have a nice day
- see you around
- bye bye
- see you later

## intent:recommend_session
- What presentation would you recommend to [data scientists](relevant_audience)?
- Which talks are relevant to people in [Machine Learning](relevant_audience:ML) field?
- I work as a [product manager](relevant_audience). What sessions would you recommend for me to attend today?
- Are the any talks you could recommend to [machine learning](relevant_audience:ML) folks to attend tomorrow?
- Which talks today are relevant to [developers](relevant_audience)?

## intent:speaker
- Who is the speaker?
- And who's presenting?
- What's the name of the presenter?
- Who's presenting?
- Who's speaking?
- The name of the speaker?

## intent:length
- How long is the session?
- And what's the length of this?
- How long is this session?
- Can you tell me how long this session is?
- Is the session long?

## intent:abstract
- Show me the abstract
- Can you give me more details about this talk?
- Is there a description of this presetnation?
- Can you show me an abstract of this talk?
- Show me the abstract, please
- Can you show me the summary of the talk?
- What this talk will be about?

## intent:thanks
- Thank you.
- very useful. thank you so much!
- Thanks
- thanks a lot
- thank you so much

## intent:inform
- to [Data Scientists](relevant_audience)
- relevant to [machine learning engineers](relevant_audience:ML)
- for [product](relevant_audience) people


## intent:out-of-scope
- I want pizza
- please help with my ice cream it's dripping
- no wait go back i want a dripping ice cream but a cone that catches it so you can drink the ice cream later
- i want a non dripping ice cream
- hey little mama let em whisper in your ear
- someone call the police i think the bot died
- show me a picture of a chicken
- neither
- I want french cuisine
- i am hungry
- restaurants
- restaurant
- you're a loser lmao
- can i be shown a gluten free restaurant
- i don't care!!!!
- i do not care how are you
- again?
- oh wait i gave you my work email address can i change it?
- hang on let me find it
- stop it, i do not care!!!
- really? you're so touchy?
- how come?
- I changed my mind
- what?
- did i break you
"""

%store nlu_md > data/nlu.md

Writing 'nlu_md' (str) to file 'data/nlu.md'.


To train the NLU model with synonyms, we have to add the synonyms component (`EntitySynonymMapper`) to the model pipeline:

In [23]:
configuration = """
language: "en"

pipeline:
- name: "WhitespaceTokenizer"        # splits the sentence into tokens
- name: "CRFEntityExtractor"         # conditional random field entity extractor, trained on the annotated examples
- name: "CountVectorsFeaturizer"     # transforms the sentence into a bag-of-words vector representation
- name: "EntitySynonymMapper"        # maps extracted entity values to their synonyms
- name: "EmbeddingIntentClassifier"  # intent classifier
""" 

%store configuration > config.yml

Writing 'configuration' (str) to file 'config.yml'.


Now, let's retrain the NLU model and test the performance.


In [24]:
interpreter, model_directory = train_nlu_model()

INFO:rasa.nlu.model:Starting to train component WhitespaceTokenizer
INFO:rasa.nlu.model:Finished training component.
INFO:rasa.nlu.model:Starting to train component CRFEntityExtractor
INFO:rasa.nlu.model:Finished training component.
INFO:rasa.nlu.model:Starting to train component CountVectorsFeaturizer
INFO:rasa.nlu.model:Finished training component.
INFO:rasa.nlu.model:Starting to train component EntitySynonymMapper
INFO:rasa.nlu.model:Finished training component.
INFO:rasa.nlu.model:Starting to train component EmbeddingIntentClassifier
Epochs: 100%|█████████████████████████████████████████████████| 300/300 [00:09<00:00, 31.20it/s, loss=0.540, acc=1.000]
INFO:rasa.utils.train_utils:Finished training embedding policy, train loss=0.540, train accuracy=1.000
INFO:rasa.nlu.model:Finished training component.
INFO:rasa.nlu.model:Successfully saved model into 'C:\Users\Salome\Desktop\desktopDocs\test-project\models\current\nlu'



See how 'machine learning engineers' now gets mapped to 'ML':

In [26]:
input_message = "For machine learning engineers"
pprint(interpreter.parse(input_message))
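
The effect of synonym mapping can be sketched in plain Python as a lookup that replaces an extracted value with its canonical form. This only illustrates the idea and is not the `EntitySynonymMapper` implementation. The `SYNONYMS` table, the case-insensitive lookup and the `normalise_entities` helper are assumptions for this sketch.

```python
# Surface forms (lowercased) mapped to the canonical value used downstream.
SYNONYMS = {
    "machine learning": "ML",
    "machine learning engineers": "ML",
}


def normalise_entities(entities, synonyms=SYNONYMS):
    """Return copies of the entities with known values replaced by synonyms."""
    normalised = []
    for entity in entities:
        value = entity["value"]
        canonical = synonyms.get(value.lower(), value)
        normalised.append({**entity, "value": canonical})
    return normalised


extracted = [
    {"entity": "relevant_audience", "value": "Machine Learning"},
    {"entity": "relevant_audience", "value": "developers"},
]
normalised = normalise_entities(extracted)
print(normalised)
```

Values without an entry in the table, such as `developers`, pass through unchanged.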

### Step 7: Implementing multi-intents
The NLU model we have built so far works pretty well, but it supports only one intent per user input. In this step, we will use a TensorFlow embedding model to enable the assistant to recognise multi-intents, that is, more than one intention per user input. Let's start by defining multi-intents in our training data. Multi-intents are defined in much the same way as regular intents; the only difference is that the label name consists of intent tokens joined by a separator character of your choice, for example `intent_token1+intent_token2`.

In [27]:
nlu_md = """

## intent:greet
- hey
- hello there
- hi
- hello there
- good morning
- good evening
- moin
- hey there
- let's go
- hey dude
- goodmorning
- goodevening
- good afternoon

## intent:goodbye
- cu
- good by
- cee you later
- good night
- good afternoon
- bye
- goodbye
- have a nice day
- see you around
- bye bye
- see you later

## intent:recommend_session
- What presentation would you recommend to [data scientists](relevant_audience)?
- Which talks are relevant to people in [Machine Learning](relevant_audience:ML) field?
- I work as a [product manager](relevant_audience). What sessions would you recommend for me to attend today?
- Are the any talks you could recommend to [machine learning](relevant_audience:ML) folks to attend tomorrow?
- Which talks today are relevant to [developers](relevant_audience)?

## intent:speaker
- Who is the speaker?
- And who's presenting?
- What's the name of the presenter?
- Who's presenting?
- Who's speaking?
- The name of the speaker?

## intent:length
- How long is the session?
- And what's the length of this?
- How long is this session?
- Can you tell me how long this session is?
- Is the session long?

## intent:abstract
- Show me the abstract
- Can you give me more details about this talk?
- Is there a description of this presetnation?
- Can you show me an abstract of this talk?
- Show me the abstract, please
- Can you show me the summary of the talk?
- What this talk will be about?

## intent:thanks
- Thank you.
- very useful. thank you so much!
- Thanks
- thanks a lot
- thank you so much

## intent:inform
- to [Data Scientists](relevant_audience)
- relevant to [machine learning engineers](relevant_audience:ML)
- for [product](relevant_audience) people


## intent:out-of-scope
- I want pizza
- please help with my ice cream it's dripping
- no wait go back i want a dripping ice cream but a cone that catches it so you can drink the ice cream later
- i want a non dripping ice cream
- hey little mama let em whisper in your ear
- someone call the police i think the bot died
- show me a picture of a chicken
- neither
- I want french cuisine
- i am hungry
- restaurants
- restaurant
- you're a loser lmao
- can i be shown a gluten free restaurant
- i don't care!!!!
- i do not care how are you
- again?
- oh wait i gave you my work email address can i change it?
- hang on let me find it
- stop it, i do not care!!!
- really? you're so touchy?
- how come?
- I changed my mind
- what?
- did i break you


## intent:speaker+length
- Who is the presenter? Also, how long is the talk?
- Who is the speaker and how long is the session?
- Is the session long and who is presenting?
- Do you know who is the presenter of the session? And how long is the session?
- Is the talk long and who is presenting?
- Who is the speaker? And how long is the talk?
"""

%store nlu_md > data/nlu.md

Writing 'nlu_md' (str) to file 'data/nlu.md'.


Next, let's modify the configuration of the model pipeline to use the tensorflow_embedding model with multi-intent support.

In [41]:
configuration = """
language: "en"

pipeline:
- name: "WhitespaceTokenizer"        # splits the sentence into tokens
  intent_split_symbol: "+"           # sets the delimiter string used to split the intent label
- name: "CRFEntityExtractor"         # conditional random field entity extractor, trained on the annotated examples
- name: "CountVectorsFeaturizer"     # transforms the sentence into a bag-of-words vector representation
- name: "EntitySynonymMapper"        # maps extracted entity values to their synonyms
- name: "EmbeddingIntentClassifier"  # intent classifier
  
""" 

%store configuration > config.yml

Writing 'configuration' (str) to file 'config.yml'.


Let's retrain the model with the new pipeline and test the performance:

In [44]:
interpreter, model_directory = train_nlu_model()

See how a two-question input now gets recognised as a multi-intent:

In [45]:
input_message = "Who is the speakers and how long is the session?"
pprint(interpreter.parse(input_message))
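
To see why splitting the label helps, the sketch below breaks each intent label on the `+` symbol and encodes it as a multi-hot vector over the intent tokens. The combined label then shares tokens with the single intents it is built from. This only sketches the idea, since the actual classifier learns embeddings rather than using this exact encoding. The `split_intent` and `encode` helpers are hypothetical.

```python
def split_intent(label, symbol="+"):
    """Split a (multi-)intent label into its intent tokens."""
    return label.split(symbol)


labels = ["speaker", "length", "speaker+length"]

# All distinct intent tokens, in a fixed order.
vocabulary = sorted({token for label in labels for token in split_intent(label)})


def encode(label):
    """Multi-hot vector marking which intent tokens the label contains."""
    tokens = set(split_intent(label))
    return [1 if token in tokens else 0 for token in vocabulary]


print(vocabulary)
for label in labels:
    print(label, encode(label))
```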

Congratulations! You have just implemented the natural language understanding part of your assistant, which means that your assistant can now understand you. In the next part, we will explore Rasa NLU further.