## 1. Understanding intents and entities

This chapter is about natural language understanding (NLU). NLU is a subfield of natural language processing (NLP) and is usually concerned with converting free-form text into structured data within a particular domain.

## 2. An example

For example, a restaurant booking bot should be able to understand a sentence like "I'm looking for a Mexican restaurant in the centre of town" and query a database or an API to find matching results. To do this, we need to identify the intent of the message, and extract a set of relevant entities.

## 3. Intents

An intent is a broad description of what a person is trying to say. For example, "hello", "hi", and "yoyoyo" are all ways that people might `greet` your bot. The example sentence I just mentioned could sensibly be described with the intent `restaurant_search`. There are many different ways someone might express this intent, for example:

- I'm hungry
- Show me good pizza spots
- I want to take my boyfriend out for sushi

Now, there is no universally correct way to assign intents to sentences. The 'correct' answer depends on your application. For example, if you expand your bot's capabilities so that it can actually book a table for you, the final sentence "I want to take my boyfriend out for sushi" might better be described as a `request_booking` intent than as a restaurant search.

## 4. Entities

The second part of the NLU problem is to extract `entities` from the text. In the restaurant search example, this means correctly identifying `june tenth` as a date, `sushi` as a cuisine type and `new york city` as a location. A well-studied problem in NLP is "Named Entity Recognition" (NER). This is almost exactly the same problem we are describing here, with the difference that NER usually aims to find 'universal' entities like the names of people, organizations, dates, etc. In the case of bots, you often want a narrower set of entities that is specific to your domain.
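To make the goal concrete, here is a minimal sketch of the structured data a bot might produce for the earlier example message. The dictionary layout and the key names (`intent`, `entities`, `cuisine`, `location`) are assumptions chosen for illustration, not a format prescribed by this chapter.

```python
# Hypothetical structured result for one restaurant search message.
message = "I'm looking for a Mexican restaurant in the centre of town"

parsed = {
    "intent": "restaurant_search",
    "entities": {
        "cuisine": "Mexican",
        "location": "centre",
    },
}

# Downstream code could turn the entities into database or API query parameters.
query = {name: value.lower() for name, value in parsed["entities"].items()}
print(query)
```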

## 5. Regular expressions to recognize intents

In the next couple of exercises, you will build regular expressions for recognizing intents and entities. This is much simpler than the machine learning approaches we'll use in later parts of the chapter, and is highly computationally efficient. The main drawback is that writing and debugging regular expressions becomes really hard as your chatbot becomes more sophisticated.

## 6. Using regular expressions

We will use regex to look for keywords in text. We can build expressions which match any one of a set of keywords by using the pipe '|' operator. This corresponds to the logical operation OR. Remember that we can check if a string matched a pattern by checking that the returned match object is not None. For example, to look for the keywords "hello", "hey", or "hi" we can write "hello|hey|hi". Notice, however, that this is just a string of characters, so "hi" will also match inside the words "which", "this", etc.
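A short sketch of this keyword search, using only the standard `re` module. The test sentences are made up for illustration.

```python
import re

pattern = "hello|hey|hi"

# re.search returns a match object on success and None otherwise.
print(re.search(pattern, "hey there!") is not None)    # True
print(re.search(pattern, "good morning") is not None)  # False

# "hi" is matched as a plain substring, so "which" also counts as a hit.
print(re.search(pattern, "which one?") is not None)    # True
```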

## 7. Using regular expressions

We can add the word boundary expression "\b" at the start and end to indicate that there shouldn't be any word characters (letters, digits, or underscores) on either side of our keyword. Notice that we've put an 'r' before the start of the string. This creates a so-called raw string, which means that we can include special characters like the backslash without clashing with default Python string behavior.
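The effect of the word boundary and of the raw string can be checked with a small sketch like the following. The grouping parentheses are added here so that `\b` applies to every alternative; the test sentences are illustrative.

```python
import re

# Raw string: the backslash reaches the regex engine unchanged.
pattern = r"\b(hello|hey|hi)\b"

print(re.search(pattern, "hi there") is not None)    # True
print(re.search(pattern, "oh hey!") is not None)     # True
print(re.search(pattern, "which one?") is not None)  # False

# Without the r prefix, "\b" is a single backspace character.
print(len("\b"), len(r"\b"))  # 1 2
```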

## 8. Using regex for entity recognition

If we're going to use a pattern multiple times, we can create a pattern object using the `re.compile` function. The pattern we've defined here uses some new syntax. Square brackets indicate a set or range of characters. As before, the asterisk means "0 or more occurrences of this pattern", so the final part of the expression means "0 or more lower case letters". The first part of the pattern matches exactly one upper case letter. So this pattern will match any capitalized word. The findall method of the pattern object conveniently extracts all the matching substrings, so to find all the capitalized words in a sentence, we can run pattern.findall, passing the sentence as an argument.
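The pattern itself is not reproduced in the text, so the sketch below reconstructs one from the description (one upper case letter followed by zero or more lower case letters). Treat the exact expression and the example sentence as assumptions.

```python
import re

# Assumed pattern: one upper case letter, then 0 or more lower case letters.
pattern = re.compile(r"[A-Z][a-z]*")

sentence = "Mary is a friend of Tom from Paris"

# findall returns every non-overlapping match as a list of strings.
print(pattern.findall(sentence))  # ['Mary', 'Tom', 'Paris']
```

Note that any capitalized word matches, including the first word of a sentence, so this is only a rough proxy for names.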

## 9. Let's practice!

Now it's your turn to write some regular expressions, and use them to get intents and entities from the messages your bot receives.


In [3]:
import re
op_1 = re.search(r"(hello|hi|hey)", "hey there!") is not None
op_2 = re.search(r"(hello|hi|hey)", "hey there!")
print(op_1)
print(op_2)

True
<re.Match object; span=(0, 3), match='hey'>


### Intent classification with regex I

You'll begin by implementing a very simple technique to recognize intents - looking for the presence of keywords.

A dictionary, keywords, has already been defined. It has the intents "greet", "goodbye", and "thankyou" as keys, and lists of keywords as the corresponding values. For example, keywords["greet"] is set to ["hello", "hi", "hey"].

Also defined is a second dictionary, responses, indicating how the bot should respond to each of these intents. It also has a default response with the key "default".

The function send_message(), along with the bot and user templates, has also already been defined. Your job in this exercise is to create a dictionary with the intents as keys and regex objects as values.

### Instructions

Iterate over the keywords dictionary, using intent and keys as your iterator variables.

Use '|'.join(keys) to create regular expressions to match at least one of the keywords and pass it to re.compile() to compile the regular expressions into pattern objects. Store the result as the value of the patterns dictionary.

In [4]:
# Define a dictionary of patterns
patterns = {}
keywords = {'greet': ['hello', 'hi', 'hey'], 'goodbye': ['bye', 'farewell'], 'thankyou': ['thank', 'thx']}
# Iterate over the keywords dictionary
for intent, keys in keywords.items():
    # Create regular expressions and compile them into pattern objects
    patterns[intent] = re.compile('|'.join(keys))

# Print the patterns
print(patterns)

{'greet': re.compile('hello|hi|hey'), 'goodbye': re.compile('bye|farewell'), 'thankyou': re.compile('thank|thx')}


### Intent classification with regex II

With your patterns dictionary created, it's now time to define a function to find the intent of a message.

### Instructions

Iterate over the intents and patterns in the patterns dictionary using its .items() method.

Use the .search() method of pattern to look for keywords in the message.

If there is a match, return the corresponding intent.

Call your match_intent() function inside respond() with message as the argument and then hit 'Submit Answer' to see how the bot responds to the provided messages.

In [None]:
# Define a function to find the intent of a message
def match_intent(message):
    matched_intent = None
    for intent, pattern in patterns.items():
        # Check if the pattern occurs in the message 
        if pattern.search(message):
            matched_intent = intent
    return matched_intent

# Define a respond function
def respond(message):
    # Call the match_intent function
    intent = match_intent(message)
    # Fall back to the default response
    key = "default"
    if intent in responses:
        key = intent
    return responses[key]

# Send messages
send_message("hello!")
send_message("bye byeee")
send_message("thanks very much!")
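The cell above relies on `responses` and `send_message()`, which the exercise environment pre-defines and which are not shown here. The following self-contained sketch substitutes hypothetical stand-ins for both (the reply strings and the printed format are assumptions) so the intent matching can be run end to end.

```python
import re

keywords = {
    "greet": ["hello", "hi", "hey"],
    "goodbye": ["bye", "farewell"],
    "thankyou": ["thank", "thx"],
}
# Hypothetical replies; the real exercise supplies its own dictionary.
responses = {
    "greet": "Hello you! :)",
    "goodbye": "goodbye for now",
    "thankyou": "you are very welcome",
    "default": "default message",
}
patterns = {intent: re.compile("|".join(keys)) for intent, keys in keywords.items()}

def match_intent(message):
    """Return the last intent whose pattern occurs in message, or None."""
    matched_intent = None
    for intent, pattern in patterns.items():
        if pattern.search(message):
            matched_intent = intent
    return matched_intent

def respond(message):
    # Fall back to the default response when no intent matched.
    return responses.get(match_intent(message), responses["default"])

def send_message(message):
    """Hypothetical stand-in: print the user message and the bot reply."""
    print(f"USER : {message}")
    print(f"BOT : {respond(message)}")

for text in ["hello!", "bye byeee", "thanks very much!", "what time is it?"]:
    send_message(text)
```

Because the patterns have no word boundaries, a message such as "this is fine" would still be classified as `greet` through the substring "hi".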

### Entity extraction with regex

Now you'll use another simple method, this time for finding a person's name in a sentence, such as "hello, my name is David Copperfield".

You'll look for the keywords "name" or "call(ed)", and find capitalized words using regex and assume those are names. Your job in this exercise is to define a find_name() function to do this.

### Instructions

Use re.compile() to create a pattern for checking if "name" or "call" keywords occur.

Create a pattern for finding capitalized words.

Use the .findall() method on name_pattern to retrieve all matching words in message.

Call your find_name() function inside respond() and then hit 'Submit Answer' to see how the bot responds to the provided messages.
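One possible way to follow these instructions is sketched below. It is not the exercise's reference solution: the variable name `name_keyword`, the reply strings in `respond()`, and the test messages are assumptions.

```python
import re

def find_name(message):
    """Return the capitalized words in message if a name keyword occurs, else None."""
    name = None
    # Keyword check: "name" or "call" (which also covers "called").
    name_keyword = re.compile(r"name|call")
    # One upper case letter followed by zero or more lower case letters.
    name_pattern = re.compile(r"[A-Z][a-z]*")
    if name_keyword.search(message):
        name_words = name_pattern.findall(message)
        if name_words:
            name = " ".join(name_words)
    return name

def respond(message):
    # Hypothetical replies, for illustration only.
    name = find_name(message)
    if name is None:
        return "Hi there!"
    return f"Hello, {name}!"

print(respond("hello, my name is David Copperfield"))  # Hello, David Copperfield!
print(respond("call me Ishmael"))                      # Hello, Ishmael!
print(respond("People call me Cassandra"))             # Hello, People Cassandra!
```

The last message shows the limit of the heuristic: every capitalized word is treated as part of the name, including the first word of the sentence.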
