# 10/8 Notebook - Customer Support Chatbot (Part A)

Hello and welcome to this week's notebook! Today, we'll look at how to build our own customizable chatbot

Below are the methods you need to complete for the notebook:
1. `process_words()`
2. `parse_intents()`
3. `build_training_set()`

We'll start by importing our libraries as always. Make sure you run the cell with `pip install nltk`, which installs the `nltk` library we'll be using

In [1]:
pip install nltk

Note: you may need to restart the kernel to use updated packages.


In [2]:
# import our nltk libraries
import nltk
from nltk.stem import WordNetLemmatizer
# install specific downloads
nltk.download('punkt', quiet=True)
# newer NLTK releases look for 'punkt_tab' when tokenizing
nltk.download('punkt_tab', quiet=True)
nltk.download('wordnet', quiet=True)

True

In [3]:
# other useful libraries (numpy == 🐐)
import numpy as np
import random
import json

## Part 1: Modify your intents

The great part about this chatbot is that it is fully customizable! Edit `intents.json` to your liking to create your own bot. Make sure that each `intent` has the fields `tag`, `patterns`, and `responses` filled out

You can look at my file, `taco-bell-intents.json`, for reference

Once you're done, you can continue to run the cells below!

**Note: if you're having JSON formatting issues in the next cell, use [this link](https://jsonlint.com) to validate your JSON**
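If you're not sure what a valid file looks like, here is a minimal sketch. The tags and phrases in `sample_json` are made up for illustration, and `find_missing_fields()` is a hypothetical helper (not part of the notebook) that reports any intent missing one of the three required fields

```python
import json

# Hypothetical minimal intents document; the tags and phrases are illustrative only
sample_json = """
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hi there", "Hello"],
      "responses": ["Hi there, how can I help?"]
    },
    {
      "tag": "noanswer",
      "patterns": [],
      "responses": ["Not sure I understand"]
    }
  ]
}
"""

REQUIRED_FIELDS = ("tag", "patterns", "responses")

def find_missing_fields(intents):
    """Return (intent index, field name) pairs for every required field that is absent."""
    missing = []
    for i, intent in enumerate(intents["intents"]):
        for field in REQUIRED_FIELDS:
            if field not in intent:
                missing.append((i, field))
    return missing

sample_intents = json.loads(sample_json)
problems = find_missing_fields(sample_intents)
print(problems)  # an empty list means every intent has all three fields
```

Note that an intent may have an empty `patterns` list (like `noanswer` above); the field still needs to be present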

In [4]:
# open the file in a with-block so it is closed automatically
with open("intents.json") as data_file:
    intents = json.load(data_file)
# when you print, you should see your JSON
print(intents)

{'intents': [{'tag': 'greeting', 'patterns': ['Hi there', 'How are you', 'Is anyone there?', 'Hey', 'Hola', 'Hello', 'Good day'], 'responses': ['Hello, thanks for asking', 'Good to see you again', 'Hi there, how can I help?']}, {'tag': 'goodbye', 'patterns': ['Bye', 'See you later', 'Goodbye', 'Nice chatting to you, bye', 'Till next time'], 'responses': ['See you!', 'Have a nice day', 'Bye! Come back again soon.']}, {'tag': 'thanks', 'patterns': ['Thanks', 'Thank you', "That's helpful", 'Awesome, thanks', 'Thanks for helping me'], 'responses': ['Happy to help!', 'Any time!', 'My pleasure']}, {'tag': 'noanswer', 'patterns': [], 'responses': ["Sorry, can't understand you", 'Please give me more info', 'Not sure I understand']}, {'tag': 'options', 'patterns': ['How you could help me?', 'What you can do?', 'What help you provide?', 'How you can be helpful?', 'What support is offered'], 'responses': ["I can direct you to your nearest Taco Bell, send you contact information, give you a recomm

## Part 2: Parsing the JSON

We'll practice a common first step in any NLP project: data cleaning

First, complete the function `process_words()` which will clean up our words according to the following steps:
1. Get the tokens using `nltk.word_tokenize()`
2. Set `cleaned_word` equal to the lemmatized, lowercased version of the word

**Note: Make sure you run the cell immediately below this first; it stores values needed in `process_words()`**

In [5]:
# declare needed variables for process_words()
ignore_punctuation = ["?", "!", ".", ","]
lemmatizer = WordNetLemmatizer()

In [6]:
def process_words(pattern):
    # return variable
    words = []
    # get the tokens using nltk
    tokens = nltk.word_tokenize(pattern)
    for word in tokens:
        # check if the word should be ignored
        if word not in ignore_punctuation and word.isalnum():
            # clean the word and add it to the list
            cleaned_word = lemmatizer.lemmatize(word.lower())
            words.append(cleaned_word)
    # return the list
    return words
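To see what the filter inside `process_words()` keeps and drops, here is a small sketch that does not need `nltk`. The list `sample_tokens` is written by hand and only stands in for what a tokenizer might return for "That's helpful!" (an assumption), and lemmatization is skipped

```python
# Hand-written tokens standing in for tokenizer output (assumption, not real nltk output)
sample_tokens = ["That", "'s", "helpful", "!"]
punctuation = ["?", "!", ".", ","]

# Same condition as in process_words(): skip listed punctuation and any non-alphanumeric token
kept = [t.lower() for t in sample_tokens if t not in punctuation and t.isalnum()]
print(kept)  # ['that', 'helpful']
```

The `isalnum()` check also drops tokens such as `'s`, because the apostrophe is not alphanumeric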

In [7]:
# ADD CELL FOR TESTING

Now that we have `process_words()` to clean our words, we can parse the data from our JSON

Complete the method `parse_intents()` which parses our JSON according to the following steps:
1. Set the value of `tag` from our `intent`
2. Set `tokenized_words` using the helper method `process_words()`
3. Append a tuple of `tokenized_words` and `tag` to `tag_tokens`

In [8]:
def parse_intents(intents):
    # declare our needed variables
    tags = []
    all_words = []
    tag_tokens = []
    # iterate through each intent
    for intent in intents["intents"]:
        # if the intent has no patterns, we can skip
        if not intent["patterns"]:
            continue
        # add the tag to the list of tags
        tag = intent["tag"]
        tags.append(tag)
        # iterate through each pattern
        for pattern in intent["patterns"]:
            # create our tokenized words
            tokenized_words = process_words(pattern)
            # add all the tokenized words to our words
            all_words.extend(tokenized_words)
            # adds a tuple -> (list of tokens, tag) -> to the list
            tag_tokens.append((tokenized_words, tag))
    # return our values in a tuple
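    # dtype=object keeps the (list of tokens, tag) pairs intact; the token lists
    # have different lengths, which recent NumPy versions reject without it
    return (np.array(tags), np.array(all_words), np.array(tag_tokens, dtype=object))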

We can do this cool trick below to remove all duplicates from our arrays (and sort them)

In [9]:
tags, all_words, tag_tokens = parse_intents(intents)

tags = np.array(sorted(list(set(tags))))
all_words = np.array(sorted(list(set(all_words))))
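Here is the same dedupe-and-sort idea on a tiny list, as a sketch. The words in `raw_words` are made up for illustration

```python
# Toy list with repeats; the words are illustrative only
raw_words = ["hi", "there", "hi", "bye", "there"]

# set() drops the duplicates, sorted() returns them as an alphabetized list
unique_sorted = sorted(set(raw_words))
print(unique_sorted)  # ['bye', 'hi', 'there']
```

NumPy's `np.unique()` also returns the sorted unique values of an array, so it could be used for the same purpose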

Run the cell below and take a quick look to make sure that everything makes sense

In [10]:
print("Tags: {0}".format(tags))
print("------")
print("All Words: {0}".format(all_words))
print("------")
print("Tag-Token Mappings {0}".format(tag_tokens))

Tags: ['contact' 'deals' 'directions' 'fact' 'goodbye' 'greeting' 'options'
 'recommendation' 'thanks']
------
All Words: ['a' 'any' 'anyone' 'anything' 'are' 'awesome' 'be' 'best' 'bye' 'call'
 'can' 'chatting' 'contact' 'could' 'daily' 'day' 'deal' 'direction'
 'discount' 'do' 'eat' 'fact' 'find' 'for' 'fun' 'get' 'give' 'good'
 'goodbye' 'have' 'hello' 'help' 'helpful' 'helping' 'hey' 'hi' 'hola'
 'how' 'i' 'information' 'is' 'item' 'later' 'located' 'location' 'me'
 'menu' 'new' 'next' 'nice' 'number' 'of' 'offered' 'on' 'phone' 'provide'
 'recommendation' 'see' 'should' 'something' 'special' 'support' 'tell'
 'thank' 'thanks' 'that' 'the' 'there' 'till' 'time' 'to' 'today' 'what'
 'where' 'you' 'your']
------
Tag-Token Mappings [[list(['hi', 'there']) 'greeting']
 [list(['how', 'are', 'you']) 'greeting']
 [list(['is', 'anyone', 'there']) 'greeting']
 [list(['hey']) 'greeting']
 [list(['hola']) 'greeting']
 [list(['hello']) 'greeting']
 [list(['good', 'day']) 'greeting']
 [list(['b

## Part 3: Creating a Training Set

We know from previous lessons that the computer can't train a model without numeric values. To solve this, we'll use the `bag of words` technique we discussed in the Google Sheets
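As a quick refresher, here is a minimal sketch of the idea on a toy example. The `vocabulary` and `pattern_tokens` values are made up for illustration; in the notebook, `all_words` plays the role of the vocabulary

```python
# Toy vocabulary and tokenized pattern; both are made up for illustration
vocabulary = ["day", "good", "hello", "hi", "there"]
pattern_tokens = ["hi", "there"]

# One slot per vocabulary word: 1 if the word occurs in the pattern, else 0
bag = [int(word in pattern_tokens) for word in vocabulary]
print(bag)  # [0, 0, 0, 1, 1]
```

Every pattern becomes a list of the same length as the vocabulary, which is what lets us stack them into one numeric training array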

Complete the method `build_training_set()` below, which performs the following steps:
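1. For each tag-token mapping, build a bag: a list with one entry per word in `all_words`, set to 1 if the word appears in the mapping's tokens and 0 otherwise
2. Append the bag to `train_x`
3. Append a one-hot list to `train_y`, with a 1 at the position of the mapping's tag in `tags` and 0 everywhere else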

In [23]:
def build_training_set(tags, all_words, tag_tokens):
    # define our variables to return
    train_x = []
    train_y = []
        
    # iterate through each tag-token mapping
    for tag_token in tag_tokens:
        
        # grab our needed values
        tokens = tag_token[0]
        tag = tag_token[1]
        
        # reset our current bag
        current_bag = []
    
        for word in all_words:
            # add 1 if the word appears in this pattern's tokens, otherwise 0
            current_bag.append(int(word in tokens))
            
        # update our training inputs
        train_x.append(current_bag)
        
        # one-hot encode the tag: 1 at the position of the matching tag, 0 elsewhere
        train_y.append((tags == tag).astype(int))
    
    # return our values
    return (np.array(train_x), np.array(train_y))

In [24]:
train_x, train_y = build_training_set(tags, all_words, tag_tokens)
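Before moving on, it can help to sanity-check the result. The sketch below uses a hypothetical helper, `check_training_set()` (not part of the notebook), and toy data to confirm that every input row has one entry per word and every output row is one-hot

```python
def check_training_set(train_x, train_y, n_words, n_tags):
    """Hypothetical helper: True if all rows have the expected length and each label row is one-hot."""
    if len(train_x) != len(train_y):
        return False
    for bag, label in zip(train_x, train_y):
        # each bag needs one slot per word, each label one slot per tag
        if len(bag) != n_words or len(label) != n_tags:
            return False
        # a one-hot row contains only 0s and 1s, with exactly one 1
        if any(v not in (0, 1) for v in label) or sum(label) != 1:
            return False
    return True

# Toy data for illustration: 2 patterns, 5 vocabulary words, 2 tags
toy_x = [[0, 0, 0, 1, 1], [0, 1, 0, 0, 0]]
toy_y = [[0, 1], [1, 0]]
print(check_training_set(toy_x, toy_y, n_words=5, n_tags=2))  # True
```

On the real data, the call would be `check_training_set(train_x, train_y, len(all_words), len(tags))`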