# Building intelligent bots. Solution


This notebook contains the solutions for the **Building intelligent bots** O'Reilly training. The solutions are grouped into three sections: rule-based, retrieval-based, and generative.

## Rule-based: Levenshtein distance

In [1]:
welcome = "Hi! I'm Arthur, the customer support chatbot. How can I help you?"

questions = (
    "The app is freezing after I click run button",
    "I don't know how to proceed with the invoice",
    "I get an error when I try to install the app",
    "It crashes after I have updated it",
    "I cannot login in to the app",
    "I'm not able to download it"
)

answers = (
    "You need to clean up the cache. Please go to ...",
    "Please go to Setting, next Subscriptions and there is the Billing section",
    "Could you please send the log files placed in ... to ...",
    "Please restart your PC",
    "Use the forgot password button to setup a new password",
    "Probably you have an ad blocker plugin installed and it blocks the popup with the download link"
)

In [4]:
import jellyfish

similarity_threshold = 0.3

def levenshtein_similarity(sentence1, sentence2):
    """Return a similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(sentence1), len(sentence2))
    if longest == 0:
        return 1.0  # two empty strings are identical
    distance = jellyfish.levenshtein_distance(sentence1, sentence2)
    return 1.0 - distance / longest

def get_highest_similarity(customer_question):
    # Find the known question that is most similar to the customer's question.
    max_similarity = 0
    highest_index = 0
    for question_id in range(len(questions)):
        similarity = levenshtein_similarity(customer_question, questions[question_id])

        if similarity > max_similarity:
            highest_index = question_id
            max_similarity = similarity
    # Answer only when the best match is close enough; otherwise fall back.
    if max_similarity > similarity_threshold:
        return answers[highest_index]
    else:
        return "The issue has been saved. We will contact you soon."

In [5]:
def run_chatbot():
    print(welcome)
    while True:
        question = input()
        if question == "thank you":
            break  # end the conversation without looking up an answer
        answer = get_highest_similarity(question)
        print(answer)

run_chatbot()

Hi! I'm Arthur, the customer support chatbot. How can I help you?
cannot login
Use the forgot password button to setup a new password
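
To see what `jellyfish.levenshtein_distance` contributes to the score, here is a minimal sketch of the edit distance in plain Python, together with the same length normalization used in the cell above. The names `edit_distance` and `normalized_similarity` are illustrative and not part of jellyfish; the library's own implementation may differ in its details.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (illustrative helper)."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]

def normalized_similarity(a, b):
    """1.0 for identical strings, 0.0 when every character has to change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest

print(edit_distance("kitten", "sitting"))  # 3
print(round(normalized_similarity("cannot login", "I cannot login in to the app"), 2))  # 0.43
```

The second value shows why the bot answers `cannot login`: the input is a substring of the stored question, so the distance is the 16 extra characters out of 28, and the similarity of about 0.43 is above the 0.3 threshold.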



## Rule-based: bm25 ranking

In [14]:
import sqlite3

conn = sqlite3.connect('rules.sqlite')

# (question, answer) pairs that are loaded into the full-text index.
rules = (
    ("The app is freezing after I click run button", "You need to clean up the cache. Please go to ..."),
    ("I don't know how to proceed with the invoice", "Please go to Setting, next Subscriptions and there is the Billing section"),
    ("I get an error when I try to install the app", "Could you please send the log files placed in ... to ..."),
    ("It crashes after I have updated it", "Please restart your PC"),
    ("I cannot login in to the app", "Use the forgot password button to setup a new password"),
    ("I'm not able to download it", "Probably you have an ad blocker plugin installed and it blocks the popup with the download link"),
)

def db_setup():
    # Recreate the FTS5 table so the cell can be re-run without errors.
    conn.execute("DROP TABLE IF EXISTS rules;")
    conn.execute("CREATE VIRTUAL TABLE rules USING fts5(question, answer);")
    # Bind the values as parameters instead of embedding them in the SQL text.
    conn.executemany("INSERT INTO rules(question, answer) VALUES (?, ?);", rules)
    conn.commit()

db_setup()

In [35]:
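import re

def bm25(question):
    # Keep only alphanumeric tokens and quote each one so that user input
    # cannot break the FTS5 query syntax; match every token in the question column.
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    if not tokens:
        return []
    match_expr = " ".join('question : "{}"'.format(token) for token in tokens)
    cur = conn.cursor()
    # bm25() gives better matches lower (more negative) values,
    # so ascending order puts the best match first.
    cur.execute("SELECT answer, bm25(rules) FROM rules WHERE rules MATCH ? ORDER BY bm25(rules) LIMIT 1;", (match_expr,))
    return cur.fetchall()

def get_highest_similarity(customer_question):
    return bm25(customer_question)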

In [36]:
def run_chatbot():
    print(welcome)
    while True:
        question = input()
        if question == "thank you":
            break  # end the conversation without querying the index
        answer = get_highest_similarity(question)
        print(answer)

run_chatbot()

Hi! I'm Arthur, the customer support chatbot. How can I help you?
cannot login
[('Use the forgot password button to setup a new password', -2.7155014368322457)]
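
The negative score in the output comes from SQLite's convention: the FTS5 `bm25()` function returns the score multiplied by -1, so that better matches sort first in ascending order. As a rough illustration of what a BM25 score measures, here is a plain-Python sketch of the Okapi BM25 formula. The helper names (`tokenize`, `bm25_scores`), the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are assumptions made for this sketch; it is not meant to reproduce SQLite's exact numbers.

```python
import math
import re

def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a simplification).
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query; higher means a better match."""
    docs = [tokenize(document) for document in documents]
    n_docs = len(docs)  # assumes at least one document
    avg_len = sum(len(doc) for doc in docs) / n_docs
    scores = [0.0] * n_docs
    for term in set(tokenize(query)):
        n_hits = sum(1 for doc in docs if term in doc)
        if n_hits == 0:
            continue
        # Smoothed inverse document frequency: rare terms weigh more.
        idf = math.log((n_docs - n_hits + 0.5) / (n_hits + 0.5) + 1.0)
        for i, doc in enumerate(docs):
            tf = doc.count(term)
            if tf:
                # Term frequency saturates and is normalized by document length.
                norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
                scores[i] += idf * tf * (k1 + 1) / norm
    return scores

sample_questions = [
    "I cannot login in to the app",
    "I get an error when I try to install the app",
    "It crash after I have updated it",
]
scores = bm25_scores("cannot login", sample_questions)
best = max(range(len(scores)), key=scores.__getitem__)
print(sample_questions[best])
```

Only the first sample question contains the query terms, so it receives a positive score and the other two score zero.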



## Retrieval-based: Rasa

In [37]:
anna_common_examples = """
{
  "rasa_nlu_data": {
    "entity_synonyms": [
      {
        "value": "candidate",
        "synonyms": ["developer", "data scientist"]
      },
      {
        "value": "failed",
        "synonyms": ["failed", "decline","went badly"]      
      },
      {
        "value": "passed",
        "synonyms": ["went well", "passed","excellent"]
      }      
    ],
    "common_examples": [
      {
        "text": "the candidate passed the interview",
        "intent": "change_status",
        "entities": [
            {
      "start": 14,
      "end": 20,
      "value": "passed",
      "entity": "passed"
        }
        ]
      },     
      {
        "text": "the candidate is excellent",
        "intent": "change_status",
        "entities": [
            {
      "start": 17,
      "end": 26,
      "value": "excellent",
      "entity": "passed"
        }
        ]
      },    
      {
        "text": "the interview went well",
        "intent": "change_status",
        "entities": [
            {
      "start": 14,
      "end": 23,
      "value": "went well",
      "entity": "passed"
        }
        ]
      },       
      {
        "text": "the interview went badly",
        "intent": "change_status",
        "entities": [
            {
      "start": 14,
      "end": 24,
      "value": "went badly",
      "entity": "failed"
        }
        ]
      },
      {
        "text": "the candidate failed",
        "intent": "change_status",
        "entities": [
            {
      "start": 14,
      "end": 20,
      "value": "failed",
      "entity": "failed"
        }
        ]
      }, 
      {
        "text": "we need to decline this candidate",
        "intent": "change_status",
        "entities": [
            {
      "start": 11,
      "end": 18,
      "value": "decline",
      "entity": "failed"
        }
        ]
      },      
      {
        "text": "hey", 
        "intent": "greet", 
        "entities": []
      }, 
      {
        "text": "howdy", 
        "intent": "greet", 
        "entities": []
      }, 
      {
        "text": "hey there",
        "intent": "greet", 
        "entities": []
      }, 
      {
        "text": "hello", 
        "intent": "greet", 
        "entities": []
      }, 
      {
        "text": "hi", 
        "intent": "greet", 
        "entities": []
      },
      {
        "text": "good morning",
        "intent": "greet",
        "entities": []
      },
      {
        "text": "good evening",
        "intent": "greet",
        "entities": []
      },
      {
        "text": "dear sir",
        "intent": "greet",
        "entities": []
      },
      {
        "text": "yes", 
        "intent": "affirm", 
        "entities": []
      }, 
      {
        "text": "yep", 
        "intent": "affirm", 
        "entities": []
      }, 
      {
        "text": "yeah", 
        "intent": "affirm", 
        "entities": []
      },
      {
        "text": "indeed",
        "intent": "affirm",
        "entities": []
      },
      {
        "text": "that's right",
        "intent": "affirm",
        "entities": []
      },
      {
        "text": "ok",
        "intent": "affirm",
        "entities": []
      },
      {
        "text": "great",
        "intent": "affirm",
        "entities": []
      },
      {
        "text": "right, thank you",
        "intent": "affirm",
        "entities": []
      },
      {
        "text": "add candidate",
        "intent": "candidate_add",
        "entities": []
      }, 
      {
        "text": "add candidate",
        "intent": "candidate_add",
        "entities": [
            {
      "start": 4,
      "end": 13,
      "value": "candidate",
      "entity": "candidate"
        }
        ]
      },         
      {
        "text": "adding candidate",
        "intent": "candidate_add",
        "entities": [
            {
      "start": 7,
      "end": 16,
      "value": "candidate",
      "entity": "candidate"
        }        
        ]
      },
      {
        "text": "please add candidate",
        "intent": "candidate_add",
        "entities": []
      },              
      {
        "text": "please add new candidate",
        "intent": "candidate_add",
        "entities": []
      },           
      {
        "text": "we have new prescreening upcoming",
        "intent": "candidate_add",
        "entities": []
      }, 
      {
        "text": "we have a new candidate for prescreening",
        "intent": "candidate_add",
        "entities": []
      },         
      {
        "text": "correct",
        "intent": "affirm",
        "entities": []
      },
      {
        "text": "great choice",
        "intent": "affirm",
        "entities": []
      },
      {
        "text": "sounds really good",
        "intent": "affirm",
        "entities": []
      },
      {
        "text": "bye", 
        "intent": "goodbye", 
        "entities": []
      }, 
      {
        "text": "goodbye", 
        "intent": "goodbye", 
        "entities": []
      }, 
      {
        "text": "good bye", 
        "intent": "goodbye", 
        "entities": []
      }, 
      {
        "text": "stop", 
        "intent": "goodbye", 
        "entities": []
      }, 
      {
        "text": "end", 
        "intent": "goodbye", 
        "entities": []
      },
      {
        "text": "farewell",
        "intent": "goodbye",
        "entities": []
      },
      {
        "text": "Bye bye",
        "intent": "goodbye",
        "entities": []
      },
      {
        "text": "have a good one",
        "intent": "goodbye",
        "entities": []
      }
    ]
  }
}
"""

# Write the training examples to disk; the file is closed automatically.
with open("anna_new.json", "w") as training_file:
    training_file.write(anna_common_examples)
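
Rasa NLU's JSON format expects `start` and `end` to be character offsets into `text`, with `text[start:end]` equal to the annotated span. Counting these by hand is error-prone, so a small helper can compute them instead. The function name `entity_annotation` is hypothetical and not part of Rasa; the sketch assumes the span occurs in the text and annotates its first occurrence.

```python
def entity_annotation(text, span, entity):
    """Build an entity dict whose offsets satisfy text[start:end] == span."""
    start = text.find(span)
    if start == -1:
        raise ValueError("{!r} does not occur in {!r}".format(span, text))
    return {"start": start, "end": start + len(span), "value": span, "entity": entity}

text = "the candidate passed the interview"
annotation = entity_annotation(text, "passed", "passed")
print(annotation)
# The offsets slice the annotated word back out of the text.
print(text[annotation["start"]:annotation["end"]])
```

For this sentence the helper yields `start` 14 and `end` 20, since offsets are zero-based and `end` is exclusive.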

In [38]:
from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer

training_data = load_data('anna_new.json')
trainer = Trainer(RasaNLUConfig("config.json"))
trainer.train(training_data)
model_directory = trainer.persist('.')

Fitting 2 folds for each of 6 candidates, totalling 12 fits


[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.1s finished


In [39]:
from rasa_nlu.model import Metadata, Interpreter

interpreter = Interpreter.load(model_directory, RasaNLUConfig("config.json"))

interpreter.parse(u"he failed the interview")



{'entities': [],
 'intent': {'confidence': 0.5677957503564057, 'name': 'change_status'},
 'intent_ranking': [{'confidence': 0.5677957503564057,
   'name': 'change_status'},
  {'confidence': 0.15117949530413521, 'name': 'affirm'},
  {'confidence': 0.1296622284996081, 'name': 'candidate_add'},
  {'confidence': 0.0912076159882022, 'name': 'greet'},
  {'confidence': 0.06015490985164846, 'name': 'goodbye'}],
 'text': 'he failed the interview'}
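
A bot would typically act on the parse result only when the top intent is confident enough. The sketch below assumes the dictionary layout shown in the output above; the function name `pick_intent`, the 0.5 threshold, and the `fallback` label are illustrative choices, not part of Rasa NLU.

```python
def pick_intent(parsed, threshold=0.5):
    """Return the top intent name, or 'fallback' when the confidence is too low."""
    intent = parsed.get("intent") or {}
    if intent.get("confidence", 0.0) >= threshold:
        return intent["name"]
    return "fallback"

# A trimmed copy of the parse result shown above.
parsed = {
    "entities": [],
    "intent": {"confidence": 0.5677957503564057, "name": "change_status"},
    "text": "he failed the interview",
}
print(pick_intent(parsed))                 # change_status
print(pick_intent(parsed, threshold=0.8))  # fallback
```

Where to set the threshold is a design decision: a higher value sends more messages to the fallback path, while a lower one risks acting on a wrong intent.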

## Generative-based: n-gram model



In [40]:
from nltk.book import *

import random
import re

# text7 is the Wall Street Journal text from the NLTK book corpora.
wall_street = text7.tokens

tokens = wall_street

def cleanup():
    # Keep only tokens that start with a letter, a digit, or sentence-ending punctuation.
    compiled_pattern = re.compile("^[a-zA-Z0-9.!?]")
    clean = list(filter(compiled_pattern.match, tokens))
    return clean
tokens = cleanup()

def build_ngrams():
    # Slide a window of N tokens over the corpus (N is set in the next cell).
    ngrams = []
    for i in range(len(tokens) - N + 1):
        ngrams.append(tokens[i:i + N])
    return ngrams

def ngram_freqs(ngrams):
    # Map each (N-1)-token prefix to the counts of the tokens that follow it.
    counts = {}

    for ngram in ngrams:
        token_seq = SEP.join(ngram[:-1])
        last_token = ngram[-1]

        if token_seq not in counts:
            counts[token_seq] = {}

        if last_token not in counts[token_seq]:
            counts[token_seq][last_token] = 0

        counts[token_seq][last_token] += 1

    return counts

def next_word(text, N, counts):
    # Sample the next token in proportion to how often it followed the last N-1 tokens.
    token_seq = SEP.join(text.split()[-(N - 1):])
    choices = counts[token_seq].items()

    total = sum(weight for choice, weight in choices)
    r = random.uniform(0, total)
    upto = 0
    for choice, weight in choices:
        upto += weight
        if upto >= r:
            return choice
    assert False  # should not reach here


*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908


In [94]:
import random

def clean_generated(generated):
    # Capitalize the first letter of every sentence and restore its final period.
    clean = []
    for sentence in generated.split('.'):
        sentence = sentence.strip()
        if sentence:
            clean.append(sentence[0].upper() + sentence[1:] + '.')
    return ' '.join(clean)


N = 5

SEP = " "

sentence_count = 5

ngrams = build_ngrams()

start_seq = "Was named a nonexecutive"

counts = ngram_freqs(ngrams)

if start_seq is None:
    start_seq = random.choice(list(counts.keys()))
generated = start_seq.lower()

sentences = 0
while sentences < sentence_count:
    generated += SEP + next_word(generated, N, counts)
    sentences += 1 if generated.endswith(('.', '!', '?')) else 0


print(clean_generated(generated))

Was named a nonexecutive director of this British industrial conglomerate. A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago researchers reported 0. The asbestos fiber crocidolite is unusually resilient once it enters the lungs with even brief exposures to it causing symptoms that show up decades later researchers said 0. Lorillard Inc. The unit of New York-based Exxon Corp.
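
The same idea can be tried on a toy corpus, which makes the frequency table easy to inspect. This is a minimal bigram sketch that uses `collections.Counter` and `random.choices` from the standard library rather than the helper functions above; the corpus and the names `follow_counts` and `sample_next` are made up for illustration.

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate .".split()

# Count which token follows each token (N = 2).
follow_counts = defaultdict(Counter)
for current_token, next_token in zip(corpus, corpus[1:]):
    follow_counts[current_token][next_token] += 1

def sample_next(token, rng=random):
    """Pick a follower of `token` in proportion to its observed frequency.

    Assumes `token` was seen in the corpus with at least one follower.
    """
    followers = follow_counts[token]
    words = list(followers.keys())
    weights = list(followers.values())
    return rng.choices(words, weights=weights, k=1)[0]

print(dict(follow_counts["the"]))  # {'cat': 2, 'mat': 1}
print(sample_next("the"))          # 'cat' about two times out of three
```

With a large N and a small corpus, most prefixes have a single follower, which is why the generated text above largely reproduces the source sentences.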
