# Building a Simple Chatbot from Scratch in Python (using NLTK)

Based on the tutorial: https://medium.com/analytics-vidhya/building-a-simple-chatbot-in-python-using-nltk-7c8c8215ac6e

In [3]:
import nltk
import numpy as np
import random
import string

In [10]:
nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /Users/jeong-
[nltk_data]     ugim/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/jeong-
[nltk_data]     ugim/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [7]:
# Read the corpus and lowercase it so that matching is case-insensitive
with open('chatbot.txt', 'r', errors='ignore') as f:
    raw = f.read()
raw = raw.lower()
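
The sample run at the end of this notebook returns fragments such as `[5]` and `background[edit]`, which suggests the corpus still carries citation markers and edit links from the page it was copied from. A minimal sketch of a cleaning step, assuming those two patterns are the only residue to remove; `clean_corpus` is a hypothetical helper, not part of the original tutorial:

```python
import re

def clean_corpus(text):
    """Remove numeric citation markers and '[edit]' links (assumed residue)."""
    text = re.sub(r"\[\d+\]", "", text)   # citation markers such as [5]
    text = text.replace("[edit]", "")      # section edit links
    return text

sample = ("background[edit]\n"
          "in 1950, alan turing's article was published,[6] which proposed a test.")
cleaned = clean_corpus(sample)
print(cleaned)
```

If this step were used, it would be applied to `raw` before tokenizing.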

In [19]:
# converts to list of sentences
sent_tokens = nltk.sent_tokenize(raw)
# converts to list of words
word_tokens = nltk.word_tokenize(raw)

In [30]:
# pre-processing the raw text
lemmer = nltk.stem.WordNetLemmatizer()

def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) 
                         for punct in string.punctuation)

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
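
To see what the punctuation table does on its own, the sketch below applies the same `translate` step to a sample string without the NLTK tokenizer or lemmatizer. `strip_punctuation` is a hypothetical helper name used only for illustration:

```python
import string

# Map every punctuation code point to None so translate() deletes it
remove_punct_dict = {ord(punct): None for punct in string.punctuation}

def strip_punctuation(text):
    """Lowercase the text and delete all ASCII punctuation."""
    return text.lower().translate(remove_punct_dict)

print(strip_punctuation("Hello, world! What's up?"))  # prints: hello world whats up
```

Note that apostrophes are deleted too, so "what's" becomes "whats" before lemmatization.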

In [31]:
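# keyword matching
# Note: input is matched one word at a time, so a multi-word entry
# such as "what's up" can never match as written.
GREETING_INPUTS = ("hello", "hi", "greetings", "sup",
                   "what's up", "hey",)

GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there",
                      "hello", "I am glad you are talking to me!"]

def greeting(sentence):
    """Return a random greeting if any word is a known greeting, else None."""
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)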
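
Because the greeting check splits on whitespace only, an input such as "Hello!" is compared as the word "hello!" and does not match. One possible variant, sketched here under the assumption that stripping punctuation first is acceptable, is shown below; `greeting_normalized` and the shortened word lists are illustrative and not part of the original tutorial:

```python
import random
import string

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello"]

# Translation table that deletes all ASCII punctuation
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def greeting_normalized(sentence):
    """Return a random greeting if any punctuation-free word is a greeting, else None."""
    cleaned = sentence.lower().translate(_PUNCT_TABLE)
    for word in cleaned.split():
        if word in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None

print(greeting_normalized("Hello!"))
print(greeting_normalized("What is ELIZA?"))  # prints: None
```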

In [32]:
# Generating Response

# To convert a collection of raw documents to a matrix of TF-IDF features
from sklearn.feature_extraction.text import TfidfVectorizer
# Cosine similarity module
from sklearn.metrics.pairwise import cosine_similarity

# We compare the user's sentence with every sentence in the corpus
# and reply with the most similar one

def response(user_response):
    # Temporarily add the query as the last "document"; the main loop
    # removes it again after printing the reply
    sent_tokens.append(user_response)

    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize,
                              stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    # Similarity of the query (last row) to every sentence, itself included
    vals = cosine_similarity(tfidf[-1], tfidf)
    # The top score is the query matched with itself, so take the runner-up
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]

    if req_tfidf == 0:
        # No corpus sentence shares any weighted term with the query
        return "I am sorry! I don't understand you"
    return sent_tokens[idx]
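
To make the retrieval step easier to follow, here is a pure-Python sketch of the same idea: weight terms by TF-IDF, L2-normalise each vector, and pick the corpus sentence with the highest cosine similarity to the query. It is a simplification, not the scikit-learn implementation: tokenization is a plain whitespace split, there is no stop-word removal or lemmatization, and the query is excluded by slicing rather than by taking the second-highest score. The helper names and the three-sentence corpus are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed idf: ln((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors

def cosine(a, b):
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def best_match(corpus, query):
    """Return (index, score) of the corpus sentence most similar to the query."""
    vectors = tfidf_vectors(corpus + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

corpus = [
    "eliza was an early chatbot",
    "the turing test measures machine intelligence",
    "chatbots are used in customer support",
]
index, score = best_match(corpus, "who created eliza")
print(corpus[index])  # prints: eliza was an early chatbot
```

A score of zero means the query shares no term with any corpus sentence, which corresponds to the "I don't understand you" branch.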


In [33]:
flag = True
print("ROBO: My name is Robo. I will answer your queries about Chatbots. "
      "If you want to exit, type Bye!")

while flag:
    user_response = input()
    user_response = user_response.lower()
    if user_response != 'bye':
        if user_response == 'thanks' or user_response == 'thank you':
            flag = False
            print("ROBO: You are welcome..")
        else:
            # Call greeting() once so the check and the reply use the same result
            greeting_reply = greeting(user_response)
            if greeting_reply is not None:
                print("ROBO: " + greeting_reply)
            else:
                print("ROBO: ", end="")
                print(response(user_response))
                # response() appended the query to sent_tokens; take it out again
                sent_tokens.remove(user_response)
    else:
        flag = False
        print("ROBO: Bye! Take care..")

ROBO: My name is Robo. I will answer your queries about Chatbots. If you want to exit, type Bye!
What is ELIZA?
ROBO: what is eliza?
yes
ROBO: I am sorry! I don't understand you
Describe chatbot design?
ROBO: [3][4] chatbots can be classified into usage categories such as conversational commerce (e-commerce via chat), analytics, communication, customer support, design, developer tools, education, entertainment, finance, food, games, health, hr, marketing, news, personal, productivity, shopping, social, sports, travel and utilities.
Who was Alan Turing?
ROBO: [5]


contents
1	background
2	development
3	application
3.1	messaging apps
3.1.1	as part of company apps and websites
3.2	company internal platforms
3.3	toys
3.4	chatbots in medicine and for mental health
4	chatbot development platforms
5	malicious use
6	see also
7	citations
8	references
background[edit]
in 1950, alan turing's famous article "computing machinery and intelligence" was published,[6] which proposed what is now ca