# Jarvis the Chatbot

### Steps:
### 1. Import the corpus: the training text the chatbot draws its answers from
### 2. Preprocess the data: clean the text (e.g. strip punctuation)
### 3. Text case handling: convert all text to one case (lowercase)
### 4. Tokenization: split the text into sentences and words
### 5. Stemming/lemmatization: reduce each word to its root form so related words match (this notebook uses WordNet lemmatization)
### 6. Bag of words: convert words to numbers by encoding each sentence as a vector (TF-IDF weighted in this notebook)
### 7. Vector matching: pass the vectors to the similarity algorithm (cosine similarity in this notebook)
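
As a rough illustration of steps 3, 4 and 6, the sketch below lowercases two toy sentences, splits them into words, and turns each sentence into a count vector. It uses only the standard library; the names `tokenize`, `bag_of_words` and `toy_sentences` are made up for this example and do not appear elsewhere in the notebook, which relies on nltk and scikit-learn instead.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(sentences):
    """Return (vocabulary, vectors) with one count vector per sentence."""
    tokenized = [tokenize(s) for s in sentences]
    # The vocabulary is the sorted set of all distinct words.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Position i of the vector holds the count of vocabulary[i].
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

# Hypothetical toy data, not taken from the training corpus.
toy_sentences = ["Small talk is informal.", "Talk is cheap, talk more."]
vocabulary, vectors = bag_of_words(toy_sentences)
print(vocabulary)  # ['cheap', 'informal', 'is', 'more', 'small', 'talk']
print(vectors)     # [[0, 1, 1, 0, 1, 1], [1, 0, 1, 1, 0, 2]]
```

Each sentence becomes a fixed-length list of numbers, which is the form the similarity step needs.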


In [1]:
#import libraries
import numpy as np
import nltk
import string
import random

In [2]:
#read the corpus and convert it to lowercase; the with-block closes the file afterwards
with open('chatbot_train_data.txt','r', errors='ignore') as f:
    corpus=f.read().lower()
nltk.download('punkt') #pre-trained Punkt sentence tokenizer
nltk.download('wordnet') #WordNet dictionary, used later for lemmatization
sent_tokens=nltk.sent_tokenize(corpus) #splits the corpus into sentences
word_tokens=nltk.word_tokenize(corpus) #splits the corpus into words

[nltk_data] Downloading package punkt to C:\Users\Sharvari
[nltk_data]     Pradhan\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to C:\Users\Sharvari
[nltk_data]     Pradhan\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [3]:
sent_tokens[:3] #first 3 sentences

['small talk\nfrom wikipedia, the free encyclopedia\njump to navigationjump to search\nthis article is about the type of discourse.',
 'for other uses, see small talk (disambiguation).',
 '"chit chat" redirects here.']

In [4]:
word_tokens[:5] #printing first 5 words

['small', 'talk', 'from', 'wikipedia', ',']

In [5]:
#preprocess data
#WordNet (downloaded through nltk) backs the lemmatizer, which maps each word to its dictionary form

lemmer=nltk.stem.WordNetLemmatizer()

def LemTokens(tokens):
    return [lemmer.lemmatize(i) for i in tokens]

#translation table that deletes every punctuation character
remove_punctuations=dict((ord(punct),None) for punct in string.punctuation)

def LemNormalize(text):
    #lowercase, strip punctuation, tokenize, then lemmatize
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punctuations)))
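
To make the punctuation step concrete, here is a small standard-library sketch of what the translation table does on its own, without the nltk tokenizer or lemmatizer. The helper name `strip_punctuation` and the sample string are assumptions for illustration only.

```python
import string

# Same table as in the notebook: map every punctuation code point to None (delete it).
remove_punctuations = dict((ord(punct), None) for punct in string.punctuation)

def strip_punctuation(text):
    """Lowercase the text and delete all ASCII punctuation characters."""
    return text.lower().translate(remove_punctuations)

sample = "Hello, world! It's small-talk."
cleaned = strip_punctuation(sample)
print(cleaned)  # hello world its smalltalk

# str.maketrans builds an equivalent table in a single call.
assert cleaned == sample.lower().translate(str.maketrans('', '', string.punctuation))
```

Note that deleting punctuation joins hyphenated words ("small-talk" becomes "smalltalk") and drops apostrophes, which can matter when matching against the corpus.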

In [6]:
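greet_inputs=['hey','hi','sup','whats up','hello']
greet_response=['hey','hi','sup','whats up','hello','hi. how are you', 'hi nice to meet you']

def greet(greeting):
    #return a random greeting if the input contains a known greeting, otherwise None
    text=greeting.lower()
    words=text.split()
    for phrase in greet_inputs:
        #single words must match a whole token; multi-word phrases such as 'whats up' are matched as substrings
        if phrase in words or (' ' in phrase and phrase in text):
            return random.choice(greet_response)
    return None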


In [7]:
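#response generation
#tf-idf (term frequency-inverse document frequency): term frequency counts how often a word occurs in a sentence; inverse document frequency down-weights words that occur in many sentences, so rarer words carry more weight
#cosine similarity: the cosine of the angle between two vectors; it is normalized for vector length and tells us how similar two sentences are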
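
The two ideas above can be worked through by hand on a toy corpus. The sketch below is a minimal standard-library version that assumes the textbook weighting tf × log(N/df); scikit-learn's `TfidfVectorizer` uses a smoothed variant with L2 normalization, so its numbers will differ. The names `tfidf_vectors`, `cosine` and `docs` are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocab, vectors) using tf * log(N / df) as the weight."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] / len(toks) * idf[t] for t in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus, not taken from the training data.
docs = [
    "small talk is polite",
    "small talk is informal conversation",
    "the weather is cold",
]
vocab, vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # positive: 'small' and 'talk' are shared
print(cosine(vectors[0], vectors[2]))  # 0.0: only 'is' is shared, and its idf is 0
```

The word "is" appears in every sentence, so its inverse document frequency is log(3/3) = 0 and it contributes nothing to the match.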

In [8]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [9]:
def response(user_response):
    #assumes user_response has already been appended as the last element of sent_tokens
    vector=TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf=vector.fit_transform(sent_tokens)
    vals=cosine_similarity(tfidf[-1],tfidf) #similarity of the user's sentence to every sentence
    idx=vals.argsort()[0][-2] #best match other than the query itself, which always ranks first
    req=vals[0][idx] #similarity score of that best match

    if req==0:
        return "I don't understand this" #no sentence shares any term with the query
    return sent_tokens[idx]
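
The `[-2]` indexing above works because the query is the last sentence and is always most similar to itself. A more explicit way to express the same selection, sketched here in plain Python with a hypothetical helper `best_match` and made-up scores, is to skip the query's own index and keep the highest remaining score.

```python
def best_match(similarities, query_index):
    """Return (index, score) of the highest score, ignoring the query itself.

    Returns (None, 0.0) when no other entry has a positive score.
    """
    best_idx, best_score = None, 0.0
    for i, score in enumerate(similarities):
        if i == query_index:
            continue  # the query always matches itself perfectly
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score

# Made-up similarity row: three corpus sentences followed by the query itself.
row = [0.0, 0.42, 0.13, 1.0]
print(best_match(row, query_index=3))                    # (1, 0.42)
print(best_match([0.0, 0.0, 1.0], query_index=2))        # (None, 0.0)
```

A `None` index corresponds to the "I don't understand" branch of `response`.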

In [10]:
flag=True
print('Hi, I am Jarvis. How can I help you today? If you wish to end the chat then just say bye.')
while flag: #keep running the bot until the user ends it
    user_response=input()
    user_response=user_response.lower()
    if user_response!='bye':
        if user_response=='thanks' or user_response=='thank you':
            flag=False
            print('Jarvis: You are welcome')
        else:
            greeting=greet(user_response) #call once, since the reply is chosen at random
            if greeting is not None:
                print('Jarvis: '+greeting)
            else:
                sent_tokens.append(user_response) #response() expects the query as the last sentence
                print('Jarvis: ',end="")
                print(response(user_response))
                sent_tokens.pop() #remove the query again so the corpus stays unchanged
    else:
        flag=False
        print('Jarvis: Goodbye! Take care!')
    

Hi, I am Jarvis. How can I help you today? If you wish to end the chat then just say bye.
hi
Jarvis: hi nice to meet you
small talk
Jarvis: 



for other uses, see small talk (disambiguation).
references
Jarvis: [26] [27]

see also
active listening
cheap talk (game theory)
contact call
sociolinguistics
transactional analysis
phatic expression
tritsch-tratsch-polka by johann strauss ii, from the german for "chit-chat"
references
 "dummies - learning made easy".
bye
Jarvis: Goodbye! Take care!
