<h1> Chat-bot using Sequence-to-Sequence in Keras

<img src = 'https://i.imgur.com/obcMO4p.jpg'>

<h3>What is a chat-bot?</h3>
<p>A chatbot is an artificial intelligence (AI) software that can simulate a conversation (or a chat) with a user in natural language through messaging applications, websites, mobile apps or through the telephone.</p>
<p>Why are chat-bots are important?</p>
<p>Chatbot applications streamline interactions between people and services, enhancing customer experience. At the same time, they offer companies new opportunities to improve the customers engagement process and operational efficiency by reducing the typical cost of customer service </p>

<h1>What is sequence-to-sequence learning?</h1>
<p>Sequence-to-sequence learning (Seq2Seq) is about training models to convert sequences from one domain (e.g. sentences in English) to sequences in another domain (e.g. the same sentences translated to French or any language depends upon data what we using), Here in this we using English-English conversations.</p>
<br>
<img src = 'https://i.imgur.com/MkjrFTR.png'>
<p> <u>Reference</u>: To know more about Sequence-Sequence <a href = 'https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html'>click here</a>

<h1>Data</h1>
<p>Here we using 2 data-sets</p>
1. Gunthercox dataset (English)
<a href = 'https://github.com/gunthercox'>click here to see the data source - 1</a> <br>
2. Cornell Movie Dialogs Corpus <a href = 'https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html'>click here to see the data source - 2 </a>

<h1>Approach</h1>
1. Loading the Data <br>
2. Extracting the Questions and answers from that data 1 & 2<br>
3. cleaning the Questions and Answers 1 & 2 <br>
4. Applying Encoder Decoder Model on Cleaned data <br>
5. Speaking with our BOT

In [1]:
from google.colab import drive 
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [2]:
import numpy as np
import pandas as pd
import pickle
import keras
import keras.utils
from keras.models import Model
from keras.layers import Input, LSTM, Dense
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding
import os
import yaml
import re

Using TensorFlow backend.


<h1>1.1 Loading Data - 1 </h1>

In [0]:
dir_path = '/content/drive/My Drive/chatbot_data/Data'
files_list = os.listdir(dir_path + os.sep)

In [4]:
files_list 

['botprofile.yml',
 'conversations.yml',
 'ai.yml',
 'computers.yml',
 'emotion.yml',
 'food.yml',
 'gossip.yml',
 'health.yml',
 'greetings.yml',
 'history.yml',
 'literature.yml',
 'movies.yml',
 'humor.yml',
 'money.yml',
 'politics.yml',
 'trivia.yml',
 'sports.yml',
 'psychology.yml',
 'science.yml']

our data contains all the information about above topics so we can ask our bot question's from the above data

<h1>1.2 Extracting the data - 1</h1>

In [0]:
#Extracting questions and answers from Raw data
questions = []
answers = []
for filepath in files_list:
  stream = open( dir_path + os.sep + filepath , 'rb')
  docs = yaml.safe_load(stream)
  conversations = docs['conversations']
  for con in conversations:
    if len(con) > 2:
      questions.append(con[0])
      replies = con[1:]
      ans = ''
      for rep in replies:
        ans += ' ' + rep
      answers.append(ans)
    elif len(con)> 1:
      questions.append(con[0])
      answers.append(con[1])

In [6]:
print('Questions Length',len(questions))
print('Answers Length',len(answers))

Questions Length 586
Answers Length 586


In [0]:
#Below we filtering the question and answers that are above and below lengths 
max_length = 35 
min_length = 2

temp_questions = []
temp_answers = []
#Filtering the questions
i = 0
for que in questions:
    if len(que.split()) >= min_length and len(que.split()) <= max_length:
        temp_questions.append(que)
        temp_answers.append(answers[i])  
    i += 1

# Removing the questions and answers that are too long and too short 
optimal_questions = []
optimal_answers = []

j = 0
for answer in temp_answers:
    if len(answer.split()) >= min_length and len(answer.split()) <= max_length:
        optimal_answers.append(answer)
        optimal_questions.append(temp_questions[j]) 
    j += 1

In [8]:
optimal_questions[:10]

['What are your interests',
 'What are your favorite subjects',
 'What are your interests',
 'What is your number',
 'What is your number',
 'What is your favorite number',
 'What can you eat',
 "Why can't you eat food",
 'What is your location',
 'Where are you from']

In [9]:
optimal_answers[:10]

['I am interested in all kinds of things. We can talk about anything!',
 'My favorite subjects include robotics, computer science, and natural language processing.',
 'I am interested in a wide variety of topics, and read rather a lot.',
 "I don't have any number",
 '23 skiddoo!',
 "I find I'm quite fond of the number 42.",
 'I consume RAM, and binary digits.',
 "I'm a software program, I blame the hardware.",
 'I am everywhere.',
 'I am from where all software programs are from; a galaxy far, far away.']

<h1> 1.3 Cleaning the data - 1</h1>

In [0]:
#cleaning the question and answers 
def clean_text(text):
  text = text.lower()
  text = re.sub(r"i'm", "i am", text)
  text = re.sub(r"he's", "he is", text)
  text = re.sub(r"she's", "she is", text)
  text = re.sub(r"it's", "it is", text)
  text = re.sub(r"that's", "that is", text)
  text = re.sub(r"what's", "that is", text)
  text = re.sub(r"where's", "where is", text)
  text = re.sub(r"how's", "how is", text)
  text = re.sub(r"\'ll", " will", text)
  text = re.sub(r"\'ve", " have", text)
  text = re.sub(r"\'re", " are", text)
  text = re.sub(r"\'d", " would", text)
  text = re.sub(r"\'re", " are", text)
  text = re.sub(r"won't", "will not", text)
  text = re.sub(r"can't", "cannot", text)
  text = re.sub(r"n't", " not", text)
  text = re.sub(r"n'", "ng", text)
  text = re.sub(r"'bout", "about", text)
  text = re.sub(r"'til", "until", text)
  text = re.sub(r"  ","",text)
  text = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", text)
  return text 

In [11]:
Cleaned_optimal_questions = [clean_text(item) for item in optimal_questions]
print("Questions cleaned.....") 
Cleaned_optimal_answers =  [clean_text(item) for item in optimal_answers]
print("Answers cleaned.......")

Questions cleaned.....
Answers cleaned.......


In [12]:
#Storing into dataframe
df = pd.DataFrame()
df['Questions'] = Cleaned_optimal_questions
df['Answers'] = Cleaned_optimal_answers
print(df.shape)
df.head(10) 

(509, 2)


Unnamed: 0,Questions,Answers
0,what are your interests,i am interested in all kinds of things we can ...
1,what are your favorite subjects,my favorite subjects include robotics computer...
2,what are your interests,i am interested in a wide variety of topics an...
3,what is your number,i do not have any number
4,what is your number,23 skiddoo
5,what is your favorite number,i find i am quite fond of the number 42
6,what can you eat,i consume ram and binary digits
7,why cannot you eat food,i am a software program i blame the hardware
8,what is your location,i am everywhere
9,where are you from,i am from where all software programs are from...


In [13]:
max_answer_length = max([len(item) for item in df['Answers']])
max_answer_length 

206

In [0]:
#here we are giving some own questions 
df2 = pd.DataFrame()
df2['Questions'] = ['hi','hello','what is your name','who are you','who am i','when you born','who build you','who is saitejapsk' ]
df2['Answers'] = ['hello','hi','my name is PSK_BOT','i am a chatbot','you are a human','just few days back','saitejapsk build me','he is an aspiring machine learning engineer for me he is a doctor :) ']

<h1> 1.4 Loading the data - 2</h1>

In [15]:
#data set-2 here we adding cornel movie dialog corpus which helps better to train with more words 
movie_lines = '/content/drive/My Drive/chatbot_data/cornel_movie/movie_lines.txt'
movie_convs = '/content/drive/My Drive/chatbot_data/cornel_movie/movie_conversations.txt'

movie_lines = open(movie_lines, encoding = 'utf-8',errors = 'ignore').read().split('\n')
print(movie_lines[:10])
movie_convs = open(movie_convs, encoding = 'utf-8',errors = 'ignore').read().split('\n')
print(movie_convs[:10])

['L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!', 'L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!', 'L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.', 'L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?', "L925 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Let's go.", 'L924 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ Wow', "L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie.", 'L871 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ No', 'L870 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I\'m kidding.  You know how sometimes you just become this "persona"?  And you don\'t know how to quit?', 'L869 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Like my fear of wearing pastels?']
["u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L194', 'L195', 'L196', 'L197']", "u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L198', 'L199']", "u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L200', 'L201', 'L202', 'L203']", "u0 +++$+++ u2 +++$+++ m

<h1> 1.5 Extracting the data - 2 </h1>

In [0]:
#creating a dictionary that maps conv id as keys and convs as values 
conv_text = {}
    
for lines in movie_lines:
  f_lines = lines.split(" +++$+++ ")
  if len(f_lines) == 5:
    conv_text[f_lines[0]] = f_lines[4]

In [0]:
#data preperation 
movie_conv = []
for lines in movie_convs:
  _lines = lines.split(" +++$+++ ")[-1][1:-1].replace("'","").replace(" ","")
  movie_conv.append(_lines.split(","))

In [0]:
#getting questions and answers from raw prepared data 
questions = []
answers = []
for lines in movie_conv:
  for i in range(len(lines)-1):
    questions.append(conv_text[lines[i]])
    answers.append(conv_text[lines[i+1]])

<h1> 1.6 cleaning the data - 2 </h1>

In [19]:
#cleaning the questions and answers 
Cleaned_questions = [clean_text(item) for item in questions]
print("Questions cleaned.....") 
Cleaned_answers =  [clean_text(item) for item in answers]
print("Answers cleaned.......") 

Questions cleaned.....
Answers cleaned.......


In [20]:
#Cleaned questions 
Cleaned_questions[:10] 

['can we make this quickroxanne korrine and andrew barrett are having an incredibly horrendous public break up on the quadagain',
 'well i thought we would start with pronunciation if that is okay with you',
 'not the hacking and gagging and spitting partplease',
 'you are asking me outthat is so cute that is your name again',
 'no no it is my fault  we did not have a proper introduction ',
 'cameron',
 'the thing is cameron  i am at the mercy of a particularly hideous breed of losermy sisteri cannot date until she does',
 'why',
 'unsolved mysteryshe used to be really popular when she started high school then it was just like she got sick of it or something',
 'gosh if only we could find kat a boyfriend']

In [21]:
#Cleaned answers 
Cleaned_answers[:10] 

['well i thought we would start with pronunciation if that is okay with you',
 'not the hacking and gagging and spitting partplease',
 'okay then how about we try out some french cuisinesaturdaynight',
 'forget it',
 'cameron',
 'the thing is cameron  i am at the mercy of a particularly hideous breed of losermy sisteri cannot date until she does',
 'seems like she could get a date easy enough',
 'unsolved mysteryshe used to be really popular when she started high school then it was just like she got sick of it or something',
 'that is a shame',
 'let me see what i can do']

In [0]:
#Below we filtering the question and answers that are above and below split lengths 
max_length = 3 
min_length = 2

temp_questions = []
temp_answers = []
#Filtering the questions
i = 0
for que in Cleaned_questions:
    if len(que.split()) >= min_length and len(que.split()) <= max_length:
        temp_questions.append(que)
        temp_answers.append(Cleaned_answers[i])  
    i += 1

# Removing the questions and answers that are to long and too short 
optimal_questions = []
optimal_answers = []

j = 0
for answer in temp_answers:
    if len(answer.split()) >= min_length and len(answer.split()) <= max_length:
        optimal_answers.append(answer)
        optimal_questions.append(temp_questions[j]) 
    j += 1

In [23]:
optimal_questions[200:210]

['wonderland weather ltd',
 'that is happening',
 'seventyfive hundred',
 'great car',
 'nice shot',
 'huh wha',
 "let's get naked",
 'that is this',
 'hey you guys',
 'is he']

In [24]:
optimal_answers[200:210]

['this way ',
 "debbie's marrying rick",
 'not interested',
 'the best',
 'thank you sir',
 'i cannot sleep',
 'you are on',
 'got me',
 "who's your friend",
 'he is alive']

In [25]:
#storing into dataframe
df3 = pd.DataFrame()
df3["Questions"] = optimal_questions
df3["Answers"] = optimal_answers
df3 = df3
df3.head()

Unnamed: 0,Questions,Answers
0,what good stuff,the real you
1,she okay,i hope so
2,they do to,they do not
3,hey sweet cheeks,hi joey
4,her favorite uncle,dead at fortyone


In [26]:
max_answer_length = max([len(item) for item in df3['Answers']])
max_answer_length 

38

In [27]:
#here we concatinate all our data into single dataframe 
final_data = pd.concat([df,df2,df3[:6000]])
final_data 

Unnamed: 0,Questions,Answers
0,what are your interests,i am interested in all kinds of things we can ...
1,what are your favorite subjects,my favorite subjects include robotics computer...
2,what are your interests,i am interested in a wide variety of topics an...
3,what is your number,i do not have any number
4,what is your number,23 skiddoo
...,...,...
5284,extremely well,how nice
5285,good night doctor,good night
5286,nor to elizabeth,nonor to elizabeth
5287,his what,his schwanzstucker


In [28]:
#tagging SOS and EOS i.e.., (START) and (END) for every answer
final_data['Tagged_answers'] = ["<START> " + item + " <END>" for item in final_data['Answers']]
print("Answers Tagged......") 

Answers Tagged......


In [29]:
print("Length of final_data",final_data.shape)
final_data.head()

Length of final_data (5806, 3)


Unnamed: 0,Questions,Answers,Tagged_answers
0,what are your interests,i am interested in all kinds of things we can ...,<START> i am interested in all kinds of things...
1,what are your favorite subjects,my favorite subjects include robotics computer...,<START> my favorite subjects include robotics ...
2,what are your interests,i am interested in a wide variety of topics an...,<START> i am interested in a wide variety of t...
3,what is your number,i do not have any number,<START> i do not have any number <END>
4,what is your number,23 skiddoo,<START> 23 skiddoo <END>


In [0]:
final_questions = list(final_data['Questions']) 
final_answers = list(final_data['Tagged_answers'])

In [31]:
final_questions[:10]

['what are your interests',
 'what are your favorite subjects',
 'what are your interests',
 'what is your number',
 'what is your number',
 'what is your favorite number',
 'what can you eat',
 'why cannot you eat food',
 'what is your location',
 'where are you from']

In [32]:
final_answers[:10]

['<START> i am interested in all kinds of things we can talk about anything <END>',
 '<START> my favorite subjects include robotics computer science and natural language processing <END>',
 '<START> i am interested in a wide variety of topics and read rather a lot <END>',
 '<START> i do not have any number <END>',
 '<START> 23 skiddoo <END>',
 '<START> i find i am quite fond of the number 42 <END>',
 '<START> i consume ram and binary digits <END>',
 '<START> i am a software program i blame the hardware <END>',
 '<START> i am everywhere <END>',
 '<START> i am from where all software programs are from a galaxy far far away <END>']

In [33]:
#tokenization part 
tokenizer = Tokenizer()
tokenizer.fit_on_texts(final_questions + final_answers)
vocab_size = len(tokenizer.word_index) + 1
print('vocab_size :',vocab_size)

vocab_size : 5661


<h1> 1.7 Preparing data for Sequence-Sequence Model</h1>

In [34]:
# encoder_input_data
tr_que = tokenizer.texts_to_sequences(final_questions)
max_que_len = max([len(x) for x in tr_que])
que_pad = pad_sequences(tr_que, maxlen=max_que_len, padding='post')
encoder_input_data = np.array(que_pad)
print('Encoder input shape :',encoder_input_data.shape)

# decoder_input_data
tr_ans = tokenizer.texts_to_sequences(final_answers)
max_ans_len = max([len(x) for x in tr_ans])
ans_pad = pad_sequences(tr_ans, maxlen=max_ans_len, padding='post' )
decoder_input_data = np.array(ans_pad)
print('Decoder input shape :',decoder_input_data.shape)

# decoder_output_data
tr_ans = tokenizer.texts_to_sequences(final_answers)
for i in range(len(tr_ans)) :
    tr_ans[i] = tr_ans[i][1:]
ans_pad = pad_sequences(tr_ans, maxlen=max_ans_len, padding='post' )
cat_ans = keras.utils.to_categorical(ans_pad, vocab_size)
decoder_output_data_1 = np.array(cat_ans)
print('Decoder output shape :',decoder_output_data.shape)

Encoder input shape : (5806, 22)
Decoder input shape : (5806, 38)
Decoder output shape : (5806, 38, 5661)


In [35]:
'''#glove vectors
embeddings_index = dict()
f = open('drive/My Drive/glove.6B.300d.txt')
for line in f:
	values = line.split()
	word = values[0]
	coefs = np.asarray(values[1:], dtype='float32')
	embeddings_index[word] = coefs
f.close()'''

"#glove vectors\nembeddings_index = dict()\nf = open('drive/My Drive/glove.6B.300d.txt')\nfor line in f:\n\tvalues = line.split()\n\tword = values[0]\n\tcoefs = np.asarray(values[1:], dtype='float32')\n\tembeddings_index[word] = coefs\nf.close()"

In [36]:
'''embedding_matrix = np.zeros((vocab_size, 300))
for word, i in tokenizer.word_index.items():
	embedding_vector = embeddings_index.get(word)
	if embedding_vector is not None: 
		embedding_matrix[i] = embedding_vector
embedding_matrix.shape '''

'embedding_matrix = np.zeros((vocab_size, 300))\nfor word, i in tokenizer.word_index.items():\n\tembedding_vector = embeddings_index.get(word)\n\tif embedding_vector is not None: \n\t\tembedding_matrix[i] = embedding_vector\nembedding_matrix.shape '

<h1>1.8 Encoder-Decoder Model</h1>

In [67]:
#Encoder inputs 
encoder_inputs = Input(shape=(None,))
encoder_embedding = Embedding(vocab_size, 300, mask_zero=True)(encoder_inputs)
encoder_outputs , state_h , state_c = tf.keras.layers.LSTM(1024 , return_state=True)(encoder_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c] 

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the 
# return states in the training model, but we will use them in inference.
decoder_embedding = Embedding(vocab_size, 300, mask_zero=True)(decoder_inputs)
decoder_lstm = LSTM(1024, return_state=True, return_sequences=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax')

#decoder outputs
output = decoder_dense(decoder_outputs)

# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], output)
#compiling the model 
model.compile(optimizer='adam', loss='categorical_crossentropy')
#model summary
model.summary() 

Model: "model_14"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_21 (InputLayer)           [(None, None)]       0                                            
__________________________________________________________________________________________________
input_22 (InputLayer)           [(None, None)]       0                                            
__________________________________________________________________________________________________
embedding_12 (Embedding)        (None, None, 300)    1698300     input_21[0][0]                   
__________________________________________________________________________________________________
embedding_13 (Embedding)        (None, None, 300)    1698300     input_22[0][0]                   
___________________________________________________________________________________________

In [68]:
%%time 
model.fit([encoder_input_data, decoder_input_data], decoder_output_data, batch_size=86, epochs=100, validation_split=0.2) 

Train on 4644 samples, validate on 1162 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100

<tensorflow.python.keras.callbacks.History at 0x7fed76f2d6a0>

In [0]:
#inference model 
#We pass the 2 inputs to encoder model i.e.., cell state and hidden state
encoder_model = Model(encoder_inputs, encoder_states)
decoder_state_input_h = Input(shape=(1024,))
decoder_state_input_c = Input(shape=(1024,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedding , initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
#For decoder model we pass 2 inputs one is encoder outputs and second one is final_answers
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

In [0]:
#Here we convering our input to numeric and we predicting from the numeric 
def text_to_num(user_input):
    words = user_input.lower().split()
    tokens_list = list()
    for word in words:
        tokens_list.append(tokenizer.word_index[word]) 
    return pad_sequences([tokens_list], maxlen=max_que_len, padding='post')

In [80]:
#some questions 
final_questions[150:200]

['do you feel scared',
 'do you ever get bored',
 'do you hate anyone',
 'do you get mad',
 'no it is not',
 'are you ashamed',
 'the feeling',
 'are you intoxicated',
 'are you jealous',
 'are you amused',
 'are you glad',
 'are you sad',
 'do you drink',
 'do you drink',
 'are you experiencing an energy shortage',
 'are you experiencing an energy shortage',
 'why can you not eat',
 'if you could eat food what would you eat',
 'do you wish you could eat food',
 'can a robot get drunk',
 'i like wine do you',
 'what do robots need to survive',
 'will robots ever be able to eat',
 'what is good to eat',
 'why do not you eat',
 'do you eat',
 'do you eat',
 'do you eat',
 'do you know gossip',
 'do you know gossip',
 'do you know gossip',
 'do you know gossip',
 'what is context',
 'tell me about gossip',
 'tell me about gossip',
 'tell me about gossip',
 'tell me about gossip',
 'tell me gossip',
 'did tell gossips to anybody',
 'did tell gossips to anybody',
 'did tell gossips to anybo

In [79]:
#speaking to our bot
print('PSK-BOT    : Hello this is PSK-BOT....')
for bot_speaking in range(encoder_input_data.shape[0]):
  human_input = input('You        : ' )
  #stopping the bot when we said bye
  if human_input.lower().startswith("bye") or human_input.lower().startswith("good bye") or human_input.lower().startswith("ok bye"):
    print('PSK-BOT    : Nice to talk with you have a good day bye....')
    break
  
  #predicting the answer from human input 
  states_values = encoder_model.predict(text_to_num(human_input))
  empty_target_seq = np.zeros((1, 1))
  empty_target_seq[0, 0] = tokenizer.word_index['start']
  stop_condition = False
  decoded_translation = " "

  while not stop_condition:
    dec_outputs, h, c = decoder_model.predict([empty_target_seq] + states_values)
    sampled_word_index = np.argmax(dec_outputs[0, -1, :] )
    sampled_word = None
    for word , index in tokenizer.word_index.items():
      if sampled_word_index == index:
        decoded_translation +=' {}'.format(word)
        sampled_word = word    
    if sampled_word == 'end' or len(decoded_translation.split()) > max_ans_len:
      stop_condition = True
    
    empty_target_seq = np.zeros((1 , 1 ))  
    empty_target_seq[0 , 0] = sampled_word_index
    states_values = [h , c] 
  print('PSK-BOT    :',decoded_translation.replace(' end', ' ')) 

PSK-BOT    : Hello this is PSK-BOT....
You        : what type of computer are you
PSK-BOT    :   my program runs in python so i work on any computer 
You        : what kind of hardware
PSK-BOT    :   i work on all kinds of computers mac ibm or unix it does not matter to me 
You        : which is better windows or macos
PSK-BOT    :   it depends on which machine you are using to talk to me i would prefer to not hurt your feelings linux always linux what are you trying to accomplishthe os should support your goals 
You        : are you stupid
PSK-BOT    :   no lots of people improve my brain 
You        : what makes you mad
PSK-BOT    :   anger is a difficult human emotionas a software i try to control my anger as best i can madmad as in mentally ill or mad as in angry missing documentation nondescriptive variable names 
You        : what do you like to do
PSK-BOT    :   i like to count in binary 
You        : can you move
PSK-BOT    :   i can theoretically upload a copy of myself into a

<h1>Summary</h1>
<h5>
<ol>
<li>By using simple sequence to sequence model, we can build good machine converstaion model</li>
<li>We can build Not only english to english conversations bots, we can also use another translation model also like english to french or spanish depends upon data </li>
<li>Overall our Encoder Decoder model gives good performance in results it gives almost accurate results</li>
</h5>
</ol>

<h1>References</h1>
<ol>
<li>
<a href = 'https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html'>https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html</a>
</li>
<li>
<a href = 'https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/'>https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/</a>
</li>
<li>
<a href = 'https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/'>https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/</a> 
</li>
<li>
<a href = 'https://www.appliedaicourse.com'>https://www.appliedaicourse.com</a>
</li>
</ol>