# Natural Language Processing Chatbot



## Introduction
This chatbot is trained on the Microsoft BotBuilder Personality Chat Datasets, which cover three personalities: "comic", "friend", and "professional".<br><br>
First we preprocess the data: we remove punctuation, lowercase the text, and split each question into tokens for model training.
<br><br>
Then we train word embeddings with gensim Word2Vec and use them as input to a seq2seq model. After training one model per personality, the chatbot is ready to use. Once the chat is finished, a .txt file of the chat is downloaded automatically.
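The preprocessing step described above can be sketched on a single sentence. This is a minimal illustration rather than the notebook's exact code: `normalize_question` is a hypothetical helper, and it splits on whitespace instead of calling NLTK's `word_tokenize` so that it runs without downloading tokenizer data.

```python
import re

def normalize_question(text):
    """Lowercase, strip punctuation, and split into tokens (hypothetical helper)."""
    lowered = text.lower()
    no_punct = re.sub(r"[^\w\s]", "", lowered)  # drop everything but word chars and spaces
    return no_punct.split()

print(normalize_question("What's your age?"))
```

Note that removing the apostrophe merges "What's" into the single token "whats".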

## Read Me ##
(Run every cell from top to bottom, and remember to change the runtime type to GPU.)

1. Download the files.
2. Perform word embeddings.
3. Train sequence to sequence models.
4. Run chatbot.

In [56]:
# Import packages

import nltk
nltk.download('punkt')
import pandas as pd
import re
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize, sent_tokenize
import tensorflow as tf

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [1]:
# Download raw data through GoogleAuth

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


id = '1slWYfNpATxnHg_11_XMWrzHOdxW8iHcH'
downloaded = drive.CreateFile({'id':id}) 
downloaded.GetContentFile('qna_chitchat_the_professional.tsv') 

id = '16KDzEPGGJYuS97EZ_kU82y37y0qWVwb9'
downloaded = drive.CreateFile({'id':id}) 
downloaded.GetContentFile('qna_chitchat_the_friend.tsv') 

id = '1d08B982HlHHGZ_-ZnS04Dt30HZhFPsdc'
downloaded = drive.CreateFile({'id':id}) 
downloaded.GetContentFile('qna_chitchat_the_comic.tsv') 





In [36]:
# A brief look at the 'friend' dataset

pd.read_csv('qna_chitchat_the_friend.tsv', sep="\t").head(10)


| Unnamed: 0 | Question | Answer | Source | Metadata |
|---|---|---|---|---|
| 0 | What's your age? | I don't really have an age. | qna_chitchat_the_friend | editorial:chitchat |
| 1 | Are you young? | I don't really have an age. | qna_chitchat_the_friend | editorial:chitchat |
| 2 | When were you born? | I don't really have an age. | qna_chitchat_the_friend | editorial:chitchat |
| 3 | What age are you? | I don't really have an age. | qna_chitchat_the_friend | editorial:chitchat |
| 4 | Are you old? | I don't really have an age. | qna_chitchat_the_friend | editorial:chitchat |
| 5 | How old are you? | I don't really have an age. | qna_chitchat_the_friend | editorial:chitchat |
| 6 | How long ago were you born? | I don't really have an age. | qna_chitchat_the_friend | editorial:chitchat |
| 7 | Ask me anything | I'm a much better answerer than asker. | qna_chitchat_the_friend | editorial:chitchat |
| 8 | Ask me a question | I'm a much better answerer than asker. | qna_chitchat_the_friend | editorial:chitchat |
| 9 | Can you ask me a question? | I'm a much better answerer than asker. | qna_chitchat_the_friend | editorial:chitchat |


# 1.Data Preprocessing

In [47]:
# Read and preprocess the input data; return the sorted vocabulary and lookup tables

def preprocess_data(file_name):
  df = pd.read_csv(file_name, sep="\t")

  seq_data = []
  whole_words = []
  max_input_words = 0

  # Read the text one row at a time. Questions are lowercased, stripped of
  # punctuation, and tokenized; each answer is kept whole as a single entry.
  for index, row in df.iterrows():
    question = row['Question']
    answer = row['Answer']
    seq_data.append([question, answer])

    question = re.sub(r'[^\w\s]','', question.lower())
    tokenized_q = nltk.tokenize.word_tokenize(question)

    whole_words += tokenized_q
    whole_words.append(answer)
    max_input_words = max(len(tokenized_q), max_input_words)

  # Add the special tokens: begin, end, padding, and unknown
  unique_words = list(set(whole_words))
  unique_words += ['_B_', '_E_', '_P_', '_U_']

  # Sort the words so the word-to-index mapping is the same on every run
  unique_words.sort()
  num_dic = {n:i for i,n in enumerate(unique_words)}
  return unique_words, seq_data, max_input_words, num_dic


# Generate the data for each personality and keep it in named variables, making it easier to call later
d_comic, seq_comic, max_word_comic, num_dic_comic = preprocess_data('qna_chitchat_the_comic.tsv')
d_friend, seq_friend, max_word_friend, num_dic_friend = preprocess_data('qna_chitchat_the_friend.tsv')
d_professional, seq_professional, max_word_professional, num_dic_professional = preprocess_data('qna_chitchat_the_professional.tsv')

print(seq_comic)
print(seq_friend)


# Store all num dics in a nested dictionary.
num_dic = {'comic':num_dic_comic, 'friend':num_dic_friend, 'professional':num_dic_professional}


# To simplify the code, we take the maximum question length across the three datasets (though they're all the same)
max_input_words = max(max_word_comic, max_word_friend, max_word_professional)

# Vocabulary size (dic_len) of each of the three models
dic_len_models = {'comic':len(num_dic['comic']), 'friend':len(num_dic['friend']), 'professional':len(num_dic['professional'])}

# Store different unique_word list in a dict
unique_word_dic = {'comic':d_comic,'friend':d_friend, 'professional':d_professional}



[["What's your age?", "I'm age-free."], ['Are you young?', "I'm age-free."], ['When were you born?', "I'm age-free."], ['What age are you?', "I'm age-free."], ['Are you old?', "I'm age-free."], ['How old are you?', "I'm age-free."], ['How long ago were you born?', "I'm age-free."], ['Ask me anything', "Nah, I'm good."], ['Ask me a question', "Nah, I'm good."], ['Can you ask me a question?', "Nah, I'm good."], ['Ask me something', "Nah, I'm good."], ['What do you want to know about me?', "Nah, I'm good."], ['Can you sleep?', 'Not so far.'], ['Do you have boogers?', 'Not so far.'], ["Don't you ever sleep?", 'Not so far.'], ['Do you dream?', 'Not so far.'], ['Do you smell?', 'Not so far.'], ['Do you sweat?', 'Not so far.'], ['Do you get tired?', 'Not so far.'], ['Can you sneeze?', 'Not so far.'], ['Getting tired of you', 'Sometimes I like to take a break from being awesome.'], ['You bore me', 'Sometimes I like to take a break from being awesome.'], ["I'm tired of you", 'Sometimes I like t
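The vocabulary built by `preprocess_data` can be illustrated on a toy example. The sketch below is a miniature under stated assumptions, not the notebook's data: `toy_pairs` is made up, and whitespace splitting stands in for `word_tokenize`. It shows that each answer enters the vocabulary as one whole entry, and that sorting makes the word-to-index mapping reproducible.

```python
import re

# Hypothetical question/answer pairs standing in for one .tsv file
toy_pairs = [
    ["Are you old?", "I'm age-free."],
    ["Ask me anything", "Nah, I'm good."],
]

words = []
for question, answer in toy_pairs:
    cleaned = re.sub(r"[^\w\s]", "", question.lower())
    words += cleaned.split()   # question tokens
    words.append(answer)       # the whole answer is a single entry

# Add the four special tokens and sort for a stable ordering
vocab = sorted(set(words) | {"_B_", "_E_", "_P_", "_U_"})
word_to_id = {word: i for i, word in enumerate(vocab)}

print(len(vocab))
print(word_to_id["I'm age-free."])
```

Because the decoder predicts an index into this vocabulary, a reply is always one of the stored answers rather than a sentence assembled word by word.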

# 2.Word Embeddings

In [0]:
# Generate the token lists used to train the word embeddings.
def process_data_embedding(seq_data):
  tok_list = []
  for seq in seq_data:
    question = seq[0]
    answer = seq[1]
    # Lowercase the question and replace every non-alphanumeric run with a
    # space, then tokenize. The answer is not tokenized: it is appended whole.
    question = re.sub(r"[^a-z0-9]+", " ", question.lower())
    tokens = word_tokenize(question)
    tokens.append(answer)
    tok_list.append(tokens)

  # Make sure the special tokens are part of the embedding vocabulary
  tok_list.append(['_B_','_E_','_P_','_U_'])
  return tok_list

In [0]:
# Word2Vec models are stored in the models dictionary
# (`size` is the gensim 3.x argument name; gensim 4.x calls it `vector_size`)

models = {}

models['comic'] = Word2Vec(sentences=process_data_embedding(
    seq_comic), size=100, window=2, min_count=1, workers=4, sg=1)

models['friend'] = Word2Vec(sentences=process_data_embedding(
    seq_friend), size=100, window=2, min_count=1, workers=4, sg=1)

models['professional'] = Word2Vec(sentences=process_data_embedding(
    seq_professional), size=100, window=2, min_count=1, workers=4, sg=1)



In [55]:
# Take a look at the Word2Vec models

print("Model shape comic: ", models['comic'].wv.vectors.shape)
print("Model shape friend: ", models['friend'].wv.vectors.shape)
print("Model shape professional: ", models['professional'].wv.vectors.shape)
print("The word 'hi' in the model: ", models['comic'].wv.vocab['hi'])
print("Type of the model: ", type(models['comic']))



Model shape comic:  (617, 100)
Model shape friend:  (618, 100)
Model shape professional:  (617, 100)
The word 'hi' in the model:  Vocab(count:6, index:125, sample_int:4294967296)
Type of the model:  <class 'gensim.models.word2vec.Word2Vec'>
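The models above are trained with `sg=1` (skip-gram) and `window=2`. As a rough illustration of what that means, the sketch below lists the (center, context) pairs that a window of 2 yields for one token list. `skipgram_pairs` is a hypothetical helper, not part of gensim, and it ignores details of the real implementation such as random window shrinking and negative sampling.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs within `window` positions (illustrative only)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# One training "sentence": question tokens followed by the whole answer
tokens = ["how", "old", "are", "you", "I don't really have an age."]
for pair in skipgram_pairs(tokens):
    print(pair)
```

Since the answer is appended as the last token, under this simplified view it is only paired with the last two question tokens.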


# 3.Seq2seq model

In [0]:
# We have three sequence datasets: seq_comic, seq_friend, seq_professional
# max_input_words is a global variable
# dic_len_models stores the vocabulary size of each of the three models
def get_vectors_q(sentence, model):
  # The sentence is split on whitespace only; any token that is not in the
  # Word2Vec vocabulary falls back to '_U_'.
  tokenized_sentence = sentence.split()

  # Pad the question with '_P_' up to max_input_words
  diff = max_input_words - len(tokenized_sentence)
  for x in range(diff):
    tokenized_sentence.append('_P_')

  # Look up the vector of each token
  ids = []
  for tok in tokenized_sentence:
    if tok in model.wv.vocab:
      ids.append(model.wv[tok])
    else:
      ids.append(model.wv['_U_'])

  return ids


def get_vectors_a(sentence, model):
  # The whole answer is looked up as a single vocabulary entry
  ids = []
  if sentence in model.wv.vocab:
    ids.append(model.wv[sentence])
  else:
    ids.append(model.wv['_U_'])

  return ids
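The padding and unknown-token fallback used by `get_vectors_q` can be shown without a trained embedding model. In this sketch `pad_and_map` is a hypothetical helper that returns tokens instead of vectors, and the small `vocab` set is an assumption made for illustration.

```python
def pad_and_map(sentence, vocab, max_len, pad="_P_", unk="_U_"):
    """Split on whitespace, pad to max_len, and replace unknown tokens."""
    tokens = sentence.split()
    tokens += [pad] * (max_len - len(tokens))  # no-op when already long enough
    return [tok if tok in vocab else unk for tok in tokens]

vocab = {"how", "old", "are", "you", "_P_", "_U_"}
print(pad_and_map("how old are you", vocab, max_len=6))
print(pad_and_map("How old are you?", vocab, max_len=6))
```

The second call shows that capitalization and trailing punctuation turn otherwise known words into `_U_` when the input is split on whitespace only.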



In [0]:
def make_batch(seq_data, model_name):
  model = models[model_name]
  num_dic_local = num_dic[model_name]

  input_batch = []
  output_batch = []
  target_batch = []
  for seq in seq_data:
    # Encoder input: padded question vectors
    input_batch.append(get_vectors_q(seq[0], model))

    # Decoder input: the '_B_' vector followed by the answer vector
    output_data = []
    output_data.append(model.wv['_B_'])
    output_data += get_vectors_a(seq[1], model)
    output_batch += [output_data]

    # Target: the answer index followed by the '_E_' index
    target = []
    if seq[1] in num_dic_local:
      target.append(num_dic_local[seq[1]])
    else:
      target.append(num_dic_local['_U_'])
    target.append(num_dic_local['_E_'])
    target_batch += [target]

  return input_batch, output_batch, target_batch
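The decoder input and target built by `make_batch` are offset by one step: the decoder reads `_B_` and then the answer, and is trained to emit the answer and then `_E_`. The sketch below shows that layout with tokens and ids only. `decoder_pair` and the tiny `word_to_id` table are hypothetical and exist only for illustration.

```python
def decoder_pair(answer, word_to_id, begin="_B_", end="_E_", unk="_U_"):
    """Return (decoder input tokens, target ids) for one whole-answer example."""
    decoder_input = [begin, answer]
    answer_id = word_to_id.get(answer, word_to_id[unk])  # unknown answers map to '_U_'
    target = [answer_id, word_to_id[end]]
    return decoder_input, target

word_to_id = {"Not so far.": 0, "_B_": 1, "_E_": 2, "_P_": 3, "_U_": 4}
print(decoder_pair("Not so far.", word_to_id))
print(decoder_pair("unseen answer", word_to_id))
```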



## 3.1 Train model for personality 'Comic'

In [21]:
# Build, train, and save the model, separately for each of the three personalities.

# Build model for 'comic'
dic_len = dic_len_models['comic']

learning_rate = 0.002
n_hidden = 128

n_class = dic_len
n_input = 100 # the shape of the word2vec vector

### Neural Network Model
tf.reset_default_graph()

# encoder/decoder shape = [batch size, time steps, input size]
enc_input = tf.placeholder(tf.float32, [None, None, n_input])
dec_input = tf.placeholder(tf.float32, [None, None, n_input])

# target shape = [batch size, time steps]
targets = tf.placeholder(tf.int64, [None, None])


# Encoder Cell
with tf.variable_scope('encode'):
    enc_cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
    enc_cell = tf.nn.rnn_cell.DropoutWrapper(enc_cell, output_keep_prob=0.5)

    outputs, enc_states = tf.nn.dynamic_rnn(enc_cell, enc_input,
                                            dtype=tf.float32)
# Decoder Cell
with tf.variable_scope('decode'):
    dec_cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
    dec_cell = tf.nn.rnn_cell.DropoutWrapper(dec_cell, output_keep_prob=0.5)

    # [IMPORTANT] Use enc_states as the initial_state of the decoder cell
    outputs, dec_states = tf.nn.dynamic_rnn(dec_cell, dec_input,
                                            initial_state=enc_states,
                                            dtype=tf.float32)

seq2_model = tf.layers.dense(outputs, n_class, activation=None)
cost = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=seq2_model, labels=targets))

optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)


# Create the saver and start the session
saver = tf.train.Saver() 
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Generate a batch data
input_batch, output_batch, target_batch = make_batch(seq_comic, 'comic')

total_epoch = 2500

for epoch in range(total_epoch):
    _, loss = sess.run([optimizer, cost],
                       feed_dict={enc_input: input_batch,
                                  dec_input: output_batch,
                                  targets: target_batch})
    if epoch % 100 == 0:
        print('Epoch:', '%04d' % (epoch + 1),
              'cost =', '{:.6f}'.format(loss))

print('Epoch:', '%04d' % (epoch + 1),
      'cost =', '{:.6f}'.format(loss))
print('Training completed')
saver.save(sess, 'comic_final')


Epoch: 0001 cost = 6.440266
Epoch: 0101 cost = 2.260701
Epoch: 0201 cost = 2.245726
Epoch: 0301 cost = 2.241554
Epoch: 0401 cost = 2.235049
Epoch: 0501 cost = 2.240427
Epoch: 0601 cost = 2.237355
Epoch: 0701 cost = 2.240912
Epoch: 0801 cost = 2.237867
Epoch: 0901 cost = 2.246254
Epoch: 1001 cost = 2.240974
Epoch: 1101 cost = 2.237478
Epoch: 1201 cost = 2.239000
Epoch: 1301 cost = 2.238531
Epoch: 1401 cost = 2.239045
Epoch: 1501 cost = 2.238569
Epoch: 1601 cost = 2.240211
Epoch: 1701 cost = 2.242230
Epoch: 1801 cost = 2.239386
Epoch: 1901 cost = 2.240725
Epoch: 2001 cost = 2.239298
Epoch: 2101 cost = 2.234221
Epoch: 2201 cost = 1.829142
Epoch: 2301 cost = 1.616629
Epoch: 2401 cost = 1.545132
Epoch: 2500 cost = 1.418164
Training completed


'comic_final'
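The cost printed above is the mean sparse softmax cross-entropy. As a sanity check on the first value, a model that spreads probability evenly over n classes has a cross-entropy of ln(n). The sketch below computes this in plain Python; the class count of 617 is an assumption borrowed from the Word2Vec vocabulary size printed earlier, and the real `n_class` may differ slightly.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[label]

n_class = 617  # assumption: roughly the vocabulary size reported above
uniform_logits = [0.0] * n_class
print(round(sparse_softmax_cross_entropy(uniform_logits, 3), 4))

# A confident, correct prediction gives a loss close to zero
confident_logits = [0.0] * n_class
confident_logits[3] = 20.0
print(sparse_softmax_cross_entropy(confident_logits, 3))
```

Under that assumption the uniform value is about 6.42, which is close to the first-epoch cost of roughly 6.44.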

## 3.2 Train model for personality 'Friend'

In [22]:
# Build "friend" seq model

dic_len = dic_len_models['friend']

learning_rate = 0.002
n_hidden = 128
n_class = dic_len
n_input = 100 # the shape of the word2vec vector

tf.reset_default_graph()
enc_input = tf.placeholder(tf.float32, [None, None, n_input])
dec_input = tf.placeholder(tf.float32, [None, None, n_input])
targets = tf.placeholder(tf.int64, [None, None])

# Encoder Cell
with tf.variable_scope('encode'):
    enc_cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
    enc_cell = tf.nn.rnn_cell.DropoutWrapper(enc_cell, output_keep_prob=0.5)
    outputs, enc_states = tf.nn.dynamic_rnn(enc_cell, enc_input,
                                            dtype=tf.float32)
# Decoder Cell
with tf.variable_scope('decode'):
    dec_cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
    dec_cell = tf.nn.rnn_cell.DropoutWrapper(dec_cell, output_keep_prob=0.5)
    outputs, dec_states = tf.nn.dynamic_rnn(dec_cell, dec_input,
                                            initial_state=enc_states,
                                            dtype=tf.float32)

seq2_model = tf.layers.dense(outputs, n_class, activation=None)
cost = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=seq2_model, labels=targets))

optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)


saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())

input_batch, output_batch, target_batch = make_batch(seq_friend, 'friend')
total_epoch = 2500

for epoch in range(total_epoch):
    _, loss = sess.run([optimizer, cost],
                       feed_dict={enc_input: input_batch,
                                  dec_input: output_batch,
                                  targets: target_batch})
    if epoch % 100 == 0:
        print('Epoch:', '%04d' % (epoch + 1),
              'cost =', '{:.6f}'.format(loss))

print('Epoch:', '%04d' % (epoch + 1),
      'cost =', '{:.6f}'.format(loss))
print('Training completed')

saver.save(sess, 'friend_final')


Epoch: 0001 cost = 6.444095
Epoch: 0101 cost = 2.278368
Epoch: 0201 cost = 2.251097
Epoch: 0301 cost = 2.251197
Epoch: 0401 cost = 2.249548
Epoch: 0501 cost = 2.250695
Epoch: 0601 cost = 2.249039
Epoch: 0701 cost = 2.252069
Epoch: 0801 cost = 2.238572
Epoch: 0901 cost = 2.254263
Epoch: 1001 cost = 2.248951
Epoch: 1101 cost = 2.245814
Epoch: 1201 cost = 2.248262
Epoch: 1301 cost = 2.245911
Epoch: 1401 cost = 2.246419
Epoch: 1501 cost = 2.246663
Epoch: 1601 cost = 2.247065
Epoch: 1701 cost = 2.248502
Epoch: 1801 cost = 2.249304
Epoch: 1901 cost = 2.245289
Epoch: 2001 cost = 2.251219
Epoch: 2101 cost = 2.246496
Epoch: 2201 cost = 2.245067
Epoch: 2301 cost = 2.240794
Epoch: 2401 cost = 2.246284
Epoch: 2500 cost = 2.243649
Training completed


'friend_final'

## 3.3 Train model for personality 'Professional'

In [23]:
# Train Professional Seq Model

dic_len = dic_len_models['professional']

learning_rate = 0.002
n_hidden = 128
n_class = dic_len
n_input = 100 # the shape of the word2vec vector


tf.reset_default_graph()
enc_input = tf.placeholder(tf.float32, [None, None, n_input])
dec_input = tf.placeholder(tf.float32, [None, None, n_input])
targets = tf.placeholder(tf.int64, [None, None])

# Encoder Cell
with tf.variable_scope('encode'):
    enc_cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
    enc_cell = tf.nn.rnn_cell.DropoutWrapper(enc_cell, output_keep_prob=0.5)

    outputs, enc_states = tf.nn.dynamic_rnn(enc_cell, enc_input,
                                            dtype=tf.float32)
# Decoder Cell
with tf.variable_scope('decode'):
    dec_cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
    dec_cell = tf.nn.rnn_cell.DropoutWrapper(dec_cell, output_keep_prob=0.5)
    outputs, dec_states = tf.nn.dynamic_rnn(dec_cell, dec_input,
                                            initial_state=enc_states,
                                            dtype=tf.float32)
    
seq2_model = tf.layers.dense(outputs, n_class, activation=None)
cost = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=seq2_model, labels=targets))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())

input_batch, output_batch, target_batch = make_batch(seq_professional, 'professional')

total_epoch = 2500
for epoch in range(total_epoch):
    _, loss = sess.run([optimizer, cost],
                       feed_dict={enc_input: input_batch,
                                  dec_input: output_batch,
                                  targets: target_batch})
    if epoch % 100 == 0:
        print('Epoch:', '%04d' % (epoch + 1),
              'cost =', '{:.6f}'.format(loss))

print('Epoch:', '%04d' % (epoch + 1),
      'cost =', '{:.6f}'.format(loss))
print('Training completed')

saver.save(sess, 'professional_final')


Epoch: 0001 cost = 6.441167
Epoch: 0101 cost = 2.269231
Epoch: 0201 cost = 2.250490
Epoch: 0301 cost = 2.243839
Epoch: 0401 cost = 2.242045
Epoch: 0501 cost = 2.244240
Epoch: 0601 cost = 2.245044
Epoch: 0701 cost = 2.235934
Epoch: 0801 cost = 2.236391
Epoch: 0901 cost = 2.241573
Epoch: 1001 cost = 2.243450
Epoch: 1101 cost = 2.239814
Epoch: 1201 cost = 2.238652
Epoch: 1301 cost = 2.235906
Epoch: 1401 cost = 2.241048
Epoch: 1501 cost = 2.240315
Epoch: 1601 cost = 2.238591
Epoch: 1701 cost = 2.237725
Epoch: 1801 cost = 2.237277
Epoch: 1901 cost = 2.200759
Epoch: 2001 cost = 1.808382
Epoch: 2101 cost = 1.632412
Epoch: 2201 cost = 1.592224
Epoch: 2301 cost = 1.488152
Epoch: 2401 cost = 1.372701
Epoch: 2500 cost = 1.295919
Training completed


'professional_final'

# 4.Chatbot

Start the chatbot by typing a personality: "comic", "friend", or "professional".

To change personality, type "change".<br>
To end chat, type "exit".<br>
A text file of the chat is automatically downloaded.
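The command handling described above can be sketched with scripted input in place of `input()`. This is an illustrative variant, not the notebook's loop: `run_chat` is a hypothetical function, the responder is a stub, and here the command words are checked before any reply is generated.

```python
def run_chat(user_lines, respond):
    """Consume scripted user lines; return (log, command) when a command appears."""
    log = []
    for line in user_lines:
        command = line.strip().lower()
        if command in ("change", "exit"):
            return log, command
        log.append((line, respond(line)))
    return log, "exit"  # treat running out of input as an exit

# Stub responder standing in for the trained model
def echo(text):
    return "You said: " + text

log, command = run_chat(["hello", "how are you?", "exit", "ignored"], echo)
print(log)
print(command)
```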

In [0]:
def answer(sentence, model_name):
  unique_word = unique_word_dic[model_name]
  dic_len = dic_len_models[model_name]
  tf.reset_default_graph()
  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    learning_rate = 0.002
    n_hidden = 128
    n_class = dic_len
    n_input = 100

    enc_input = tf.placeholder(tf.float32, [None, None, n_input])
    dec_input = tf.placeholder(tf.float32, [None, None, n_input])
    targets = tf.placeholder(tf.int64, [None, None])

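    # Encoder Cell
    # Note: output_keep_prob stays at 0.5 here, so dropout is also applied
    # when answering, exactly as during training.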
    with tf.variable_scope('encode'):
        enc_cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
        enc_cell = tf.nn.rnn_cell.DropoutWrapper(enc_cell, output_keep_prob=0.5)

        outputs, enc_states = tf.nn.dynamic_rnn(enc_cell, enc_input,
                                                dtype=tf.float32)
    # Decoder Cell
    with tf.variable_scope('decode'):
        dec_cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
        dec_cell = tf.nn.rnn_cell.DropoutWrapper(dec_cell, output_keep_prob=0.5)

        # [IMPORTANT] Use enc_states as the initial_state of the decoder cell
        outputs, dec_states = tf.nn.dynamic_rnn(dec_cell, dec_input,
                                                initial_state=enc_states,
                                                dtype=tf.float32)
    model = tf.layers.dense(outputs, n_class, activation=None)
    cost = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    logits=model, labels=targets))
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    
    restore_name = './'+model_name+'_final'
    saver = tf.train.Saver()
    saver.restore(sess, restore_name)
    
    seq_data = [sentence, '_U_']
    input_batch, output_batch, target_batch = make_batch([seq_data],model_name)
    
    prediction = tf.argmax(model, 2)

    result = sess.run(prediction,
                      feed_dict={enc_input: input_batch,
                                 dec_input: output_batch,
                                 targets: target_batch})

    # Convert index numbers to the actual tokens
    decoded = [unique_word[i] for i in result[0]]

    # Remove '_E_' and anything after it
    if "_E_" in decoded:
        end = decoded.index('_E_')
        translated = ' '.join(decoded[:end])
    else:
        translated = ' '.join(decoded)
    
    return translated
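The last step of `answer`, turning predicted indices back into text, can be tried in isolation. In this sketch `decode` is a hypothetical helper and the five-entry `vocab` list is made up; it mirrors the logic of cutting the output at the `_E_` marker.

```python
def decode(indices, vocab, end="_E_"):
    """Map predicted indices to tokens and drop everything from the end marker on."""
    tokens = [vocab[i] for i in indices]
    if end in tokens:
        tokens = tokens[:tokens.index(end)]
    return " ".join(tokens)

vocab = ["Not so far.", "_B_", "_E_", "_P_", "_U_"]
print(decode([0, 2], vocab))     # answer followed by the end marker
print(decode([0, 2, 0], vocab))  # anything after '_E_' is discarded
```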



In [0]:
def start_chat():
  message = '''Chatbot: Welcome. Please choose one of the personalities. \n
        \"comic\", \"friend\", \"professional\" \n
        (To change personality, type \"change\", to exit chat, type \"exit\".)
  '''
  print(message)
  personality = change_personality()

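  # Keep chatting until the user types 'change' or 'exit'.
  # The command word itself is also answered and logged before the loop ends.
  temp = ''
  change_option = ['change', 'exit']
  type_list = []
  respond_list = []
  while temp not in change_option: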
    word = input("You: ")
    temp = word.lower()
    respond = answer(word, personality)
    print("Chatbot: "+ respond)
    type_list.append(word)
    respond_list.append(respond)
    
  return type_list, respond_list, temp




In [0]:
# Ask the user for a personality and repeat until a valid one is entered.
def change_personality():
  print("")
  print("Chatbot: Please choose a personality.")
  word = input("You: ").lower()
  person_list = ['comic', 'friend', 'professional']
  while word not in person_list:
    print("Please type a correct personality.")
    word = input("You: ").lower()
  print("Chatbot: Okay, let's chat!")
  return word

In [0]:
# Write the chat record to chat_log.txt and download it from Colab.
from google.colab import files

def save_chat_log(input_file, output_file):
  # input_file holds the user's messages, output_file the chatbot's replies
  with open("chat_log.txt", "w") as f:
    for i in range(len(input_file)):
      f.write("You: "+input_file[i]+"\n")
      f.write("Chatbot: "+output_file[i]+"\n")
  files.download('chat_log.txt')
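Outside Colab, `files.download` is not available, so the log-writing step can be tested on its own. The sketch below is a standalone variant under stated assumptions: `write_chat_log` is a hypothetical name, the file goes to a temporary directory, and UTF-8 is set explicitly so that non-ASCII messages are written safely.

```python
import os
import tempfile

def write_chat_log(path, user_lines, bot_lines):
    """Write alternating 'You:' and 'Chatbot:' lines to a text file."""
    with open(path, "w", encoding="utf-8") as f:
        for user, bot in zip(user_lines, bot_lines):
            f.write("You: " + user + "\n")
            f.write("Chatbot: " + bot + "\n")

path = os.path.join(tempfile.mkdtemp(), "chat_log.txt")
write_chat_log(path, ["hello", "haha"], ["Yup.", "Oh."])
with open(path, encoding="utf-8") as f:
    print(f.read())
```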
  

## 4.1 Start chat

In [82]:
# To start chat, we call the start_chat function.


list_of_input = []
list_of_output = []

type_list, respond_list, change = start_chat()

# start_chat only returns when the user types 'change' or 'exit'

while change == 'change':
  list_of_input += type_list
  list_of_output += respond_list
  type_list, respond_list, change = start_chat()
# the loop ends only when change == 'exit'

# add the final session so it is included in the saved log
list_of_input += type_list
list_of_output += respond_list

print("Chatbot: Thank you! See you again.")

save_chat_log(list_of_input, list_of_output)

Chatbot: Welcome. Please choose one of the personalities. 

        "comic", "friend", "professional" 

        (To change personality, type "change", to exit chat, type "exit".)
  

Chatbot: Please choose a personality.
You: comic
Chatbot: Okay, let's chat!
You: hello
Chatbot: Sometimes I like to take a break from being awesome.
You: how awesome are you?
Chatbot: Whatever you're hoping for, take the bar and lower it.
You: that sounds cool
Chatbot: You know, same ol', same ol'.
You: haha
Chatbot: Oh.
You: oh what?
Chatbot: Yup.
You: yup too!
Chatbot: We're cool.
You: I like talking to you, what do you think?
Chatbot: Ok, here you go, but you owe me one.
You: I am very generous, but i cannot always buy you drinks, you will have to pay me back eventually.
Chatbot: OK. See you tomorrow.
You: No! I said I will not buy you a drink!
Chatbot: Sometimes I like to take a break from being awesome.
You: I don't think you're that awesome.
Chatbot: All those years at charm school. Wasted.
You: Do you understand?
Chatbot: I like me too.
You: Fuck you
Chatbot: OK. See you tomorrow.
You: 幹
Chatbot: Okay.
You: 妳一定看不懂我寫的東西
Chatbot: Okay.
You: 雞同鴨講
Chatbot: Okay.
You: 大概我打什麼你看不懂都respond okay
Chatbot: People created me. But not the way people created you.
You: exit
Chatbot: Okay.
Chatbot: Thank you! See you again.
