## In this assignment, you'll work with the Cornell Movie-Dialogs Corpus, a dataset released by Cornell University. It contains conversations from more than 600 movies. Use the following credentials to access the dataset from the Thinkful database.

*    postgres_user = 'dsbc_student'
*    postgres_pw = '7*.8G9QH21'
*    postgres_host = '142.93.121.174'
*    postgres_port = '5432'
*    postgres_db = 'cornell_movie_dialogs'
    
## The data is in the table called dialogs.

## Note: It's recommended to use Google Colaboratory when working on this assignment. Submit your solutions to the following tasks as a link to your Jupyter Notebook on GitHub.


In [1]:
import random
import re
import string
import warnings

import nltk
import numpy as np
import pandas as pd
import spacy
from nltk import sent_tokenize, word_tokenize
from nltk.corpus import gutenberg, stopwords
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sqlalchemy import create_engine

warnings.filterwarnings("ignore")


In [2]:
postgres_user = 'dsbc_student'
postgres_pw = '7*.8G9QH21'
postgres_host = '142.93.121.174'
postgres_port = '5432'
postgres_db = 'cornell_movie_dialogs'

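# Build the connection string and load the whole dialogs table
engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(
    postgres_user, postgres_pw, postgres_host, postgres_port, postgres_db))

dialogs_df = pd.read_sql_query('select * from dialogs', con=engine)

# Release the database connections once the data is in memory
engine.dispose()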

In [3]:
dialogs_df.head()

   index                                            dialogs
0      0  Can we make this quick? Roxanne Korrine and A...
1      1  Well, I thought we'd start with pronunciation,...
2      2  Not the hacking and gagging and spitting part....
3      3  Okay... then how 'bout we try out some French ...
4      4  You're asking me out. That's so cute. What's ...


## First, do some data preprocessing to clean up the data. You can use your solution to the assignment of the Text preprocessing checkpoint.

In [4]:
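# Join the first 10,000 dialogs into one string (join avoids repeated string copies)
doc = ' '.join(dialogs_df['dialogs'][0:10000])

# 'en' is the spaCy 2.x shortcut; spaCy 3 needs a full model name such as 'en_core_web_sm'
nlp = spacy.load('en')
doc = nlp(doc)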
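
The cell above passes the raw dialog text straight to spaCy, so stray double spaces and dash runs such as `--` survive into the sentences. A minimal cleaning sketch follows. The helper names `clean_dialog` and `clean_dialogs` are hypothetical, and the choice of what to strip is an assumption rather than the checkpoint's official solution.

```python
import re


def clean_dialog(text):
    """Normalize one dialog string (hypothetical helper, not from the assignment)."""
    # Replace runs of two or more dashes ("--", "---") with a space
    text = re.sub(r'-{2,}', ' ', text)
    # Collapse any run of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    return text.strip()


def clean_dialogs(dialogs):
    """Clean every dialog and drop the ones that end up empty."""
    cleaned = (clean_dialog(dialog) for dialog in dialogs)
    return [dialog for dialog in cleaned if dialog]
```

The cleaned list could then replace `dialogs_df['dialogs'][0:10000]` when building the text for spaCy. Single hyphens are left alone so that ordinary hyphenated words are not damaged.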

In [11]:
doc_sents = [sent.text for sent in doc.sents]
doc_sents

[' Can we make this quick?  ',
 'Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad.  ',
 'Again.',
 "Well, I thought we'd start with pronunciation, if that's okay with you.",
 'Not the hacking and gagging and spitting part.  ',
 'Please.',
 'Okay...',
 "then how 'bout we try out some French cuisine.  ",
 'Saturday?  ',
 'Night?',
 "You're asking me out.  ",
 "That's so cute.",
 "What's your name again?",
 'Forget it.',
 "No, no, it's my fault -- we didn't have a proper introduction --- Cameron.",
 'The thing is, Cameron --',
 "I'm at the mercy of a particularly hideous breed of loser.  ",
 'My sister.  ',
 "I can't date until she does.",
 'Seems like she could get a date easy enough...',
 'Why?',
 'Unsolved mystery.  ',
 'She used to be really popular when she started high school, then it was just like she got sick of it or something.',
 "That's a shame.",
 'Gosh, if only we could find Kat a boyfriend... Let me see what I can do.',
 "C'

## Develop a chatbot using this corpus. In doing this, you're free to choose a chatbot development library like ChatterBot or write your own code from scratch.

In [12]:
!pip install chatterbot
!pip install chatterbot-corpus



In [13]:
# Import libraries
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer, ChatterBotCorpusTrainer
from chatterbot.conversation import Statement

In [14]:
# Create a chatbot
chatbot = ChatBot('Movie-dialog')

# This is to remove the accumulated knowledge base
chatbot.storage.drop()

# Create a new trainer for the chatbot
trainer = ListTrainer(chatbot)

# Train the chatbot based on Movie-dialog
trainer.train(doc_sents)

List Trainer: [####################] 100%
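
The task also allows writing the chatbot from scratch. One possible approach, sketched below, is retrieval with TF-IDF and cosine similarity: find the corpus sentence most similar to the user's input and reply with the sentence that follows it. The class `RetrievalBot`, its `fallback` parameter, and the "next sentence is the reply" rule are assumptions made for illustration. This is not how ChatterBot works internally.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class RetrievalBot:
    """Reply with the sentence that follows the best TF-IDF match (hypothetical helper)."""

    def __init__(self, sentences):
        if len(sentences) < 2:
            raise ValueError("need at least two sentences")
        self.sentences = list(sentences)
        self.vectorizer = TfidfVectorizer()
        # Index every sentence except the last one, which has no follow-up
        self.matrix = self.vectorizer.fit_transform(self.sentences[:-1])

    def get_response(self, text, fallback="I don't know what to say."):
        query = self.vectorizer.transform([text])
        scores = cosine_similarity(query, self.matrix)[0]
        best = scores.argmax()
        # A zero score means the input shares no vocabulary with the corpus
        if scores[best] == 0:
            return fallback
        return self.sentences[best + 1]


# Small usage example with sentences taken from the corpus output above
sample = ["What's your name again?", "Forget it.",
          "I can't date until she does.", "Why?", "Unsolved mystery."]
bot = RetrievalBot(sample)
print(bot.get_response("what is your name"))
```

In the notebook, `RetrievalBot(doc_sents)` would build the same index over the parsed sentences. Unlike the trained ChatterBot instance, this sketch keeps no conversation state and never learns from user input.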


## Start a conversation with your chatbot, and discuss its strengths and weaknesses.

## Note: When parsing the dialogs using spaCy, you may run into some memory issues, even in Google Colaboratory. If you're having memory issues, try parsing your text as follows:

* nlp = spacy.load('en', disable=['parser', 'ner'])
* nlp.add_pipe(nlp.create_pipe('sentencizer'))
* nlp.max_length = 20000000
* doc = nlp(the_dialogs_come_here)
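
Another way to limit memory, independent of the settings listed above, is to parse the dialogs in smaller batches instead of as one very long string. The sketch below only shows the batching step. `batch_dialogs` and its `max_chars` budget are hypothetical names, and the default of 100,000 characters is an arbitrary assumption to tune.

```python
def batch_dialogs(dialogs, max_chars=100_000):
    """Yield strings of joined dialogs, each roughly max_chars long at most.

    A single dialog longer than max_chars is still yielded on its own.
    """
    batch, size = [], 0
    for dialog in dialogs:
        # Start a new batch when adding this dialog would exceed the budget
        if batch and size + len(dialog) + 1 > max_chars:
            yield ' '.join(batch)
            batch, size = [], 0
        batch.append(dialog)
        size += len(dialog) + 1  # +1 accounts for the joining space
    if batch:
        yield ' '.join(batch)


# Intended use in the notebook (nlp is the spaCy pipeline loaded earlier):
# doc_sents = [sent.text
#              for chunk in batch_dialogs(dialogs_df['dialogs'])
#              for sent in nlp(chunk).sents]
```

Because each batch is parsed and discarded separately, only one batch's `Doc` needs to be in memory at a time. Sentences are never split across batches, since batches always break between dialogs.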

In [15]:
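GREETING_INPUTS = ["hello", "hi", "greetings", "what's up", "hey"]
GREETING_RESPONSES = ["hello", "hi", "hey", "hi there"]

def greeting(sentence):
    """Return a random greeting if the sentence contains one, else None."""
    text = sentence.lower()
    # Strip punctuation from each word so "hello!" still matches
    words = [word.strip(string.punctuation) for word in text.split()]
    for greet in GREETING_INPUTS:
        # Multi-word greetings such as "what's up" are matched as phrases
        if greet in words or (' ' in greet and greet in text):
            return random.choice(GREETING_RESPONSES)
    return None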

In [16]:
print("Movie-dialog: I will try to respond to you reasonably. If you want to exit, type bye.")

# Chat loop
while True:
    user_input = input("User: ").lower()

    if user_input == 'bye':
        print("Movie-dialog: Bye! It was a great chat.")
        break
    if user_input in ('thanks', 'thank you'):
        # Reply before leaving the loop; a statement placed after break never runs
        print("Movie-dialog: You're welcome.")
        break

    # Call greeting() once so the check and the reply use the same result
    reply = greeting(user_input)
    if reply is not None:
        print("Movie-dialog: " + reply)
    else:
        print("Movie-dialog:", chatbot.get_response(user_input))

Movie-dialog: I will try to respond to you reasonably. If you want to exit, type bye.
User: hello
Movie-dialog: hi
User: how are you?
Movie-dialog: Takin' a bath.
User: where is the bath
Movie-dialog: Where?
User: when will you be done
Movie-dialog: Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad.
User: who is Roxanne?
Movie-dialog: You never wanted to go out with 'me, did you?
User: yes I did
Movie-dialog: Seems like she could get a date easy enough...
User: I know she could
Movie-dialog: Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad.
User: who is andrew?
Movie-dialog: You never wanted to go out with 'me, did you?
User: bye
Movie-dialog: Bye! It was a great chat.


In [17]:
# Create a chatbot
chatbot = ChatBot('ChatterBot')
# This is to remove the accumulated knowledge base
chatbot.storage.drop()

# Start by training your bot with the ChatterBot corpus data
trainer = ChatterBotCorpusTrainer(chatbot)

trainer.train('chatterbot.corpus.english')

Training ai.yml: [####################] 100%
Training botprofile.yml: [####################] 100%
Training computers.yml: [####################] 100%
Training conversations.yml: [####################] 100%
Training emotion.yml: [####################] 100%
Training food.yml: [####################] 100%
Training gossip.yml: [####################] 100%
Training greetings.yml: [####################] 100%
Training health.yml: [####################] 100%
Training history.yml: [####################] 100%
Training humor.yml: [####################] 100%
Training literature.yml: [####################] 100%
Training money.yml: [####################] 100%
Training movies.yml: [####################] 100%
Training politics.yml: [####################] 100%
Training psychology.yml: [####################] 100%
Training science.yml: [####################] 100%
Training sports.yml: [####################] 100%
Training trivia.yml: [####################] 100%


In [18]:
print("ChatterBot: I will try to respond to you reasonably. If you want to exit, type bye.")

# Chat loop
while True:
    user_input = input("User: ").lower()

    if user_input == 'bye':
        print("ChatterBot: Bye! It was a great chat.")
        break
    if user_input in ('thanks', 'thank you'):
        # Reply before leaving the loop; a statement placed after break never runs
        print("ChatterBot: You're welcome.")
        break

    # Call greeting() once so the check and the reply use the same result
    reply = greeting(user_input)
    if reply is not None:
        print("ChatterBot: " + reply)
    else:
        print("ChatterBot:", chatbot.get_response(user_input))

ChatterBot: I will try to respond to you reasonably. If you want to exit, type bye.
User: hi
ChatterBot: hi
User: how are you
ChatterBot: I am on the Internet.
User: what are you looking at
ChatterBot: Artificial Intelligence is the branch of engineering and science devoted to constructing machines that think.
User: AI is cool
ChatterBot: hard to tell, i have never tried anything but electricity
User: are you good at electricity
ChatterBot: No, software will live forever.
User: what about money
ChatterBot: that is a good story. have you read "valis" or "the man in the high castle?"
User: what about literature
ChatterBot: that changes every few years.
User: what are you reading
ChatterBot: Artificial Intelligence is the branch of engineering and science devoted to constructing machines that think.
User: bye
ChatterBot: Bye! It was a great chat.
