# An Interactive NLP-based AI System: Chatbot

An NLP-based chatbot system that helps users find information about movies and assists with ticket booking for Savoy Cinema Nottingham. The user can:
* Have a general conversation with the chatbot
* Ask for information about movies, querying by name, genre, director, actor, year, and plot
* Get recommendations for movies
* Get help booking tickets for movies at Savoy Cinema Nottingham

## Requirements

In [1]:
#!pip install nltk
#!pip install pandas
#!pip install requests
#!pip install beautifulsoup4
# pickle is part of the Python standard library and does not need installing
#!pip install numpy

## Import Libraries

In [1]:
import numpy as np
import random
import nltk
import json
import pandas as pd
import datetime
import pickle
import requests
import webbrowser
from math import log10
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

## Load Data

Three datasets are used in this project. Two are loaded from local files, and the third (the Savoy Cinema listings) is scraped in the Data Retrieval section:
* small_talk.json: a list of questions and answers for small talk.
* refined_dataset.csv: a list of movies with their information.

In [3]:
talk_data = []
with open("small_talk.json") as file:
    talk_data = json.load(file)

In [4]:
movies = pd.read_csv('refined_dataset.csv', delimiter=',')
movies.dataframeName = 'refined_dataset.csv'

## Data Retrieval

To let users book tickets for movies at Savoy Cinema Nottingham, the listings are scraped from the Savoy Cinema website using the BeautifulSoup library. The page embeds the event data as JSON inside a script tag, which is extracted and parsed with `json.loads` when the notebook runs.

In [5]:
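URL = "https://savoyonline.co.uk/SavoyNottingham.dll/Home"
page = requests.get(URL, timeout=10)
page.raise_for_status()  # stop early if the request failed

soup = BeautifulSoup(page.content, "html.parser")
# The event listings are embedded as JSON in the fourth <script> tag;
# the slice drops the 17 characters that precede the JSON
script = soup.find_all('script')[3]
json_part = script.contents[0][17:]

# print(json_part)
data = json.loads(json_part)['Events']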
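
The fixed index `[3]` and the `[17:]` slice depend on the exact layout of the page and will break if the site changes. A more defensive option, sketched below on a made-up script string, is to locate the first `{` and let `json.JSONDecoder.raw_decode` parse exactly one JSON value, ignoring any trailing JavaScript. The helper name `extract_embedded_json`, the variable name `eventsData`, and the sample payload are assumptions for illustration, not the real page content.

```python
import json


def extract_embedded_json(script_text):
    """Parse the first JSON object found inside a block of JavaScript text."""
    start = script_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in script text")
    # raw_decode parses one JSON value and returns (value, end_position),
    # so trailing characters such as ';' do not cause an error.
    value, _end = json.JSONDecoder().raw_decode(script_text, start)
    return value


# Hypothetical script content; the real page may look different.
sample_script = 'var eventsData = {"Events": [{"Title": "Example Film", "Performances": []}]};'

events = extract_embedded_json(sample_script)["Events"]
print(events[0]["Title"])  # Example Film
```

This only works when the first `{` in the script really starts the JSON payload, so it is worth checking against the live page before relying on it.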

## Similarity Functions

These functions calculate the similarity between two documents or between two vectors.

In [2]:
def Jaccard_Similarity(doc1, doc2):
    """
    Calculate the Jaccard similarity between two documents.
    Takes two documents (strings) as input and returns a score between 0 and 1.
    """
    # List the unique words in a document
    words_doc1 = set(doc1.lower().split())
    words_doc2 = set(doc2.lower().split())

    # Find the intersection of words list of doc1 & doc2
    intersection = words_doc1.intersection(words_doc2)

    # Find the union of words list of doc1 & doc2
    union = words_doc1.union(words_doc2)

    # Two empty documents share no words; avoid dividing by zero
    if not union:
        return 0.0

    # Calculate Jaccard similarity score
    # using length of intersection set divided by length of union set
    return float(len(intersection)) / len(union)

def logfreq_weighting(vector):
    """
    Apply log frequency weighting, log10(1 + frequency), to each entry of a vector.
    Takes a vector of raw counts as input and returns the weighted vector as a NumPy array.
    """
    return np.log10(1 + np.asarray(vector, dtype=float))

def sim_manhattan(vector_1, vector_2):
    """
    Calculate a similarity score from the Manhattan distance between two vectors.
    Takes two vectors as input and returns 1 / (1 + distance), so identical vectors score 1.
    """
    diff = np.abs(vector_1 - vector_2)
    manhattan_distance = diff.sum()
    similarity = 1 / (1 + manhattan_distance)
    return similarity
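
To make the scores concrete, the following sketch applies the Jaccard and Manhattan-based similarities to toy inputs. The function bodies mirror the definitions above under lowercase names; the example strings and vectors are made up for illustration and are not taken from the datasets.

```python
import numpy as np


def jaccard_similarity(doc1, doc2):
    """Share of unique lowercase words that two strings have in common."""
    words_1 = set(doc1.lower().split())
    words_2 = set(doc2.lower().split())
    union = words_1 | words_2
    if not union:  # both strings empty: treat as no overlap
        return 0.0
    return len(words_1 & words_2) / len(union)


def sim_manhattan(vector_1, vector_2):
    """Map the Manhattan (L1) distance to a similarity in (0, 1]."""
    return 1 / (1 + np.abs(vector_1 - vector_2).sum())


# Illustrative inputs only (not taken from the datasets).
title_score = jaccard_similarity("the dark knight", "The Dark Knight Rises")
vector_score = sim_manhattan(np.array([1.0, 0.0, 2.0]), np.array([1.0, 1.0, 0.0]))

print(title_score)   # 3 shared words out of 4 unique words -> 0.75
print(vector_score)  # distance 0 + 1 + 2 = 3 -> 1 / (1 + 3) = 0.25
```

Because Jaccard works on whole words, a query with a misspelled word shares nothing with the intended title for that word, which is one reason the chatbot uses fairly low acceptance thresholds.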

## Data Preprocessing


For the movie recommendation system, the plots need to be pre-processed to provide suitable recommendations. Hence, the movie plots from the movie dataset are tokenized, filtered for stop words, and stemmed. A bag of words is then created and log frequency weighting is applied to it. To avoid repeating this time-consuming process, the preprocessed data is saved in pickle files and loaded when needed: 'bow.pkl' contains the bag of words and 'vocab.pkl' contains the vocabulary.

In [6]:
# Tokenize
tokenizer = nltk.RegexpTokenizer(r"\w+")
tok_plots = []
for movie_plot in movies['Plot']:
    tok_plots.append(tokenizer.tokenize(movie_plot))
# print(tok_plots )

filtered_plots = []
english_stopwords = stopwords.words('english')
for tok_words in tok_plots:
    filtered_plots.append([word.lower() for word in tok_words
                           if word.lower() not in english_stopwords])

sb_stemmer = SnowballStemmer('english')
stemmed_plots = []
for filt_words in filtered_plots:
    stemmed_plots.append([sb_stemmer.stem(word) for word in filt_words])

# This part is slow and only needs to run once; later runs can load the pickles instead.
# Build the vocabulary in first-seen order, with a stem -> column index lookup
vocabulary = list(dict.fromkeys(stem for plot in stemmed_plots for stem in plot))
vocab_index = {stem: i for i, stem in enumerate(vocabulary)}

# Bag of words: one raw count vector per plot
bow = {}
for plot in range(len(stemmed_plots)):
    bow[plot] = np.zeros(len(vocabulary))
    for stem in stemmed_plots[plot]:
        bow[plot][vocab_index[stem]] += 1

# Apply log frequency weighting to every count vector
for plot in range(len(stemmed_plots)):
    bow[plot] = logfreq_weighting(bow[plot])

# Save the results so that later runs can skip this step
with open("bow.pkl", "wb") as bow_file:
    pickle.dump(bow, bow_file)

with open("vocab.pkl", "wb") as vocab_file:
    pickle.dump(vocabulary, vocab_file)

In [6]:
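# Load the preprocessed bag of words and vocabulary, closing each file after reading
with open("bow.pkl", "rb") as bow_file:
    bow = pickle.load(bow_file)

with open("vocab.pkl", "rb") as vocab_file:
    vocabulary = pickle.load(vocab_file)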
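
The pipeline can be followed end to end on a toy corpus. The sketch below uses a regular expression in place of `nltk.RegexpTokenizer` and a tiny hand-written stop-word set, and it skips stemming so that it runs without NLTK data. The two example plots and the stop-word list are made up for illustration.

```python
import re

import numpy as np

# Assumptions for illustration: a tiny stop-word list and two made-up plots.
STOP_WORDS = {"a", "the", "and", "in", "of"}
plots = [
    "A detective hunts a killer in the city",
    "The killer and the detective meet in the city of rain",
]

# Tokenize on word characters, lowercase, and drop stop words.
tokens = [
    [w for w in re.findall(r"\w+", plot.lower()) if w not in STOP_WORDS]
    for plot in plots
]

# Vocabulary in first-seen order, plus a word -> column lookup.
vocabulary = list(dict.fromkeys(w for doc in tokens for w in doc))
column = {word: i for i, word in enumerate(vocabulary)}

# Raw term counts, one row per plot.
counts = np.zeros((len(tokens), len(vocabulary)))
for row, doc in enumerate(tokens):
    for word in doc:
        counts[row, column[word]] += 1

# Log-frequency weighting: log10(1 + count) dampens repeated terms.
weighted = np.log10(1 + counts)

print(vocabulary)
print(weighted.round(3))
```

A term that appears once gets a weight of log10(2), roughly 0.301, and a term that never appears stays at 0, so the weighted vectors keep the same sparsity pattern as the raw counts.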

## Chat Bot

The chatbot serves three main purposes: having a conversation, providing movie information, and assisting with booking tickets for movies at Savoy Cinema Nottingham. The chatbot exits when the user enters 'bye'.

In [7]:
# Main Menu options
small_talk = '1'
movie_info = '2'
savoy_cinema = '3'
end_string = "bye"

name = input('Hi, I am Chatty Bot. To start off, what is your name?\n')

addons = ["Loving this game, {}!".format(name), "This is fun!", "I could do this forever, {}!".format(name)]

# Main Menu and Introduction
print("Chatty Bot: {},A bit about me, I love movies! and I know all about them".format(name)) 
print("\t 1 if you just want to have a chat")
print("\t 2 if you want to know about movies ")
print("\t 3 if you want to book a movie at Savoy Cinema Nottingham")
print("\t bye to exit the chat bot")
query = input("Me:")

# bye is used to make sure the user stays in chatbot until they want to exit
bye = False

while not bye:
    # ---- ChatBot Exit Case---
    if query == end_string:
        print("Chatty Bot: See you later, have a good day!")
        bye = True
        break
        
    # --- Movie Information retrieval ----
    elif query == movie_info:
        query = '0'
        print("Chatty Bot: {}, I just love movies and I know all about them! Give me a movie name (to exit, type 'exit' and press return).".format(name))
        info_ret = input("Me: ")
        
        # info_ret_exit is used to make sure the user stays in Movie Information retrieval until they want to exit
        info_ret_exit = False
        
        while not info_ret_exit:
            if info_ret == "exit":
                info_ret_exit = True
                print("Chatty Bot: You have now exited the info retrieval option!")
                break
                
            # processing the query to find the corresponding movie information from the movie dataset
            temp_similarity = 0
            for movie_name in movies['Title']:
                similarity_score = Jaccard_Similarity(info_ret, movie_name)
                if temp_similarity < similarity_score:
                    temp_similarity = similarity_score
                    most_similar_movie = movie_name

            # if there is no such movie in the dataset
            if temp_similarity < 0.2:
                print("Chatty Bot: Ahh that's a shame, I don't know what movie that is")
                print("Chatty Bot: " + random.choice(addons) + " Give me another movie name? (if you want to exit type 'exit' and press return)")
                info_ret = input("Me:")
                continue
                
            # if the movie exists then output the information and ask if user needs recommendations    
            else:                
                most_similar_movie_info = movies.loc[movies[movies['Title'] == most_similar_movie].index]
                most_similar_movie_plot = most_similar_movie_info['Plot'].values[0]
                most_similar_movie_release_year = most_similar_movie_info['Release Year'].values[0]
                most_similar_movie_origin = most_similar_movie_info['Origin/Ethnicity'].values[0]
                most_similar_movie_dir = most_similar_movie_info['Director'].values[0]
                most_similar_movie_cast = most_similar_movie_info['Cast'].values[0]
                most_similar_movie_genre = most_similar_movie_info['Genre'].values[0]
                most_similar_movie_wiki = most_similar_movie_info['Wiki Page'].values[0]

                print("Name: ", most_similar_movie)
                print("\nYear: ", most_similar_movie_release_year)
                print("\nOrigin: ", most_similar_movie_origin)
                print("\nDirector: ", most_similar_movie_dir)
                print("\nGenre: ", most_similar_movie_genre)
                print("\nWiki_page: ", most_similar_movie_wiki)
                print("\nPlot: ", most_similar_movie_plot)
                print("\n\n")
                print("Chatty Bot: Do you want me to recommend similar movies to {} say yes or no".format(most_similar_movie))
                movie_recomm = input("Me: ")
                if  movie_recomm == "exit":
                    info_ret_exit = True
                    print("Chatty Bot: See you later!")
                    break
                if movie_recomm == 'yes':
                    
                    # Pre-process the queried movie's plot to find the most similar plots in the movie dataset
                    query_plot = most_similar_movie_plot

                    # Tokenize
                    tokeniser = nltk.RegexpTokenizer(r"\w+")
                    tok_query = tokeniser.tokenize(query_plot)

                    # Remove stop words
                    english_stopwords = stopwords.words('english')
                    filtered_query = [word.lower() for word in tok_query if word.lower() not in english_stopwords]

                    # Stemming
                    sb_stemmer = SnowballStemmer('english')
                    stemmed_query = [sb_stemmer.stem(word) for word in filtered_query]
                    
                    #Vectorisation
                    vector_query = np.zeros(len(vocabulary))
                    for stem in stemmed_query:
                        try:
                            index = vocabulary.index(stem)
                            vector_query[index] += 1
                        except ValueError:
                            continue
                    vector_query = logfreq_weighting(vector_query)
                    
                    # Finding similar plots
                    similarities = []
                    for document in bow.keys():
                        similarities.append(sim_manhattan(bow[document], vector_query))

                    # rank plots from most to least similar; argsort avoids returning the
                    # same index twice when two plots have identical scores
                    ranked = np.argsort(similarities)[::-1]

                    # skip the queried movie itself and keep the next five
                    queried_index = most_similar_movie_info.index[0]
                    top_5 = [num for num in ranked if num != queried_index][:5]

                    # output the top 5 similar movies to the user
                    for num in top_5:
                        print(movies.loc[num, 'Title'])


            print("Chatty Bot: " + random.choice(addons) + " Give me another movie name? (if you want to exit type 'exit' and press return)")
            info_ret = input("Me:" )

            

    # --- Small Talk ----                     
    elif query == small_talk:
        query = '0'
        # small_talk_exit is used to make sure the user stays in small talk until they want to exit
        # (named so that it does not shadow the built-in exit)
        small_talk_exit = False
        print("Chatty Bot: {},whats up(to exit Small Talk type 'exit' and press return)?".format(name))

        while not small_talk_exit:
            talk = input("Me: ")
            if talk == "exit":
                small_talk_exit = True
                print("Chatty Bot: See you later!")
                break
                
            # process the queried question and compare with the questions in the small talk dataset
            else:
                most_similar_question = 0
                temp_similarity = 0
                for intent in talk_data:
                    similarity = Jaccard_Similarity(talk, intent['Question'])
                    if temp_similarity < similarity:
                        temp_similarity = similarity
                        most_similar_question_ans = intent['Answer']
                
                if temp_similarity < 0.5:
                    print("Chatty Bot: I am sorry, I don't know what you mean")
                else:
                    print("Chatty Bot: "+ most_similar_question_ans)
                    
                    
                    
    # --- Movie Transaction----           
    elif query == savoy_cinema:
        query = '0'
        print("Chatty Bot: {}, what movie are you looking to watch at Savoy Cinema Nottingham? (to exit, type 'exit' and press return)".format(name))
        savoy_query = input("Me: ")
        
        # savoy_exit is used to make sure the user stays in the transaction options until they want to exit
        savoy_exit = False
        
        while not savoy_exit: 
            if savoy_query == "exit":
                savoy_exit = True
                print("Chatty Bot: See you later!")
                break
                
            # getting all the movies premiering in Savoy Cinema
            savoy_movies = []
            for event in data:
                savoy_movies.append(event['Title'])
                
            #processing the queried movie title   
            temp_similarity = 0
            for movie in savoy_movies:
                similarity = Jaccard_Similarity(movie,savoy_query)
                if temp_similarity < similarity:
                    temp_similarity = similarity
                    most_similar_movie = movie
                    
                    
            if temp_similarity < 0.5:
                print("Sorry the movie {} is not premiering in Savoy Cinema".format(savoy_query))
                
            elif temp_similarity >= 0.5:
                savoy_movie_index = savoy_movies.index(most_similar_movie)
                event = data[savoy_movie_index]['Performances']
                
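                # processing the movie date
                print("Chatty Bot: What date do you want to watch {}? Use the form dd/mm/yyyy:".format(most_similar_movie))
                movie_date = input("Me:")

                # making sure the user can still exit the transaction option
                if movie_date == 'exit':
                    print("Chatty Bot: See you later!")
                    savoy_exit = True
                    break

                # reject dates that do not match dd/mm/yyyy instead of crashing;
                # continue re-runs the loop for the same movie and asks for the date again
                try:
                    parsed_date = datetime.datetime.strptime(movie_date, "%d/%m/%Y")
                except ValueError:
                    print("Chatty Bot: Sorry, I could not read that date. Please use dd/mm/yyyy.")
                    continue
                print("Show times on the {} are: (Choose one of the options)".format(movie_date))
                movie_date = parsed_date.strftime("%Y-%m-%d")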
                
                # using the both queried movie and the date to get the booking URL
                option = 0
                show_url = []
                for shows in event:
                    if(shows['StartDate'] == movie_date ):
                        option += 1
                        print(str(option)+ " "+shows['StartTime'] + " in "+ shows['AuditoriumName'])
                        p = shows['URL']
                        booking_URL = "https://savoyonline.co.uk/SavoyNottingham.dll/"+p
                        show_url.append(booking_URL)
                        
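                if option == 0:
                    print("Chatty Bot: There are no showings on {}".format(movie_date))

                else:
                    movie_option = input("Me:")
                    if movie_option == 'exit':
                        print("Chatty Bot: See you later")
                        savoy_exit = True
                        break

                    # only open the browser for a valid option number
                    elif movie_option.isdigit() and 1 <= int(movie_option) <= len(show_url):
                        print("Chatty Bot: Thank you, I will now redirect you to the booking link. Enjoy!")
                        webbrowser.open(show_url[int(movie_option)-1])

                    else:
                        print("Chatty Bot: Sorry, that is not one of the listed options.")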
            print("Chatty Bot: Wanna book another movie {}? (if you want to exit type 'exit' and press return)".format(name))
            savoy_query = input("Me:")
    
    # Error handling if the user inputs something other than 1, 2, 3 or 'bye' in the main menu
    else:
        print("Chatty Bot: {} I dont understand that".format(name))
    
    # Main menu        
    print("Chatty Bot: What do you want to do(type the option and press return)?") 
    print("\t 1 if you just want to have a chat")
    print("\t 2 if you want to know about movies ")
    print("\t 3 if you want to book a movie at Savoy Cinema Nottingham")

    print("\t bye : to exit the chat bot")
    query = input("Me:")


Chatty Bot: shrav,A bit about me, I love movies! and I know all about them
	 1 if you just want to have a chat
	 2 if you want to know about movies 
	 3 if you want to book a movie at Savoy Cinema Nottingham
	 bye to exit the chat bot
Chatty Bot: shrav I dont understand that
Chatty Bot: What do you want to do(type the option and press return)?
	 1 if you just want to have a chat
	 2 if you want to know about movies 
	 3 if you want to book a movie at Savoy Cinema Nottingham
	 bye : to exit the chat bot
Chatty Bot: shrav,whats up(to exit Small Talk type 'exit' and press return)?
Chatty Bot: I am sorry, I don't know what you mean
Chatty Bot: Sorry youre feeling blue today. Hope things get better soon.
Chatty Bot: I am sorry, I don't know what you mean
Chatty Bot: Hi!
Chatty Bot: I am sorry, I don't know what you mean
Chatty Bot: I cant really explain.
Chatty Bot: Hi!
Chatty Bot: I am sorry, I don't know what you mean
Chatty Bot: See you later!
Chatty Bot: What do you want to do(type the 
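
All three modes share one pattern: score the user's input against a list of candidates with Jaccard similarity and accept the best match only if it clears a threshold (0.2 for movie titles, 0.5 for small talk and Savoy listings). A hypothetical helper, `best_match`, could factor this out. The helper's name and signature and the sample titles below are assumptions for illustration, not part of the notebook.

```python
def jaccard_similarity(doc1, doc2):
    """Share of unique lowercase words that two strings have in common."""
    words_1 = set(doc1.lower().split())
    words_2 = set(doc2.lower().split())
    union = words_1 | words_2
    return len(words_1 & words_2) / len(union) if union else 0.0


def best_match(query, candidates, threshold):
    """Return (candidate, score) for the best match, or (None, score) below the threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = jaccard_similarity(query, candidate)
        if score > best_score:
            best, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


titles = ["The Godfather", "The Dark Knight", "Toy Story"]  # made-up sample

print(best_match("dark knight", titles, threshold=0.2))  # matches "The Dark Knight"
print(best_match("frozen", titles, threshold=0.2))       # no shared words -> (None, 0.0)
```

Returning `None` for a failed match also avoids reusing a stale `most_similar_movie` from a previous query, which the inline loops rely on the threshold check to prevent.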

## Benchmarking

In [None]:
# Benchmarking for small talk
small_talk_benchmark = talk_data[0:20]
for x, test_q in enumerate(small_talk_benchmark):
    # reset the best score for every test question
    temp_similarity = 0
    for intent in talk_data:
        similarity = Jaccard_Similarity(test_q['Question'], intent['Question'])
        if temp_similarity < similarity:
            temp_similarity = similarity
            most_similar_question_ans = intent['Answer']
    print("for test question " + str(x) + " similarity is " + str(temp_similarity))
            


In [None]:
# Benchmarking for info retrieval
movie_search_benchmark = movies['Title'][0:101]
for x, test_movie in enumerate(movie_search_benchmark):
    # reset the best score for every test movie
    temp_similarity = 0
    for movie_name in movies['Title']:
        similarity_score = Jaccard_Similarity(test_movie, movie_name)
        if temp_similarity < similarity_score:
            temp_similarity = similarity_score
            most_similar_movie = movie_name
    print("for test movie " + str(x) + " similarity is " + str(temp_similarity))
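
Because every test item is itself in the search set, the loops above should report a similarity of 1.0 for any non-empty entry. A stricter check, sketched here under assumptions, perturbs each query by dropping its last word and measures how often the original entry is still the top match. The sample titles and the perturbation are illustrative choices, not part of the original benchmark.

```python
def jaccard_similarity(doc1, doc2):
    """Share of unique lowercase words that two strings have in common."""
    words_1 = set(doc1.lower().split())
    words_2 = set(doc2.lower().split())
    union = words_1 | words_2
    return len(words_1 & words_2) / len(union) if union else 0.0


def top_match(query, candidates):
    """Return the candidate with the highest Jaccard similarity to the query."""
    return max(candidates, key=lambda c: jaccard_similarity(query, c))


# Made-up titles, including a near-duplicate pair to stress the matcher.
titles = ["The Dark Knight Rises", "The Dark Knight", "Toy Story 3", "Finding Nemo"]

hits = 0
for title in titles:
    words = title.split()
    # Perturb the query by dropping its last word (single-word titles stay as-is).
    query = " ".join(words[:-1]) if len(words) > 1 else title
    if top_match(query, titles) == title:
        hits += 1

accuracy = hits / len(titles)
print(accuracy)  # 3 of 4 recovered -> 0.75
```

The one miss comes from the near-duplicate pair: truncating "The Dark Knight Rises" produces exactly "The Dark Knight", so the shorter title wins. Cases like this are what a self-match benchmark cannot reveal.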