# Creating a Chatbot to ask questions about Movies
You will use Netflix TV Shows and Movies data to create an ETL process to Extract, Transform and Load multiple datasets from CVS Files into a MongoDB Database. After that, you’ll use that data source to create a simple chatbot which allows the user to as a variety of questions to the chatbot. You DO NOT need to make this bot run in Discord or Twitter, but rather at the local machine.
 
Your bot will need to answer the questions (taking note to use various forms of ways to ask the question)in a human type of form.
- What were the top 5 shows on Netflix 2 years ago? Show me the top 5 shows on Netflix 2 years ago. Show me the top 5 shows on Netflix two years ago.
- What was the top movie on Netflix in 2020?
- How long was the best movie on Netflix last year? What was the release year of that movie?

These are just *sample* questions. You need to allow you bot to ask 10 different categories/types of questions. They are up to you on which questions, but the bot needs to tell the user what those question categories that it can answer. Like: Top movies by year, top X movies / shows by year. Genre of Movie/Show of the top Movie/Show...# of seasons of top shows...etc. Star(s) of the top show/movie. You’ll need to use the user response to form a query for your Mongo Dataset.

## My plan
Categories:
1. What is the highest rated movie for ____ *year* ___ ? ```(best_movies_netflix + best_movies_years)```
2. What is the most popular movie for _____ *year* ____ ? ```(best_movies_netflix)```
3. What is the highest rated show for ____ *year* ____? ```(best_shows_netflix + best_show_year)```
4. What is the most popular show for _____ *year* ____? ```(best_shows_netflix)```
5. Where was ___ *movie title* _____ produced? ```(raw_titles)```
6. What genre is ____ *movie title* ____? ```(raw_titles)```
7. What is the runtime for ____ *movie title* ____? ```(raw_titles)```
8. What is the runtime for ____ *show title* _____? ```(raw_titles)```
9. What characters did _____ *actor* ____ play? ```(raw_credits)```
10. What is the age certification of ____ *title* ____? ```(raw_titles)```

### Data Extraction and Transformation Layer
1. Extracting data from netflix (sourced from kaggle) [dataset](https://www.kaggle.com/datasets/thedevastator/the-ultimate-netflix-tv-shows-and-movies-dataset)
2. Transformed data to create unique tables for each of the above 10 questions with data that only pertains to answering the question

In [40]:
import pandas as pd
import numpy as np
import glob
#load csv from url into pandas dataframe
best_movies_netflix = pd.read_csv('Best Movies Netflix.csv')
best_movies_years = pd.read_csv('Best Movie by Year Netflix.csv')
best_show_year = pd.read_csv('Best Show by Year Netflix.csv')
best_shows_netflix = pd.read_csv('Best Shows Netflix.csv')
raw_credits = pd.read_csv('raw_credits.csv')
raw_titles = pd.read_csv('raw_titles.csv')
best_movies_netflix.head()

Unnamed: 0,index,TITLE,RELEASE_YEAR,SCORE,NUMBER_OF_VOTES,DURATION,MAIN_GENRE,MAIN_PRODUCTION
0,0,David Attenborough: A Life on Our Planet,2020,9.0,31180,83,documentary,GB
1,1,Inception,2010,8.8,2268288,148,scifi,GB
2,2,Forrest Gump,1994,8.8,1994599,142,drama,US
3,3,Anbe Sivam,2003,8.7,20595,160,comedy,IN
4,4,Bo Burnham: Inside,2021,8.7,44074,87,comedy,US


In [81]:
#Question 1:
#Movie rating and release_year
movie_rating1 = best_movies_netflix[['TITLE', 'RELEASE_YEAR', 'SCORE']]
movie_rating2 = best_movies_years[['TITLE', 'RELEASE_YEAR', 'SCORE']]
movie_rating = (
    pd.concat([movie_rating1, movie_rating2])
    .drop_duplicates()
    .dropna()
    .reset_index(drop=True)
    .sort_values(by=['RELEASE_YEAR'])
    .reset_index(drop=True)
    .rename(columns={'TITLE': 'title', 'RELEASE_YEAR': 'release_year', 'SCORE': 'score'})
)
movie_rating

Unnamed: 0,title,release_year,score
0,White Christmas,1954,7.5
1,The Guns of Navarone,1961,7.5
2,My Fair Lady,1964,7.8
3,The Professionals,1966,7.3
4,Bonnie and Clyde,1967,7.7
...,...,...,...
385,Gangubai Kathiawadi,2022,7.0
386,Radhe Shyam,2022,6.9
387,Badhaai Do,2022,7.3
388,The Tinder Swindler,2022,7.2


In [82]:
# Question 2:
# Movie title, release_year, and NUMBER_OF_VOTES
popular_movies = (
    best_movies_netflix[['TITLE', 'RELEASE_YEAR', 'NUMBER_OF_VOTES']]
    .sort_values(by=['NUMBER_OF_VOTES'], ascending=False)
    .reset_index(drop=True)
    .rename(columns={'TITLE': 'title', 'RELEASE_YEAR': 'release_year', 'NUMBER_OF_VOTES': 'number_of_votes'})
)
popular_movies

Unnamed: 0,title,release_year,number_of_votes
0,Inception,2010,2268288
1,Forrest Gump,1994,1994599
2,Django Unchained,2012,1472668
3,Saving Private Ryan,1998,1346020
4,Taxi Driver,1976,795222
...,...,...,...
382,Bully,2011,10266
383,Berserk: The Golden Age Arc II - The Battle fo...,2012,10257
384,Gifted Hands: The Ben Carson Story,2009,10210
385,Luck by Chance,2009,10206


In [84]:
#Question 3:
#Show title, release_year, rating
show_rating1 = best_shows_netflix[['TITLE', 'RELEASE_YEAR', 'SCORE']]
show_rating2 = best_show_year[['TITLE', 'RELEASE_YEAR', 'SCORE']]
show_rating = (
    pd.concat([show_rating1, show_rating2])
    .drop_duplicates()
    .dropna()
    .reset_index(drop=True)
    .sort_values(by=['RELEASE_YEAR'])
    .reset_index(drop=True)
    .rename(columns={'TITLE': 'title', 'RELEASE_YEAR': 'release_year', 'SCORE': 'score'})
)
show_rating

Unnamed: 0,title,release_year,score
0,Monty Python's Flying Circus,1969,8.8
1,Knight Rider,1982,6.9
2,Seinfeld,1989,8.9
3,Star Trek: Deep Space Nine,1993,8.1
4,Neon Genesis Evangelion,1995,8.5
...,...,...,...
244,Young Royals,2021,8.3
245,Sweet Tooth,2021,7.8
246,Squid Game,2021,8.0
247,All of Us Are Dead,2022,7.5


In [85]:
#Question 4:
#Show title, release_year, and NUMBER_OF_VOTES
popular_shows = (
    best_shows_netflix[['TITLE', 'RELEASE_YEAR', 'NUMBER_OF_VOTES']]
    .sort_values(by=['NUMBER_OF_VOTES'], ascending=False)
    .reset_index(drop=True)
    .rename(columns={'TITLE': 'title', 'RELEASE_YEAR': 'release_year', 'NUMBER_OF_VOTES': 'number_of_votes'})
)
popular_shows

Unnamed: 0,title,release_year,number_of_votes
0,Breaking Bad,2008,1727694
1,Stranger Things,2016,989090
2,The Walking Dead,2010,945125
3,House of Cards,2013,494092
4,Peaky Blinders,2013,485506
...,...,...,...
241,Feel Good,2020,10317
242,Hilda,2018,10162
243,Tabula Rasa,2017,10161
244,Miraculous: Tales of Ladybug & Cat Noir,2015,10102


In [86]:
#Question 5:
#runtime and movie title 
movie_runtime = ( 
    raw_titles[['title', 'production_countries']]
    .dropna()
    .reset_index(drop=True)
    .rename(columns={'title': 'title', 'production_countries': 'production_countries'})
)
movie_runtime

Unnamed: 0,title,production_countries
0,Five Came Back: The Reference Films,['US']
1,Taxi Driver,['US']
2,Monty Python and the Holy Grail,['GB']
3,Life of Brian,['GB']
4,The Exorcist,['US']
...,...,...
5800,Fine Wine,['NG']
5801,Edis Starlight,[]
5802,Clash,"['NG', 'CA']"
5803,Shadow Parties,[]


In [87]:
#Question 6:
#movie title and genre
movie_genre = (
    raw_titles[['title', 'genres']]
    .dropna()
    .reset_index(drop=True)
    .rename(columns={'title': 'title', 'genres': 'genres'})
)
movie_genre

Unnamed: 0,title,genres
0,Five Came Back: The Reference Films,['documentation']
1,Taxi Driver,"['crime', 'drama']"
2,Monty Python and the Holy Grail,"['comedy', 'fantasy']"
3,Life of Brian,['comedy']
4,The Exorcist,['horror']
...,...,...
5800,Fine Wine,"['romance', 'drama']"
5801,Edis Starlight,"['music', 'documentation']"
5802,Clash,"['family', 'drama']"
5803,Shadow Parties,"['action', 'thriller']"


In [88]:
#Question 7:
#movie title and runtime
#filter only movies from raw_titles
movie_titles = raw_titles[raw_titles['type'] == 'MOVIE']
movie_titles = movie_titles[['title', 'runtime']]
movie_titles

Unnamed: 0,title,runtime
1,Taxi Driver,113
2,Monty Python and the Holy Grail,91
3,Life of Brian,94
4,The Exorcist,133
6,Dirty Harry,102
...,...,...
5800,Momshies! Your Soul is Mine,108
5801,Fine Wine,100
5802,Edis Starlight,74
5803,Clash,88


In [89]:
#Question 8:
#show title and runtime
#filter only shows from raw_titles
show_titles = raw_titles[raw_titles['type'] == 'SHOW']
show_titles = show_titles[['title', 'runtime']]
show_titles

Unnamed: 0,title,runtime
0,Five Came Back: The Reference Films,48
5,Monty Python's Flying Circus,30
29,Monty Python's Fliegender Zirkus,43
47,Seinfeld,24
55,Knight Rider,51
...,...,...
5793,Glimpses of a Future,4
5794,Masameer County,23
5796,The Big Day,45
5799,HQ Barbers,24


In [90]:
#Question 9:
#actor name and characters played
actor_character =  (
    raw_credits[['name', 'character']]
    .dropna()
    .reset_index(drop=True)
    .rename(columns={'name': 'name', 'character': 'character'})
    .groupby('name')
    .agg({'character': ', '.join})
    .reset_index()
)
actor_character

Unnamed: 0,name,character
0,Michael Hayden,Self - Former NSA and CIA Director
1,'Jeeva' Ravi,Dr. Vasanth
2,'Weird Al' Yankovic,"Himself, Self"
3,21 Savage,21 Savage
4,2Mex,Self
...,...,...
47641,문남숙,방글핑
47642,박지윤,조아핑
47643,이지현,로미/프린세스
47644,이철민,"Director Kim, Movie Director / Magistrate, Par..."


In [91]:
#Question 10:
#age_certification and movie title
movie_certification = raw_titles[['title', 'age_certification']]
movie_certification = movie_certification.dropna()
movie_certification = movie_certification.reset_index(drop=True)
movie_certification = movie_certification.rename(columns={'title': 'title', 'age_certification': 'age_certification'})
movie_certification

Unnamed: 0,title,age_certification
0,Five Came Back: The Reference Films,TV-MA
1,Taxi Driver,R
2,Monty Python and the Holy Grail,PG
3,Life of Brian,R
4,The Exorcist,R
...,...,...
3191,Pitta Kathalu,TV-MA
3192,Glimpses of a Future,TV-PG
3193,Masameer County,TV-MA
3194,The Big Day,TV-MA


### Data Loading Layer
- using dataframes from above transforming them into tables in [MongoDB](https://www.mongodb.com/home)
- database hosted locally

In [70]:
#import mongodb
from pymongo import MongoClient
#connect to mongodb
client = MongoClient('localhost', 27017)
#create database
db = client['movie_chatbot']
#create collection
movie_rating_collection = db['Question 1']
popular_movies_collection = db['Question 2']
show_rating_collection = db['Question 3']
popular_shows_collection = db['Question 4']
movie_runtime_collection = db['Question 5']
movie_genre_collection = db['Question 6']
movie_titles_collection = db['Question 7']
show_titles_collection = db['Question 8']
actor_character_collection = db['Question 9']
movie_certification_collection = db['Question 10']
#insert data into collection
movie_rating_collection.insert_many(movie_rating.to_dict('records'))
popular_movies_collection.insert_many(popular_movies.to_dict('records'))
show_rating_collection.insert_many(show_rating.to_dict('records'))
popular_shows_collection.insert_many(popular_shows.to_dict('records'))
movie_runtime_collection.insert_many(movie_runtime.to_dict('records'))
movie_genre_collection.insert_many(movie_genre.to_dict('records'))
movie_titles_collection.insert_many(movie_titles.to_dict('records'))
show_titles_collection.insert_many(show_titles.to_dict('records'))
actor_character_collection.insert_many(actor_character.to_dict('records'))
movie_certification_collection.insert_many(movie_certification.to_dict('records'))
#check if data is inserted
print(
    movie_rating_collection.count_documents({}), 
    popular_movies_collection.count_documents({}), 
    show_rating_collection.count_documents({}), 
    popular_shows_collection.count_documents({}), 
    movie_runtime_collection.count_documents({}), 
    movie_genre_collection.count_documents({}), 
    movie_titles_collection.count_documents({}), 
    show_titles_collection.count_documents({}), 
    actor_character_collection.count_documents({}), 
    movie_certification_collection.count_documents({})
    )

#print all documents in collection
'''
for x in movie_rating_collection.find():
    print(x)
for x in popular_movies_collection.find():
    print(x)
for x in show_rating_collection.find():
    print(x)
for x in popular_shows_collection.find():
    print(x)
for x in movie_runtime_collection.find():
    print(x)
for x in movie_genre_collection.find():
    print(x)
for x in movie_titles_collection.find():
    print(x)
for x in show_titles_collection.find():
    print(x)
for x in actor_character_collection.find():
    print(x)
for x in movie_certification_collection.find():
    print(x)
'''
#close connection
client.close()
#check if connection is closed
print(client)
#check if database is closed
print(db)


### Training Chatbot to Reply to given Questions 
- creating JSON file (python dict) with given format
```JSON
{"intents": [
        {"tag": "greeting",
         "patterns": ["Hi", "How are you", "Is anyone there?", "Hello", "Hey","Good day", "Whats up","Hola"],
         "responses": ["Hello!", "Good to see you again!", "Hi there, how can I help?","hurry up, I don't have all day"],
         "context_set": ""
        },
        {"tag": "goodbye",
         "patterns": ["cya", "See you later", "Goodbye", "I am Leaving", "Have a Good day","bye"],
         "responses": ["Sad to see you go..", "Talk to you later", "Goodbye!"],
         "context_set": ""
        }
         
   ]
}
```
- Using TensorFlow to train chatbot


In [77]:
#open mongodb connection
client = MongoClient('localhost', 27017)
#open database
db = client['movie_chatbot']
#open collection
movie_rating_collection = db['Question 1']
popular_movies_collection = db['Question 2']
show_rating_collection = db['Question 3']
popular_shows_collection = db['Question 4']
movie_runtime_collection = db['Question 5']
movie_genre_collection = db['Question 6']
movie_titles_collection = db['Question 7']
show_titles_collection = db['Question 8']
actor_character_collection = db['Question 9']
movie_certification_collection = db['Question 10']
#query white christmas movie rating
movie_rating_collection.find_one({'title': 'White Christmas'})


{'_id': ObjectId('6396a9be2c47c9b5c84ae6da'),
 'title': 'White Christmas',
 'release_year': 1954,
 'score': 7.5}

In [73]:
#create dictionary in python using format
#open mongodb connection
client = MongoClient('localhost', 27017)
#open database
db = client['movie_chatbot']
chatbot_train = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hi", "How are you", "Is anyone there?", "Hello", "Hey","Good day", "Whats up","Hola"],
            "responses": ["Hello!", "Good to see you again!", "Hi there, how can I help?","hurry up, I don't have all day"],
            "context_set": ""
        },
        {
            "tag": "goodbye",
            "patterns": ["cya", "See you later", "Goodbye", "I am Leaving", "Have a Good day","bye"],
            "responses": ["Sad to see you go..", "Talk to you later", "Goodbye!"],
            "context_set": ""
        },
        {
            "tag": "movie_rating",
            "patterns": ["What is the rating of the movie {title}", "What rating was {title} given", "How was {title} rated", "Do critics like {title}"],
            "responses": [
                "The movie rating is " + client.movie_chatbot.Question_1.find_one({'title': 'title'})['rating'], 
                "The movie was given a rating of " + client.movie_chatbot.Question_1.find_one({'title': 'title'})['rating'], 
                "The movie was rated " + client.movie_chatbot.Question_1.find_one({'title': 'title'})['rating'],
                "Critics gave this movie a rating of " + client.movie_chatbot.Question_1.find_one({'title': 'title'})['rating']
                ],
            "context_set": ""
        },
        {
            "tag": "popular_movies",
            "patterns": ["What are the most popular movies", "What are the most popular movies right now", "What are the most popular movies of all time", "What are the most popular movies of the year"],
            "responses": [
                "The most popular movies are " + client.movie_chatbot.Question_2.find({'title': 'title'}).sort({'number_of_votes':-1}).limit(1),
                "The most popular movies right now are " + client.movie_chatbot.Question_2.find({'title': 'title'}).sort({'number_of_votes':-1}).limit(1), 
                "The most popular movies of all time are " + client.movie_chatbot.Question_2.find({'title': 'title'}).sort({'number_of_votes':-1}).limit(1), 
                "The most popular movies of the year are " + client.movie_chatbot.Question_2.find({'title': 'title'}).sort({'number_of_votes':-1}).limit(1)
                ],
            "context_set": ""
        },
        {
            "tag": "show_rating",
            "patterns": ["What is the rating of the {title}", "What rating was the show given", "How was this show rated", "Do critics like this show"],
            "responses": [
                "The show rating is" + #query from mongodb movie_rating_collection,
                "The show was given a rating of", 
                "The show was rated", 
                "Critics gave this show a rating of"
                ],
            "context_set": ""
        },
        {
            "tag": "popular_shows",
            "patterns": ["What are the most popular shows", "What are the most popular shows right now", "What are the most popular shows of all time", "What are the most popular shows of the year"],
            "responses": ["The most popular shows are", "The most popular shows right now are", "The most popular shows of all time are", "The most popular shows of the year are"],
            "context_set": ""
        },
        {
            "tag": "movie_runtime",
            "patterns": ["How long is the movie", "How long is the movie", "How long is the movie", "How long is the movie"],
            "responses": [
                "The movie runtime is", 
                "This movie ran for ", 
                "The movie runtime is", 
                "The movie runtime is"
                ],
            "context_set": ""
        },
        {
            "tag": "movie_genre",
            "patterns": ["What genre is the movie", "What genre is the movie", "What genre is the movie", "What genre is the movie"],
            "responses": [
                "The movie genre is", 
                "The movie genre is", 
                "The movie genre is", 
                "The movie genre is"
                ],
            "context_set": ""
        },
        {
            "tag": "movie_runtime",
            "patterns": ["How long is the movie", "How long is the movie", "How long is the movie", "How long is the movie"],
            "responses": [
                "The movie runtime is", 
                "The movie runtime is", 
                "The movie runtime is", 
                "The movie runtime is"
                ],
            "context_set": ""
        },
        {
            "tag": "show_runtime",
            "patterns": ["How long is the show", "How long is the show", "How long is the show", "How long is the show"],
            "responses": [
                "The show runtime is", 
                "The show runtime is",
                "The show runtime is", 
                "The show runtime is"
                ],
            "context_set": ""
        },
        {
            "tag": "actor_character",
            "patterns": ["Who played the character", "Who played the character", "Who played the character", "Who played the character"],
            "responses": [
                "He played the character", 
                "He played the character", 
                "He played the character", 
                "He played the character"
                ],
            "context_set": ""
        },
        {
            "tag": "movie_certification",
            "patterns": ["What is the movie age rating", "What is the movie certification", "What is the movie restriction", "Is this movie appropriate for children"],
            "responses": [
                "The movie age rating is", 
                "The movie certification is", 
                "The movie restriction is", 
                "This movie is appropriate for children"
                ],
            "context_set": ""
        }
    ]
}

TypeError: 'NoneType' object is not subscriptable