# Introduction

### The objective
Build a chatbot that recommends movies based on the user's mood and can discuss movies with the user.

### Key Features
1. Mood-Based Recommendations:

- The chatbot will recommend movies tailored to the user's current mood (e.g., happy, sad, adventurous, romantic, scared).
- It will map user moods to corresponding movie genres using a predefined dataset of movies.


2. Movie Discussions:

The chatbot will provide information about specific movies, such as:
- Synopsis
- Genre
- Release year
- Similar movie recommendations
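
A minimal sketch of the mood-to-genre mapping described above. The `MOOD_TO_GENRES` table and the `movies_for_mood` helper are hypothetical names chosen for illustration, and the genre sets are only one possible choice. The sketch assumes each movie is a plain dictionary (for example from `df.to_dict("records")`) whose `genres` field is a JSON list stored as a string, as in the dataset loaded below.

```python
import json

# Hypothetical mood-to-genre table; adjust the genre sets to taste.
MOOD_TO_GENRES = {
    "happy": {"Comedy", "Family"},
    "sad": {"Drama"},
    "adventurous": {"Adventure", "Action"},
    "romantic": {"Romance"},
    "scared": {"Horror", "Thriller"},
}


def genre_names(raw_genres):
    """Parse a JSON string like '[{"id": 35, "name": "Comedy"}]' into a set of names."""
    return {genre["name"] for genre in json.loads(raw_genres)}


def movies_for_mood(rows, mood):
    """Return titles of movies whose genres overlap the genres mapped to `mood`."""
    wanted = MOOD_TO_GENRES.get(mood.strip().lower(), set())
    return [row["title"] for row in rows if wanted & genre_names(row["genres"])]


# Tiny hand-written sample in the dataset's format (simplified, for demonstration only).
sample_rows = [
    {"title": "Forrest Gump", "genres": '[{"id": 35, "name": "Comedy"}, {"id": 18, "name": "Drama"}]'},
    {"title": "Fight Club", "genres": '[{"id": 18, "name": "Drama"}]'},
]
print(movies_for_mood(sample_rows, "happy"))  # ['Forrest Gump']
```

An unknown mood maps to an empty genre set, so the function returns an empty list instead of raising an error.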

# Dataset

In [1]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("akshaydattatraykhare/movies-dataset")

print("Path to dataset files:", path)

Path to dataset files: /kaggle/input/movies-dataset


In [2]:
import pandas as pd

df = pd.read_csv("/kaggle/input/movies-dataset/tmdb_5000_movies.csv")

In [3]:
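# Note: without parentheses this displays the bound method (which wraps the
# DataFrame repr), not summary statistics; call df.describe() to get statistics.
df.describe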

<bound method NDFrame.describe of          budget                                             genres  \
0     237000000  [{"id": 28, "name": "Action"}, {"id": 12, "nam...   
1     300000000  [{"id": 12, "name": "Adventure"}, {"id": 14, "...   
2     245000000  [{"id": 28, "name": "Action"}, {"id": 12, "nam...   
3     250000000  [{"id": 28, "name": "Action"}, {"id": 80, "nam...   
4     260000000  [{"id": 28, "name": "Action"}, {"id": 12, "nam...   
...         ...                                                ...   
4798     220000  [{"id": 28, "name": "Action"}, {"id": 80, "nam...   
4799       9000  [{"id": 35, "name": "Comedy"}, {"id": 10749, "...   
4800          0  [{"id": 35, "name": "Comedy"}, {"id": 18, "nam...   
4801          0                                                 []   
4802          0                [{"id": 99, "name": "Documentary"}]   

                                               homepage      id  \
0                           http://www.avatarmovie.com/   

In [4]:
df.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'vote_average',
       'vote_count'],
      dtype='object')

In [5]:
columns_delete=['budget', 'homepage', 'id','production_companies','production_countries', 'revenue', 'status', 'tagline','keywords','original_title']
df = df.drop(columns=columns_delete)

In [6]:
df.sort_values("vote_average", ascending=False)

Unnamed: 0,genres,original_language,overview,popularity,release_date,runtime,spoken_languages,title,vote_average,vote_count
4662,"[{""id"": 35, ""name"": ""Comedy""}]",en,An aging out of work clown returns to his smal...,0.092100,2006-01-01,0.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Little Big Top,10.0,1
4247,"[{""id"": 10749, ""name"": ""Romance""}, {""id"": 35, ...",en,"A womanizing yet lovable loser, Charlie, a wai...",0.094105,2015-07-07,90.0,[],Me You and Five Bucks,10.0,2
4045,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",en,"Four guys, best friends, have grown up togethe...",0.376662,1998-05-01,97.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]","Dancer, Texas Pop. 81",10.0,1
3519,"[{""id"": 35, ""name"": ""Comedy""}]",en,Stiff Upper Lips is a broad parody of British ...,0.356495,1998-06-12,99.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Stiff Upper Lips,10.0,1
3992,[],en,A ghost hunter uses bottles to capture trouble...,0.296981,2015-06-26,0.0,[],Sardaarji,9.5,2
...,...,...,...,...,...,...,...,...,...,...
3852,"[{""id"": 18, ""name"": ""Drama""}]",en,The Secret is the story of a real-life double ...,0.042346,2016-04-29,200.0,[],The Secret,0.0,0
4660,"[{""id"": 99, ""name"": ""Documentary""}]",en,Give Me Shelter is a documentary to raise awar...,0.278981,2014-06-24,90.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Give Me Shelter,0.0,0
4305,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10402, ""...",en,"The raunchy, spunky tale of the rise and fall ...",0.002386,2003-03-20,88.0,[],Down & Out With The Dolls,0.0,0
4293,[],en,The Algerian is an international political thr...,0.025364,2015-08-07,99.0,[],The Algerian,0.0,0


In [7]:
# Drop all rows with fewer than 50 votes
df = df[df['vote_count'] >= 50]

In [8]:
df.sort_values("vote_average", ascending=False)

Unnamed: 0,genres,original_language,overview,popularity,release_date,runtime,spoken_languages,title,vote_average,vote_count
1881,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 80, ""name...",en,Framed in the 1940s for the double murder of h...,136.747729,1994-09-23,142.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",The Shawshank Redemption,8.5,8205
3337,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 80, ""name...",en,"Spanning the years 1945 to 1955, a chronicle o...",143.659698,1972-03-14,175.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",The Godfather,8.4,5893
2294,"[{""id"": 14, ""name"": ""Fantasy""}, {""id"": 12, ""na...",ja,A ten year old girl who wanders away from her ...,118.968562,2001-07-20,125.0,"[{""iso_639_1"": ""ja"", ""name"": ""\u65e5\u672c\u8a...",Spirited Away,8.3,3840
2731,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 80, ""name...",en,In the continuing saga of the Corleone crime f...,105.792936,1974-12-20,200.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",The Godfather: Part II,8.3,3338
3865,"[{""id"": 18, ""name"": ""Drama""}]",en,"Under the direction of a ruthless instructor, ...",192.528841,2014-10-10,105.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Whiplash,8.3,4254
...,...,...,...,...,...,...,...,...,...,...
1265,"[{""id"": 27, ""name"": ""Horror""}, {""id"": 53, ""nam...",en,"With four corpses on his hands, New York City ...",6.756886,2002-08-09,101.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",FearDotCom,3.2,105
2237,"[{""id"": 28, ""name"": ""Action""}, {""id"": 14, ""nam...",en,Edward Carnby is a private investigator specia...,9.292987,2005-01-28,96.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Alone in the Dark,3.1,173
480,"[{""id"": 28, ""name"": ""Action""}, {""id"": 878, ""na...",en,"In the year 3000, man is no match for the Psyc...",7.891470,2000-05-10,118.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Battlefield Earth,3.0,255
2194,"[{""id"": 28, ""name"": ""Action""}, {""id"": 35, ""nam...",en,"In DISASTER MOVIE, the filmmaking team behind ...",16.238961,2008-08-29,87.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Disaster Movie,3.0,240


In [9]:
df.shape

(3652, 10)

In [10]:
df[df["vote_average"] >= 8.2].count()

genres               22
original_language    22
overview             22
popularity           22
release_date         22
runtime              22
spoken_languages     22
title                22
vote_average         22
vote_count           22
dtype: int64

From here on we work with only the 22 movies that have a vote average of at least 8.2, to keep the data passed to the model small.

In [11]:
df = df[df["vote_average"] >= 8.2]

# Chatbot 

Let's first choose a pre-trained LLM to communicate with the user.
We load it through the Hugging Face `transformers` library; the code in this section uses the `tiiuae/falcon-7b-instruct` model.


In [12]:
!pip install transformers
!pip install torch



Before calling the model, let's prepare the data that will be passed to it.

In [13]:
# Keep only the useful columns
context_columns = ["title", "genres", "overview", "release_date", "vote_average"]
filtered_df = df[context_columns]

# Clean the dataset: drop movies without an overview or a title
filtered_df = filtered_df.dropna(subset=["overview", "title"])
df = filtered_df

In [14]:
df.head()

Unnamed: 0,title,genres,overview,release_date,vote_average
65,The Dark Knight,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 28, ""name...",Batman raises the stakes in his war on crime. ...,2008-07-16,8.2
662,Fight Club,"[{""id"": 18, ""name"": ""Drama""}]",A ticking-time-bomb insomniac and a slippery s...,1999-10-15,8.3
690,The Green Mile,"[{""id"": 14, ""name"": ""Fantasy""}, {""id"": 18, ""na...",A supernatural tale set on death row in a Sout...,1999-12-10,8.2
809,Forrest Gump,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",A man with a low IQ has accomplished great thi...,1994-07-06,8.2
1663,Once Upon a Time in America,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 80, ""name...",A former Prohibition-era Jewish gangster retur...,1984-02-16,8.2
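
The `genres` column holds a JSON string and `release_date` an ISO-style date, so answering a question about one specific movie takes a little parsing. A minimal sketch, assuming the rows are plain dictionaries (for example from `df.to_dict("records")`); the `movie_info` helper is hypothetical and not part of the notebook.

```python
import json


def movie_info(rows, title):
    """Look up a movie by title (case-insensitive) and return a small summary dict."""
    for row in rows:
        if row["title"].lower() == title.strip().lower():
            return {
                "title": row["title"],
                "synopsis": row["overview"],
                "genres": [g["name"] for g in json.loads(row["genres"])],
                # release_date looks like "1999-10-15"; keep only the year
                "release_year": int(row["release_date"][:4]),
            }
    return None  # title not found


# One row written by hand in the dataset's format (overview shortened for illustration).
rows = [{
    "title": "Fight Club",
    "genres": '[{"id": 18, "name": "Drama"}]',
    "overview": "A ticking-time-bomb insomniac ...",
    "release_date": "1999-10-15",
    "vote_average": 8.3,
}]
info = movie_info(rows, "fight club")
print(info["genres"], info["release_year"])  # ['Drama'] 1999
```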


In [15]:
def format_context(df):
    formatted_context = ""
    for _, row in df.iterrows():
        formatted_context += f"Title: {row['title']}\n"
        #formatted_context += f"Genres: {row['genres']}\n"
        formatted_context += f"Overview: {row['overview']}\n"
    return formatted_context

# Format the data for the model
context = format_context(df)
print(context[:500])  # show the first 500 characters

Title: The Dark Knight
Overview: Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent, Batman sets out to dismantle the remaining criminal organizations that plague the streets. The partnership proves to be effective, but they soon find themselves prey to a reign of chaos unleashed by a rising criminal mastermind known to the terrified citizens of Gotham as the Joker.
Title: Fight Club
Overview: A ticking-time-bomb insomniac and a slippe


In [16]:
print(len(context))

7259
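
The context is 7259 characters long, and the whole prompt has to fit in the model's context window together with the generated answer. A rough budget check can be sketched as follows. The ratio of four characters per token is only a rule of thumb for English text, and the 2048-token window is an assumption to verify against the model card of whichever model is loaded; for an exact count, tokenize the prompt with the model's own tokenizer instead.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; real counts depend on the tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_in_window(prompt, max_new_tokens, window=2048):
    """Check that the prompt plus the requested answer length fits the window.

    `window=2048` is an assumed context size, not a value taken from the notebook.
    """
    return estimate_tokens(prompt) + max_new_tokens <= window


example_prompt = "x" * 7259  # stand-in with the same length as the context above
print(estimate_tokens(example_prompt))      # 1815
print(fits_in_window(example_prompt, 200))  # True
print(fits_in_window(example_prompt, 500))  # False
```

Under these assumptions the context leaves little room for the answer, so adding more movies or asking for a long answer would need a shorter context or a model with a larger window.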


The formatted string `context` is now ready to be passed to the model.

In [17]:
!pip install transformers accelerate



In [18]:
token="drop-your-hugging-face-token-here"

In [19]:
# Call the model and feed it our dataset:
def ask_llm(user_input, context):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the tokenizer and the model (this happens on every call)
    model_name = "tiiuae/falcon-7b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

    # Build the prompt with the context
    prompt = f"""
    You are a movie expert. Use the following movie database to answer questions:

    {context}

    Question: {user_input}
    Answer:
    """

    # Tokenize the prompt; the tokenizer also returns the attention mask.
    # Move the tensors to the model's device instead of hard-coding 'cuda'.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]
    output_ids = model.generate(input_ids=input_ids, attention_mask=attention_mask, max_length=5000, pad_token_id=tokenizer.eos_token_id)

    # Decode only the newly generated tokens (everything after the prompt)
    response = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
    return response

In [20]:
user_question = "Can you recommend a good action movie?"
print(ask_llm(user_question, context))


- The Matrix (1999)
    - The Dark Knight (2008)
    - The Bourne Identity (2002)
    - Die Hard (1988)
    - Mad Max: Fury Road (2015)


In [22]:
user_question = "Can you recommend a movie with a moral lesson? and then tell me its story "
print(ask_llm(user_question, context))


- The Shawshank Redemption: A man who is wrongfully convicted of a crime is sentenced to life in prison. He spends years in prison, learning the ways of the prison and the prison culture. He eventually escapes and is able to live a free life. The moral lesson is that you can overcome any obstacle if you have the courage to try.
    - Forrest Gump: A man with a low IQ has a fulfilling life despite his limitations. He learns to love and be loved, and is able to make a difference in the world. The moral lesson is that everyone has their own unique talents and abilities, and that we should strive to use them to make the world a better place.


In [23]:
user_question = "I want to watch something mind-bending. Any suggestions? tell me its story and the moral lesson."
print(ask_llm(user_question, context))


<a href="https://www.quizlet.com/quizlet/a/mind-bending-movies-quiz.html" rel="nofollow noreferrer">The Matrix</a> - A sci-fi action movie about a dystopian future where humans are unknowingly trapped inside a simulated reality created by machines to distract them while their bodies are used as an energy source. The movie explores themes of reality, freedom, and the nature of existence. The moral lesson is that we have the power to choose our own path in life and that we should strive to take control of our own destiny.


# **End of the project ✅**

Mission accomplished! 🎬 Your movie-savvy chatbot is ready to light up your movie nights and chat about films like a true expert (well, almost). Whether you're in the mood for romance, craving action, or hunting for a hidden gem, it’s got you covered. So grab your popcorn🍿😄