---
**Author:** Mohamed-Taqy Salmi — AI & Full-Stack Developer  
This project uses open-source libraries (pandas, scikit-learn, etc.).  
See `README.md` for details and `requirements.txt` for dependencies.  
Connect with me on [LinkedIn](https://www.linkedin.com/in/mohamedtaqysalmi/)

---


## Data  
The analysis uses the **TMDB 5000 Movie Dataset**.  
**Users must download the data directly from Kaggle**:  
🔗 [Download here](https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata)  

*(License: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/))*

# **Botflix - Movie Recommendation System**
Botflix is a movie recommendation system designed to help users discover movies they might enjoy. It uses content-based filtering to recommend movies with similar genres, and it also lets users search for movies, get a random suggestion, and list the top-rated movies for a theme. All of these features are reached through an interactive, menu-driven chatbot.

## Features
1. Movie Recommendations:
  - Recommends movies similar to a given movie based on genres.
  - Displays detailed information about each recommended movie, including:
    - Title
    - Genres
    - Release Date
    - Runtime
    - Rating
    - Overview (description)

2. Top-N Movies by Theme:
  Displays the top-rated movies in a specific genre or theme (e.g., Action, Comedy, Drama).

3. Advanced Search:
  Allows users to search for movies by:
    - Title: Search for movies by their title.
    - Year: Search for movies released in a specific year.
    - Actor: Search for movies featuring a specific actor.
    - Rating: Search for movies with a minimum rating.

4. Random Movie Suggestion:
  Suggests a random movie with all its details.

5. Interactive Chatbot:
  - Provides a user-friendly interface for interacting with the system.
  - Guides users through the available functionalities.


## How It Works
1. Content-Based Filtering:
  - The system uses TF-IDF (Term Frequency-Inverse Document Frequency) to turn each movie's genre list into a weighted vector of genre words.
  - It computes the cosine similarity between these vectors and recommends the movies whose genres are most similar.
2. Data Preprocessing:
  - The genres column, stored as stringified JSON, is parsed into a comma-separated string of genre names.
  - Missing, empty, or malformed genre values are mapped to "Unknown" instead of raising an error.
3. User Interaction:
  - Users interact with the system through a menu-driven chatbot.
  - They can choose from options such as getting recommendations, listing top movies by theme, searching for movies, or getting a random movie suggestion.
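The genre-similarity idea above can be seen on a handful of hand-written genre strings. This is a minimal sketch, not one of the notebook's cells: the movie names are hypothetical placeholders, and the block only shows how `TfidfVectorizer` and `cosine_similarity` score identical, overlapping, and disjoint genre lists.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for cleaned genre strings (hypothetical titles, not dataset rows)
genre_docs = {
    "Movie A": "Animation, Comedy, Family",
    "Movie B": "Animation, Comedy, Family",
    "Movie C": "Action, Thriller",
    "Movie D": "Comedy, Drama",
}

titles = list(genre_docs)
tfidf = TfidfVectorizer(stop_words="english")
# Each row is one movie; each column is one genre word, weighted by TF-IDF
matrix = tfidf.fit_transform(list(genre_docs.values()))

# Pairwise cosine similarity: 1.0 = identical genre profile, 0.0 = no shared genre words
sim = cosine_similarity(matrix)

for i, title in enumerate(titles):
    print(title, [round(float(s), 2) for s in sim[i]])
```

Movies A and B score 1.0 against each other, A and C share no genre word and score 0.0, and A and D fall in between because they share only "comedy".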

## Technologies Used
- Python: The core programming language used for the project.
- Pandas: For data manipulation and analysis, including random movie sampling (`DataFrame.sample`).
- NumPy: For numerical computations.
- Scikit-learn: For TF-IDF vectorization and cosine similarity.
- ast (Python standard library): For parsing the stringified JSON columns.

## Dataset
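The project uses the TMDB 5000 Movie Dataset (about 4,800 movies in `tmdb_5000_movies.csv`), which includes for each movie:
- Title
- Genres
- Release Date
- Runtime
- Overview (description)
- Vote Average (rating)

The companion file `tmdb_5000_credits.csv` holds each movie's cast and crew, which the actor search relies on.

---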

# Install Required Libraries
This code installs the necessary Python libraries for data analysis and machine learning tasks. Specifically, it installs:
- Pandas: For data manipulation and analysis.
- Scikit-learn: For machine learning and data preprocessing.

```python
# Run inside Jupyter/Colab; from a terminal, drop the leading "!"
!pip install pandas scikit-learn
```

# Import Required Libraries
This code imports essential Python libraries and modules needed for data analysis, text processing, and machine learning tasks. These libraries provide the tools for:
- Data manipulation and analysis.
- Text feature extraction.
- Similarity computation.
- Nearest neighbor search.

## Libraries Imported:
- pandas as pd: For data manipulation and analysis (e.g., DataFrames).
- numpy as np: For numerical computations and array operations.
- cosine_similarity (from sklearn.metrics.pairwise): For computing cosine similarity between vectors.
- TfidfVectorizer (from sklearn.feature_extraction.text): For converting text data into numerical features using TF-IDF.
- NearestNeighbors (from sklearn.neighbors): For finding nearest neighbors in a dataset (imported, but not used by the cells below).
- ast: For safely evaluating stringified data structures (e.g., converting strings to lists or dictionaries).

## Purpose:
- Prepares the environment for tasks like text processing, similarity computation, and recommendation systems.
- Enables efficient data handling and machine learning operations.

```python
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
import ast
```



# Mount Google Drive in Google Colab
This code mounts Google Drive in your Google Colab environment, allowing you to access files and datasets stored in your Drive directly from the Colab notebook.

**Note:** This cell is required **only** if you're using Google Colab; skip it when running the notebook locally.

```python
from google.colab import drive
drive.mount('/content/drive')
```

Mounted at /content/drive


## Load Datasets
This code performs the following tasks:
- Loads the movies and credits CSV files from the Kaggle download (adjust the paths to wherever you saved them, e.g. a folder under `/content/drive` when using Colab).
- Attaches each movie's cast from the credits file, which the actor search relies on.

The genre cleaning, TF-IDF vectorization, and cosine similarity are computed in the cells that follow.

```python
# Adjust these paths to wherever you saved the Kaggle CSVs
movies = pd.read_csv('data/tmdb_5000_movies.csv')
credits = pd.read_csv('data/tmdb_5000_credits.csv')

# The 'cast' column only exists in the credits file; join it on the shared TMDB id
movies = movies.merge(credits[['movie_id', 'cast']], left_on='id', right_on='movie_id', how='left')

print("✅ Movies and Credits Loaded")
```

✅ Movies and Credits Loaded


## Extract and Clean Genres
This function processes the genres column in the movies dataset, which contains stringified JSON data. It converts the JSON strings into a readable format by extracting genre names and joining them into a comma-separated string.

```python
def extract_genres(genre_str):
    """
    Convert a stringified list of {"id": ..., "name": ...} dicts into a
    comma-separated string of names.
    Example: '[{"id": 16, "name": "Animation"}, {"id": 35, "name": "Comedy"}]' -> "Animation, Comedy"
    Returns "Unknown" for missing, empty, or malformed values.
    """
    if pd.isna(genre_str) or genre_str == "[]":
        return "Unknown"
    try:
        genres = ast.literal_eval(genre_str)  # Parse the string into a list of dictionaries
        return ", ".join(g['name'] for g in genres)  # Keep only the genre names
    except (ValueError, SyntaxError, KeyError, TypeError):
        return "Unknown"
```
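`ast.literal_eval` works here because the TMDB genre strings appear to contain only numbers, strings, lists, and dicts. Since the column is JSON, the standard library's `json.loads` is arguably the more natural parser, and it also accepts JSON literals (`true`, `false`, `null`) that `literal_eval` rejects. The sketch below uses a hypothetical helper name, `extract_names_json`; it is an alternative, not the notebook's method.

```python
import json

import pandas as pd


def extract_names_json(json_str):
    """Hypothetical json.loads-based variant of extract_genres.

    Returns the "name" fields joined by ", ", or "Unknown" when the value is
    missing, empty, or not a JSON list of objects.
    """
    if pd.isna(json_str):
        return "Unknown"
    try:
        items = json.loads(json_str)
    except (TypeError, json.JSONDecodeError):
        return "Unknown"
    if not isinstance(items, list):
        return "Unknown"
    # Skip entries without a "name" key instead of discarding the whole row
    names = [item["name"] for item in items if isinstance(item, dict) and "name" in item]
    return ", ".join(names) if names else "Unknown"


print(extract_names_json('[{"id": 16, "name": "Animation"}, {"id": 35, "name": "Comedy"}]'))
# JSON literals such as false/null are valid here; ast.literal_eval would raise ValueError on them
print(extract_names_json('[{"name": "Drama", "adult": false, "note": null}]'))
```

Because it only reads the "name" key, the same helper could parse other columns stored in this list-of-objects shape.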

## Preprocess Genres and Compute Similarity
This code performs the following tasks:
- Cleans and preprocesses the genres column in the movies dataset.
- Converts the cleaned genre data into a numerical format using TF-IDF.
- Computes cosine similarity between movies based on their genres.
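- Creates a mapping of movie titles to their row indices for easy lookup, keeping the first row when a title appears more than once.

```python
movies['genres_cleaned'] = movies['genres'].apply(extract_genres)

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['genres_cleaned'].fillna(''))

cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Title -> row index. Titles are not guaranteed to be unique, and a duplicated title
# would make movie_indices[title] return a Series, so keep only the first occurrence.
movie_indices = pd.Series(movies.index, index=movies['title'])
movie_indices = movie_indices[~movie_indices.index.duplicated(keep='first')]
```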
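The full `cosine_sim` matrix stores one float64 per pair of movies, which for roughly 4,800 movies is about 185 MB. The notebook imports `NearestNeighbors` without using it; one possible alternative, sketched below on toy data, is to let it search the TF-IDF matrix for the closest rows on demand. The genre strings and the `similar_indices` helper are hypothetical; in the notebook, `tfidf_matrix` would be the matrix built in the cell above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical genre strings standing in for movies['genres_cleaned']
genres = [
    "Animation, Comedy, Family",
    "Animation, Comedy, Family",
    "Animation, Adventure, Family",
    "Action, Thriller",
    "Horror",
]

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(genres)

# Brute-force search with cosine distance (1 - cosine similarity); accepts sparse input
nn = NearestNeighbors(metric="cosine", algorithm="brute")
nn.fit(tfidf_matrix)


def similar_indices(idx, top_n=2):
    """Row indices of the top_n rows closest to row idx, excluding idx itself."""
    # Ask for one extra neighbour because the query row is normally returned as well
    _, indices = nn.kneighbors(tfidf_matrix[idx], n_neighbors=top_n + 1)
    return [int(i) for i in indices[0] if i != idx][:top_n]


print(similar_indices(0))
```

Only the distances from one query row to all others are computed per call, trading a little time per lookup for not holding the full pairwise matrix in memory.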

## Movie Recommendation Function
This function recommends movies similar to a given movie based on genre similarity. It uses the precomputed cosine similarity matrix (cosine_sim) to find the most similar movies and displays detailed information about each recommended movie.

```python
def recommend_movies(movie_title, top_n=5):
    """Print the top_n movies whose genres are most similar to movie_title."""
    if movie_title not in movie_indices:
        # Print instead of returning the message: the chatbot ignores return values
        print("Movie not found in dataset!")
        return

    idx = movie_indices[movie_title]

    # (index, similarity) pairs for every movie, most similar first
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)

    # Exclude the chosen movie explicitly: many movies share identical genres (score 1.0),
    # so the chosen movie is not guaranteed to be the first entry after sorting
    recommended = [i for i, _ in sim_scores if i != idx][:top_n]

    print(f"\n🤖 Botflix: Because you liked **{movie_title}**, you might also like:\n")
    for rank, (_, row) in enumerate(movies.iloc[recommended].iterrows(), start=1):
        print(f"{rank}. 🎬 {row['title']}")
        print(f"   📌 Genres: {row['genres_cleaned']}")
        print(f"   📅 Release Date: {row['release_date']}")
        print(f"   ⏱️ Runtime: {row['runtime']} minutes")
        print(f"   ⭐ Rating: {row['vote_average']}")
        print(f"   📖 Overview: {row['overview']}\n")
```
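`recommend_movies` only accepts an exact, case-sensitive title, so "toy story" is reported as not found. A hedged sketch of a more forgiving lookup using the standard library's `difflib.get_close_matches`; `resolve_title`, its `cutoff`, and the short title list are illustrative assumptions, and in the notebook the candidates would come from `movie_indices.index`.

```python
import difflib

# Hypothetical subset of titles standing in for movie_indices.index
titles = ["Toy Story", "Toy Story 2", "Toy Story 3", "Monsters, Inc.", "Avatar"]

# Map lower-cased titles back to their original spelling for case-insensitive lookup
lookup = {t.lower(): t for t in titles}


def resolve_title(query, cutoff=0.6):
    """Return the dataset title that best matches query, or None if nothing is close."""
    key = query.strip().lower()
    if key in lookup:
        return lookup[key]
    matches = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


print(resolve_title("toy story"))     # exact match after lower-casing
print(resolve_title("Monsters Inc"))  # close match despite the missing punctuation
print(resolve_title("zzzz"))
```

The resolved title (when not `None`) could then be passed to `recommend_movies`; raising `cutoff` makes the matching stricter.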

## Top-N Movies by Theme
This function finds and displays the top-rated movies in a specific genre or theme. It keeps the movies whose genres contain the given theme (case-insensitive), sorts them by average rating (`vote_average`), and displays the `top_n` highest-rated movies with detailed information.

```python
def top_n_movies_by_theme(theme, top_n=5):
    # regex=False: treat the theme as plain text, not as a regular expression
    theme_movies = movies[movies['genres_cleaned'].str.contains(theme, case=False, regex=False)]

    if theme_movies.empty:
        print(f"No movies found for the theme: {theme}")
        return

    theme_movies = theme_movies.sort_values(by='vote_average', ascending=False).head(top_n)

    print(f"\n🏆 Top {top_n} Movies in the '{theme}' Theme:\n")
    for _, row in theme_movies.iterrows():
        print(f"🎬 {row['title']}")
        print(f"   📌 Genres: {row['genres_cleaned']}")
        print(f"   📅 Release Date: {row['release_date']}")
        print(f"   ⏱️ Runtime: {row['runtime']} minutes")
        print(f"   ⭐ Rating: {row['vote_average']}")
        print(f"   📖 Overview: {row['overview']}\n")
```
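Sorting on raw `vote_average` can put a movie rated 10.0 by a couple of voters above well-known films with thousands of votes. One common remedy, swapped in here as a suggestion rather than the notebook's method, is the IMDb-style weighted rating WR = v/(v+m)·R + m/(v+m)·C, where R is the movie's average, v its vote count, m a minimum-votes threshold, and C the mean rating overall. The sketch assumes the `vote_count` column from the TMDB CSV; the rows and the median threshold are illustrative.

```python
import pandas as pd

# Hypothetical rows; the TMDB movies CSV has 'vote_average' and 'vote_count' columns
df = pd.DataFrame({
    "title": ["Obscure Gem", "Crowd Favourite", "Solid Hit", "Average Film", "Weak Film"],
    "vote_average": [10.0, 8.3, 7.8, 6.0, 4.9],
    "vote_count": [2, 12000, 8000, 3000, 1000],
})


def weighted_rating(frame, quantile=0.5):
    """IMDb-style weighted rating: shrinks averages backed by few votes towards the overall mean."""
    C = frame["vote_average"].mean()            # mean rating across all rows
    m = frame["vote_count"].quantile(quantile)  # vote count at which a rating gets 50% weight
    v = frame["vote_count"]
    R = frame["vote_average"]
    return (v / (v + m)) * R + (m / (v + m)) * C


df["score"] = weighted_rating(df)
print(df.sort_values("vote_average", ascending=False)["title"].tolist())
print(df.sort_values("score", ascending=False)["title"].tolist())
```

In the notebook, C and m would more sensibly be computed on the full `movies` DataFrame, with the score then used in place of `vote_average` when ranking a theme; the 0.5 quantile is only a starting point.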

## Search Movies by Criteria
This function allows users to search for movies based on specific criteria such as title, year, actor, or rating. It filters the movies dataset according to the provided query and search type, then displays detailed information about the matching movies.

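```python
def search_movie(query, search_by="title"):
    query = query.strip()
    if search_by == "title":
        # regex=False: titles can contain characters such as "(" that are special in a regex
        results = movies[movies['title'].str.contains(query, case=False, regex=False)]
    elif search_by == "year":
        # release_date is "YYYY-MM-DD"; na=False skips movies with no release date
        results = movies[movies['release_date'].str.startswith(query, na=False)]
    elif search_by == "actor":
        # 'cast' comes from the credits file and must be merged into movies first.
        # The raw cast JSON is searched, so a character name can also match.
        results = movies[movies['cast'].str.contains(query, case=False, regex=False, na=False)]
    elif search_by == "rating":
        try:
            min_rating = float(query)
        except ValueError:
            print("Please enter a number for the rating, e.g. 7.5.")
            return
        results = movies[movies['vote_average'] >= min_rating]
    else:
        print("Invalid search type. Please use 'title', 'year', 'actor', or 'rating'.")
        return

    if results.empty:
        print(f"No movies found for the search: {query}")
        return

    print(f"\n🔍 Search Results for '{query}' (by {search_by}):\n")
    for _, row in results.iterrows():
        print(f"🎬 {row['title']}")
        print(f"   📌 Genres: {row['genres_cleaned']}")
        print(f"   📅 Release Date: {row['release_date']}")
        print(f"   ⏱️ Runtime: {row['runtime']} minutes")
        print(f"   ⭐ Rating: {row['vote_average']}")
        print(f"   📖 Overview: {row['overview']}\n")
```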
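The year search works on the raw date string. An alternative, sketched here under the assumption that dates are stored as "YYYY-MM-DD" text, is to parse them once with `pd.to_datetime` and compare whole years, which turns missing or unparseable dates into `NaT` instead of errors. The toy rows and the `movies_from_year` helper are hypothetical.

```python
import pandas as pd

# Hypothetical rows; one release date is missing, as can happen in real data
df = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "release_date": ["2010-07-16", "2009-12-10", None, "2010-03-05"],
})

# Parse once; missing or unparseable dates become NaT instead of raising
df["release_year"] = pd.to_datetime(df["release_date"], errors="coerce").dt.year


def movies_from_year(frame, year):
    """Exact year match; rows with no usable date never match."""
    return frame[frame["release_year"] == int(year)]


print(movies_from_year(df, "2010")["title"].tolist())
```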

## Suggest a Random Movie
This function suggests a random movie from the dataset and displays detailed information about it. It is useful for users who want to discover a movie without any specific criteria.

```python
def suggest_random_movie():
    random_movie = movies.sample(1).iloc[0]
    print("\n🎲 Botflix: Here's a random movie for you!\n")
    print(f"🎬 {random_movie['title']}")
    print(f"   📌 Genres: {random_movie['genres_cleaned']}")
    print(f"   📅 Release Date: {random_movie['release_date']}")
    print(f"   ⏱️ Runtime: {random_movie['runtime']} minutes")
    print(f"   ⭐ Rating: {random_movie['vote_average']}")
    print(f"   📖 Overview: {random_movie['overview']}\n")
```
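`suggest_random_movie` draws a different movie on every call. If a repeatable draw (for demos or tests) or a random pick within one genre is wanted, `DataFrame.sample` accepts a `random_state`. The `random_movie` helper and its toy DataFrame below are hypothetical illustrations of that option.

```python
import pandas as pd

# Hypothetical rows standing in for the movies DataFrame
df = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "genres_cleaned": ["Comedy, Family", "Horror", "Comedy, Drama", "Action"],
})


def random_movie(frame, genre=None, seed=None):
    """Pick one row at random, optionally restricted to a genre; a seed makes it repeatable."""
    pool = frame
    if genre:
        pool = frame[frame["genres_cleaned"].str.contains(genre, case=False, regex=False)]
    if pool.empty:
        return None
    return pool.sample(1, random_state=seed).iloc[0]


pick = random_movie(df, genre="comedy", seed=42)
print(pick["title"])
```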

## Interactive Chatbot for Movie Recommendations
This function implements an interactive chatbot that serves as a movie recommendation assistant. It provides a menu-driven interface for users to:
- Get movie recommendations based on a movie they like.
- Find top-N movies in a specific theme.
- Search for movies by title, year, actor, or rating.
- Get a random movie suggestion.
- Exit the chatbot.

```python
def chatbot():
    print("🤖 Hello! I'm Botflix, your movie recommendation assistant.")
    while True:
        print("\nWhat would you like to do?")
        print("1. Get movie recommendations based on a movie you like.")
        print("2. Get top-N movies in a specific theme (e.g., Action, Comedy).")
        print("3. Search for a movie.")
        print("4. Suggest a random movie.")
        print("5. Exit.")

        choice = input("Enter your choice (1/2/3/4/5): ").strip()

        if choice == "1":
            movie_title = input("Enter a movie title you like: ").strip()
            try:
                top_n = int(input("How many recommendations would you like? "))
            except ValueError:
                # Show the menu again instead of crashing on non-numeric input
                print("Please enter a whole number.")
                continue
            recommend_movies(movie_title, top_n)

        elif choice == "2":
            theme = input("Enter a theme (e.g., Action, Comedy, Drama): ").strip()
            try:
                top_n = int(input("How many movies would you like to see? "))
            except ValueError:
                print("Please enter a whole number.")
                continue
            top_n_movies_by_theme(theme, top_n)

        elif choice == "3":
            print("Search by:")
            print("1. Title")
            print("2. Year")
            print("3. Actor")
            print("4. Rating")
            search_type = input("Enter your choice (1/2/3/4): ").strip()
            if search_type == "1":
                query = input("Enter a movie title to search for: ")
                search_movie(query, search_by="title")
            elif search_type == "2":
                query = input("Enter a year to search for: ")
                search_movie(query, search_by="year")
            elif search_type == "3":
                query = input("Enter an actor's name to search for: ")
                search_movie(query, search_by="actor")
            elif search_type == "4":
                query = input("Enter a minimum rating (e.g., 7.5): ")
                search_movie(query, search_by="rating")
            else:
                print("Invalid choice. Please try again.")

        elif choice == "4":
            suggest_random_movie()

        elif choice == "5":
            print("🤖 Thank you for using Botflix. Goodbye!")
            break

        else:
            print("Invalid choice. Please try again.")
```
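The chatbot hard-codes its menu in an `if/elif` chain that reads straight from the keyboard. One possible restructuring, sketched below as an assumption rather than the notebook's design, maps each choice to a handler in a dict and takes the input and output functions as parameters so the loop can be scripted. `run_menu`, `exit_key`, and the demo handler are hypothetical.

```python
def run_menu(actions, read=input, write=print, exit_key="5"):
    """Minimal menu loop driven by a dict of options.

    `actions` maps a choice string to a (label, handler) pair. `read` and `write`
    default to input/print but can be replaced, e.g. to script the loop in tests.
    Returns when the user enters `exit_key`.
    """
    while True:
        write("\nWhat would you like to do?")
        for key, (label, _) in actions.items():
            write(f"{key}. {label}")
        write(f"{exit_key}. Exit.")
        choice = read("Enter your choice: ").strip()
        if choice == exit_key:
            write("Goodbye!")
            return
        entry = actions.get(choice)
        if entry is None:
            write("Invalid choice. Please try again.")
            continue
        _, handler = entry
        handler()  # run the selected option


# Example: script the answers instead of reading from the keyboard
calls = []
output = []
answers = iter(["1", "9", "5"])
demo_actions = {"1": ("Say hello", lambda: calls.append("hello"))}
run_menu(demo_actions, read=lambda prompt: next(answers), write=output.append)
print(calls)
```

Adding a menu option then means adding one dict entry, and the menu text is generated from the same entries, so the two cannot drift apart.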

## Entry Point for the Chatbot

```python
if __name__ == "__main__":
    chatbot()
```

🤖 Hello! I'm Botflix, your movie recommendation assistant.

What would you like to do?
1. Get movie recommendations based on a movie you like.
2. Get top-N movies in a specific theme (e.g., Action, Comedy).
3. Search for a movie.
4. Suggest a random movie.
5. Exit.
Enter your choice (1/2/3/4/5): 1
Enter a movie title you like: Toy Story
How many recommendations would you like? 5

🤖 Botflix: Because you liked **Toy Story**, you might also like:

1. 🎬 Happy Feet Two
   📌 Genres: Animation, Comedy, Family
   📅 Release Date: 2011-11-17
   ⏱️ Runtime: 100.0 minutes
   ⭐ Rating: 5.8
   📖 Overview: Mumble the penguin has a problem: his son Erik, who is reluctant to dance, encounters The Mighty Sven, a penguin who can fly! Things get worse for Mumble when the world is shaken by powerful forces, causing him to brings together the penguin nations and their allies to set things right.

2. 🎬 Monsters, Inc.
   📌 Genres: Animation, Comedy, Family
   📅 Release Date: 2001-11-01
   ⏱️ Runtime: 92.0 min