<br>

<br>

<br>

# 🎬 **MOVIE RECOMMENDATION** 🎬

**K-NEAREST NEIGHBORS**

<br>

## **INDEX**

- **STEP 1: PROBLEM DEFINITION AND DATA COLLECTION**
- **STEP 2: DATA EXPLORATION AND CLEANING**
- **STEP 3: DATA PROCESSING**
- **STEP 4: FEATURE ENGINEERING**
- **STEP 5: MODEL DEVELOPMENT**
- **STEP 6: RECOMMENDATION SYSTEM IMPLEMENTATION**
- **STEP 7: MODEL SAVING**
- **STEP 8: CONCLUSION**

<br>

## **STEP 1: PROBLEM DEFINITION AND DATA COLLECTION**

- 1.1. Problem definition
- 1.2. Library Importing
- 1.3. Data Collection

<br>

**1.1. PROBLEM DEFINITION**


The goal of this project is to create a movie recommendation system that suggests movies a user might be interested in, based on their similarity to a given movie. The project uses the K-Nearest Neighbors (KNN) algorithm to find the movies most similar to that input by processing and analyzing metadata from two datasets: `tmdb_5000_movies` and `tmdb_5000_credits`.

<br>

**Datasets and Interrelation**
- **`tmdb_5000_movies.csv`**: Contains information such as `id`, `title`, `overview`, `genres`, and `keywords`.
- **`tmdb_5000_credits.csv`**: Contains `movie_id`, `title`, and the cast and crew details for each movie.
- Both datasets share the `title` column, which is used to join them into a single dataset for analysis.
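Movie titles are not guaranteed to be unique. A join on `title` can therefore pair one movie with several credit rows. `df.info()` in step 2.1 reports 4,809 rows after the merge, a few more than the 4,803 movies in `tmdb_5000_movies.csv`. The outputs in step 1.3 show a numeric id in both files (`id` in the movies file, `movie_id` in the credits file), which can be used as the join key instead. The sketch below uses small hypothetical DataFrames. It compares the two joins and uses pandas' `validate` argument to check that the id relationship really is one-to-one:

```python
import pandas as pd

# Toy stand-ins for the two TMDB files (hypothetical values).
# Two different movies share the title "Batman".
movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],
})
credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],
    "cast": ["cast A", "cast B", "cast C"],
})

# Joining on the non-unique title pairs every "Batman" with every "Batman".
by_title = movies.merge(credits, on="title", how="left")
print(len(by_title))  # 5 rows for 3 movies

# Joining on the id keeps one row per movie; validate raises a MergeError
# if either side unexpectedly contains duplicate keys.
by_id = movies.merge(
    credits.drop(columns="title"),
    left_on="id",
    right_on="movie_id",
    how="left",
    validate="one_to_one",
)
print(len(by_id))  # 3
```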

<br>

**Methodology: K-Nearest Neighbors (KNN)**

- **KNN** is a non-parametric, instance-based learning algorithm commonly used for classification and regression. In this project it is used without labels: scikit-learn's `NearestNeighbors` only retrieves the movies closest to a query movie.
- "Non-parametric" means that **KNN** does not assume a fixed functional form for the data. Instead of learning a set of parameters during training, it stores the observed instances and compares each new query directly against them when a prediction is needed (so-called "lazy learning").
- Movie metadata is first vectorized into numerical representations so that movies can be compared.
- Using cosine distance (1 - cosine similarity) as the metric, **KNN** identifies the closest neighbors in the feature space, where a smaller distance means greater similarity.
- By relying on metadata such as **genres**, **keywords**, **cast**, and **crew**, together with the plot overview, KNN directly serves the project's goal: recommending movies based on their resemblance to a given input.
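The cosine distance between two vectors a and b is 1 - (a·b)/(‖a‖‖b‖). Vectors pointing in the same direction have distance 0 regardless of their length. Non-negative vectors with no terms in common have distance 1.

The sketch below uses three made-up term-weight vectors. It computes this distance by hand and compares the result with what scikit-learn's `NearestNeighbors(metric='cosine')` returns, which is the same class and metric used in step 5:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up term-weight vectors for three "movies".
X = np.array([
    [1.0, 1.0, 0.0],  # movie 0
    [2.0, 2.0, 0.0],  # movie 1: same direction as movie 0, twice as long
    [0.0, 1.0, 1.0],  # movie 2: shares one term with movie 0
])

def cosine_distance(a, b):
    """1 - cos(angle between a and b)."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Distances from movie 0 to every movie, computed by hand.
manual = [cosine_distance(X[0], X[j]) for j in range(len(X))]
print(np.round(manual, 3))

# Brute-force neighbor search with the same metric.
nn = NearestNeighbors(n_neighbors=3, algorithm="brute", metric="cosine").fit(X)
distances, indices = nn.kneighbors(X[[0]])
print(indices[0], np.round(distances[0], 3))
```

Because TF-IDF vectors are non-negative, the cosine distances in this project always fall between 0 and 1.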

<br>

**1.2. LIBRARY IMPORTING**

```python
import pandas as pd
import numpy as np
import sqlite3
import json
import pickle
import warnings
warnings.filterwarnings('ignore')

from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import TfidfVectorizer
```

**1.3. DATA COLLECTION**

```python
movies_data_url = "https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/main/tmdb_5000_movies.csv"
credits_data_url = "https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/main/tmdb_5000_credits.csv"
```

```python
# Load datasets
movies_df = pd.read_csv(movies_data_url)
credits_df = pd.read_csv(credits_data_url)
```

```python
movies_df.head()
```

| | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | [{"iso_3166_1": "US", "name": "United States o... | 2009-12-10 | 2787965087 | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 |
| 1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2007-05-19 | 961000000 | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 |
| 2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | [{"iso_3166_1": "GB", "name": "United Kingdom"... | 2015-10-26 | 880674609 | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 |
| 3 | 250000000 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | http://www.thedarkknightrises.com/ | 49026 | [{"id": 849, "name": "dc comics"}, {"id": 853,... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | [{"name": "Legendary Pictures", "id": 923}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2012-07-16 | 1084939099 | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 |
| 4 | 260000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://movies.disney.com/john-carter | 49529 | [{"id": 818, "name": "based on novel"}, {"id":... | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | [{"name": "Walt Disney Pictures", "id": 2}] | [{"iso_3166_1": "US", "name": "United States o... | 2012-03-07 | 284139100 | 132.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 |


```python
credits_df.head()
```

| | movie_id | title | cast | crew |
|---|---|---|---|---|
| 0 | 19995 | Avatar | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | 285 | Pirates of the Caribbean: At World's End | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | 206647 | Spectre | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | 49026 | The Dark Knight Rises | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | 49529 | John Carter | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


```python
# Merge datasets
df = movies_df.merge(credits_df, on='title', how='left')
df.head()
```

| | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | movie_id | cast | crew |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | ... | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | 19995 | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | ... | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | 285 | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | ... | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | 206647 | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | 250000000 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | http://www.thedarkknightrises.com/ | 49026 | [{"id": 849, "name": "dc comics"}, {"id": 853,... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | [{"name": "Legendary Pictures", "id": 923}, {"... | ... | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 | 49026 | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | 260000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://movies.disney.com/john-carter | 49529 | [{"id": 818, "name": "based on novel"}, {"id":... | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | [{"name": "Walt Disney Pictures", "id": 2}] | ... | 132.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 | 49529 | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


<br>

## **STEP 2: DATA EXPLORATION AND CLEANING**

- 2.1. Exploration
- 2.2. Cleaning

<br>

**2.1. EXPLORATION**

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 15  status               
```

<br>

**2.2. CLEANING**

```python
# Keep only the columns needed for the recommender and drop all others
df = df[['movie_id', 'title', 'overview', 'genres', 'keywords', 'cast', 'crew']]
df.head()
```

| | movie_id | title | overview | genres | keywords | cast | crew |
|---|---|---|---|---|---|---|---|
| 0 | 19995 | Avatar | In the 22nd century, a paraplegic Marine is di... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | [{"id": 1463, "name": "culture clash"}, {"id":... | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | 285 | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | 206647 | Spectre | A cryptic message from Bond’s past sends him o... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | [{"id": 470, "name": "spy"}, {"id": 818, "name... | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | 49026 | The Dark Knight Rises | Following the death of District Attorney Harve... | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | [{"id": 849, "name": "dc comics"}, {"id": 853,... | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | 49529 | John Carter | John Carter is a war-weary, former military ca... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | [{"id": 818, "name": "based on novel"}, {"id":... | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


```python
# Check for missing values
print(df.isnull().sum())
```

```text
movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64
```


```python
# Drop rows with missing values (only 'overview' has any: 3 rows)
df.dropna(subset=['overview', 'genres', 'keywords', 'cast', 'crew'], inplace=True)
```
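`dropna` keeps the original row labels. After the three movies without an overview are dropped, the index has gaps, and a row's label no longer always equals its position. Step 6 looks up the query with `df[df["title"] == movie_title].index[0]` and the neighbors with `df["title"][i]`. Both treat index labels as row positions in the TF-IDF matrix, so they only line up while labels and positions agree.

The sketch below uses hypothetical toy data to show the mismatch and how `reset_index(drop=True)` removes it:

```python
import pandas as pd

# Hypothetical toy data: one row has a missing overview.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "overview": ["x", None, "y", "z"],
})

cleaned = toy.dropna(subset=["overview"])
print(cleaned.index.tolist())    # [0, 2, 3] -> labels no longer match positions

# Position 1 (the second remaining row) is "C", but label 1 no longer exists.
print(cleaned["title"].iloc[1])  # C
print(1 in cleaned.index)        # False

# Renumbering the rows 0..n-1 makes labels and positions agree again.
cleaned = cleaned.reset_index(drop=True)
print(cleaned.index.tolist())    # [0, 1, 2]
print(cleaned["title"][1])       # C
```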

<br>

<br>

## **STEP 3: DATA PROCESSING**

- 3.1. Helper Functions for JSON Processing
- 3.2. Text Normalization Function
- 3.3. Applying the Functions to the Dataset

<br>

**3.1. HELPER FUNCTIONS FOR JSON PROCESSING**

```python
# Define helper functions to process JSON strings
def extract_names_from_json(json_str, key="name"):
    """Return the `key` value of every object in a JSON list string."""
    try:
        data = json.loads(json_str)
        return [item[key] for item in data]
    except (TypeError, json.JSONDecodeError):
        return []  # missing or malformed value

def extract_top_n_names_from_json(json_str, n=3, key="name"):
    """Same as above, but only for the first n objects (e.g. top-billed cast)."""
    try:
        data = json.loads(json_str)
        return [item[key] for item in data[:n]]
    except (TypeError, json.JSONDecodeError):
        return []

def extract_director(json_str, key="job", value="Director", name_key="name"):
    """Return the name of the first crew member whose job is 'Director', or ''."""
    try:
        data = json.loads(json_str)
        for item in data:
            if item.get(key) == value:
                return item.get(name_key, "")
        return ""
    except (TypeError, json.JSONDecodeError):
        return ""
```


**3.2. TEXT NORMALIZATION FUNCTION**

```python
def remove_spaces(text):
    if isinstance(text, str):
        return text.replace(" ", "")
    return text
```

**3.3. APPLYING THE FUNCTIONS TO THE DATASET**

```python
# Apply processing functions to clean data
df["genres"] = df["genres"].apply(lambda x: " ".join(extract_names_from_json(x)))
df["keywords"] = df["keywords"].apply(lambda x: " ".join(extract_names_from_json(x)))
df["cast"] = df["cast"].apply(lambda x: " ".join(extract_top_n_names_from_json(x, n=3)))  # top 3 billed actors
df["crew"] = df["crew"].apply(extract_director)  # director only
df["overview"] = df["overview"].apply(lambda x: x.split() if isinstance(x, str) else [])  # list of words

# Remove spaces from processed columns.
# Note: this runs on the already-joined strings, so all names in a cell become
# a single token (e.g. "Action Adventure" -> "ActionAdventure").
df["genres"] = df["genres"].apply(remove_spaces)
df["keywords"] = df["keywords"].apply(remove_spaces)
df["cast"] = df["cast"].apply(remove_spaces)
df["crew"] = df["crew"].apply(remove_spaces)
```


<br>

## **STEP 4: FEATURE ENGINEERING**

```python
# Combine relevant columns into a single 'tags' column with a clear delimiter
df["tags"] = df.apply(lambda row: "|".join(map(str, [
    row["genres"], row["keywords"], row["cast"], row["crew"], " ".join(row["overview"])
])), axis=1)

df = df[["movie_id", "title", "tags"]]
```

```python
df.head()
```

| | movie_id | title | tags |
|---|---|---|---|
| 0 | 19995 | Avatar | ActionAdventureFantasyScienceFiction\|culturecl... |
| 1 | 285 | Pirates of the Caribbean: At World's End | AdventureFantasyAction\|oceandrugabuseexoticisl... |
| 2 | 206647 | Spectre | ActionAdventureCrime\|spybasedonnovelsecretagen... |
| 3 | 49026 | The Dark Knight Rises | ActionCrimeDramaThriller\|dccomicscrimefightert... |
| 4 | 49529 | John Carter | ActionAdventureScienceFiction\|basedonnovelmars... |
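The `|` delimiter never becomes a token. With `token_pattern=r'\b\w+\b'` (the setting used in step 5), the vectorizer keeps only runs of word characters, so `|` acts as a separator exactly like a space. Unlike the default pattern, which requires at least two characters, this one also keeps one-letter words such as "a".

A quick check with `build_analyzer()` on a shortened, made-up tag string:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same tokenizer settings as in step 5 (no fitting needed to inspect them).
vect = TfidfVectorizer(token_pattern=r"\b\w+\b", lowercase=True)
analyzer = vect.build_analyzer()

tags = "ActionAdventure|spy|SamWorthington|JamesCameron|A Marine is sent"
print(analyzer(tags))
# ['actionadventure', 'spy', 'samworthington', 'jamescameron', 'a', 'marine', 'is', 'sent']
```

Common overview words such as "a" and "is" are kept as well. Passing `stop_words="english"` to the vectorizer is one option for dropping them, at the cost of changing the resulting recommendations.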


<br>

<br>

## **STEP 5: MODEL DEVELOPMENT**


<br>

**5.1. VECTORIZE THE `tags` COLUMN**

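```python
# Vectorize the 'tags' column (TfidfVectorizer is imported in step 1.2).
# This token_pattern also keeps 1-character tokens; the default requires 2+.
vect = TfidfVectorizer(token_pattern=r'\b\w+\b', lowercase=True)
```

```python
matrix = vect.fit_transform(df['tags'])
# Brute-force search over all movies with cosine distance;
# the 5 neighbors returned for a movie normally include the movie itself
knn_model = NearestNeighbors(n_neighbors=5, algorithm='brute', metric='cosine')
knn_model.fit(matrix)
```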

<br>

<br>

## **STEP 6: RECOMMENDATION SYSTEM IMPLEMENTATION**


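```python
def get_movie_recommendations(movie_title):
    # Index label of the first movie with this title (IndexError if not found).
    # Caveat: this code treats df's index labels and the rows of `matrix` as
    # interchangeable; after the dropna() in step 2.2 they only coincide up to
    # the first dropped row.
    movie_index = df[df["title"] == movie_title].index[0]
    distances, indices = knn_model.kneighbors(matrix[movie_index])
    # Keep the distance next to each title, but leave it out of the printed output
    similar_movies = [(df["title"][i], distances[0][j]) for j, i in enumerate(indices[0])]
    # Skip the first neighbor, which is normally the input movie itself
    return similar_movies[1:]

input_movie_title = 'Avatar'
recommended_movies = get_movie_recommendations(input_movie_title)

# Print only the movie titles
print(f"Recommended movies for: {input_movie_title}")
for movie in recommended_movies:
    print(movie[0])  # title only; movie[1] holds the cosine distance
```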


```text
Recommended movies for: Avatar
Lone Wolf McQuade
Tears of the Sun
The American
The Inhabited Island
```
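`get_movie_recommendations` makes three assumptions:

- The DataFrame index matches the matrix rows.
- The title exists; an unknown title raises a bare `IndexError`.
- The first neighbor is the query itself, so it drops that neighbor. When several movies tie at distance 0, a different movie can come first.

The sketch below is a self-contained variant on a hypothetical toy catalogue, using a hypothetical function named `recommend`. It works purely with row positions, raises a clear error for unknown titles, excludes the query by position rather than by rank, and returns up to `k` results:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy catalogue standing in for the processed `df` (hypothetical tags).
movies = pd.DataFrame({
    "title": ["Alien", "Aliens", "Notting Hill", "Love Actually", "Predator"],
    "tags": [
        "sciencefiction horror space ridleyscott",
        "sciencefiction action space jamescameron",
        "romance comedy london",
        "romance comedy london christmas",
        "sciencefiction action jungle",
    ],
})

vect = TfidfVectorizer(token_pattern=r"\b\w+\b")
matrix = vect.fit_transform(movies["tags"])
knn = NearestNeighbors(algorithm="brute", metric="cosine").fit(matrix)

def recommend(title, k=2):
    """Hypothetical helper: return up to k (title, cosine distance) pairs.

    Uses row positions throughout, so it does not depend on the
    DataFrame index being 0..n-1.
    """
    positions = np.flatnonzero(movies["title"].to_numpy() == title)
    if positions.size == 0:
        raise ValueError(f"Unknown title: {title!r}")
    pos = positions[0]
    # Ask for one extra neighbor because the query usually matches itself.
    n_query = min(k + 1, matrix.shape[0])
    distances, indices = knn.kneighbors(matrix[pos], n_neighbors=n_query)
    results = [
        (movies["title"].iloc[i], float(d))
        for i, d in zip(indices[0], distances[0])
        if i != pos  # exclude the query movie by position, not by rank
    ]
    return results[:k]

for name, dist in recommend("Notting Hill"):
    print(f"{name}: {dist:.3f}")
```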


<br>

<br>

## **STEP 7: MODEL SAVING**

**7.1. SAVE THE VECTORIZER, MODEL, AND DATA**

```python
# Save the vectorizer
with open("vectorizer.pkl", "wb") as file:
    pickle.dump(vect, file)
```

```python
# Save the fitted NearestNeighbors model (it stores the TF-IDF matrix it was fitted on)
with open("knn_model.pkl", "wb") as file:
    pickle.dump(knn_model, file)

print("Model saved successfully!")
```

```text
Model saved successfully!
```


```python
# Save the processed titles and tags; row order matches the model's neighbor indices
df.to_csv('base_movies.csv', index=False)
```
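Reusing the saved files means loading the vectorizer, the fitted `NearestNeighbors` model, and the CSV together. The model's neighbor indices are row positions in `base_movies.csv`.

The sketch below shows that round trip, assuming the three file names used above and a hypothetical helper, `load_recommender`. So that it runs on its own, it first fits a tiny toy catalogue and writes it into a temporary directory rather than over the real files:

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# --- Toy stand-in for the files saved in step 7 (hypothetical data) ---
catalogue = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Alien", "Aliens", "Notting Hill"],
    "tags": ["space horror", "space action", "romance london"],
})
out_dir = tempfile.mkdtemp()
vect = TfidfVectorizer(token_pattern=r"\b\w+\b")
knn = NearestNeighbors(n_neighbors=2, algorithm="brute", metric="cosine")
knn.fit(vect.fit_transform(catalogue["tags"]))

with open(os.path.join(out_dir, "vectorizer.pkl"), "wb") as f:
    pickle.dump(vect, f)
with open(os.path.join(out_dir, "knn_model.pkl"), "wb") as f:
    pickle.dump(knn, f)
catalogue.to_csv(os.path.join(out_dir, "base_movies.csv"), index=False)

# --- Reload everything and query ---
def load_recommender(folder):
    """Hypothetical helper: load the three saved files from `folder`."""
    with open(os.path.join(folder, "vectorizer.pkl"), "rb") as f:
        loaded_vect = pickle.load(f)
    with open(os.path.join(folder, "knn_model.pkl"), "rb") as f:
        loaded_knn = pickle.load(f)
    movies = pd.read_csv(os.path.join(folder, "base_movies.csv"))
    return loaded_vect, loaded_knn, movies

loaded_vect, loaded_knn, movies = load_recommender(out_dir)
# Re-vectorize the stored tags of the query movie with the loaded vectorizer.
query = loaded_vect.transform(movies.loc[movies["title"] == "Alien", "tags"])
distances, indices = loaded_knn.kneighbors(query)
print(movies["title"].iloc[indices[0]].tolist())  # ['Alien', 'Aliens']
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.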

<br>

<br>

<br>

## **STEP 8: CONCLUSION**


**Solution Developed**:
   - **Data Collection**: We used two datasets (`tmdb_5000_movies.csv` and `tmdb_5000_credits.csv`) and merged them on the `title` column to combine the relevant information.
   - **Data Processing**: JSON-formatted fields were parsed to extract meaningful attributes (genres, keywords, top 3 cast members, director) and normalized by removing spaces.
   - **Feature Engineering**: A consolidated `tags` column combines these attributes with the plot overview into a single text feature per movie.
   - **Model Development**:
     - The `tags` column was converted into TF-IDF vectors, and a K-Nearest Neighbors (`NearestNeighbors`) model with cosine distance was fitted on them to find the movies closest to a given one.
   - **Recommendation System Implementation**:
     - A function queries the KNN model for the 5 nearest neighbors of the input movie and returns the 4 most similar movies, excluding the input itself.
   - **Model Saving**:
     - The fitted TF-IDF vectorizer and KNN model were stored as `.pkl` files for future use.
     - The processed movie dataset (`movie_id`, `title`, `tags`) was saved as a `.csv` file to ensure reproducibility and facilitate future analysis.


<br>

<br>


<br>
