# Overview:

---


This project implements a movie recommendation system built on a Kaggle dataset of Netflix TV shows and movies. It uses Sentence Transformers to embed each title's text description and Chroma to store and retrieve those embeddings. Users enter a query describing a genre or theme, and the system recommends the titles whose embedded descriptions are most similar to the query.
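As a rough illustration of what a similarity search does, the sketch below ranks a few titles by cosine similarity to a query vector. The three-dimensional vectors, the titles, and the helper names `cosine_similarity` and `top_k` are all made up for this example; the metric and index Chroma actually uses in this notebook may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, catalog, k=3):
    """Return the k titles whose vectors are most similar to query_vec."""
    ranked = sorted(
        catalog,
        key=lambda title: cosine_similarity(query_vec, catalog[title]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy "embeddings"; real sentence embeddings have hundreds of dimensions.
toy_catalog = {
    "Romance A": [0.9, 0.1, 0.0],
    "Thriller B": [0.1, 0.9, 0.1],
    "Romantic Thriller C": [0.6, 0.6, 0.0],
}
toy_query = [1.0, 0.2, 0.0]

print(top_k(toy_query, toy_catalog, k=2))
```

The query vector points mostly along the first axis, so the two titles with the largest first component rank highest.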

In [None]:
!pip install opendatasets sentence-transformers langchain langchain-community chromadb --quiet

**Data Acquisition:** Downloaded the Netflix TV shows and movies dataset from Kaggle using opendatasets for building a movie recommendation system.

In [None]:
import opendatasets as od
od.download("https://www.kaggle.com/datasets/senapatirajesh/netflix-tv-shows-and-movies")

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: omaratef3221
Your Kaggle Key: ··········
Downloading netflix-tv-shows-and-movies.zip to ./netflix-tv-shows-and-movies


**Data Preparation and Embedding:**
Prepared the Netflix dataset by loading descriptions and embedding them using Sentence Transformers. Created a Chroma vector store for efficient similarity searches.


In [None]:
import pandas as pd
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import DataFrameLoader
from langchain_community.embeddings import HuggingFaceEmbeddings

In [None]:
data_df = pd.read_csv("netflix-tv-shows-and-movies/NetFlix.csv")[["title","description"]]
print(data_df.isna().sum())
print()
print(data_df.shape)

title          0
description    0
dtype: int64

(7787, 2)


In [None]:
# Load the sentence-embedding model. 'cuda' requires a GPU runtime; use 'cpu' otherwise.
Embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cuda"},
)


In [None]:
REVIEWS_CHROMA_PATH = "chroma_data"

loader = DataFrameLoader(data_df, page_content_column="description")
descriptions = loader.load()

reviews_vector_db = Chroma.from_documents(
    descriptions, Embeddings, persist_directory=REVIEWS_CHROMA_PATH
)
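To make the loading step a little more concrete: `DataFrameLoader` puts the column named by `page_content_column` into each document's text and keeps the remaining columns as metadata, which is why the title can later be read from `metadata["title"]`. The sketch below mimics that split with plain dictionaries. The function `rows_to_documents` and the sample row are hypothetical stand-ins, not part of LangChain.

```python
def rows_to_documents(rows, page_content_column):
    """Split each row into the text to embed and the remaining metadata."""
    documents = []
    for row in rows:
        # Everything except the text column is carried along as metadata.
        metadata = {
            key: value for key, value in row.items() if key != page_content_column
        }
        documents.append(
            {"page_content": row[page_content_column], "metadata": metadata}
        )
    return documents


# Hypothetical row shaped like the title/description columns used above.
sample_rows = [
    {"title": "Example Title", "description": "A hypothetical plot summary."},
]
docs = rows_to_documents(sample_rows, page_content_column="description")

print(docs[0]["page_content"])
print(docs[0]["metadata"])
```

Only the description text is embedded; the title travels alongside it so that a match can be reported by name.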

**Movie Recommendation System:**
Implemented a movie recommendation system on top of Chroma's vector similarity search: the user's query about a genre or theme is embedded, and the closest descriptions (here the top three) are returned.

In [None]:
question = """
movie talking about romantic love story between couple fighting to live together
"""

relevant_docs = reviews_vector_db.similarity_search(question, k=3)

In [None]:
# Results are ordered from most to least similar, so index 0 is the best
# match and index -1 is the weakest of the k=3 results returned.
best, weakest = relevant_docs[0], relevant_docs[-1]

print("Most Suggested Movie:")
print()
print("Movie Name: ", best.metadata["title"])
print()
print("Movie Description: ", best.page_content)

print()
print()

print("Least Suggested Movie:")
print()
print("Movie Name: ", weakest.metadata["title"])
print()
print("Movie Description: ", weakest.page_content)

Most Suggested Movie:

Movie Name:  All Good Ones Get Away

Movie Description:  When a mysterious figure blackmails an adulterous couple during a romantic getaway, their secret affair turns into a fight for survival.


Least Suggested Movie:

Movie Name:  Club Friday To Be Continued - Friend & Enemy

Movie Description:  A love triangle spirals out of control, wreaking havoc on a couple's relationship and a friendship between two women.
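
The output above shows only the first and last of the three results. One possible way to list every result in rank order is sketched below. The helper `format_recommendations` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for the documents returned by the search, assumed to expose `metadata` and `page_content` as in the cell above.

```python
from types import SimpleNamespace


def format_recommendations(docs):
    """Build one numbered line per document, in the order given."""
    lines = []
    for rank, doc in enumerate(docs, start=1):
        lines.append(f"{rank}. {doc.metadata['title']}: {doc.page_content}")
    return lines


# Stand-in documents with made-up titles and descriptions.
sample_docs = [
    SimpleNamespace(
        metadata={"title": "Example One"},
        page_content="First hypothetical description.",
    ),
    SimpleNamespace(
        metadata={"title": "Example Two"},
        page_content="Second hypothetical description.",
    ),
]

for line in format_recommendations(sample_docs):
    print(line)
```

In the notebook, passing `relevant_docs` instead of `sample_docs` should produce one line per retrieved title.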


In [None]:
# Reload the persisted vector store from disk instead of re-embedding the dataset.
reviews_vector_db = Chroma(persist_directory=REVIEWS_CHROMA_PATH, embedding_function=Embeddings)