# ABOUT PROJECT
This project recommends movies and TV shows similar to a title entered by the user. It uses a dataset from Kaggle that describes titles available on Netflix as of 2021. Titles are compared by applying TF-IDF vectorization to their genre text and ranking candidates by cosine similarity.
# ABOUT DATASET
The dataset used in this project was obtained from Kaggle and consists of 5968 records and 13 columns. It includes information such as show ID, director, movie title, production country, release date, genre, cast, rating, IMDb score, duration, and date added to Netflix. The dataset was collected from Flixable, a third-party Netflix search engine.
# Concepts Used in the Code
- Data Cleaning: Converting text to lowercase, removing bracketed text, URLs, HTML tags, newlines, and tokens containing digits, dropping stopwords, and stemming.
- Mapping to Dictionary: Creating a mapping between cleaned titles and original titles to decode recommendations.
- Vectorization: Converting text data into numerical representations using TF-IDF vectorization.
- Cosine Similarity: Calculating the similarity between vectors to find similar items.
- Nearest Neighbors Algorithm: Finding the nearest neighbors to a given item in a high-dimensional space.
- Pandas DataFrame: Organizing and manipulating tabular data.
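
As a rough illustration of the vectorization and cosine similarity concepts listed above, here is a minimal sketch on three made-up genre strings. The strings and the `toy_` names are assumptions for illustration only and are not taken from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical genre strings, not rows from the Kaggle dataset.
toy_genres = [
    "Comedies, Romantic Movies",
    "Comedies, Romantic Movies, Dramas",
    "Documentaries, LGBTQ Movies",
]

# Each string becomes one sparse TF-IDF row vector.
toy_tfidf = TfidfVectorizer(stop_words="english")
toy_matrix = toy_tfidf.fit_transform(toy_genres)

# Pairwise cosine similarity: entry [i, j] compares string i with string j.
toy_similarity = cosine_similarity(toy_matrix)
print(toy_similarity.round(2))
```

The first two strings share most of their terms, so their similarity should be much higher than the similarity between the first and third, which only share the word "movies".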


# IMPORTING LIBRARIES

In [29]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors
import nltk
import re
from nltk.stem import SnowballStemmer

# Download stopwords and initialize stemmer

In [30]:
nltk.download('stopwords')
stemmer = SnowballStemmer("english")

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# Load stopwords

In [31]:
stopwords = set(nltk.corpus.stopwords.words("english"))

# Data Cleaning

In [32]:
def clean_text(text):
    text = str(text).lower()                            # normalize case
    text = re.sub(r'\[.*?\]', '', text)                 # drop bracketed text
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # drop URLs
    text = re.sub(r'<.*?>+', '', text)                  # drop HTML tags
    text = re.sub(r'\n', '', text)                      # drop newlines
    text = re.sub(r'\w*\d\w*', '', text)                # drop tokens containing digits
    # Remove stopwords, then stem the remaining words (text is already lowercase)
    words = [stemmer.stem(word) for word in text.split() if word not in stopwords]
    return ' '.join(words)
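
To see what the regular expressions in `clean_text` do on their own, the sketch below applies the same substitutions to a made-up string, leaving out stopword removal and stemming so that it runs without NLTK. The helper name `strip_noise` and the sample string are hypothetical.

```python
import re

def strip_noise(text):
    """Apply the regex steps of clean_text, without stopwords or stemming."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # bracketed text
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub(r'\n', '', text)                     # newlines
    text = re.sub(r'\w*\d\w*', '', text)               # tokens containing digits
    return ' '.join(text.split())                      # collapse extra spaces

sample = "Fast & Furious 7 [HD] <b>bold</b> https://example.com/x"
cleaned_sample = strip_noise(sample)
print(cleaned_sample)
```

Note that none of these patterns removes punctuation, so a character such as `&` stays in the cleaned text.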

# IMPORTING DATASET

In [33]:
data = pd.read_csv("netflixData.csv")
data = data[['Title', 'Description', 'Content Type', 'Genres']]

# Apply data cleaning

In [34]:
data['Cleaned_Title'] = data['Title'].apply(clean_text)

# Remove NaN/Null values

In [35]:
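# Reset the index so row labels match row positions in the TF-IDF matrix built later
data = data.dropna().reset_index(drop=True)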
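
Later cells look rows up both by index label (through `indices`) and by position (through `tfidf_matrix[...]` and `data.iloc[...]`), so it matters whether labels and positions agree after rows are dropped. The sketch below uses a hypothetical three-row frame to show how `dropna()` leaves gaps in the index and how resetting the index closes them.

```python
import pandas as pd

# Hypothetical frame; the middle row has a missing genre.
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genres": ["Comedies", None, "Dramas"],
})

dropped = toy.dropna()
print(list(dropped.index))      # labels keep their original values

realigned = dropped.reset_index(drop=True)
print(list(realigned.index))    # labels now match row positions
```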

# Creating a dictionary to map cleaned titles to original titles.

In [36]:
title_mapping = dict(zip(data['Cleaned_Title'], data['Title']))

# Optimize Vectorization

In [37]:
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(data['Genres'])


# Improve Similarity Calculation (cosine similarity)

In [38]:
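# Full pairwise similarity matrix (one row and one column per title).
# Note: the recommendation function below uses nn_model and does not read this variable.
similarity = cosine_similarity(tfidf_matrix)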

# Using Nearest Neighbors Algorithm

In [39]:
nn_model = NearestNeighbors(n_neighbors=10, algorithm='auto', metric='cosine')
nn_model.fit(tfidf_matrix)
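
A small sketch of what `kneighbors` returns may help here. It uses four made-up genre strings (an assumption, not dataset rows) and checks that the cosine distance reported by `NearestNeighbors` is one minus the cosine similarity.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

# Hypothetical genre strings used only for illustration.
toy_genres = [
    "Comedies, Romantic Movies",
    "Comedies, Romantic Movies, Dramas",
    "Documentaries, LGBTQ Movies",
    "Documentaries",
]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_genres)

toy_nn = NearestNeighbors(n_neighbors=3, metric="cosine")
toy_nn.fit(toy_matrix)

# Query with row 0; kneighbors returns (distances, indices), each shaped (1, 3).
distances, neighbor_idx = toy_nn.kneighbors(toy_matrix[0])

# Cosine distance is 1 - cosine similarity, so the two views agree.
similarities = cosine_similarity(toy_matrix[0], toy_matrix)[0]
print(neighbor_idx[0], distances[0].round(2))
```

Because the query row is part of the fitted data, it comes back as its own nearest neighbor at a distance of roughly zero. When several titles share identical genre text, they tie with the query at that distance.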

# Defining indices

In [40]:
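indices = pd.Series(data.index, index=data['Cleaned_Title'])
# Keep the first row for each cleaned title; Series.drop_duplicates() would compare
# the values (row labels), which are already unique, and leave repeated titles in place.
indices = indices[~indices.index.duplicated(keep='first')]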
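
The difference between de-duplicating a Series by value and by index label is easy to miss, so here is a minimal sketch with hypothetical cleaned titles in which one title appears twice.

```python
import pandas as pd

# Hypothetical cleaned titles; "dracula" appears twice.
cleaned = pd.Series(["dracula", "naked", "dracula"])
lookup = pd.Series(cleaned.index, index=cleaned)

# drop_duplicates() compares the values (0, 1, 2), so nothing is removed.
still_duplicated = lookup.drop_duplicates()
print(still_duplicated["dracula"].tolist())

# Filtering on the index keeps the first row for each cleaned title.
unique_lookup = lookup[~lookup.index.duplicated(keep="first")]
print(unique_lookup["dracula"])
```

With a repeated label, a lookup returns a Series of several rows instead of a single integer, which is usually not what a title-to-row mapping should do.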

# Recommendation Function

In [48]:
def netflix_recommendation(title, model=nn_model, indices=indices):
    title_cleaned = clean_text(title)
    try:
        title_index = indices[title_cleaned]
    except KeyError:
        return "No recommendations found for this title."
    # Use a separate name so the title lookup passed in as `indices` is not overwritten
    _, neighbor_idx = model.kneighbors(tfidf_matrix[title_index])
    # Map the neighbors' cleaned titles back to the original titles
    recommendations = data.iloc[neighbor_idx[0]]['Cleaned_Title'].map(title_mapping)
    return recommendations

# Testing the recommendation function

In [49]:
print(netflix_recommendation("Legally blond"))

5803           When We First Met
3343                       Naked
1716       Friends with Benefits
4978             The Last Summer
4852          The Girl Next Door
3071    Midnight at the Magnolia
312           Always Be My Maybe
1869               Good on Paper
4950         The Kissing Booth 2
5329             The Wrong Missy
Name: Cleaned_Title, dtype: object


In [50]:
print(netflix_recommendation("Fast and Furious"))

No recommendations found for this title.
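
The lookup requires an exact match on the cleaned title, so a title that is missing from the dataset or written differently falls into the `KeyError` branch. One possible fallback, sketched here as an assumption rather than as part of the project, is to search for the closest known cleaned title with the standard library's `difflib`. The helper `closest_title` and the candidate list are hypothetical.

```python
import difflib

# Hypothetical cleaned titles standing in for the keys of `indices`.
known_titles = ["fast furious", "legal blond", "transform"]

def closest_title(query, candidates, cutoff=0.6):
    """Return the best fuzzy match for `query`, or None if nothing is close."""
    matches = difflib.get_close_matches(query, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(closest_title("fast furios", known_titles))   # misspelled query
print(closest_title("zzz", known_titles))           # nothing similar
```

In the notebook, such a helper could be called with `clean_text(title)` and `list(indices.index)` before giving up. The `cutoff` value would need tuning against real titles.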


In [51]:
print(netflix_recommendation("Transformers"))

4760              The Death and Life of Marsha P. Johnson
1323                                           Disclosure
2409                    John Was Trying to Contact Aliens
144                                         A Secret Love
1611                  Feminists: What Were They Thinking?
619     Best Wishes, Warmest Regards: A Schitt's Creek...
2167                                          I Am Divine
5473                                          Transformer
1022                                      Circus of Books
3233       Mucho Mucho Amor: The Legend of Walter Mercado
Name: Cleaned_Title, dtype: object
