# Netflix Recommendation System

<img src="https://variety.com/wp-content/uploads/2014/07/netflix-logo.jpg?w=1000&h=563&crop=1" />

## Introduction
Netflix is a subscription-based streaming platform that recommends movies and TV shows based on user interests. This notebook demonstrates how to build a recommendation system in Python, then introduces a deep learning approach based on embeddings.

## Dataset
The dataset comes from Kaggle and describes the movies and TV shows available on Netflix as of 2021. It contains 5,967 rows and 13 columns.

## Libraries

In [4]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
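
`TfidfVectorizer` and `cosine_similarity` are the usual building blocks of a content-based recommender. The sketch below shows how they might fit together. The titles and genre strings are made up for illustration (they are not rows from `netflixData.csv`), and `recommend` is a hypothetical helper, not part of either library.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue: made-up rows, not taken from netflixData.csv
titles = ["zombie night", "haunted house", "love in paris", "city of romance"]
genres = [
    "Horror Movies, Thrillers",
    "Horror Movies, International Movies",
    "Romantic Movies, Comedies",
    "Romantic Movies, Dramas",
]

# Turn each genre string into a TF-IDF vector
tfidf = TfidfVectorizer(stop_words="english")
genre_matrix = tfidf.fit_transform(genres)

# Pairwise cosine similarity between every pair of titles
similarity = cosine_similarity(genre_matrix)

def recommend(title, top_n=2):
    """Return the top_n titles most similar to `title`, excluding itself."""
    idx = titles.index(title)
    ranked = similarity[idx].argsort()[::-1]  # most similar first
    return [titles[i] for i in ranked if i != idx][:top_n]

print(recommend("zombie night", top_n=1))
```

With these toy rows, the closest match to "zombie night" is "haunted house", because the two share the "horror" token while the romance titles share only "movies".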

In [5]:
data = pd.read_csv("netflixData.csv")

## EDA

In [9]:
data.head(), data.tail(), data.shape, data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5967 entries, 0 to 5966
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Show Id             5967 non-null   object 
 1   Title               5967 non-null   object 
 2   Description         5967 non-null   object 
 3   Director            3903 non-null   object 
 4   Genres              5967 non-null   object 
 5   Cast                5437 non-null   object 
 6   Production Country  5408 non-null   object 
 7   Release Date        5964 non-null   float64
 8   Rating              5963 non-null   object 
 9   Duration            5964 non-null   object 
 10  Imdb Score          5359 non-null   object 
 11  Content Type        5967 non-null   object 
 12  Date Added          4632 non-null   object 
dtypes: float64(1), object(12)
memory usage: 606.2+ KB


(                                Show Id                          Title  \
 0  cc1b6ed9-cf9e-4057-8303-34577fb54477                       (Un)Well   
 1  e2ef4e91-fb25-42ab-b485-be8e3b23dedb                         #Alive   
 2  b01b73b7-81f6-47a7-86d8-acb63080d525  #AnneFrank - Parallel Stories   
 3  b6611af0-f53c-4a08-9ffa-9716dc57eb9c                       #blackAF   
 4  7f2d4170-bab8-4d75-adc2-197f7124c070               #cats_the_mewvie   
 
                                          Description  \
 0  This docuseries takes a deep dive into the luc...   
 1  As a grisly virus rampages a city, a lone man ...   
 2  Through her diary, Anne Frank's story is retol...   
 3  Kenya Barris and his family navigate relations...   
 4  This pawesome documentary explores how our fel...   
 
                       Director  \
 0                          NaN   
 1                       Cho Il   
 2  Sabina Fedeli, Anna Migotto   
 3                          NaN   
 4             Michael Margol

In [8]:
# Keep only the relevant columns, then drop rows with missing values
data = data[["Title", "Description", "Genres"]].dropna()

# Clean the Title column
import re
import string

def clean(text):
    """Lowercase the text and remove ASCII punctuation."""
    text = str(text).lower()
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    return text

data["Title"] = data["Title"].apply(clean)
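
To see what `clean` does, the sketch below applies the same logic to three titles from the `head()` output above. It also shows a hypothetical variant, `clean_collapse` (not part of the original notebook), that collapses the extra whitespace left behind when a free-standing punctuation mark such as ` - ` is removed.

```python
import re
import string

def clean(text):
    """Lowercase the text and remove ASCII punctuation (same logic as above)."""
    text = str(text).lower()
    return re.sub("[%s]" % re.escape(string.punctuation), "", text)

def clean_collapse(text):
    """Hypothetical variant: also collapse runs of whitespace into one space."""
    return re.sub(r"\s+", " ", clean(text)).strip()

for raw in ["(Un)Well", "#Alive", "#AnneFrank - Parallel Stories"]:
    print(repr(clean(raw)), repr(clean_collapse(raw)))
```

The plain `clean` leaves two consecutive spaces in the third title, which matters only if titles are later looked up by exact string.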

## Deep Learning

In [10]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.preprocessing import LabelEncoder

In [11]:
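# The dataset has no user-interaction data, so as a stand-in each title is
# encoded as a "user" and each genre string as an "item"
user_encoder = LabelEncoder()
item_encoder = LabelEncoder()

data['user_id'] = user_encoder.fit_transform(data['Title'])
data['item_id'] = item_encoder.fit_transform(data['Genres'])

# Prepare training data
X = data[['user_id', 'item_id']].values
# Placeholder interaction scores: uniform random values in [0, 1), so the
# fitted model's predictions carry no real meaning
y = np.random.rand(len(X))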
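
As a small illustration of what `LabelEncoder` produces, the sketch below encodes a made-up list of titles. The encoder sorts the distinct values and assigns each one its position in that sorted order, which is why the integer IDs follow alphabetical order rather than row order.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample with one repeated title
titles = ["unwell", "alive", "blackaf", "alive"]

encoder = LabelEncoder()
codes = encoder.fit_transform(titles)

print(list(encoder.classes_))               # distinct values, sorted
print(list(codes))                          # one integer per input row
print(encoder.inverse_transform([0])[0])    # map an ID back to its title
```

`transform` raises an error for a value that was not seen during fitting, so lookups must use the cleaned form of a title.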

In [36]:
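# Build the model: both ID columns pass through one shared embedding table,
# so input_dim must exceed the largest ID in either column
model = keras.Sequential([
    keras.layers.Embedding(input_dim=len(user_encoder.classes_), output_dim=50),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(1)
])

# Regress the interaction score with mean squared error
model.compile(optimizer='adam', loss='mean_squared_error')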
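
The `Sequential` model above looks up both ID columns in a single embedding table. A common alternative is to give users and items their own embedding tables using the Keras functional API. The following is a minimal sketch under stated assumptions, not the notebook's method: the sizes `n_users` and `n_items`, the layer widths, and the random training data are all hypothetical.

```python
import numpy as np
from tensorflow import keras

n_users, n_items = 100, 20  # hypothetical vocabulary sizes

# One integer ID per input
user_in = keras.Input(shape=(1,), dtype="int32", name="user_id")
item_in = keras.Input(shape=(1,), dtype="int32", name="item_id")

# Separate embedding tables for users and items
user_vec = keras.layers.Flatten()(keras.layers.Embedding(n_users, 16)(user_in))
item_vec = keras.layers.Flatten()(keras.layers.Embedding(n_items, 16)(item_in))

# Combine the two vectors and regress a single score
x = keras.layers.Concatenate()([user_vec, item_vec])
x = keras.layers.Dense(32, activation="relu")(x)
out = keras.layers.Dense(1)(x)

two_input_model = keras.Model(inputs=[user_in, item_in], outputs=out)
two_input_model.compile(optimizer="adam", loss="mean_squared_error")

# Random placeholder data, only to show the expected input shapes
rng = np.random.default_rng(0)
users = rng.integers(0, n_users, size=(256, 1))
items = rng.integers(0, n_items, size=(256, 1))
scores = rng.random(256)

two_input_model.fit([users, items], scores, epochs=1, verbose=0)
pred = two_input_model.predict([users[:1], items[:1]], verbose=0)
print(pred.shape)
```

Keeping the tables separate means a user ID and an item ID that happen to share the same integer no longer map to the same vector.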



In [22]:
model.fit(X, y, epochs=5)

Epoch 1/5
[1m187/187[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 970us/step - loss: 0.1539
Epoch 2/5
[1m187/187[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 889us/step - loss: 0.0530
Epoch 3/5
[1m187/187[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 966us/step - loss: 0.0173
Epoch 4/5
[1m187/187[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 891us/step - loss: 0.0072
Epoch 5/5
[1m187/187[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 968us/step - loss: 0.0052


<keras.src.callbacks.history.History at 0x2112dab0c90>

In [14]:
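# Function to predict ratings
def predict_rating(user_id, item_id):
    """Return the model's predicted score for one encoded (user_id, item_id) pair as a (1, 1) array."""
    return model.predict(np.array([[user_id, item_id]]))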

In [32]:
data[["Title","user_id"]]

| | Title | user_id |
|---|---|---|
| 0 | unwell | 5548 |
| 1 | alive | 270 |
| 2 | annefrank parallel stories | 371 |
| 3 | blackaf | 690 |
| 4 | catsthemewvie | 898 |
| ... | ... | ... |
| 5962 | الف مبروك | 5883 |
| 5963 | دفعة القاهرة | 5884 |
| 5964 | 海的儿子 | 5890 |
| 5965 | 반드시 잡는다 | 5891 |


In [40]:
predict_rating(user_encoder.transform(["alive"])[0], item_encoder.transform(["Horror Movies"])[0])

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step


array([[0.01620382]], dtype=float32)