# **Netflix Recommendation System**

This notebook builds a recommendation system using user-based collaborative filtering.

In [1]:
import pandas as pd
import numpy as np

**Datasets**

* The data was provided by Netflix for a competition.
* The movies dataset contains the movie ID, name, and release year.
* The ratings dataset contains the user ID, the movie ID, and the rating the user gave that movie.

In [3]:
# Load the ratings and movies CSV files into DataFrames
movie_df = pd.read_csv("Netflix_Dataset_Movie.csv")
rating_df = pd.read_csv("Netflix_Dataset_Rating.csv")

# Keep only the first 20,000 ratings
rating_df = rating_df[:20000]

In [4]:
# Display the first few rows of the DataFrame to verify the data
movie_df.head()

Unnamed: 0,Movie_ID,Year,Name
0,1,2003,Dinosaur Planet
1,2,2004,Isle of Man TT 2004 Review
2,3,1997,Character
3,4,1994,Paula Abdul's Get Up & Dance
4,5,2004,The Rise and Fall of ECW


In [5]:
rating_df.head()

Unnamed: 0,User_ID,Rating,Movie_ID
0,6,3,30
1,6,4,173
2,6,5,175
3,6,2,191
4,6,3,197


In [6]:
# Check for null values
print(movie_df.isnull().sum())
print(rating_df.isnull().sum())

Movie_ID    0
Year        0
Name        0
dtype: int64
User_ID     0
Rating      0
Movie_ID    0
dtype: int64




*   Create a user-item matrix in which rows are users, columns are movies, and values are ratings.
*   Replace all NaNs (movies a user has not rated) with 0.



In [7]:
user_item_matrix = rating_df.pivot(index='User_ID', columns='Movie_ID', values='Rating')

In [8]:
user_item_matrix=user_item_matrix.fillna(0)
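
To make the reshaping step concrete, here is a minimal sketch on a made-up ratings table. The column names match `rating_df`, but `toy_ratings` and its values are illustrative assumptions, not part of the Netflix data.

```python
import pandas as pd

# Toy ratings with the same column names as rating_df (values are made up)
toy_ratings = pd.DataFrame({
    "User_ID":  [6, 6, 7, 7, 8],
    "Movie_ID": [30, 173, 30, 175, 173],
    "Rating":   [3, 4, 5, 2, 1],
})

# Rows become users, columns become movies; missing (user, movie) pairs are NaN
toy_matrix = toy_ratings.pivot(index="User_ID", columns="Movie_ID", values="Rating")

# Treat "not rated" as 0, as the notebook does
toy_matrix = toy_matrix.fillna(0)
print(toy_matrix)
```

Note that `pivot` raises a `ValueError` if the same (`User_ID`, `Movie_ID`) pair appears more than once; in that case `pivot_table` with an aggregation function such as `"mean"` is one possible alternative.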



*   Calculate the similarity between users using cosine similarity.




In [9]:
from sklearn.metrics.pairwise import cosine_similarity

# Calculate cosine similarity between users
user_similarity_matrix = cosine_similarity(user_item_matrix)
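
For intuition about what the similarity matrix contains, the sketch below computes pairwise cosine similarity by hand with NumPy on a tiny made-up matrix. The helper `cosine_similarity_rows` and the `toy` array are hypothetical names used only for illustration; the notebook itself relies on scikit-learn.

```python
import numpy as np

def cosine_similarity_rows(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Guard against all-zero rows so we never divide by zero
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit_rows = matrix / safe_norms
    # Dot products of unit-length rows are the cosines of the angles between them
    return unit_rows @ unit_rows.T

toy = np.array([
    [5, 0, 3],
    [5, 0, 3],   # identical to row 0, so their similarity is 1
    [0, 4, 0],   # shares no rated movie with row 0, so their similarity is 0
])
sim = cosine_similarity_rows(toy)
print(np.round(sim, 3))
```

Because unrated movies were filled with 0, they count as zeros in each user's vector: two users only gain similarity from movies that both of them rated.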



*   For each user, find the k most similar users.



In [11]:
k_neighbors = 10

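# Find the top-k most similar users for each user (argsort is ascending, so take the last k).
# Note: a user's similarity to themselves is 1, so each user typically appears
# in their own neighbor list.
top_k_neighbors = np.argsort(user_similarity_matrix, axis=1)[:, -k_neighbors:]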
#top_k_neighbors
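
Since every user is perfectly similar to themselves, one common variation is to exclude the diagonal before picking neighbors. The sketch below shows one way to do that; `top_k_excluding_self` and `toy_similarity` are hypothetical names, and this is an optional refinement rather than what the notebook does.

```python
import numpy as np

def top_k_excluding_self(similarity, k):
    """Indices of the k most similar *other* rows; the most similar comes last."""
    sim = np.array(similarity, dtype=float)  # copy so the input is left untouched
    np.fill_diagonal(sim, -np.inf)           # a user can never be their own neighbor
    return np.argsort(sim, axis=1)[:, -k:]

toy_similarity = np.array([
    [1.0, 0.9, 0.1, 0.4],
    [0.9, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.8],
    [0.4, 0.3, 0.8, 1.0],
])
neighbors = top_k_excluding_self(toy_similarity, k=2)
print(neighbors)
```

With the diagonal masked, all k slots go to other users, so the weighted average in the prediction step is not diluted by the user's own (zero) rating for an unrated movie.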



*   Now predict ratings for the movies each user hasn't rated yet.



In [12]:
predicted_ratings = np.zeros(user_item_matrix.shape)

# Iterate over each user
for user in range(user_item_matrix.shape[0]):
    similar_users = top_k_neighbors[user]

    # Iterate over each item
    for item in range(user_item_matrix.shape[1]):

        # Check if the user has not interacted with the item
        if user_item_matrix.iloc[user, item] == 0:
            # Predict the rating for the user-item pair
            numerator = np.sum(user_similarity_matrix[user, similar_users] * user_item_matrix.iloc[similar_users, item])
            denominator = np.sum(np.abs(user_similarity_matrix[user, similar_users]))

            # Avoid division by zero
            predicted_ratings[user, item] = numerator / (denominator + 1e-10)
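
The nested loop above computes, for each unrated movie, a similarity-weighted average of the neighbors' ratings. As a rough sketch, the same quantity can be computed one user at a time with a matrix product, which avoids the inner loop over items. The function `predict_unrated` and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def predict_unrated(ratings, similarity, neighbors, eps=1e-10):
    """Similarity-weighted average of neighbor ratings, kept only for unrated items."""
    ratings = np.asarray(ratings, dtype=float)
    predictions = np.zeros_like(ratings)
    for user in range(ratings.shape[0]):
        weights = similarity[user, neighbors[user]]      # shape (k,)
        numerator = weights @ ratings[neighbors[user]]   # shape (n_items,)
        denominator = np.abs(weights).sum() + eps        # eps avoids division by zero
        predictions[user] = numerator / denominator
    # Keep predictions only where the user has no rating (0 means "not rated")
    predictions[ratings != 0] = 0.0
    return predictions

toy_ratings = np.array([[5, 0], [4, 2], [0, 4]])
toy_similarity = np.array([[1.0, 0.5, 0.5],
                           [0.5, 1.0, 0.0],
                           [0.5, 0.0, 1.0]])
toy_neighbors = np.array([[1, 2], [0, 2], [0, 1]])  # neighbors exclude the user here
toy_predictions = predict_unrated(toy_ratings, toy_similarity, toy_neighbors)
print(np.round(toy_predictions, 2))
```

For user 0 and the second movie, the neighbors rated it 2 and 4 with equal weights of 0.5, so the prediction is their average, 3.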



*   In the recommendation function:
    *   Get the indices of the items the user has not rated
    *   Sort the unrated items by predicted rating in descending order
    *   Recommend the top N items





In [13]:
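def recommendations(user_index, num_recommendations=5):
    """Return column positions (not Movie_ID values) of the top unrated items for a user."""
    # Positions of the items the user has not rated
    unrated_items = np.where(user_item_matrix.iloc[user_index] == 0)[0]
    # Order the unrated items by predicted rating, highest first
    sorted_items = np.argsort(predicted_ratings[user_index, unrated_items])[::-1]
    N_recommendations = min(num_recommendations, len(sorted_items))
    recommended_item_indices = unrated_items[sorted_items[:N_recommendations]]

    return recommended_item_indices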


In [16]:
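user_index = 424  # row position in user_item_matrix, not a User_ID value
recommended_item_indices = recommendations(user_index)
# NOTE: recommended_item_indices holds column positions of user_item_matrix, while isin()
# below matches them against Movie_ID values; translating them first with
# user_item_matrix.columns[recommended_item_indices] would give the actual movie IDs.
recommended_movies = movie_df[movie_df['Movie_ID'].isin(recommended_item_indices)]['Name'].tolist()
print("Recommended Items for User {}: {}".format(user_index, recommended_movies))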

Recommended Items for User 424: ['Daydream Obsession', 'Jonah: A VeggieTales Movie: Bonus Material', "We're Not Married", 'The Killing', 'Record of Lodoss War: Chronicles of the Heroic Knight']
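
The recommendation function returns column positions in the user-item matrix, and those positions are not the same thing as `Movie_ID` values. The sketch below, on made-up stand-ins for `user_item_matrix` and `movie_df`, shows one way to translate positions into IDs and look up names in ranked order. All `toy_*` names and movie titles here are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for user_item_matrix and movie_df (values are made up)
toy_matrix = pd.DataFrame(
    [[0, 5, 0], [3, 0, 4]],
    index=pd.Index([6, 7], name="User_ID"),
    columns=pd.Index([30, 173, 175], name="Movie_ID"),
)
toy_movies = pd.DataFrame({
    "Movie_ID": [2, 30, 173, 175],
    "Name": ["Movie B", "Movie C", "Movie D", "Movie E"],
})

# Column positions as a recommendation function might return them, best first
recommended_positions = np.array([2, 0])

# Translate positions into real Movie_ID values, then look up names in rank order
recommended_ids = toy_matrix.columns[recommended_positions]
names = toy_movies.set_index("Movie_ID").loc[recommended_ids, "Name"].tolist()
print(names)
```

In this toy data, position 2 corresponds to `Movie_ID` 175, whereas matching the raw position against `Movie_ID` would pick up the unrelated movie with ID 2. Looking names up with `.loc` also preserves the ranking, which `isin` does not.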
