# Collaborative Filtering Movies Tutorial

## Import packages

<b>NearestNeighbors</b> implements unsupervised nearest neighbors learning. 

The neighbor search algorithm is selected through the keyword <b>'algorithm'</b>, which must be one of ['auto', 'ball_tree', 'kd_tree', 'brute'].

https://scikit-learn.org/stable/modules/neighbors.html
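As a minimal sketch of how the `algorithm` and `metric` keywords work together, the toy example below fits `NearestNeighbors` on four hand-made vectors (illustrative values, not MovieLens data) and asks for the two nearest neighbours of the first one. Cosine distance is not available for the tree-based algorithms, so `algorithm='brute'` is used. The names `toy`, `toy_knn`, `toy_distances` and `toy_indices` are made up for this illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data (made up for illustration): 4 points in 3 dimensions.
toy = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],   # same direction as row 0, so cosine distance is 0
    [0.0, 1.0, 0.0],   # orthogonal to row 0, so cosine distance is 1
    [1.0, 1.0, 0.0],
])

toy_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=2)
toy_knn.fit(toy)

# kneighbors returns (distances, indices), each of shape (n_queries, n_neighbors).
toy_distances, toy_indices = toy_knn.kneighbors(toy[[0]])
print(toy_distances.shape, toy_indices.shape)
print(toy_indices)
```

Because cosine distance ignores vector length, rows 0 and 1 are at distance 0 from each other even though their magnitudes differ.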


In [None]:
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix

## MovieLens data

MovieLens data sets were collected by the <b>GroupLens Research Project
at the University of Minnesota</b>.

This data set consists of:
* 100,000 ratings (1-5) from 943 users on 1682 movies.
* Each user has rated at least 20 movies.


The data was collected through the <b>MovieLens web site</b> (movielens.umn.edu) during the <b>seven-month period from September 19th,
1997 through April 22nd, 1998</b>. 

## Read the dataset and process it for next steps

<b>u.data</b> -- the full data set: <b>100,000 ratings by 943 users on 1682 movies</b>.

Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. The file is a tab-separated list of user id | item id | rating | timestamp.

The timestamps are Unix seconds since 1/1/1970 UTC.
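The timestamp column can be turned into readable dates with pandas. The sketch below uses two made-up rows in the same column layout as u.data (the values are illustrative, not taken from the file), and the column name `rated_at` is an assumption introduced here.

```python
import pandas as pd

# Two made-up rows in the u.data column layout (values are illustrative only).
sample = pd.DataFrame({
    'user_id': [1, 2],
    'item_id': [10, 20],
    'rating': [3, 5],
    'timestamp': [0, 86400],  # seconds since 1970-01-01 00:00:00 UTC
})

# unit='s' tells pandas the integers are seconds; utc=True makes the result timezone-aware.
sample['rated_at'] = pd.to_datetime(sample['timestamp'], unit='s', utc=True)
print(sample[['timestamp', 'rated_at']])
```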

In [None]:

header = ['user_id','item_id','rating','timestamp']
dataset = pd.read_csv('u.data',sep ='\t',names = header)

print(dataset.head())

# dataset.shape


## Data is transformed into the matrix format

In [None]:
# Obtain the count of users
n_users = dataset.user_id.unique().shape[0]

#print(n_users)
# Obtain the count of movies
n_items = dataset.item_id.unique().shape[0]

#print(n_items)

#n_items = dataset['item_id'].max()

# Create a matrix of size n_users X n_items
rating_matrix = np.zeros((n_users, n_items))

# Fill each cell in the matrix with value of the corresponding user-movie rating from the dataset
for line in dataset.itertuples():
    rating_matrix[line[1]-1,line[2]-1] = line[3]

# Alternatively we can use a binary matrix, containing the information whether user rated a movie or not
'''
for line in dataset.itertuples():
    if line[3] >=3:
        rating_matrix[line[1]-1,line[2]-1] = 1
    else:
        rating_matrix[line[1]-1,line[2]-1] = 0
'''

# Check how the matrix looks after user-movie rating is populated
print("Original rating matrix : ")
print(rating_matrix)
# rating_matrix[195,241]
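
The explicit loop above can also be written with NumPy integer-array indexing, which fills all cells in one assignment. This is a sketch on a made-up four-row table, not the tutorial's own method. It assumes each (user, item) pair appears only once and that ids start at 1. The names `toy_ratings` and `toy_matrix` are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Made-up ratings in the same layout as the dataset (ids start at 1).
toy_ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'item_id': [1, 3, 2, 3],
    'rating':  [5, 3, 4, 1],
})

n_toy_users = toy_ratings['user_id'].max()
n_toy_items = toy_ratings['item_id'].max()

toy_matrix = np.zeros((n_toy_users, n_toy_items))

# Ids start at 1 while array indices start at 0, hence the "- 1".
rows = toy_ratings['user_id'].to_numpy() - 1
cols = toy_ratings['item_id'].to_numpy() - 1
toy_matrix[rows, cols] = toy_ratings['rating'].to_numpy()

print(toy_matrix)
```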


## Converting the dense rating matrix into a sparse matrix

In [None]:
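# Using scipy.sparse.csr_matrix
# The matrix is transposed first so that each row holds one movie's ratings
# across all users, which is what item-based similarity needs.

rating_sparse = csr_matrix(rating_matrix.T)
print(rating_sparse)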
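
A sparse format pays off because most cells are empty: with 100,000 ratings in a 943 x 1682 grid, only about 6.3% of the cells hold a rating. The sketch below shows, on a small made-up matrix, how to measure that density and what the CSR format actually stores. The names `dense`, `sparse` and `density` are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small made-up matrix: 3 rows, 4 columns, 3 non-zero entries.
dense = np.array([
    [5.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
sparse = csr_matrix(dense)

# nnz is the number of explicitly stored (non-zero) entries.
density = sparse.nnz / (sparse.shape[0] * sparse.shape[1])
print(sparse.nnz, density)

# The three CSR arrays: stored values, their column indices, and row boundaries.
print(sparse.data, sparse.indices, sparse.indptr)
```

Row `r` owns the stored entries from position `indptr[r]` up to, but not including, `indptr[r + 1]`, so the empty last row shows up as two equal consecutive values in `indptr`.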

## Compute item-based similarity

In [None]:
# Find nearest neighbours using cosine distance (1 - cosine similarity)

knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=3, n_jobs=-1)
knn.fit(rating_sparse)
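
The `metric='cosine'` setting means neighbours are ranked by cosine distance, that is, one minus the cosine of the angle between two rating vectors. As a small check under made-up values, the sketch below computes that quantity by hand and compares it with scikit-learn's `cosine_distances`. The vectors `a` and `b` are illustrative only.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

# Two made-up rating vectors (0 means "not rated").
a = np.array([5.0, 0.0, 3.0, 4.0])
b = np.array([4.0, 0.0, 0.0, 5.0])

# Cosine distance = 1 - (a . b) / (|a| * |b|)
manual = 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# cosine_distances expects 2-D inputs, one row per vector.
library = cosine_distances(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

print(manual, library)
```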

## Generate recommendations for a user based on liked items
Once the similarity between items is computed, we can generate recommendations for the target user.



In [None]:


# Sort each user's ratings in descending order of rating
dataset_sort_des = dataset.sort_values(['user_id', 'rating'], ascending=[True, False])

#print(dataset_sort_des)

# Now we get the movies liked by a user
target_user = 1  # let's check the user with id 1

filter1 = dataset_sort_des[dataset_sort_des['user_id'] == target_user].item_id
#print(filter1)

filter1 = filter1.tolist()

# We select the top 5 movies liked by the target user
filter1 = filter1[0:5]

# print the id of the top 5 movies
#print(filter1)




In [None]:
# my choice
#filter2 = [128,144,29,71,95]


In [None]:
# Let's show the corresponding movie names by looking them up in the movie names file.

# Load the CSV data into a pandas DataFrame
movie_names_file = 'MoiveNames.csv'
df = pd.read_csv(movie_names_file)

# Create a dictionary mapping movie IDs to movie names
movie_id_to_name = dict(zip(df['ID'], df['Title']))

# Let's list the 5 movies liked by the target user; the recommendations are based on them.
# Remove the following multi-line comment (''' ...''') and execute
'''
print("Top Five Movies liked by user",target_user)
print()
print("Movie Ids: ",filter1)

print()
'''

# List of movie IDs for which you want to extract movie names
input_movie_ids = filter1

# input_movie_ids = filter2

# Extract movie names based on input movie IDs
extracted_movie_names = [movie_id_to_name[movie_id] for movie_id in input_movie_ids if movie_id in movie_id_to_name]

# Print the extracted movie names
print("Title:")
for movie_name in extracted_movie_names:
    print(movie_name)


In [None]:
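# Next, for each item liked by the target user, we recommend 2 similar movies.

indices1 = []

# For each movie liked by the target user, i.e. the loop runs 5 times (implementing item-based filtering)
for i in filter1:
    # Movie ids start at 1 while row indices start at 0, so query row i - 1
    distances, indices = knn.kneighbors(rating_sparse[i - 1], n_neighbors=3)
    # Drop the first neighbour (the queried row itself) and convert row indices back to ids
    indices = indices.flatten()[1:] + 1
    # Store plain ints so the printed list and the dictionary lookups stay clean
    indices1.extend(indices.tolist())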

print("Top Ten Movies to be recommended to ",target_user)
print()
print("Movie Ids: ",indices1)

# List of movie IDs for which you want to extract movie names
input_movie_ids = indices1

# Extract movie names based on input movie IDs
extracted_movie_names = [movie_id_to_name[movie_id] for movie_id in input_movie_ids if movie_id in movie_id_to_name]

print()
# Print the extracted movie names
print("Title:")
for movie_name in extracted_movie_names:
    print(movie_name)
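
The steps in this section can be condensed into one small function. The sketch below runs on a made-up 4 x 5 rating matrix rather than MovieLens. The helper `recommend_for`, the removal of duplicate suggestions and the skipping of movies the user has already rated are assumptions added for illustration. They are not part of the tutorial's code. Note the transpose: rows must represent items for the neighbours to be items.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy user x item matrix (made-up ratings): 4 users, 5 items, 0 = not rated.
toy_user_item = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 0, 1, 0],
    [0, 0, 5, 4, 0],
    [1, 0, 4, 5, 0],
], dtype=float)

# Rows must be items for item-based similarity, so transpose before fitting.
toy_item_user = csr_matrix(toy_user_item.T)
toy_knn = NearestNeighbors(metric='cosine', algorithm='brute').fit(toy_item_user)


def recommend_for(user_index, n_liked=2, n_similar=3):
    """Hypothetical helper: suggest unseen items similar to the user's top-rated ones."""
    user_ratings = toy_user_item[user_index]
    # Indices of the n_liked highest-rated items (stable sort keeps ties in column order).
    liked = np.argsort(-user_ratings, kind='stable')[:n_liked]
    seen = set(np.flatnonzero(user_ratings).tolist())
    recommended = []
    for item in liked:
        # Ask for one extra neighbour because the item itself is normally returned too.
        _, idx = toy_knn.kneighbors(toy_item_user[item], n_neighbors=n_similar + 1)
        for candidate in idx.flatten().tolist():
            if candidate != item and candidate not in seen and candidate not in recommended:
                recommended.append(candidate)
    return recommended


# Indices here are 0-based positions in the toy matrix, not MovieLens ids.
print(recommend_for(0))
```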


## References

1. https://www.kaggle.com/datasets/prajitdatta/movielens-100k-dataset?resource=download
2. https://www.analyticsvidhya.com/blog/2021/05/item-based-collaborative-filtering-build-your-own-recommender-system/
3. https://www.analyticsvidhya.com/blog/2020/08/recommendation-system-k-nearest-neighbors/#5c17