## Movie Recommendation using Collaborative Filtering

### What's collaborative filtering?

Collaborative filtering is a technique in which an algorithm predicts a user's interests by drawing on the interests of many other users.

### What's a collaborative filtering recommendation system?

A collaborative filtering recommendation system predicts what might interest a person based on the tastes of many other users. It assumes that if person X likes comedy movies, and person Y likes comedy movies and family movies, then person X might like family movies as well. There are two types of collaborative filtering recommender:

**User-User**: It identifies other people with similar tastes to a target user and combines their ratings to make recommendations for that user.

**Item-Item**: It identifies global associations between items, then uses those associations to provide personalized recommendations based on a user's own ratings.
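
To make the two variants concrete, here is a minimal sketch on a hypothetical toy matrix (the ratings, the `toy_ratings` name, and the `cosine_similarity_matrix` helper are illustrative assumptions, not part of this notebook's pipeline). Comparing rows gives user-user similarity; comparing columns gives item-item similarity.

```python
import numpy as np

# Hypothetical toy ratings: rows are users, columns are movies (0 = not rated).
toy_ratings = np.array([
    [5.0, 4.0, 0.0],   # user X: rated the two comedies
    [5.0, 5.0, 4.0],   # user Y: comedies plus a family movie
    [1.0, 0.0, 5.0],   # user Z: a different taste
])

def cosine_similarity_matrix(matrix):
    """Cosine similarity between every pair of rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    normalized = matrix / norms
    return normalized @ normalized.T

user_similarity = cosine_similarity_matrix(toy_ratings)    # user-user: compare rows
item_similarity = cosine_similarity_matrix(toy_ratings.T)  # item-item: compare columns

print(np.round(user_similarity, 3))
print(np.round(item_similarity, 3))
```

In this toy example user X comes out closer to user Y than to user Z, which is the intuition behind recommending Y's family movie to X.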

In this notebook, I will attempt to implement an **item-based** and a **user-based** collaborative recommender and evaluate each using **Root Mean Square Error** (RMSE) to see how well it performs.
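
For reference, RMSE over a set $R$ of (user, movie) pairs with known ratings is usually defined as below, where $r_{ui}$ is the observed rating and $\hat{r}_{ui}$ the predicted one. These symbols are introduced here only for illustration and are not used elsewhere in the notebook.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(u,i) \in R} \left(\hat{r}_{ui} - r_{ui}\right)^{2}}
$$

Lower is better, and the value is expressed in the same units as the ratings.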

In [None]:
import numpy as np
import pandas as pd
from math import sqrt
from sklearn.metrics import mean_squared_error
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import NearestNeighbors

The MovieLens dataset used in this notebook can be downloaded from https://grouplens.org/datasets/movielens/

In [None]:
ratings = pd.read_csv('ratings.csv')  # read in ratings dataset
movies = pd.read_csv('movies.csv')  # read in movies dataset

In [None]:
ratings.shape

In [None]:
movies.shape

In [None]:
ratings.head()

In [None]:
movies.head()

In [None]:
new_df = pd.merge(ratings,movies,on = 'movieId').drop(['timestamp','genres'], axis = 1)

In [None]:
new_df.shape

In [None]:
new_df.head()

In [None]:
missing_pivot = new_df.pivot_table(values = 'rating', index = 'userId', columns = 'title')
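
As a small illustration of what `pivot_table` does here, the sketch below reshapes a hypothetical long-format table (the `toy_long` data is made up) into a user-by-title matrix. Unrated combinations become `NaN`, and if the same user/title pair appears more than once, `pivot_table` averages the ratings by default.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating.
toy_long = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie C"],
    "rating": [4.0, 3.5, 5.0, 2.0],
})

# Wide format: one row per user, one column per title, NaN where no rating exists.
toy_wide = toy_long.pivot_table(values="rating", index="userId", columns="title")
print(toy_wide)
```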

In [None]:
missing_pivot.head()

Let's identify the movies each user has rated.

In [None]:
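rate = {}          # userId -> titles of the movies the user rated
rows_indexes = {}  # userId -> column positions of those movies in the pivot table
for user_id, row in missing_pivot.iterrows():
    positions = range(len(missing_pivot.columns))
    combine = zip(row.index, row.values, positions)
    # keep only the movies that have a rating (NaN means "not rated")
    rated = [(title, pos) for title, value, pos in combine if pd.notna(value)]
    rows_indexes[user_id] = [pos for _, pos in rated]
    rate[user_id] = [title for title, _ in rated]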
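
The same two lookups can also be built more compactly with pandas' own missing-value helpers. The sketch below shows one way to do it on a hypothetical two-user table (`toy_pivot` and the `toy_*` names are assumptions for illustration); it is an alternative formulation, not the code used in the rest of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical pivot table: NaN marks a movie the user has not rated.
toy_pivot = pd.DataFrame(
    {"Movie A": [4.0, np.nan], "Movie B": [np.nan, 3.0], "Movie C": [5.0, 2.5]},
    index=pd.Index([1, 2], name="userId"),
)

# userId -> titles of rated movies
toy_rated = {user: row.dropna().index.tolist() for user, row in toy_pivot.iterrows()}

# userId -> column positions of rated movies
toy_rated_positions = {
    user: np.flatnonzero(row.notna().to_numpy()).tolist()
    for user, row in toy_pivot.iterrows()
}

print(toy_rated)
print(toy_rated_positions)
```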

These are the movies rated by user 1:

In [None]:
rate[1]

Here's the user-item matrix:

In [None]:
pivot_table = new_df.pivot_table(values = 'rating', index = 'userId', columns = 'title').fillna(0)

In [None]:
pivot_table = pivot_table.apply(np.sign)  # binarize: 1 = rated, 0 = not rated
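
Applying `np.sign` discards the rating values and keeps only whether a rating exists. A minimal sketch on hypothetical data (the `toy` frame is made up) shows the effect:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings after fillna(0): 0 means "not rated".
toy = pd.DataFrame({"Movie A": [4.0, 0.0], "Movie B": [0.5, 5.0]}, index=[1, 2])

# np.sign maps every positive rating to 1 and leaves 0 as 0.
toy_binary = toy.apply(np.sign)
print(toy_binary)
```

From this point on, the matrix describes which movies each user rated rather than how much they liked them, so the similarities and errors computed later are over these 0/1 indicators.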

In [None]:
pivot_table.head()

Movies not rated by each user:

In [None]:
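notrated = {}          # userId -> titles of the movies the user has not rated
notrated_indexes = {}  # userId -> column positions of those movies
for user_id, row in pivot_table.iterrows():
    positions = range(len(pivot_table.columns))
    # zip with the column positions (not the row values) so the indexes are real positions
    combine = zip(row.index, row.values, positions)
    # a value of 0 means the user has not rated the movie
    unrated = [(title, pos) for title, value, pos in combine if not value > 0]
    notrated_indexes[user_id] = [pos for _, pos in unrated]
    notrated[user_id] = [title for title, _ in unrated]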

In [None]:
notrated

## Unsupervised Nearest Neighbor Recommender

In [None]:
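n = 5  # number of neighbors to retrieve per item (this typically includes the item itself)
# note: despite "cosine" in the variable names, the metric is correlation distance (1 - Pearson correlation)
item_cosine_nn = NearestNeighbors(n_neighbors = n,algorithm = 'brute', metric = 'correlation')
item_cosine_nn_fit = item_cosine_nn.fit(pivot_table.T.values)  # transpose so that each row is a movie
item_distances, item_indices = item_cosine_nn_fit.kneighbors(pivot_table.T.values)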
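
To see what `kneighbors` returns, here is a minimal sketch on a hypothetical 4-item matrix (`toy_items` and the other `toy_*` names are assumptions). Because the same rows are used for fitting and querying, each item's closest neighbor is itself at distance 0, so the first useful neighbor is in the second column.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical binary item-by-user matrix: 4 items (rows), 5 users (columns).
toy_items = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
], dtype=float)

toy_nn = NearestNeighbors(n_neighbors=2, algorithm="brute", metric="correlation")
toy_nn.fit(toy_items)

# toy_dist[i, j] is the correlation distance from item i to its j-th nearest neighbor,
# toy_idx[i, j] is the row position of that neighbor.
toy_dist, toy_idx = toy_nn.kneighbors(toy_items)

print(toy_idx)
print(np.round(toy_dist, 3))
```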

### Item-Based Recommender

In [None]:
items_dic = {}
for i in range(len(pivot_table.T.index)):
    item_idx = item_indices[i]
    col_names = pivot_table.T.index[item_idx].tolist()
    items_dic[pivot_table.T.index[i]] = col_names

In [None]:
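topRecs = {}
for k,v in rows_indexes.items():
    rated = set(v)  # column positions of the movies this user already rated
    # flatten the neighbors (and their distances) of every movie the user rated
    item_idx = [j for i in item_indices[v] for j in i]
    item_dist = [j for i in item_distances[v] for j in i]
    # drop already-rated movies and keep the smallest distance seen for each candidate
    unique = {}
    for i, d in zip(item_idx, item_dist):
        if i not in rated and d < unique.get(i, float('inf')):
            unique[i] = d
    sort = sorted(unique.items(), key = lambda x: x[1])
    col_names = [(pivot_table.columns[i[0]],i[1]) for i in sort]
    topRecs[k] = col_names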

In [None]:
def getrecommendations(user, number_of_recs = 30):
    # userIds are the index of the pivot table, so check membership directly
    if user not in pivot_table.index:
        print('User {} not found, there are only {} users, try again!'.format(user, len(pivot_table.index)))
        return
    print("These are all the movies you have viewed in the past: \n\n{}".format('\n'.join(rate[user])))
    print()
    print("We recommend viewing these movies too:\n")
    # topRecs holds (title, distance) pairs sorted by distance; similarity = 1 - distance
    for title, distance in topRecs[user][:number_of_recs]:
        print('{} with similarity: {:.4f}'.format(title, 1 - distance))

The top recommendations:

In [None]:
getrecommendations(1)

**These recommendations make sense to me!**

### User-Based Recommender

In [None]:
user_cosine_nn = NearestNeighbors(n_neighbors = n,algorithm = 'brute', metric = 'correlation')
user_cosine_nn_fit = user_cosine_nn.fit(pivot_table.values)
user_distances, user_indices = user_cosine_nn_fit.kneighbors(pivot_table.values)

In [None]:
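users = {}  # userId -> userIds of the nearest neighbors (the first is typically the user itself)
for i, user_id in enumerate(pivot_table.index):
    user_idx = user_indices[i]
    # key by the actual userId instead of assuming ids run from 1 to N without gaps
    users[user_id] = np.array(pivot_table.index)[user_idx].tolist()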

In [None]:
newTop = {}
for k,v in topRecs.items():
    newTop[k] = [i[0] for i in v]

In [None]:
userRecs = {}
for k,v in users.items():
    # skip v[0] (the user itself) and collect each neighbor's top 20 item-based recommendations
    users_item = {i: newTop[i][:20] for i in v[1:] if i in newTop}
    userRecs[k] = users_item

In [None]:
userRecs

### Let's make predictions for the movies!

In [None]:
item_distances = 1 - item_distances  # convert correlation distance to similarity (run this cell only once)

In [None]:
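# similarity-weighted average: (n x n_items) . (n_items x n_users) -> (n x n_users),
# with each row divided by its total absolute similarity
item_predictions = item_distances.T.dot(pivot_table.T.values) / np.array([np.abs(item_distances.T).sum(axis = 1)]).T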

In [None]:
item_ground_truth = pivot_table.T.values[item_indices[0]]

### Let's make predictions for the users!

In [None]:
user_distances = 1 - user_distances  # convert correlation distance to similarity (run this cell only once)

In [None]:
user_predictions = user_distances.T.dot(pivot_table.values)/ np.array([np.abs(user_distances.T).sum(axis = 1)]).T
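
For comparison, a common textbook formulation of user-based prediction scores each (user, movie) pair as the similarity-weighted average of all users' ratings for that movie. The sketch below illustrates that idea on a hypothetical toy matrix using cosine similarity; the `toy_*` names and `user_sim` are assumptions, and this is not the computation performed by the cell above.

```python
import numpy as np

# Hypothetical binary user-by-movie matrix (1 = rated, 0 = not rated).
toy_ratings = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])

# Full user-user cosine similarity matrix, shape (n_users, n_users).
norms = np.linalg.norm(toy_ratings, axis=1, keepdims=True)
user_sim = (toy_ratings / norms) @ (toy_ratings / norms).T

# Each prediction is a weighted average of every user's value for that movie,
# with weights given by similarity to the target user.
toy_user_pred = user_sim @ toy_ratings / np.abs(user_sim).sum(axis=1, keepdims=True)

print(np.round(toy_user_pred, 3))
```

The result has one score per user and movie, so it can be compared entry by entry with the original matrix.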

In [None]:
user_ground_truth = pivot_table.values[user_indices[0]]

## Evaluating the recommender's predictions

In [None]:
def rmse(prediction, ground_truth):
    # score only the entries that are non-zero (i.e. rated) in the ground truth
    mask = ground_truth.nonzero()
    prediction = prediction[mask].flatten()
    ground_truth = ground_truth[mask].flatten()
    return sqrt(mean_squared_error(ground_truth, prediction))
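
The masking step matters: predictions for entries with no ground-truth rating are ignored entirely. A minimal NumPy-only sketch with made-up numbers (`masked_rmse`, `toy_truth`, and `toy_pred` are hypothetical names) shows which entries contribute to the error:

```python
from math import sqrt

import numpy as np

def masked_rmse(prediction, ground_truth):
    """RMSE over the entries where ground_truth is non-zero."""
    mask = ground_truth != 0
    errors = prediction[mask] - ground_truth[mask]
    return sqrt(np.mean(errors ** 2))

toy_truth = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_pred = np.array([[0.5, 0.9], [0.2, 1.0]])

# Only the two diagonal entries are scored: errors are -0.5 and 0.0.
toy_error = masked_rmse(toy_pred, toy_truth)
print(round(toy_error, 5))
```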

### Evaluation of the Item-based recommender

In [None]:
item_error_rate = rmse(item_predictions,item_ground_truth)
# RMSE is in rating units, not percent, so "100 - RMSE" is not an accuracy; report the RMSE itself
print("RMSE: {:.5f}".format(item_error_rate))

### Evaluation of the User-based recommender

In [None]:
user_error_rate = rmse(user_predictions,user_ground_truth)
# RMSE is in rating units, not percent, so "100 - RMSE" is not an accuracy; report the RMSE itself
print("RMSE: {:.5f}".format(user_error_rate))

Enjoy!