## Collaborative Filtering

To address some of the limitations of content-based filtering, collaborative filtering uses similarities between users and items simultaneously to provide recommendations. This allows for serendipitous recommendations; that is, collaborative filtering models can recommend an item to user A based on the interests of a similar user B. Furthermore, the embeddings can be learned automatically, without relying on hand-engineering of features.

### A Movie Recommendation Example
Consider a movie recommendation system in which the training data consists of a feedback matrix in which:

- Each row represents a user.
- Each column represents an item (a movie).

The feedback about movies in this example is explicit: users specify how much they liked a particular movie by providing a numerical rating.

When a user visits the homepage, the system should recommend movies based on both:

- similarity to movies the user has liked in the past (item-item filtering)
- movies that similar users liked (user-item filtering)
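
As a minimal sketch of what such a feedback matrix looks like, the toy ratings below (hypothetical values, not taken from the dataset) are pivoted so that each row is a user and each column is a movie. The name `toy_ratings` is an assumption made for illustration only.

```python
import pandas as pd

# Hypothetical toy ratings in "long" format: one row per (user, movie, rating)
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 5.0, 3.0, 4.5, 2.0],
})

# Feedback matrix: one row per user, one column per movie
feedback = toy_ratings.pivot(index="userId", columns="movieId", values="rating")
print(feedback)

# Missing entries (NaN) are movies the user has not rated
n_missing = int(feedback.isna().sum().sum())
print(n_missing, "of", feedback.size, "entries are missing")
```

Most entries of a real feedback matrix are missing, because each user rates only a small fraction of the catalogue.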

In the example below, we have 610 users and 9724 movies.


In [None]:
#import necessary libraries

import pandas as pd
import numpy as np
from IPython.display import display
from sklearn.metrics.pairwise import cosine_similarity

__author__ = "Kanchan Pandhare"
__email__ = "kanchan.pandhare08@gmail.com"


movies = pd.read_csv("movies.csv")
ratings = pd.read_csv("ratings.csv")

#Combine movies and ratings into a single dataframe movie_ratings (inner join on movieId)
movie_ratings = pd.merge(movies, ratings, on='movieId')
display(movie_ratings.head())



## Item-Item Similarity

Item-item collaborative filtering, also called item-based collaborative filtering, is a form of collaborative filtering for recommender systems that measures the similarity between items using the ratings people have given them.
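
To make "similarity between items" concrete, here is a small sketch that computes pairwise cosine similarity between the rows of a hypothetical item-by-user matrix using plain NumPy. The matrix values and the helper name `cosine_sim_matrix` are assumptions for illustration; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import numpy as np

# Hypothetical item-by-user rating matrix: rows are movies, columns are users.
# 0 means "not rated" (the same convention as fillna(0) in this notebook).
item_user = np.array([
    [5.0, 4.0, 0.0],   # movie A
    [4.0, 5.0, 0.0],   # movie B
    [0.0, 0.0, 5.0],   # movie C
])

def cosine_sim_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T

sim = cosine_sim_matrix(item_user)
print(np.round(sim, 3))
```

Movies A and B were rated similarly by the same users, so their similarity is close to 1, while movie C shares no raters with them and gets a similarity of 0.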

### Movie Matrix
For item-item similarity, movie ids form the rows and user ids form the columns. This gives a matrix of shape (no_of_movies) x (no_of_users), in which movies a user has not rated are filled with 0.



In [None]:
movie_matrix = movie_ratings.pivot(index = 'movieId', columns = 'userId', values = 'rating').fillna(0)
display(movie_matrix)


In [None]:
movie_similarity =  cosine_similarity(movie_matrix)
np.fill_diagonal(movie_similarity,0)
ratings_matrix_items = pd.DataFrame( movie_similarity )
display(ratings_matrix_items)


### Similar Movies Function 
The similarMovies function takes a movie title as input and looks up the corresponding movie. It then reads that movie's row of the similarity matrix, so that every movie is paired with its similarity score to the input movie.

### Recommend Movies As per Item Similarity
This function takes the user id of user A as input and picks a movie that user A rated highly (4 or above). That movie is passed to the similarMovies function above, and the results are sorted in descending order of similarity to give sorted_movies_as_per_userChoice, keeping only movies with a similarity of at least 0.25. user2Movies holds the movies user A has already rated, so that they can be excluded. The remaining movies are ranked by rating and the top 10 are recommended.

In [None]:

def similarMovies(movieName):
    """
    Finds movies similar to the given movie.
    :param movieName: title of the reference movie
    :return: copy of movies with an added 'similarity' column, or None if the title is unknown
    """
    match = movies.loc[movies['title'] == movieName, 'movieId']
    if match.empty or match.iloc[0] not in movie_matrix.index:
        print("Sorry, the movie is not in the database!")
        return None
    # ratings_matrix_items is indexed by row position in movie_matrix, not by movieId
    pos = movie_matrix.index.get_loc(match.iloc[0])
    scores = pd.Series(ratings_matrix_items.iloc[pos].to_numpy(), index=movie_matrix.index)
    similar = movies.copy()
    # Movies without any rating have no similarity score; treat them as 0
    similar['similarity'] = similar['movieId'].map(scores).fillna(0)
    return similar

def recommendMoviesAsperItemSimilarity(user_id):
    """
    Recommends movies the user has not rated yet, based on item similarity.
    :param user_id: userId of the user who needs recommendations
    :return: movieIds of up to 10 recommended movies
    """
    # Movies the user liked (rating of 4 or higher)
    liked = movie_ratings[(movie_ratings.userId == user_id) & (movie_ratings.rating >= 4)]
    if liked.empty:
        return []
    # Use the first liked movie as the reference title
    similar = similarMovies(liked['title'].iloc[0])
    if similar is None:
        return []
    sorted_movies_as_per_userChoice = similar.sort_values('similarity', ascending=False)
    sorted_movies_as_per_userChoice = sorted_movies_as_per_userChoice.loc[
        sorted_movies_as_per_userChoice['similarity'] >= 0.25, 'movieId']
    # Use a set of values: `in` on a Series tests the index labels, not the values
    user2Movies = set(ratings.loc[ratings['userId'] == user_id, 'movieId'])
    candidates = [m for m in sorted_movies_as_per_userChoice if m not in user2Movies]
    # Rank the unseen candidates by their mean rating across all users,
    # so that each movieId appears at most once in the result
    mean_ratings = ratings[ratings['movieId'].isin(candidates)].groupby('movieId')['rating'].mean()
    best10 = mean_ratings.sort_values(ascending=False).head(10)
    return best10.index.tolist()

def movieIdToTitle(listMovieIDs):
    """
    Converts movieIds to titles.
    :param listMovieIDs: list of movieIds
    :return: list of movie titles
    """
    movie_titles = []
    for movie_id in listMovieIDs:
        movie_titles.append(movies.loc[movies['movieId'] == movie_id, 'title'].iloc[0])
    return movie_titles
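
Filtering out movies the user has already rated depends on a membership test against a pandas Series, which deserves some care. The sketch below uses hypothetical movie ids to show that the `in` operator checks a Series' index labels rather than its values, and that `isin` or a `set` tests the values instead.

```python
import pandas as pd

# Hypothetical movieIds rated by one user; the index holds row labels 0, 1, 2
watched = pd.Series([50, 260, 1196], index=[0, 1, 2])

label_hit = 1 in watched             # 1 is an index label
value_hit = 260 in watched           # 260 is a value, not an index label
set_hit = 260 in set(watched)        # a set holds the values
print(label_hit, value_hit, set_hit)

# Keep only the candidate movies that have not been rated yet
candidates = pd.Series([260, 318, 1196, 2571])
unseen = candidates[~candidates.isin(watched)]
print(unseen.tolist())
```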

In [None]:
#Call the function to find the similar movies
movieIdToTitle(recommendMoviesAsperItemSimilarity(2))

## User-Item Similarity
This method identifies users that are similar to the queried user and estimates the desired rating as the weighted average of the ratings given by these similar users.
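
As a rough illustration of the weighted average, the sketch below predicts one rating from the ratings of a few neighbours, weighting each by its similarity to the queried user. The function name `predict_rating`, the use of NaN for "not rated", and the sample numbers are assumptions for illustration. It also assumes non-negative similarities, which holds for cosine similarity of non-negative ratings.

```python
import numpy as np

def predict_rating(similarities, neighbour_ratings):
    """
    Weighted average of the neighbours' ratings for one movie.
    similarities: similarity of each neighbour to the target user
    neighbour_ratings: each neighbour's rating for the movie, np.nan if unrated
    """
    similarities = np.asarray(similarities, dtype=float)
    neighbour_ratings = np.asarray(neighbour_ratings, dtype=float)
    rated = ~np.isnan(neighbour_ratings)   # only neighbours who rated the movie
    weights = similarities[rated]
    if weights.sum() == 0:
        return float("nan")                # no usable neighbours
    return float(np.dot(weights, neighbour_ratings[rated]) / weights.sum())

# Two neighbours rated the movie; the third did not and is ignored
prediction = predict_rating([0.9, 0.5, 0.1], [5.0, 3.0, np.nan])
print(round(prediction, 3))
```

The more similar neighbour pulls the prediction towards its own rating, so the result lies closer to 5 than to 3.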

### User Matrix
For user-item similarity, user ids form the rows and movie ids form the columns. This gives a matrix of shape (no_of_users) x (no_of_movies).




In [None]:
user_matrix  = movie_ratings.pivot(index = 'userId', columns = 'movieId', values = 'rating').fillna(0)
display(user_matrix)

Here I have used cosine similarity to calculate the weights for this weighted average. It gives the similarity score of a user A with every other user. The resulting matrix is of size (no_of_users) x (no_of_users).

We also set the diagonal elements to 0, because every user has a similarity score of 1 with themselves and would otherwise always be selected as their own most similar user.


In [None]:
user_similarity =  cosine_similarity(user_matrix)
np.fill_diagonal(user_similarity,0)
ratings_matrix_users = pd.DataFrame( user_similarity )
display(ratings_matrix_users)



### User's Similarity
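The cosine similarity matrix gives the similarity score of each user against every other user. We then find the most similar user for each user with idxmax, which returns, for each row, the column label holding the highest score. Because ratings_matrix_users is built from a plain NumPy array, these labels are zero-based row positions in user_matrix rather than userId values. The output below shows that the most similar user for the user at position 0 is the user at position 265.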

In [None]:
similar_users = ratings_matrix_users.idxmax(axis=1)
display(similar_users)
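
To see why the diagonal is zeroed before calling idxmax, the following sketch compares both variants on a hypothetical 3 x 3 similarity matrix. The labels `u1` to `u3` and the scores are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical user-user similarity matrix (symmetric, 1.0 on the diagonal)
sim = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])
labels = ["u1", "u2", "u3"]

# With the diagonal intact, every user's best match is themselves
with_diag = pd.DataFrame(sim, index=labels, columns=labels).idxmax(axis=1)

# After zeroing the diagonal, idxmax returns the closest *other* user
no_diag = sim.copy()
np.fill_diagonal(no_diag, 0)
nearest = pd.DataFrame(no_diag, index=labels, columns=labels).idxmax(axis=1)

print(with_diag.tolist())
print(nearest.tolist())
```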


### Recommend Movies As per User Similarity
This function takes the user id of user A as input and looks up the most similar user. It then collects the movies rated by that similar user and removes the ones user A has already watched or rated, so that the same movies are not recommended again. The remaining movies are sorted in descending order of the similar user's rating, so that the highest-rated movies are recommended first.



In [None]:
def recommendMoviesAsperUserSimilarity(user_id):
    """
    Recommends movies the user has not rated yet, based on user similarity.
    :param user_id: userId of the user who needs recommendations
    :return: titles of up to 10 recommended movies
    """
    # similar_users is indexed by row position in user_matrix, not by userId,
    # so translate userId -> position and the result back to a userId
    user_pos = user_matrix.index.get_loc(user_id)
    similar_user = user_matrix.index[similar_users.iloc[user_pos]]
    print("User "+str(user_id)+" is similar to User " +str(similar_user)+"\n")
    # Use a set of values: `in` on a Series tests the index labels, not the values
    user2Movies = set(ratings.loc[ratings['userId'] == user_id, 'movieId'])
    # Movies rated by the similar user, with titles attached
    similarUsersMovies = movie_ratings[movie_ratings['userId'] == similar_user]
    # Drop the movies the input user has already rated
    df_recommended = similarUsersMovies[~similarUsersMovies['movieId'].isin(user2Movies)]
    # Keep the 10 movies the similar user rated highest
    best10 = df_recommended.sort_values('rating', ascending=False).head(10)
    return best10['title']

In [None]:
print(recommendMoviesAsperUserSimilarity(3))