# Movie Recommendation System using Collaborative Filtering Model


### Step 1: Download the MovieLens Dataset

In [None]:
# Before you start the project, download the MovieLens dataset from "https://grouplens.org/datasets/movielens/"
# Use the latest small version of the dataset: "ml-latest-small"

### Step 2: Install all the necessary Libraries

In [None]:
# The Surprise library is published on PyPI under the name scikit-surprise
!pip install scikit-surprise
!pip install pandas
!pip install numpy
!pip install scikit-learn

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### Step 3: Import the necessary libraries

In [None]:
# pandas for loading and handling the tabular data, NumPy for numerical arrays
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy
from collections import defaultdict

**Fun Fact about Pandas:** Pandas is a popular Python library used for ***data manipulation and analysis***. It provides high-performance, easy-to-use data structures and data analysis tools for handling structured data. Pandas was created in 2008 by Wes McKinney while he was working at AQR Capital Management. McKinney created Pandas to address the limitations of the tools available for data analysis and manipulation in Python at the time.

**Fun Fact about Numpy:** NumPy is a popular Python library used for ***scientific computing and numerical analysis***. It provides high-performance multidimensional array objects and tools for working with these arrays. NumPy was created in 2005 by Travis Oliphant, a Python developer and data scientist. Oliphant created NumPy to address the limitations of the tools available for scientific computing and numerical analysis in Python at the time.

**Fun Fact about Scikit-Learn:** Scikit-learn, also known as sklearn, is a popular Python library used for ***machine learning tasks***. It provides a range of tools for data preprocessing, model selection, and model evaluation, as well as a variety of machine learning algorithms. Scikit-learn was started in 2007 by David Cournapeau as a Google Summer of Code project. Its first public release followed in 2010, led by researchers at INRIA, and it has since been developed and maintained by a community of contributors.

**Fun Fact about Surprise:** Surprise is a Python library for ***building and analyzing recommender systems***. It was created in 2014 by Nicolas Hug.
Surprise is designed to provide easy and efficient implementation of collaborative filtering algorithms, which are commonly used for recommender systems. It offers a range of built-in algorithms for matrix factorization, neighborhood-based methods, and other recommendation approaches.

### Step 4: Load the Data


In [None]:
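# Load the "movies.csv" and "ratings.csv" files into pandas DataFrames.
# The paths assume Google Drive is mounted at /content/drive in Colab; adjust them to wherever you unpacked the dataset.
movies = pd.read_csv('/content/drive/MyDrive/ml-latest-small/movies.csv')
ratings = pd.read_csv('/content/drive/MyDrive/ml-latest-small/ratings.csv')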

### Step 5: Preprocess the Data

In [None]:
# Surprise expects ratings as (user, item, rating) triples, so we select exactly those three columns, in that order.
# The Reader object tells Surprise the rating scale; MovieLens ratings run from 0.5 to 5.0 stars.
reader = Reader(rating_scale=(0.5, 5))
data_ratings = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

### Step 6: Split the Data into Train and Test

In [None]:
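# Hold out 20% of the ratings for testing and train on the remaining 80%.
# The split is random, so results vary between runs; pass random_state=<int> to make it repeatable.
trainset_ratings, testset_ratings = train_test_split(data_ratings, test_size=0.2)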

### Step 7: Train for Collaborative Filtering Model

In [None]:
# We will now train the SVD algorithm (with its default hyperparameters) on the training dataset.
algo = SVD()
algo.fit(trainset_ratings)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7feb4604d4e0>

***Why are we using SVD:*** In recommender systems, "SVD" usually refers to a matrix factorization model inspired by Singular Value Decomposition rather than the exact linear-algebra decomposition, which is not defined for a matrix with missing entries. Surprise's `SVD` is the variant popularized by Simon Funk during the Netflix Prize: it represents each user and each item as a short vector of latent features and predicts a rating as the global mean plus a user bias, an item bias, and the dot product of the two vectors. These parameters are learned from the known ratings in the training data by stochastic gradient descent, and can then be used to predict how a user would rate an item they have not yet interacted with. The approach is widely used for collaborative filtering because it copes well with sparse rating matrices and scales to large datasets.
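
The idea of learning latent features from known ratings can be sketched in plain Python. The sketch below assumes the biased matrix factorization form often called Funk SVD (prediction = global mean + user bias + item bias + dot product of the user and item factor vectors), fitted by stochastic gradient descent. The function name `train_funk_svd`, the toy users and movies, and the hyperparameter values are all hypothetical and chosen for illustration; this is not Surprise's implementation.

```python
import random


def train_funk_svd(ratings, n_factors=2, n_epochs=300, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u[u] + b_i[i] + dot(p[u], q[i]) by SGD.

    `ratings` is a list of (user, item, rating) triples.
    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_u = {u: 0.0 for u in users}                      # user biases
    b_i = {i: 0.0 for i in items}                      # item biases
    # Latent factor vectors start as small random values
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return mu + b_u[u] + b_i[i] + dot

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Move each parameter along the error, shrunk by L2 regularisation
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict


# Hypothetical toy data: (user, movie, rating)
toy_ratings = [
    ("ann", "m1", 5.0), ("ann", "m2", 4.5), ("ann", "m3", 1.0),
    ("bob", "m1", 4.5), ("bob", "m3", 1.5),
    ("cat", "m2", 1.0), ("cat", "m3", 5.0),
]
predict = train_funk_svd(toy_ratings)
train_rmse = (sum((r - predict(u, i)) ** 2 for u, i, r in toy_ratings) / len(toy_ratings)) ** 0.5
print(f"training RMSE: {train_rmse:.3f}")
print(f"estimate for bob on unseen m2: {predict('bob', 'm2'):.2f}")
```

The estimate for `bob` on `m2` comes entirely from parameters learned on other ratings, which is what lets the trained model score movies a user has never rated.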

### Step 8: Generate Movie Recommendations

In [None]:
# Once the model is trained - it can be used to generate movie recommendations for a specific user.

def fetch_unseen_movies(user_id):
  # A set makes each membership test O(1) instead of scanning a list
  seen_movies = set(ratings.loc[ratings['userId'] == user_id, 'movieId'])
  all_movies = ratings['movieId'].unique().tolist()
  unseen_movies = [movie for movie in all_movies if movie not in seen_movies]
  return unseen_movies


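# Let's try generating recommendations for a specific user

user_id = 1
unseen_movies = fetch_unseen_movies(user_id)
# algo.test() expects (user, item, true rating) triples. The true rating of an unseen movie is unknown,
# so 4.0 is only a placeholder and does not affect the estimates.
# A separate name keeps the held-out testset_ratings from Step 6 intact.
candidate_ratings = [(user_id, movie_id, 4.0) for movie_id in unseen_movies]
predictions = algo.test(candidate_ratings)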


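# Let's get the top-N recommendations for each user
def get_top_n_recommendations(predictions, n):
  top_n = defaultdict(list)
  # Group the estimated ratings by user
  for uid, iid, true_r, est, _ in predictions:
    top_n[uid].append((iid, est))
  # Keep each user's n highest estimates (user_ratings avoids shadowing the ratings DataFrame)
  for uid, user_ratings in top_n.items():
    user_ratings.sort(key=lambda x: x[1], reverse=True)
    top_n[uid] = user_ratings[:n]
  return top_n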

n = 10
top_n=get_top_n_recommendations(predictions,n)

print(top_n)

defaultdict(<class 'list'>, {1: [(318, 5), (58559, 5), (912, 5), (1250, 5), (1252, 5), (1276, 5), (1221, 5), (38061, 5), (1921, 5), (308, 5)]})
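
The printed result contains only movie IDs, which are hard to read. One way to make it readable is to map each ID to its title. The sketch below uses only the standard library and a small hypothetical stand-in for `movies.csv`; the names `sample_movies`, `sample_top_n`, and `attach_titles` are assumptions made for this example and are not defined elsewhere in the notebook.

```python
# Hypothetical stand-in for a few rows of movies.csv (columns: movieId, title)
sample_movies = [
    {"movieId": 318, "title": "Shawshank Redemption, The (1994)"},
    {"movieId": 912, "title": "Casablanca (1942)"},
    {"movieId": 58559, "title": "Dark Knight, The (2008)"},
]

# Same shape as the top_n structure: {user_id: [(movie_id, estimated_rating), ...]}
sample_top_n = {1: [(318, 5.0), (58559, 5.0), (912, 5.0)]}


def attach_titles(top_n, movie_rows):
    """Replace each movie ID with its title, keeping the estimated rating."""
    title_by_id = {row["movieId"]: row["title"] for row in movie_rows}
    return {
        uid: [(title_by_id.get(iid, f"<unknown movieId {iid}>"), est) for iid, est in recs]
        for uid, recs in top_n.items()
    }


titled = attach_titles(sample_top_n, sample_movies)
for title, est in titled[1]:
    print(f"{est:.1f}  {title}")
```

With the notebook's own data, `movies.to_dict('records')` should yield rows of this shape, so `attach_titles(top_n, movies.to_dict('records'))` would be the equivalent call.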


### Step 9: Evaluate the model



In [None]:
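# You can evaluate the performance of the model using RMSE (root mean squared error) and MAE (mean absolute error).
# RMSE is the square root of the mean squared difference between predicted and actual ratings, so it penalises large errors more heavily; MAE is the mean absolute difference.
# Both are measured in rating units (stars), and lower values indicate better performance.
#
# The evaluation needs real ratings that the model did not see during training. The predictions from Step 8 are unsuitable:
# their "true" rating is the 4.0 placeholder, so scoring them (which yields figures such as RMSE 0.3023 and MAE 0.2467) says nothing about model quality.
# Instead we rebuild the held-out 20% as every rating that is not in the training set, which does not rely on any later reassignment of variables.
train_pairs = {(trainset_ratings.to_raw_uid(u), trainset_ratings.to_raw_iid(i)) for u, i, _ in trainset_ratings.all_ratings()}
heldout_ratings = [(u, i, r) for u, i, r in ratings[['userId', 'movieId', 'rating']].itertuples(index=False) if (u, i) not in train_pairs]
heldout_predictions = algo.test(heldout_ratings)
rmse_score = accuracy.rmse(heldout_predictions)
mae_score = accuracy.mae(heldout_predictions)
# RMSE is not a fraction of 1 (it can exceed 1 star), so (1 - RMSE) * 100 is not a percentage accuracy; report RMSE and MAE directly.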

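
To make the two metrics concrete, here is a small worked example in plain Python. The rating pairs are hypothetical and the helper names `rmse` and `mae` are defined only for this sketch; in the notebook the same quantities come from `accuracy.rmse` and `accuracy.mae`.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, predicted_rating) pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)


# Hypothetical (true rating, predicted rating) pairs; the errors are 0.5, -0.5, 2.0, 0.0
pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, 3.0), (3.0, 3.0)]

# Squared errors sum to 4.5, so RMSE = sqrt(4.5 / 4) ≈ 1.061
print(f"RMSE: {rmse(pairs):.3f}")
# Absolute errors sum to 3.0, so MAE = 3.0 / 4 = 0.75
print(f"MAE:  {mae(pairs):.3f}")
```

The single 2-star miss pulls the RMSE well above the MAE, which is why RMSE is the stricter of the two. Both values are in stars rather than fractions, so an RMSE above 1 is perfectly possible.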