# Movie Recommender System

## Overview

This project builds a basic movie recommender using collaborative filtering with the Python library `surprise` (installed as `scikit-surprise`). It recommends movies to each user based on that user's past ratings and on the ratings of other users with similar tastes.

## System Architecture

The recommender system follows these key steps:

1. **Data Loading and Preparation**:
    - The data is the MovieLens 100K dataset, which contains 100,000 ratings (on a 1 to 5 scale) from 943 users on 1,682 movies.
    - The dataset is split into a training set (75%) and a test set (25%) to evaluate the model's performance.

2. **Model Building**:
    - The system uses the `SVD` algorithm from `surprise`, a matrix-factorization method widely used for collaborative filtering on movie rating data.
    - The algorithm learns a small vector of latent factors for every user and every movie from the observed ratings only, which makes it well suited to sparse rating data.

3. **Model Evaluation**:
    - The model is evaluated with root mean square error (RMSE) on the held-out test set; lower values mean the predicted ratings are closer to the actual ones.
    - Cross-validation (`cross_validate` in `surprise`) gives a more reliable error estimate than a single split and helps detect overfitting.

4. **Recommendations**:
    - The trained model predicts a rating for any user-item pair, including pairs that are not present in the training set.
    - Sorting a user's estimated ratings in descending order and keeping the first N gives that user's top-N list.
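
To make steps 2 and 3 concrete, the sketch below reproduces by hand the prediction rule that `surprise` documents for its `SVD` algorithm (global mean, plus user bias, plus item bias, plus the dot product of the user and item factor vectors) together with the RMSE formula. The function names and the toy numbers are illustrative assumptions and are not part of this project; in practice the library learns the biases and factors during `fit`.

```python
import math


def predict_rating(mu, b_u, b_i, p_u, q_i, low=1.0, high=5.0):
    """Biased matrix-factorization estimate: mu + b_u + b_i + dot(q_i, p_u).

    The result is clipped to the rating scale [low, high].
    """
    est = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    return min(high, max(low, est))


def rmse(pairs):
    """Root mean square error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("rmse needs at least one (true, estimate) pair")
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


# Toy, hand-picked parameters (hypothetical values, not learned from data).
est = predict_rating(mu=3.5, b_u=0.2, b_i=-0.1, p_u=[0.5, -0.2], q_i=[0.4, 0.1])
print(round(est, 2))
print(rmse([(4.0, 3.5), (3.0, 3.5)]))
```

Each latent dimension contributes one term to the dot product, so a user and a movie whose factor vectors point in the same direction get an estimate above the bias-only baseline.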

## Technologies Used

- **Python**: The primary programming language used.
- **Surprise**: A Python scikit for building and analyzing recommender systems that deal with explicit rating data.

## Setup and Installation

To run this project, install the `scikit-surprise` package. In Google Colab (or any Jupyter notebook), you can do this by running the following code cell:

```python
!pip install scikit-surprise
```

Once the package is installed, the following cell loads the data, trains and evaluates the model, and generates recommendations:

```python
from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import cross_validate, train_test_split

# Step 1: Data Loading
# Load the built-in MovieLens 100K dataset (downloaded on first use)
data = Dataset.load_builtin('ml-100k')
# Hold out 25% of the ratings for testing
trainset, testset = train_test_split(data, test_size=0.25)

# Step 2: Build the Collaborative Filtering Model
model = SVD()

# Step 3: Train the model on the trainset and predict ratings for the testset
model.fit(trainset)
predictions = model.test(testset)

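# Evaluate the model on the held-out test set
accuracy.rmse(predictions)

# Optional: 5-fold cross-validation gives a more stable estimate than a single split.
# Uncomment to run; it retrains the model five times.
# cross_validate(SVD(), data, measures=['RMSE'], cv=5, verbose=True)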

# Step 4: Making Recommendations
# Predicting rating for a specific user and item
user_id = '196'  # raw user id
item_id = '302'  # raw item id
actual_rating = 4  # this is optional, just for comparison

# Predict rating
prediction = model.predict(user_id, item_id, r_ui=actual_rating, verbose=True)

# Making top-N recommendations for a user
from collections import defaultdict

def get_top_n(predictions, n=10):
    """Return a dict mapping each raw user id to its n highest (item id, estimate) pairs."""
    # First group the estimated ratings by user.
    top_n = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

# `predictions` covers only the test set, so each list ranks the movies
# that the user rated in the test split.
top_n = get_top_n(predictions, n=10)
# Print the recommended items for user '196'
print(top_n['196'])


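```

Running the cell produces output similar to the following:

```text
Dataset ml-100k could not be found. Do you want to download it? [Y/n] Y
Trying to download dataset from https://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k
RMSE: 0.9341
user: 196        item: 302        r_ui = 4.00   est = 4.13   {'was_impossible': False}
[('8', 4.08871387884915), ('428', 3.916689368629527), ('393', 3.6876421471318106), ('66', 3.5124104587608866), ('381', 3.461086689138524), ('108', 3.2496598300989197), ('692', 3.249582094219165)]
```

Exact values vary between runs because the train/test split and the SVD initialization are random. User 196 gets seven movies rather than ten because only seven of that user's ratings fell into this test split.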
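
A recommender would normally rank movies the user has not rated yet rather than test-set items. In `surprise`, one way to obtain those candidate pairs is `trainset.build_anti_testset()`. The library-free sketch below shows the same idea on toy data: skip the items the user already rated, then keep the N highest estimates. `recommend_unseen`, the `estimate` callable, and the toy scores are hypothetical names introduced for illustration; they are not part of this project or of the library.

```python
import heapq


def recommend_unseen(user_id, all_items, rated, estimate, n=10):
    """Return the n unrated items with the highest estimated rating for user_id.

    rated:    dict mapping a user id to the set of item ids that user already rated.
    estimate: callable (user_id, item_id) -> estimated rating; stands in for a trained model.
    """
    seen = rated.get(user_id, set())
    candidates = (
        (iid, estimate(user_id, iid)) for iid in all_items if iid not in seen
    )
    # nlargest returns the pairs sorted from highest to lowest estimate.
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


# Toy data (hypothetical): a fixed score per item instead of a trained model.
toy_scores = {"A": 4.6, "B": 3.1, "C": 4.9, "D": 2.0, "E": 4.2}
rated = {"196": {"C"}}
top = recommend_unseen("196", toy_scores, rated, lambda uid, iid: toy_scores[iid], n=3)
print(top)
```

With a fitted `surprise` model, the `estimate` argument could plausibly be `lambda uid, iid: model.predict(uid, iid).est`, with `all_items` and `rated` built from the raw ids in the training data.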

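
The script imports `cross_validate` but evaluates on a single 75/25 split. As a rough illustration of what k-fold cross-validation does, the sketch below splits a toy rating list into k folds and reports one RMSE per fold. It is an assumption-laden stand-in: a global-mean baseline replaces SVD so that the example runs without any library, and `k_fold_rmse` and the toy ratings are hypothetical.

```python
import math
import random


def k_fold_rmse(ratings, k=5, seed=0):
    """Return the per-fold RMSE of a global-mean baseline under k-fold cross-validation.

    ratings: list of (user_id, item_id, rating) tuples.
    """
    if not 2 <= k <= len(ratings):
        raise ValueError("k must be between 2 and the number of ratings")
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint folds of near-equal size

    scores = []
    for i, test in enumerate(folds):
        # "Fit" on the other k-1 folds: the baseline model is just their mean rating.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        mean = sum(rating for _, _, rating in train) / len(train)
        # Score on the held-out fold.
        mse = sum((rating - mean) ** 2 for _, _, rating in test) / len(test)
        scores.append(math.sqrt(mse))
    return scores


# Toy ratings (hypothetical): 6 users x 5 items, ratings between 1 and 5.
toy = [(str(u), str(i), float(1 + (u * i) % 5)) for u in range(1, 7) for i in range(1, 6)]
scores = k_fold_rmse(toy, k=5)
print("mean RMSE over folds:", sum(scores) / len(scores))
```

Every rating is used for testing exactly once, so the mean over folds depends less on one lucky or unlucky split than a single held-out test set does.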