**First install SciKit Surprise.**

In [1]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.3-cp310-cp310-linux_x86_64.whl size=3162991 sha256=8058f342bf5e7700a6ddbd8ba59a6aadacedf28c587869cd364a8e4cb66eca67
  Stored in directory: /root/.cache/pip/wheels/a5/ca/a8/4e28def53797fdc4363ca4af740db15a9c2f1595ebc51fb445
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.3


In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

**Download the dataset zip file.**

In [3]:
!wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip

--2024-04-26 09:30:53--  http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 978202 (955K) [application/zip]
Saving to: ‘ml-latest-small.zip’


2024-04-26 09:30:54 (1.49 MB/s) - ‘ml-latest-small.zip’ saved [978202/978202]



**Extract the dataset zip file.**

In [4]:
import zipfile
import os

In [5]:
zip_path = '/content/ml-latest-small.zip'
extracting_dir = '/content/'
os.makedirs(extracting_dir, exist_ok=True)
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extracting_dir)
extracted_files = os.listdir(extracting_dir)
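
Before extracting, it can help to check which files an archive actually contains. The sketch below builds a tiny in-memory zip (the `demo/...` member names and their contents are made up for illustration) and lists its CSV members with `ZipFile.namelist()`; the same call should work on a real archive path.

```python
import io
import zipfile

# Build a small in-memory archive so the sketch runs without any download.
# The member names and contents below are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo/ratings.csv", "userId,movieId,rating\n1,10,4.0\n")
    zf.writestr("demo/movies.csv", "movieId,title\n10,Example (2000)\n")

# Rewind, reopen for reading, and list the members before extracting anything.
buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as zf:
    csv_members = [name for name in zf.namelist() if name.endswith(".csv")]

print(csv_members)
```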

**Load data from CSV files**

In [6]:
links = pd.read_csv('/content/ml-latest-small/links.csv')
movies = pd.read_csv('/content/ml-latest-small/movies.csv')
ratings = pd.read_csv('/content/ml-latest-small/ratings.csv')
tags = pd.read_csv('/content/ml-latest-small/tags.csv')

**Merge datasets**

In [7]:
merged_data = pd.merge(ratings, movies, on='movieId')
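
The merge above assumes that each `movieId` appears only once in `movies`. As a rough sketch, pandas can check that assumption through the `validate` argument of `pd.merge`; the two small frames here (`ratings_demo`, `movies_demo`) are hypothetical stand-ins for the real CSV files.

```python
import pandas as pd

# Hypothetical miniature versions of ratings.csv and movies.csv.
ratings_demo = pd.DataFrame({
    "userId": [1, 2],
    "movieId": [10, 10],
    "rating": [4.0, 3.5],
})
movies_demo = pd.DataFrame({
    "movieId": [10],
    "title": ["Example (2000)"],
})

# validate="many_to_one" raises MergeError if movieId is duplicated in movies_demo,
# which would otherwise silently duplicate rating rows.
merged_demo = pd.merge(
    ratings_demo, movies_demo, on="movieId", how="inner", validate="many_to_one"
)
print(merged_demo)
```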

**Split data into training and testing sets**

In [8]:
train_data, test_data = train_test_split(merged_data, test_size=0.2, random_state=42)

**Load data into Surprise format**

In [9]:
# MovieLens ratings run from 0.5 to 5.0 in half-star steps
reader = Reader(rating_scale=(0.5, 5))
train_dataset = Dataset.load_from_df(train_data[['userId', 'movieId', 'rating']], reader)
test_dataset = Dataset.load_from_df(test_data[['userId', 'movieId', 'rating']], reader)
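
The `rating_scale` passed to `Reader` should match the range of ratings in the data. One way to avoid hard-coding it is to read the range from the DataFrame, as sketched below with a small hypothetical frame (`sample`); the resulting tuple could then be passed as `Reader(rating_scale=observed_scale)`.

```python
import pandas as pd

# Hypothetical ratings with half-star steps, standing in for ratings.csv.
sample = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 10],
    "rating": [0.5, 4.0, 5.0],
})

# Derive the scale from the observed minimum and maximum rating.
observed_scale = (float(sample["rating"].min()), float(sample["rating"].max()))
print(observed_scale)  # (0.5, 5.0)
```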

**Choose and train a recommendation algorithm (e.g., collaborative filtering). Here, let's use Singular Value Decomposition (SVD).**

In [10]:
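algo = SVD()

# Cross-validate first; cross_validate refits the algorithm on each fold
cv_results = cross_validate(algo, train_dataset, measures=['RMSE', 'MAE'], cv=5, verbose=True)

# Then fit on the full training set so later predictions use all of it
trainset = train_dataset.build_full_trainset()
algo.fit(trainset)

cv_results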

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8798  0.8849  0.8772  0.8822  0.8849  0.8818  0.0030  
MAE (testset)     0.6769  0.6788  0.6790  0.6830  0.6777  0.6791  0.0021  
Fit time          1.05    1.52    1.32    1.01    1.50    1.28    0.21    
Test time         0.08    0.27    0.11    0.19    0.18    0.17    0.06    


{'test_rmse': array([0.87979956, 0.88490107, 0.87721204, 0.88217657, 0.88488075]),
 'test_mae': array([0.67692143, 0.67878038, 0.67901692, 0.68298597, 0.67767743]),
 'fit_time': (1.0518600940704346,
  1.5150434970855713,
  1.3207404613494873,
  1.013169288635254,
  1.5045206546783447),
 'test_time': (0.08416318893432617,
  0.26516127586364746,
  0.1072089672088623,
  0.19318509101867676,
  0.1767585277557373)}
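
For reference, RMSE and MAE both summarize the gap between true and predicted ratings; RMSE penalizes large errors more heavily. The sketch below computes both by hand on a few made-up `(true, predicted)` pairs. It only illustrates the two formulas and is not how Surprise computes them internally.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true, predicted) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true, predicted) pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)

# Made-up ratings: errors are 0.5, 0.0, 1.0 and -1.0.
pairs = [(4.0, 3.5), (3.0, 3.0), (5.0, 4.0), (2.0, 3.0)]

print(rmse(pairs))  # 0.75
print(mae(pairs))   # 0.625
```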

**In this example we want to recommend 10 movies to user number 249.**

**Get a list of all movie IDs**

In [11]:
all_movie_ids = merged_data['movieId'].unique()

**Find all movies that the user hasn't rated yet.**

In [12]:
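user_id = 249
# Only ratings in train_data are treated as already rated here
user_ratings = train_data[train_data['userId'] == user_id]
# A set makes each membership test below constant time
user_rated_movie_ids = set(user_ratings['movieId'])
unrated_movie_ids = [movie_id for movie_id in all_movie_ids if movie_id not in user_rated_movie_ids]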

**Make predictions for the unrated movies and sort them.**

In [13]:
predictions = [algo.predict(user_id, movie_id) for movie_id in unrated_movie_ids]
sorted_predictions = sorted(predictions, key=lambda x: x.est, reverse=True)

top_n_recommendations = [pred.iid for pred in sorted_predictions[:10]]
print("Top recommended movie IDs for user", user_id, ":", top_n_recommendations)

Top recommended movie IDs for user 249 : [318, 1204, 527, 898, 1283, 50, 904, 215, 1197, 5618]
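
Sorting every prediction works, but when only the top few are needed, `heapq.nlargest` gives the same result without a full sort. This is a small sketch of that alternative; `Pred` is a hypothetical stand-in with just the `iid` and `est` fields used above, and the estimates are made up.

```python
import heapq
from collections import namedtuple

# Hypothetical stand-in for a prediction object: item ID and estimated rating.
Pred = namedtuple("Pred", ["iid", "est"])

preds = [Pred(318, 4.6), Pred(50, 4.4), Pred(1, 3.9), Pred(527, 4.5), Pred(2, 3.1)]

# Keep only the 3 highest estimates, ordered from highest to lowest.
top_3 = heapq.nlargest(3, preds, key=lambda p: p.est)
top_3_ids = [p.iid for p in top_3]
print(top_3_ids)  # [318, 527, 50]
```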


*To recommend movies to another user, change the value of user_id.*

**You can also print the movies' titles instead of their IDs, using the titles in movies.csv.**

In [14]:
movie_id_to_title = dict(zip(movies['movieId'], movies['title']))

top_n_recommendations_titles = [movie_id_to_title[movie_id] for movie_id in top_n_recommendations]

print("Top recommended movies for user", user_id, ":", top_n_recommendations_titles)

Top recommended movies for user 249 : ['Shawshank Redemption, The (1994)', 'Lawrence of Arabia (1962)', "Schindler's List (1993)", 'Philadelphia Story, The (1940)', 'High Noon (1952)', 'Usual Suspects, The (1995)', 'Rear Window (1954)', 'Before Sunrise (1995)', 'Princess Bride, The (1987)', 'Spirited Away (Sen to Chihiro no kamikakushi) (2001)']


*In this code, movie_id_to_title is a dictionary mapping movie IDs to their titles. After obtaining the top N recommended movie IDs, the corresponding titles are retrieved using this dictionary. Finally, the recommended movie titles are printed or used as desired.*
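
Indexing the dictionary directly raises `KeyError` if an ID has no title entry, which could happen if the ID list came from a different source than `movies`. A minimal sketch of a more forgiving lookup with `dict.get` follows; the ID `999999` and the fallback text are made up for illustration.

```python
# Small illustrative mapping; the real one is built from movies.csv.
movie_id_to_title = {
    318: "Shawshank Redemption, The (1994)",
    50: "Usual Suspects, The (1995)",
}

# 999999 is a hypothetical ID that has no entry in the mapping.
recommended_ids = [318, 999999, 50]

# Fall back to a placeholder instead of raising KeyError.
titles = [
    movie_id_to_title.get(movie_id, f"Unknown title (movieId={movie_id})")
    for movie_id in recommended_ids
]
print(titles)
```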