# Movie Recommendation System.


Making a recommendationn nsystem based on collabrative filtering.

Dataset used : "ml-latest-small.zip" from MovieLens https://grouplens.org/datasets/movielens/

Since the system collabrative fiiltering based we are using "Suprise" library which has collaborative filtering algorithms.

Python Surprise library is a powerful tool designed to simplify the process of building and evaluating recommendation systems. It focuses on collaborative filtering, a popular technique in recommendation systems that predicts a user's interests by collecting preferences from many users (collaborators). Surprise provides a convenient and user-friendly API for implementing and testing various collaborative filtering algorithms.

Key features of the Surprise library include:

Algorithm Implementations: Surprise offers a collection of built-in collaborative filtering algorithms, including matrix factorization-based methods like Singular Value Decomposition (SVD), neighborhood-based methods like k-Nearest Neighbors (k-NN), and more.

Data Handling: It handles data loading, preprocessing, and splitting into training and testing sets, making it easier to work with recommendation datasets.

Prediction and Evaluation: Surprise provides methods to predict user preferences for items and evaluate the performance of algorithms using metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and others.

Cross-Validation: The library supports various techniques for cross-validation, helping you assess the generalization performance of your recommendation models.

Hyperparameter Tuning: Surprise allows you to search for optimal hyperparameters of algorithms using techniques like GridSearchCV and RandomizedSearchCV.

Extensibility: You can implement custom algorithms and evaluation metrics using the Surprise framework.

In [2]:
import pandas as pd
from surprise import Reader, Dataset, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

In [54]:
# Load ratings data
ratings_data = pd.read_csv('./Dataset/ml-latest-small/ratings.csv')

# Create a Reader object to define the rating scale
reader = Reader(rating_scale=(1, 5))

# Load the data into a Surprise Dataset object
data = Dataset.load_from_df(ratings_data[['userId', 'movieId', 'rating']], reader)

trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

# Create and train the model (SVD in this case)
model = SVD()
model.fit(trainset)

# Make predictions on the test set
predictions = model.test(testset)

# Calculate RMSE (Root Mean Squared Error)
rmse = accuracy.rmse(predictions)
print('RMSE:', rmse)

# Load movies data
movies_data = pd.read_csv('./Dataset/ml-latest-small/movies.csv')

# Get recommendations for a specific user
user_id = 6
user_items = ratings_data[ratings_data['userId'] == user_id]['movieId']
rated_movies = ratings_data[ratings_data['movieId'].isin(user_items)]['movieId']
unrated_movies = ratings_data[~ratings_data['movieId'].isin(user_items)]['movieId']

# Create a list of tuples for unrated movies and their predicted ratings
user_predictions = [(movie_id, model.predict(user_id, movie_id).est) for movie_id in unrated_movies]

# Sort predictions in descending order
user_predictions.sort(key=lambda x: x[1], reverse=True)

# Display the top N recommendations
top_n = 5
for movie_id, predicted_rating in user_predictions[:top_n]:
    movie_title = movies_data[movies_data['movieId'] == movie_id]['title'].iloc[0]
    print(f"Movie: {movie_title}, Predicted Rating: {predicted_rating:.2f}")



RMSE: 0.8801
RMSE: 0.8800949040930861
Movie: Rosemary's Baby (1968), Predicted Rating: 4.65
Movie: Rosemary's Baby (1968), Predicted Rating: 4.65
Movie: Rosemary's Baby (1968), Predicted Rating: 4.65
Movie: Rosemary's Baby (1968), Predicted Rating: 4.65
Movie: Rosemary's Baby (1968), Predicted Rating: 4.65


In [81]:
# Load the CSV file into a DataFrame
df = ratings_data
print("Struct of rating.csv")
print(df.head())

# Specify the userId you're interested in
target_user_id = 140

# Filter data for the specific userId
print(f"The user data of user_id: {target_user_id}")
user_data = df[df['userId'] == target_user_id]
print(user_data)

# Extract movieId values for the specific user
print(f"Extracting movieId values for the user_id: {target_user_id}")
movie_ids = user_data['movieId'].tolist()
movie_ids

# Get all unique movieId values
print(f"Getting all movieIds from movies.csv file:")
movies_df = pd.read_csv('./Dataset/ml-latest-small/movies.csv')
all_movie_ids = movies_df['movieId'].unique()
#movie_ids

Struct of rating.csv
   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931
The user data of user_id: 140
       userId  movieId  rating   timestamp
21083     140        1     3.0   942924980
21084     140        2     3.5  1085569813
21085     140        6     5.0   942843185
21086     140       11     4.0   949667337
21087     140       21     4.0   949666898
...       ...      ...     ...         ...
21686     140    49524     3.5  1180627544
21687     140    49530     4.0  1172255978
21688     140    49910     2.5  1180627448
21689     140    50068     3.5  1172255946
21690     140    53972     3.0  1186077792

[608 rows x 4 columns]
Extracting movieId values for the user_id: 140


[1,
 2,
 6,
 11,
 21,
 22,
 23,
 34,
 47,
 50,
 62,
 86,
 95,
 104,
 105,
 110,
 146,
 150,
 151,
 161,
 163,
 164,
 165,
 185,
 193,
 208,
 210,
 246,
 257,
 260,
 261,
 266,
 277,
 288,
 292,
 293,
 296,
 303,
 317,
 318,
 338,
 339,
 349,
 350,
 353,
 356,
 362,
 364,
 368,
 373,
 376,
 377,
 380,
 383,
 422,
 434,
 440,
 454,
 457,
 458,
 474,
 480,
 500,
 524,
 527,
 529,
 539,
 541,
 553,
 569,
 587,
 588,
 590,
 593,
 595,
 597,
 599,
 608,
 609,
 647,
 648,
 691,
 707,
 733,
 736,
 780,
 798,
 800,
 805,
 832,
 837,
 848,
 852,
 858,
 866,
 912,
 914,
 919,
 920,
 924,
 953,
 969,
 986,
 1005,
 1010,
 1013,
 1015,
 1017,
 1019,
 1021,
 1028,
 1035,
 1036,
 1047,
 1073,
 1083,
 1084,
 1089,
 1090,
 1092,
 1094,
 1095,
 1097,
 1101,
 1127,
 1179,
 1183,
 1193,
 1196,
 1198,
 1200,
 1201,
 1204,
 1210,
 1213,
 1214,
 1221,
 1230,
 1231,
 1233,
 1234,
 1240,
 1242,
 1246,
 1250,
 1252,
 1262,
 1265,
 1266,
 1270,
 1272,
 1276,
 1291,
 1302,
 1304,
 1307,
 1343,
 1344,
 1370,
 1387,

In [71]:
unrated_movie_ids = [movie_id for movie_id in all_movie_ids if movie_id not in movie_ids]
unrated_movie_ids

[3,
 4,
 5,
 7,
 8,
 9,
 10,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 36,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 48,
 49,
 52,
 53,
 54,
 55,
 57,
 58,
 60,
 61,
 63,
 64,
 65,
 66,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 85,
 87,
 88,
 89,
 92,
 93,
 94,
 96,
 97,
 99,
 100,
 101,
 102,
 103,
 106,
 107,
 108,
 111,
 112,
 113,
 116,
 117,
 118,
 119,
 121,
 122,
 123,
 125,
 126,
 128,
 129,
 132,
 135,
 137,
 140,
 141,
 144,
 145,
 147,
 148,
 149,
 152,
 153,
 154,
 155,
 156,
 157,
 158,
 159,
 160,
 162,
 166,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 183,
 184,
 186,
 187,
 188,
 189,
 190,
 191,
 194,
 195,
 196,
 198,
 199,
 201,
 202,
 203,
 204,
 205,
 206,
 207,
 209,
 211,
 212,
 213,
 214,
 215,
 216,
 217,
 218,
 219,
 220,
 222,
 223,
 224,
 225,
 227,
 228,
 229,
 230,
 231,
 232,
 233,
 234,
 235,
 236,
 237,
 238,
 239,
 240,
 