# Personalized Recommendation System

This tutorial walks step by step through building a movie recommendation system with Python, the MovieLens dataset, and the Surprise library.

Surprise (installed as `scikit-surprise`, imported as `surprise`) is a Python library for building and evaluating recommendation systems. It provides a compact interface for collaborative filtering models.

**Step 1: Dataset Preparation**

First, download the MovieLens dataset (ml-latest-small) from the [MovieLens website](https://grouplens.org/datasets/movielens/latest/) and extract it. The archive includes `ratings.csv`, `movies.csv`, `tags.csv`, and `links.csv`; this tutorial reads only `ratings.csv` and `movies.csv`.
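
After extracting the archive, it can help to confirm that the two files this tutorial reads are present and have the columns the later code expects. The sketch below uses only the standard library. The helper name `missing_movielens_columns` and its `data_dir` argument are illustrative, and the column names are the ones used in this tutorial's code.

```python
import csv
from pathlib import Path

# Columns that this tutorial's code reads from each file
REQUIRED_COLUMNS = {
    "ratings.csv": {"userId", "movieId", "rating"},
    "movies.csv": {"movieId", "title"},
}


def missing_movielens_columns(data_dir="."):
    """Return {filename: missing columns}; a missing file reports all of its columns."""
    problems = {}
    for filename, required in REQUIRED_COLUMNS.items():
        path = Path(data_dir) / filename
        if not path.is_file():
            problems[filename] = set(required)
            continue
        # Only the header row is needed to check column names
        with path.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        missing = required - set(header)
        if missing:
            problems[filename] = missing
    return problems


if __name__ == "__main__":
    # Assumes the CSV files sit in the current working directory
    print(missing_movielens_columns(".") or "All required files and columns found")
```

An empty result means both files were found with the expected headers.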

**Step 2: Installing Required Libraries**

Before building the recommendation system, install the Surprise library. The code also uses pandas, so install that too if it isn't already available. Open your terminal or command prompt and run:

```bash
pip install scikit-surprise
```

**Step 3: Importing Libraries and Loading Data**

In your Python script or Jupyter Notebook, import the necessary libraries and load the MovieLens dataset:


In [19]:
import pandas as pd
from surprise import Dataset, Reader, KNNBasic
from surprise.model_selection import train_test_split

# Load the data
ratings = pd.read_csv('ratings.csv')
movies = pd.read_csv('movies.csv')


**Step 4: Data Preprocessing**

Next, wrap the ratings in a Surprise `Dataset` and split it into training and test sets.


In [20]:
# Create a Surprise Dataset
reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.25)


**Step 5: Building and Training the Model**

This example uses user-based collaborative filtering through Surprise's `KNNBasic` algorithm. Surprise provides other algorithms that you can swap in.

**User-based Collaborative Filtering**

User-based collaborative filtering rests on the idea that users who have rated items similarly in the past are likely to have similar preferences. Here's how it works:

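 - Similarity Calculation: For each pair of users, a similarity score is computed from the ratings they gave to the items both have rated. This tutorial uses cosine similarity.

 - Neighborhood Selection: For a given user, a set of similar users (the neighborhood) is chosen based on the similarity scores. These are the users whose ratings are most similar to the target user's. When predicting a rating for a particular item, only neighbors who have rated that item can contribute.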

 - Rating Prediction: The ratings of items that the target user has not yet rated are predicted based on the ratings of the similar users. This is done by taking a weighted average of the ratings, where the weights are the similarity scores.
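
The three steps above can be sketched in plain Python on a toy dataset. This is a minimal illustration, not Surprise's implementation: the users, items, and ratings in `toy_ratings` are made up, and the helper names `cosine_similarity` and `predict_rating` are hypothetical. Similarity is computed over the items both users rated, and the prediction is a similarity-weighted average over the `k` most similar users who rated the item.

```python
from math import sqrt

# Made-up ratings for illustration: user -> {item: rating}
toy_ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine_similarity(u, v, data):
    """Cosine similarity between two users over the items both have rated."""
    common = data[u].keys() & data[v].keys()
    if not common:
        return 0.0
    dot = sum(data[u][i] * data[v][i] for i in common)
    norm_u = sqrt(sum(data[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(data[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(user, item, data, k=2):
    """Similarity-weighted average of the k most similar users who rated `item`."""
    # Step 1 and 2: score every other user who rated the item, keep the top k
    candidates = [
        (cosine_similarity(user, other, data), data[other][item])
        for other in data
        if other != user and item in data[other]
    ]
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    total_similarity = sum(sim for sim, _ in neighbors)
    if total_similarity == 0:
        return None  # no usable neighbors, so no prediction
    # Step 3: weight each neighbor's rating by its similarity to the target user
    return sum(sim * rating for sim, rating in neighbors) / total_similarity


print(predict_rating("alice", "D", toy_ratings))
```

In this toy data, Bob's ratings track Alice's more closely than Carol's do, so Bob's rating of item D carries more weight in Alice's prediction.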

In [21]:
# Build and train the model (Using User-based Collaborative Filtering)
sim_options = {'name': 'cosine', 'user_based': True}
model = KNNBasic(sim_options=sim_options)
model.fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x1a232976e60>
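
The test set created in Step 4 gives a way to check how well the fitted model predicts held-out ratings. In Surprise, `model.test(testset)` returns a list of predictions, and `surprise.accuracy.rmse` reports their root-mean-square error. The pure-Python sketch below shows what that metric computes. The `rmse` helper is illustrative, and the `(true, estimated)` pairs are made-up numbers, not results from this model.

```python
from math import sqrt


def rmse(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one prediction")
    return sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


# Made-up (true, estimated) ratings for illustration only
example_pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5)]
print(round(rmse(example_pairs), 3))  # 0.707
```

With the notebook's objects, the equivalent pairs would be `[(p.r_ui, p.est) for p in model.test(testset)]`. A lower value means the predictions are closer to the held-out ratings.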

**Step 6: Making Recommendations**

Now, let's make recommendations for a specific user. You can pick any user ID that appears in `ratings.csv`:


In [22]:
user_id = 1  # Example user ID

# Get the IDs of the movies that the user has already rated
user_movie_ids = ratings[ratings['userId'] == user_id]['movieId']

# Look up those movies in the catalog so each ID stays paired with its own title
user_movies = movies[movies['movieId'].isin(user_movie_ids)]

print(f"Movies that user {user_id} has seen:")
for movie_id, movie_title in zip(user_movies['movieId'], user_movies['title']):
    print(f"Movie ID: {movie_id} - Title: {movie_title}")


Movies that user 1 has seen:
Movie ID: 1 - Title: Toy Story (1995)
Movie ID: 3 - Title: Grumpier Old Men (1995)
Movie ID: 6 - Title: Heat (1995)
Movie ID: 47 - Title: Seven (a.k.a. Se7en) (1995)
Movie ID: 50 - Title: Usual Suspects, The (1995)
Movie ID: 70 - Title: From Dusk Till Dawn (1996)
Movie ID: 101 - Title: Bottle Rocket (1996)
Movie ID: 110 - Title: Braveheart (1995)
Movie ID: 151 - Title: Rob Roy (1995)
Movie ID: 157 - Title: Canadian Bacon (1995)
Movie ID: 163 - Title: Desperado (1995)
Movie ID: 216 - Title: Billy Madison (1995)
Movie ID: 223 - Title: Clerks (1994)
Movie ID: 231 - Title: Dumb & Dumber (Dumb and Dumber) (1994)
Movie ID: 235 - Title: Ed Wood (1994)
Movie ID: 260 - Title: Star Wars: Episode IV - A New Hope (1977)
Movie ID: 296 - Title: Pulp Fiction (1994)
Movie ID: 316 - Title: Stargate (1994)
Movie ID: 333 - Title: Tommy Boy (1995)
Movie ID: 349 - Title: Clear and Present Danger (1994)
Movie ID: 356 - Title: Forrest Gump (1994)
Movie ID: 362 - Title: Jungle Boo

In [23]:
# Get movies that the user hasn't rated (treated here as unseen)
user_unseen = movies[~movies['movieId'].isin(user_movie_ids)]
user_unseen_movie_ids = user_unseen['movieId']

print(f"Movies that user {user_id} has not seen:")
for movie_id, movie_title in zip(user_unseen['movieId'], user_unseen['title']):
    print(f"Movie ID: {movie_id} - Title: {movie_title}")


Movies that user 1 has not seen:
Movie ID: 2 - Title: Jumanji (1995)
Movie ID: 4 - Title: Waiting to Exhale (1995)
Movie ID: 5 - Title: Father of the Bride Part II (1995)
Movie ID: 7 - Title: Sabrina (1995)
Movie ID: 8 - Title: Tom and Huck (1995)
Movie ID: 9 - Title: Sudden Death (1995)
Movie ID: 10 - Title: GoldenEye (1995)
Movie ID: 11 - Title: American President, The (1995)
Movie ID: 12 - Title: Dracula: Dead and Loving It (1995)
Movie ID: 13 - Title: Balto (1995)
Movie ID: 14 - Title: Nixon (1995)
Movie ID: 15 - Title: Cutthroat Island (1995)
Movie ID: 16 - Title: Casino (1995)
Movie ID: 17 - Title: Sense and Sensibility (1995)
Movie ID: 18 - Title: Four Rooms (1995)
Movie ID: 19 - Title: Ace Ventura: When Nature Calls (1995)
Movie ID: 20 - Title: Money Train (1995)
Movie ID: 21 - Title: Get Shorty (1995)
Movie ID: 22 - Title: Copycat (1995)
Movie ID: 23 - Title: Assassins (1995)
Movie ID: 24 - Title: Powder (1995)
Movie ID: 25 - Title: Leaving Las Vegas (1995)
Movie ID: 26 - Titl

In [25]:
# Predict ratings for unseen movies
predictions = [model.predict(user_id, movie_id) for movie_id in user_unseen_movie_ids]

# Sort predictions by estimated rating in descending order
predictions.sort(key=lambda x: x.est, reverse=True)

# Get the top N recommended movie IDs
top_n = 10
top_movie_ids = [prediction.iid for prediction in predictions[:top_n]]

# Get the titles of the recommended movies
# Note: isin() returns rows in catalog order, not in order of predicted rating
recommended_movies = movies[movies['movieId'].isin(top_movie_ids)]['title']

print(f'Top {top_n} recommended movies for user {user_id}:')
print(recommended_movies)


Top 10 recommended movies for user 1:
36            Cry, the Beloved Country (1995)
48                            Lamerica (1994)
87       Heidi Fleiss: Hollywood Madam (1995)
121          Awfully Big Adventure, An (1995)
405                    Live Nude Girls (1995)
433               What Happened Was... (1994)
441                            Orlando (1992)
536                    Denise Calls Up (1995)
557    World of Apu, The (Apur Sansar) (1959)
717                          Ninotchka (1939)
Name: title, dtype: object
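
The listing above shows titles in catalog order, because `isin()` does not preserve the ranking. One way to keep the best-first order, and to skip items the model could not score, is sketched below. It is a hedged illustration rather than part of the original notebook: `ranked_titles` is a hypothetical helper, the `Prediction` namedtuple is a stand-in with the same field names as Surprise's prediction objects so the sketch runs without the library, and the toy titles and estimates are made up. Surprise marks unscorable predictions with `details['was_impossible']`.

```python
from collections import namedtuple

# Stand-in with the same field names as Surprise's prediction objects
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def ranked_titles(predictions, title_by_id, n=10):
    """Return [(title, estimate)] for the n highest estimates, best first."""
    # Drop predictions the model flagged as impossible (default estimates)
    usable = [p for p in predictions if not p.details.get("was_impossible", False)]
    usable.sort(key=lambda p: p.est, reverse=True)
    return [(title_by_id.get(p.iid, f"<unknown id {p.iid}>"), p.est) for p in usable[:n]]


# Made-up predictions and titles for illustration only
toy_predictions = [
    Prediction(1, 10, None, 3.2, {"was_impossible": False}),
    Prediction(1, 20, None, 4.7, {"was_impossible": False}),
    Prediction(1, 30, None, 3.5, {"was_impossible": True}),
    Prediction(1, 40, None, 4.1, {"was_impossible": False}),
]
toy_titles = {10: "Movie A", 20: "Movie B", 30: "Movie C", 40: "Movie D"}

for title, est in ranked_titles(toy_predictions, toy_titles, n=2):
    print(f"{est:.1f}  {title}")
```

With the notebook's objects, the lookup table could be built as `dict(zip(movies['movieId'], movies['title']))` and passed together with `predictions`.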


That's it! You've built a simple movie recommendation system using the MovieLens dataset and the Surprise library. You can further enhance this system by trying different algorithms, handling the cold-start problem for new users and items, and integrating it into a web application or service.