# Movie Recommendation System Using Collaborative Filtering

This notebook demonstrates the development of a recommendation system using collaborative filtering with Singular Value Decomposition (SVD). The model is trained on the MovieLens dataset to predict user ratings and generate personalized movie recommendations. Key steps include data preprocessing, model training, evaluation, and generating top recommendations for individual users.

At the end of the notebook, I've added a summary of the project, insights on model performance, and potential future improvements.


In [None]:
import pandas as pd

# Load the ratings data
ratings = pd.read_csv('u.data', sep='\t', names=['user_id', 'item_id', 'rating', 'timestamp'])

# Load the movies data
movies = pd.read_csv('u.item', sep='|', encoding='latin-1', names=['item_id', 'title'], usecols=[0, 1])

# Merge the two datasets on the item_id column
data = pd.merge(ratings, movies, on='item_id')

# Display the first few rows to inspect the data structure
data.head()

In [None]:
# Check for missing values
print(data.isnull().sum())

In [None]:
from sklearn.model_selection import train_test_split

# Split the data into training and test sets
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

# Display the number of records in each set
print(f"Training set size: {len(train_data)}")
print(f"Test set size: {len(test_data)}")

In [None]:
from surprise import Dataset, Reader

# Define a Reader object with the rating scale used in the MovieLens dataset
reader = Reader(rating_scale=(1, 5))

# Load the training ratings into a Surprise Dataset
# (note: this rebinds `data`, which previously held the merged DataFrame)
data = Dataset.load_from_df(train_data[['user_id', 'item_id', 'rating']], reader)

In [None]:
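from surprise import SVD
from surprise.model_selection import train_test_split as surprise_train_test_split

# Split the training ratings again inside Surprise: `trainset` fits the model and
# `testset` serves as the held-out split for evaluation. The alias avoids shadowing
# sklearn's train_test_split; random_state makes the split reproducible.
trainset, testset = surprise_train_test_split(data, test_size=0.2, random_state=42)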

# Initialize the SVD algorithm
svd = SVD()

In [None]:
# Train the SVD model
svd.fit(trainset)

In [None]:
from surprise import accuracy

# Make predictions on the test set
predictions = svd.test(testset)

# Calculate RMSE (verbose=False avoids printing the value twice)
rmse = accuracy.rmse(predictions, verbose=False)
print(f"Test RMSE: {rmse}")

In [None]:
def get_top_n_recommendations(user_id, n=5):
    # Get all unique item IDs seen in the training data
    all_items = data.df['item_id'].unique()

    # Get items the user has already rated (a set gives fast membership checks)
    rated_items = set(train_data[train_data['user_id'] == user_id]['item_id'])

    # Filter for items the user hasn't rated yet
    unrated_items = [item for item in all_items if item not in rated_items]
    
    # Predict ratings for unrated items
    predictions = [svd.predict(user_id, item) for item in unrated_items]
    
    # Sort predictions by estimated rating in descending order
    predictions.sort(key=lambda x: x.est, reverse=True)
    
    # Get top N recommendations
    top_n = predictions[:n]
    return [(pred.iid, pred.est) for pred in top_n]

In [None]:
# Example usage: Get top 5 recommendations for a user
user_id = 1  # Replace with any user ID to test it
recommendations = get_top_n_recommendations(user_id, n=5)

# Display recommendations
# print("Top Recommendations:", recommendations)

# Map item IDs to movie titles
recommended_titles = [(movies.loc[movies['item_id'] == item_id, 'title'].values[0], rating) for item_id, rating in recommendations]
print("Top Recommended Movies:", recommended_titles)

In [None]:
def precision_recall_at_k(predictions, k=5, threshold=4.0):
    # Map each user ID to a list of (estimated rating, true rating) pairs
    user_est_true = {}
    for pred in predictions:
        if pred.uid not in user_est_true:
            user_est_true[pred.uid] = []
        user_est_true[pred.uid].append((pred.est, pred.r_ui))
    
    # Calculate precision and recall for each user
    precisions = []
    recalls = []
    for uid, user_ratings in user_est_true.items():
        # Sort by predicted rating in descending order and take the top k
        user_ratings.sort(key=lambda x: x[0], reverse=True)
        top_k_ratings = user_ratings[:k]
        
        # Count relevant items in the top-k (true positives)
        relevant_items_top_k = sum((true_r >= threshold) for (_, true_r) in top_k_ratings)
        recommended_items = len(top_k_ratings)
        
        # Count all relevant items for the user (not just in the top-k)
        relevant_items_total = sum((true_r >= threshold) for (_, true_r) in user_ratings)
        
        # Precision and recall; undefined cases (no recommendations or no
        # relevant items) default to 1, which can inflate the averages
        precision = relevant_items_top_k / recommended_items if recommended_items > 0 else 1
        recall = relevant_items_top_k / relevant_items_total if relevant_items_total > 0 else 1
        
        precisions.append(precision)
        recalls.append(recall)
    
    # Average precision and recall
    avg_precision = sum(precisions) / len(precisions)
    avg_recall = sum(recalls) / len(recalls)
    return avg_precision, avg_recall

In [None]:
# Calculate precision and recall at k=5
avg_precision, avg_recall = precision_recall_at_k(predictions, k=5, threshold=4.0)
print(f"Precision at K: {avg_precision}")
print(f"Recall at K: {avg_recall}")

In [None]:
# Function to display recommendations with titles
def display_recommendations(user_id, n=5):
    recommendations = get_top_n_recommendations(user_id, n)
    recommended_titles = [(movies.loc[movies['item_id'] == item_id, 'title'].values[0], rating) for item_id, rating in recommendations]
    print(f"Top {n} Recommendations for User {user_id}:")
    for title, rating in recommended_titles:
        print(f"{title}: Predicted Rating {rating:.2f}")
    print("\n")

# Display recommendations for a few sample users
sample_users = [1, 50, 150]  # Replace with any user IDs you like
for user_id in sample_users:
    display_recommendations(user_id, n=5)


In [None]:
from ipywidgets import widgets
from IPython.display import display

In [None]:
def show_recommendations(user_id):
    try:
        user_id = int(user_id)  # Ensure the user ID is an integer

        # Check if user ID exists in the training data
        if user_id not in train_data['user_id'].values:
            print("User ID not found in the dataset. Please enter a valid user ID.")
            return

        # Generate recommendations if user ID is valid
        recommendations = get_top_n_recommendations(user_id, n=5)
        
        # Map item IDs to titles
        recommended_titles = [(movies.loc[movies['item_id'] == item_id, 'title'].values[0], rating) for item_id, rating in recommendations]
        
        # Display recommendations
        print(f"\nTop 5 Recommendations for User {user_id}:")
        for title, rating in recommended_titles:
            print(f"{title}: Predicted Rating {rating:.2f}")

    except ValueError:
        print("Please enter a valid numeric user ID.")

In [None]:
# A text input and button widget
user_id_input = widgets.Text(
    value='',
    placeholder='Enter user ID',
    description='User ID:',
    disabled=False
)
button = widgets.Button(description="Get Recommendations")

# What happens when the button is clicked
def on_button_click(b):
    show_recommendations(user_id_input.value)

# Link the button to the function
button.on_click(on_button_click)

# Display the widgets
display(user_id_input, button)

### Project Overview
In this project, I created a recommendation system using collaborative filtering with Singular Value Decomposition (SVD). The system is designed to recommend items (movies, in this case) to users based on their past ratings. Using user-item interactions, the model predicts ratings for items a user hasn't rated, and suggests those with the highest predicted ratings.
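
For reference, the `SVD` algorithm in Surprise is a biased matrix-factorization model rather than a literal singular value decomposition. Its prediction rule can be written as follows, where $\mu$ is the global mean rating, $b_u$ and $b_i$ are user and item biases, and $p_u$ and $q_i$ are learned latent factor vectors (the symbols follow the library documentation, not this notebook):

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u
```

The parameters are fit by stochastic gradient descent on the regularized squared error over the known ratings, and by default predictions are clipped to the rating scale given to the `Reader`.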

### Steps Taken
1. **Data Loading**: Loaded the MovieLens dataset, containing user-item ratings and movie information.
2. **Data Preprocessing**: Merged datasets, checked for missing values, and split the data into training and test sets.
3. **Model Selection**: Used the SVD algorithm from the Surprise library for collaborative filtering.
4. **Evaluation**: Assessed model accuracy using RMSE, Precision, and Recall.
5. **Recommendations**: Generated top N recommendations for individual users based on predicted ratings.
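
To make the evaluation step concrete, here is a minimal sketch that computes RMSE and precision/recall at k by hand for a single made-up user. The rating pairs are invented for illustration, and unlike the notebook's function, undefined cases return 0.0 here, which is an assumption of this sketch.

```python
import math

# Toy (estimated, true) rating pairs for a single hypothetical user
pairs = [(4.8, 5.0), (4.5, 3.0), (4.1, 4.0), (3.9, 4.0), (3.2, 2.0), (2.5, 5.0)]

def rmse(pairs):
    """Root mean squared error between estimated and true ratings."""
    return math.sqrt(sum((est - true) ** 2 for est, true in pairs) / len(pairs))

def precision_recall_at_k(pairs, k=5, threshold=4.0):
    """Precision and recall at k for one user; true ratings >= threshold count as relevant."""
    # Rank items by estimated rating, highest first, and keep the top k
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    top_k = ranked[:k]
    # Relevant items inside the top k, and relevant items overall
    hits = sum(true >= threshold for _, true in top_k)
    total_relevant = sum(true >= threshold for _, true in ranked)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall

print(f"RMSE: {rmse(pairs):.3f}")
print(precision_recall_at_k(pairs, k=3))
```

With `k=3`, two of the three top-ranked items are relevant, so precision is 2/3. The user has four relevant items in total, so recall is 0.5.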

In [None]:
from IPython.display import display, Markdown

# Dynamic Markdown text for model performance section
model_performance_text = f"""
### Model Performance
- **Root Mean Squared Error (RMSE)**: The model achieved an RMSE of approximately **{rmse:.3f}** on the held-out test split, i.e. a typical prediction error of about {rmse:.2f} rating points on the 1 to 5 scale.
- **Precision at K**: The precision at the top 5 recommended items was approximately **{avg_precision:.3f}**, suggesting that, on average, around {avg_precision * 100:.1f}% of each user's top 5 predicted items were relevant (true rating of 4.0 or higher).
- **Recall at K**: The recall at the top 5 recommended items was approximately **{avg_recall:.3f}**, meaning that, on average, the top 5 captured about {avg_recall * 100:.1f}% of each user's relevant items in the test split.
"""

# Display the Markdown
display(Markdown(model_performance_text))

### Future Improvements
- **Experiment with Hybrid Models**: Combining collaborative and content-based filtering could provide more personalized recommendations.
- **Incorporate Additional Features**: Including genre or timestamp data could help improve predictions for new users.
- **Parameter Tuning**: Fine-tuning the SVD model’s parameters might yield better accuracy.
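
As one way to approach the parameter tuning mentioned above, Surprise provides a `GridSearchCV` class in `surprise.model_selection`. The sketch below illustrates the underlying idea in plain Python so it runs without the library: it enumerates a small grid over real `SVD` hyperparameter names (`n_factors`, `lr_all`, `reg_all`) and keeps the combination with the lowest score. The `mock_rmse` function is a made-up stand-in for an actual cross-validated RMSE, and the grid values are arbitrary assumptions.

```python
from itertools import product

# Candidate values are illustrative assumptions, not tuned recommendations
param_grid = {
    "n_factors": [50, 100],
    "lr_all": [0.002, 0.005],
    "reg_all": [0.02, 0.1],
}

def mock_rmse(params):
    """Hypothetical stand-in for cross-validated RMSE; replace with a real evaluation."""
    return (0.95
            + abs(params["n_factors"] - 100) / 1000
            + abs(params["lr_all"] - 0.005) * 10
            + abs(params["reg_all"] - 0.02))

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        # Lower is better, since the score represents an error
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = grid_search(param_grid, mock_rmse)
print(best_params, round(best_score, 3))
```

With Surprise installed, the same search would typically be done by passing `SVD`, a parameter grid, and `measures=['rmse']` to `GridSearchCV`, calling `fit` on the dataset, and reading `best_params['rmse']` afterwards.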