# Recommender System

### Load the data

In [22]:
import pandas as pd
import numpy as np
# Load the data (point DATA_DIR at the folder where ml-latest-small was extracted)
DATA_DIR = '/Users/nickblackford/Downloads/ml-latest-small'
links_df = pd.read_csv(f'{DATA_DIR}/links.csv')
movies_df = pd.read_csv(f'{DATA_DIR}/movies.csv')
ratings_df = pd.read_csv(f'{DATA_DIR}/ratings.csv')
tags_df = pd.read_csv(f'{DATA_DIR}/tags.csv')



### Data Preprocessing

In [27]:
# Merge movies and ratings data
data = pd.merge(ratings_df, movies_df, on='movieId')

# Create a utility matrix: one row per user, one column per movie title,
# NaN wherever a user has not rated a movie
utility_matrix = data.pivot_table(index='userId', columns='title', values='rating')
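
To make the shape of the utility matrix concrete, here is a minimal sketch on a hypothetical five-row ratings table (the `toy` frame and its titles `A`, `B`, `C` are invented for illustration and are not part of MovieLens). It applies the same `pivot_table` call and measures how many user/movie cells are empty.

```python
import pandas as pd

# Hypothetical toy ratings, shaped like the merged `data` frame.
toy = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["A", "B", "A", "C", "B"],
    "rating": [4.0, 3.0, 5.0, 2.0, 1.0],
})

# Same pivot as in the notebook: users as rows, titles as columns.
toy_matrix = toy.pivot_table(index="userId", columns="title", values="rating")

# Fraction of user/movie cells that hold no rating.
sparsity = toy_matrix.isna().to_numpy().mean()

print(toy_matrix)
print(f"sparsity: {sparsity:.2f}")
```

Because `pivot_table` aggregates with the mean by default, two different movies that happened to share the same title string would be averaged into a single column; pivoting on `movieId` instead would avoid that, at the cost of less readable column labels.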

### Pearson Correlation

In [28]:
# Calculate the Pearson correlation between every pair of movie columns
# (each pair is computed only over the users who rated both movies)
movie_similarity = utility_matrix.corr(method='pearson')
movie_similarity.head()

| title | '71 (2014) | 'Hellboy': The Seeds of Creation (2004) | 'Round Midnight (1986) | 'Salem's Lot (2004) | 'Til There Was You (1997) | 'Tis the Season for Love (2015) | 'burbs, The (1989) | 'night Mother (1986) | (500) Days of Summer (2009) | *batteries not included (1987) | ... | Zulu (2013) | [REC] (2007) | [REC]² (2009) | [REC]³ 3 Génesis (2012) | anohana: The Flower We Saw That Day - The Movie (2013) | eXistenZ (1999) | xXx (2002) | xXx: State of the Union (2005) | ¡Three Amigos! (1986) | À nous la liberté (Freedom for Us) (1931) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| '71 (2014) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 'Hellboy': The Seeds of Creation (2004) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 'Round Midnight (1986) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 'Salem's Lot (2004) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 'Til There Was You (1997) | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
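
The preview is almost entirely empty because a correlation is only defined when at least two users rated both movies and neither set of ratings is constant. The sketch below works through the Pearson formula by hand on hypothetical ratings (the `movie_a`, `movie_b`, and `movie_c` dictionaries and the helper `pearson_on_overlap` are invented for illustration, not taken from the notebook), restricted to the users two movies have in common.

```python
from math import sqrt

def pearson_on_overlap(ratings_a, ratings_b):
    """Pearson correlation over users present in both dicts (userId -> rating).

    Returns (correlation or None, number of common raters).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return None, n  # not enough shared raters
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None, n  # constant ratings: correlation is undefined
    return cov / sqrt(var_x * var_y), n

# Hypothetical ratings keyed by userId.
movie_a = {1: 4.0, 2: 5.0, 3: 3.0, 4: 4.5}
movie_b = {1: 3.5, 2: 4.5, 3: 2.0, 4: 5.0}
movie_c = {2: 1.0, 3: 0.5}  # shares only two raters with movie_a

r_ab, n_ab = pearson_on_overlap(movie_a, movie_b)
r_ac, n_ac = pearson_on_overlap(movie_a, movie_c)

print(f"A vs B: r={r_ab:.3f} over {n_ab} common raters")
print(f"A vs C: r={r_ac:.3f} over {n_ac} common raters")
```

With only two common raters, any pair of non-constant ratings lies exactly on a line, so the correlation is always +1 or -1 regardless of how the movies actually relate. This is worth keeping in mind when reading the highest values in `movie_similarity`.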


### Generate Recommendations

In [25]:
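# Return the titles of the movies most correlated with `movie_title` (10 by default).
# Note: iloc[1:] skips the first entry on the assumption that it is the movie itself
# (self-correlation of 1.0); ties at 1.0 can break that assumption.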
def recommend_movies(movie_title, num_recommendations=10):
    similar_scores = movie_similarity[movie_title].dropna().sort_values(ascending=False)
    recommendations = similar_scores.iloc[1:num_recommendations + 1].index
    return recommendations

# Example
recommendations = recommend_movies("Toy Story (1995)")
recommendations


Index(['Claim, The (2000)', 'Stalker (1979)',
       'Halloween III: Season of the Witch (1982)',
       'Eddie Murphy Delirious (1983)', 'Guy Thing, A (2003)',
       'Brigadoon (1954)',
       'Hearts of Darkness: A Filmmakers Apocalypse (1991)',
       'Hall Pass (2011)', 'Perfect Candidate, A (1996)',
       'Perfect Blue (1997)'],
      dtype='object', name='title')
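
The titles returned for Toy Story look unrelated, which is what one would expect if the top scores come from movie pairs with very few shared raters. One possible refinement, sketched here on hypothetical toy data, is to require a minimum number of common raters and to remove the query movie by name. The function name `recommend_with_min_overlap`, the `min_common` parameter and its default of 50, and the `toy_utility` frame are all assumptions made for illustration; a suitable threshold would need to be tuned on the real data.

```python
import numpy as np
import pandas as pd

def recommend_with_min_overlap(utility, movie_title, num_recommendations=10, min_common=50):
    """Rank titles by Pearson correlation with `movie_title`.

    Pairs with fewer than `min_common` shared raters are skipped.
    """
    target = utility[movie_title]
    # Series.corr returns NaN when fewer than `min_periods` users rated both movies.
    scores = utility.apply(lambda col: col.corr(target, min_periods=min_common))
    # Remove the query movie by name instead of assuming it sorts first.
    scores = scores.drop(labels=movie_title).dropna()
    return scores.sort_values(ascending=False).head(num_recommendations)

nan = np.nan
# Hypothetical utility matrix: six users, four made-up titles.
toy_utility = pd.DataFrame(
    {
        "Target":   [5.0, 4.0, 3.0, 2.0, 1.0, 4.0],
        "Similar":  [4.5, 4.0, 3.5, 2.0, 1.5, nan],
        "Opposite": [1.0, 2.0, 3.0, 4.0, 5.0, 2.0],
        "Sparse":   [nan, nan, nan, 2.0, 1.0, nan],  # only two raters in common
    },
    index=pd.Index(range(1, 7), name="userId"),
)

loose = recommend_with_min_overlap(toy_utility, "Target", min_common=2)
strict = recommend_with_min_overlap(toy_utility, "Target", min_common=4)

print(loose)
print(strict)
```

In the toy data, `Sparse` ranks first under the loose threshold purely because its two shared ratings line up, and it disappears once four common raters are required. Computing the correlation against a single column also avoids building the full movie-by-movie matrix when only one query is needed.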

### Summary

**Data Loading and Preprocessing:**

We loaded the MovieLens dataset, merged ratings with movie metadata, and created a utility matrix where rows represent users and columns represent movie titles.

**Similarity Calculation:**

We calculated the Pearson correlation between every pair of movies to create a movie-by-movie similarity matrix.
Each entry measures the linear relationship between the two movies' ratings across the users who rated both.

**Generating Recommendations:**

We implemented a function to recommend movies based on the highest Pearson correlation values.
Example recommendations were generated for a specific movie to demonstrate the system.

**Interpretation and Analysis:**

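Pearson correlation values indicate the strength and direction of the linear relationship between the ratings of two movies, measured over the users who rated both.
High correlation values suggest that users rate the two movies similarly, and they form the basis for recommendations.
Because no minimum number of common raters is enforced, a pair rated by only two or three of the same users can reach a correlation of 1.0, which may explain why the Toy Story example returns largely unrelated titles.
The model’s effectiveness can be evaluated using coverage, diversity, novelty, accuracy, and user feedback.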
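
As one illustration of the evaluation criteria listed above, catalog coverage can be estimated as the share of titles that appear in at least one recommendation list. The sketch below is a minimal version under that definition; the helper `catalog_coverage` and the toy `catalog` and `top_n` values are hypothetical and not part of the notebook.

```python
def catalog_coverage(recommendations_by_item, catalog):
    """Share of catalog titles that appear in at least one recommendation list."""
    recommended = set()
    for recs in recommendations_by_item.values():
        recommended.update(recs)
    return len(recommended & set(catalog)) / len(catalog)

# Hypothetical catalog of five titles and top-2 lists for three of them.
catalog = ["A", "B", "C", "D", "E"]
top_n = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
}

coverage = catalog_coverage(top_n, catalog)
print(f"catalog coverage: {coverage:.0%}")
```

For this notebook, the lists could presumably be built with something like `{t: list(recommend_movies(t)) for t in utility_matrix.columns}`, though that loop may be slow on the full dataset.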

**Sources** 
- https://analyticsindiamag.com/ai-mysteries/how-to-build-your-first-recommender-system-using-python-movielens-dataset/
- https://www.geeksforgeeks.org/recommendation-system-in-python/#