### Read data

**Import libraries**

In [29]:
# libraries
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from surprise import SVD
from surprise import Dataset
from surprise import Reader
from surprise.model_selection import train_test_split
from surprise import accuracy

**Read the ratings and movies dataset**

In [82]:
ratings_df = pd.read_csv('ml-100k/u.data', sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])
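# Note: column 5 of u.item is the 0/1 "unknown" genre flag, not a list of genres; the name 'genres' is kept for consistency with the output below
movies_df = pd.read_csv('ml-100k/u.item', sep='|', encoding='latin-1', usecols=[0, 1, 5], names=['movie_id', 'title', 'genres'])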

### Collaborative filtering

Collaborative filtering is a technique commonly used in recommendation systems. It involves analyzing the behavior and preferences of users to identify patterns and similarities among them. The system then uses this information to recommend items to a user based on what other users with similar preferences have liked or consumed.

Collaborative filtering can be done in two ways:

1. User-based collaborative filtering: This technique identifies a set of users with similar preferences to a given user, and then recommends items that these similar users have liked or consumed.

2. Item-based collaborative filtering: This technique identifies a set of items that are similar to items that a user has liked or consumed in the past, and then recommends these similar items to the user.
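The two variants can be sketched on a toy example. The ratings, names, and helper functions below are hypothetical and are not taken from MovieLens or from any library. The sketch assumes cosine similarity over sparse rating dictionaries and a similarity-weighted average for scoring, which is one common choice among several.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}. Not MovieLens data.
toy_ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 5, "B": 5, "D": 5},
    "carol": {"A": 1, "C": 5, "D": 2, "E": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (missing counts as 0)."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def user_based_recommend(target, ratings, n=2):
    """User-based: score unseen items by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        for item, r in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(
        ((item, scores[item] / weights[item]) for item in scores if weights[item] > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]

def item_vectors(ratings):
    """Item-based view: transpose user -> item ratings into item -> user ratings."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            items.setdefault(item, {})[user] = r
    return items

# User-based: alice resembles bob more than carol, so bob's rating of D dominates.
recs = user_based_recommend("alice", toy_ratings)

# Item-based: compare two items by the users who rated them.
by_item = item_vectors(toy_ratings)
item_sim_ab = cosine(by_item["A"], by_item["B"])
```

In this toy data, item D ranks ahead of item E for alice because the user most similar to her rated D highly, while items A and B come out as near neighbors because the same users rated both in a similar way.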

Collaborative filtering is popular because it needs no descriptive attributes of the users or the items being recommended: it relies only on interaction data such as ratings, purchases, or views. It is often used in e-commerce sites, music and video streaming services, and other applications where personalized recommendations are important.

#### User-based collaborative filtering

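The code below produces per-user recommendations with the Singular Value Decomposition (SVD) matrix factorization algorithm provided by the Surprise library. Strictly speaking, SVD is a model-based (latent factor) approach rather than a neighborhood method that looks up similar users, but it serves the same goal: predicting how a given user would rate movies they have not yet rated. The code trains an SVD model on the MovieLens ratings data and then uses it to recommend movies to a given user.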

The code uses the Surprise library to load the MovieLens dataset, split it into training and testing sets, and train an SVD algorithm on the training set. It then uses the trained model to make predictions on the testing set and evaluates the model using RMSE. Finally, it recommends 10 movies to a user based on their past ratings, using the trained model to predict their ratings for movies they have not seen.
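To make the idea behind the model more concrete, here is a minimal pure-Python sketch of biased matrix factorization trained with stochastic gradient descent, where a rating is predicted as the global mean plus a user bias, an item bias, and the dot product of two latent factor vectors. This is an illustration under stated assumptions, not Surprise's implementation: the toy triples, the `train_mf` and `rmse` helpers, and the hyperparameter values are all hypothetical.

```python
import math
import random

# Hypothetical toy data: (user, item, rating) triples on a 1-5 scale.
toy = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 5), (1, 1, 5), (1, 3, 4),
    (2, 0, 1), (2, 2, 5), (2, 3, 2),
    (3, 1, 1), (3, 2, 4), (3, 3, 2),
]
n_users, n_items, n_factors = 4, 4, 2

def rmse(data, predict):
    """Root mean squared error of predict(u, i) over (u, i, r) triples."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in data) / len(data))

def train_mf(data, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + p_u . q_i with plain SGD (illustrative values)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in data) / len(data)
    b_u = [0.0] * n_users
    b_i = [0.0] * n_items
    # Small random initial factors for users (p) and items (q).
    p = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]

    def predict(u, i):
        return mu + b_u[u] + b_i[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in data:
            err = r - predict(u, i)
            # Gradient steps with L2 regularization on biases and factors.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mu, predict

mu, predict = train_mf(toy)
baseline_rmse = rmse(toy, lambda u, i: mu)  # always predict the global mean
trained_rmse = rmse(toy, predict)           # error on the training triples
```

Note that this sketch measures error on the same triples it was trained on, which only shows that the model fits. The notebook instead reports RMSE on a held-out test set, which is the more meaningful number.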

In [83]:
def get_user_recommendations(user_id, n=10):
    # Define the Reader object (MovieLens 100K ratings are integers from 1 to 5)
    reader = Reader(rating_scale=(1, 5))

    # Load the data into the Surprise dataset object
    data = Dataset.load_from_df(ratings_df[['user_id', 'movie_id', 'rating']], reader)

    # Split the data into training and testing sets
    trainset, testset = train_test_split(data, test_size=.25)

    # Define the model (SVD algorithm)
    model = SVD()

    # Train the model on the training set
    model.fit(trainset)

    # Make predictions on the testing set
    predictions = model.test(testset)

    # Evaluate the model using RMSE
    accuracy.rmse(predictions)

    # Recommend movies for a user: keep only the movies this user has not rated.
    # The explicit .copy() avoids pandas' SettingWithCopyWarning when the
    # 'rating' column is added below.
    seen_movie_ids = ratings_df.loc[ratings_df['user_id'] == user_id, 'movie_id']
    movies_not_seen = movies_df[~movies_df['movie_id'].isin(seen_movie_ids)].copy()
    movies_not_seen['rating'] = movies_not_seen['movie_id'].apply(lambda x: model.predict(user_id, x).est)
    recommended_movies = movies_not_seen.sort_values('rating', ascending=False).head(n)

    return recommended_movies

In [84]:
get_user_recommendations(1, 10)

RMSE: 0.9449


| | movie_id | title | genres | rating |
|---|---|---|---|---|
| 407 | 408 | Close Shave, A (1995) | 0 | 4.712775 |
| 356 | 357 | One Flew Over the Cuckoo's Nest (1975) | 0 | 4.70795 |
| 301 | 302 | L.A. Confidential (1997) | 0 | 4.689471 |
| 312 | 313 | Titanic (1997) | 0 | 4.683084 |
| 483 | 484 | Maltese Falcon, The (1941) | 0 | 4.596896 |
| 317 | 318 | Schindler's List (1993) | 0 | 4.589585 |
| 473 | 474 | Dr. Strangelove or: How I Learned to Stop Worr... | 0 | 4.517994 |
| 529 | 530 | Man Who Would Be King, The (1975) | 0 | 4.488413 |
| 653 | 654 | Chinatown (1974) | 0 | 4.465328 |
| 492 | 493 | Thin Man, The (1934) | 0 | 4.459957 |


#### Item-based collaborative filtering

The code below implements a simple item-based collaborative filter: given a movie, it recommends the movies whose rating patterns across users are most similar to it.

The code computes the similarity matrix between the items (movies) in the dataset using the cosine similarity metric. This creates a square matrix where each row and column represents a movie, and the values in the matrix represent the cosine similarity between each pair of movies.
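As a small worked example of what that matrix contains, the sketch below computes cosine similarity by hand for three made-up movies. The titles and ratings are hypothetical, and the `cosine` helper is written out only for illustration; it follows the standard definition, the dot product divided by the product of the vector norms.

```python
import math

# Hypothetical user-item matrix: rows are users, columns are movies.
# 0 stands for "not rated", mirroring a zero-filled rating matrix.
titles = ["Movie A", "Movie B", "Movie C"]
user_item = [
    [5, 4, 0],
    [4, 5, 0],
    [0, 0, 3],
    [1, 0, 5],
]

def cosine(x, y):
    """cos(x, y) = x . y / (||x|| * ||y||), or 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

# Transpose so that each row holds one movie's ratings across all users.
item_vectors = [list(col) for col in zip(*user_item)]

# Square, symmetric matrix: entry [i][j] is the similarity of movie i and movie j.
item_similarity = [[cosine(a, b) for b in item_vectors] for a in item_vectors]
```

Here Movie A and Movie B come out as highly similar because the same users rated both highly, while Movie B and Movie C share no raters and get a similarity of exactly 0. Because unrated entries are stored as 0, two movies with few common raters look dissimilar even when nothing is actually known about how their audiences overlap.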

The get_movie_recommendations() function takes a movie title and an optional parameter n (default 10). It looks up the movie's column position in the user-item matrix, reads that movie's row of the item similarity matrix, sorts all movies by similarity in descending order, skips the movie itself, and returns the titles of the n most similar movies.

In [85]:
# Preprocess the data for modeling: merge the ratings and movies dataframes, then pivot into a user-item matrix
# (one row per user, one column per movie title).
ratings_movies_df = pd.merge(ratings_df, movies_df, on='movie_id')
user_item_matrix = ratings_movies_df.pivot_table(index='user_id', columns='title', values='rating')
# Missing ratings are filled with 0, so an unrated movie counts as a 0 in the similarity computation below
user_item_matrix = user_item_matrix.fillna(0)

# Compute similarity matrix between the items (movies) in the dataset. We can use the cosine similarity metric for this.
item_similarity = cosine_similarity(user_item_matrix.T)

In [86]:
# We can now make recommendations based on the similarity matrix. For a given movie, we can find the most similar movies and recommend them.

def get_movie_recommendations(movie_title, n=10):
    # Get the index of the movie
    movie_idx = user_item_matrix.columns.get_loc(movie_title)

    # Get the similarity scores between the movie and all other movies
    sim_scores = list(enumerate(item_similarity[movie_idx]))

    # Sort the movies based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

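    # Drop the movie itself by index (with ties at similarity 1.0 it is not
    # guaranteed to sort first), then keep the top n most similar movies
    sim_scores = [s for s in sim_scores if s[0] != movie_idx][:n]
    movie_indices = [i[0] for i in sim_scores]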

    # Return the titles of the recommended movies
    return user_item_matrix.columns[movie_indices]

In [87]:
get_movie_recommendations('Toy Story (1995)', n=5)

Index(['Star Wars (1977)', 'Return of the Jedi (1983)',
       'Independence Day (ID4) (1996)', 'Rock, The (1996)',
       'Mission: Impossible (1996)'],
      dtype='object', name='title')