# Movie Recommender System

In [2]:
import warnings
# Suppress all warnings
warnings.filterwarnings("ignore")

**Data Source:** https://grouplens.org/datasets/movielens/

Using the MovieLens data set, a recommender system is created that takes a user, finds a movie they rated highly (in the data set), and recommends ten other movies for them to watch.

## Importing Libraries & Loading Datasets

In [6]:
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

ratings = pd.read_csv("ratings.csv")
print(ratings.head())

   userId  movieId  rating   timestamp
0       1        1     4.0  1225734739
1       1      110     4.0  1225865086
2       1      158     4.0  1225733503
3       1      260     4.5  1225735204
4       1      356     5.0  1225735119


In [7]:
movies = pd.read_csv("movies.csv")
print(movies.head())

   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  


## Statistical Analysis of Ratings

Calculate and print the total number of ratings, unique movies, and unique users, along with the average number of ratings per user and per movie. These figures help guide model building and support informed decisions.

In [73]:
n_ratings = len(ratings) # calculates the total number of ratings in the ratings DataFrame
n_movies = len(ratings['movieId'].unique()) # gets the unique movie IDs to calculate the total number of unique movies
n_users = len(ratings['userId'].unique()) # gets the unique user IDs to calculate the total number of unique users

print(f"Number of ratings: {n_ratings}")
print(f"Number of unique movieId's: {n_movies}")
print(f"Number of unique users: {n_users}")
print(f"Average ratings per user: {round(n_ratings/n_users, 2)}") # calculates the average number of ratings per user and rounds it to 2 decimal places
print(f"Average ratings per movie: {round(n_ratings/n_movies, 2)}") #calculates the average number of ratings per movie and rounds it to 2 decimal places.

Number of ratings: 33832162
Number of unique movieId's: 83239
Number of unique users: 330975
Average ratings per user: 102.22
Average ratings per movie: 406.45


## User Rating Frequency

Group the ratings dataset by user ID to count how many ratings each user has made, then print the first few rows of the resulting user rating frequency table.

In [78]:
user_freq = ratings[['userId', 'movieId']].groupby(
    'userId').count().reset_index()  #groups the ratings by userId and counts the number of movies each user has rated.
user_freq.columns = ['userId', 'n_ratings']
print(user_freq.head())

   userId  n_ratings
0       1         62
1       2         91
2       3         30
3       4         30
4       5         43
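
As a small, self-contained illustration of the grouping step, the sketch below counts ratings per user on a toy table. The values are made up (not MovieLens data), and `toy_ratings` and `toy_user_freq` are hypothetical names. It uses `groupby(...).size()`, which should give the same counts as calling `count()` on a selected column when there are no missing values.

```python
import pandas as pd

# Toy ratings table with made-up values (not taken from MovieLens).
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 40, 20],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

# Count how many rows (ratings) each user contributed.
toy_user_freq = (
    toy_ratings.groupby("userId")
    .size()
    .reset_index(name="n_ratings")
)
print(toy_user_freq)
```

Here user 1 has three ratings, user 2 has two, and user 3 has one.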


## Movie Rating Analysis

Analyze the average rating of each movie, identify the highest- and lowest-rated movies, and display information about those movies from the movies DataFrame.

- `ratings.groupby('movieId')[['rating']].mean()`: groups the ratings by movieId and calculates the mean rating for each movie.
- `mean_rating['rating'].idxmin()`: finds the movie ID with the lowest average rating.
- `mean_rating['rating'].idxmax()`: finds the movie ID with the highest average rating.
- `movies.loc[movies['movieId'] == lowest_rated]`: filters the movies DataFrame to display information about the movie with the lowest rating.
- `movies.loc[movies['movieId'] == highest_rated]`: filters the movies DataFrame to display information about the movie with the highest rating.

In [81]:
#groups the ratings by movieId and calculates the mean rating for each movie.
mean_rating = ratings.groupby('movieId')[['rating']].mean()

# finds the movie ID with the lowest average rating.
lowest_rated = mean_rating['rating'].idxmin()

# filters the movies DataFrame to display information about the movie with the lowest rating.
movies.loc[movies['movieId'] == lowest_rated]

# finds the movie ID with the highest average rating.
highest_rated = mean_rating['rating'].idxmax()

# filters the movies DataFrame to display information about the movie with the highest rating.
movies.loc[movies['movieId'] == highest_rated]

ratings[ratings['movieId']==highest_rated]
ratings[ratings['movieId']==lowest_rated]

movie_stats = ratings.groupby('movieId')[['rating']].agg(['count', 'mean'])
movie_stats.columns = movie_stats.columns.droplevel()
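
One possible use of keeping the rating count next to the mean is to avoid ranking a movie first on the strength of a single rating. The sketch below illustrates this on made-up data. `toy_ratings`, `toy_stats`, and the `min_count` threshold are assumptions chosen for illustration, not part of the notebook.

```python
import pandas as pd

# Toy ratings with made-up values: movie 30 has a single 5.0 rating.
toy_ratings = pd.DataFrame({
    "movieId": [10, 10, 10, 20, 20, 20, 30],
    "rating":  [4.0, 4.5, 5.0, 3.0, 3.5, 2.5, 5.0],
})

# Same aggregation as movie_stats above: rating count and mean per movie.
toy_stats = toy_ratings.groupby("movieId")[["rating"]].agg(["count", "mean"])
toy_stats.columns = toy_stats.columns.droplevel()

# Highest mean overall: won by the movie that has only one rating.
top_overall = toy_stats["mean"].idxmax()

# Highest mean among movies with at least min_count ratings
# (the threshold of 3 is an arbitrary assumption).
min_count = 3
top_filtered = toy_stats.loc[toy_stats["count"] >= min_count, "mean"].idxmax()

print(top_overall, top_filtered)
```

On this toy table the unfiltered winner is movie 30 (one rating of 5.0), while the filtered winner is movie 10 (three ratings averaging 4.5).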

## User-Item Matrix Creation

Create a sparse rating matrix using csr_matrix from scipy, with one row per movie and one column per user. The function also generates mappings between user and movie IDs and their corresponding indices in the matrix.

In [85]:
from scipy.sparse import csr_matrix

# creates a sparse matrix (Compressed Sparse Row) from the user-item ratings data to save memory.
def create_matrix(df):
    
    N = len(df['userId'].unique())
    M = len(df['movieId'].unique())

    # create dictionaries that map user IDs and movie IDs to indices in the sparse matrix.
    user_mapper = dict(zip(np.unique(df["userId"]), list(range(N))))
    movie_mapper = dict(zip(np.unique(df["movieId"]), list(range(M))))

    #create reverse dictionaries that map matrix indices back to user IDs and movie IDs.
    user_inv_mapper = dict(zip(list(range(N)), np.unique(df["userId"])))
    movie_inv_mapper = dict(zip(list(range(M)), np.unique(df["movieId"])))

    #create lists of indices for users and movies from the ratings DataFrame.
    user_index = [user_mapper[i] for i in df['userId']]
    movie_index = [movie_mapper[i] for i in df['movieId']]

    #constructs the sparse matrix X of shape (M, N): one row per movie, one column per user.
    X = csr_matrix((df["rating"], (movie_index, user_index)), shape=(M, N))
    
    return X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper

X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper = create_matrix(ratings)
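
To make the ID-to-index mapping concrete, here is a minimal sketch on a toy table with made-up ratings. The `toy_*` names are hypothetical; the layout (movies as rows, users as columns) follows the `shape=(M, N)` used in `create_matrix`. The density calculation at the end is an extra, shown only to illustrate why a sparse format is a reasonable choice.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy ratings with made-up values (3 users, 4 movies, 6 ratings).
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3],
    "movieId": [10, 20, 10, 30, 20, 40],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.5, 1.0],
})

# Map raw IDs to consecutive indices, as create_matrix does.
toy_user_mapper = {int(uid): i for i, uid in enumerate(np.unique(toy["userId"]))}
toy_movie_mapper = {int(mid): i for i, mid in enumerate(np.unique(toy["movieId"]))}

rows = toy["movieId"].map(toy_movie_mapper).to_numpy()
cols = toy["userId"].map(toy_user_mapper).to_numpy()

# Rows are movies and columns are users.
toy_X = csr_matrix(
    (toy["rating"].to_numpy(), (rows, cols)),
    shape=(len(toy_movie_mapper), len(toy_user_mapper)),
)

# Fraction of cells that actually hold a rating.
density = toy_X.nnz / (toy_X.shape[0] * toy_X.shape[1])
print(toy_X.toarray())
print(f"density: {density:.2f}")
```

Only 6 of the 12 cells are filled in this toy matrix. On real rating data the filled fraction is typically far smaller, which is where the CSR format saves memory.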

## Movie Similarity Analysis

Use the k-nearest neighbors algorithm with the cosine metric to find similar movies. The function fits a KNN model on the matrix, queries it with the row of the given movie ID, and returns a list of similar movie IDs.

In [89]:
from sklearn.neighbors import NearestNeighbors

def find_similar_movies(movie_id, X, k, metric='cosine', show_distance=False):
    neighbour_ids = []
    
    if movie_id not in movie_mapper:
        print(f"Movie ID {movie_id} not found in movie_mapper!")
        return []

    movie_ind = movie_mapper[movie_id]
    movie_vec = X[movie_ind]
    k += 1  
    kNN = NearestNeighbors(n_neighbors=k, algorithm="brute", metric=metric)
    kNN.fit(X) # fits the k-NN model to the user-item matrix X.
    movie_vec = movie_vec.reshape(1, -1)
    # finds the k-nearest neighbors for the given movie. Only indices are requested:
    # the loop below expects an index array, so show_distance is not forwarded.
    neighbour = kNN.kneighbors(movie_vec, return_distance=False)
    
    for i in range(0, k):
        n = neighbour.item(i)
        neighbour_ids.append(movie_inv_mapper[n])
    
    neighbour_ids.pop(0) # removes the movie itself from the list of similar movies.
    return neighbour_ids
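
The neighbour lookup can be tried on a tiny matrix to see how cosine distance ranks movies. This is a sketch under stated assumptions: the ratings are made up, `toy_X` and `neighbours` are hypothetical names, and row indices stand in for movie IDs (no mapper is used).

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy movie-by-user matrix with made-up ratings (4 movies, 3 users).
toy_X = csr_matrix(np.array([
    [5.0, 4.0, 0.0],   # movie 0
    [4.0, 5.0, 0.0],   # movie 1: rated by the same users as movie 0
    [0.0, 0.0, 5.0],   # movie 2: rated only by a different user
    [5.0, 0.0, 1.0],   # movie 3: shares one user with movie 0
]))

knn = NearestNeighbors(n_neighbors=3, algorithm="brute", metric="cosine")
knn.fit(toy_X)

# Query with movie 0's row; movie 0 itself comes back as a neighbour.
distances, indices = knn.kneighbors(toy_X[0], return_distance=True)

# Drop the query movie and keep the remaining neighbours in ranked order.
neighbours = [int(i) for i in indices[0] if i != 0]
print(neighbours)
```

Movie 1 ranks ahead of movie 3 because its rating vector points in nearly the same direction as movie 0's. Asking for `n_neighbors=3` and then dropping the query row mirrors the `k += 1` and `pop(0)` steps in the function above.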

## Movie Recommendation Based on User Preferences

This function recommends movies based on a user's highest-rated movie. It filters the ratings dataset to the given user's ratings and picks the movie with the highest rating (the first one if several are tied).
It then uses the find_similar_movies function to find movies similar to that highest-rated movie.
The movie titles are printed as recommendations, and any movie IDs that aren't found in the movies DataFrame are skipped.

In [62]:
def recommend_movies_for_user(user_id, X, user_mapper, movie_mapper, movie_inv_mapper, k=10):
    df1 = ratings[ratings['userId'] == user_id]

    movie_id = df1[df1['rating'] == max(df1['rating'])]['movieId'].iloc[0]
    print("Movie id:")
    print(movie_id)

    movie_titles = dict(zip(movies['movieId'], movies['title']))

    similar_ids = find_similar_movies(movie_id, X, k)

    print(f"Since you watched {get_title_by_movie_id(movie_id)}, you might also like:")

    for i in similar_ids:
        if i in movie_titles:
            print(movie_titles[i])

In [64]:
# Function to get title by movieId
def get_title_by_movie_id(movie_id):
    result = movies.loc[movies['movieId'] == movie_id, 'title']
    return result.values[0] if not result.empty else "Movie ID not found"

# Example usage
#print(get_title_by_movie_id(6))

## Recommending movies

In [94]:
# Test the model on user ID 350 to see what it recommends based on that user's ratings.
user_id = 350 
recommend_movies_for_user(user_id, X, user_mapper, movie_mapper, movie_inv_mapper, k=10)

Movie id:
92259
Since you watched Intouchables (2011), you might also like:
Inception (2010)
Interstellar (2014)
Django Unchained (2012)
The Imitation Game (2014)
Shutter Island (2010)
Wolf of Wall Street, The (2013)
Grand Budapest Hotel, The (2014)
Whiplash (2014)
King's Speech, The (2010)
Inglourious Basterds (2009)
