
# Recommendation System

Recommendation systems use data-driven methodologies to provide users with tailored suggestions.

There are two primary approaches for building recommendation systems:

- **Content-Based Filtering**: This approach suggests items based on the features of the items and user profiles. For example, if a user liked a specific movie, the system would recommend movies with similar attributes, such as the same genre, director, or actors.
- **Collaborative Filtering**: This technique recommends items by analyzing user behavior and preferences, on the assumption that users with similar tastes will like similar items. For example, if two users have liked similar movies in the past, the system recommends movies that one of them liked to the other.
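
As a rough illustration of the content-based idea, the sketch below ranks movies by how much their genre sets overlap (Jaccard similarity). The helper names `jaccard` and `most_similar_by_genre` and the four-movie catalogue are hypothetical and only serve this example; the rest of this notebook follows the collaborative approach.

```python
# Toy catalogue: title -> set of genres (illustrative data only)
catalogue = {
    "Toy Story (1995)": {"Adventure", "Animation", "Children", "Comedy", "Fantasy"},
    "Jumanji (1995)": {"Adventure", "Children", "Fantasy"},
    "Grumpier Old Men (1995)": {"Comedy", "Romance"},
    "Waiting to Exhale (1995)": {"Comedy", "Drama", "Romance"},
}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_by_genre(title, catalogue, top_n=2):
    """Rank the other titles by genre overlap with `title` (hypothetical helper)."""
    target = catalogue[title]
    scores = [
        (other, jaccard(target, genres))
        for other, genres in catalogue.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


similar = most_similar_by_genre("Jumanji (1995)", catalogue)
print(similar)
```

Here Toy Story shares all three of Jumanji's genres, so it ranks first; no rating data is involved at all, which is the key difference from collaborative filtering.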

## 1. Importing libraries

In [2]:
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## 2. Loading Dataset

In [3]:
ratings = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/20250324125640765069/ratings.csv')
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,877,4155,5,1651201566
1,305,7661,2,1639553712
2,381,8423,2,1610704432
3,208,6433,1,1650223767
4,47,7752,4,1663998365


In [4]:
movies = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/20240903222422/movies.csv')
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


## 3. Statistical Analysis of Ratings

Next, we calculate and print the total number of ratings, the number of unique movies and users, and the average number of ratings per user and per movie. These summary statistics help inform later modeling decisions.

In [5]:
n_ratings = len(ratings)
n_movies = len(ratings['movieId'].unique())
n_users = len(ratings['userId'].unique())

print(f'Number of ratings : {n_ratings}')
print(f'Number of movies : {n_movies}')
print(f'Number of users : {n_users}')
print(f'Average number of ratings per user : {round(n_ratings/n_users,2)}')
print(f'Average number of ratings per movie : {round(n_ratings/n_movies,2)}')

Number of ratings : 100836
Number of movies : 9742
Number of users : 999
Average number of ratings per user : 100.94
Average number of ratings per movie : 10.35
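
One quantity that follows directly from these counts is the density of the rating data, i.e. the fraction of all user-movie pairs that actually have a rating. The sketch below is only an arithmetic check; `rating_density` is a hypothetical helper, and the numbers are copied from the printed statistics.

```python
def rating_density(n_ratings, n_users, n_movies):
    """Fraction of user-movie pairs that have a rating (hypothetical helper)."""
    return n_ratings / (n_users * n_movies)


# Counts copied from the statistics printed by the cell above
density = rating_density(n_ratings=100836, n_users=999, n_movies=9742)
print(f"Density: {density:.4f}")
```

A density of roughly 1% means almost all user-movie pairs are empty, which is one reason to store the ratings in a sparse matrix.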


## 4. User Rating Frequency

Next, we group the ratings dataset by user ID, count how many ratings each user has made, and display the first few rows of this user rating frequency data.

In [6]:
# count() tallies the rows per user; sum() would add up the movieId values instead
user_freq = ratings[['userId','movieId']].groupby('userId').count().reset_index()
user_freq.columns = ['userId','n_ratings']
user_freq.head()
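
The difference between counting rows and summing a column in a grouped DataFrame is easy to check on a toy frame. This is a minimal sketch with made-up data; `toy_ratings` is not part of the notebook's dataset.

```python
import pandas as pd

# Toy ratings: user 1 rated two movies, user 2 rated one (illustrative data)
toy_ratings = pd.DataFrame({"userId": [1, 1, 2], "movieId": [10, 20, 30]})

# Number of ratings per user
counts = toy_ratings.groupby("userId")["movieId"].count()
# Sum of the movie IDs per user, which is not a rating frequency
sums = toy_ratings.groupby("userId")["movieId"].sum()

print(counts.to_dict())
print(sums.to_dict())
```

Values in the hundreds of thousands for a per-user "number of ratings" are a sign that IDs were summed rather than rows counted.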


## 5. Movie Rating Analysis

This block computes the average rating of each movie, identifies the lowest- and highest-rated movies, and displays their entries from the movies DataFrame.

In [7]:
# Mean rating per movie, indexed by movieId
mean_rating = ratings.groupby('movieId')[['rating']].mean()

# idxmin()/idxmax() return the movieId with the lowest/highest mean rating
print('Lowest Rated')
lowest_rated = mean_rating['rating'].idxmin()
display(movies.loc[movies['movieId'] == lowest_rated])
print('Highest Rated')
highest_rated = mean_rating['rating'].idxmax()
display(movies.loc[movies['movieId'] == highest_rated])


Lowest Rated


Unnamed: 0,movieId,title,genres
984,1285,Heathers (1989),Comedy


Highest Rated


Unnamed: 0,movieId,title,genres
5029,7831,Another Thin Man (1939),Comedy|Crime|Drama|Mystery|Romance





In [8]:
movie_stats = ratings.groupby('movieId')[['rating']].agg(['mean','count'])
movie_stats.columns = movie_stats.columns.droplevel()
movie_stats

movieId,mean,count
1,2.230769,13
2,3.000000,6
3,2.571429,7
4,3.916667,12
5,2.909091,11
...,...,...
9738,3.428571,7
9739,3.428571,7
9740,2.571429,7
9741,2.666667,9
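
A table with both `mean` and `count` makes it possible to ignore movies whose average rests on very few ratings. The sketch below shows the idea on made-up data; the threshold `min_count` and the toy frame are assumptions for illustration, not values used in this notebook.

```python
import pandas as pd

# Toy ratings: movie 2 has a perfect mean but only a single rating
toy_ratings = pd.DataFrame({
    "movieId": [1, 1, 1, 2, 3, 3],
    "rating":  [4, 5, 3, 5, 2, 4],
})
toy_stats = toy_ratings.groupby("movieId")["rating"].agg(["mean", "count"])

# Highest mean over all movies, regardless of how many ratings they have
best_unfiltered = toy_stats["mean"].idxmax()

# Highest mean among movies with at least `min_count` ratings
min_count = 2  # hypothetical threshold
best_filtered = toy_stats[toy_stats["count"] >= min_count]["mean"].idxmax()

print(best_unfiltered, best_filtered)
```

Without the filter, the single 5-star rating of movie 2 wins; with it, movie 1 (mean 4.0 over three ratings) comes out on top.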


## 6. User-Item matrix creation

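This block builds a sparse rating matrix using csr_matrix from scipy. Rows correspond to movies and columns to users (shape M x N), so each row is one movie's rating vector. It also generates mappings between user and movie IDs and their matrix indices, in both directions. Note that csr_matrix sums duplicate (row, column) entries, so a user who rated the same movie more than once ends up with a single summed value; this would explain why the matrix printed below stores fewer elements than there are ratings and contains a value of 6.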

In [9]:
from scipy.sparse import csr_matrix


def create_matrix(df):
  # Sorted unique IDs define the column (user) and row (movie) order
  user_ids = np.unique(df['userId'])
  movie_ids = np.unique(df['movieId'])
  N = len(user_ids)
  M = len(movie_ids)

  # Raw ID -> matrix index
  user_mapper = dict(zip(user_ids, range(N)))
  movie_mapper = dict(zip(movie_ids, range(M)))

  # Matrix index -> raw ID
  user_inv_mapper = dict(zip(range(N), user_ids))
  movie_inv_mapper = dict(zip(range(M), movie_ids))

  user_index = [user_mapper[i] for i in df['userId']]
  movie_index = [movie_mapper[i] for i in df['movieId']]

  # Rows are movies, columns are users
  X = csr_matrix((df['rating'], (movie_index, user_index)), shape=(M, N))

  return X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper


X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper = create_matrix(ratings)

In [10]:
print(X)

<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 100333 stored elements and shape (9742, 999)>
  Coords	Values
  (0, 129)	1
  (0, 164)	1
  (0, 202)	4
  (0, 288)	5
  (0, 322)	3
  (0, 356)	1
  (0, 546)	6
  (0, 558)	2
  (0, 584)	1
  (0, 703)	1
  (0, 938)	3
  (0, 942)	1
  (1, 393)	1
  (1, 550)	1
  (1, 573)	4
  (1, 651)	5
  (1, 841)	5
  (1, 853)	2
  (2, 156)	5
  (2, 234)	2
  (2, 242)	1
  (2, 345)	4
  (2, 603)	2
  (2, 682)	3
  (2, 951)	1
  :	:
  (9739, 283)	2
  (9739, 380)	3
  (9739, 710)	2
  (9739, 832)	4
  (9739, 877)	5
  (9740, 9)	5
  (9740, 49)	1
  (9740, 236)	1
  (9740, 482)	1
  (9740, 510)	3
  (9740, 524)	4
  (9740, 634)	3
  (9740, 746)	1
  (9740, 862)	5
  (9741, 8)	3
  (9741, 137)	1
  (9741, 235)	5
  (9741, 281)	2
  (9741, 295)	1
  (9741, 329)	1
  (9741, 453)	3
  (9741, 584)	2
  (9741, 782)	3
  (9741, 893)	3
  (9741, 947)	1
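
To see how the ID-to-index mappings and the sparse constructor behave, here is a small sketch on made-up ratings. All names prefixed with `toy_` are hypothetical and chosen so they do not overwrite the notebook's own variables.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy ratings: user 9 rated movie 100 twice (illustrative data)
toy_df = pd.DataFrame({
    "userId":  [7, 7, 9, 9],
    "movieId": [100, 200, 100, 100],
    "rating":  [4, 2, 3, 3],
})

toy_user_ids = np.unique(toy_df["userId"])
toy_movie_ids = np.unique(toy_df["movieId"])
toy_user_mapper = {uid: i for i, uid in enumerate(toy_user_ids)}
toy_movie_mapper = {mid: i for i, mid in enumerate(toy_movie_ids)}

rows = [toy_movie_mapper[m] for m in toy_df["movieId"]]  # movies as rows
cols = [toy_user_mapper[u] for u in toy_df["userId"]]    # users as columns
toy_X = csr_matrix(
    (toy_df["rating"].to_numpy(), (rows, cols)),
    shape=(len(toy_movie_ids), len(toy_user_ids)),
)

print(toy_X.toarray())
print(toy_X.nnz)
```

The two ratings of 3 that user 9 gave to movie 100 are merged into a single entry of 6, so the matrix stores three elements for four input ratings. If duplicate ratings are not intended, dropping them before building the matrix (for example with `DataFrame.drop_duplicates`) avoids values outside the rating scale.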


## 7. Movie Similarity Analysis

Here we use the k-nearest neighbors (kNN) algorithm with the cosine metric to find similar movies. The function looks up the rating vector of the given movie ID, finds its k nearest neighbors among all movie vectors, and returns a list of the corresponding movie IDs.

In [15]:
from sklearn.neighbors import NearestNeighbors

def find_similar_movies(movie_id, X, k, metric='cosine', show_distance=False):
  if movie_id not in movie_mapper:
    print(f"Movie ID {movie_id} not found in movie_mapper")
    return []

  movie_ind = movie_mapper[movie_id]
  movie_vec = X[movie_ind].reshape(1, -1)

  # Ask for k + 1 neighbours because the query movie is its own nearest neighbour
  kNN = NearestNeighbors(n_neighbors=k + 1, algorithm='brute', metric=metric)
  kNN.fit(X)
  distances, indices = kNN.kneighbors(movie_vec, return_distance=True)

  neighbour_ids = []
  for dist, ind in zip(distances[0], indices[0]):
    if ind == movie_ind:
      continue  # skip the query movie itself
    n_id = movie_inv_mapper[int(ind)]
    # With show_distance=True, return (movieId, distance) pairs instead of bare IDs
    neighbour_ids.append((n_id, float(dist)) if show_distance else n_id)

  return neighbour_ids[:k]
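
To make the neighbour search concrete, here is a minimal sketch on a hypothetical three-movie matrix. `toy_X` is made up for illustration; the `NearestNeighbors` settings mirror the ones used in the function above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Rows are movies, columns are users (toy ratings, 0 = not rated)
toy_X = csr_matrix(np.array([
    [5, 4, 0, 0],   # movie A
    [4, 5, 0, 0],   # movie B: rated by the same users as A
    [0, 0, 3, 5],   # movie C: rated by different users
]))

knn = NearestNeighbors(n_neighbors=2, algorithm="brute", metric="cosine")
knn.fit(toy_X)

# Query with movie A's row; the first hit is A itself, the second its neighbour
distances, indices = knn.kneighbors(toy_X[0])

print(indices[0])
print(distances[0].round(3))
```

Movie B comes back as the nearest neighbour of A with a cosine distance of 1 - 40/41, about 0.024, while movie C shares no raters with A and sits at distance 1. This is why the function requests `k + 1` neighbours and then discards the query movie.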

## 8. Movie recommendation with respect to User Preference

- This function recommends movies based on a user’s highest-rated movie. It filters the ratings dataset to find the movie with the highest rating for the given user.
- It then uses the find_similar_movies function to find movies similar to the highest-rated movie.
- The movie titles are printed as recommendations and any movies that aren't found in the dataset are skipped.

In [22]:
def recommend_movie_for_user(user_id, X, user_mapper, movie_mapper, movie_inv_mapper, k=10):
  df1 = ratings[ratings['userId'] == user_id]
  if df1.empty:
    print(f'User ID {user_id} has no ratings')
    return

  # First movie that received this user's highest rating
  movie_id = df1[df1['rating'] == df1['rating'].max()]['movieId'].iloc[0]
  movie_titles = dict(zip(movies['movieId'], movies['title']))
  similar_ids = find_similar_movies(movie_id, X, k)
  print(f'since you watched {movie_titles.get(movie_id, movie_id)}, you might also like:')

  # Skip neighbours whose IDs have no title in the movies DataFrame
  for i in similar_ids:
    if i in movie_titles:
      print(movie_titles[i])

## 9. Recommendation

In [23]:
user_id = 150
recommend_movie_for_user(user_id, X, user_mapper, movie_mapper, movie_inv_mapper, k=10)

since you watched Miller's Crossing (1990), you might also like:
Flawless (1999)
Lilya 4-Ever (Lilja 4-ever) (2002)
Bells of St. Mary's, The (1945)
Dark City (1998)
Cradle 2 the Grave (2003)
Japanese Story (2003)
