<a href="https://colab.research.google.com/github/godsesaurab/data-science-projects/blob/main/5.%20Recommendation%20System%20/Recommendation%20System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recommendation System

Recommendation systems use data about items and past user behavior to suggest items that a particular user is likely to want.

There are two primary approaches for building recommendation systems:

- **Content-Based Filtering**: This approach suggests items based on item features and user profiles. For example, if a user liked a specific movie, the system recommends movies with similar attributes, such as the same genre, director or actors.
- **Collaborative Filtering**: This technique recommends items by analyzing user behavior and preferences, on the assumption that users with similar tastes will like similar items. For example, if two users have liked similar movies in the past, the system recommends to each of them the movies that the other liked.
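
The two approaches above can be sketched on toy data. This is a minimal illustration rather than the method used later in this notebook: the genre vectors are simplified, the rating matrix is made up, and the helper names (`cosine_similarity`, `most_similar_item`, `recommend_for`) are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Content-based: simplified, hypothetical genre vectors
# in the order [Adventure, Children, Comedy, Romance]
genre_vectors = {
    "Toy Story (1995)": np.array([1, 1, 1, 0]),
    "Jumanji (1995)": np.array([1, 1, 0, 0]),
    "Grumpier Old Men (1995)": np.array([0, 0, 1, 1]),
}

def most_similar_item(title):
    """Return the other title whose genre vector is closest to `title`."""
    target = genre_vectors[title]
    others = [t for t in genre_vectors if t != title]
    return max(others, key=lambda t: cosine_similarity(target, genre_vectors[t]))

# Collaborative: made-up user-item ratings (rows: users, columns: items, 0 = unrated)
user_item = np.array([
    [5, 4, 0, 1],   # user 0
    [4, 5, 5, 1],   # user 1, tastes close to user 0
    [1, 0, 2, 5],   # user 2, tastes differ from user 0
])

def recommend_for(user):
    """Recommend the unrated item that the most similar user rated highest."""
    sims = [
        cosine_similarity(user_item[user], user_item[u]) if u != user else -1.0
        for u in range(len(user_item))
    ]
    neighbour = int(np.argmax(sims))
    unrated = np.where(user_item[user] == 0)[0]
    return int(unrated[np.argmax(user_item[neighbour, unrated])])

print(most_similar_item("Jumanji (1995)"))  # content-based suggestion
print(recommend_for(0))                     # collaborative suggestion (item index)
```

The content-based helper looks only at item attributes, while the collaborative helper ignores attributes entirely and relies on overlapping ratings.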

## 1. Importing libraries

In [1]:
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## 2. Loading Dataset

In [2]:
ratings = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/20250324125640765069/ratings.csv')
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,877,4155,5,1651201566
1,305,7661,2,1639553712
2,381,8423,2,1610704432
3,208,6433,1,1650223767
4,47,7752,4,1663998365


In [3]:
movies = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/20240903222422/movies.csv')
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


## 3. Statistical Analysis of Ratings

In [4]:
n_ratings = len(ratings)
n_movies = ratings['movieId'].nunique()
n_users = ratings['userId'].nunique()

print(f'Number of ratings : {n_ratings}')
print(f'Number of movies : {n_movies}')
print(f'Number of users : {n_users}')
# These are average rating counts, not average rating values
print(f'Average number of ratings per user : {round(n_ratings/n_users,2)}')
print(f'Average number of ratings per movie : {round(n_ratings/n_movies,2)}')

Number of ratings : 100836
Number of movies : 9742
Number of users : 999
Average number of ratings per user : 100.94
Average number of ratings per movie : 10.35
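
The counts above also show how sparse the user-movie matrix is: 100836 ratings spread over 999 × 9742 possible user-movie pairs fills only about 1% of the cells. A minimal sketch of that density calculation follows, using a made-up `toy_ratings` table (an assumption for illustration) with the same column names as `ratings`.

```python
import pandas as pd

# Toy ratings table (made-up values) with the same columns as `ratings`
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3, 3],
    "movieId": [10, 20, 10, 10, 20, 30],
    "rating":  [4, 5, 3, 2, 4, 5],
})

n_ratings = len(toy_ratings)
n_users = toy_ratings["userId"].nunique()
n_movies = toy_ratings["movieId"].nunique()

# Fraction of user-movie cells that actually hold a rating
density = n_ratings / (n_users * n_movies)
print(f"Density of the user-movie matrix: {density:.2%}")
```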


## 4. User Rating Frequency

In [5]:
# Count the rows per user; summing would add up the movieId values instead
user_freq = ratings[['userId','movieId']].groupby('userId').count().reset_index()
user_freq.columns = ['userId','n_ratings']
user_freq.head()
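
A rating frequency is the number of rows per user, so the aggregation matters. The sketch below, on a made-up `toy_ratings` table (an assumption for illustration), contrasts `count()` with `sum()` on the `movieId` column to show how different the two results are.

```python
import pandas as pd

# Toy data: user 1 rated 2 movies, user 2 rated 1, user 3 rated 3
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3, 3],
    "movieId": [10, 20, 10, 10, 20, 30],
})

# count() tallies rows per user, which is the rating frequency
n_per_user = (
    toy_ratings.groupby("userId")["movieId"]
    .count()
    .reset_index(name="n_ratings")
)

# sum() adds up the movie IDs themselves, which is not a frequency
id_sums = toy_ratings.groupby("userId")["movieId"].sum()

print(n_per_user)
print(id_sums)
```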


## 5. Movie Rating Analysis

In [15]:
# Mean rating per movie
mean_rating = ratings.groupby('movieId')[['rating']].mean()

# Movie with the lowest mean rating
print('Lowest Rated')
lowest_rated = mean_rating['rating'].idxmin()
display(movies.loc[movies['movieId'] == lowest_rated])

# Movie with the highest mean rating
print('Highest Rated')
highest_rated = mean_rating['rating'].idxmax()
display(movies.loc[movies['movieId'] == highest_rated])

# Individual ratings behind each extreme, kept for further inspection
lowest_ratings = ratings[ratings['movieId'] == lowest_rated]
highest_ratings = ratings[ratings['movieId'] == highest_rated]


Lowest Rated


Unnamed: 0,movieId,title,genres
984,1285,Heathers (1989),Comedy


Highest Rated


Unnamed: 0,movieId,title,genres
5029,7831,Another Thin Man (1939),Comedy|Crime|Drama|Mystery|Romance
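
A raw mean can be misleading when a movie has only a handful of ratings, because a single 5 or a single 1 decides the result. One common remedy, which is not part of the code above, is a Bayesian average that pulls each movie's mean toward the overall mean in proportion to how few ratings it has. The sketch below uses made-up data; the prior weight `C` and prior mean `m` are one possible choice, not the only one.

```python
import pandas as pd

# Made-up ratings: movie 2 has a single 5-star rating, movie 1 has four high ratings
toy_ratings = pd.DataFrame({
    "movieId": [1, 1, 1, 1, 2, 3, 3],
    "rating":  [4, 5, 4, 5, 5, 1, 2],
})

# Rating count and raw mean per movie
stats = toy_ratings.groupby("movieId")["rating"].agg(["count", "mean"])

C = stats["count"].mean()   # prior weight: average number of ratings per movie
m = stats["mean"].mean()    # prior mean: average of the per-movie means

# Movies with few ratings are pulled toward m; well-rated movies keep their mean
stats["bayesian_avg"] = (C * m + stats["count"] * stats["mean"]) / (C + stats["count"])

print(stats)
print("Top by raw mean:", stats["mean"].idxmax())
print("Top by Bayesian average:", stats["bayesian_avg"].idxmax())
```

On this toy data the movie with one perfect rating tops the raw mean, while the Bayesian average ranks the movie with four consistently high ratings first.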



