
**Import libraries and load dataset**

In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Load Movies dataset
movies = pd.read_csv('/content/movies.csv')

# Load Ratings dataset
ratings = pd.read_csv('/content/ratings.csv')


In [None]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [None]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,16,4.0,1217897793
1,1,24,1.5,1217895807
2,1,32,4.0,1217896246
3,1,47,4.0,1217896556
4,1,50,4.0,1217896523


**Exploratory Data Analysis:**



**i) Understanding the distribution of the available features**

In [None]:
movies.info()        # column dtypes and non-null counts

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10329 entries, 0 to 10328
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   movieId  10329 non-null  int64 
 1   title    10329 non-null  object
 2   genres   10329 non-null  object
dtypes: int64(1), object(2)
memory usage: 242.2+ KB


In [None]:
ratings.info()       # column dtypes and non-null counts

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105339 entries, 0 to 105338
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   userId     105339 non-null  int64  
 1   movieId    105339 non-null  int64  
 2   rating     105339 non-null  float64
 3   timestamp  105339 non-null  int64  
dtypes: float64(1), int64(3)
memory usage: 3.2 MB


In [None]:
movies.describe()    # summary statistics for the numeric columns of movies

Unnamed: 0,movieId
count,10329.0
mean,31924.282893
std,37734.741149
min,1.0
25%,3240.0
50%,7088.0
75%,59900.0
max,149532.0


In [None]:
ratings.describe()   # summary statistics for the numeric columns of ratings

Unnamed: 0,userId,movieId,rating,timestamp
count,105339.0,105339.0,105339.0,105339.0
mean,364.924539,13381.312477,3.51685,1130424000.0
std,197.486905,26170.456869,1.044872,180266000.0
min,1.0,1.0,0.5,828565000.0
25%,192.0,1073.0,3.0,971100800.0
50%,383.0,2497.0,3.5,1115154000.0
75%,557.0,5991.0,4.0,1275496000.0
max,668.0,149532.0,5.0,1452405000.0


In [None]:
movies.isnull().sum()      #checking for null values


movieId    0
title      0
genres     0
dtype: int64

In [None]:
ratings.isnull().sum()     #checking for null values

userId       0
movieId      0
rating       0
timestamp    0
dtype: int64

**ii) Finding unique users and movies**

In [None]:
# Unique users and movies
unique_users = ratings['userId'].nunique()
unique_movies = ratings['movieId'].nunique()
print("Unique Users:", unique_users)
print("Unique Movies:", unique_movies)


Unique Users: 668
Unique Movies: 10325
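
These two counts, together with the 105,339 rows reported by `ratings.info()`, indicate how sparse the user-movie rating matrix is. The sketch below hard-codes the counts from the outputs above and assumes each (user, movie) pair is rated at most once; the helper name `rating_density` is illustrative and not part of the notebook.

```python
def rating_density(num_ratings, num_users, num_movies):
    """Fraction of all user-movie pairs that actually have a rating."""
    return num_ratings / (num_users * num_movies)

# Counts copied from the cell outputs above
density = rating_density(num_ratings=105339, num_users=668, num_movies=10325)
print(f"Density: {density:.2%}")  # Density: 1.53%
```

Roughly 98.5% of the possible ratings are missing, which is worth keeping in mind for the collaborative module later on.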


**iii) Average rating and Total movies at genre level**

In [None]:
# Average rating and total movies per genre combination
# (each distinct pipe-separated 'genres' string is treated as one group)
ratings_with_genres = ratings.merge(movies, on='movieId')
avg_rating_genre = ratings_with_genres.groupby('genres')['rating'].mean()
total_movies_genre = ratings_with_genres.groupby('genres')['movieId'].nunique()
print("Average Rating at Genre Level:")
print(avg_rating_genre)
print("Total Movies at Genre Level:")
print(total_movies_genre)

Average Rating at Genre Level:
genres
(no genres listed)                     3.071429
Action                                 2.836406
Action|Adventure                       3.739804
Action|Adventure|Animation             4.125000
Action|Adventure|Animation|Children    3.550000
                                         ...   
Sci-Fi|Thriller|IMAX                   3.500000
Thriller                               3.473430
Thriller|War                           3.500000
War                                    3.613636
Western                                3.500000
Name: rating, Length: 938, dtype: float64
Total Movies at Genre Level:
genres
(no genres listed)                       7
Action                                  48
Action|Adventure                        32
Action|Adventure|Animation               3
Action|Adventure|Animation|Children      1
                                      ... 
Sci-Fi|Thriller|IMAX                     1
Thriller                               106
Thriller|War
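
The grouping above yields 938 groups because each distinct combination such as `Action|Adventure` counts as its own genre. If statistics per individual genre are wanted instead, one option is to split the `genres` string and explode it before grouping. The sketch below uses small made-up frames (`toy_movies`, `toy_ratings`) rather than the notebook's data, so the numbers are illustrative only.

```python
import pandas as pd

# Toy data for illustration only (not the notebook's dataset)
toy_movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'genres': ['Action|Comedy', 'Comedy', 'Action'],
})
toy_ratings = pd.DataFrame({
    'userId': [1, 1, 2, 2],
    'movieId': [1, 2, 1, 3],
    'rating': [4.0, 3.0, 5.0, 2.0],
})

# One row per (movie, single genre)
movie_genres = (
    toy_movies.assign(genre=toy_movies['genres'].str.split('|'))
    .explode('genre')
)

# Average rating and number of distinct movies per individual genre
per_genre = (
    toy_ratings.merge(movie_genres[['movieId', 'genre']], on='movieId')
    .groupby('genre')
    .agg(avg_rating=('rating', 'mean'), total_movies=('movieId', 'nunique'))
)
print(per_genre)
```

A movie with several genres contributes its ratings to each of them, so the per-genre averages are not independent of one another.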

**iv) Unique genres considered**

In [None]:
# Unique genres considered
unique_genres = movies['genres'].str.split('|', expand=True).stack().unique()
print("Unique Genres Considered:")
print(unique_genres)

Unique Genres Considered:
['Adventure' 'Animation' 'Children' 'Comedy' 'Fantasy' 'Romance' 'Drama'
 'Action' 'Crime' 'Thriller' 'Horror' 'Mystery' 'Sci-Fi' 'IMAX' 'War'
 'Musical' 'Documentary' 'Western' 'Film-Noir' '(no genres listed)']


**Designing the 3 different types of recommendation modules as mentioned in the objectives:**

**1) Popularity-based recommender system at a genre level**

In [None]:

def popularity_recommender(genre, threshold, num_recommendations):
    # Keep movies whose genre string contains the requested genre
    # (literal, case-insensitive match, so '(no genres listed)' is not parsed as a regex)
    genre_movies = movies[movies['genres'].str.contains(genre, case=False, regex=False)]

    # Keep only movies with at least `threshold` reviews
    review_counts = ratings['movieId'].value_counts()
    threshold_movies = review_counts[review_counts >= threshold].index
    filtered_movies = genre_movies[genre_movies['movieId'].isin(threshold_movies)]

    # Average rating and number of reviews per movie
    movie_popularity = ratings.groupby('movieId')['rating'].agg(['mean', 'count']).reset_index()
    movie_popularity.columns = ['movieId', 'Average Movie rating', 'Num_Reviews']

    # Join popularity data with filtered movies
    recommended_movies = pd.merge(filtered_movies, movie_popularity, on='movieId')

    # Sort movies by average rating and return the top N recommendations
    top_movies = recommended_movies.sort_values(by='Average Movie rating', ascending=False).head(num_recommendations)

    return top_movies[['movieId', 'title', 'Average Movie rating', 'Num_Reviews']]


recommended_movies = popularity_recommender('Comedy', 100, 5)
recommended_movies


Unnamed: 0,movieId,title,Average Movie rating,Num_Reviews
25,1136,Monty Python and the Holy Grail (1975),4.301948,154
19,608,Fargo (1996),4.271144,201
26,1197,"Princess Bride, The (1987)",4.163743,171
6,296,Pulp Fiction (1994),4.16,325
9,356,Forrest Gump (1994),4.138264,311


**2) Content-based recommender system**

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

def content_based_recommender(user_id, num_recommendations):
    # Movies rated by any user other than the target user
    # (note: this does not remove movies the target user has also rated)
    unseen_movies = ratings[ratings['userId'] != user_id]['movieId'].unique()

    # TF-IDF vectors of every movie's genre string
    genre_vectorizer = TfidfVectorizer()
    genre_matrix = genre_vectorizer.fit_transform(movies['genres'])

    # Genre similarity between every movie (rows) and every candidate movie (columns)
    user_genres = movies[movies['movieId'].isin(unseen_movies)]['genres']
    user_genres_vector = genre_vectorizer.transform(user_genres)
    similarity_scores = genre_matrix.dot(user_genres_vector.T).toarray()

    # For each candidate column, take the row indices of the num_recommendations
    # most similar movies; flattening keeps all of them, so the result can hold
    # more than num_recommendations rows and can repeat a movie
    top_movies_indices = similarity_scores.argsort(axis=0)[-num_recommendations:].flatten()[::-1]
    recommended_movies = movies.iloc[top_movies_indices]

    return recommended_movies[['movieId', 'title']]

recommended_movies = content_based_recommender(1, 5)
recommended_movies


Unnamed: 0,movieId,title
10328,149532,Marco Polo: One Hundred Eyes (2015)
6841,41226,Sounder (1972)
2154,2696,"Dinner Game, The (Dîner de cons, Le) (1998)"
2154,2696,"Dinner Game, The (Dîner de cons, Le) (1998)"
6894,42734,Hoodwinked! (2005)
...,...,...
3614,4617,Let It Ride (1989)
9693,103027,Much Ado About Nothing (2012)
8867,84374,No Strings Attached (2011)
8795,82169,Chronicles of Narnia: The Voyage of the Dawn T...
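
The call above asks for 5 recommendations but returns many rows, some repeated. From the code, this appears to happen because `similarity_scores` has one column per candidate movie and the flattened `argsort` keeps the top rows of every column; the user's own ratings never enter the score. A common alternative is to build a genre profile from the movies the user has rated and rank the remaining movies against it. The sketch below is one such variant, not the notebook's method: `profile_recommender`, `split_genres`, and the toy frames are hypothetical names and data. It also tokenizes on `|`, since the default `TfidfVectorizer` token pattern would split `Sci-Fi` into two tokens.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def split_genres(genre_string):
    # Treat each pipe-separated genre as one token, so 'Sci-Fi' stays whole
    return genre_string.split('|')


def profile_recommender(user_id, num_recommendations, movies_df, ratings_df):
    """Rank movies the user has not rated by genre similarity to the user's profile."""
    vectorizer = TfidfVectorizer(tokenizer=split_genres, token_pattern=None, lowercase=False)
    genre_matrix = vectorizer.fit_transform(movies_df['genres'])

    # Mean rating the user gave each movie (0 for movies the user has not rated)
    user_ratings = ratings_df[ratings_df['userId'] == user_id]
    mean_ratings = user_ratings.groupby('movieId')['rating'].mean()
    weights = movies_df['movieId'].map(mean_ratings).fillna(0.0).to_numpy()
    seen_mask = movies_df['movieId'].isin(user_ratings['movieId']).to_numpy()

    # User profile: rating-weighted sum of the genre vectors of rated movies
    profile = np.asarray(genre_matrix.T.dot(weights)).reshape(1, -1)

    # Cosine similarity between the profile and every movie
    scores = cosine_similarity(profile, genre_matrix).ravel()

    # Best scores first, skipping movies the user has already rated
    order = np.argsort(-scores, kind='stable')
    order = order[~seen_mask[order]][:num_recommendations]
    return movies_df.iloc[order][['movieId', 'title']]


# Toy data for illustration only (not the notebook's dataset)
toy_movies = pd.DataFrame({
    'movieId': [1, 2, 3, 4],
    'title': ['A', 'B', 'C', 'D'],
    'genres': ['Action|Sci-Fi', 'Comedy', 'Action', 'Comedy|Romance'],
})
toy_ratings = pd.DataFrame({
    'userId': [1, 2],
    'movieId': [1, 2],
    'rating': [5.0, 4.0],
})

result = profile_recommender(1, 1, toy_movies, toy_ratings)
print(result)
```

With the toy data, user 1 has only rated an `Action|Sci-Fi` movie, so the single recommendation is the other `Action` title.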


**3) Collaborative-based recommender system**

In [None]:


def collaborative_recommender(user_id, num_recommendations, k):
    # Filter ratings for the target user
    target_user_ratings = ratings[ratings['userId'] == user_id]

    # Get the movies rated by the target user
    target_user_movies = target_user_ratings['movieId'].unique()

    # Keep every rating (from any user) of a movie the target user has rated
    similar_users_ratings = ratings[ratings['movieId'].isin(target_user_movies)]

    # Mean rating each user gave to those shared movies
    average_ratings = similar_users_ratings.groupby('userId')['rating'].mean().reset_index()

    # Keep the k users with the highest mean rating on the shared movies
    # (a rough proxy for similarity, not a true similarity measure)
    similar_users = average_ratings.sort_values('rating', ascending=False)[:k]

    # Get the movies rated by similar users
    similar_users_movies = ratings[ratings['userId'].isin(similar_users['userId'])]['movieId'].unique()

    # Exclude movies already rated by the target user
    unseen_movies = np.setdiff1d(similar_users_movies, target_user_movies)

    # Calculate the average rating of unseen movies
    average_ratings_unseen = ratings[ratings['movieId'].isin(unseen_movies)].groupby('movieId')['rating'].mean().reset_index()

    # Sort unseen movies by average rating in descending order
    recommended_movies = average_ratings_unseen.sort_values('rating', ascending=False).head(num_recommendations)

    # Merge with movies dataframe to get movie details
    recommended_movies = pd.merge(recommended_movies, movies, on='movieId')

    return recommended_movies[['movieId', 'title']]


recommended_movies = collaborative_recommender(1, 5, 100)
recommended_movies


Unnamed: 0,movieId,title
0,5056,"Enigma of Kaspar Hauser, The (a.k.a. Mystery o..."
1,108192,Hotel Chevalier (Part 1 of 'The Darjeeling Lim...
2,96691,Resident Evil: Retribution (2012)
3,93134,"Women on the 6th Floor, The (Les Femmes du 6èm..."
4,93061,October Baby (2011)
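
The function above picks neighbours by their mean rating over the target user's movies, which tends to favour generous raters rather than users with similar taste. A common alternative is to compare whole rating vectors with cosine similarity and weight each neighbour's ratings by that similarity. The sketch below illustrates this under stated assumptions: `user_cf_recommender` and the toy frame are hypothetical, unrated cells are filled with 0, and it returns predicted scores indexed by `movieId` rather than titles.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def user_cf_recommender(user_id, num_recommendations, k, ratings_df):
    """Score unrated movies using the k users most similar to user_id."""
    # Rows are users, columns are movies; unrated cells are filled with 0
    matrix = ratings_df.pivot_table(index='userId', columns='movieId', values='rating').fillna(0)

    # Cosine similarity between the target user's row and every user's row
    sims = cosine_similarity(matrix.loc[[user_id]].to_numpy(), matrix.to_numpy()).ravel()
    sims = pd.Series(sims, index=matrix.index).drop(user_id)
    neighbours = sims.nlargest(k)

    # Similarity-weighted mean rating over the neighbours who rated each movie
    neighbour_ratings = matrix.loc[neighbours.index]
    rated = (neighbour_ratings > 0).astype(float)
    weighted_sum = neighbour_ratings.mul(neighbours, axis=0).sum(axis=0)
    weight_total = rated.mul(neighbours, axis=0).sum(axis=0)
    predicted = (weighted_sum / weight_total).dropna()

    # Keep only movies the target user has not rated
    unseen = matrix.columns[(matrix.loc[user_id] == 0).to_numpy()]
    return predicted.reindex(unseen).dropna().nlargest(num_recommendations)


# Toy data for illustration only (not the notebook's dataset)
toy_ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 2, 2, 3, 3],
    'movieId': [1, 2, 1, 2, 3, 1, 4],
    'rating':  [5.0, 4.0, 5.0, 4.0, 5.0, 1.0, 5.0],
})

result = user_cf_recommender(user_id=1, num_recommendations=1, k=1, ratings_df=toy_ratings)
print(result)
```

In the toy data, user 2 agrees with user 1 on both shared movies, so with `k=1` the only recommendation is movie 3, which user 2 rated 5.0.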


**Create a GUI using Python libraries (ipywidgets, etc.) to play around with the recommendation modules.**

In [None]:
import ipywidgets as widgets
from IPython.display import display

# Create widgets for user inputs
genre_widget = widgets.Text(description='Genre:')
# Minimum number of reviews a movie needs; this is the `threshold` argument of
# popularity_recommender, so it is a review count rather than a rating.
# IntSlider only accepts integer steps.
threshold_widget = widgets.IntSlider(description='Min Reviews:', min=0, max=300, step=10, value=100)
num_recommendations_widget = widgets.IntSlider(description='Num Recommendations:', min=1, max=10, step=1, value=5)

# Define the function to handle button click event
def recommend_movies(button):
    genre = genre_widget.value
    threshold = threshold_widget.value
    num_recommendations = num_recommendations_widget.value

    # Call the appropriate recommendation module based on user selection
    recommended_movies = popularity_recommender(genre, threshold, num_recommendations)
    # Or recommended_movies = collaborative_recommender(user_id, num_recommendations, threshold_similar_users)
    # Or recommended_movies = content_based_recommender(user_id, num_recommendations)

    # Display recommended movies
    display(recommended_movies)

# Create a button widget
recommend_button = widgets.Button(description='Recommend Movies')

# Register the click event of the button
recommend_button.on_click(recommend_movies)

# Display the widgets and button
display(genre_widget, threshold_widget, num_recommendations_widget, recommend_button)


Text(value='', description='Genre:')

IntSlider(value=100, description='Min Reviews:', max=300, step=10)

IntSlider(value=5, description='Num Recommendations:', max=10, min=1)

Button(description='Recommend Movies', style=ButtonStyle())