# Hybrid Film Recommendation System

This project builds a **hybrid film recommendation system** that combines several approaches to suggesting movies. It uses *content-based filtering*, *collaborative filtering*, and *popularity-based recommendations* to generate personalized and diverse suggestions. Integrating these methods should make the system more robust and accurate than any single method, because it reflects each user's preferences while still surfacing popular and trending movies. The final recommendation for each user is an aggregation of the individual methods' outputs.


In [1]:
import pandas as pd
import numpy as np
import os
from IPython.display import display, HTML

display(HTML("<style>.container { width:90% !important; }</style>"))

In [2]:
movie_data = pd.read_csv('data/film_data/prepared_film_data.csv')

A **popularity-based recommendation system** gives the overall recommendation engine broadly appealing suggestions, which matters most for users with limited interaction data. It complements the personalized methods (content-based and collaborative filtering), so the combined recommendations balance familiar favorites with undiscovered gems.

In [57]:
# Simple popularity score based on average rating, vote count, and release year (favoring more recent films).

def calculate_popularity_score(df, alpha=0.2, beta=1, gamma=0.002):
    # alpha, beta, and gamma are the weights for vote count, average rating, and release year
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    # Scale votes and rating by their maximum; min-max scale the release year to [0, 1]
    df['norm_numVotes'] = df['numVotes'] / df['numVotes'].max()
    df['norm_averageRating'] = df['averageRating'] / df['averageRating'].max()
    df['norm_startYear'] = (df['startYear'] - df['startYear'].min()) / (df['startYear'].max() - df['startYear'].min())
    df['popularity_score'] = alpha * df['norm_numVotes'] + beta * df['norm_averageRating'] + gamma * df['norm_startYear']
    # Drop the intermediate columns so only popularity_score is added
    df = df.drop(columns=['norm_numVotes', 'norm_averageRating', 'norm_startYear'])

    return df
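
To get a feel for how the three weights interact, the same formula can be run on a tiny made-up table. The sketch below is only illustrative: the helper name `popularity_score` and the three toy films are assumptions, not part of the dataset, and the weights are the defaults used above.

```python
import pandas as pd

def popularity_score(df, alpha=0.2, beta=1.0, gamma=0.002):
    """Hypothetical restatement of the scoring formula above; returns a new DataFrame."""
    votes = df["numVotes"] / df["numVotes"].max()
    rating = df["averageRating"] / df["averageRating"].max()
    year_span = df["startYear"].max() - df["startYear"].min()
    year = (df["startYear"] - df["startYear"].min()) / year_span
    return df.assign(popularity_score=alpha * votes + beta * rating + gamma * year)

# Toy data (invented for illustration only)
toy = pd.DataFrame({
    "primaryTitle": ["Old Classic", "Recent Hit", "Obscure Gem"],
    "numVotes": [1_000_000, 500_000, 1_000],
    "averageRating": [9.0, 8.0, 9.0],
    "startYear": [1970, 2020, 2000],
})

scored = popularity_score(toy).sort_values("popularity_score", ascending=False)
print(scored[["primaryTitle", "popularity_score"]])
```

With these defaults the scores come out at about 1.2000, 1.0014, and 0.9909, so "Obscure Gem" (1,000 votes) ranks above "Recent Hit" (500,000 votes): the rating term dominates, the vote count mostly acts as a tie-breaker among highly rated films, and the release year has almost no effect. Raising `alpha` or filtering on a minimum vote count would be one way to change that, if desired.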

In [58]:
movie_data_with_popularity = calculate_popularity_score(movie_data)
movie_data_with_popularity = movie_data_with_popularity.sort_values('popularity_score', ascending=False)

In [59]:
movie_data_with_popularity.head(10)

Unnamed: 0,tconst,primaryTitle,isAdult,startYear,runtimeMinutes,averageRating,numVotes,primaryName,Action,Adult,...,Romance,Sci-Fi,Short,Sport,Talk-Show,Thriller,War,Western,\N,popularity_score
22163,tt0111161,The Shawshank Redemption,0,1994.0,142.0,9.3,2733330.0,Frank Darabont,0,0,...,0,0,0,0,0,0,0,0,0,1.13155
183357,tt0468569,The Dark Knight,0,2008.0,152.0,9.0,2706445.0,Christopher Nolan,1,0,...,0,0,0,0,0,0,0,0,0,1.0998
152940,tt0068646,The Godfather,0,1972.0,175.0,9.2,1900604.0,Francis Ford Coppola,0,0,...,0,0,0,0,0,0,0,0,0,1.060278
206172,tt1375666,Inception,0,2010.0,148.0,8.8,2402346.0,Christopher Nolan,1,0,...,0,1,0,0,0,0,0,0,0,1.05758
153386,tt0110912,Pulp Fiction,0,1994.0,154.0,8.9,2100399.0,Quentin Tarantino,0,0,...,0,0,0,0,0,0,0,0,0,1.045238
23700,tt0137523,Fight Club,0,1999.0,139.0,8.8,2176102.0,David Fincher,0,0,...,0,0,0,0,0,0,0,0,0,1.040855
180300,tt0167260,The Lord of the Rings: The Return of the King,0,2003.0,201.0,9.0,1880210.0,Peter Jackson,1,0,...,0,0,0,0,0,0,0,0,0,1.039266
120287,tt0109830,Forrest Gump,0,1994.0,142.0,8.8,2126909.0,Robert Zemeckis,0,0,...,1,0,0,0,0,0,0,0,0,1.037178
180224,tt0120737,The Lord of the Rings: The Fellowship of the Ring,0,2001.0,178.0,8.8,1909123.0,Peter Jackson,1,0,...,0,0,0,0,0,0,0,0,0,1.021351
270191,tt0133093,The Matrix,0,1999.0,136.0,8.7,1949993.0,Lana Wachowski,1,0,...,0,1,0,0,0,0,0,0,0,1.01431
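
One way such a ranking could serve users with limited interaction data is as a fallback list that skips films the user has already seen. The following is a minimal sketch under that assumption; `top_popular`, `seen_ids`, and `min_votes` are hypothetical names, and the four rows are copied from the output above only to keep the example self-contained.

```python
import pandas as pd

def top_popular(scored_df, n=10, seen_ids=(), min_votes=0):
    """Hypothetical helper: the n highest-scoring films not in seen_ids."""
    candidates = scored_df[~scored_df["tconst"].isin(list(seen_ids))]
    candidates = candidates[candidates["numVotes"] >= min_votes]
    return candidates.nlargest(n, "popularity_score")

# A few rows taken from the top-10 output above
sample = pd.DataFrame({
    "tconst": ["tt0111161", "tt0468569", "tt0068646", "tt1375666"],
    "primaryTitle": ["The Shawshank Redemption", "The Dark Knight",
                     "The Godfather", "Inception"],
    "numVotes": [2733330.0, 2706445.0, 1900604.0, 2402346.0],
    "popularity_score": [1.13155, 1.0998, 1.060278, 1.05758],
})

# Assume the user has already watched The Shawshank Redemption
recs = top_popular(sample, n=2, seen_ids={"tt0111161"})
print(recs[["primaryTitle", "popularity_score"]])
```

In the full notebook the same call would presumably take `movie_data_with_popularity` as input; how its result is blended with the content-based and collaborative scores is left open here.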
