Movie Recommender System

A Movie Recommender Systems Based on Tf-idf and Popularity.

Overview

We developed this content-based movie recommender based on two attributes, overview and popularity. Firstly, we calculate similarities between any two movies by their overview tf-idf vectors. Script rec.py stops here. This is a basic recommender only evaluated by overview. Factors, like rating, actors, publish year and popularity, etc. can also contribute to the recmmended result. Script rec_pop.py add one more feature, popularity, as input to adjust final results. The weighted similarity score consist of 80% tf-idf vector and 20% popularity.

Scripts in detail

rec.html
The script is a basic user interface for input.

<h1>Please Enter Movie Title to Get Recommendation</h1>
<form class="search" action="rec.php" method="get" style="margin:auto;max-width:300px">
  <input type="text" placeholder="Search Your Favourite Movie" name="user_query">
  <button type="submit" name="search"><i class="fa fa-search"></i></button>
</form>

rec.php
The php script gets user input, forwards the user input variable to the python recommender script and calls the recommender script to run. Once the the recommender script finishing running, it output the final result to user by browser.

<?php 
//Form data handling
$input = $_GET["user_query"];

//Execute python script
$command = escapeshellcmd('./rec_pop.py ' . $input);
$output = shell_exec($command);
?>
<h2> <?php echo $output ?> </h2>

rec.py
This is the main script for recommender system. We connected MySQL databse, read dataset into pandas dataframe, build a tf-idf matrix for overview, compute the cosine similarity matrix and get recommendations by the cosine similarity matrix.

import sys
titleList = []
i=1
while i<len(sys.argv):
	titleList.append(sys.argv[i])
	i += 1
title = " ".join(titleList)

import mysql.connector
mydb = mysql.connector.connect(
  host="localhost",
  user="username",
  passwd="password",
  database="table"
)
import pandas as pd
import numpy as np
df = pd.read_sql('SELECT Title, Overview, Popularity FROM movie2 WHERE Overview <> "" and Title <> ""', con=mydb)

from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['Overview'])
from sklearn.metrics.pairwise import linear_kernel
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
indices = pd.Series(df.index, index=df['Title']).drop_duplicates()

def get_recommendations(title, cosine_sim=cosine_sim):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]
    movie_indices = [i[0] for i in sim_scores]
    return df['Title'].iloc[movie_indices]

movieRec = get_recommendations(title)
movie1 = movieRec.iloc[0]
movie2 = movieRec.iloc[1]
movie3 = movieRec.iloc[2]
movie4 = movieRec.iloc[3]
movie5 = movieRec.iloc[4]
movie6 = movieRec.iloc[5]
movie7 = movieRec.iloc[6]
movie8 = movieRec.iloc[7]
movie9 = movieRec.iloc[8]
movie10 = movieRec.iloc[9]
print ('You may also like: 1. '+ movie1 + ' 2. ' + movie2 + ' 3. ' + movie3 + ' 4. ' + movie4 + ' 5. ' + movie5 + ' 6. ' + movie6 + ' 7. ' + movie7 + ' 8. ' + movie8 + ' 9. ' + movie9 + ' 10. ' + movie10)

rec_pop.py
The rec_pop.py is independent from rec.py. Everything keeps the same until the get_recommendations function. We replaced get_recommendations to get_tfidf since we will not get results only by overview vector.

def get_tfidf(title, cosine_sim=cosine_sim):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]
    get_tfidf.scores = sim_scores
    movie_indices = [i[0] for i in sim_scores]
    return df['Title'].iloc[movie_indices]
def get_pop(title, cosine_sim=cosine_sim):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]
    movie_indices = [i[0] for i in sim_scores]
    return df['Popularity'].iloc[movie_indices]
movieSimiliar = get_tfidf(title)

weight_sim=[]
for x in get_tfidf.scores:
    weight_scores = x[1]*0.8
    weight_sim.append(weight_scores)
weight_pop=[]
for x in get_pop(title):
    weight_scores = x*0.01*0.2
    weight_pop.append(weight_scores)
weighted_scores = np.add(weight_sim, weight_pop)
weighted_index=[]
for x in get_tfidf.scores:
    index = x[0]
    weighted_index.append(index)
weighted_scoresIndex = list(zip(weighted_index, weighted_scores))
weighted_scoresIndex = sorted(weighted_scoresIndex, key = lambda x: x[1], reverse=True)

def get_recommendations(title, cosine_sim=cosine_sim):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = weighted_scoresIndex
    movie_indices = [i[0] for i in sim_scores]
    return df['Title'].iloc[movie_indices]
movieRec = get_recommendations(title)
movie1 = movieRec.iloc[0]
movie2 = movieRec.iloc[1]
movie3 = movieRec.iloc[2]
movie4 = movieRec.iloc[3]
movie5 = movieRec.iloc[4]
movie6 = movieRec.iloc[5]
movie7 = movieRec.iloc[6]
movie8 = movieRec.iloc[7]
movie9 = movieRec.iloc[8]
movie10 = movieRec.iloc[9]
print ('You may also like: 1. '+ movie1 + ' 2. ' + movie2 + ' 3. ' + movie3 + ' 4. ' + movie4 + ' 5. ' + movie5 + ' 6. ' + movie6 + ' 7. ' + movie7 + ' 8. ' + movie8 + ' 9. ' + movie9 + ' 10. ' + movie10)

Methodology on blog

https://cwang.netlify.com/post/movie_recommendation_engine/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Movie Recommender System

Overview

Scripts in detail

Methodology on blog

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README.md		README.md
rec.html		rec.html
rec.php		rec.php
rec.py		rec.py
rec_pop.py		rec_pop.py

mangfu1231/recommender

Folders and files

Latest commit

History

Repository files navigation

Movie Recommender System

Overview

Scripts in detail

Methodology on blog

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages