Recommending movies using Collaborative Filtering and Locality Sensitive Hashing in PySpark

nikhitmago/movie-recommender-system

Movie Recommender System

Versions: Spark 2.2.1, Python 2.7

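Run the following command from a terminal, replacing each bracketed placeholder with the path to the collaborative filtering (CF) script, the input file, and the testing file:

spark-submit [CF python file] [input file] [testing file]

The command writes an output file to the current directory.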

  • Implements user-user, item-item, and model-based collaborative filtering on the MovieLens dataset.
  • Locality Sensitive Hashing (LSH) speeds up the computation of item-item pairs.
  • Item-item performs best, with the lowest RMSE of 0.94.
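To illustrate how LSH can cut down the number of item-item pairs that need an exact similarity computation, here is a minimal sketch in plain Python rather than PySpark. The README does not say which LSH family the repository uses, so this assumes MinHash signatures over the set of users who rated each movie, split into bands. The function names, the toy data, and the `num_hashes` and `bands` values are all hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2147483647  # large prime used as the modulus of the hash functions


def minhash_signature(user_ids, hash_params):
    """Return one min-hash value per (a, b) hash function for a non-empty set of user ids."""
    return [min((a * u + b) % PRIME for u in user_ids) for a, b in hash_params]


def lsh_candidate_pairs(item_users, num_hashes=20, bands=10, seed=0):
    """Return the item pairs whose signatures match on at least one band."""
    rng = random.Random(seed)
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
                   for _ in range(num_hashes)]
    rows = num_hashes // bands  # signature rows per band
    signatures = {item: minhash_signature(users, hash_params)
                  for item, users in item_users.items()}
    candidates = set()
    for band in range(bands):
        # Items whose signature slice is identical in this band share a bucket.
        buckets = defaultdict(list)
        for item, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(item)
        for items in buckets.values():
            candidates.update(combinations(sorted(items), 2))
    return candidates


def jaccard(a, b):
    """Exact Jaccard similarity, computed only for the candidate pairs."""
    return len(a & b) / len(a | b)


# Hypothetical toy data: movie id -> set of users who rated it.
item_users = {"m1": {1, 2, 3, 4}, "m2": {1, 2, 3, 4}, "m3": {7, 8, 9}}
for left, right in sorted(lsh_candidate_pairs(item_users)):
    print(left, right, jaccard(item_users[left], item_users[right]))
```

Only pairs that share a bucket in some band are compared exactly, which avoids scoring every possible pair. Movies with identical user sets always collide; dissimilar movies collide only occasionally, depending on the hash functions.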

Approximate running times:

  • User-user: 600s
  • Item-item (with LSH): 400s
  • Model-based: 14s

Note: the testing file must be a subset of the input file, and both should follow the format of the ratings.csv file from the MovieLens dataset.
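The relationship between the two files, and the RMSE figure quoted above, can be sketched in plain Python. This is an illustration under assumptions, not the repository's code: it assumes the MovieLens ratings.csv header `userId,movieId,rating,timestamp`, assumes the testing rows are held out of the input before training, and uses a global-mean predictor as a stand-in for the actual collaborative filtering methods. All function names and sample rows are hypothetical.

```python
import csv
import io
import math


def load_ratings(text):
    """Parse ratings.csv-style text into {(userId, movieId): rating}."""
    reader = csv.DictReader(io.StringIO(text))
    return {(int(row["userId"]), int(row["movieId"])): float(row["rating"])
            for row in reader}


def split_train_test(all_ratings, test_ratings):
    """Hold the testing rows out of the input; fail if they are not a subset."""
    missing = [key for key in test_ratings if key not in all_ratings]
    if missing:
        raise ValueError("testing rows not found in input file: %r" % missing)
    train = {key: value for key, value in all_ratings.items()
             if key not in test_ratings}
    return train, test_ratings


def rmse(predictions, truth):
    """Root mean squared error over the held-out (userId, movieId) pairs."""
    squared_errors = [(predictions[key] - truth[key]) ** 2 for key in truth]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sample rows in the ratings.csv layout.
INPUT_CSV = """userId,movieId,rating,timestamp
1,10,4.0,964982703
1,20,3.0,964981247
2,10,5.0,964982224
2,20,2.0,964983815
"""
TEST_CSV = """userId,movieId,rating,timestamp
2,20,2.0,964983815
"""

train, truth = split_train_test(load_ratings(INPUT_CSV), load_ratings(TEST_CSV))
# Placeholder predictor: the mean of the training ratings for every test pair.
global_mean = sum(train.values()) / len(train)
predictions = {key: global_mean for key in truth}
print(rmse(predictions, truth))
```

The subset check makes the note's requirement explicit: a testing row that is absent from the input file has no ground-truth rating to score against.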
