scottenriquez/netflix-prize

netflix-prize

This Python 2.7 software predicts movie ratings using data from the Netflix Prize, a competition whose goal was to achieve the lowest possible root mean square error (RMSE) on predicted ratings. On the data provided by Netflix, my software obtained an RMSE of approximately 0.9 stars.
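
For readers unfamiliar with the metric, here is a minimal sketch of how RMSE can be computed over a set of predictions. The function name `rmse` and the sample ratings are illustrative assumptions, not code or data from this repository.

```python
import math


def rmse(actual, predicted):
    """Return the root mean square error between two equal-length rating lists.

    Hypothetical helper for illustration; not taken from this repository.
    """
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not actual:
        raise ValueError("sequences must not be empty")
    # Sum the squared differences, average them, then take the square root.
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared_error / float(len(actual)))


# Made-up ratings: two of the five predictions are off by one star.
actual_ratings = [1, 2, 3, 4, 5]
predicted_ratings = [2, 2, 3, 4, 4]
print(rmse(actual_ratings, predicted_ratings))
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.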

The movie rating files contain over 100 million ratings from 480,000 randomly chosen, anonymous Netflix customers over 17,000 movie titles. The data were collected between October 1998 and December 2005 and reflect the distribution of all ratings received during this period. The ratings are on a scale from 1 to 5 (integral) stars. To protect customer privacy, each customer ID has been replaced with a randomly assigned ID. The date of each rating and the title and year of release for each movie ID are also provided. The original data can be found here. For the sake of performance and simplicity, I built caches of precomputed averages.
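
As a rough illustration of how cached averages can drive a prediction, the sketch below builds per-movie and per-customer averages once and combines them into an estimate. This is a common baseline and an assumption on my part, not necessarily the approach used in this repository; the names `build_average_caches` and `predict`, and the sample ratings, are hypothetical.

```python
from collections import defaultdict


def build_average_caches(ratings):
    """Build average caches from (customer_id, movie_id, stars) tuples.

    Returns (movie_avg, customer_avg, overall_avg). Hypothetical helper.
    """
    movie_totals = defaultdict(lambda: [0, 0])     # movie_id -> [sum, count]
    customer_totals = defaultdict(lambda: [0, 0])  # customer_id -> [sum, count]
    total, count = 0, 0
    for customer_id, movie_id, stars in ratings:
        movie_totals[movie_id][0] += stars
        movie_totals[movie_id][1] += 1
        customer_totals[customer_id][0] += stars
        customer_totals[customer_id][1] += 1
        total += stars
        count += 1
    if count == 0:
        raise ValueError("no ratings supplied")
    movie_avg = {m: s / float(n) for m, (s, n) in movie_totals.items()}
    customer_avg = {c: s / float(n) for c, (s, n) in customer_totals.items()}
    return movie_avg, customer_avg, total / float(count)


def predict(customer_id, movie_id, movie_avg, customer_avg, overall_avg):
    """Estimate a rating as the movie average shifted by the customer's bias."""
    # Fall back to the overall average for unseen movies or customers.
    m = movie_avg.get(movie_id, overall_avg)
    c = customer_avg.get(customer_id, overall_avg)
    estimate = m + (c - overall_avg)
    # Keep the estimate inside the valid 1 to 5 star range.
    return min(5.0, max(1.0, estimate))


# Made-up ratings: (customer_id, movie_id, stars)
sample = [(1, 10, 5), (1, 20, 3), (2, 10, 4), (2, 20, 2)]
movie_avg, customer_avg, overall_avg = build_average_caches(sample)
print(predict(2, 20, movie_avg, customer_avg, overall_avg))
```

Computing the averages once and looking them up per prediction avoids rescanning the ratings for every query, which matters at the scale of 100 million ratings.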

To run the code, use the command python RunNetflix.py < RunNetflix.in > RunNetflix.out, which reads RunNetflix.in on standard input and writes the output to RunNetflix.out. To test the code, use the command python TestNetflix.py > TestNetflix.out.

About

Python software to make movie rating predictions for Netflix users
