MovieLens

Content

  1. Analysis of time and rating distributions
  2. Analysis of genres and years
  3. movielens/estimators -- implementation of rating estimators in Python.
  4. movielens/slope_one -- implementation of the Slope One estimator in Go.

Estimators

Estimators predict the rating for a given user and movie. In other words, an estimator returns the rating that the given user is expected to give to the given movie.

  1. GlobalMeanEstimator -- returns the global mean rating over all movies. This is the most naive estimate; any smarter estimator should beat it.
  2. GroupMeanEstimator -- returns the mean rating for the movie's genre and year. It is a user-independent estimate and can be used when we know nothing about the user.
  3. MovieMeanEstimator -- returns the mean rating for the given movie. Also a user-independent estimate.
  4. SimilarUsersEstimator -- estimates the movie rating as a weighted mean over similar users, where the weight is the similarity between two users.
  5. SlopeOneGoEstimator -- Go implementation of the Slope One algorithm. I fit this model in Go because the Python implementation (SlopeOneEstimator) is much slower (2 hours versus 20 seconds).
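
To illustrate the idea behind the Slope One estimators, here is a minimal pure-Python sketch of weighted Slope One. It is not the code from movielens/estimators or movielens/slope_one; the function names and the `{user: {movie: rating}}` input layout are assumptions made for this example only.

```python
from collections import defaultdict


def fit_slope_one(ratings):
    """Compute average rating deviations between every pair of movies.

    `ratings` is assumed to be {user: {movie: rating}} (hypothetical layout).
    Returns (devs, counts), both keyed by (movie_i, movie_j).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    sums[(i, j)] += r_i - r_j
                    counts[(i, j)] += 1
    devs = {pair: total / counts[pair] for pair, total in sums.items()}
    return devs, dict(counts)


def predict_slope_one(user_ratings, movie, devs, counts):
    """Predict a rating for `movie` from the user's other ratings.

    Each rated movie votes with (deviation + rating), weighted by the
    number of users who rated both movies. Returns None if no pair overlaps.
    """
    numerator = 0.0
    denominator = 0
    for other, rating in user_ratings.items():
        pair = (movie, other)
        if other != movie and pair in devs:
            numerator += (devs[pair] + rating) * counts[pair]
            denominator += counts[pair]
    if denominator == 0:
        return None
    return numerator / denominator


# Toy data: three users, three movies.
toy_ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
devs, counts = fit_slope_one(toy_ratings)
lucy_a = predict_slope_one(toy_ratings["lucy"], "A", devs, counts)
print(round(lucy_a, 2))  # 4.33
```

The nested loop over each user's ratings is the expensive part of fitting, which is consistent with the pure-Python version being much slower than the Go one.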

Selectors

Selectors select the n movies most similar to a given movie.

  1. GenreSelector -- returns the best movies in the genres of the given movie. A movie's position in the ranking also depends on the number of genres it shares with the given movie.
  2. SimilarSelector -- returns movies sorted by rating multiplied by their genre and year similarity to the given movie.
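
As a rough illustration of genre-based selection, the sketch below ranks candidates by the number of shared genres and breaks ties by mean rating. The exact scoring used by GenreSelector is not described here, so this ordering, the function name, and the input dictionaries are all assumptions.

```python
def select_by_genre(target, genres, mean_ratings, count=6):
    """Return up to `count` movie ids most similar to `target` by genre.

    `genres` maps movie id -> set of genre names and `mean_ratings` maps
    movie id -> mean rating (both hypothetical inputs for this sketch).
    """
    target_genres = genres[target]
    scored = []
    for movie, movie_genres in genres.items():
        if movie == target:
            continue
        common = len(target_genres & movie_genres)
        if common == 0:
            continue  # no shared genre, not a candidate
        scored.append((common, mean_ratings.get(movie, 0.0), movie))
    # More shared genres first, then higher mean rating.
    scored.sort(reverse=True)
    return [movie for _, _, movie in scored[:count]]


toy_genres = {
    0: {"Animation", "Comedy"},
    1: {"Comedy"},
    2: {"Animation", "Comedy"},
    3: {"Drama"},
    4: {"Animation"},
}
toy_means = {1: 4.5, 2: 3.0, 3: 5.0, 4: 4.0}
similar = select_by_genre(0, toy_genres, toy_means)
print(similar)  # [2, 1, 4]
```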

Ideas to improve

  1. SVD works really well in recommendation systems. The BellKor team, winners of the Netflix Prize, used SVD++ as the main algorithm of their solution. However, the full solution was composed of 27 (!!!) algorithms.
  2. Grid search with cross-validation to find the best parameters for SVD.
  3. Write everything in Go. I really like Python, but the training time of these models makes me sad.
  4. kNN can also work quite well.
  5. Make a Selector based on Slope One.
  6. Make a Selector that can work on top of any Estimator. For this we can take the users who like the given movie and build a mean recommendation for them.
  7. VotingClassifier to use a combination of Estimators as one. All modern recommendation systems are built on combinations of algorithms.
  8. Use Evaluation to use a combination of Selectors as one.
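
The idea of combining several Estimators into one could look roughly like the weighted blend below. This is only a sketch: it assumes each fitted estimator exposes a `predict(user, movie)` method returning a number, which may not match the actual interface in movielens/estimators, and `ConstantEstimator` is a hypothetical stand-in used just to make the example runnable.

```python
class ConstantEstimator:
    """Hypothetical stand-in for a fitted estimator; always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, user, movie):
        return self.value


class BlendEstimator:
    """Weighted average of the predictions of several estimators."""

    def __init__(self, estimators, weights=None):
        self.estimators = list(estimators)
        if not self.estimators:
            raise ValueError("at least one estimator is required")
        if weights is None:
            weights = [1.0] * len(self.estimators)
        self.weights = list(weights)
        if len(self.weights) != len(self.estimators):
            raise ValueError("weights and estimators must have the same length")

    def predict(self, user, movie):
        total = sum(
            weight * estimator.predict(user, movie)
            for weight, estimator in zip(self.weights, self.estimators)
        )
        return total / sum(self.weights)


parts = [ConstantEstimator(3.0), ConstantEstimator(4.0)]
plain = BlendEstimator(parts).predict(user=0, movie=0)
weighted = BlendEstimator(parts, weights=[1, 3]).predict(user=0, movie=0)
print(plain, weighted)  # 3.5 3.75
```

The weights could be tuned on a held-out split, for example by favouring estimators with lower error.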

Results

name                       rmse  mae   fit     predict
GlobalMeanEstimator        1.95  1.54  00.00s  0.01s
SimilarUsersEstimator      1.85  1.44  40.08s  0.34s
MovieMeanEstimator         1.84  1.41  04.58s  0.01s
GroupMeanEstimator         1.74  1.34  06.48s  0.01s
SlopeOneGoEstimator        1.54  1.21  22.61s  0.09s
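
Assuming the rmse and mae columns are the usual root-mean-square error and mean absolute error between predicted and actual ratings on the test split, they can be computed as in this sketch. The helper names are illustrative and are not taken from compare.py.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error; penalizes large misses more than small ones."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error; the average size of a miss."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Toy ratings, not taken from the dataset.
actual_ratings = [4.0, 3.0, 5.0]
predicted_ratings = [3.0, 3.0, 3.0]
toy_rmse = rmse(actual_ratings, predicted_ratings)
toy_mae = mae(actual_ratings, predicted_ratings)
```

For both metrics, lower is better.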

Run the estimator comparison

python3.7 compare.py

Get recommendations

For user

Get datasets:

>>> from movielens import recommend, estimators, RatingData, MovieData, preprocess
>>> ratings = RatingData()
>>> movies = MovieData()
>>> preprocess(ratings=ratings, movies=movies)
>>> train, test = ratings.split(elements=100)

Train estimator:

>>> estimator = estimators.GroupMeanEstimator()
>>> estimator.fit(ratings=train, movies=movies)

Get recommendations:

>>> recs = recommend.by_user(
...     user=0,
...     estimator=estimator,
...     movies=ratings.movies,
...     count=6,
... )
>>> recs
[48, 433, 666, 1646, 2327, 2745]

Get movie titles:

>>> for rec in recs:
...     print(movies.get_title(rec))
...
Lamerica (1994)
What Happened Was... (1994)
Supercop 2 (Project S) (Chao ji ji hua) (1993)
Dirty Work (1998)
World Is Not Enough, The (1999)
Blood Simple (1984)

For movie

Get datasets:

>>> from movielens import recommend, selectors, RatingData, MovieData, preprocess
>>> ratings = RatingData()
>>> movies = MovieData()
>>> preprocess(ratings=ratings, movies=movies)
>>> train, test = ratings.split(elements=100)

Train selector:

>>> selector = selectors.GenreSelector()
>>> selector.fit(ratings=train, movies=movies)

Get recommendations:

>>> recs = recommend.by_movie(
...     movie=0,  # Toy Story
...     selector=selector,
...     movies=ratings.movies,
...     count=6,
... )
>>> recs
[7742, 7338, 7787, 7899, 8656, 8689]

Get movie titles:

>>> for rec in recs:
...     print(movies.get_title(rec))
...
Immortals (2011)
Blue Valentine (2010)
Iron Lady, The (2011)
Madagascar 3: Europe's Most Wanted (2012)
Patton Oswalt: My Weakness Is Strong (2009)
Ant-Man (2015)