Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significantly improve performance of SVD and SVDpp (cleaner) #401

Merged
merged 20 commits into from Aug 14, 2022

Conversation

ProfHercules
Copy link
Contributor

@ProfHercules ProfHercules commented Jan 22, 2022

All changes are documented more clearly in the commits. Before starting to make changes, I ran all tests using Python 3.9, they all passed. After all changes, all the tests still pass.

Initially, on my M1 MacBook Air, examples/bench_mf.py produced the following:

Movielens 100k RMSE MAE Time
SVD 0.936 0.738 0:00:12
SVD++ 0.922 0.723 0:07:08
NMF 0.964 0.758 0:00:11

After all changes, the run time looks like this:

Movielens 100k RMSE MAE Time
SVD 0.936 0.738 0:00:02
SVD++ 0.922 0.723 0:00:32
NMF 0.964 0.758 0:00:11

Which is a 6x improvement in speed for SVD and a 13.38x improvement for SVDpp.

I also ran the same benchmark using the MovieLens 1M dataset, and got the following:

Movielens 1M RMSE MAE Time
SVD 0.874 0.686 0:00:25
SVD++ 0.862 0.672 0:10:37
NMF 0.917 0.725 0:01:49

The primary differences between this PR and #400 is

  • the commit history is cleaner
  • the code is changed as little as possible, to make changes clearer
  • memory usage is largely unchanged (previous implementation sort-of doubled memory usage 馃槄).
  • only the algorithms with code changes have differing performance.

Note: I documented the process here, feel free to give it a read!

- significant performance gains
- we don't ever use negative indexing, and our code is always within numpy array bounds, so this should be safe
@NicolasHug
Copy link
Owner

NicolasHug commented Aug 14, 2022

@ProfHercules Thank you so much for your dedication and for taking the time to submit a PR. I'm sorry I couldn't get to it sooner.

Since you submitted, I've made a bunch of modifications on the master branch. In master I have removed some old code relying on six, which is where a bunch of Python interaction were coming from when using range() in Cython. Also I enabled by default a bunch of compiler directive like the ones you were using here (bouncheck=False, etc.). I also updated the CI so that it can run small benchmarks when submitting a PR (it's the "Benchmark / build (3.9) (pull_request)" below).

Before changes on master: #422

Movielens 100k RMSE MAE Time
SVD 0.936 0.738 0:00:30
SVD++ 0.922 0.723 0:16:16
NMF 0.964 0.758 0:00:30

After changes on master #421

Movielens 100k RMSE MAE Time
SVD 0.936 0.738 0:00:07
SVD++ 0.922 0.723 0:05:24
NMF 0.964 0.758 0:00:09

This PR:

Movielens 100k RMSE MAE Time
SVD 0.936 0.738 0:00:05
SVD++ cache_ratings=False 0.922 0.723 0:01:54
SVD++ cache_ratings=True 0.921 0.722 0:01:38
NMF 0.965 0.759 0:00:08

I've also made a bunch of changes to this PR. Some of them are cosmetic, some of them are maybe more relevant:

  • removed the caching the of the sqrt(Iu_length) as I didn't notice any significant performance hit, and it saves of bit of memory
  • put back the range() calls since they don't generate interactions anymore (note: to match the Cython options from setup.py we need to use -X wraparound=False,boundscheck=False,cdivision=True,language_level=3 now).
  • I removed the use of C++: I started removing the map to just use a vector[vector[int]] but in the end a good old malloced int ** could work just as well here, so I went for that. Hopefully I didn't introduce any memory leak.
  • The original code in this PR requires duplicating n_ratings ints (the Ius), so it can be a bit expensive in terms of memory. So I implemented two version: cache_ratings=True which corresponds to your strategy, where all ratings are cached, and cache_ratings=False where we still retrieve Iu on the fly, but in a much more efficient data-structure and with fewer Python interations. Both are a lot faster than master, and cache_ratings=True is still a bit faster thancache_ratings=False - for a higher memory footprint.

On the benchmark above we observe similar speedups as you did: 6X for SVD and ~10X for SVDpp.

from codecs import open
from os import path

from setuptools import Extension, find_packages, setup
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to this file are unrelated

@NicolasHug
Copy link
Owner

Merging now, thanks a lot again for your PR @ProfHercules , this is a very nice improvement!

@NicolasHug NicolasHug merged commit be66e8f into NicolasHug:master Aug 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants