Significantly improve performance of SVD and SVDpp (cleaner) #401

ProfHercules · 2022-01-22T20:19:45Z

All changes are documented more clearly in the commits. Before making any changes, I ran all tests using Python 3.9 and they all passed. After all changes, all the tests still pass.

Initially, on my M1 MacBook Air, examples/bench_mf.py produced the following:

Movielens 100k	RMSE	MAE	Time
SVD	0.936	0.738	0:00:12
SVD++	0.922	0.723	0:07:08
NMF	0.964	0.758	0:00:11

After all changes, the run time looks like this:

Movielens 100k	RMSE	MAE	Time
SVD	0.936	0.738	0:00:02
SVD++	0.922	0.723	0:00:32
NMF	0.964	0.758	0:00:11

That is a 6x speedup for SVD and a 13.38x speedup for SVDpp.
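
For reference, the speedup figures can be recomputed from the `Time` columns of the two tables above. This is only a small sketch of the arithmetic; the helper names `to_seconds` and `speedup` are made up for illustration and are not part of the benchmark script.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string (as printed in the tables) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup(before: str, after: str) -> float:
    """Ratio of the run time before the changes to the run time after."""
    return to_seconds(before) / to_seconds(after)


svd = speedup("0:00:12", "0:00:02")    # 12 s / 2 s
svdpp = speedup("0:07:08", "0:00:32")  # 428 s / 32 s
print(f"SVD: {svd:.2f}x, SVD++: {svdpp:.2f}x")
```

Since the reported times are rounded to whole seconds, the ratios are approximate.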

I also ran the same benchmark using the MovieLens 1M dataset, and got the following:

Movielens 1M	RMSE	MAE	Time
SVD	0.874	0.686	0:00:25
SVD++	0.862	0.672	0:10:37
NMF	0.917	0.725	0:01:49

The primary differences between this PR and #400 are:

- the commit history is cleaner
- the code is changed as little as possible, to make the changes clearer
- memory usage is largely unchanged (the previous implementation sort of doubled memory usage 😅)
- only the algorithms with code changes have differing performance

Note: I documented the process here, feel free to give it a read!

- adds dependency on C++ compilation

- significant performance gains - we don't ever use negative indexing, and our code is always within numpy array bounds, so this should be safe

NicolasHug · 2022-08-14T16:51:11Z

@ProfHercules Thank you so much for your dedication and for taking the time to submit a PR. I'm sorry I couldn't get to it sooner.

Since you submitted, I've made a bunch of modifications on the master branch. In master I have removed some old code relying on six, which is where a bunch of Python interactions were coming from when using range() in Cython. I also enabled by default a bunch of compiler directives like the ones you were using here (boundscheck=False, etc.), and I updated the CI so that it can run small benchmarks when submitting a PR (it's the "Benchmark / build (3.9) (pull_request)" below).
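
To illustrate what these directives look like in practice, here is a minimal sketch. It takes the four directives named in this thread and formats them as the value of the `cython` command-line `-X` option. The dict name `directives` and the helper `to_x_flag` are hypothetical, and whether setup.py stores them in exactly this form is an assumption.

```python
# Directives named in this thread; the variable name is a hypothetical placeholder.
directives = {
    "wraparound": False,   # no negative indexing support
    "boundscheck": False,  # no bounds checks on array accesses
    "cdivision": True,     # C semantics for division
    "language_level": 3,   # Python 3 semantics
}


def to_x_flag(directives: dict) -> str:
    """Format a directive dict as a comma-separated NAME=VALUE string."""
    return ",".join(f"{name}={value}" for name, value in directives.items())


x_flag = to_x_flag(directives)
print("-X", x_flag)
```

A dict like this can also be handed to `cythonize(..., compiler_directives=directives)` in a build script, which avoids repeating the decorators in every function.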

Before changes on master: #422

Movielens 100k	RMSE	MAE	Time
SVD	0.936	0.738	0:00:30
SVD++	0.922	0.723	0:16:16
NMF	0.964	0.758	0:00:30

After changes on master: #421

Movielens 100k	RMSE	MAE	Time
SVD	0.936	0.738	0:00:07
SVD++	0.922	0.723	0:05:24
NMF	0.964	0.758	0:00:09

This PR:

Movielens 100k	RMSE	MAE	Time
SVD	0.936	0.738	0:00:05
SVD++ cache_ratings=False	0.922	0.723	0:01:54
SVD++ cache_ratings=True	0.921	0.722	0:01:38
NMF	0.965	0.759	0:00:08

I've also made a bunch of changes to this PR. Some of them are cosmetic, some of them are maybe more relevant:

- I removed the caching of sqrt(Iu_length), as I didn't notice any significant performance hit and it saves a bit of memory.
- I put back the range() calls since they don't generate Python interactions anymore (note: to match the Cython options from setup.py we now need to use -X wraparound=False,boundscheck=False,cdivision=True,language_level=3).
- I removed the use of C++: I started by replacing the map with a vector[vector[int]], but in the end a good old malloced int ** works just as well here, so I went for that. Hopefully I didn't introduce any memory leak.
- The original code in this PR requires duplicating n_ratings ints (the Ius), so it can be a bit expensive in terms of memory. So I implemented two versions: cache_ratings=True, which corresponds to your strategy, where all ratings are cached, and cache_ratings=False, where we still retrieve Iu on the fly, but in a much more efficient data structure and with fewer Python interactions. Both are a lot faster than master, and cache_ratings=True is still a bit faster than cache_ratings=False, at the cost of a higher memory footprint.
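
To make the caching idea concrete, here is a rough pure-Python sketch. It is not the Cython code from this PR (which uses a malloced int **). It only shows the principle of building the per-user item lists Iu once and reusing them when computing the SVD++ implicit-feedback term, |Iu|^(-1/2) times the sum of the item factors yj over Iu. The names `build_item_cache`, `implicit_term`, and the toy data are hypothetical.

```python
from math import sqrt


def build_item_cache(ur):
    """Build, once, the list of item ids rated by each user (the Iu lists)."""
    return [[item for item, _ in ur[u]] for u in range(len(ur))]


def implicit_term(Iu, yj, n_factors):
    """Return |Iu|^(-1/2) * sum of yj[j] over the items j in Iu."""
    total = [0.0] * n_factors
    for j in Iu:
        for f in range(n_factors):
            total[f] += yj[j][f]
    scale = sqrt(len(Iu))
    return [value / scale for value in total]


# Hypothetical toy data: ur[u] lists the (item, rating) pairs of user u.
ur = {0: [(0, 4.0), (2, 3.0)], 1: [(1, 5.0)], 2: [(0, 2.0), (1, 1.0), (2, 4.5)]}
yj = [[1.0, 2.0], [0.5, 0.5], [3.0, 4.0]]  # one implicit factor vector per item

cache = build_item_cache(ur)           # built once, before the training loop
term = implicit_term(cache[0], yj, 2)  # reused for every rating of user 0
```

The cache stores one int per rating, which is the extra memory cost mentioned above. Without the cache, the item list has to be rebuilt from the ratings each time it is needed.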

On the benchmarks above, we observe speedups similar to yours: 6X for SVD and ~10X for SVDpp.

NicolasHug · 2022-08-14T16:55:59Z

setup.py

 from codecs import open
 from os import path

+from setuptools import Extension, find_packages, setup


The changes to this file are unrelated

NicolasHug · 2022-08-14T17:21:28Z

Merging now, thanks a lot again for your PR @ProfHercules, this is a very nice improvement!

ProfHercules added 4 commits January 22, 2022 20:38

Add benchmark for matrix_factorization only

14a91d1

Optimize SVDpp training loop by reducing Python / C-API interactions

8fbd660

- adds dependency on C++ compilation

Optimize SVD training loop in the same way as SVDpp

7d44b79

Disable bounds checking & negative indexing for sgd() in SVD and SVDpp

12e57dc

- significant performance gains - we don't ever use negative indexing, and our code is always within numpy array bounds, so this should be safe

ProfHercules force-pushed the mf_optimize branch from c60896f to 12e57dc Compare January 23, 2022 05:40

NicolasHug added 12 commits August 14, 2022 11:50

Merge branch 'master' of github.com:NicolasHug/Surprise into mf_optimize

819ebc4

Remove boundschecks and wraparound annotations, theyre on by default

7710c43

Use memory views when possible

7db2d52

Put back for loops now that six is removed

1d3235f

Slightly optimize NMF

b17b8e4

Use vector of vector instead of map

1f5d1da

User pure C instead of C++. I <3 memory leaks

e3f028e

Also test precompute=True

5251edb

Merge branch 'master' of github.com:NicolasHug/Surprise into mf_optimize

683707b

Update benchmarks

eaf6e4e

Put back 1.1.1 version number

f422154

rm bench_mf.py

ef3d5ee

NicolasHug added 2 commits August 14, 2022 17:52

Docs

16bfc3b

Set random_state in benchmarks

eddd0c5

NicolasHug reviewed Aug 14, 2022


NicolasHug added 2 commits August 14, 2022 18:00

precompute -> cache_ratings

8d37d6b

empty commit

f1da845

NicolasHug merged commit be66e8f into NicolasHug:master Aug 14, 2022
