This project is an attempt at comparing the performance of various distributed SVD implementations, under reasonable configurations for their underlying frameworks:
- Mahout's Lanczos-method-based SVD solver.
- Mahout's Stochastic SVD solver.
- Spark's Lanczos-method-based SVD solver.
Caveat Emptor: Performance benchmarks are for charlatans and snake oil peddlers. Below are my performance benchmarks.
After I spent a reasonable amount of time configuring and tuning my cluster and jobs, these were the timing results I got:
If you use this repo to run a comparison, your results are very welcome in the form of a PR!