[ENH] Improve performance of isotonic regression #2944
This supersedes #2940.

PAVA runs in linear time, but the existing algorithm was scaling approximately quadratically, because the array pop in the inner loop is O(N). I implemented the O(N) version of PAVA using the decreasing-subsequences trick, and it is significantly faster in benchmarking. The benchmarks in this diff (adapted from one of the existing unit tests) show substantial improvements: on the order of ~7x faster for problems of size ~1,000, ~30x faster for size 10,000, and ~250x faster for size 100,000.

On correctness: the unit tests cover the isotonic regression code fairly well, and all of them pass before and after the change. It's a well-known algorithm with many existing implementations, so I believe this implementation is correct. While coding it up I made some mistakes that the unit tests caught, which makes me more confident in the correctness now. Still, the performance improvements are surprisingly high.
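For readers unfamiliar with the trick: the quadratic behavior comes from popping and re-scanning earlier elements on every violation, while the linear version maintains a stack of already-monotone blocks and merges at most once per element. Below is a minimal standalone sketch of that stack-based PAVA (names and structure are my own for illustration; this is not the actual scikit-learn Cython code in the diff):

```python
import numpy as np

def pava(y, w=None):
    """Pool Adjacent Violators: non-decreasing least-squares fit in O(N).

    Illustrative sketch of the stack-of-blocks approach; each block
    stores (weighted mean, total weight, length). Every element is
    pushed once and merged at most once, giving linear amortized time.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []  # stack of [mean, weight, length]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the new block violates monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    # Expand the pooled blocks back to a full-length solution.
    return np.concatenate([np.full(n, m) for m, _, n in blocks])
```

For example, `pava([1, 3, 2, 4])` pools the violating pair (3, 2) into their mean, yielding `[1, 2.5, 2.5, 4]`.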
I'll update the link in
@NelleV, I suspect the performance differences are down to a few things:
understandable. For an example usage, run:

```
python benchmarks/bench_isotonic.py --iterations 10 --log_min_problem_size 2 --log_max_problem_size 8 --dataset logistic
```