Optimize K-means #97
Conversation
Hi, thanks for the PR! I've been busy today but I will gladly review this tomorrow 👍🏻 |
I'm taking some more time to review this because on my machine these changes actually bring performance down significantly, at least as far as the benchmarks are concerned, but I'm also getting some outliers, so I'd like to investigate a little more. I also want to make sure the performance isn't affected by the change in the bencher (it shouldn't be, but I'd still like to check). These are the results I get when I run the benchmarks:
Benchmarking naive_k_means/10: Warming up for 3.0000 s
naive_k_means/10 time: [1.0439 ms 1.1075 ms 1.1703 ms]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
naive_k_means/100 time: [5.1121 ms 5.4133 ms 5.7409 ms]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
naive_k_means/1000 time: [35.332 ms 37.381 ms 39.503 ms]
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
Benchmarking naive_k_means/10000: Warming up for 3.0000 s
naive_k_means/10000 time: [233.40 ms 243.14 ms 253.00 ms]
naive_k_means/naive_k_means/10
time: [5.3906 ms 8.3629 ms 11.660 ms]
Found 19 outliers among 100 measurements (19.00%)
19 (19.00%) high severe
naive_k_means/naive_k_means/100
time: [16.939 ms 24.063 ms 31.794 ms]
Found 24 outliers among 100 measurements (24.00%)
24 (24.00%) high severe
Benchmarking naive_k_means/naive_k_means/1000: Warming up for 3.0000 s
naive_k_means/naive_k_means/1000
time: [66.666 ms 94.997 ms 126.49 ms]
Found 19 outliers among 100 measurements (19.00%)
1 (1.00%) high mild
18 (18.00%) high severe
Benchmarking naive_k_means/naive_k_means/10000: Warming up for 3.0000 s
naive_k_means/naive_k_means/10000
time: [284.80 ms 436.71 ms 613.42 ms]
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
Have you tried running the benches on your machine? Do you get similar results? |
It's also slower on my machine, but I'm on a 2-core i7-6500U that runs faster without Rayon, so I didn't take my results too seriously. I mainly wanted someone with a better machine to confirm them. I'm also curious what flamegraphs you get from the benchmarks, because my flamegraphs for both versions indicated that the assignment step was the bottleneck, not the averaging step. |
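For reference, the assignment step showing up at the top of the flamegraphs can be sketched like this (illustrative code only, not linfa's actual implementation, which operates on ndarray types; all names here are made up):

```rust
// Sketch of the k-means assignment step that dominates the flamegraphs
// (illustrative only; linfa's real implementation works on ndarray types).
// Cost is O(n_points * n_clusters * n_dims) per iteration.
fn assign_clusters(points: &[Vec<f64>], centroids: &[Vec<f64>]) -> Vec<usize> {
    points
        .iter()
        .map(|p| {
            let mut best = 0;
            let mut best_d = f64::INFINITY;
            for (i, c) in centroids.iter().enumerate() {
                let d = dist_sq(p, c);
                if d < best_d {
                    best_d = d;
                    best = i;
                }
            }
            best
        })
        .collect()
}

// Squared Euclidean distance; the square root is unnecessary for an argmin.
fn dist_sq(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let points = vec![vec![0.0, 0.0], vec![10.0, 10.0]];
    let centroids = vec![vec![1.0, 1.0], vec![9.0, 9.0]];
    assert_eq!(assign_clusters(&points, &centroids), vec![0, 1]);
}
```

Since every point is independent here, this loop is also the natural place to parallelize with Rayon, which is why core count matters so much for these benchmarks.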
I'm seeing similar results as you in my benchmarks, but my flamegraphs indicate that the averaging step isn't the bottleneck. |
These are my results with
With all changes
The perf problems seem to be caused entirely by the outliers, since the median is so much lower than the mean. |
I've tried profiling the code with flamegraph as well as measuring the times directly. To see how weird this is, go to line 272 of
and see how that affects the benchmarks. |
I will try to run the benchmarks now. This seems really strange; one thing that stood out to me is that performance degrades when you pre-fill both the hashmap and the vector, so the problem probably lies there. In commit 561d9b7 you could also use a 2D array directly with |
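The idea of accumulating per-cluster sums in one flat 2-D buffer instead of a `HashMap` could look roughly like this (hypothetical names, not the code from commit 561d9b7; linfa itself would use ndarray's `Array2` rather than a flat `Vec`):

```rust
fn cluster_sums(
    points: &[Vec<f64>],
    assignments: &[usize],
    n_clusters: usize,
    n_dims: usize,
) -> (Vec<f64>, Vec<usize>) {
    // One flat buffer: row `c` starts at c * n_dims. No hashing, no
    // per-cluster Vec allocations, and cache-friendly sequential writes.
    let mut sums = vec![0.0; n_clusters * n_dims];
    let mut counts = vec![0usize; n_clusters];
    for (p, &c) in points.iter().zip(assignments) {
        counts[c] += 1;
        for d in 0..n_dims {
            sums[c * n_dims + d] += p[d];
        }
    }
    (sums, counts)
}

fn main() {
    let points = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];
    let (sums, counts) = cluster_sums(&points, &[0, 1, 0], 2, 2);
    assert_eq!(sums, vec![6.0, 8.0, 3.0, 4.0]);
    assert_eq!(counts, vec![2, 1]);
}
```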
Without changes to
With changes (794d8a5)
With changes: (4e38c38)
cpu info
|
I've looked a bit more and the problem seems to be with the benchmarks. For some reason if I move all the initialization code out of the benchmark function I observe more reasonable performance numbers. I'm still observing runtime spikes with cluster size of 100, which looks to be RNG-dependent. |
which is still strange, because only the code in |
just saw that this was your original version |
reducing the number of iterations to |
Can you try reinitializing the RNG for each new parameter set? That gives me much more consistent results. |
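The re-seeding idea can be illustrated like this (a tiny LCG stands in here for whatever RNG the benches actually use; every name in this sketch is hypothetical). The point is only that each parameter set gets a freshly seeded generator, and therefore identical data on every run:

```rust
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        // Knuth's MMIX LCG constants; output is uniform in [0, 1).
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn make_dataset(n: usize) -> Vec<f64> {
    // Fresh, fixed seed per parameter set: every cluster size sees the
    // same data on every run, so results are comparable across runs.
    let mut rng = Lcg(42);
    (0..n).map(|_| rng.next_f64()).collect()
}

fn main() {
    assert_eq!(make_dataset(5), make_dataset(5));
    assert!(make_dataset(8).iter().all(|&x| (0.0..1.0).contains(&x)));
}
```

Without the re-seed, later parameter sets consume a different part of the stream on each run, which would explain the inconsistent spikes.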
I did some more digging, and the real reason behind the perf issues is that some inputs run for all 1000 iterations, which doesn't usually happen in the benchmarks. This happens because somewhere along the way the centroids are computed as NaN, which messes up the algorithm. |
Performance issues were caused by empty clusters producing an average of NaN (bug in my code, not master). I'm setting empty centroids to 0 right now but there may be a better value. This may also indicate an issue with the random cluster initialization we're using (can we use a better initialization?). |
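A guard for the empty-cluster case might look like this (a sketch of the stopgap described above, not the PR's actual code):

```rust
// Averaging step with a guard for empty clusters. An empty cluster would
// otherwise compute 0.0 / 0.0 = NaN, and a NaN centroid then poisons
// every later distance comparison (NaN compares false with everything),
// so the algorithm never converges and runs all 1000 iterations.
fn update_centroids(sums: &[Vec<f64>], counts: &[usize]) -> Vec<Vec<f64>> {
    sums.iter()
        .zip(counts)
        .map(|(sum, &n)| {
            if n == 0 {
                vec![0.0; sum.len()] // reset empty cluster instead of NaN
            } else {
                sum.iter().map(|s| s / n as f64).collect()
            }
        })
        .collect()
}

fn main() {
    let centroids = update_centroids(&[vec![2.0, 4.0], vec![0.0, 0.0]], &[2, 0]);
    assert_eq!(centroids, vec![vec![1.0, 2.0], vec![0.0, 0.0]]);
}
```

Keeping the previous centroid, or re-seeding it from a random point, are other common choices for the `n == 0` branch.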
Maybe it could be a good idea to look into kmeans++ for center initialization? I see that the |
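For context, k-means++ draws each new center with probability proportional to the squared distance to the nearest center picked so far, so centers spread out and collapsed/empty clusters become rare. A rough sketch (hypothetical code, simplified from the textbook algorithm; this seeding is scikit-learn's default):

```rust
fn kmeans_pp(points: &[Vec<f64>], k: usize, rng: &mut impl FnMut() -> f64) -> Vec<Vec<f64>> {
    // First center: uniform random point.
    let first = (rng() * points.len() as f64) as usize % points.len();
    let mut centers = vec![points[first].clone()];
    while centers.len() < k {
        // Squared distance from each point to its nearest chosen center.
        let d2: Vec<f64> = points
            .iter()
            .map(|p| {
                centers
                    .iter()
                    .map(|c| p.iter().zip(c).map(|(x, y)| (x - y) * (x - y)).sum::<f64>())
                    .fold(f64::INFINITY, f64::min)
            })
            .collect();
        // Weighted draw: probability of a point is proportional to d2.
        let mut target = rng() * d2.iter().sum::<f64>();
        let mut idx = 0;
        for (i, w) in d2.iter().enumerate() {
            target -= w;
            if target <= 0.0 {
                idx = i;
                break;
            }
        }
        centers.push(points[idx].clone());
    }
    centers
}

fn main() {
    // Deterministic stand-in "rng" for demonstration.
    let mut half = || 0.5;
    let centers = kmeans_pp(&[vec![0.0, 0.0], vec![10.0, 10.0]], 2, &mut half);
    assert_eq!(centers, vec![vec![10.0, 10.0], vec![0.0, 0.0]]);
}
```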
I'm getting similar issues with |
By the way, have you heard about the iai project? It's from the same author as criterion and directly measures cache/memory accesses and instruction counts instead of wall-clock time. The beauty is that it can be reliably deployed to CI systems (as opposed to criterion.rs) and only needs a single pass. We could probably use the metrics for a dashboard like rustc's performance dashboard and publish it with every new release. |
So what are the next actions here: should we merge the PR? Does the performance gain justify having two implementations, one for incremental K-means and one not? |
That project sounds promising, especially since it can be integrated with the CI system. It would give the benchmarks some more importance, which is cool, and I believe a dashboard like the one you linked could get more people interested in the project on the website. I will definitely explore it a little bit 👍🏻
This evening I'll be able to run the benchmarks again with the latest changes to gauge the extent of the improvements. Why do you mention two implementations? Doesn't this PR just overwrite the existing one? |
Sorry, I meant that in the context of scikit-learn: it contains two implementations, KMeans and MiniBatchKMeans. The moving average is currently only used in our single-fit implementation, but could be used for an incremental version (by implementing https://github.com/rust-ml/linfa/blob/master/src/traits.rs#L38) in the future. |
The PR changed the averaging algorithm from a moving average to a standard average. For handling cluster collapse I vastly prefer returning an Error. I'm definitely against setting the centroid to NaN, because of the perf drops it causes in the benchmarks. Center initialization and incremental fit are out of scope for this PR. |
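The two averaging strategies under discussion compute the same value; the difference is operational (illustrative code, not either implementation verbatim):

```rust
// Batch ("standard") average: sum all members, then one division per
// cluster at the end of the pass — the approach this PR switches to.
fn batch_mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Moving (incremental) average: fold each member into a running mean,
// one division per point. Slower in a single fit, but it is the form an
// incremental/mini-batch fit needs, since data arrives piecewise.
fn moving_mean(xs: &[f64]) -> f64 {
    let mut mean = 0.0;
    for (i, &x) in xs.iter().enumerate() {
        mean += (x - mean) / (i + 1) as f64;
    }
    mean
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    assert!((batch_mean(&xs) - 2.5).abs() < 1e-12);
    assert!((moving_mean(&xs) - 2.5).abs() < 1e-12);
}
```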
Just from reading scikit-learn's user guide, it seems that they do use two different approaches for mini-batch and regular k-means, so maybe it would make sense to keep the non-moving-average implementation for k-means if it increases performance. With a future incremental fit implementation in mind, maybe it is better to keep the
What is the behavior of the implementation in master in such cases? |
When we merge, I would like to keep the moving average implementation as well. I just ran the benchmark, new version vs. old version:
|
On my machine:
I would say that the improvement in speed is quite good 👍🏻 |
Since master uses moving averages, it just sets the centroid to 0. |
Okay, then let's merge and collect what's missing in an issue 👍 I have to rerun the benchmark because my cache was invalid.
18.03.2021 21:00:07 Yuhan Lin:
> What is the behavior of the implementation in master in such cases?
> Since master uses moving averages, it just sets the centroid to 0.
> I'll look into the perf of moving average vs standard average. Also, the benchmarks for this PR have more outliers than master, which is strange.
|
could you undelete algorithms/linfa-clustering/src/k_means/helpers.rs
so that we keep the incremental mean implementation even if it is not currently used?
nah you can always find the file in the git history, no need to add dangling files 😆 |
without changes:
with changes:
👍 if no objections are raised I would merge this now |
Optimized the averaging step of K-means to use non-moving averages and arrays instead of hashmaps. The code also updates the new centroids in-place. Also fixed deprecation warnings in all benchmarks in linfa-clustering (other benchmarks may have the same warnings). I'm curious about the effect of my changes on benchmark speeds, since my machine isn't suited to running benchmarks.