refactor: replace merge-sort with heapsort #405
This change replaces the merge-sort with a heapsort that uses much less CU than merge-sort. The previous algorithm is very fast on a normal CPU but does not work well in BPF: every instruction (load, move, ...) has the same cost, there is no cache, and memory-alignment optimizations bring no real benefit. Heapsort has two major benefits here. It is non-recursive, which avoids the high stack-frame overhead in BPF, and it sorts in place, which minimizes the number of copies. Unfortunately, there is no way to systematically get the compute usage out of program test. The `test_benchmark` file has simple code for running benchmarks with various numbers of publishers. In the 32-publisher setup, heapsort reduces the CU from 16.5k to 12k, and in the 64-publisher setup from 37k to 20.5k. These numbers are the worst cases on randomized input.
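The two properties the description relies on (no recursion, no scratch buffer) can be seen in a minimal iterative, in-place heapsort over `int64_t` values. This is a sketch under stated assumptions, not the code from this PR: `sift_down` and `heapsort_i64` are hypothetical names, and nothing here measures CU cost on BPF.

```c
#include <stddef.h>
#include <stdint.h>

/* Restore the max-heap property for the subtree rooted at `root`,
 * looking only at a[0..n-1]. Written as a loop, so it never recurses. */
static void sift_down(int64_t *a, size_t root, size_t n)
{
  for (;;) {
    size_t child = 2 * root + 1;                 /* left child */
    if (child >= n) break;                       /* `root` is a leaf */
    if (child + 1 < n && a[child + 1] > a[child])
      child++;                                   /* pick the larger child */
    if (a[root] >= a[child]) break;              /* heap property holds */
    int64_t tmp = a[root];                       /* swap in place */
    a[root] = a[child];
    a[child] = tmp;
    root = child;
  }
}

/* Sort a[0..n-1] ascending, in place: no recursion, no scratch array. */
static void heapsort_i64(int64_t *a, size_t n)
{
  if (n < 2) return;
  /* Build a max-heap bottom-up, starting at the last internal node. */
  for (size_t i = n / 2; i-- > 0; )
    sift_down(a, i, n);
  /* Move the current maximum to the end, then shrink the heap by one. */
  for (size_t end = n - 1; end > 0; end--) {
    int64_t tmp = a[0];
    a[0] = a[end];
    a[end] = tmp;
    sift_down(a, 0, end);
  }
}
```

Stack usage stays constant regardless of `n`, and the only extra memory is one temporary per swap, which is why this shape suits an environment with a tight per-frame stack budget.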
very nicely done
Branch updated from f04cafb to 0302196.
Branch updated from 0302196 to c581da7.
```diff
@@ -188,8 +188,7 @@ static inline bool upd_aggregate( pc_price_t *ptr, uint64_t slot, int64_t timest
  // note: numv>0 and nprcs = 3*numv at this point
  int64_t agg_p25;
  int64_t agg_p75;
  int64_t scratch[ PC_NUM_COMP * 3 ]; // ~0.75KiB for current PC_NUM_COMP (FIXME: DOUBLE CHECK THIS FITS INTO STACK FRAME LIMIT)
```
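The `~0.75KiB` comment on the `scratch` declaration is consistent with three 8-byte values per component and PC_NUM_COMP = 32 (32 × 3 × 8 = 768 bytes); that value of PC_NUM_COMP is inferred from the comment, not stated in the diff. The sketch below makes the arithmetic explicit. `scratch_bytes`, `scratch_fits`, and the 4 KiB frame limit are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed by a buffer shaped like `int64_t scratch[ num_comp * 3 ]`:
 * three int64_t values per component (nprcs = 3*numv).
 * Hypothetical helper, not part of the PR. */
static size_t scratch_bytes(size_t num_comp)
{
  return num_comp * 3u * sizeof(int64_t);
}

/* ASSUMPTION: a 4 KiB stack frame limit; check the real limit for the
 * runtime you target. Other locals in the same frame are ignored here. */
#define ASSUMED_FRAME_LIMIT_BYTES 4096u

static int scratch_fits(size_t num_comp)
{
  return scratch_bytes(num_comp) <= ASSUMED_FRAME_LIMIT_BYTES;
}
```

With 32 components the buffer takes 768 bytes; with 64 components, as in the 64-publisher benchmark, it would take 1536 bytes. An in-place sort removes this buffer from the frame entirely.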
Great that it's in place now
Branch updated from 3e13613 to 9afe6e8.
Branch updated from 9afe6e8 to 5647ad0.
tyvm
This reverts commit 2ea67f3.
This change replaces the merge-sort with a heapsort that uses much less CU than merge-sort.
The previous algorithm is very fast on a normal CPU but does not work well in BPF: every instruction (load, move, ...) has the same cost, there is no cache, and memory-alignment optimizations bring no real benefit.
The major benefits of heapsort are that it is non-recursive, which avoids the high stack-frame overhead in BPF, and that it sorts in place, which minimizes the number of copies.
Unfortunately, there is no way to systematically get the compute usage out of program test. The `test_benchmark` file has simple code for running benchmarks with various numbers of publishers. In the 32-publisher setup, heapsort reduces the CU from 16.5k to 12k, and in the 64-publisher setup from 37k to 20.5k. These numbers are the worst cases on randomized input. On highly similar input (only two distinct prices and two distinct confidences) the results are 14.7k and 25k respectively, which is still better, and such input is very unlikely in practice.
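To put the quoted CU figures in relative terms, a small helper can compute the percentage reduction. The inputs are the numbers from the commit message; `cu_reduction_pct` is a hypothetical name, and comparing the similar-input results against the merge-sort figures is an interpretation of "still better".

```c
/* Percentage reduction when going from `before` to `after` CU.
 * Hypothetical helper; `before` must be non-zero. */
static double cu_reduction_pct(double before, double after)
{
  return 100.0 * (before - after) / before;
}
```

For the random worst cases this gives about 27.3% (16.5k to 12k, 32 publishers) and 44.6% (37k to 20.5k, 64 publishers). The highly similar input (14.7k and 25k) still comes in about 10.9% and 32.4% below the merge-sort figures. The saving grows with the number of publishers.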