Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: replace merge-sort with heapsort #405

Merged
merged 6 commits into from
May 16, 2024
Merged

Conversation

ali-bahjati
Copy link
Collaborator

@ali-bahjati ali-bahjati commented May 15, 2024

This change replaces the merge-sort with a heapsort that uses much less CU than merge-sort.

The previous algorithm is very fast running on a normal CPU but doesn't work very well in BPF because all instructions (like load, move, ..) have the same cost and there is no cache and optimizations for memory-alignment have no real benefit.

The major benefits of heapsort are being non-recursive that reduces the high stackframe overhead in BPF and being inplace which minimizes number of copies.

Unfortunately there is no way to systematically get the the compute usage out of program test. The test_benchmark file has a simple code that helps running benchmarks on various number of publishers.

In 32-publisher setup, heapsort reduces the CU from 16.5k to 12k and in the 64-publisher setup 37k to 20.5k. The numbers are the worst cases running on randomized input. The result of running in a highly similar input (two distinct prices, and two distinct confidences) is 14.7k and 25k respectively, which is still better and is very unlikely in practice.

This change replaces the merge-sort with a heapsort that uses much less
CU than merge-sort.

The previous algorithm is very fast running on a normal CPU but doesn't
work very well in BPF because all instructions (like load, move, ..)
have the same cost and there is no cache and optimizations for
memory-alignment have no real benefit.

A major benefit of heapsort is being non-recursive that reduces the
high stackframe overhead in BPF and is inplace which minimizes number
of copies.

Unfortunately there is no way to systematically get the the compute
usage out of program test. The `test_benchmark` file has a simple
code that helps running benchmarks on various number of publishers.

In 32-publisher setup, heapsort reduces the CU from 16.5k to 12k
and in the 64-publisher setup 37k to 20.5k. The numbers are the worst
cases running on randomized input.
jayantk
jayantk previously approved these changes May 15, 2024
Copy link
Contributor

@jayantk jayantk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nicely done

program/c/src/oracle/sort/tmpl/sort_stable.c Show resolved Hide resolved
program/c/src/oracle/model/price_model.h Show resolved Hide resolved
@@ -188,8 +188,7 @@ static inline bool upd_aggregate( pc_price_t *ptr, uint64_t slot, int64_t timest
// note: numv>0 and nprcs = 3*numv at this point
int64_t agg_p25;
int64_t agg_p75;
int64_t scratch[ PC_NUM_COMP * 3 ]; // ~0.75KiB for current PC_NUM_COMP (FIXME: DOUBLE CHECK THIS FITS INTO STACK FRAME LIMIT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that it's in place now

Reisen
Reisen previously approved these changes May 16, 2024
program/c/src/oracle/model/test_price_model.c Show resolved Hide resolved
Copy link
Contributor

@guibescos guibescos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tyvm

@ali-bahjati ali-bahjati merged commit 2ea67f3 into main May 16, 2024
3 checks passed
@ali-bahjati ali-bahjati deleted the optimize/use-heapsort branch May 16, 2024 15:24
ali-bahjati added a commit that referenced this pull request May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants