Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ANISE dataset performance issues -- breaking change #252

Merged
merged 20 commits into from
Jun 7, 2024

Conversation

ChristopherRabotin
Copy link
Member

@ChristopherRabotin ChristopherRabotin commented May 31, 2024

Summary

Breaking refactoring of the DataSet type.

This change allocates the data on the heap instead of keeping it "virtually" on the stack. The prior implementation stored the underlying bytes on the heap though (via Bytes), so any just-in-time and on the stack operation still read from the heap. This change causes the ANISE binary files from version 0.3 to no longer be compatible. However, it also leads to significant performance improvements, notably when reading frame information.

Benchmark from master: https://github.com/nyx-space/anise/actions/runs/9361419937/job/25768385038:
time: [6.9400 µs 6.9446 µs 6.9505 µs]
Benchmark from this branch in test suite:
time: [5.3151 ns 5.3206 ns 5.3268 ns]

  • Fix CRC32 on Dataset loading
  • Build flamegraph from Nyx using this branch to confirm

Closes #248

Architectural Changes

DataSet items are decoded only once on load instead of being decoded just-in-time every time they were queried. This leads to 100x improvement in querying time.

New Features

No change

Improvements

Significant performance improvements.

Bug Fixes

No change

Testing and validation

Detail the changes in tests, including new tests and validations

Documentation

This PR does not primarily deal with documentation changes.

@ChristopherRabotin
Copy link
Member Author

So far, this proposal leads to a decrease in performance by a factor of five. So this should not be merged as such.

I will try to remove the cloning of the data when getting it from the vector, cf.

. That will lead to an improvement in performance, not sure if it'll be large enough to matter.

@ChristopherRabotin
Copy link
Member Author

The benchmark test was not correct and measured the cost of cloning the Almanac (out of an Arc too). Removing that clone and using the same approach as the other benchmarks led to a significant performance increase as reflected in the updated times above.

This performance increase is also seen in Nyx, flamegraphs added here: #248 (comment)

@ChristopherRabotin
Copy link
Member Author

One of the two MacOS build seems to be dependent on the released version. Not sure why, but I'm not worried about it since it worked on the other MacOS build.

@ChristopherRabotin ChristopherRabotin merged commit 0d967cc into master Jun 7, 2024
19 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ANISE Dataset performance issues
1 participant