perf(license-detection): add rkyv-based license index cache #651
Merged
FYI @mmurto. This sounds like the feature we wanted to get ported from scannerust, right?
2c235cf to 8cd74b8
Collaborator
Author
@sschuberth yes, exactly. I benchmarked a couple of options and simply chose the fastest. At 340 MB the cache file is still smaller than ScanCode's, so I think that should be fine, but feel free to give your opinion on the tradeoffs between cache file size and startup speed.
Summary
~/.cache/provenant/license_index/; subsequent runs load the cached index via rkyv zero-copy deserialization
Evaluated alternatives
rkyv is ~4x faster than bincode and ~8x faster than rmp_serde for warm starts, at the same cache size.
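The load-or-rebuild flow from the Summary can be sketched roughly as follows. This is a std-only illustration, not the code in this PR: the index is treated as an opaque byte buffer, the rkyv (de)serialization step is left out entirely, and `load_or_build_index`, `CACHE_FILE_NAME`, and the `force_rebuild` parameter are hypothetical names chosen for the example.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical file name; the PR description only names the cache directory.
const CACHE_FILE_NAME: &str = "index.bin";

/// Returns `(bytes, cache_hit)`.
///
/// On a warm start the bytes come straight from the cache file. On a cold
/// start (or when `force_rebuild` is set) `build` is called and its result is
/// written to the cache directory before being returned.
pub fn load_or_build_index<F>(
    cache_dir: &Path,
    force_rebuild: bool,
    build: F,
) -> io::Result<(Vec<u8>, bool)>
where
    F: FnOnce() -> Vec<u8>,
{
    let cache_file = cache_dir.join(CACHE_FILE_NAME);

    if !force_rebuild {
        match fs::read(&cache_file) {
            // Warm start: reuse the cached bytes as-is.
            Ok(bytes) => return Ok((bytes, true)),
            // No cache yet: fall through and build it.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            // Any other I/O error is reported to the caller.
            Err(e) => return Err(e),
        }
    }

    // Cold start: build the index, then persist it for the next run.
    let bytes = build();
    fs::create_dir_all(cache_dir)?;

    // Write to a temporary file and rename it, so an interrupted run does
    // not leave a truncated cache file behind.
    let tmp_file = cache_dir.join(format!("{}.tmp", CACHE_FILE_NAME));
    fs::write(&tmp_file, &bytes)?;
    fs::rename(&tmp_file, &cache_file)?;

    Ok((bytes, false))
}
```

In a real implementation the returned buffer would be handed to the deserializer (rkyv here) rather than used directly, and the cache would probably also need some form of version or checksum validation, which this sketch omits.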
Scope and exclusions
CachedLicenseIndex struct with byte-blob fields for Automaton/Rule/License, rkyv Archive derives on TokenId/TokenDictionary/TokenSet/TokenMultiset/IndexedRuleMetadata
Intentional differences from Python
Follow-up work
--reindex and --license-cache-dir CLI flags for cache control
Closes #612
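For illustration, the two follow-up flags could be parsed along these lines. This is only a std-only sketch under assumptions: the PR does not specify the flags' exact semantics, the project may well use an argument-parsing crate instead, and `CacheOptions` and `parse_cache_options` are hypothetical names. It assumes `--reindex` is a boolean switch and `--license-cache-dir` takes a path as its next argument.

```rust
use std::path::PathBuf;

/// Hypothetical container for the cache-related options.
#[derive(Debug, Default, PartialEq)]
pub struct CacheOptions {
    /// Assumed meaning: ignore any existing cache and rebuild the index.
    pub reindex: bool,
    /// Assumed meaning: override the default cache directory.
    pub license_cache_dir: Option<PathBuf>,
}

/// Extracts the two cache flags from an argument list and ignores
/// everything else.
pub fn parse_cache_options<I, S>(args: I) -> Result<CacheOptions, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut opts = CacheOptions::default();
    let mut iter = args.into_iter();

    while let Some(arg) = iter.next() {
        match arg.as_ref() {
            "--reindex" => opts.reindex = true,
            "--license-cache-dir" => {
                // The flag's value is expected in the following argument.
                let value = iter
                    .next()
                    .ok_or_else(|| "--license-cache-dir requires a path".to_string())?;
                opts.license_cache_dir = Some(PathBuf::from(value.as_ref()));
            }
            // Unrelated arguments are left to the rest of the CLI.
            _ => {}
        }
    }

    Ok(opts)
}
```

A caller would typically pass `std::env::args().skip(1)` and fall back to the default cache directory when `license_cache_dir` is `None`.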