-
Notifications
You must be signed in to change notification settings - Fork 0
DM-40193: Disk space leak in prompt processing #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2173c4b
to
cea2c9e
Compare
dhirving
reviewed
Jun 13, 2024
erinleighh
approved these changes
Jun 14, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looked good, only one minor comment on a comment.
This class will track which datasets should be kept in the local repo. Allowing multiple implementations lets us adjust strategies for external factors like detector affinity without investing in support up front.
The new spec is not only easier to implement, it's also more robust for segmented or rapid-demotion type eviction strategies, which could otherwise throw out incoming objects before they even get a chance to be used.
This implementation of EvictingSet chooses eviction candidates at random. This implementation is appropriate for caching when the access pattern is itself random, as in the absence of affinity.
This callback allows optional access to the dataset list without breaking the existing API. It's a stop-gap prior to refactoring the function.
This class stores a configurable cache of DatasetRefs, but does not directly link them to Butler datasets. It is a Collection but otherwise supports a bare minimum of methods.
The LOCAL_REPO_CACHE_SIZE envvar sets the number of regular datasets (e.g., calibs) to hold in the repo at once. Most runs need multiple templates and refcats, so these get a multiplier.
The hack was a workaround for not being able to manage the local repo properly. Now that we can remove contents more precisely, there's no need to keep it around.
cea2c9e
to
eafe325
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds a dataset cache to ensure the local repo never exceeds a configured size. The implementation is quite messy and may need to be replaced later, but the basic configuration hooks should be permanent.