datastore: bake latest-at semantics into the garbage collector #1803
This is also known as "flattening", and it will be useful for our plan of storing entity properties in the store. Each edit will be added, but on save we only want the latest value of every property.
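A minimal sketch of flattening, assuming each edit is a hypothetical `(sequence_number, entity_path, property_name, value)` tuple. This is an illustration of the idea only, not how the Rerun store actually lays out its rows. For every `(entity, property)` pair, only the edit with the highest sequence number survives:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical edit layout: (sequence_number, entity_path, property_name, value).
Edit = Tuple[int, str, str, Any]


def flatten(edits: List[Edit]) -> Dict[Tuple[str, str], Any]:
    """Keep only the latest value of every (entity, property) pair."""
    latest: Dict[Tuple[str, str], Tuple[int, Any]] = {}
    for seq, entity, prop, value in edits:
        key = (entity, prop)
        # A higher sequence number wins, regardless of the order of `edits`.
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, value)
    return {key: value for key, (_, value) in latest.items()}
```

For example, `flatten([(1, "cam", "visible", True), (2, "cam", "visible", False), (3, "cam", "color", "red")])` returns `{("cam", "visible"): False, ("cam", "color"): "red"}`: the two edits to `visible` collapse into one entry holding the later value.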
jleibs added a commit that referenced this issue on Aug 30, 2023:
…3148)

### What

Resolves: #3098
Related to: #1803

Because blueprints used timeless data and timeless data wasn't GC'd, we previously had no great way to clean up blueprints.

This PR paves the way for better overall GC behavior in the future but doesn't change the default behavior yet.

This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing a target. This allows you to configure whether you want to gc the timeless data, and additionally how many latest_at values you want to preserve.
- Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything` to respect the protected rows during gc.
- Modifies the store_hub to gc the blueprint before saving it.

Photogrammetry with `--no-frames` is another "worst-case" for blueprint because every image is a space-view, so you can easily create a huge blueprint history by repeatedly resetting the blueprint.

![image](https://github.com/rerun-io/rerun/assets/3312232/03df3d06-a780-47b3-b0d9-aaf564793230)

### Checklist
* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if applicable)

- [PR Build Summary](https://build.rerun.io/pr/3148)
- [Docs preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/docs)
- [Examples preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
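The protected-row computation described in this PR can be modeled roughly as follows. This is a hypothetical Python sketch, not the code from the PR: `Row`, `protected_rows`, and the `protect_latest` parameter (standing in for the "how many latest_at values you want to preserve" option) are illustrative names. For every (timeline, component) pair, the `protect_latest` most recent rows are protected, and timeless rows are treated as their own timeline ordered by insertion:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical key used for the timeless "timeline".
TIMELESS: Optional[str] = None


@dataclass
class Row:
    row_id: int                 # insertion order: higher means newer
    times: Dict[str, int]       # timeline name -> time; empty dict means timeless
    components: FrozenSet[str]  # component names carried by this row


def protected_rows(rows: List[Row], protect_latest: int) -> Set[int]:
    """Ids of the `protect_latest` most recent rows for every
    (timeline, component) pair, timeless included."""
    if protect_latest <= 0:
        return set()  # guard: entries[-0:] would return the whole list
    per_key: Dict[Tuple[Optional[str], str], List[Tuple[int, int]]] = defaultdict(list)
    for row in rows:
        # Timeless rows form their own timeline, ordered by row_id only.
        stamps = list(row.times.items()) if row.times else [(TIMELESS, 0)]
        for timeline, time in stamps:
            for component in row.components:
                per_key[(timeline, component)].append((time, row.row_id))
    protected: Set[int] = set()
    for entries in per_key.values():
        entries.sort()  # by time, ties broken by row_id
        protected.update(row_id for _, row_id in entries[-protect_latest:])
    return protected
```

With `protect_latest=1`, a row that holds the newest `color` on `frame_nr` stays protected even when newer rows exist for `position`, which is exactly the situation this issue describes.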
The code is there; we just need to turn it on for normal recordings.
jleibs added a commit that referenced this issue on Sep 19, 2023:
### What

Now that GC has the ability to protect data, turn the feature on for our normal `purge_fraction_of_ram` operations.

Resolves: #1803

### Checklist
* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3357) (if applicable)

- [PR Build Summary](https://build.rerun.io/pr/3357)
- [Docs preview](https://rerun.io/preview/678cf75238c49f71ab338a09cc99790de0626efa/docs)
- [Examples preview](https://rerun.io/preview/678cf75238c49f71ab338a09cc99790de0626efa/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
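With protection in place, a fraction-of-RAM purge amounts to dropping the oldest rows until enough bytes are freed, while skipping anything protected. A hedged sketch, assuming a hypothetical `row_id -> size_in_bytes` map in which lower ids are older; `gc_drop_fraction` is an illustrative name, not Rerun's API:

```python
from typing import Dict, List, Set, Tuple


def gc_drop_fraction(
    rows: Dict[int, int],  # hypothetical: row_id -> size in bytes, mutated in place
    protected: Set[int],
    fraction: float,
) -> Tuple[List[int], int]:
    """Drop the oldest unprotected rows until at least `fraction` of the
    total bytes has been freed, or no droppable rows remain."""
    target = fraction * sum(rows.values())
    freed = 0
    dropped: List[int] = []
    for row_id in sorted(rows):  # sorted() copies the keys, so popping is safe
        if freed >= target:
            break
        if row_id in protected:
            continue  # protected rows keep latest-at queries correct
        freed += rows.pop(row_id)
        dropped.append(row_id)
    return dropped, freed
```

Because protected rows are skipped, the purge can free less than the requested fraction when most of the data is protected; the caller would have to decide whether that is acceptable.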
Consider the following log calls:
Querying for `LatestAt("some/entity", ("frame_nr", 5))` will unsurprisingly yield a red point at `(5.0, 5.0)`.

Now, consider what happens after running a GC that drops 50% of the data, leaving us with:
Querying for `LatestAt("some/entity", ("frame_nr", 5))` will now yield a point at `(5.0, 5.0)` with whatever is currently defined as the default color, rather than red. This is just plain wrong.

This happens because the GC blindly drops data rather than doing the correct thing: compacting what gets dropped into a latest-at kind of state and keeping that around for future queries.
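The original log calls are not reproduced here, so the toy model below uses a hypothetical single-entity sequence consistent with the description: a red color logged at frame 0, then one position per frame up to `(5.0, 5.0)` at frame 5. It contrasts a naive GC that drops the oldest half of the rows with a compacting GC that folds the dropped rows into a single latest-at snapshot. This is a sketch, not the Rerun datastore; `latest_at`, `gc_naive`, and `gc_compacting` are illustrative names:

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical row layout for one entity: (frame_nr, {component_name: value}).
Row = Tuple[int, Dict[str, Any]]


def latest_at(rows: List[Row], frame: int, component: str) -> Optional[Any]:
    """Value of `component` from the most recent row at or before `frame`."""
    best: Optional[Tuple[int, Any]] = None
    for time, comps in rows:
        # `>=` lets a later row in the list win ties at the same time.
        if time <= frame and component in comps and (best is None or time >= best[0]):
            best = (time, comps[component])
    return None if best is None else best[1]


def gc_naive(rows: List[Row], fraction: float) -> List[Row]:
    """Blindly drop the oldest `fraction` of rows."""
    ordered = sorted(rows, key=lambda r: r[0])
    return ordered[int(len(ordered) * fraction):]


def gc_compacting(rows: List[Row], fraction: float) -> List[Row]:
    """Drop the same rows, but fold them into one latest-at snapshot."""
    ordered = sorted(rows, key=lambda r: r[0])
    cut = int(len(ordered) * fraction)
    dropped, kept = ordered[:cut], ordered[cut:]
    if not dropped:
        return kept
    snapshot: Dict[str, Any] = {}
    for _, comps in dropped:  # oldest first, so newer values overwrite older ones
        snapshot.update(comps)
    # The snapshot sits at the time of the newest dropped row, so every
    # latest-at query at or after that time sees exactly what it saw before.
    return [(dropped[-1][0], snapshot)] + kept


rows: List[Row] = [(0, {"color": "red"})] + [
    (i, {"position": (float(i), float(i))}) for i in range(1, 6)
]
print(latest_at(rows, 5, "color"))                      # red
print(latest_at(gc_naive(rows, 0.5), 5, "color"))       # None: falls back to the default color
print(latest_at(gc_compacting(rows, 0.5), 5, "color"))  # red
```

The compacting variant still loses history before the snapshot time (a query at frame 1 no longer sees anything), which is the expected cost of dropping data; what it preserves is the answer to every latest-at query at or after the newest dropped row.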