feat: make the index cache size (in bytes) available #2381
I'm leaving this in draft for now because I don't think we are doing the right thing for Arrow array memory sizes. Some of the index structures may share the same Arrow array and I believe this implementation (using […]) […] On the other hand, overcounting might be better than having 0 stats available, so I'm not opposed to merging this in as it is if we want the capability sooner.
+ self.params.deep_size_of_children(context)
+ self.nodes.deep_size_of_children(context)
+ self.level_count.deep_size_of_children(context)
// Skipping the visited_generator_queue
@BubbleCal do you think `visited_generator_queue` will occupy a lot of space in memory? I wasn't sure if it was safe to skip that here.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2381 +/- ##
==========================================
- Coverage 80.04% 79.68% -0.36%
==========================================
Files 199 199
Lines 54469 54713 +244
Branches 54469 54713 +244
==========================================
+ Hits 43598 43600 +2
- Misses 8349 8586 +237
- Partials 2522 2527 +5
I can't figure out a great way to get the Arrow memory size more reliably. The way […] There is […] So if two structures have a cloned […]
Honestly, this seems acceptable to me for now.
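To make the overcounting concern concrete, here is a minimal Python sketch. It is not the PR's Rust code: the helper names `naive_size` and `deduplicated_size` are hypothetical, and plain `bytearray` objects stand in for Arrow buffers. It only shows how summing per-structure sizes counts a shared buffer twice, and how tracking already-seen buffers by identity would avoid that.

```python
def naive_size(structures):
    """Sum every buffer reachable from every structure (shared buffers count repeatedly)."""
    return sum(len(buf) for structure in structures for buf in structure)


def deduplicated_size(structures):
    """Count each distinct buffer once, identified by object identity (an assumption)."""
    seen = set()
    total = 0
    for structure in structures:
        for buf in structure:
            if id(buf) not in seen:
                seen.add(id(buf))
                total += len(buf)
    return total


# Two hypothetical index structures that share one 1000-byte buffer.
shared = bytearray(1000)
index_a = [shared]
index_b = [shared, bytearray(24)]

print(naive_size([index_a, index_b]))         # shared buffer counted twice
print(deduplicated_size([index_a, index_b]))  # shared buffer counted once
```

The naive total is an upper bound on the real footprint, which is why overcounting was judged tolerable for a first version.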
python/python/lance/dataset.py
Outdated
def cache_size_bytes(self) -> int:
    """
    Return the total size of the index + file metadata caches in bytes.
    """
    return self._ds.cache_size_bytes()
I wonder if we should have some sort of session handle instead. It might be too early, but would be nice to have something users could pass around and re-use across datasets.
I've added a new session object. It can only report the size right now, so you can't use it to pass between datasets, but it will help avoid breaking changes in the future when we do add that capability.
rust/lance-index/src/scalar/btree.rs
Outdated
ScalarValue::Decimal256(_, _, _) => 0,
ScalarValue::Int8(_) => 0,
ScalarValue::Int16(_) => 0,
ScalarValue::Int32(_) => 0,
Why make this zero? Also, could we use `ScalarValue::size()`? https://docs.rs/datafusion/latest/datafusion/common/enum.ScalarValue.html#method.size
Can't believe I missed that 🤦. Yes, that will work. It's 0 because `deepsize` does the `std::mem::size_of` already, so this is only supposed to account for anything above and beyond that.
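The shallow-plus-children split described above can be sketched in Python as a loose analogy. This is not the `deepsize` crate's API: `children_size` and `deep_size` are hypothetical names, and `sys.getsizeof` plays the role of `std::mem::size_of`. The point is that a scalar's "children" contribution is 0 because its own footprint is already counted by the shallow measurement.

```python
import sys


def children_size(obj) -> int:
    """Bytes owned by obj beyond its own shallow footprint (hypothetical helper)."""
    if isinstance(obj, (list, tuple)):
        return sum(deep_size(item) for item in obj)
    if isinstance(obj, dict):
        return sum(deep_size(k) + deep_size(v) for k, v in obj.items())
    # Scalars such as int or float own nothing further, so they contribute 0.
    return 0


def deep_size(obj) -> int:
    """Shallow size plus everything owned; the shallow part is added only here."""
    return sys.getsizeof(obj) + children_size(obj)


values = [1, 2, 3]
print(children_size(7))   # a scalar has nothing "above and beyond" its own size
print(deep_size(values))  # list header plus the three elements
```

This sketch ignores sharing between objects, so it has the same overcounting caveat discussed earlier in the thread.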
…ize instead of calculating size manually
@@ -76,3 +76,6 @@ class LanceFileMetadata:
    num_global_buffer_bytes: int
    global_buffers: List[LanceBufferDescriptor]
    columns: List[LanceColumnMetadata]

class _Session:
    def size_bytes(self) -> int: ...
For clarity, should it be `cache_size_bytes()`?
I think, if we have any other stateful objects in the session, we would want to capture that here as well. In other words, it would make more sense to rename `Session` to `DatasetCache` than it would to rename `size_bytes` to `cache_size_bytes`.
I see. For context, I'm thinking other methods we might set on the context will be execution args, like:
class Session:
    def set_cpu_pool_size(self, num_cpus: int): ...
    def set_io_pool_size(self, num_cpus: int): ...
    def set_memory_limit(self, memory_limit: int): ...
    def set_spill_location(self, path: Path): ...
So I'm thinking of the Session object as containing more state than the cache.
> So I'm thinking of the Session object as containing more state than the cache.
Sorry, my comment was probably confusing. I think this is good, my point should have been "it doesn't make sense to rename Session to DatasetCache and therefore doesn't make sense to rename size_bytes to cache_size_bytes".
For example, maybe DF starts to need some "thread local temp space" or "slab allocator working area" or whatever (I'm probably just making terms up at this point). This would go in the session, would not be trivial in size, and would not be considered a "cache", but we would still want to make sure that `size_bytes` includes it.
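A rough Python sketch of that naming argument follows. Everything here is hypothetical: the field names `index_cache`, `metadata_cache`, and `scratch_space` are invented for illustration and are not the real Lance session. It only shows why `size_bytes` is the more future-proof name: the total can grow to cover non-cache state without the method name becoming misleading.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Session:
    # Hypothetical caches, keyed by name, holding raw bytes.
    index_cache: Dict[str, bytes] = field(default_factory=dict)
    metadata_cache: Dict[str, bytes] = field(default_factory=dict)
    # Hypothetical non-cache working memory that still belongs to the session.
    scratch_space: bytearray = field(default_factory=bytearray)

    def size_bytes(self) -> int:
        """Total bytes held by the session, caches and non-cache state alike."""
        cache_bytes = sum(len(v) for v in self.index_cache.values())
        cache_bytes += sum(len(v) for v in self.metadata_cache.values())
        return cache_bytes + len(self.scratch_space)


session = Session()
session.index_cache["vector_idx"] = b"x" * 100
session.metadata_cache["file_0"] = b"y" * 20
session.scratch_space.extend(b"\0" * 8)
print(session.size_bytes())
```

Had the method been called `cache_size_bytes`, including `scratch_space` in the total would contradict the name.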
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [lance](https://togithub.com/lancedb/lance) | dependencies | minor | `0.10.16` -> `0.12.0` | --- ### Release Notes <details> <summary>lancedb/lance (lance)</summary> ### [`v0.12.1`](https://togithub.com/lancedb/lance/releases/tag/v0.12.1) [Compare Source](https://togithub.com/lancedb/lance/compare/v0.12.0...v0.12.1) <!-- Release notes generated using configuration in .github/release.yml at v0.12.1 --> #### What's Changed ##### Bug Fixes 🐛 - fix: incorrect chunking was making lance datasets use too much RAM by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2438 **Full Changelog**: lancedb/lance@v0.12.0...v0.12.1 ### [`v0.12.0`](https://togithub.com/lancedb/lance/releases/tag/v0.12.0) [Compare Source](https://togithub.com/lancedb/lance/compare/v0.11.1...v0.12.0) <!-- Release notes generated using configuration in .github/release.yml at v0.12.0 --> #### What's Changed ##### Breaking Changes 🛠 - feat: change dataset uri to return full qualified url instead of object store path by [@​eddyxu](https://togithub.com/eddyxu) in [lancedb/lance#2416 ##### New Features 🎉 - feat: new shuffler by [@​BubbleCal](https://togithub.com/BubbleCal) in [lancedb/lance#2404 - feat: new index builder by [@​BubbleCal](https://togithub.com/BubbleCal) in [lancedb/lance#2401 - feat: stable row id manifest changes by [@​wjones127](https://togithub.com/wjones127) in [lancedb/lance#2363 - feat: once a table has been created with v1 or v2 format then it should always use that format by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2435 ##### Bug Fixes 🐛 - fix: fix file writer which was not writing page buffers in the correct order by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2413 ##### Other Changes - refactor: refactor logical decoders into "field 
decoders" by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2407 - refactor: rename use_experimental_writer to use_legacy_format by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2433 - refactor: minor refactor to allow I/O scheduler to be cloned in page schedulers by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2432 **Full Changelog**: lancedb/lance@v0.11.1...v0.12.0 ### [`v0.11.1`](https://togithub.com/lancedb/lance/releases/tag/v0.11.1) [Compare Source](https://togithub.com/lancedb/lance/compare/v0.11.0...v0.11.1) <!-- Release notes generated using configuration in .github/release.yml at v0.11.1 --> #### What's Changed ##### New Features 🎉 - feat(java): support jdk8 by [@​LuQQiu](https://togithub.com/LuQQiu) in [lancedb/lance#2362 - feat: support kmode with hamming distance by [@​eddyxu](https://togithub.com/eddyxu) in [lancedb/lance#2366 - feat: row id index structures (experimental) by [@​wjones127](https://togithub.com/wjones127) in [lancedb/lance#2303 - feat: update merge_insert to add statistics for inserted, updated, deleted rows by [@​raunaks13](https://togithub.com/raunaks13) in [lancedb/lance#2357 - feat: define Flat index as a scan over VectorStorage by [@​chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in [lancedb/lance#2380 - feat: add some schema utility methods to the v2 reader/writer by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2389 - feat: general compression for value page buffer by [@​niyue](https://togithub.com/niyue) in [lancedb/lance#2368 - feat: make the index cache size (in bytes) available by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2381 - feat: add special uri scheme to use CloudFileReader for local fs by [@​chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in [lancedb/lance#2402 - feat: add encoder utilities for pushdown by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2388 ##### Bug Fixes 🐛 - fix: concat 
batches before writing to avoid small IO slow down by [@​chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in [lancedb/lance#2384 - fix: low recall if the num partitions is more than num rows by [@​BubbleCal](https://togithub.com/BubbleCal) in [lancedb/lance#2386 - fix: f32 reduce_min for x86 by [@​heiher](https://togithub.com/heiher) in [lancedb/lance#2385 - fix: fix incorrect validation logic in updater by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2408 ##### Performance Improvements 🚀 - perf: make VectorStorage and DistCalculator static to generate better code by [@​BubbleCal](https://togithub.com/BubbleCal) in [lancedb/lance#2355 - perf: optimize IO path for reading manifest by [@​wjones127](https://togithub.com/wjones127) in [lancedb/lance#2396 ##### Other Changes - refactor: make proto conversion fallible and not copy by [@​wjones127](https://togithub.com/wjones127) in [lancedb/lance#2371 - refactor: separate take and schema evolution impls to own files by [@​wjones127](https://togithub.com/wjones127) in [lancedb/lance#2372 - Revert "fix: concat batches before writing to avoid small IO slow down ([#​2384](https://togithub.com/lancedb/lance/issues/2384))" by [@​chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in [lancedb/lance#2387 - refactor: shuffle around v2 metadata sections to allow read-on-demand statistics by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2400 #### New Contributors - [@​niyue](https://togithub.com/niyue) made their first contribution in [lancedb/lance#2368 - [@​heiher](https://togithub.com/heiher) made their first contribution in [lancedb/lance#2385 **Full Changelog**: lancedb/lance@v0.11.0...v0.11.1 ### [`v0.11.0`](https://togithub.com/lancedb/lance/releases/tag/v0.11.0) [Compare Source](https://togithub.com/lancedb/lance/compare/v0.10.18...v0.11.0) <!-- Release notes generated using configuration in .github/release.yml at v0.11.0 --> #### What's Changed ##### Breaking Changes 🛠 - 
feat(rust)!: use BoxedError in Error::IO by [@​broccoliSpicy](https://togithub.com/broccoliSpicy) in [lancedb/lance#2329 ##### New Features 🎉 - feat: add v2 support to fragment merge / update paths by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2311 - feat: add priority to I/O scheduler by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2315 - feat: add take_rows operation to the v2 file reader's python bindings by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2331 - feat: added example for reading and writing dataset in rust by [@​raunaks13](https://togithub.com/raunaks13) in [lancedb/lance#2349 - feat: new HNSW implementation by [@​BubbleCal](https://togithub.com/BubbleCal) in [lancedb/lance#2353 - feat: add fragment take / fixed-size-binary support to v2 format by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2354 ##### Bug Fixes 🐛 - fix: recognize a simple expression like 'is_foo' as a scalar index query by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2356 - fix: rework list encoder to handle list-struct by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2344 - fix: minor bug fixes for v2 by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2361 ##### Documentation 📚 - docs: clearify comments in table.proto -> message DataFragment -> physical_rows by [@​broccoliSpicy](https://togithub.com/broccoliSpicy) in [lancedb/lance#2346 ##### Performance Improvements 🚀 - perf: use the file metadata cache in scalar indices by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2330 ##### Other Changes - chore: remove `m_max` and `use_heuristic` params from HNSW builder by [@​BubbleCal](https://togithub.com/BubbleCal) in [lancedb/lance#2336 - fix(java): fix JNI jar loader issue by [@​LuQQiu](https://togithub.com/LuQQiu) in [lancedb/lance#2340 - ci: fix labeler permissions by [@​wjones127](https://togithub.com/wjones127) in 
[lancedb/lance#2348 - fix: rework decoding to fix bugs in nested struct decoding by [@​westonpace](https://togithub.com/westonpace) in [lancedb/lance#2337 #### New Contributors - [@​broccoliSpicy](https://togithub.com/broccoliSpicy) made their first contribution in [lancedb/lance#2346 - [@​raunaks13](https://togithub.com/raunaks13) made their first contribution in [lancedb/lance#2349 **Full Changelog**: lancedb/lance@v0.10.18...v0.11.0 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/spiraldb/vortex). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zOTMuMCIsInVwZGF0ZWRJblZlciI6IjM3LjM5My4wIiwidGFyZ2V0QnJhbmNoIjoiZGV2ZWxvcCIsImxhYmVscyI6W119--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This PR makes extensive use of the deepsize crate for tracking the deep size of standard structures (Vec, BTreeMap, etc.). This crate does appear to be abandoned (no updates in 3 years). However, I'm unable to find anything similar, the code is fairly small should we need to vendor it in the future, and it is unlikely to be doing anything dangerous.
It turns out we store a great many things in the cache (transitively), though most of them are probably small.