feat: make the index cache size (in bytes) available #2381

Merged (3 commits) on May 24, 2024

westonpace
Contributor

This PR makes extensive use of the deepsize crate for tracking the deep size of standard structures (Vec, BTreeMap, etc.). This crate does appear to be abandoned (no updates in 3 years). However, I'm unable to find anything similar, the code is fairly small should we need to vendor it in the future, and it is unlikely to be doing anything dangerous.

It turns out we store many, many things in the cache (transitively), though most of them are probably small.

github-actions bot added the enhancement (New feature or request) label May 22, 2024
@westonpace
Contributor Author

I'm leaving this in draft for now because I don't quite think we are doing the right thing for arrow array memory sizes. Some of the index structures may share the same arrow array and I believe this implementation (using get_array_memory_size) will double-count. We will need to create our own version of get_array_memory_size that somehow keeps track of whether or not we've seen an array before. I'd like to at least spend a bit of time investigating if that is possible.

On the other hand, overcounting might be better than having 0 stats available, so I'm not opposed to merging this in as it is if we want the capability sooner.

+ self.params.deep_size_of_children(context)
+ self.nodes.deep_size_of_children(context)
+ self.level_count.deep_size_of_children(context)
// Skipping the visited_generator_queue
Contributor Author

@BubbleCal do you think visited_generator_queue will occupy a lot of space in memory? I wasn't sure if it was safe to skip that here.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 5.17928%, with 238 lines in your changes missing coverage. Please review.

Project coverage is 79.68%. Comparing base (83ecc01) to head (8731ae1).

| Files | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/scalar/btree.rs | 0.00% | 49 Missing ⚠️ |
| rust/lance-index/src/scalar/flat.rs | 0.00% | 18 Missing ⚠️ |
| rust/lance/src/index/cache.rs | 0.00% | 18 Missing ⚠️ |
| rust/lance/src/index/vector/pq.rs | 0.00% | 13 Missing ⚠️ |
| rust/lance-core/src/cache.rs | 50.00% | 12 Missing ⚠️ |
| rust/lance-index/src/vector/pq/storage.rs | 0.00% | 12 Missing ⚠️ |
| rust/lance-table/src/format/index.rs | 0.00% | 11 Missing ⚠️ |
| rust/lance-index/src/vector/hnsw/builder.rs | 0.00% | 8 Missing and 1 partial ⚠️ |
| rust/lance-io/src/object_store.rs | 0.00% | 9 Missing ⚠️ |
| rust/lance-index/src/vector/ivf/storage.rs | 0.00% | 8 Missing ⚠️ |
... and 25 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2381      +/-   ##
==========================================
- Coverage   80.04%   79.68%   -0.36%     
==========================================
  Files         199      199              
  Lines       54469    54713     +244     
  Branches    54469    54713     +244     
==========================================
+ Hits        43598    43600       +2     
- Misses       8349     8586     +237     
- Partials     2522     2527       +5     
Flag Coverage Δ
unittests 79.68% <5.17%> (-0.36%) ⬇️

westonpace marked this pull request as ready for review May 23, 2024 12:01
@westonpace
Contributor Author

I can't figure out a great way to get the Arrow memory size more reliably. The way deepsize avoids double counting Arc is by storing a set of Arc pointers that it has already visited. Arrow's Buffer does not expose its inner Arc<Bytes> and so we can't tie into deepsize's Context.

There is as_ptr but deepsize doesn't have a way to manually track pointers. We could create our own Context but this would mean we have to replace deepsize with our own thing which is a bit more involved than I want to do without knowing there is a problem.

So if two structures have a cloned ArrayRef we are fine. If two structures have a cloned Array we may overcount due to double counting.
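
To illustrate the visited-pointer idea described above, here is a minimal Python sketch. It is not deepsize or Arrow code: `naive_size`, `deduplicated_size`, and the `bytearray` buffers are hypothetical stand-ins, and Python's `id()` plays the role of the pointer address. It only shows how remembering already-visited buffers avoids counting a shared buffer twice.

```python
def naive_size(structures):
    """Sum buffer sizes per structure; a shared buffer is counted every time it appears."""
    return sum(len(buf) for structure in structures for buf in structure)


def deduplicated_size(structures):
    """Sum buffer sizes, counting each distinct buffer object only once."""
    seen = set()  # identities of buffers already counted (stand-in for pointer addresses)
    total = 0
    for structure in structures:
        for buf in structure:
            if id(buf) in seen:
                continue  # already counted through another structure
            seen.add(id(buf))
            total += len(buf)
    return total


# Two hypothetical index structures that share one 1024-byte buffer.
shared = bytearray(1024)
index_a = [shared, bytearray(16)]
index_b = [shared, bytearray(32)]

naive_total = naive_size([index_a, index_b])
dedup_total = deduplicated_size([index_a, index_b])
```

With these inputs the naive total counts the shared buffer twice, while the deduplicated total counts it once. The limitation discussed in this thread is that the equivalent of `id(buf)` is not reachable for Arrow buffers through deepsize's `Context`.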

@wjones127
Contributor

So if two structures have a cloned ArrayRef we are fine. If two structures have a cloned Array we may overcount due to double counting.

Honestly, this seems acceptable to me for now.

Comment on lines 1520 to 1525
def cache_size_bytes(self) -> int:
    """
    Return the total size of the index + file metadata caches in bytes.
    """
    return self._ds.cache_size_bytes()

Contributor

I wonder if we should have some sort of session handle instead. It might be too early, but would be nice to have something users could pass around and re-use across datasets.

Contributor Author

I've added a new session object. It can only report the size right now, so you can't use it to pass between datasets, but it will help avoid breaking changes in the future when we do add that capability.

ScalarValue::Decimal256(_, _, _) => 0,
ScalarValue::Int8(_) => 0,
ScalarValue::Int16(_) => 0,
ScalarValue::Int32(_) => 0,
Contributor

Why make this zero? Also, could we use ScalarValue::size()? https://docs.rs/datafusion/latest/datafusion/common/enum.ScalarValue.html#method.size

Contributor Author

Can't believe I missed that 🤦 . Yes, that will work. It's 0 because deepsize does the std::mem::size_of already so this is only supposed to account for anything above and beyond that.
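
The split between the inline size and the "children" size can be sketched in Python as follows. This is only an illustration under stated assumptions: `INLINE_SLOT_BYTES`, `heap_bytes`, and `deep_size` are hypothetical names, and the 16-byte slot is an invented figure rather than the real size of `ScalarValue`.

```python
# Hypothetical fixed size of the inline value slot, in bytes (an assumption for illustration).
INLINE_SLOT_BYTES = 16


def heap_bytes(value) -> int:
    """Bytes owned outside the inline slot, i.e. the 'children' size."""
    if isinstance(value, (bool, int, float)):
        return 0  # fixed-width values live entirely in the inline slot
    if isinstance(value, str):
        return len(value.encode("utf-8"))  # string payload is stored out of line
    if isinstance(value, bytes):
        return len(value)
    raise TypeError(f"unsupported scalar type: {type(value).__name__}")


def deep_size(value) -> int:
    """Total size: the inline slot (counted once) plus anything beyond it."""
    return INLINE_SLOT_BYTES + heap_bytes(value)


int_size = deep_size(7)
str_size = deep_size("abc")
```

Returning 0 for the fixed-width cases therefore does not mean they are free; their bytes are already included in the inline term.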

westonpace requested a review from wjones127 May 24, 2024 12:10
@@ -76,3 +76,6 @@ class LanceFileMetadata:
    num_global_buffer_bytes: int
    global_buffers: List[LanceBufferDescriptor]
    columns: List[LanceColumnMetadata]

class _Session:
    def size_bytes(self) -> int: ...
Contributor

For clarity, should it be cache_size_bytes()?

Contributor Author

I think, if we have any other stateful objects in the session, we would want to capture that here as well. In other words, it would make more sense to rename Session to DatasetCache than it would to rename size_bytes to cache_size_bytes.

Contributor

I see. For context, I'm thinking other methods we might set on the context will be execution args, like:

class Session:
    def set_cpu_pool_size(self, num_cpus: int): ...
    def set_io_pool_size(self, num_cpus: int): ...
    def set_memory_limit(self, memory_limit: int): ...
    def set_spill_location(self, path: Path): ...

So I'm thinking of the Session object as containing more state than the cache.

Contributor Author

So I'm thinking of the Session object as containing more state than the cache.

Sorry, my comment was probably confusing. I think this is good, my point should have been "it doesn't make sense to rename Session to DatasetCache and therefore doesn't make sense to rename size_bytes to cache_size_bytes".

For example, maybe DF starts to need some "thread local temp space" or "slab allocator working area" or whatever (I'm probably just making terms up at this point). This would go in the session, would not be trivial in size, and would not be considered a "cache", but we would still want to make sure that size_bytes includes it.
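
A rough Python sketch of that argument, assuming a hypothetical `Session` that owns two caches plus some non-cache working memory. None of these class or attribute names come from the Lance codebase; the only point is that `size_bytes` sums everything the session holds, not just the caches.

```python
class ByteSizedCache:
    """Hypothetical cache that can report the byte size of its entries."""

    def __init__(self):
        self._entries = {}

    def insert(self, key, value: bytes) -> None:
        self._entries[key] = value

    def size_bytes(self) -> int:
        return sum(len(value) for value in self._entries.values())


class Session:
    """Hypothetical session: holds the caches and other stateful working memory."""

    def __init__(self):
        self.index_cache = ByteSizedCache()
        self.file_metadata_cache = ByteSizedCache()
        self.scratch = bytearray()  # non-cache state, e.g. temporary working space

    def size_bytes(self) -> int:
        # Everything the session holds is counted, whether or not it is a cache.
        return (
            self.index_cache.size_bytes()
            + self.file_metadata_cache.size_bytes()
            + len(self.scratch)
        )


session = Session()
session.index_cache.insert("ivf_pq", bytes(4096))
session.file_metadata_cache.insert("data/0.lance", bytes(512))
session.scratch.extend(bytes(1000))
total_bytes = session.size_bytes()
```

Under this shape, a method named `cache_size_bytes` would under-describe the result as soon as `scratch` (or any similar state) is non-empty.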

westonpace merged commit fc9cfd0 into lancedb:main May 24, 2024
17 of 19 checks passed
renovate bot added a commit to spiraldb/vortex that referenced this pull request Jun 12, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [lance](https://togithub.com/lancedb/lance) | dependencies | minor | `0.10.16` -> `0.12.0` |

---

### Release Notes

<details>
<summary>lancedb/lance (lance)</summary>

### [`v0.12.1`](https://togithub.com/lancedb/lance/releases/tag/v0.12.1)

[Compare Source](https://togithub.com/lancedb/lance/compare/v0.12.0...v0.12.1)

#### What's Changed

##### Bug Fixes 🐛

- fix: incorrect chunking was making lance datasets use too much RAM by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2438

**Full Changelog**: lancedb/lance@v0.12.0...v0.12.1

### [`v0.12.0`](https://togithub.com/lancedb/lance/releases/tag/v0.12.0)

[Compare Source](https://togithub.com/lancedb/lance/compare/v0.11.1...v0.12.0)

#### What's Changed

##### Breaking Changes 🛠

- feat: change dataset uri to return full qualified url instead of object store path by [@eddyxu](https://togithub.com/eddyxu) in lancedb/lance#2416

##### New Features 🎉

- feat: new shuffler by [@BubbleCal](https://togithub.com/BubbleCal) in lancedb/lance#2404
- feat: new index builder by [@BubbleCal](https://togithub.com/BubbleCal) in lancedb/lance#2401
- feat: stable row id manifest changes by [@wjones127](https://togithub.com/wjones127) in lancedb/lance#2363
- feat: once a table has been created with v1 or v2 format then it should always use that format by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2435

##### Bug Fixes 🐛

- fix: fix file writer which was not writing page buffers in the correct order by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2413

##### Other Changes

- refactor: refactor logical decoders into "field decoders" by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2407
- refactor: rename use_experimental_writer to use_legacy_format by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2433
- refactor: minor refactor to allow I/O scheduler to be cloned in page schedulers by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2432

**Full Changelog**: lancedb/lance@v0.11.1...v0.12.0

### [`v0.11.1`](https://togithub.com/lancedb/lance/releases/tag/v0.11.1)

[Compare Source](https://togithub.com/lancedb/lance/compare/v0.11.0...v0.11.1)

#### What's Changed

##### New Features 🎉

- feat(java): support jdk8 by [@LuQQiu](https://togithub.com/LuQQiu) in lancedb/lance#2362
- feat: support kmode with hamming distance by [@eddyxu](https://togithub.com/eddyxu) in lancedb/lance#2366
- feat: row id index structures (experimental) by [@wjones127](https://togithub.com/wjones127) in lancedb/lance#2303
- feat: update merge_insert to add statistics for inserted, updated, deleted rows by [@raunaks13](https://togithub.com/raunaks13) in lancedb/lance#2357
- feat: define Flat index as a scan over VectorStorage by [@chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in lancedb/lance#2380
- feat: add some schema utility methods to the v2 reader/writer by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2389
- feat: general compression for value page buffer by [@niyue](https://togithub.com/niyue) in lancedb/lance#2368
- feat: make the index cache size (in bytes) available by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2381
- feat: add special uri scheme to use CloudFileReader for local fs by [@chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in lancedb/lance#2402
- feat: add encoder utilities for pushdown by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2388

##### Bug Fixes 🐛

- fix: concat batches before writing to avoid small IO slow down by [@chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in lancedb/lance#2384
- fix: low recall if the num partitions is more than num rows by [@BubbleCal](https://togithub.com/BubbleCal) in lancedb/lance#2386
- fix: f32 reduce_min for x86 by [@heiher](https://togithub.com/heiher) in lancedb/lance#2385
- fix: fix incorrect validation logic in updater by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2408

##### Performance Improvements 🚀

- perf: make VectorStorage and DistCalculator static to generate better code by [@BubbleCal](https://togithub.com/BubbleCal) in lancedb/lance#2355
- perf: optimize IO path for reading manifest by [@wjones127](https://togithub.com/wjones127) in lancedb/lance#2396

##### Other Changes

- refactor: make proto conversion fallible and not copy by [@wjones127](https://togithub.com/wjones127) in lancedb/lance#2371
- refactor: separate take and schema evolution impls to own files by [@wjones127](https://togithub.com/wjones127) in lancedb/lance#2372
- Revert "fix: concat batches before writing to avoid small IO slow down ([#2384](https://togithub.com/lancedb/lance/issues/2384))" by [@chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in lancedb/lance#2387
- refactor: shuffle around v2 metadata sections to allow read-on-demand statistics by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2400

#### New Contributors

- [@niyue](https://togithub.com/niyue) made their first contribution in lancedb/lance#2368
- [@heiher](https://togithub.com/heiher) made their first contribution in lancedb/lance#2385

**Full Changelog**: lancedb/lance@v0.11.0...v0.11.1

### [`v0.11.0`](https://togithub.com/lancedb/lance/releases/tag/v0.11.0)

[Compare Source](https://togithub.com/lancedb/lance/compare/v0.10.18...v0.11.0)

#### What's Changed

##### Breaking Changes 🛠

- feat(rust)!: use BoxedError in Error::IO by [@broccoliSpicy](https://togithub.com/broccoliSpicy) in lancedb/lance#2329

##### New Features 🎉

- feat: add v2 support to fragment merge / update paths by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2311
- feat: add priority to I/O scheduler by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2315
- feat: add take_rows operation to the v2 file reader's python bindings by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2331
- feat: added example for reading and writing dataset in rust by [@raunaks13](https://togithub.com/raunaks13) in lancedb/lance#2349
- feat: new HNSW implementation by [@BubbleCal](https://togithub.com/BubbleCal) in lancedb/lance#2353
- feat: add fragment take / fixed-size-binary support to v2 format by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2354

##### Bug Fixes 🐛

- fix: recognize a simple expression like 'is_foo' as a scalar index query by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2356
- fix: rework list encoder to handle list-struct by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2344
- fix: minor bug fixes for v2 by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2361

##### Documentation 📚

- docs: clearify comments in table.proto -> message DataFragment -> physical_rows by [@broccoliSpicy](https://togithub.com/broccoliSpicy) in lancedb/lance#2346

##### Performance Improvements 🚀

- perf: use the file metadata cache in scalar indices by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2330

##### Other Changes

- chore: remove `m_max` and `use_heuristic` params from HNSW builder by [@BubbleCal](https://togithub.com/BubbleCal) in lancedb/lance#2336
- fix(java): fix JNI jar loader issue by [@LuQQiu](https://togithub.com/LuQQiu) in lancedb/lance#2340
- ci: fix labeler permissions by [@wjones127](https://togithub.com/wjones127) in lancedb/lance#2348
- fix: rework decoding to fix bugs in nested struct decoding by [@westonpace](https://togithub.com/westonpace) in lancedb/lance#2337

#### New Contributors

- [@broccoliSpicy](https://togithub.com/broccoliSpicy) made their first contribution in lancedb/lance#2346
- [@raunaks13](https://togithub.com/raunaks13) made their first contribution in lancedb/lance#2349

**Full Changelog**: lancedb/lance@v0.10.18...v0.11.0

</details>

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
AdamGS pushed a commit to AdamGS/vortex that referenced this pull request Jun 14, 2024
Labels
enhancement New feature or request
3 participants