feat: make MAX_MINIBLOCK_VALUES configurable via env var #6340

Merged
westonpace merged 3 commits into lance-format:main from westonpace:feat-configurable-max-miniblock-values on Mar 30, 2026

westonpace (Member) commented Mar 30, 2026

Summary

  • Converts MAX_MINIBLOCK_VALUES from a compile-time constant to a LazyLock<u64> that reads from the LANCE_MINIBLOCK_MAX_VALUES environment variable (default 4096)
  • Updates all 6 usage sites across the encoding crate to dereference the LazyLock
  • Adds documentation in docs/src/format/file/encoding.md explaining the tuning knob and when it's useful
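
The change described in the bullets above can be sketched roughly as follows. This is a minimal illustration of the const-to-`LazyLock` pattern, not the code from the PR: `DEFAULT_MAX_MINIBLOCK_VALUES` and `chunks_needed` are hypothetical names introduced here, and this first version does not yet guard against a value of zero.

```rust
use std::sync::LazyLock;

/// Default stated in the PR description (constant name is an assumption).
const DEFAULT_MAX_MINIBLOCK_VALUES: u64 = 4096;

/// Initialized once, on first access. Changes made to the environment
/// variable after that first access are not observed.
pub static MAX_MINIBLOCK_VALUES: LazyLock<u64> = LazyLock::new(|| {
    std::env::var("LANCE_MINIBLOCK_MAX_VALUES")
        .ok()
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_MAX_MINIBLOCK_VALUES)
});

/// Hypothetical usage site: what used to be a plain `const` read now needs
/// an explicit dereference of the `LazyLock`.
/// Note: a configured value of 0 would make `div_ceil` panic here.
pub fn chunks_needed(num_values: u64) -> u64 {
    num_values.div_ceil(*MAX_MINIBLOCK_VALUES)
}
```

Because the value is cached on first use, the variable has to be set before the process first touches the encoding code for it to take effect.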

Closes #6140

Test plan

  • All existing miniblock tests pass (10 tests)
  • All existing RLE tests pass (22 tests)
  • cargo clippy -p lance-encoding --tests -- -D warnings clean
  • (Struck through and left unchecked) Verify with a custom LANCE_MINIBLOCK_MAX_VALUES value that smaller mini-blocks are produced

🤖 Generated with Claude Code

Allow tuning the maximum number of values per mini-block chunk through
the LANCE_MINIBLOCK_MAX_VALUES environment variable (default 4096).
This helps reduce read amplification for workloads that read small
contiguous row ranges from object storage where bandwidth is the
bottleneck.
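
To make the read-amplification argument concrete, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that every chunk holds exactly the configured number of values and that a reader must fetch whole chunks. Real mini-block chunks may hold fewer values, and the function names are hypothetical.

```rust
/// Number of values that must be fetched to serve `len` rows starting at
/// row `start`, assuming every chunk holds exactly `chunk_values` values
/// and chunks are always read whole (a simplifying assumption).
pub fn values_touched(start: u64, len: u64, chunk_values: u64) -> u64 {
    if len == 0 || chunk_values == 0 {
        return 0;
    }
    let first_chunk = start / chunk_values;
    let last_chunk = (start + len - 1) / chunk_values;
    (last_chunk - first_chunk + 1) * chunk_values
}

/// Ratio of values fetched to values actually requested.
pub fn read_amplification(start: u64, len: u64, chunk_values: u64) -> f64 {
    if len == 0 {
        return 0.0;
    }
    values_touched(start, len, chunk_values) as f64 / len as f64
}
```

Under these assumptions, a 10-row read starting at row 5000 touches 4096 values with the default limit and 256 values with a limit of 256.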

Closes lance-format#6140

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the enhancement (New feature or request) label on Mar 30, 2026
codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


westonpace and others added 2 commits March 30, 2026 09:46
- Clamp LANCE_MINIBLOCK_MAX_VALUES to [1, 4096] to prevent zero (infinite
  loops) or values exceeding the log_num_values format constraint
- Extract parse_max_miniblock_values() for testability
- Fix stale comment on log_num_values field to reflect configurability
- Add 5 serial tests covering: default, custom value, zero clamping,
  above-max clamping, and invalid input fallback
- Add serial_test dev-dependency to lance-encoding
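
The clamping and fallback behavior listed above might look something like the sketch below. The function name comes from the commit message, but the signature is an assumption: taking the raw value as an argument is one way to make it testable without touching the process environment, and the actual function in the PR may read the variable itself. The constant names are also hypothetical.

```rust
/// Bounds taken from the commit message: values are clamped to [1, 4096].
const MIN_MINIBLOCK_VALUES: u64 = 1;
const DEFAULT_MAX_MINIBLOCK_VALUES: u64 = 4096;

/// Assumed signature: `raw` is the environment variable's value, if set.
/// - unset or unparsable input falls back to the default
/// - zero is raised to 1 (the commit message cites infinite loops at zero)
/// - values above 4096 are lowered to 4096 (the format constraint)
pub fn parse_max_miniblock_values(raw: Option<&str>) -> u64 {
    match raw.and_then(|s| s.trim().parse::<u64>().ok()) {
        Some(v) => v.clamp(MIN_MINIBLOCK_VALUES, DEFAULT_MAX_MINIBLOCK_VALUES),
        None => DEFAULT_MAX_MINIBLOCK_VALUES,
    }
}
```

The five cases named in the commit (default, custom value, zero, above-max, invalid input) map directly onto calls such as `parse_max_miniblock_values(None)` and `parse_max_miniblock_values(Some("0"))`. Tests that instead set the real environment variable share process-global state, which is presumably why they are run serially.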

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ments

The default is appropriate for local disks and same-region cloud object
storage. Only consider lowering LANCE_MINIBLOCK_MAX_VALUES after
profiling confirms read amplification is saturating available bandwidth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@westonpace westonpace merged commit ee71812 into lance-format:main Mar 30, 2026
29 checks passed
eddyxu pushed a commit that referenced this pull request Mar 31, 2026
## Summary

- Converts `MAX_MINIBLOCK_VALUES` from a compile-time constant to a
`LazyLock<u64>` that reads from the `LANCE_MINIBLOCK_MAX_VALUES`
environment variable (default `4096`)
- Updates all 6 usage sites across the encoding crate to dereference the
`LazyLock`
- Adds documentation in `docs/src/format/file/encoding.md` explaining
the tuning knob and when it's useful

Closes #6140

## Test plan

- [x] All existing miniblock tests pass (10 tests)
- [x] All existing RLE tests pass (22 tests)
- [x] `cargo clippy -p lance-encoding --tests -- -D warnings` clean
- [ ] ~~Verify with a custom `LANCE_MINIBLOCK_MAX_VALUES` value that
smaller mini-blocks are produced~~

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Make number of miniblock values tunable

2 participants