litt sharding simple sharding #3395

Merged
cody-littley merged 5 commits into main from cjl/litt-sharding on May 14, 2026

@cody-littley cody-littley commented May 5, 2026

Contributor

Describe your changes and provide context

This PR simplifies how LittDB assigns values to shards. Previously, LittDB needed the siphash library for shard assignment; with this change, that dependency is no longer required.

Testing performed to validate your change

Benchmarked this particular schema when evaluating LittDB performance for block storage.


Note

High Risk
High risk because it changes LittDB’s on-disk formats (segment metadata, key files, and keymap Address serialization) and the shard selection logic, which can affect data compatibility and read/write correctness across restarts.

Overview
This PR replaces key-hash-based sharding (with per-segment salt) with round-robin shard assignment at write time, and makes reads fully address-driven by embedding shardID (and valueSize) into the on-disk types.Address.

It updates segment/key-file/metadata serialization to a new LatestSegmentVersion (dropping legacy versions and salt fields), tightens sharding-factor validation (now capped at MaxShardingFactor = 256), removes the siphash dependency, and adjusts/extends tests and docs to match the new wire formats and shard-selection behavior (including out-of-range shardID handling).
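
A minimal sketch of what round-robin shard assignment with a capped sharding factor could look like. Only `MaxShardingFactor = 256` comes from the summary above; the `roundRobin` type and its methods are hypothetical names for illustration, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
)

// MaxShardingFactor mirrors the cap named in the PR summary. Every other
// name in this sketch is hypothetical and not taken from the LittDB code.
const MaxShardingFactor = 256

// roundRobin hands out shard IDs in rotation: 0, 1, ..., factor-1, 0, ...
type roundRobin struct {
	factor uint32
	next   uint32
}

// newRoundRobin rejects sharding factors outside [1, MaxShardingFactor],
// which guarantees that every shard ID fits in a single byte.
func newRoundRobin(factor uint32) (*roundRobin, error) {
	if factor == 0 || factor > MaxShardingFactor {
		return nil, fmt.Errorf("sharding factor %d out of range [1, %d]", factor, MaxShardingFactor)
	}
	return &roundRobin{factor: factor}, nil
}

// nextShard returns the shard for the next write. It never inspects the key,
// so no hash function (and no per-segment salt) is involved.
func (r *roundRobin) nextShard() uint8 {
	shard := uint8(r.next) // r.next < factor <= 256, so this cannot overflow
	r.next = (r.next + 1) % r.factor
	return shard
}

func main() {
	rr, err := newRoundRobin(3)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 5; i++ {
		fmt.Println(rr.nextShard())
	}
}
```

Because the shard is no longer derivable from the key, the chosen shard ID has to travel with the value's address, which is why the summary describes reads as address-driven.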

Reviewed by Cursor Bugbot for commit 9d2dd25.

@cody-littley cody-littley requested review from Kbhat1 and blindchaser May 5, 2026 19:37
@cody-littley cody-littley self-assigned this May 5, 2026
github-actions Bot commented May 5, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | May 14, 2026, 1:34 PM |

Comment thread sei-db/db_engine/litt/README.md Outdated
@@ -233,18 +233,13 @@ the [value](#value) associated with a [key](#key) can be retrieved from disk.
An address is encoded in a 64-bit integer. It contains two pieces of information:

Contributor

from ai: The implementation now serializes a 13-byte address, not a 64-bit integer, and the offset points at the value length prefix. Update this before merging.

Contributor Author

Fixed wording:

## Address

An address partially describes the location on disk where a [value](#value) is stored. Together with a [key](#key),
the [value](#value) associated with a [key](#key) can be retrieved from disk.

An address contains the following information:

- the [segment](#segment) [index](#segment-index) where the [value](#value) is stored
- the [shard](#shard) within that segment that holds the [value](#value)
- the offset within the [value file](#segment-value-files) where the first byte of
  the [value](#value) is stored
- the length of the [value](#value) in bytes

Retrieving a [value](#value) starting from an Address is a self-contained
operation that does not need to consult any segment-level metadata or recompute anything from the [key](#key).
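
As an illustration of the four fields listed above, here is one layout that adds up to the 13-byte size mentioned in the review comment. The field widths (4 + 1 + 4 + 4 bytes), the big-endian byte order, and all identifiers are assumptions for this sketch, not the actual `types.Address` encoding.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// addressSize is the 13-byte figure quoted in review. The field widths and
// byte order used below are assumptions chosen to add up to that size.
const addressSize = 13

// address is a hypothetical stand-in for the real Address type.
type address struct {
	segmentIndex uint32 // segment that holds the value
	shardID      uint8  // shard within that segment
	offset       uint32 // offset of the value within the value file
	valueSize    uint32 // length of the value in bytes
}

// serialize writes the four fields into a fixed-size buffer.
func (a address) serialize() []byte {
	buf := make([]byte, addressSize)
	binary.BigEndian.PutUint32(buf[0:4], a.segmentIndex)
	buf[4] = a.shardID
	binary.BigEndian.PutUint32(buf[5:9], a.offset)
	binary.BigEndian.PutUint32(buf[9:13], a.valueSize)
	return buf
}

// deserialize is the inverse of serialize. It rejects slices of any other
// length instead of reading past the end of the buffer.
func deserialize(buf []byte) (address, error) {
	if len(buf) != addressSize {
		return address{}, fmt.Errorf("expected %d bytes, got %d", addressSize, len(buf))
	}
	return address{
		segmentIndex: binary.BigEndian.Uint32(buf[0:4]),
		shardID:      buf[4],
		offset:       binary.BigEndian.Uint32(buf[5:9]),
		valueSize:    binary.BigEndian.Uint32(buf[9:13]),
	}, nil
}

func main() {
	a := address{segmentIndex: 7, shardID: 2, offset: 4096, valueSize: 512}
	b, err := deserialize(a.serialize())
	if err != nil {
		panic(err)
	}
	fmt.Printf("round trip preserved the address: %v\n", a == b)
}
```

With the shard and length carried in the address itself, a reader in this sketch would need nothing beyond the decoded struct to locate the value's bytes.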

}

// DeserializeAddress converts a byte slice to an Address. The slice must be exactly AddressSerializedSize bytes.
func DeserializeAddress(bytes []byte) (Address, error) {

Contributor

Issue: DeserializeAddress accepts any byte as ShardID, but these paths index s.shards without validating it is < len(s.shards). A bad keymap entry or damaged key file can turn a read/restart into a runtime panic.

Contributor Author

added defensive checks
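
The actual checks are not shown in this thread. A defensive lookup of the kind the reviewer asked for might look roughly like the sketch below, where `segment`, `shard`, and `shardFor` are hypothetical names and only the idea (validate the shard ID against `len(s.shards)` before indexing) comes from the review comment.

```go
package main

import (
	"fmt"
)

// shard and segment are hypothetical stand-ins for the real LittDB types.
type shard struct {
	name string
}

type segment struct {
	shards []*shard
}

// shardFor returns an error instead of panicking when shardID does not refer
// to an existing shard, for example after reading a damaged key file.
func (s *segment) shardFor(shardID uint8) (*shard, error) {
	if int(shardID) >= len(s.shards) {
		return nil, fmt.Errorf("shard ID %d out of range: segment has %d shards", shardID, len(s.shards))
	}
	return s.shards[shardID], nil
}

func main() {
	seg := &segment{shards: []*shard{{name: "shard-0"}, {name: "shard-1"}}}

	if sh, err := seg.shardFor(1); err == nil {
		fmt.Println("found", sh.name)
	}
	if _, err := seg.shardFor(9); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Returning an error lets the caller surface the corruption in a read or restart path rather than crashing on an index-out-of-range panic.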

@cody-littley cody-littley enabled auto-merge May 14, 2026 13:33
@cody-littley cody-littley added this pull request to the merge queue May 14, 2026
Merged via the queue into main with commit d56b207 May 14, 2026
43 checks passed
@cody-littley cody-littley deleted the cjl/litt-sharding branch May 14, 2026 14:02

3 participants