Skip to content

feat(fs): simplify dyn Fs object safety for per-level routing#109

Merged
polaz merged 14 commits into
mainfrom
feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r
Mar 23, 2026
Merged

feat(fs): simplify dyn Fs object safety for per-level routing#109
polaz merged 14 commits into
mainfrom
feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented Mar 22, 2026

Summary

Remove associated types (File, ReadDir) from the Fs trait so that Arc<dyn Fs> works without specifying type parameters — enabling ergonomic per-level filesystem routing.

  • Fs::open() now returns Box<dyn FsFile> (allocation overhead is negligible for syscall-backed implementations like StdFs)
  • Fs::read_dir() now returns Vec<FsDirEntry> (cold-path only: recovery, compaction file listing)
  • Remove StdReadDir public type (logic inlined into StdFs::read_dir)

Before: Arc<dyn Fs<File = std::fs::File, ReadDir = StdReadDir>>
After: Arc<dyn Fs>

Changes

  • src/fs/mod.rs — remove type File and type ReadDir associated types, update method signatures and object-safety doc
  • src/fs/std_fs.rs — update StdFs impl, remove StdReadDir, update tests

Testing

All 429 unit tests + integration tests pass. Object-safety test updated to assert simple Arc<dyn Fs>.

Closes #92

Copilot AI review requested due to automatic review settings March 22, 2026 19:44
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 22, 2026

Warning

Rate limit exceeded

@polaz has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 55 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3d0809a8-23ba-40b5-be8d-4eeacf59449e

📥 Commits

Reviewing files that changed from the base of the PR and between 814d4f4 and ee9a0ee.

📒 Files selected for processing (2)
  • src/fs/mod.rs
  • src/fs/std_fs.rs
📝 Walkthrough

Walkthrough

Removed Fs associated types to make dyn Fs object-safe: open now returns io::Result<Box<dyn FsFile>> and read_dir returns io::Result<Vec<FsDirEntry>>. The StdReadDir wrapper/iterator was removed and StdFs/tests updated for eager directory collection and boxed file handles.

Changes

Cohort / File(s) Summary
Trait surface
src/fs/mod.rs
Removed StdReadDir re-export. Removed associated types File and ReadDir from Fs. Changed open(&self, ...)io::Result<Box<dyn FsFile>>. Changed read_dir(&self, ...)io::Result<Vec<FsDirEntry>>.
StdFs implementation & tests
src/fs/std_fs.rs
Removed StdReadDir type and its Iterator impl. StdFs::open now boxes returned files as Box<dyn FsFile>. StdFs::read_dir now eagerly reads entries, converts each to FsDirEntry, maps non-UTF-8 names to InvalidData with directory path, and returns Vec<FsDirEntry>. Tests adjusted (including Linux-only non-UTF-8 filename case).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I boxed a file and gathered the trail,
Entries in a vector — a tidy mail,
No more type params to bend or fight,
UTF-8 names now must be right,
I hop on, code gleaming in moonlight. 🌙

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: removing associated types from the Fs trait to simplify object safety, enabling Arc without type parameters for per-level routing.
Description check ✅ Passed The description is directly related to the changeset, explaining the rationale for removing associated types, documenting the new return types, and covering the trade-offs.
Linked Issues check ✅ Passed The PR fully implements the objectives from #92: removes associated types File and ReadDir from Fs trait, changes Fs::open to return Box, changes Fs::read_dir to return Vec, removes StdReadDir, and documents accepted performance trade-offs.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the scope of #92 and #75: trait refactoring, implementation updates, test modifications, and removal of the StdReadDir type are all necessary to achieve the object-safety simplification.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR simplifies the filesystem abstraction to make Arc<dyn Fs> ergonomic for per-level filesystem routing by removing Fs trait associated types and type-erasing returned handles/dir listings.

Changes:

  • Remove File/ReadDir associated types from Fs, making it directly object-safe as dyn Fs.
  • Change Fs::open() to return Box<dyn FsFile> and Fs::read_dir() to return Vec<FsDirEntry>.
  • Update StdFs implementation and tests accordingly, removing the now-unneeded StdReadDir type.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/fs/mod.rs Updates the Fs trait API to be object-safe without associated types and adjusts docs/signatures for boxed file handles and eager directory listing.
src/fs/std_fs.rs Implements the new Fs API for StdFs, inlines directory-entry mapping, and updates tests to use the new return types.

Comment thread src/fs/mod.rs Outdated
@polaz polaz force-pushed the feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r branch 2 times, most recently from 37805e6 to 99f65f9 Compare March 22, 2026 20:14
@polaz polaz requested a review from Copilot March 22, 2026 20:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread src/fs/mod.rs
Comment thread src/fs/std_fs.rs
Comment thread src/fs/std_fs.rs Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread src/fs/mod.rs Outdated
Comment thread src/fs/mod.rs Outdated
Comment thread src/fs/std_fs.rs
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@polaz polaz requested a review from Copilot March 22, 2026 21:35
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread src/fs/mod.rs Outdated
@polaz polaz force-pushed the feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r branch from 4d8c9f9 to 718a56d Compare March 22, 2026 21:48
@polaz polaz requested a review from Copilot March 22, 2026 22:08
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/fs/std_fs.rs
Comment thread src/fs/std_fs.rs
@polaz polaz force-pushed the feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r branch from a1ac3f6 to 44832fd Compare March 22, 2026 22:27
@polaz polaz requested a review from Copilot March 22, 2026 22:30
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/fs/mod.rs
Comment thread src/fs/std_fs.rs Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread src/fs/mod.rs
Comment thread src/fs/mod.rs
Comment thread src/fs/mod.rs Outdated
@polaz polaz force-pushed the feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r branch from 94a1a77 to ce60753 Compare March 23, 2026 00:37
Copy link
Copy Markdown

@sw-release-bot sw-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite Current: ee9a0ee Previous: c1f5511 Ratio
readseq 2291224.4054624946 ops/sec 3035687.9728046074 ops/sec 1.32
prefixscan 182812.71589896103 ops/sec 220381.37949769635 ops/sec 1.21

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

polaz added 13 commits March 23, 2026 16:10
Copilot correctly noted that hardcoded timing figures (~50ns vs ~5μs)
are specific to syscall-backed implementations, not universal for all
Fs backends (e.g., in-memory FS). Reword to qualify the claim.
Add the directory path and offending OsString to the error message
so callers can identify which directory contains the problematic entry.
Add inline comments explaining:
- Box<dyn FsFile> is intentionally 'static (owned handles, not borrowed)
- Eager collect fail-fast on bad entries is intentional (corruption signal)
Use path.display() for Path and suppress unnecessary_debug_formatting
for OsString which has no Display impl.
Cover the error path in StdFs::read_dir that rejects non-UTF-8
filenames. Only runs on Linux — macOS/APFS rejects invalid UTF-8
at the filesystem level.

Also clarify Box<dyn FsFile> lifetime comment wording.
Add explicit note that Vec<FsDirEntry> order is unspecified and
callers must sort if needed.
Move #[expect(clippy::unnecessary_debug_formatting)] from the let
binding to the format! macro where the lint actually fires.
Some Linux filesystems (overlay, container mounts) reject invalid
byte sequences at file creation. Skip the test gracefully instead
of panicking on environments that don't support the test scenario.
Strengthen doc comment to clarify that Vec is chosen because LSM
directory sizes are bounded (level count), not unbounded.
Directory sizes are bounded by segment count, not level count.
Simplify to "expected to remain small" which is accurate.
Simplify the guard to skip on any write error, not just InvalidInput.
Container/overlay mounts may return various error kinds when
rejecting non-UTF-8 filenames.
Move rationale from adjacent comment into the expect attribute's
reason field, matching the pattern used elsewhere in the crate.
Replace unwrap() with ? and unwrap_err() with match to comply
with crate-wide #![deny(clippy::unwrap_used)].
@polaz polaz force-pushed the feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r branch from b0f7fde to ee9a0ee Compare March 23, 2026 14:12
@polaz polaz requested a review from Copilot March 23, 2026 14:18
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/fs/std_fs.rs
Comment thread src/fs/mod.rs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@polaz
Copy link
Copy Markdown
Member Author

polaz commented Mar 23, 2026

@coderabbitai re-review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 23, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat src/fs/mod.rs
echo "=== STD_FS ==="
cat src/fs/std_fs.rs

Length of output: 28986


@polaz: Full re-review complete. Here is my assessment:


✅ Object safety

Fs has no associated types; Arc<dyn Fs> compiles without qualification. The object_safety test exercises this at runtime with an actual exists() call.

open / read_dir signatures

  • openBox<dyn FsFile> with a 'static-lifetime file handle. The inline comment explaining why 'static is correct is a good addition.
  • read_dirVec<FsDirEntry>. Fail-fast semantics are documented at the trait level and in the inline comment. Non-UTF-8 error messages include both the offending OsString and the directory path.

read_at implementation

Fill-or-EOF loop with EINTR retry is correct on both Unix (FileExt::read_at) and Windows (FileExt::seek_read, which does not update the file pointer — pread-equivalent). The #[expect(clippy::expect_used, reason = "filled < buf.len() by loop guard")] is correctly scoped to the one .expect() whose invariant is proved by the loop condition.

lock_exclusive FFI

LOCK_EX = 2 is the correct POSIX value. The Windows LockFileEx path uses LOCKFILE_EXCLUSIVE_LOCK = 0x0000_0002 with a properly zeroed Overlapped and a valid u32::MAX lock range. Both #[expect(unsafe_code, reason = "...")] attributes are well-scoped.

sync_directory

Correctly no-ops on Windows (with a comment matching the existing convention in the crate). Works on non-Windows via File::open + is_dir() guard + sync_all().

Fs::exists

Returns io::Result<bool> via path.try_exists() — matches the corrected learning recorded after the last re-review.

✅ Lint policy

All new #[allow(...)] uses are #[expect(..., reason = "...")]. No regressions against the repo policy (learning 4358e71e).

✅ Error handling

No unwrap()/expect() in non-test production paths. All I/O uses ?.

✅ Tests

Full coverage: I/O, metadata, set_len, truncate/append, lock_exclusive, sync_data, sync_directory, read_at at multiple offsets, FsDirEntry field correctness, object_safety assertion, and the Linux-only non-UTF-8 graceful-skip test.


No blocking concerns. The implementation is correct and the PR is ready to merge.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-23T11:30:24.031Z
Learning: In `structured-world/lsm-tree` (`src/fs/mod.rs`), `Fs::exists` returns `io::Result<bool>` (not `bool`), implemented via `path.try_exists()`, and propagates I/O/permission errors to the caller. This signature has been in place since it was introduced in PR `#80` and was not changed by PR `#109`. Do not flag `Fs::exists` as returning a bare `bool` or as collapsing errors to `false` in any future PR review.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-15T13:24:43.905Z
Learning: In the lsm-tree repository, the `unsafe { Slice::builder_unzeroed }` pattern used in lz4 decompression is pre-existing upstream code (not introduced by any feature PR). It is tracked for a separate fix in issue `#2`. The zstd decompression path uses the safe `Vec`-based approach instead.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 23
File: .github/instructions/rust.instructions.md:0-0
Timestamp: 2026-03-17T00:34:54.145Z
Learning: In `structured-world/lsm-tree`, the lint suppression policy distinguishes between legacy and new code:
- Existing/upstream `#[allow(clippy::...)]` in **unchanged** code: treat as legacy — do **not** flag it in reviews.
- New code introduced in a PR **must** use `#[expect(clippy::...)]` with a justification reason.
This avoids churning upstream code while enforcing the `#[expect]` standard for all new contributions.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 100
File: tests/custom_comparator_compaction.rs:421-447
Timestamp: 2026-03-22T20:07:13.916Z
Learning: In `structured-world/lsm-tree` (`tests/custom_comparator_compaction.rs`), the two range-scan integration tests (`reverse_comparator_range_scan_after_compaction` and the second range-scan test) are intentionally `#[ignore]`d in PR `#100` because `RunReader::new()` still uses lexicographic `range_overlap_indexes` rather than the comparator-aware variant for on-disk range reads. Threading the `SharedComparator` through `RunReader` is a known deferred limitation documented in the PR body and tracked as a follow-up. Do not flag the ignored range-scan tests or the absence of comparator-aware `RunReader` overlap logic as a blocking concern in PR `#100` or future PRs until that follow-up issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 12
File: src/table/block/mod.rs:109-119
Timestamp: 2026-03-15T16:10:30.764Z
Learning: In `structured-world/lsm-tree`, pre-allocation bounds checks on `uncompressed_length`, `data_length`, `handle.size()`, and `real_val_len` (256 MiB cap, returning `DecompressedSizeTooLarge`) are intentionally deferred to PR `#7` (branch `feat/#258-security-validate-uncompressedlength-before-decomp`, closes issues `#258` and `#257`). Do not flag missing size-cap guards in `Block::from_reader`, `Block::from_file`, or `blob_file::Reader::get` as unaddressed — they are handled in that separate security PR.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/tree/mod.rs:913-956
Timestamp: 2026-03-22T03:09:14.362Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs`), the `resolve_merge_get` function's disk-scan loop in step 3 is a flat scan across all levels/runs and does not apply the L0-specific newest-first ordering used by `get_internal_entry_from_tables`. This means it can stop on a base/tombstone boundary from an older L0 run while a newer overlapping L0 run still has visible `MergeOperand` entries. This is a known limitation documented with `// TODO(`#46`)` in the code and is intentionally deferred to issue `#46`. Do not flag the flat L0 scan in `resolve_merge_get` as a merge-semantics bug in PR `#28` or future PRs until issue `#46` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 108
File: src/compaction/leveled/test.rs:681-727
Timestamp: 2026-03-23T00:40:18.897Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/test.rs`), the regression test `multi_level_sparse_keyspace_data_integrity` (added in PR `#108` for issue `#72`) uses full-span SSTs (writes both "a" and "z" in each flush) rather than genuinely disjoint key ranges. This is a known, accepted coverage gap: at unit-test data volumes the force-trivial-move scoring (99.99) cascades all intermediate levels to Lmax, making it structurally impossible to simultaneously populate L1 and L2 in a unit test. The test verifies data integrity (read-back correctness) only, not the per-range L2 sparse-overlap optimization. Do not flag the full-span flush pattern in this test as a deficiency in PR `#108` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 117
File: src/tree/ingest.rs:325-330
Timestamp: 2026-03-22T22:44:15.807Z
Learning: In `structured-world/lsm-tree` (`src/tree/ingest.rs`), the write-path monotonicity guards in `Ingestion::write`, `write_tombstone`, `write_weak_tombstone`, and `write_indirection` use plain lexicographic `key > *prev` comparisons instead of the configured `UserComparator`. This is a pre-existing issue acknowledged as valid but intentionally out of scope for PR `#98` (issue `#98` covers `Run::push()` and related methods). It is tracked in a separate follow-up issue. Do not flag the lexicographic ordering guards in `src/tree/ingest.rs` ingestion write paths as a concern in PR `#98` or future PRs until that separate issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.298Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:11.575Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.708Z
Learning: Applies to **/*.rs : Before flagging any potential issue, trace the call chain visible in the PR diff. Check the caller to see if it handles the case, verify type-level guarantees, and read adjacent comments about deliberate design decisions.

@polaz polaz merged commit d7279d3 into main Mar 23, 2026
23 of 24 checks passed
@polaz polaz deleted the feat/#92-feat-simplify-dyn-fs-object-safety-for-per-level-r branch March 23, 2026 15:38
@sw-release-bot sw-release-bot Bot mentioned this pull request Mar 23, 2026
polaz added a commit that referenced this pull request Mar 23, 2026
Replace generic FS: Fs type parameters with Arc<dyn Fs> / &dyn Fs
throughout Writer, MultiWriter, and helpers, following the Fs trait
simplification in #109 that removed associated types in favor of
Box<dyn FsFile> returns.
polaz added a commit that referenced this pull request Mar 23, 2026
Replace generic FS: Fs type parameters with Arc<dyn Fs> / &dyn Fs
throughout Writer, MultiWriter, and helpers, following the Fs trait
simplification in #109 that removed associated types in favor of
Box<dyn FsFile> returns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: simplify dyn Fs object safety for per-level routing

2 participants