Count shared snapshot extents separately#244
Conversation
|
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR modifies disk utilization accounting logic in lib/diskutilization and lib/resources, not the kernel API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) specified in the filter. To monitor this PR anyway, reply with |
hiroTamada
left a comment
There was a problem hiding this comment.
reviewed — overall looks solid, mostly questions and nits:
Questions
lib/diskutilization/diskutilization_reflink_linux.go:77-80— when FIEMAP errors orsawSharedis false, the whole file falls back tostat.Blocks. For a reflink-free file with holes/preallocated-unwritten extents, FIEMAP's private-byte sum could disagree withstat.Blocks; the fallback hides that asymmetry. Intentional?lib/diskutilization/diskutilization_reflink_linux.go:111-121— countsextent.Length(logical) for shared bytes. Fine for ext4/xfs but would over-report on btrfs compressed extents. Worth a comment if compression at the fs layer isn't expected for snapshot files?
Nits
lib/diskutilization/diskutilization_reflink_linux.go:15— consider usingunix.FS_IOC_FIEMAPrather than hard-coding0xC020660B.lib/diskutilization/diskutilization_reflink_linux_test.go:56-64—TestSharedExtentTrackerAddcovers exact-dup, partial overlaps, and disjoint. Missing zero-length, full-containment, and adjacent-merge cases — would lock downcompact()behavior if you want it.lib/diskutilization/diskutilization_reflink_linux.go:137-174—add()+compact()is O(n log n) per call, O(n²log n) over aCollectpass. Not a concern at current guest counts; flagging in case it gets reused on a hotter path.
|
the fallback is intentional: for files without FIEMAP shared extents, we keep the existing stat.Blocks behavior so sparse/preallocated edge cases don’t change as part of this patch. this only switches accounting when FIEMAP reports shared CoW extents. we also deployed this on dev-yul-hypeman-0 and checked SigNoz: snapshot_shared appeared after deploy, snapshot_uncompressed dropped from ~1.146 TB to ~397.5 GB, and scrape health looked fine. scrape_duration_seconds did not regress around the deploy (pre avg ~0.600s/max 0.812s, post avg ~0.573s/max 0.758s), samples stayed stable, and there were no disk-utilization/FIEMAP/resource-monitoring errors. |
Summary
Tests
Note
Medium Risk
Changes how snapshot disk usage is reported and relies on FIEMAP/ioctl on Linux, but non-Linux and fallback paths preserve prior st_blocks behavior.
Overview
Snapshot disk utilization now splits reflink / shared copy-on-write usage from per-snapshot private bytes. A new
snapshot_sharedcomponent reports shared physical extents once across all guests;snapshot_uncompressedandsnapshot_compressedonly get the private portion.On Linux, regular snapshot files use FIEMAP to classify extents; a cross-guest physical-range tracker deduplicates shared blocks. If FIEMAP is unavailable or no shared extents are seen, accounting falls back to existing
st_blocksbehavior (no double-count fix). Non-Linux builds keep the prior all-in-one block counting via a stub implementation.Tests cover extent deduplication and an integration path with FICLONE when the filesystem supports it.
Reviewed by Cursor Bugbot for commit 2b088a2. Bugbot is set up for automated code reviews on this repo. Configure here.