
fix(ecstore): harden issue3031 multipart validation path #3106

Merged
houseme merged 3 commits into main from fix/fix-issues-3031-2
May 28, 2026



houseme commented May 28, 2026

Related Issues

Fixes #3031.

Summary of Changes

  • Clear stale multipart part and part.meta destinations before the per-disk rename fan-out so repeated uploads of the same part number follow the MinIO overwrite behavior.
  • Add regression coverage for repeated multipart part overwrites and keep complete/list operations bound to the latest uploaded part state.
  • De-sensitize remote-disk startup health handling so that the first network-like failure moves the disk to Suspect before it can escalate to Offline, reducing false startup fault noise.
  • Refine remote locker diagnostics, identify scanner leader-lock traffic as unrelated local warning noise, and remove that noise from the default warning surface while preserving connection eviction behavior.
  • Add a dedicated 4-node Docker validation script for issue 3031 with S3 readiness gating, warp execution, and summary/log collection.
  • Inline serde_json::json! in the admin console version handler and drop the now-unused import without changing behavior.
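A minimal sketch of clearing stale part destinations before the per-disk rename fan-out, using only `std::fs`. It assumes a hypothetical layout in which each disk directory holds `part.<n>` and `part.<n>.meta` and each disk has its own staging directory; the real ecstore paths, staging layout, and concurrent fan-out are not shown in this PR.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::{Path, PathBuf};

/// Hypothetical on-disk names for one uploaded part: payload plus metadata.
fn part_files(part_number: u32) -> [String; 2] {
    [format!("part.{part_number}"), format!("part.{part_number}.meta")]
}

/// Delete `path` if present; a missing file is not an error.
fn remove_if_exists(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Err(e) if e.kind() != ErrorKind::NotFound => Err(e),
        _ => Ok(()),
    }
}

/// Commit a staged part to every disk. `staging[i]` pairs with `disks[i]`.
fn commit_part(staging: &[PathBuf], disks: &[PathBuf], part_number: u32) -> io::Result<()> {
    let names = part_files(part_number);
    // Phase 1: on every disk, drop leftovers from any earlier upload of the
    // same part number before any rename happens.
    for disk in disks {
        for name in &names {
            remove_if_exists(&disk.join(name))?;
        }
    }
    // Phase 2: the per-disk rename fan-out (sequential in this sketch).
    for (src_dir, disk) in staging.iter().zip(disks) {
        for name in &names {
            fs::rename(src_dir.join(name), disk.join(name))?;
        }
    }
    Ok(())
}
```

Because `std::fs::rename` already replaces an existing destination file, the explicit clear matters most when a destination may be a directory or when a new upload does not produce every file an earlier one left behind; the PR description does not say which case motivated the change.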

Verification

  • cargo test -p rustfs-ecstore repeated_upload_part_overwrites_previous_part_state -- --nocapture
  • cargo test -p rustfs-ecstore multipart -- --nocapture
  • cargo test -p rustfs-ecstore remote_disk -- --nocapture
  • cargo test -p rustfs-ecstore remote_locker -- --nocapture
  • cargo check -p rustfs-ecstore
  • bash scripts/validate_issue_3031_docker.sh --skip-build
  • bash scripts/validate_issue_3031_docker.sh --force-build
  • make pre-commit
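The validation script is a bash harness, so the following is only a Rust sketch of the readiness-gating idea it describes: poll a probe at a fixed interval and start the workload only once the probe succeeds, giving up after a deadline. The function name and the probe closure are hypothetical; the script's actual readiness check (for example, which S3 request it issues) is not described here.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll `probe` until it returns true or `timeout` elapses.
/// Returns `Some(attempts)` on success, or `None` if the deadline passes first.
fn wait_until_ready<F: FnMut() -> bool>(
    mut probe: F,
    timeout: Duration,
    interval: Duration,
) -> Option<u32> {
    let deadline = Instant::now() + timeout;
    let mut attempts = 0;
    loop {
        attempts += 1;
        if probe() {
            return Some(attempts);
        }
        // Stop if sleeping another interval would overshoot the deadline.
        if Instant::now() + interval > deadline {
            return None;
        }
        thread::sleep(interval);
    }
}
```

Gating on an explicit deadline keeps a broken cluster from hanging the harness while still tolerating slow node startup.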

Impact

  • Local 4-node Docker validation no longer reproduces issue #3031 (warp multipart-put can fail with "Storage resources are insufficient for the write operation") under the validated warp multipart-put profile.
  • Multipart overwrite handling now more closely matches MinIO semantics for repeated part uploads.
  • Remote-disk and scanner leader-lock warning noise is reduced in the default local validation surface.
  • No intended API or user-facing behavior changes beyond multipart-path hardening and quieter diagnostics.

Additional Notes

  • Final local validation artifact: target/issue3031/20260528-205330/summary.txt.
  • This validation proves the current local 4-node Docker scenario no longer reproduces the issue; it does not prove that every external 4-node environment is fixed.

houseme added 2 commits May 28, 2026 21:11
- clear stale multipart part destinations before rename fan-out
- add repeated part overwrite regression coverage
- reduce remote disk startup false-fault escalation to suspect-first
- refine remote locker diagnostics and lower scanner leader-lock log noise
- add a dedicated 4-node issue3031 docker validation script
- drop the unused serde_json::json import in admin console
- call serde_json::json! inline in version_handler
- keep the console version response behavior unchanged
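A minimal sketch of lowering scanner leader-lock log noise while keeping connection eviction unconditional. The `scanner/leader` resource prefix, the `Severity` enum, and `on_lock_rpc_failure` are hypothetical stand-ins; the PR does not state how remote_locker.rs recognizes scanner leader-lock traffic or which logging macros it uses.

```rust
/// Log severity for this sketch (stand-in for a logging crate's levels).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Debug,
    Warn,
}

/// Hypothetical resource-name prefix for the scanner's leader lock.
const SCANNER_LEADER_LOCK_PREFIX: &str = "scanner/leader";

/// What to do after a remote lock RPC fails on a connection.
#[derive(Debug, PartialEq, Eq)]
struct FailureAction {
    evict_connection: bool,
    severity: Severity,
}

fn on_lock_rpc_failure(resource: &str) -> FailureAction {
    let is_scanner_leader = resource.starts_with(SCANNER_LEADER_LOCK_PREFIX);
    FailureAction {
        // Eviction does not depend on the resource: the connection is always dropped.
        evict_connection: true,
        // Only the log level differs; scanner leader-lock failures are demoted to debug.
        severity: if is_scanner_leader { Severity::Debug } else { Severity::Warn },
    }
}
```

Keeping eviction independent of the log level is the point of the design: quieter diagnostics must not change how failed connections are handled. In a crate such as `tracing`, `Severity` would map to `debug!` versus `warn!`.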
Copilot AI review requested due to automatic review settings May 28, 2026 13:29
@github-actions

CLA requirements are satisfied for this pull request.


Copilot AI left a comment


Pull request overview

This PR hardens RustFS’s multipart upload handling (issue #3031) to better match MinIO overwrite semantics under repeated part uploads, while also refining distributed diagnostics/health handling and adding a local 4-node Docker validation workflow.

Changes:

  • Clear stale multipart part payload/metadata before per-disk rename fan-out and add a regression test for repeated part overwrites.
  • De-sensitize remote-disk health transitions (Suspect before Offline) and reduce warning noise for scanner leader-lock remote locker traffic.
  • Add a dedicated 4-node Docker validation script and small admin console cleanup (inline serde_json::json!).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • scripts/validate_issue_3031_docker.sh: Adds a 4-node docker-compose + mc/warp validation harness with readiness gating and log/summary collection.
  • .docker/compose/docker-compose.cluster.local-build.yml: Passes lock acquire timeout env vars into local 4-node cluster containers for reproducer/validation.
  • rustfs/src/admin/console.rs: Inlines serde_json::json! and removes the unused import without behavior change.
  • crates/ecstore/src/set_disk/write.rs: Clears stale multipart destination paths prior to rename fan-out to support part overwrite semantics.
  • crates/ecstore/src/bucket/lifecycle/bucket_lifecycle_ops.rs: Adds regression coverage ensuring repeated uploads of the same part number overwrite prior state.
  • crates/ecstore/src/rpc/remote_disk.rs: Adjusts remote-disk health handling toward Suspect-first behavior and updates tests accordingly.
  • crates/ecstore/src/rpc/remote_locker.rs: Refines remote lock RPC diagnostics and demotes scanner leader-lock eviction warnings to debug.

Comment threads (3) on crates/ecstore/src/rpc/remote_disk.rs, one marked Outdated
majinghe added this pull request to the merge queue May 28, 2026
houseme removed this pull request from the merge queue due to a manual request May 28, 2026
- record probe success during remote disk health checks so suspect drives recover
- use async_with_vars for the remote disk health probe test
- make the missing-listener test assert the state transition more robustly
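A minimal sketch of the Suspect-first policy, including the recovery path this commit describes: a probe success clears the failure streak so a Suspect drive returns to Online. The type names, the consecutive-failure counter, and the `offline_after` threshold are hypothetical; the PR does not say how ecstore decides when to escalate to Offline (count, time window, or error class), nor whether an Offline drive recovers the same way.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DiskHealth {
    Online,
    Suspect,
    Offline,
}

/// Hypothetical tracker: the first network-like failure marks the disk Suspect;
/// only `offline_after` consecutive failures escalate it to Offline.
struct HealthTracker {
    state: DiskHealth,
    consecutive_failures: u32,
    offline_after: u32,
}

impl HealthTracker {
    fn new(offline_after: u32) -> Self {
        Self {
            state: DiskHealth::Online,
            consecutive_failures: 0,
            // At least two failures are required, so the first can only reach Suspect.
            offline_after: offline_after.max(2),
        }
    }

    fn record_failure(&mut self) -> DiskHealth {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        self.state = if self.consecutive_failures >= self.offline_after {
            DiskHealth::Offline
        } else {
            DiskHealth::Suspect
        };
        self.state
    }

    /// A successful health probe resets the streak, so a Suspect disk recovers.
    fn record_success(&mut self) -> DiskHealth {
        self.consecutive_failures = 0;
        self.state = DiskHealth::Online;
        self.state
    }
}
```

For example, with `HealthTracker::new(3)` a single failure yields Suspect, a following probe success yields Online, and three failures in a row yield Offline.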
houseme enabled auto-merge May 28, 2026 14:07
houseme added this pull request to the merge queue May 28, 2026
Merged via the queue into main with commit 088c4bd May 28, 2026
8 checks passed
houseme deleted the fix/fix-issues-3031-2 branch May 28, 2026 14:50

Development

Successfully merging this pull request may close these issues.

warp multipart-put can fail with Storage resources are insufficient for the write operation

2 participants