feat: Implement MultipartUploadBackend on TieredStorage #458

Open

lcian wants to merge 9 commits into lcian/feat/multipart-upload-other-backends from lcian/feat/multipart-upload-tiered

Conversation


@lcian lcian commented May 4, 2026

⚠️ Stacked on #456

Last implementation of MultipartUploadBackend in objectstore-service.
Now TieredStorage requires a MultipartUploadBackend in its long_term slot.
All methods pretty much just forward to long_term, with the exception of complete_multipart, which needs to use the Changelog for consistency and deal with the tombstones. Its logic is very similar to put_object, except that we need an additional metadata read call.
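A minimal sketch of this delegation pattern — the trait surface below is illustrative only; method names and signatures are assumptions, not the actual MultipartUploadBackend API from the PR:

```rust
// Illustrative sketch: most multipart methods on TieredStorage simply
// delegate to the long_term backend, which is now required to implement
// MultipartUploadBackend.
trait MultipartUploadBackend {
    fn create_multipart(&self, key: &str) -> String;
    fn abort_multipart(&self, key: &str, upload_id: &str);
}

struct LongTerm;

impl MultipartUploadBackend for LongTerm {
    fn create_multipart(&self, key: &str) -> String {
        format!("upload-for-{key}")
    }
    fn abort_multipart(&self, _key: &str, _upload_id: &str) {}
}

struct TieredStorage<L: MultipartUploadBackend> {
    long_term: L,
}

impl<L: MultipartUploadBackend> MultipartUploadBackend for TieredStorage<L> {
    // Plain forwarding; complete_multipart (not shown) would additionally
    // consult the Changelog and perform a metadata read, as described above.
    fn create_multipart(&self, key: &str) -> String {
        self.long_term.create_multipart(key)
    }
    fn abort_multipart(&self, key: &str, upload_id: &str) {
        self.long_term.abort_multipart(key, upload_id)
    }
}
```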

Close FS-339

@lcian lcian requested a review from a team as a code owner May 4, 2026 15:36
@lcian lcian changed the title feat(gcs): implement MultipartUploadBackend using GCS XML API feat: Implement MultipartUploadBackend on TieredStorage May 4, 2026
@lcian lcian changed the base branch from main to lcian/feat/multipart-upload-other-backends May 4, 2026 15:37

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 94.60784% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.30%. Comparing base (611d2c3) to head (445d0a3).
⚠️ Report is 1 commit behind head on lcian/feat/multipart-upload-other-backends.

Files with missing lines Patch % Lines
objectstore-service/src/backend/testing.rs 0.00% 10 Missing ⚠️
objectstore-service/src/backend/mod.rs 0.00% 9 Missing ⚠️
objectstore-service/src/backend/tiered.rs 99.22% 3 Missing ⚠️
Additional details and impacted files
@@                              Coverage Diff                               @@
##           lcian/feat/multipart-upload-other-backends     #458      +/-   ##
==============================================================================
+ Coverage                                       86.91%   87.30%   +0.38%     
==============================================================================
  Files                                              77       77              
  Lines                                           11198    11603     +405     
==============================================================================
+ Hits                                             9733    10130     +397     
- Misses                                           1465     1473       +8     
Components Coverage Δ
Rust Backend 92.03% <94.60%> (+0.27%) ⬆️
Rust Client 80.56% <ø> (ø)
Python Client 87.47% <ø> (ø)

☔ View full report in Codecov by Sentry.

Comment thread objectstore-service/src/backend/tiered.rs Outdated
@lcian

This comment was marked as outdated.

@lcian lcian marked this pull request as draft May 4, 2026 15:50
lcian added 2 commits May 4, 2026 18:25
Move ChangeGuard creation before get_metadata so that if the metadata
read fails after complete_multipart succeeds, the guard cleans up the
assembled blob. Previously, manual-expiration objects could be orphaned
permanently. Also fix redundant rustdoc link.
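The ordering matters because the guard's Drop is what performs cleanup. A stand-alone sketch of the idea — `ChangeGuard` here is a simplified stand-in for the PR's actual type, not its real definition:

```rust
// Simplified stand-in for the PR's ChangeGuard: if dropped before being
// marked committed, it records a cleanup action for the assembled blob.
struct ChangeGuard<'a> {
    committed: bool,
    log: &'a mut Vec<String>,
}

impl<'a> Drop for ChangeGuard<'a> {
    fn drop(&mut self) {
        if !self.committed {
            self.log.push("cleanup: delete assembled blob".to_string());
        }
    }
}

// Create the guard *before* the fallible metadata read: if the read fails,
// the early return drops the guard and cleanup still runs.
fn complete_with_guard(log: &mut Vec<String>, metadata_ok: bool) -> Result<(), &'static str> {
    let mut guard = ChangeGuard { committed: false, log };
    if !metadata_ok {
        return Err("get_metadata failed"); // guard drops here -> cleanup
    }
    guard.committed = true; // commit succeeded; suppress cleanup
    Ok(())
}
```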
@linear-code

linear-code Bot commented May 5, 2026

Comment on lines +552 to +577
struct TieredUploadId {
    revision: String,
    upload_id: String,
}

impl TryInto<UploadId> for TieredUploadId {
    type Error = Error;

    fn try_into(self) -> Result<UploadId, Self::Error> {
        use base64::Engine;
        let json =
            serde_json::to_vec(&self).map_err(|e| Error::serde("encoding multipart token", e))?;
        Ok(base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(json))
    }
}

impl TryFrom<&UploadId> for TieredUploadId {
    type Error = Error;

    fn try_from(value: &UploadId) -> Result<Self, Self::Error> {
        let json = base64::engine::general_purpose::URL_SAFE_NO_PAD
            .decode(value.as_bytes())
            .map_err(|e| Error::generic(format!("invalid multipart upload ID: {e}")))?;
        serde_json::from_slice(&json).map_err(|e| Error::serde("decoding multipart token", e))
    }
}
Member Author

I've thought about whether we want to encrypt or otherwise obfuscate these contents to hide the physical revision and upload ID from the user, but I don't think there's really any issue (security-wise or otherwise) with just giving them to the user in the clear.

fn try_into(self) -> Result<UploadId, Self::Error> {
    use base64::Engine;
    let json =
        serde_json::to_vec(&self).map_err(|e| Error::serde("encoding multipart token", e))?;
Member Author

Maybe there's a smarter way to concatenate the two with less size overhead than JSON...
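One lighter-weight option, sketched below purely as an illustration (not part of the PR): join the two fields with a separator byte that cannot appear in either — NUL here, which is an assumption about the contents of revision and upload_id — and base64url-encode the result as before, skipping JSON's keys and quoting:

```rust
// Hypothetical compact encoding: "<revision>\0<upload_id>", assuming
// neither field can contain a NUL byte. The joined string would then be
// base64url-encoded as in the JSON version above.
fn encode_token(revision: &str, upload_id: &str) -> String {
    format!("{revision}\u{0}{upload_id}")
}

fn decode_token(token: &str) -> Option<(&str, &str)> {
    // split_once returns None if the separator is missing (malformed token)
    token.split_once('\u{0}')
}
```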

@lcian lcian marked this pull request as ready for review May 7, 2026 08:54
@lcian

This comment was marked as off-topic.

@lcian lcian requested a review from matt-codecov May 7, 2026 08:55
Comment on lines +699 to +701
.await?;

if error.is_some() {

Bug: When complete_multipart fails due to invalid parts, the code returns early without aborting the multipart upload in the backend, leaving orphaned parts and causing a resource leak.
Severity: MEDIUM

Suggested Fix

Before returning the error in the case where long_term.complete_multipart() returns Ok(Some(error)), add a call to long_term.abort_multipart(). This will ensure that any resources associated with the failed multipart upload are properly cleaned up from the backend storage, preventing resource leaks.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: objectstore-service/src/backend/tiered.rs#L699-L701

Potential issue: When the `complete_multipart` function in the long-term storage backend
fails due to an application-level error, such as invalid parts, the code returns early
at `tiered.rs:702`. This early return bypasses the necessary cleanup logic.
Consequently, the in-progress multipart upload is not aborted in the backend (e.g., GCS,
LocalFs). This leaves orphaned parts in the storage system, leading to a resource leak
that consumes storage space and quota over time.

Member Author

This is fine. The user needs to be able to retry the complete_multipart call if they supplied wrong data.
Cleanup of stale ongoing multipart uploads will be handled by each long-term backend.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 409be20. Configure here.

.await?
.ok_or_else(|| {
Error::generic("completed multipart object not found in long-term storage")
})?;

Transient metadata read failure causes irrecoverable upload loss

Medium Severity

After the inner complete_multipart succeeds (consuming the multipart upload irreversibly), the guard is advanced to Written, and then get_metadata is called. If this call fails (transient network error or returns None), the ? propagates an error and the guard drops in Written phase. The cleanup logic then deletes the successfully assembled object from long-term storage. Unlike put_long_term — where the metadata is available as a parameter and no extra call exists between the LT write and CAS commit — here the client cannot retry complete_multipart because the upload has been consumed by the inner backend. They must re-upload all parts from scratch, which can be very costly for large uploads. Encoding the expiration_policy in the TieredUploadId at initiation time would eliminate this failure point entirely.


Member Author

@lcian lcian May 7, 2026

It's very unlikely that this fails. I don't want to encode more information in the token.
