feat: Implement MultipartUploadBackend on TieredStorage #458

lcian wants to merge 9 commits into lcian/feat/multipart-upload-other-backends from …
Conversation
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

```diff
@@                      Coverage Diff                        @@
##  lcian/feat/multipart-upload-other-backends   #458   +/- ##
=============================================================
+ Coverage    86.91%   87.30%   +0.38%
=============================================================
  Files           77       77
  Lines        11198    11603    +405
=============================================================
+ Hits          9733    10130    +397
- Misses        1465     1473      +8
```

☔ View full report in Codecov by Sentry.
This comment was marked as outdated.
Move `ChangeGuard` creation before `get_metadata` so that if the metadata read fails after `complete_multipart` succeeds, the guard cleans up the assembled blob. Previously, `Manual`-expiration objects could be orphaned permanently. Also fix redundant rustdoc link.
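In sketch form, the change amounts to swapping two steps. The `ChangeGuard::new` and `commit` calls below are hypothetical stand-ins, not the actual tiered.rs API:

```rust
// Before (sketch): a failed metadata read after assembly orphaned the blob.
// let metadata = self.long_term.get_metadata(&id.revision).await?; // <- failure leaked the blob
// let guard = ChangeGuard::new(&self.changelog, &id.revision);

// After (sketch): the guard exists first, so its Drop cleans up the
// assembled blob if the metadata read fails.
let guard = ChangeGuard::new(&self.changelog, &id.revision);
let metadata = self.long_term.get_metadata(&id.revision).await?;
guard.commit(metadata).await?;
```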
```rust
/// Pairs the changelog revision with the inner backend's upload ID. The pair
/// is serialized to JSON and base64-encoded to form the opaque `UploadId`
/// token handed to clients.
#[derive(serde::Serialize, serde::Deserialize)]
struct TieredUploadId {
    revision: String,
    upload_id: String,
}

impl TryInto<UploadId> for TieredUploadId {
    type Error = Error;

    fn try_into(self) -> Result<UploadId, Self::Error> {
        use base64::Engine;
        let json =
            serde_json::to_vec(&self).map_err(|e| Error::serde("encoding multipart token", e))?;
        Ok(base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(json))
    }
}

impl TryFrom<&UploadId> for TieredUploadId {
    type Error = Error;

    fn try_from(value: &UploadId) -> Result<Self, Self::Error> {
        // `Engine` must be in scope for `decode` as well.
        use base64::Engine;
        let json = base64::engine::general_purpose::URL_SAFE_NO_PAD
            .decode(value.as_bytes())
            .map_err(|e| Error::generic(format!("invalid multipart upload ID: {e}")))?;
        serde_json::from_slice(&json).map_err(|e| Error::serde("decoding multipart token", e))
    }
}
```
I've thought about whether we want to encrypt or otherwise obfuscate these contents to hide the physical revision and upload ID from the user, but I don't think there's really any issue (security-wise or otherwise) with just giving them to the user in the clear.
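For illustration, a round trip through the token encoding might look like this (assuming `UploadId` is an alias for `String`, which the `encode` return value suggests but the diff doesn't show; the field values are made up):

```rust
// Hypothetical usage of the conversions above.
let token: UploadId = TieredUploadId {
    revision: "rev-abc123".into(),
    upload_id: "inner-upload-42".into(),
}
.try_into()?;

// The token is just URL-safe base64 over JSON, so anyone holding it can
// decode the physical revision and upload ID, which is fine per the comment
// above since neither is a secret.
let decoded = TieredUploadId::try_from(&token)?;
assert_eq!(decoded.revision, "rev-abc123");
assert_eq!(decoded.upload_id, "inner-upload-42");
```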
```rust
fn try_into(self) -> Result<UploadId, Self::Error> {
    use base64::Engine;
    let json =
        serde_json::to_vec(&self).map_err(|e| Error::serde("encoding multipart token", e))?;
```
Maybe there's a smarter way to concatenate the two with less size overhead than JSON...
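One possibility, sketched here purely as an illustration (the framing, function names, and the exact `Error` constructor signatures are assumptions, not part of the PR): length-prefix the revision and concatenate the raw bytes, skipping JSON's field names and quoting.

```rust
use base64::Engine;

// Hypothetical compact framing:
// [2-byte big-endian revision length][revision bytes][upload_id bytes]
fn encode_compact(id: &TieredUploadId) -> Result<UploadId, Error> {
    let mut buf = Vec::with_capacity(2 + id.revision.len() + id.upload_id.len());
    buf.extend_from_slice(&(id.revision.len() as u16).to_be_bytes());
    buf.extend_from_slice(id.revision.as_bytes());
    buf.extend_from_slice(id.upload_id.as_bytes());
    Ok(base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(buf))
}

fn decode_compact(token: &UploadId) -> Result<TieredUploadId, Error> {
    let bytes = base64::engine::general_purpose::URL_SAFE_NO_PAD
        .decode(token.as_bytes())
        .map_err(|e| Error::generic(format!("invalid multipart upload ID: {e}")))?;
    if bytes.len() < 2 {
        return Err(Error::generic(format!("multipart token too short")));
    }
    let len = u16::from_be_bytes([bytes[0], bytes[1]]) as usize;
    if bytes.len() < 2 + len {
        return Err(Error::generic(format!("multipart token truncated")));
    }
    Ok(TieredUploadId {
        revision: String::from_utf8(bytes[2..2 + len].to_vec())
            .map_err(|e| Error::generic(format!("invalid multipart token: {e}")))?,
        upload_id: String::from_utf8(bytes[2 + len..].to_vec())
            .map_err(|e| Error::generic(format!("invalid multipart token: {e}")))?,
    })
}
```

JSON spends a few dozen bytes on field names and punctuation before base64 expands everything by a third, so the saving is real but modest.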
This comment was marked as off-topic.
…feat/multipart-upload-tiered
```rust
    .await?;

if error.is_some() {
```
**Bug**: When `complete_multipart` fails due to invalid parts, the code returns early without aborting the multipart upload in the backend, leaving orphaned parts and causing a resource leak.

**Severity**: MEDIUM

**Suggested Fix**

Before returning the error in the case where `long_term.complete_multipart()` returns `Ok(Some(error))`, add a call to `long_term.abort_multipart()`. This ensures that any resources associated with the failed multipart upload are properly cleaned up in the backend storage, preventing resource leaks.

**Prompt for AI Agent**

> Review the code at the location below. A potential bug has been identified by an AI agent. Verify whether this is a real issue. If it is, propose a fix; if not, explain why it's not valid.
>
> Location: objectstore-service/src/backend/tiered.rs#L699-L701
>
> Potential issue: When the `complete_multipart` function in the long-term storage backend fails due to an application-level error, such as invalid parts, the code returns early at `tiered.rs:702`. This early return bypasses the necessary cleanup logic. Consequently, the in-progress multipart upload is not aborted in the backend (e.g., GCS, LocalFs). This leaves orphaned parts in the storage system, leading to a resource leak that consumes storage space and quota over time.
This is fine. The user needs to be able to retry the `complete_multipart` call if they supplied wrong data.

Cleanup of stale in-progress multipart uploads will be handled by each long-term backend.
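Purely as a hypothetical illustration of what such per-backend cleanup could look like (none of these names, types, or signatures are from this PR; only `abort_multipart` is mentioned in the discussion above):

```rust
use std::time::{Duration, SystemTime};

// Hypothetical TTL after which an in-progress multipart upload counts as stale.
const STALE_UPLOAD_TTL: Duration = Duration::from_secs(24 * 60 * 60);

// Hypothetical sweep: the backend lists its in-progress uploads and aborts
// any that have seen no activity within the TTL, reclaiming the parts.
async fn sweep_stale_uploads(backend: &LongTermBackend) -> Result<(), Error> {
    for upload in backend.list_multipart_uploads().await? {
        if upload.last_activity + STALE_UPLOAD_TTL < SystemTime::now() {
            backend.abort_multipart(&upload.id).await?;
        }
    }
    Ok(())
}
```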
Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 409be20.
```rust
    .await?
    .ok_or_else(|| {
        Error::generic("completed multipart object not found in long-term storage")
    })?;
```
**Transient metadata read failure causes irrecoverable upload loss** (Medium Severity)

After the inner `complete_multipart` succeeds (consuming the multipart upload irreversibly), the guard is advanced to `Written`, and then `get_metadata` is called. If this call fails with a transient network error, or returns `None`, the `?` propagates an error and the guard drops in the `Written` phase. The cleanup logic then deletes the successfully assembled object from long-term storage. Unlike `put_long_term`, where the metadata is available as a parameter and no extra call sits between the long-term write and the CAS commit, here the client cannot retry `complete_multipart` because the upload has already been consumed by the inner backend. They must re-upload all parts from scratch, which can be very costly for large uploads. Encoding the `expiration_policy` in the `TieredUploadId` at initiation time would eliminate this failure point entirely.
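For concreteness, the proposal would amount to something like this hypothetical token shape (not what the PR implements, as the author notes below; the `ExpirationPolicy` type name is an assumption):

```rust
// Hypothetical variant: carrying the expiration policy in the token would
// let complete_multipart skip the extra get_metadata call entirely.
#[derive(serde::Serialize, serde::Deserialize)]
struct TieredUploadId {
    revision: String,
    upload_id: String,
    expiration_policy: ExpirationPolicy, // captured at initiate_multipart time
}
```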
It's very unlikely that this fails. I don't want to encode more information in the token.


Last implementation of `MultipartUploadBackend` in `objectstore-service`.

Now `TieredStorage` requires a `MultipartUploadBackend` in its `long_term` slot.

All methods pretty much just forward to `long_term`, with the exception of `complete_multipart`, which needs to use the `Changelog` for consistency and deal with the tombstones, with logic very similar to `put_object` except that we need an additional metadata read call.

Closes FS-339
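A condensed sketch of that `complete_multipart` flow, stitched together from the excerpts above; the signature, the `Part` and `MultipartError` types, and the `ChangeGuard` methods are assumptions, not the PR's actual code:

```rust
// Sketch only: approximates the flow discussed in this PR.
async fn complete_multipart(
    &self,
    token: &UploadId,
    parts: Vec<Part>,
) -> Result<Option<MultipartError>, Error> {
    // Recover the physical revision and the inner backend's upload ID.
    let id = TieredUploadId::try_from(token)?;

    // Forward assembly to the long-term backend. An application-level error
    // (e.g. invalid parts) is handed back without aborting, so the caller
    // can retry complete_multipart with corrected data.
    let error = self
        .long_term
        .complete_multipart(&id.upload_id, parts)
        .await?;
    if error.is_some() {
        return Ok(error);
    }

    // Guard the assembled blob so that any failure below cleans it up.
    let guard = ChangeGuard::new(&self.changelog, &id.revision);

    // The extra read relative to put_object: metadata isn't a parameter here.
    let metadata = self
        .long_term
        .get_metadata(&id.revision)
        .await?
        .ok_or_else(|| {
            Error::generic("completed multipart object not found in long-term storage")
        })?;

    // Commit through the Changelog, handling tombstones like put_object does.
    guard.commit(metadata).await?;
    Ok(None)
}
```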