Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not leave 0-byte multipart files in Cloud Storage #602

Closed
wants to merge 1 commit into from
Closed

Do not leave 0-byte multipart files in Cloud Storage #602

wants to merge 1 commit into from

Conversation

kurtismullins
Copy link
Contributor

@kurtismullins kurtismullins commented Nov 20, 2020

Issue: https://issues.redhat.com/browse/PROJQUAY-1123

Changelog: fix: Do not leave orphaned 0-byte multi-part uploads in Cloud Storage

Docs: n/a

Testing:

  • Push images. Ensure there are no objects still sitting in an "incomplete" state after the image push is finalized successfully. There is a separate case where objects with > 0 bytes are left in the storage. This does not fix that issue.

Details:

  • Prior to this Pull Request, zero-byte objects are created and left in a multi-part pending state. Due to a conditional statement in the code, those are never cleaned up. This Pull Request takes the approach of simply cleaning up those objects immediately. Eventually, the Cloud Storage logic should be refactored or fixed to never write these abandoned files in the first place.

@thomasmckay
Copy link
Contributor

When I tried reproducing this, I would start podman push then ctrl-c it part way through. This left non-zero sized files in the /uploads dir. When are those cleaned up?

@kurtismullins
Copy link
Contributor Author

@thomasmckay

When I tried reproducing this, I would start podman push then ctrl-c it part way through. This left non-zero sized files in the /uploads dir. When are those cleaned up?

That's a good question. I'm not sure off-hand.

This Pull Request addresess the issue where multi-part uploads are initiated but never marked as completed. In Noobaa/RHOCS, this will be shown as "incomplete" uploads. If Quay is creating those objects in the /uploads directory and leaving them as "in-progress" multi-part objects, then this should resolve that.

If the files you're talking about are finalized, then that requires a different mechanism to go through and "garbage collect" the uploads folder. I'm not sure if anything like that exists in Quay. I would consider that a separate issue, if not.

@kurtismullins kurtismullins changed the title [PROJQUAY-1123] fix: Do not create and orphan Cloud storage objects Do not leave 0-byte multipart files in Cloud Storage Dec 17, 2020
@@ -396,6 +396,13 @@ def initiate_chunked_upload(self):
def stream_upload_chunk(self, uuid, offset, length, in_fp, storage_metadata, content_type=None):
self._initialize_cloud_conn()

# Do not create zero-byte objects.
# TODO: length=-1 appears to be a special case and used in tests. Why? Document it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Length -1 means to write the whole in_fp stream, vs just length bytes if set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. Thank you!

@@ -408,6 +415,12 @@ def stream_upload_chunk(self, uuid, offset, length, in_fp, storage_metadata, con
if bytes_written > 0:
new_metadata[_CHUNKS_KEY].append(_PartUploadMetadata(chunk_path, offset, bytes_written))

# Do not leave zero-byte chunks in Object Storage
# TODO: Determine a clean way to eliminate calling this function when it's not needed.
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woudn't the above check for zero-length prevent these zero bytes objects to be written in the first place. i.e Don't need to remove them anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so. In the case where it's explicitly defined as 0 bytes when this file is called, it is short-circuited.

In other cases, I believe that the -1 you explained earlier leads to the creation and writing of a 0 byte file. I'm not exactly sure if that's how they get created, but that initial short-circuit did not fix it for all cases during my personal testing.

upload_id, 0, -1, BytesIO(data), metadata
)
assert len(data) == bytes_written
assert mock_remove.call_count == 1
Copy link
Member

@kleesc kleesc Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, shouldn't need to remove anymore, since it's not being written anymore?

@kleesc
Copy link
Member

kleesc commented Dec 17, 2020

@kurtismullins Is this just an issue with Nooba. What is causing the zero-bytes to be written in the first place?

@kurtismullins
Copy link
Contributor Author

kurtismullins commented Dec 17, 2020

@kurtismullins Is this just an issue with Nooba. What is causing the zero-bytes to be written in the first place?

@kleesc IIRC, Ivan has witnessed this on AWS S3. I can also see it in a GCP Object Storage bucket I'm using for a long-running Quay deployment.

re: Why is this happening? The chunks are created optimistically even if they don't contain data. I believe that question (and solution) will require a larger body of work (refactor) to solve correctly. This PR is primarily a "hack" work-around until a real solution can be prioritized and implemented.

@kurtismullins kurtismullins marked this pull request as ready for review December 17, 2020 19:38
@kleesc kleesc closed this Mar 24, 2021
@kleesc
Copy link
Member

kleesc commented Mar 24, 2021

Resolved with #705

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants