
s3: stream a layer to s3 if possible, instead of getting it then sending it #4551

Merged May 9, 2024 (1 commit)

Conversation

@azr (Contributor) commented Jan 12, 2024

Hello there !

This gives us a much flatter memory usage profile.

I took the insides of ReadBlob and made them streamable:

// ReadBlob retrieves the entire contents of the blob from the provider.
//
// Avoid using this for large blobs, such as layers.
func ReadBlob(ctx context.Context, provider Provider, desc ocispec.Descriptor) ([]byte, error) {
	if int64(len(desc.Data)) == desc.Size && digest.FromBytes(desc.Data) == desc.Digest {
		return desc.Data, nil
	}
	ra, err := provider.ReaderAt(ctx, desc)
	if err != nil {
		return nil, err
	}
	defer ra.Close()
	p := make([]byte, ra.Size())
	n, err := ra.ReadAt(p, 0)
	if err == io.EOF {
		if int64(n) != ra.Size() {
			err = io.ErrUnexpectedEOF
		} else {
			err = nil
		}
	}
	return p, err
}

In our case the code is modified a little further: we allow 5 uploads at once, which exacerbated this even more.

The s3 uploader uses the ReadSeeker to do multipart uploads, which is why I implemented ReadSeekerFromReaderAt. I wish s3 would also handle ReaderAts directly.

func (u *multiuploader) upload(firstBuf io.ReadSeeker, cleanup func()) (*UploadOutput, error) {

Edit: here are some logs with GC logging enabled:

Before (same image):

#31 preparing build cache for export
gc 75 @150.145s 0%: 0.018+1.0+0.004 ms clock, 0.29+0.14/3.9/5.5+0.071 ms cpu, 8344->8344->8344 MB, 8344 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 76 @150.188s 0%: 0.029+0.99+0.004 ms clock, 0.46+0.26/3.5/7.4+0.065 ms cpu, 8344->8344->8344 MB, 8344 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 77 @150.536s 0%: 0.035+0.99+0.003 ms clock, 0.57+0.098/3.6/7.6+0.053 ms cpu, 8344->8344->8344 MB, 8344 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 78 @150.556s 0%: 0.022+2.1+0.13 ms clock, 0.35+0.18/4.2/9.6+2.1 ms cpu, 8344->8349->8345 MB, 8344 MB goal, 0 MB stacks, 0 MB globals, 16 P
gc 79 @150.559s 0%: 0.049+1.7+0.005 ms clock, 0.78+0.15/5.3/5.5+0.091 ms cpu, 8345->8346->8345 MB, 8345 MB goal, 0 MB stacks, 0 MB globals, 16 P

After:

#31 preparing build cache for export 73.7s done
gc 9 @152.991s 0%: 0.038+0.49+0.063 ms clock, 0.61+0.054/1.0/0.87+1.0 ms cpu, 4->5->2 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 16 P

@azr azr force-pushed the azr/s3-stream-finalized-blobs branch 4 times, most recently from 2160862 to 8d017a2 Compare January 12, 2024 10:33
	}
	if err := e.s3Client.saveMutable(ctx, key, dt); err != nil {
		return nil, layerDone(errors.Wrap(err, "error writing layer blob"))
	}

	if int64(len(dgstPair.Descriptor.Data)) == dgstPair.Descriptor.Size && digest.FromBytes(dgstPair.Descriptor.Data) == dgstPair.Descriptor.Digest {
Member:

I think these digests can never have inline data. If we want to handle this case for safety then we should write a helper function (in contentutil maybe, or maybe containerd wants it) ReaderAt(ctx, Provider, desc) that also handles the inline data.

@azr (Contributor Author) Jan 18, 2024:

Good point, moving this there! (Edit: I mean trying to move things to containerd.)

@azr (Contributor Author):

Gaining a little traction over there, with a very good review simplifying the code: containerd/containerd#9657. Will update this PR if/when...

@azr (Contributor Author):

Okay, containerd/containerd#9657 was recently merged. Rebasing & refreshing this.

@azr (Contributor Author):

Oh, wait they are releasing a v2 it seems 🤔

Member:

Yeah, don't update to v2 as part of this PR. You can leave a comment that this needs to be updated after the upgrade to containerd v2, or you can just ignore this case, as inline descriptor data should be impossible atm.

@azr (Contributor Author):

Gotcha, I pushed a change without the inline data check and with a TODO; let me know if anything else is needed.

azr added a commit to azr/containerd that referenced this pull request Jan 19, 2024
A downstream library (s3) needs a read seeker to be able to do its own multipart upload.

See: moby/buildkit#4551
Signed-off-by: Adrien Delorme <azr@users.noreply.github.com>
@azr azr force-pushed the azr/s3-stream-finalized-blobs branch 6 times, most recently from 6b3df41 to 58c86e9 Compare May 7, 2024 14:33
Signed-off-by: Adrien Delorme <azr@users.noreply.github.com>
@azr azr force-pushed the azr/s3-stream-finalized-blobs branch from 58c86e9 to ac34889 Compare May 7, 2024 15:10
@tonistiigi tonistiigi merged commit 44ebf90 into moby:master May 9, 2024
74 checks passed
@azr azr deleted the azr/s3-stream-finalized-blobs branch May 13, 2024 10:31
dbro86 pushed a commit to dbro86/containerd that referenced this pull request Aug 16, 2024
A downstream library (s3) needs a read seeker to be able to do its own multipart upload.

See: moby/buildkit#4551

Signed-off-by: Adrien Delorme <azr@users.noreply.github.com>
azr added a commit to azr/containerd that referenced this pull request Aug 30, 2024
A downstream library (s3) needs a read seeker to be able to do its own multipart upload.

See: moby/buildkit#4551

Signed-off-by: Adrien Delorme <azr@users.noreply.github.com>