
serve s3: uses too much memory on multipart uploads #7453

ncw opened this issue Nov 24, 2023 · 4 comments

@ncw (Member) commented Nov 24, 2023

When uploading multipart files, serve s3 holds all the parts in memory. This is a documented limitation of the library rclone uses for serving S3.

@nbari commented Dec 17, 2023

If it helps: in s3m (https://github.com/s3m/s3m/), when reading from STDIN I use a file buffer, from which I take the checksums and then stream the data. I would like to use named pipes, but the problem is that I need to know the checksums before doing the upload because of the S3 signature.

@StarAurryon commented Mar 7, 2024

Seems to me that's an issue with the gofakes3 lib rclone is using.

See https://github.com/Mikubill/gofakes3/blob/master/uploader.go:
```go
func (mpu *multipartUpload) AddPart(partNumber int, at time.Time, body []byte) (etag string, err error) {
	if partNumber > MaxUploadPartNumber {
		return "", ErrInvalidPart
	}

	mpu.mu.Lock()
	defer mpu.mu.Unlock()

	// What the ETag actually is is not specified, so let's just invent any old thing
	// from guaranteed unique input:
	hash := md5.New()
	hash.Write([]byte(body))
	etag = fmt.Sprintf(`"%s"`, hex.EncodeToString(hash.Sum(nil)))

	part := multipartUploadPart{
		PartNumber:   partNumber,
		Body:         body,
		ETag:         etag,
		LastModified: NewContentTime(at),
	}
	if partNumber >= len(mpu.parts) {
		mpu.parts = append(mpu.parts, make([]*multipartUploadPart, partNumber-len(mpu.parts)+1)...)
	}
	mpu.parts[partNumber] = &part
	return etag, nil
}
```

The body of a part is stored in memory.

My suggestion, given that rclone serve s3 treats the root path as the root of all buckets, is one of:

  • either we use a local cache file to back mpu.parts[partNumber], or
  • we store each part on disk under /uploads/bucket-name/multipart-upload-{id}/part_{id}.

This could be done with an option to provide an existing path to the VFS.

BTW, it would be wise to integrate the gofakes3 lib into the rclone repo, as it is not widely used/reviewed and could be a target of choice for a malicious actor.

One more thing: I can provide a PR if this approach is OK for you.

@ncw (Member, Author) commented Mar 8, 2024

Seems to me that's an issue with the gofakes3 lib rclone is using.

Yes it is.

The body of a part is stored in memory.

My suggestion, given that rclone serve s3 treats the root path as the root of all buckets, is one of:

  • either we use a local cache file to back mpu.parts[partNumber]

This could be done very easily using --vfs-cache-mode writes - this would assemble the file on disk then upload it when it was finished. This has the disadvantage that you need local disk space and that the file doesn't get uploaded until the end. It would work with all backends though.

  • we store each part on disk under /uploads/bucket-name/multipart-upload-{id}/part_{id}.

We have the problem then that we need to join the chunks back together when they are downloaded. This isn't impossible but it is quite a bit of extra work.

A third option would be for backends which support OpenChunkWriter or OpenWriterAt to send the chunks directly to the backend. This was my original plan and would work very well with backends which support that, but not at all with other backends.

This could be done with an option to provide an existing path to VFS.

BTW, it would be wise to integrate the gofakes3 lib into the rclone repo, as it is not widely used/reviewed and could be a target of choice for a malicious actor.

@Mikubill we discussed transferring the gofakes3 fork to the rclone organisation - would you still like to do that? If so let me know and I'll send you instructions.

One more thing: I can provide a PR if this approach is OK for you.

Offer of a PR is great! I think we need to sort out the right approach and where gofakes3 is living first but yes - thank you.

@StarAurryon commented:
Offer of a PR is great! I think we need to sort out the right approach and where gofakes3 is living first but yes - thank you.

Agreed. Let's wait for @Mikubill's answer first.
