Optimize uploads when using S3 object storage #19414

Closed
tcoupin opened this issue Feb 11, 2020 · 7 comments

@tcoupin
Member

tcoupin commented Feb 11, 2020

Is your feature request related to a problem? Please describe.
I use a private S3 service as primary storage. When I upload a big file using the web UI or the desktop sync client, the file is sent in chunks. Each chunk is stored on S3, and finally Nextcloud downloads all the chunks, assembles them, and uploads the entire file to S3.

The WebDAV steps:

  • MKCOL request
  • PUT requests of 10 MB each, one per chunk
  • MOVE request to merge
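
For readers unfamiliar with the chunked upload flow, here is a rough sketch of the request sequence listed above. It only builds a plan and sends nothing. The URL layout (`remote.php/dav/uploads/...`, the `.file` source of the MOVE, the zero-padded chunk names) is my assumption based on how Nextcloud's chunked upload is usually documented, and `plan_chunked_upload` is a hypothetical helper, not part of any client.

```python
import math


def plan_chunked_upload(base_url, user, upload_id, dest_path, file_size,
                        chunk_size=10 * 1024 * 1024):
    """Return the (method, url, headers) sequence for one chunked upload.

    Nothing is sent over the network; this only lists the requests.
    The paths used here are assumptions, not taken from the issue.
    """
    upload_dir = f"{base_url}/remote.php/dav/uploads/{user}/{upload_id}"

    # Step 1: create the temporary upload collection.
    requests = [("MKCOL", upload_dir, {})]

    # Step 2: one PUT per chunk (10 MB by default, as in the issue).
    n_chunks = max(1, math.ceil(file_size / chunk_size))
    for i in range(1, n_chunks + 1):
        start = (i - 1) * chunk_size
        length = min(chunk_size, file_size - start)
        requests.append(
            ("PUT", f"{upload_dir}/{i:05d}", {"Content-Length": str(length)})
        )

    # Step 3: MOVE triggers the server-side merge into the final file.
    destination = f"{base_url}/remote.php/dav/files/{user}/{dest_path}"
    requests.append(("MOVE", f"{upload_dir}/.file", {"Destination": destination}))
    return requests


if __name__ == "__main__":
    plan = plan_chunked_upload("https://cloud.example.com", "alice",
                               "upload-123", "videos/big.mp4",
                               25 * 1024 * 1024)
    for method, url, headers in plan:
        print(method, url, headers)
```

A 25 MB file with the default chunk size yields one MKCOL, three PUTs, and one MOVE. The MOVE is the step where the server-side assembly described above happens.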

Describe the solution you'd like
It would be great if each chunk upload mapped to an S3 multipart part upload. The final merge step would then be unnecessary and the upload would be faster.

S3 steps corresponding to webdav steps:
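
One plausible correspondence, sketched below as an assumption rather than as the actual Nextcloud implementation: MKCOL starts a multipart upload (CreateMultipartUpload), each PUT uploads one part (UploadPart), and MOVE completes the upload (CompleteMultipartUpload). The store here is a toy in-memory stand-in, not a real S3 client, and `InMemoryMultipartStore` and `handle_webdav` are hypothetical names.

```python
class InMemoryMultipartStore:
    """Toy stand-in for an S3-like backend; not a real client."""

    def __init__(self):
        self.objects = {}    # key -> bytes of completed objects
        self._uploads = {}   # upload_id -> (key, {part_number: bytes})
        self._next_id = 0

    def create_multipart_upload(self, key):
        self._next_id += 1
        upload_id = f"upload-{self._next_id}"
        self._uploads[upload_id] = (key, {})
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        _, parts = self._uploads[upload_id]
        parts[part_number] = bytes(data)

    def complete_multipart_upload(self, upload_id):
        # The backend joins the parts in part-number order;
        # nothing is downloaded or re-uploaded by the caller.
        key, parts = self._uploads.pop(upload_id)
        self.objects[key] = b"".join(parts[n] for n in sorted(parts))
        return key


def handle_webdav(store, session, method, chunk_no=None, body=b""):
    """Map one WebDAV step onto one multipart operation (assumed mapping)."""
    if method == "MKCOL":
        session["upload_id"] = store.create_multipart_upload(session["key"])
    elif method == "PUT":
        store.upload_part(session["upload_id"], chunk_no, body)
    elif method == "MOVE":
        store.complete_multipart_upload(session["upload_id"])
    else:
        raise ValueError(f"unexpected method: {method}")


if __name__ == "__main__":
    store = InMemoryMultipartStore()
    session = {"key": "videos/big.mp4"}
    handle_webdav(store, session, "MKCOL")
    handle_webdav(store, session, "PUT", chunk_no=1, body=b"first chunk, ")
    handle_webdav(store, session, "PUT", chunk_no=2, body=b"second chunk")
    handle_webdav(store, session, "MOVE")
    print(store.objects["videos/big.mp4"])
```

With this mapping the MOVE no longer moves any data through Nextcloud: it only tells the backend to stitch together parts it already holds.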

Describe alternatives you've considered
Buy beers for the Nextcloud developers?

Additional context
no

@tcoupin added the enhancement and 0. Needs triage labels Feb 11, 2020
@solracsf
Member

Solving this could also solve things like #19223

@solracsf added the feature: object storage label and removed the 0. Needs triage label Feb 19, 2020
@skjnldsv added the 0. Needs triage label Aug 20, 2020
@szaimen added the 1. to develop label and removed the 0. Needs triage label Jun 8, 2021
@luzhkovvv

An additional use case is handling really large files. Right now I need to upload files of up to 200 GB (video archives) with S3 as the backend, and that is impossible with the current system. Assembling the chunks on the local filesystem, including the S3 downloads and uploads, runs at approximately 1 GB/min in my case (I think it is all done in a single stream?), which means 200 minutes for a synchronous MOVE request. Not to mention an additional 200 GB of temporary storage and 2x200 GB of traffic.
So chunk management needs to be offloaded to the backend for all multipart-enabled backends (I think most object storages are).

Right now it is a lot of useless work:

  • upload chunks to the backend as complete files;
  • then download all chunks back to the local filesystem, assemble the final file, and re-split it into parts for upload;
  • upload the parts using multipart upload.

How it should look: if the backend supports multipart upload, Nextcloud should upload chunks to the backend as parts of the final file, not as complete temporary files. After the last part is uploaded, the final file is ready for consumption without an additional assembly step.

What we get rid of on multipart-capable backends for every file > 10 MB (by default):

  • temporary storage (needed only for the current chunk);
  • traffic (2x reduction);
  • upload time (depends on network speed; if the client-to-Nextcloud and Nextcloud-to-backend speeds are the same, a 2x reduction);
  • all the issues with timeouts on the MOVE request (multiple threads on the support forum, Apache timeouts, PHP timeouts, reverse proxy/load balancer timeouts).
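
To make the numbers above concrete, here is a small back-of-the-envelope model. It is only a sketch of my reading of the two flows, not a measurement: it assumes the current flow sends the data to the backend twice (chunks as temporary objects, then the assembled file) and reads it back once, and that assembly runs at the 1 GB/min I observed. `transfer_costs` and `total_traffic_gb` are hypothetical helper names.

```python
def transfer_costs(file_gb, assemble_gb_per_min=1.0, chunk_gb=0.01):
    """Rough per-file costs of the current and the proposed upload flow."""
    current = {
        "client_to_nextcloud_gb": file_gb,
        # Chunks stored as temporary objects, then the assembled file.
        "nextcloud_to_backend_gb": 2 * file_gb,
        # All chunks are downloaded again for assembly.
        "backend_to_nextcloud_gb": file_gb,
        # The assembled file sits on local disk before re-upload.
        "temp_storage_gb": file_gb,
        # Duration of the synchronous MOVE at the observed assembly rate.
        "move_minutes": file_gb / assemble_gb_per_min,
    }
    proposed = {
        "client_to_nextcloud_gb": file_gb,
        # Each chunk goes to the backend once, as a part of the final object.
        "nextcloud_to_backend_gb": file_gb,
        "backend_to_nextcloud_gb": 0,
        # Only the chunk currently in flight needs local space.
        "temp_storage_gb": chunk_gb,
    }
    return current, proposed


def total_traffic_gb(flow):
    """Sum of all network legs, client leg included."""
    return (flow["client_to_nextcloud_gb"]
            + flow["nextcloud_to_backend_gb"]
            + flow["backend_to_nextcloud_gb"])


if __name__ == "__main__":
    current, proposed = transfer_costs(200)
    print("current traffic (GB):", total_traffic_gb(current))
    print("proposed traffic (GB):", total_traffic_gb(proposed))
    print("current MOVE duration (min):", current["move_minutes"])
```

For a 200 GB file this model gives 800 GB of total traffic today versus 400 GB with multipart parts, which is where the 2x figure comes from, plus the 200-minute MOVE that most proxy and PHP timeouts will not survive.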

@krakazyabra

Hi guys! Is there any chance to implement that? Years go by, files get bigger, S3 is more and more popular.

@ZE3kr
Contributor

ZE3kr commented Mar 30, 2023

It seems that #27034 has recently been merged into master and 26.0.0.

However, this feature also needs client support (since it has limitations). In my testing, the newest desktop version does not implement it yet. I have submitted an issue here: nextcloud/desktop#5554. Please upvote that issue if you need this feature.

@krakazyabra

Sounds great. I hope there will be backports too!

@solracsf
Member

Fixed by #27034 and nextcloud/desktop#5939

@xplosionmind

Thanks a lot, @solracsf! Just one quick question: does this improve performance for S3 uploading only when using it as the primary storage or also when it is used as external storage?

9 participants