
S3 upload: troubles with big files (temp files) #29589

Closed
labor4 opened this issue Nov 8, 2021 · 3 comments
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap bug feature: object storage needs info

Comments

@labor4 (Contributor) commented Nov 8, 2021

This report is about how Nextcloud prepares an S3 upload, and how that preparation fails for big files.

The core problem is that NC struggles to stage large S3 uploads.
I try to lay out where it fails and why, allowing for possible mistakes on my part.
The first surprise is that the small OS disk fills up with chunks (which I didn't expect).
On closer inspection I also wonder whether the whole process could be roughly half as long.
Thank you for your time, and for all this good stuff!

NC Version: 22.2.0
Fresh install: Yes
Setup: Postgres, Apache2 (h2/http1.1), PHP-fpm, Push, HAproxy
PHP: 7.4
OS: Ubuntu 20.04
Encryption: no
nproc: 8
RAM: 8 GB
Swap: no

Against: Wasabi S3

index.php/settings/integrity/failed
No errors have been found.

Disks

- small OS FS (hosting /tmp) has 15 GB free (too little for the big file)
- big ['tempdirectory'] is on a 2.3 TB NFS share (plenty of space)
- big ['datadirectory'] is on another 2.3 TB NFS share
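For reference, which filesystem each directory lives on and how much free space it has can be checked with a short script (the NFS paths below are placeholders for my mounts, not real Nextcloud paths; directories that don't exist are skipped):

```python
import os
import shutil

# Placeholder paths: substitute your own /tmp, ['tempdirectory'] and ['datadirectory'].
paths = ["/tmp", "/mnt/nfs-temp", "/mnt/nfs-data"]

for p in paths:
    if not os.path.isdir(p):
        continue  # skip mounts that don't exist on this machine
    free_gib = shutil.disk_usage(p).free / 1024**3
    # st_dev identifies the underlying device: two paths with the same
    # st_dev share one filesystem (and therefore one pool of free space).
    print(f"{p}: {free_gib:.1f} GiB free (device {os.stat(p).st_dev})")
```

Comparing the `st_dev` values makes it obvious when /tmp and the Nextcloud temp dir compete for the same small disk.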

Task
Copy from ['datadirectory'] to S3 via the web GUI, leaving the browser open

Testfiles
Failing big file: 26 GB
Succeeding small file: 300 MB

Environment

PHP's ['max_execution_time'] = 3500 (checked via phpinfo() over HTTP)
NC uses ['tempdirectory'] for a full copy; PHP uses /tmp for the chunks.
No limits appear to be hit on Wasabi's side: their maximum PUT size is 5 GB, and NC uploads 500 MB chunks.
No timeouts appear to be hit either.
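A rough back-of-the-envelope calculation (my interpretation of the two staging steps as observed, not taken from the Nextcloud source) shows why 15 GB free on the OS disk cannot be enough for the 26 GB file:

```python
GB = 1024**3
MB = 1024**2

file_size  = 26 * GB   # the failing big file
chunk_size = 500 * MB  # observed NC chunk size
tmp_free   = 15 * GB   # free space on the small OS disk hosting /tmp

# Stage 1: a full copy of the file is staged in ['tempdirectory'].
stage1_needed = file_size

# Stage 2: chunks are written to /tmp. If chunks are produced faster than
# they are uploaded and deleted, /tmp usage approaches the full file size.
stage2_worst_case = file_size

print(stage1_needed / GB)            # 26.0 -> fits on the 2.3 TB NFS share
print(stage2_worst_case > tmp_free)  # True -> /tmp fills up before the upload finishes
```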

Errors

  • There is a misleading web notification (a premature failure notice that does not mention "disk full": a false negative)
  • After the second try, there is a "disk full" error in the NC logs

Overall Description

  • Upon the S3 copy command, a seemingly identical temp file is first prepared in ['tempdirectory'] (first copy stage)
  • As the S3 upload begins, chunks are created in /tmp (second copy stage)
  • This is probably where the first "/tmp disk full" occurs, and NC retries.
  • Once the /tmp disk reaches 99%, the S3 upload stops and NC retries the first prep stage, followed by another /tmp stage. This quickly fails at 100% disk usage, and an error is logged.
  • After that, both /tmp and ['tempdirectory'] are emptied. No file exists on S3.

Timeline
This is probably unrelated to timeouts, but I took notes anyway:

0'00" NC begins first copy stage
2'40" temp file ready, S3 upload starts, /tmp gets populated with chunks
5'00" Error: "FILE could not be copied", S3 upload continues
14'30" /tmp is 99% full, 10s after: S3 upload stops, a second temp-file gets prepared, /tmp still 99% full, no Error on Log-Page
24'00" /tmp goes 100% full, both temp files are deleted, no more action towards S3. NC gives up and logs "disk full"

Logs (NC Protocol Page)

fwrite(): write of 8192 bytes failed with errno=28 No space left on device at /home/serveruser/www/lib/private/Log/File.php#89	

Sabre\DAV\Exception: Unable to write to stream

What if I avoid "disk full"?
If both the PHP and NC temp files are on the "plentiful" NFS mount, the web notification "FILE could not be copied" still appears, but the S3 upload succeeds and no "disk full" error is logged.

Desired solution
It seems plausible to skip the first, non-chunked stage, because

  • the temp file appears to be unmodified?
  • process protection could be done with a file lock?
  • storage-limit problems arise sooner (roughly FILESIZE × 3 of scratch space is needed)
  • it all takes MUCH longer

Other points:

  • There should be a setup warning so the chunk dir cannot cause problems ("do you really want /tmp?")
  • Having two separate temp dirs for PHP and NC was unknown to me and misleading
  • It would be even better if the uploader could "dial" into the full original file, avoiding pre-chunking altogether and starting the S3 upload right away.
  • I don't know which event triggered the error notification. It was either too early for "disk full", or wrong in the success case.
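The "dial into the original file" idea could look roughly like this (a sketch under my assumptions, not Nextcloud's actual ObjectStore code): read 500 MB slices straight from the source file and hand each slice to a multipart-upload call, so neither a full temp copy nor pre-chunked files are ever written to disk:

```python
def iter_chunks(path, chunk_size=500 * 1024**2):
    """Yield successive slices read directly from the source file, so no
    temp copy and no on-disk pre-chunking are needed before uploading."""
    with open(path, "rb") as src:
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            yield data

# Each slice would then feed one multipart UploadPart request
# (hypothetical uploader object; Nextcloud's real S3 code path differs):
# for part_number, chunk in enumerate(iter_chunks("/path/to/bigfile"), start=1):
#     s3.upload_part(part_number, chunk)
```

With this scheme, peak scratch-space usage drops from roughly twice the file size to zero, at the cost of holding a read lock on the source file for the duration of the upload.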

Best Regards
Manu

@labor4 labor4 added bug 0. Needs triage Pending check for reproducibility or if it fits our roadmap labels Nov 8, 2021
@solracsf (Member) commented Jan 26, 2022

See if #30843 (comment) helps you. #27034 will also help once merged.

@labor4 (Contributor, Author) commented Jan 26, 2022

> See if #30843 (comment) helps you. This #27034 will also help once merged.

Thanks!

I am aware of, and very keen on, the multipart effort over there. Very interesting.
I haven't understood it in depth, but it may well cover this issue, except for the wrong notification.

As for the comment mentioned:
I believe the JS part does not affect this issue, since I am describing a direct SERVER→S3 transfer (well, assuming PHP does it). The chunk-size setting there was high (good for steady work), but it should not be none/all, as that could hit limits at the receiver.

@szaimen (Contributor) commented Jan 23, 2023

Hi, please update to 24.0.9 or, better, 25.0.3 and report back whether it fixes the issue. Thank you!

My goal is to add a label such as 25-feedback to this ticket once the bug can be reproduced on an up-to-date major Nextcloud version. However, this is not going to work without your help. So thanks for all your effort!

If you don't manage to reproduce the issue in time and the issue gets closed but you can reproduce the issue afterwards, feel free to create a new bug report with up-to-date information by following this link: https://github.com/nextcloud/server/issues/new?assignees=&labels=bug%2C0.+Needs+triage&template=BUG_REPORT.yml&title=%5BBug%5D%3A+
