dropbox: please report rejected attempts to upload larger objects. #5008
We (Dropbox) internally rolled out an API perf test daemon and tested the performance of uploading a 350 GB file using API endpoints. It confirmed we can upload a 350 GB file in ~50 minutes using concurrent upload sessions (the client is not network/disk bound, so real-world experience may be a little slower). The upload attempt may hit API rate-limit errors, but if you just back off and retry, the upload will succeed.
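The arithmetic behind splitting such a file into concurrent upload sessions can be sketched in a few lines of Python. Note the 48 MiB part size is an assumption for illustration (it matches rclone's default dropbox chunk size), not a figure from this comment:

```python
def chunk_count(file_size: int, chunk_size: int) -> int:
    """Number of upload-session parts needed (ceiling division)."""
    return -(-file_size // chunk_size)

GIB = 1024 ** 3
MIB = 1024 ** 2

# A 350 GiB file split into 48 MiB parts; the part size here is an
# assumption chosen to match rclone's default --dropbox-chunk-size
parts = chunk_count(350 * GIB, 48 * MIB)
print(parts)  # 7467 parts to spread across concurrent upload sessions
```

With ~7.5k parts in flight, hitting the rate limiter at some point is expected, which is why the back-off-and-retry advice matters.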
@rajatgoel Thank you for the info. @borkd you might be interested; see the comment from the Dropbox devs above. As stated in #4656 (comment) (AFAICS), rclone plans to test the new Dropbox SDK with improved upload concurrency. Retrying at several levels is one of rclone's intrinsic features.
Looking forward to testing this.
Blocks on #4656 (comment)
Can you try this beta? v1.55.0-beta.5331.64427951b.fix-dropbox-batch-sync on branch fix-dropbox-batch-sync (uploaded in 15-30 mins). I'm collecting together the dropbox upload improvements in #5156.
What's the expected performance impact (if any) for lots of small files with this beta? Haven't gotten to large files yet.
It should be much improved as it won't trip the dropbox namespace locks and timeouts. |
I've tried it a couple of times and am still seeing the same errors. If a copy is running it gets affected, too:
Also, I have just run into an internal error with a subsequent segmentation violation, which I have never encountered with rclone before. This is running on a server with ECC RAM, and the source data is fetched from a ZFS pool with no problems there.
If you could post the full backtrace that would be helpful - it might be an rclone bug.
Are you doing uploads from more than one rclone process at once? Dropbox doesn't like that.
So just
Yes it will. Dropbox likes all the operations to one namespace to be serialized in one upload thread. If you do this it works, but if you go too fast then you get those "too_many_requests" errors and some long timeouts.
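The back-off-and-retry behaviour the Dropbox devs recommend can be sketched like this. This is a hypothetical helper, not rclone's actual pacer, and `RuntimeError` stands in for a real too_many_requests response:

```python
import time

def with_backoff(call, max_tries=5, base=0.5, sleep=time.sleep):
    """Retry `call`, doubling the wait after each failure.

    RuntimeError stands in for a too_many_requests response here;
    real code would inspect the HTTP status and any Retry-After header.
    """
    for attempt in range(max_tries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_tries - 1:
                raise  # out of retries, surface the error
            sleep(base * 2 ** attempt)

# Simulate an endpoint that rate-limits the first two calls
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("too_many_requests")
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None))  # ok
```

rclone already does this kind of retrying at several levels, as noted earlier in the thread.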
For reference, the size of the dataset in question is ~2.8 TB across ~800k files.
I can fix the bug from that backtrace, thanks.
Thoughts on that?
I know how fragile the Dropbox API is when it comes to call frequency, and I do try to serialize uploads and any other lifecycle-management operations when possible. That can easily become a roulette without global locking and/or scheduling. Having to manage serialization on the customer side becomes a serious problem when daily changes within any one namespace start to take a week or longer to sync.
This should be less of a problem when using batch mode as it is only the batch commits which shouldn't overlap, not the actual uploads.
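The "serialize only the commits, not the uploads" idea can be sketched with a lock around the commit step while the part uploads run concurrently. All names here are hypothetical illustrations, not rclone's internals:

```python
import threading

commit_lock = threading.Lock()
committed = []

def upload_and_commit(part: int) -> None:
    # The upload of the part itself may run in parallel with others...
    payload = f"part-{part}"
    # ...but the batch commit against the namespace is serialized,
    # which is what avoids overlapping commits tripping the
    # namespace locks and too_many_requests errors
    with commit_lock:
        committed.append(payload)

threads = [threading.Thread(target=upload_and_commit, args=(i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(committed))  # 8
```

The point of the sketch is only the shape: parallel transfers, one mutually exclusive commit section per namespace.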
I can't work out which version of rclone you are using here. It doesn't look like the latest from here: https://beta.rclone.org/branch/fix-dropbox-batch-sync/v1.55.0-beta.5334.ac7acb9f0.fix-dropbox-batch-sync/ Can you try again with that one please? |
I have built my binary from the branch suggested earlier in the comments:
Built shortly after this was referenced ^^^
That is what I needed to know - the commit number of the build you were using. The crash happened here
rclone/backend/dropbox/batcher.go Line 201 in 6442795
Which looks like It was called from
rclone/backend/dropbox/batcher.go Line 246 in 6442795
So why was it called? I think it was most likely because the batch didn't complete in 120 seconds and this was called: rclone/backend/dropbox/batcher.go Line 162 in 6442795
This is fixed in the current version, so hopefully it won't happen again.
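The underlying pattern, waiting on a batch commit with a deadline instead of assuming it finished, might look like this in outline. This is a sketch under assumed names, not the actual batcher.go code:

```python
import threading

def wait_for_batch(done: threading.Event, timeout: float) -> bool:
    """Block until the batch commit signals completion, or give up
    after `timeout` seconds (the thread discusses a 120 s limit).
    Returns True only if the batch finished in time, so the caller
    can handle the timeout instead of crashing on a missing result."""
    return done.wait(timeout)

batch_done = threading.Event()
batch_done.set()  # pretend the commit already finished
print(wait_for_batch(batch_done, 120.0))  # True
```

The crash above came from the path where the wait timed out; checking the boolean result is what makes that path safe.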
Testing 5394 now
@ncw - no segfault this time, but some more internal_error feedback. Note I have specified a max age to copy changes from the last 24 hours. I was careful not to issue any potentially competing operations against the dropbox namespace/bucket, and in fact against the entire account, so API rate limiting is not accidentally triggered.
No segfault is an improvement. However the batch not completing for 2 minutes is a problem, as is the internal_error. Can you tell me something about the number and size distribution of the uploads? If you could do a copy with
Sure. This particular set being synced is a relatively small but constantly evolving backup repository. The current footprint is 3 TB in 1.4M files scattered across a number of sparsely populated directories. Data there changes multiple times a day. rclone can handle it, though it takes quite some time. For example, with defaults a simple check without any data transfer to a nearby swiftstack backend takes about 95 minutes; with tens of checkers this time gets cut in half. A number of transfers need to be specified for the actual copy to happen at an acceptable overall speed. The Dropbox and Box backends are where the pain and suffering get real... The set above is the smallest of what I'm trying to push through. The next one changes almost as often but is about an order of magnitude larger; attempts to keep that up to date with Dropbox and/or Box were never successful.
If you can get me a log with
The very verbose session with dump responses has been running since then. What kind of overhead/slowdown does that impose compared to normal operation? I'm curious whether we are altering the transaction profile enough not to trigger anything on the Dropbox side.
A good question. Some overhead, certainly. Are you logging it straight to a file? That will be the quickest way. It does change the processing path of HTTP requests, so I guess it could mean that the bug doesn't show.
It's been running, or rather crawling along, for 172 hrs and reports another 3 weeks and 5 days to go. It took easily more than a month to find and reproduce the original issue with large uploads, and this is becoming time-consuming as well. Do you have any ideas how to make this troubleshooting more interactive?
Wow!
:-(
Is there a smaller subset of the data? Can you grep the logs so far and see if the problem has occurred?
There are some rate limiting messages in the log, so I will send you the incomplete compressed log with whatever has accumulated so far.
Thanks for the log. I can see two events in here. First, a batch failed to complete after 120 seconds - this all looks completely normal except for the batch (with 1 file) taking 120 seconds to complete. So rclone gives up here and tries to finish the next batch, and immediately gets a 500 error. rclone retries this 10 times then gives up. The two problems look related... I will ask on the dropbox developer forum to see if they have any hints.
@ncw - did you ever get some feedback on this from the Dropbox developers? |
The feedback was: try for longer than 2 minutes. I haven't implemented that yet though. And the 500 error was a "transient availability" problem.
Rate limit errors do not show with the latest release, but I have run into internal_error and a complete stall for many hours during uploads. I have opened a new issue #5491, but if the problems are related it might be a duplicate.
@rajatgoel: the 350GB you mentioned, does it mean 350GiB (ie, 350 * 1024 * 1024 * 1024) or 350 decimal GB (ie, 350 * 1000 * 1000 * 1000)? TIA! |
It's 350 * 1024 * 1024 * 1024 according to the code. I have asked the team to clarify this in the documentation.
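The difference between the two readings is not trivial at this scale; a quick check of the arithmetic from the question above:

```python
GB = 1000 ** 3   # decimal gigabyte
GIB = 1024 ** 3  # binary gibibyte, the reading confirmed above

decimal = 350 * GB
binary = 350 * GIB
print(binary)            # 375809638400 bytes
print(decimal)           # 350000000000 bytes
print(binary - decimal)  # 25809638400 bytes, roughly 25.8 decimal GB
```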
Originally reported by @borkd at #2158 (comment)
https://help.dropbox.com/installs-integrations/sync-uploads/upload-limitations