Feat webdav nextcloud chunked v2 #6133
Conversation
All chunked uploads are stuck at 100% now that I've fixed the retry mechanisms. I'll figure out the issue here.
Hitting an issue when copying a file to my nextcloud with this branch. Logs are pasted below.
The issue seems to be that […]. If these comments are not the correct place for them, definitely let me know. Thanks for your work on this feature. It'll really help me out!
@alkenerly This is the right place. Thanks a lot for testing, I'll look into it.
@alkenerly Can you retry with the latest changes?
It sometimes crashes here, not sure why (chunk size: […]). Note that both […].
Running some long-running transfers now. I am moving approx 4 GB files using a 20 Mbit connection, so not only are the files large, my transfers will also stress the timeouts of both servers. Tentatively: it works, with an exception. Sometimes the transfer works exactly how you would expect it to! Other times, it gets a 423 Locked when finalizing chunks. That error repeats itself 10 times in quick succession (with the pacer rate limit never causing a sleep longer than 2s). After 10 of those errors it cancels the chunked upload. However, when it attempts to restart the upload, it seems to figure out that the file was indeed uploaded correctly. It would appear that rclone successfully uploaded the file and Nextcloud successfully assembled the chunks. Interestingly, the first low-level retry is not triggered by a 423 Locked exception, but by a 504 Gateway Timeout.
I bet the first request to get the chunks to merge was successfully received, but for whatever reason I get a 504. That causes the subsequent requests to report that the files are locked (which makes sense, because Nextcloud is performing the merge operation on them). Maybe there isn't a practical way around this except for each user hunting down every kind of timeout that exists on their webserver / reverse proxies etc. If only there were a "hey Nextcloud, are you currently merging chunks from this directory right now?" API… Do you have any ideas?
@alkenerly Could you open an issue on NextCloud's repo? It may be possible to do a workaround but that would be super brittle and/or slow.
@devnoname120 Just so I can clarify the issue before opening one over there: I would think that a 504 is returned when there is a request that takes a long time to respond to. The request that is taking so long in this instance is probably the HTTP MOVE command that assembles the chunks. There would not be any long outstanding requests other than that MOVE, right? It is weird, because that MOVE command probably takes far less time than uploading one chunk's worth of data at my slow data rate.
I may jump in here right away, as I've seen your development happening as a Nextcloud dev 😉 Really cool to see active work on the chunked upload. The gateway timeout is usually something that can only be solved through the server configuration, where the frontend webserver or load balancer has too short a timeout set for requests. On the PHP side those requests do not get cancelled until the PHP maximum execution time is reached (which is 60 minutes by default with Nextcloud: https://github.com/nextcloud/server/blob/f110ff9f4d54c4124483f3ba0e6c681362b79a50/lib/base.php#L624). From the above description it seems like the following is happening: […]
The general recommendation would be to adjust the server configuration to increase the timeout.
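To make this recommendation concrete, here is a minimal sketch of what such a timeout increase might look like when the frontend is an nginx reverse proxy (as it turns out to be later in this thread). The directive names are standard nginx; the upstream name and the values are placeholders and should be matched to the PHP maximum execution time mentioned above:

```nginx
# Hypothetical snippet: let a slow WebDAV MOVE (chunk assembly) finish.
location /remote.php/dav/ {
    proxy_pass           http://nextcloud-backend;  # assumed upstream name
    proxy_read_timeout   3600s;   # wait up to 1h for the MOVE response
    proxy_send_timeout   3600s;
    client_max_body_size 0;       # don't cap chunk PUT bodies
}
```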
The PUT requests are a bit different here, as there is data being transferred during that time, so frontend webservers usually keep the connection open in that case, while for the MOVE the connection is basically idle between the client (rclone) and the webserver as it just waits for the PHP operation to finish and return an HTTP response.
Thanks for jumping in! I am uploading another set of chunks now but am pausing rclone's execution before it issues the MOVE command. My plan is to issue the last MOVE command myself, probably using curl, so that I can get a rough time until I get the 504. I am running my Nextcloud behind an nginx reverse proxy, so it will be a little scavenger hunt to determine which timeout I need to increase in order to get around this issue. All that being said, I haven't personally found an issue with this branch (because the issue I'm hitting is a server configuration issue). Great work @devnoname120
@juliushaertl Would there be a way to check the state of the […]? Some workarounds I can do on my side: […]
I have increased the relevant timeouts in my environment to prevent getting 504s on chunk assembly. I'll leave this branch transferring files of various sizes for a bit and will report any issues I hit. Thanks again for the work you put in!
Transferred a mix of chunkable and non-chunkable files over the last few days and have had no issues.
Thanks a lot! Since then I've implemented parallel chunk upload (see https://github.com/devnoname120/rclone/tree/feat-webdav-nextcloud-parallel-chunk-upload). There are two issues remaining: […]
So it will be for a next pull request. I think that this pull request is functional enough that it makes sense to keep it as is. I'll add some tests and clean up the code a bit, then I'll ask for reviews from the maintainers.
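The parallel-chunk-upload branch itself isn't shown in this thread. As a rough illustration of the approach, a sketch using `errgroup` with a concurrency cap might look like this; all helper names are invented and error handling is trimmed:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"

	"golang.org/x/sync/errgroup"
)

const chunkSize = 10 << 20 // NextCloud's default: 10 MiB

// uploadChunk is a stand-in for the real PUT of one chunk to the upload folder.
func uploadChunk(ctx context.Context, index int, data []byte) error {
	fmt.Printf("uploading chunk %d (%d bytes)\n", index, len(data))
	return nil
}

// uploadParallel reads the source and uploads up to `workers` chunks at once.
func uploadParallel(ctx context.Context, src io.Reader, workers int) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(workers) // cap the number of in-flight chunk uploads

	for index := 0; ; index++ {
		buf := make([]byte, chunkSize) // fresh buffer per chunk: uploads run concurrently
		n, err := io.ReadFull(src, buf)
		if n > 0 {
			data, i := buf[:n], index
			g.Go(func() error { return uploadChunk(ctx, i, data) })
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break // last (possibly partial) chunk scheduled
		}
		if err != nil {
			return err
		}
	}
	return g.Wait() // afterwards, a final MOVE would assemble the chunks
}

func main() {
	f, err := os.Open("bigfile.bin")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := uploadParallel(context.Background(), f, 4); err != nil {
		panic(err)
	}
}
```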
@alkenerly Can you confirm that it still works with my latest changes? There should be no difference in behavior, I just made the code cleaner.
There may always be scenarios where file locks stay around for longer, e.g. if the PHP process gets killed for some reason. However, those should be cleaned up by a background job (if the database is used as the locking backend) or by the configured TTL if Redis is used as the locking backend. The TTL has a default of 1 hour, but is also configurable in the Nextcloud config.php ([…]). However, I agree that it is not an ideal way to monitor whether a move is still pending.
Currently there is not really such a thing as a dedicated status check for the operation. During the move operation you would, as mentioned, receive a 423 Locked status code; once it is finished, the temporary upload folder gets removed and then you should get a 404, as the source folder for the move is no longer found.
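This behavior suggests a client-side workaround: after a MOVE times out, poll the temporary upload folder and map the status code to a merge state. A minimal Go sketch, with placeholder URL and credentials; the status mapping follows the comment above and is not rclone's actual code:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// mergeState infers the state of a pending chunk-assembly MOVE by probing the
// temporary upload folder, e.g. https://cloud.example.com/remote.php/dav/uploads/<user>/<id>.
func mergeState(client *http.Client, uploadURL, user, pass string) (string, error) {
	req, err := http.NewRequest("PROPFIND", uploadURL, nil)
	if err != nil {
		return "", err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Depth", "0")

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusLocked: // 423: Nextcloud is still merging the chunks
		return "merging", nil
	case http.StatusNotFound: // 404: folder gone, so the merge finished
		return "done", nil
	default: // e.g. 207 Multi-Status: folder still there, merge not started
		return fmt.Sprintf("pending (HTTP %d)", resp.StatusCode), nil
	}
}

func main() {
	client := &http.Client{Timeout: 30 * time.Second}
	for {
		state, err := mergeState(client,
			"https://cloud.example.com/remote.php/dav/uploads/alice/web-file-upload-123",
			"alice", "app-password")
		if err != nil || state == "done" {
			fmt.Println(state, err)
			return
		}
		fmt.Println("state:", state)
		time.Sleep(5 * time.Second)
	}
}
```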
@juliushaertl Thank you for your reply. It would be great if this feature could be added to NextCloud.
@ncw Could you do a first review pass on this pull request? I'll then add tests. Thanks!
I think this is looking nice :-)
I put a comment inline.
Looking at the commits, this needs a lot of tidying up!
I think we should have one commit each for the preliminary work, i.e.:
- `lib/rest`
- `fstests`
I don't think `cmd/cmount` should be here.
I'm not sure how many commits the `backend/webdav` code should be.
I know it's been through a few different authors, so if you squash commits with different authors then add a `Co-authored-by: Name1 Name2 <Name1.Name2@example.com>` line at the end of the commit after a blank line - that will keep the authorship straight.
I don't know how experienced a git user you are, so if you'd like me to do the squashing I can.
I'll run the CI now too and we can see if that shows up anything else.
Thank you :-)
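To illustrate the convention ncw describes, a squashed commit message would end with the trailers after a blank line, like this (subject line and names are placeholders):

```text
webdav: add chunked upload support for Nextcloud

Co-authored-by: Name1 Name2 <Name1.Name2@example.com>
Co-authored-by: Name3 Name4 <Name3.Name4@example.com>
```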
(force-pushed from 96e58d9 to 2783ac1)
@ncw Thank you for your comments. I addressed the changes + a lint formatting issue.
I wonder if they're dissatisfied because there isn't a clean way to deal with the file-locking issue that happens after you ask Nextcloud to assemble the chunks. It would be extremely nice if the command to assemble the chunks returned immediately along with a UUID that you could use to poll the status of the assembly… but that just isn't how Nextcloud works (yet). In any case, I still use exclusively this branch when doing anything Nextcloud-related with rclone. I would hope they see fit to merge it, especially if the chunking feature could be disabled by default.
This feature would really come in handy! I'm no developer, but is there anything I can do to assist?
@mzed2k I'm not sure… The main blocker is getting the attention of an rclone maintainer who could review the pull request.
@devnoname120 I will try to merge this PR for rclone 1.63 (I'm just about to release 1.62). Can you rebase it, fix up anything which needs fixing, and remove the draft flag, as that is a signal to me that it doesn't need reviewing yet! I'll put it on the milestone.
@ncw Thanks a lot. I'll budget some time next week to rebase it and do the last touches :)
(force-pushed from 6b0c8ac to c3e6c12)
Drop me a note here when you want it to be reviewed or anything else. This will generate a GitHub notification, which I keep track of closely.
@ncw I rebased & reviewed the PR and it looks good to me. Some optimizations and improvements (such as parallel chunk upload) can be done, but they are more suited for a follow-up PR. I'm not too sure if the integration tests cover the chunked uploads though. I dug into how rclone tests remotes but I couldn't figure out in a reasonable amount of time how to do that… If adding tests is a blocker for merging the PR then I'll need help creating them (or maybe have someone else do it for me).
I'll take a look at the chunked tests. It should be easy to make them work. My main concern is not to break existing providers, as there are a great number of WebDAV providers and we only have integration tests with a few of them. Will review the code when I'm back at my desk.
I think this is ready to merge.
It looks like the chunked uploads are properly tested.
I have a feeling the chunked tests are going to break the integration tests, as the test runner won't be setting the skip-chunks flag, but we'll cross that bridge later.
Unfortunately I merged a webdav patch which has caused a conflict for this - can you rebase it again please :-(
I'll then merge it for 1.63 :-)
Co-authored-by: Thibault Coupin <thibault.coupin@gmail.com>
Co-authored-by: Nick Craig-Wood <nick@craig-wood.com>
(force-pushed from c3e6c12 to 689333c)
@ncw ✅
All merged - thank you :-) I suspect this will break the integration tests - check out https://pub.rclone.org/integration-tests/current/ tomorrow after about 12:00 UTC. If it does I will fix it up!
@devnoname120 Well, it did break the integration tests, but not in the way I was expecting! If you check out https://pub.rclone.org/integration-tests/current/webdav-backend.webdav-TestWebdavNextcloud-5.txt you'll see that it isn't uploading the modification time correctly for the upload. I think this is doing a non-chunked upload but I'm not 100% sure. You can try this yourself with this in the backend/webdav directory: […]
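(The exact command was lost in extraction; a typical invocation for rclone backend integration tests looks like this, with the remote name taken from the linked log and the test selector assumed:)

```sh
# hypothetical reconstruction — run from the backend/webdav directory
go test -v -remote TestWebdavNextcloud: -run TestIntegration
```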
What do you think?
@ncw Interesting… Modification time should be handled by the […]. Unfortunately I won't have the bandwidth to debug it in the next two weeks.
If you've got time to look at it before the next release (maybe 4 weeks away) that would be perfect :-)
@devnoname120 yes, that did fix it :-)
It looks like a v2 of the chunked uploads API has been released (it has more limitations though).
I'm not sure how important remote uploads are, but in the meantime… there is still no API to check on the status of the merging! IMO this should be a priority as it's a breaking bug. We have no way to know whether the merge got stuck or if it's still running, and since a merge can take several minutes on big files it's a major pain for us and our users. Not a minor issue. @juliushaertl Could you push this problem to the maintainers of chunking? This bug has been there for 5 years already and it still hasn't been addressed! Thank you. 🙏
Supersedes #4865.
Fixes #3666.
Follow-up PR with parallel chunks upload: #6179
Compared to the previous PR:

- Rebase on `master` & solve conflicts.
- […] `rclone` is closed abruptly.
- […] `MOVE` operation without erroring out with `404`.
- Rename the `chunk_size` config to `nextcloud_chunk_size` and set it to NextCloud's default: `10 MiB`.
- […] `application/x-www-form-urlencoded` in order to follow the recommendations of NextCloud.

Known limitations:

- […] `readers.NewRepeatableLimitReaderBuffer()`. This creates a buffer of chunk size, so it may use a lot of memory if the chunk size is set to a big value in the remote config. Maybe we could obtain a seekable reader from the upper-level abstraction instead of creating our own wrapper.
- […] `423 Locked`. Need to figure out why this happens and if there is a workaround. The `UNLOCK` method may work to force the unlock (edit: it actually means that the file is still getting merged by NextCloud, nothing to do).

TODO:

- […] (`504 Gateway Time-out` then `423 Locked`).
- For reference: `rclone/backend/b2/b2.go` lines 1214 to 1233 and `rclone/backend/box/upload.go` lines 232 to 249 (at `1d6d41f`).

In a follow-up PR:

- [ ] Implement `fs.CleanUp()` to purge all the temporary upload folders.
- [ ] Check the hash of the final merged file to make sure that the upload went well.

What is the purpose of this change?
Enable chunked uploads for Nextcloud (WebDAV).
See https://docs.nextcloud.com/server/latest/developer_manual/client_apis/WebDAV/chunking.html
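For readers who don't want to follow the link, here is a bare-bones Go sketch of the chunked-upload flow the linked docs describe: MKCOL a temporary upload folder, PUT the chunks, then MOVE the virtual `.file` onto the destination to trigger assembly. The server URL, credentials, upload id, and chunk names are placeholders, and this is an illustration of the protocol, not the PR's actual code:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

const base = "https://cloud.example.com/remote.php/dav" // placeholder server

// dav sends one WebDAV request and fails on non-2xx (404/423 retries omitted).
func dav(method, url string, body []byte, hdr map[string]string) error {
	req, err := http.NewRequest(method, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.SetBasicAuth("alice", "app-password") // placeholder credentials
	for k, v := range hdr {
		req.Header.Set(k, v)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("%s %s: %s", method, url, resp.Status)
	}
	return nil
}

func main() {
	upload := base + "/uploads/alice/web-file-upload-123"

	// 1. Create the temporary upload folder.
	if err := dav("MKCOL", upload, nil, nil); err != nil {
		panic(err)
	}
	// 2. PUT each chunk, named so the chunks sort in upload order.
	for i, chunk := range [][]byte{[]byte("part one "), []byte("part two")} {
		if err := dav("PUT", fmt.Sprintf("%s/%05d", upload, i+1), chunk, nil); err != nil {
			panic(err)
		}
	}
	// 3. MOVE the virtual ".file" onto the destination to trigger chunk assembly.
	err := dav("MOVE", upload+"/.file", nil, map[string]string{
		"Destination": base + "/files/alice/dest/bigfile.bin",
		// X-OC-Mtime (unix epoch) lets Nextcloud record the source modification time.
		"X-OC-Mtime": "1650000000",
	})
	if err != nil {
		panic(err)
	}
}
```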
Was the change discussed in an issue or in the forum before?
#3666
#4865
Checklist