Async file operations #24509
Comments
I see more and more reports of people having discrepancies due to PHP timeouts. Unfortunately we can't roll back an FS change from a killed PHP process. However, if we had some kind of journal or operations queue, it should be possible to either redo or roll back the last operation. This all fits well with the "async file operation" concept. CC @butonic
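As an illustration of the journal idea, here is a minimal sketch assuming a hypothetical `oc_fs_journal` table (none of these names exist in the codebase): every operation is recorded as pending before it runs and marked done afterwards, so a background job can redo or roll back whatever a killed process left behind.

```php
<?php
// Hypothetical sketch: journal a filesystem operation so it can be
// replayed or rolled back if the PHP process is killed mid-way.
// Illustrative table: oc_fs_journal(id, operation, source, target, status)

function journaledRename(PDO $db, string $source, string $target): void {
    $db->prepare(
        "INSERT INTO oc_fs_journal (operation, source, target, status)
         VALUES ('rename', ?, ?, 'pending')"
    )->execute([$source, $target]);
    $journalId = (int)$db->lastInsertId();

    rename($source, $target);

    $db->prepare("UPDATE oc_fs_journal SET status = 'done' WHERE id = ?")
       ->execute([$journalId]);
}

// A background job scans for stale 'pending' entries left by killed
// processes and either completes them or marks them done.
function recoverPending(PDO $db): void {
    foreach ($db->query("SELECT * FROM oc_fs_journal WHERE status = 'pending'") as $row) {
        if (file_exists($row['target']) && !file_exists($row['source'])) {
            // The operation actually finished; only the bookkeeping is missing.
            $db->prepare("UPDATE oc_fs_journal SET status = 'done' WHERE id = ?")
               ->execute([$row['id']]);
        } elseif (file_exists($row['source'])) {
            // The operation never ran; redo it.
            rename($row['source'], $row['target']);
            $db->prepare("UPDATE oc_fs_journal SET status = 'done' WHERE id = ?")
               ->execute([$row['id']]);
        }
    }
}
```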
If we ever go the "Webdav sync" route one day, that will need a table containing all changes.
Some further ideas:

Or well, get rid of the filecache...
I tried renaming "test" to "test2" with a lot of children inside. In theory it's only about renaming "test" to "test2" without touching any children; even the file ids stay the same.
Maybe we do need closure tables to get rid of the "path" column: #4209. While closure tables might not increase regular read speed, if they can help solve the timeout issues on long-running MOVE or DELETE then they might be worth it. Data loss 🔔
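To illustrate why dropping the stored path makes such a rename cheap, here is a sketch assuming a simplified, hypothetical filecache layout where entries store only their own name and parent id: renaming "test" to "test2" is then a single-row update, no matter how many children exist.

```php
<?php
// Hypothetical sketch: filecache without a materialized `path` column.
// Illustrative table: oc_filecache(fileid, parent, name)

// Renaming a folder touches exactly one row; descendants keep their
// fileids and rows untouched because they only reference the parent id.
function renameEntry(PDO $db, int $fileId, string $newName): void {
    $db->prepare("UPDATE oc_filecache SET name = ? WHERE fileid = ?")
       ->execute([$newName, $fileId]);
}

// The price: resolving a full path now means walking up the tree
// (or consulting a closure table mapping ancestor => descendant).
function getPath(PDO $db, int $fileId): string {
    $parts = [];
    while ($fileId !== -1) { // -1 used here as the root sentinel
        $stmt = $db->prepare("SELECT parent, name FROM oc_filecache WHERE fileid = ?");
        $stmt->execute([$fileId]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        array_unshift($parts, $row['name']);
        $fileId = (int)$row['parent'];
    }
    return implode('/', $parts);
}
```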
A good read but probably not useful as it will likely not work on shared hosters: http://symcbean.blogspot.de/2010/02/php-and-long-running-processes.html
If we do make a request async (like DELETE or MOVE), we could use this approach: http://restcookbook.com/Resources/asynchroneous-operations/ But not sure how standard Webdav clients would react... Or we'd need to optimistically tell them that we succeeded even though we just queued the request.
Oh oh, looks like 202 might be acceptable, see https://msdn.microsoft.com/en-us/library/aa142865(v=exchg.65).aspx which says that it could be used for DELETE.
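A sketch of how the restcookbook pattern could look for DELETE (the `JobQueue` class and the URLs are placeholders, not existing code): the server queues the operation, answers 202 Accepted with a status URL, and the client polls until it gets redirected.

```php
<?php
// Hypothetical sketch of the "asynchronous operation" REST pattern.
// JobQueue is a stand-in for whatever queueing mechanism ends up existing.

function handleDelete(string $path, JobQueue $queue): void {
    $jobId = $queue->push('delete', ['path' => $path]);

    http_response_code(202); // Accepted: queued, not yet performed
    header('Location: /status/' . $jobId);
    echo json_encode(['status' => 'queued', 'job' => $jobId]);
}

function handleStatus(string $jobId, JobQueue $queue): void {
    $job = $queue->find($jobId);
    if ($job->isDone()) {
        // Operation finished; point the client at the outcome.
        http_response_code(303); // See Other
        header('Location: ' . $job->resultUrl());
    } else {
        http_response_code(200);
        echo json_encode(['status' => 'pending']);
    }
}
```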
I hacked Sabre locally for a quick test:
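A plausible sketch of such a hack (hypothetical, not the actual patch): a Sabre plugin that intercepts DELETE and short-circuits with a 202.

```php
<?php
use Sabre\DAV\Server;
use Sabre\DAV\ServerPlugin;
use Sabre\HTTP\RequestInterface;
use Sabre\HTTP\ResponseInterface;

// Hypothetical reconstruction: answer DELETE with 202 Accepted
// instead of deleting synchronously.
class AsyncDeletePlugin extends ServerPlugin {
    public function initialize(Server $server) {
        // Lower priority number = runs before Sabre's own DELETE handler.
        $server->on('method:DELETE', [$this, 'asyncDelete'], 90);
    }

    public function asyncDelete(RequestInterface $request, ResponseInterface $response) {
        // queueDeletion() is a placeholder for whatever enqueues the job.
        queueDeletion($request->getPath());

        $response->setStatus(202); // Accepted: deletion will happen later
        return false;              // stop Sabre from handling DELETE itself
    }
}
```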
Large file uploads also require this. The assembly step can take a long time: not only do the file chunks need to be assembled, but the antivirus scan will also kick in, or any other postprocessing. IMO we should show the upload as completed and mark the file as 'in postprocessing', probably even exposing this in the web interface. A PROPFIND would be able to get the metadata, but actually accessing the file should cause a 403 Forbidden together with a Retry-After header?

Marking a file as 'in postprocessing' may lead to a new lifetime column, e.g. to also mark files as deleted. Hm, what do we have: receiving chunks, assembling the file, antivirus scan, content extraction (for workflow), indexing (for search), thumbnail generation, deleted. Those can roughly be separated into where the file is stored and what is done with its content. In that light, for federated shares a status like 'cached locally' would make sense. But I don't know if it makes sense to fit all of these into a single column. It does make sense to have a common pipeline for files that applications can then hook into... hm, need to think on this further.
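A sketch of what such a lifecycle state could look like (names are illustrative, not an agreed-upon design): PROPFIND still sees the metadata, while a download of a not-yet-available file gets a 403 plus Retry-After.

```php
<?php
// Hypothetical per-file lifecycle states, e.g. a `lifecycle` column
// on the filecache row. All names are illustrative.
const LIFECYCLE_RECEIVING      = 'receiving-chunks';
const LIFECYCLE_ASSEMBLING     = 'assembling';
const LIFECYCLE_POSTPROCESSING = 'postprocessing'; // AV scan, indexing, thumbnails...
const LIFECYCLE_AVAILABLE      = 'available';
const LIFECYCLE_DELETED        = 'deleted';

// PROPFIND can still expose the metadata row, but a GET on a file
// that is not yet available is refused with a hint when to retry.
function serveFile(array $fileRow): void {
    if ($fileRow['lifecycle'] !== LIFECYCLE_AVAILABLE) {
        http_response_code(403);
        header('Retry-After: 30'); // seconds; a suggestion for the client
        return;
    }
    readfile($fileRow['storage_path']);
}
```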
I think the key point here is to have 2 different processes:
- process A: receives the request, registers the operation in the queue, and returns immediately
- process B: picks the operation up and performs the actual work in the background
If I remember correctly, there is a trick we can use to spawn process B in an async way without CLI access, although I don't remember if there are caveats to take into account. Given what we have, we'll need to expose at least one additional endpoint for each async operation we want, plus at least one additional endpoint to check the operation status. For example, for uploads we'd have the sync upload (we can use whatever we're doing right now and the same endpoint), and the async upload, which will trigger the sync one at some point. We'll need additional columns / tables in the DB to track the status of the sync operations so we can poll for changes and check when the sync operation is finished. Although this doesn't seem too intrusive, we'll need to take into account that the sync operation needs to report its status somehow so users can check it periodically.
Note that these endpoints don't need to rely on Webdav, so worst case we can use these async operations ourselves even though 3rd-party software would still use the sync ones through Webdav.
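A sketch of the pieces described above, with hypothetical table and function names: process A registers the operation and returns, process B runs the existing synchronous code path, and a status endpoint lets anyone (including non-Webdav clients) poll.

```php
<?php
// Hypothetical status table backing the polling endpoint.
// Illustrative: oc_async_operations(id, type, payload, status, updated_at)

// Process A: the request handler registers the operation and returns.
function enqueueUpload(PDO $db, string $targetPath, string $chunkDir): int {
    $db->prepare(
        "INSERT INTO oc_async_operations (type, payload, status)
         VALUES ('upload', ?, 'queued')"
    )->execute([json_encode(['target' => $targetPath, 'chunks' => $chunkDir])]);
    return (int)$db->lastInsertId();
}

// Process B: the worker runs the existing synchronous code path,
// updating the status as it goes.
function runOperation(PDO $db, int $id): void {
    $db->prepare("UPDATE oc_async_operations SET status = 'running' WHERE id = ?")
       ->execute([$id]);
    // ... perform the regular sync upload/assembly here ...
    $db->prepare("UPDATE oc_async_operations SET status = 'finished' WHERE id = ?")
       ->execute([$id]);
}

// Status endpoint: clients poll this until the operation is finished.
function operationStatus(PDO $db, int $id): string {
    $stmt = $db->prepare("SELECT status FROM oc_async_operations WHERE id = ?");
    $stmt->execute([$id]);
    return (string)$stmt->fetchColumn();
}
```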
@butonic FYI we had the asynchronous PUT implemented in a private fork of the client and server some time ago. It worked by returning (as a header) a "poll URL" from the PUT of the latest chunk. The client would (after having uploaded all chunks) check the poll URL every few seconds to see if the file had been uploaded to the backend. Contact @ogoffart or me if you want more info and/or sources.
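From the client side, that flow might have looked roughly like this (the `OC-Poll-URL` header name is made up for illustration; the fork's actual protocol may differ):

```php
<?php
// Hypothetical client-side sketch of the "poll URL" scheme: the PUT
// of the last chunk returns a header pointing at a status resource.
function uploadLastChunk(string $url, string $chunkFile): string {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_CUSTOMREQUEST  => 'PUT',
        CURLOPT_POSTFIELDS     => file_get_contents($chunkFile),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true, // keep headers to find the poll URL
    ]);
    $raw = curl_exec($ch);
    curl_close($ch);

    preg_match('/^OC-Poll-URL:\s*(\S+)/mi', $raw, $m); // illustrative header
    return $m[1];
}

function waitUntilProcessed(string $pollUrl): void {
    do {
        sleep(5); // check every few seconds, as the fork's client did
        $status = json_decode(file_get_contents($pollUrl), true);
    } while ($status['status'] === 'pending');
}
```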
All these ideas are pointless as long as we have no active job execution mechanism in place.
There is also an additional challenge: while blocking access to a pending file is one thing, what happens with external storage? It seems that we'd need to first upload the pending file to some invisible local temporary space in which it is assembled/virus-scanned, etc., and then upload it to the final storage. But that would cause delays. Or upload it as a temporary part file like we already do; part files are invisible to the clients.
We could also use "part folders" for some operations: #13756
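A sketch of the part-file approach (paths and helper names are illustrative): stream into an invisible `.part` name on the target storage, run postprocessing there, and only rename into place on success, so clients never see a half-finished file.

```php
<?php
// Hypothetical sketch of the part-file commit: write to an invisible
// ".part" name on the target storage, postprocess, then rename atomically.
function commitUpload(string $finalPath, $sourceStream): void {
    $partPath = $finalPath . '.ocTransferId' . uniqid() . '.part';

    $dest = fopen($partPath, 'wb');
    stream_copy_to_stream($sourceStream, $dest);
    fclose($dest);

    // antivirusScan() is a placeholder for whatever postprocessing runs.
    if (!antivirusScan($partPath)) {
        unlink($partPath);            // never expose the rejected file
        throw new RuntimeException('upload rejected by scanner');
    }

    rename($partPath, $finalPath);    // the only moment clients see the file
}
```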
Some file operations, especially deleting from external storage, can take a long time, as the server needs to download the files to the trash bin in the background.
Also there were talks about async PUT: #12097
So I'm opening this ticket to discuss the possibility of having asynchronous file operations.
The good part is that we already have file locking, so it might be possible to leverage this to avoid concurrency issues (see the sketch below).
Also, we need to make sure we stay compatible with Webdav, so async operations would have to be custom Webdav extensions.
@DeepDiver1975
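As a sketch of how the existing locking could bracket an async operation (`LockManager` and `JobQueue` are stand-ins, not the real APIs): the locks are taken when the request is accepted and released only when the background worker finishes, so concurrent requests fail fast instead of racing.

```php
<?php
// Hypothetical sketch: reuse file locking to guard a queued operation.
function acceptAsyncMove(LockManager $locks, JobQueue $queue,
                         string $source, string $target): int {
    // Taking the locks up front makes the queued state visible to
    // everyone else: concurrent writers get a "locked" error immediately.
    $locks->acquire($source, LockManager::EXCLUSIVE);
    $locks->acquire($target, LockManager::EXCLUSIVE);

    return $queue->push('move', ['from' => $source, 'to' => $target]);
}

function workerMove(LockManager $locks, string $source, string $target): void {
    try {
        rename($source, $target);
    } finally {
        // Release only after the operation really finished (or failed).
        $locks->release($source);
        $locks->release($target);
    }
}
```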