Get remote update detection issues for files_external sorted out #11797

Open
butonic opened this issue Oct 27, 2014 · 49 comments
@butonic
Member

butonic commented Oct 27, 2014

Using mtime to detect changes in external storages has been the wrong approach. None of Amazon/Ceph S3 (#11652), Swift (#8633), FTP (#5655), FTPS/ProFTPD (#9630) or SMB on FAT correctly propagates mtime changes up the directory tree, so our hasUpdated() check fails, and we have yet to write backend-specific implementations that work. We need to change the current external storage scanning/cache implementation so that it no longer depends on correct mtime propagation. This will also improve performance and allow the filecache to be updated either by browsing the mount point or by a periodic background job.

We have to accept that checking the mtime of the external storage root does not reliably tell us whether anything inside it changed. In fact, it only does so on common Linux filesystems and some WebDAV servers (e.g. ownCloud). Since we need to detect changes inside external storages to provide a correct synchronization experience, a purely pull-based implementation would have to scan the complete external storage on every sync. A push-based implementation is possible when the external storage has a notification mechanism that can trigger a filecache update on demand (the only system I am currently aware of that can do this is ownCloud with webhooks enabled ... and maybe GitHub, if there were a GitHub backend for ownCloud). Between these two extremes, the former a performance nightmare and the latter mostly unavailable, the next best thing is IMHO a periodically updated filecache.

Instead of always querying the external storage for stat-based storage operations, all backend implementations should use the filecache, plus an in-memory cache (to save DB queries), until a write operation is executed. The filecache is only updated by querying the external storage when:

  • The external storage root is capable of correctly telling us it changed (e.g. ownCloud WebDAV)
  • The user updates a file through ownCloud (propagating the mtime up in the filecache)
  • A background job periodically scans the external storage, updating the filecache
  • The fallback: a user browses a folder (can be turned off when the storage is accessed exclusively by ownCloud)
  • The wish: a push notification is received (not yet implemented)
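The caching rule described above (answer stat calls from memory, then the filecache, and only query the external storage on a miss, while writes through ownCloud propagate the mtime up in the filecache) can be sketched as follows. This is a minimal Python illustration, not ownCloud's PHP code: `CachedStorage`, `Entry`, `backend_stat` and `remote_calls` are hypothetical names, and a plain dict stands in for the database-backed filecache.

```python
import posixpath
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    size: int
    mtime: float


class CachedStorage:
    """Hypothetical wrapper around an external storage backend.

    stat() is answered from a per-request in-memory cache first, then from
    the filecache, and only on a miss from the external storage itself.
    """

    def __init__(self, backend_stat: Callable[[str], Optional[Entry]],
                 filecache: Dict[str, Entry]) -> None:
        self.backend_stat = backend_stat    # assumed remote stat call
        self.filecache = filecache          # stands in for the DB-backed filecache
        self.memory: Dict[str, Entry] = {}  # saves repeated DB queries
        self.remote_calls = 0               # counts round trips to the backend

    def stat(self, path: str) -> Optional[Entry]:
        if path in self.memory:
            return self.memory[path]
        entry = self.filecache.get(path)
        if entry is None:
            # Cache miss: the only place the external storage is queried.
            self.remote_calls += 1
            entry = self.backend_stat(path)
            if entry is None:
                return None
            self.filecache[path] = entry
        self.memory[path] = entry
        return entry

    def write(self, path: str, size: int, now: Optional[float] = None) -> None:
        """Record a write made through ownCloud and propagate the new mtime
        to every parent folder in the filecache (the upload itself is omitted)."""
        now = time.time() if now is None else now
        self.filecache[path] = Entry(size, now)
        self.memory.pop(path, None)
        parent = posixpath.dirname(path)
        while True:
            old = self.filecache.get(parent)
            # Folder sizes are left as they were; only the mtime is bumped.
            self.filecache[parent] = Entry(old.size if old else 0, now)
            self.memory.pop(parent, None)
            if parent in ("", "/"):
                break
            parent = posixpath.dirname(parent)
```

After `write("/docs/b.txt", 5, now=50.0)`, both `/docs` and `/` carry mtime 50.0 in the filecache, so a later `stat("/")` sees the change without any round trip to the external storage. Changes made behind ownCloud's back are, by design, only picked up by one of the other triggers in the list.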

The periodic background scans should be backend-specific. A new scanner for S3 is in #11712; a similar scanner should be possible for Swift.
A fallback implementation must traverse the whole directory tree and not rely on the mtime.
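A fallback scanner of that kind can be sketched as a full walk that lists every folder, followed by a diff against the cached state. This is a minimal Python illustration under stated assumptions: `list_dir` is a hypothetical backend call returning `(name, is_dir, fingerprint)` tuples, and the fingerprint is assumed to be `(size, mtime)`; neither is an existing ownCloud API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-entry fingerprint reported by the backend: (size, mtime).
Fingerprint = Tuple[int, float]
# list_dir(folder) -> [(name, is_dir, fingerprint), ...]; assumed backend call.
ListDir = Callable[[str], List[Tuple[str, bool, Fingerprint]]]


def full_scan(list_dir: ListDir, root: str = "/") -> Dict[str, Fingerprint]:
    """Visit every folder below root. No folder is skipped because its own
    mtime looks unchanged, since that mtime may not reflect nested changes."""
    result: Dict[str, Fingerprint] = {}
    stack = [root]
    while stack:
        folder = stack.pop()
        for name, is_dir, fingerprint in list_dir(folder):
            path = folder.rstrip("/") + "/" + name
            result[path] = fingerprint
            if is_dir:
                stack.append(path)
    return result


def diff(cached: Dict[str, Fingerprint], scanned: Dict[str, Fingerprint]
         ) -> Tuple[List[str], List[str], List[str]]:
    """Compare the filecache with a fresh scan: (added, removed, changed)."""
    added = sorted(p for p in scanned if p not in cached)
    removed = sorted(p for p in cached if p not in scanned)
    changed = sorted(p for p in scanned
                     if p in cached and cached[p] != scanned[p])
    return added, removed, changed
```

If a file is added to and another modified in `/docs` while the fingerprint of `/docs` itself stays the same, an mtime check on `/docs` would miss both changes; the full walk finds them, at the cost of listing every folder on every run.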

Note that all of this is only necessary when the external storage is not accessed exclusively through ownCloud.
Note that the Google backend uses the changes API to implement hasUpdated().

This would also fix #5036 and maybe several other issues.
It still does not address #11533.

@butonic
Member Author

butonic commented Oct 31, 2014

Not all external filesystems have a way of telling us when changes have been made on their side. To determine whether we need to rescan a storage, we compare the mtime of the external root to our corresponding filecache entry for it. Storages can change how this check is done by overriding hasUpdated().
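The comparison just described, and why it breaks on backends that do not propagate mtimes, can be sketched in a few lines. This is a Python illustration only, loosely modelled on the description above rather than on ownCloud's PHP storage classes; `has_updated`, `has_updated_dav` and the callable parameters are hypothetical names. The second function mirrors the "checks etag, then mtime" behavior that the table lists for dav.

```python
from typing import Callable, Optional


def has_updated(remote_mtime: Callable[[str], float], path: str,
                cached_mtime: float) -> bool:
    """Default-style check: the path counts as updated only if its remote
    mtime is newer than the mtime stored in the filecache."""
    return remote_mtime(path) > cached_mtime


def has_updated_dav(remote_etag: Callable[[str], Optional[str]],
                    remote_mtime: Callable[[str], float], path: str,
                    cached_etag: Optional[str], cached_mtime: float) -> bool:
    """Variant for servers that expose an ETag: compare it first and fall
    back to the mtime comparison when no ETag is available."""
    etag = remote_etag(path)
    if etag is not None:
        return etag != cached_etag
    return has_updated(remote_mtime, path, cached_mtime)
```

If a file is added under `/photos` while the remote mtime of `/` stays at 100.0, `has_updated(..., "/", 100.0)` returns `False` and the new file is never scanned. That is the failure mode for every backend whose root mtime does not change when something nested changes.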

| backend | hasUpdated() behavior | reliable mtime? |
|---|---|---|
| local | default | not on FAT |
| dav | checks etag, then mtime | server dependent |
| ftp | default | server dependent |
| sftp | default | server dependent |
| amazon/ceph s3 | default, PR #11712 scans object list | no |
| dropbox | default, could use delta API | ? |
| google | uses updates API | ? |
| smb | true for root, otherwise mtime | not for shares |
| swift | mtime in root and subfolder depth 1 | no |

An mtime is considered reliable when the mtime of a folder updates when a file or folder inside is updated.

As you can see, nearly no backend can rely on the mtime. In fact, the only working cases are modern filesystems used through the local storage backend, or a remote ownCloud mounted via WebDAV. In all other setups, files that were added to the remote storage without going through ownCloud will not be picked up.
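For object stores such as S3, where the table notes that PR #11712 scans the object list, the scanner has to do the upward propagation itself because there are no real folders with mtimes. The following Python sketch illustrates that idea; it is not the code from #11712. It assumes both the cached state and a full, paginated bucket listing (for example from a ListObjectsV2-style call) are available as dicts mapping object key to ETag; `changed_keys` and `dirty_folders` are hypothetical names.

```python
import posixpath
from typing import Dict, Set


def changed_keys(cached: Dict[str, str], listing: Dict[str, str]) -> Set[str]:
    """Keys that were added, removed, or whose ETag differs.

    Both dicts map object key -> ETag; `listing` is assumed to come from a
    full (paginated) bucket listing."""
    return {key for key in cached.keys() | listing.keys()
            if cached.get(key) != listing.get(key)}


def dirty_folders(keys: Set[str]) -> Set[str]:
    """Mark every ancestor "folder" prefix of a changed key as dirty.
    S3 has no real folders with mtimes, so this upward propagation has to
    be done by the scanner itself; "" stands for the bucket root."""
    dirty: Set[str] = set()
    for key in keys:
        parent = posixpath.dirname(key)
        while parent:
            dirty.add(parent)
            parent = posixpath.dirname(parent)
        dirty.add("")
    return dirty
```

Only the folders returned by `dirty_folders` need their filecache entries (etag, mtime, size) refreshed, which keeps the update proportional to what actually changed, even though the listing itself still covers the whole bucket.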

@PVince81
Contributor

For Dropbox, I have an old WIP (#6069) that uses Dropbox's hash argument when retrieving changes, but it still needs further work.

@PVince81
Contributor

@RealRancor has suggested to at least document it and the workarounds here: owncloud-archive/documentation#763

@te-online

Well, the »workaround« is a non-option for all shared-hosting users without shell access (like me), isn't it?

@ghost

ghost commented Jan 16, 2015

@te-online
You can still use webcron (see the docs), which doesn't need shell access.

@te-online

Okay, I didn't know there was a difference between AJAX and webcron (the number of tasks done per call, if I understand it right). A reference to the webcron section (http://doc.owncloud.org/server/7.0/admin_manual/configuration/background_jobs_configuration.html?highlight=cron#webcron) could be added to the documentation of this topic as well.

@ghost

ghost commented Jan 16, 2015

Hi,

the main difference is that AJAX is only executed while a user is browsing the installation, whereas (web)cron runs regularly without user interaction.

Reference for another related documentation section: owncloud-archive/documentation#764

@ownclouders
Contributor

Hey, this issue has been closed because the label status/STALE is set and there were no updates for 7 days. Feel free to reopen this issue if you deem it appropriate.

(This is an automated comment from GitMate.io.)

@VikDex

VikDex commented Sep 10, 2018

Hello, what is the best workaround for SMB shares so that remotely changed files reach the clients seamlessly?
At the moment it's confusing to find a reliable solution in this thread for files changed remotely on the SMB share.

In fact, we observed the following behaviour:

1. ownCloud Desktop: files from the last 3 months are missing in the local sync folder. Even clicking "Sync Now" in the desktop client doesn't sync the changed data from SMB.
2. Solution: log in to ownCloud with the user OR open the app on the iPhone with the user.
3. Suddenly the desktop client syncs all the data from the SMB share correctly (without any further intervention!).

Do you have an idea?
There are three ways to access ownCloud data:
  • browser: works instantly
  • app: works instantly
  • desktop client: syncs changed data only after the data has been accessed via browser/app

In fact, a successful sync of changed SMB data in the desktop client depends on first accessing the ownCloud instance via browser or app.
Maybe it's a good idea to add a feature to the desktop client that sorts out this "missing feature"?

We are not using the commercial license; I guess the problem does not exist with the wnd:listener:
https://doc.owncloud.org/server/10.0/admin_manual/enterprise/external_storage/windows-network-drive_configuration.html

[edit]
Workaround for me at the moment: connect the SMB shares with user and password (not session-based) and set up a regular file scan with php /var/www/owncloud/occ files:scan --all ..

@guruz
Contributor

guruz commented Oct 15, 2018

@VikDex I think files scan is indeed the recommended workaround.

@hodyroff
Contributor

Indeed, this is solved for WND via the Enterprise Edition. It is also solved for OneDrive, where we found an API that provides change notifications, again via the Enterprise Edition. We continue to look into this for other file systems and approaches ... and yes, in the absence of this the file scan works, but of course it has performance issues.

@hodyroff
Copy link
Contributor

If you want to try out the Enterprise Edition, here is the link to the trial: https://marketplace.owncloud.com/bundles/enterprise_apps
