Inotify support #9

Closed
jpjp opened this issue Jan 6, 2014 · 116 comments
Labels
enhancement New features or improvements of some kind, as opposed to a problem (bug)

Comments

@jpjp
Contributor

jpjp commented Jan 6, 2014

To notice changes more quickly.

@jpjp
Contributor Author

jpjp commented Jan 14, 2014

The other advantage of this would probably be reduced power usage.

@calmh
Member

calmh commented Jan 14, 2014

Yep.

@calmh
Member

calmh commented Jan 14, 2014

Btw, this is on top of the todo list now.

@jpjp
Contributor Author

jpjp commented Jan 15, 2014

Great!

@calmh
Member

calmh commented Jan 18, 2014

This turns out not to be doable using the currently (or soon) available APIs in Go. See http://goo.gl/MrYxyA for some discussion of the problems; in short, it doesn't scale on any of the BSDs, including Mac. Possibly something non-portable could be implemented for Mac (fsevents), Linux and Windows separately, but that's a significant amount of work. Hopefully someone else will do that and package it up. :) Until then, this will have to be put on the back burner.

@jpjp
Contributor Author

jpjp commented Jan 18, 2014

Okay. Back burner for Linux too?

@johnsto

johnsto commented Jan 21, 2014

I wonder if it's worth using https://github.com/howeyc/fsnotify in the meanwhile?

@calmh
Member

calmh commented Jan 22, 2014

It suffers from the same problem - not using the recursiveness support that the OS might have. This means that for a directory structure like

dir/
dir/file
dir/dir_b/
dir/dir_b/file
dir/dir_c/
dir/dir_c/file

we need to put watches on (and consume one file descriptor per) dir, dir/dir_b and dir/dir_c to catch changes to the files. That doesn't scale.

@calmh
Member

calmh commented Jan 22, 2014

For example, on my Mac the default fd limit is 256, and the number of directories in my directory to sync is significantly higher;

jb@jborg-mbp:~ $ ulimit -n
256
jb@jborg-mbp:~ $ find -L Sync -type d | wc -l
    2424
jb@jborg-mbp:~ $ 

The limit can of course be raised, but there's a hard max somewhere and the exact limits are system specific. It sometimes working, sometimes not working and sometimes working to start with but then not working later would not be awesome behavior.

@jpjp
Contributor Author

jpjp commented Jan 22, 2014

How about letting something else do the watching, with syncthing accepting events from that? inotifywatch is a binary that will monitor a directory recursively. Perhaps its output could be parsed, or it could be set up to pass the information to syncthing, so that the inotify stuff stays separate?

@calmh
Member

calmh commented Jan 22, 2014

Inotifywatch is a Linux thing.

@jpjp
Contributor Author

jpjp commented Jan 22, 2014

Yes, it's just an example of an external program that feeds data to syncthing.

@calmh
Member

calmh commented Jan 22, 2014

Actually, no, that doesn't work either. inotify isn't recursive either, so has the same scalability problems.

jb@syncer:/data/syncer$ inotifywatch -r sync/
Establishing watches...
Failed to watch sync/; upper limit on inotify watches reached!
Please increase the amount of inotify watches allowed per user via `/proc/sys/fs/inotify/max_user_watches'.

@calmh
Member

calmh commented Jan 22, 2014

What we could do is use inotify/kqueue/etc for the case where there is only a handful of directories to sync, then fall back to scanning when it doesn't work any more. Personally I think that's pretty crappy and the only gain is to discover changes faster than the scanning interval -- the CPU load of doing the scanning is anyway pretty negligible since we've established there are only a handful of directories to start with...
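
The fallback idea could be sketched roughly like this in Go. Everything here is an assumption for illustration: the `Mode` type, the `chooseMode` function and the `reserve` headroom parameter are hypothetical and not taken from syncthing.

```go
package main

import "fmt"

// Mode says how changes are discovered (hypothetical type).
type Mode int

const (
	ModeWatch Mode = iota // use inotify/kqueue/etc. watches
	ModeScan              // fall back to periodic scanning
)

func (m Mode) String() string {
	if m == ModeWatch {
		return "watch"
	}
	return "scan"
}

// chooseMode picks watching only when every directory fits under the
// system limit with some headroom (reserve) left for other uses, such
// as network connections that also consume file descriptors.
func chooseMode(dirCount, watchLimit, reserve int) Mode {
	if dirCount+reserve <= watchLimit {
		return ModeWatch
	}
	return ModeScan
}

func main() {
	// Numbers from earlier in the thread: 2424 directories, fd limit 256.
	fmt.Println(chooseMode(2424, 256, 64)) // prints "scan"
	// A handful of directories fits comfortably.
	fmt.Println(chooseMode(40, 256, 64)) // prints "watch"
}
```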

@johnsto

johnsto commented Jan 22, 2014

I'd say that's a reasonable compromise, and one could expose the directory limit for users who have raised their system's limits and want to push it further (and/or know what they're doing).

@jpjp
Contributor Author

jpjp commented Jan 22, 2014

Yeah, I think switching behaviour is fine as long as you tell the user when you do this; otherwise they will see a sudden degradation in the functionality of syncthing and not know what has happened.

@jpjp
Contributor Author

jpjp commented Mar 5, 2014

inotify support could be useful for rapidly changing files; it would avoid the delay.

@ncode

ncode commented Mar 7, 2014

Hi, I think it would be easier on Linux systems to stay with fanotify ( http://lwn.net/Articles/339253/ ). Antivirus implementations usually use fanotify because you don't need an fd per directory, or even to create new inotify handlers for new folders on the fly. There's a good implementation of fanotify for Go ( https://godoc.org/bitbucket.org/madmo/fanotify ); I am using it to scan uploaded files with clamav.

@nogweii

nogweii commented Mar 18, 2014

Another tool that has already solved this problem & has implemented a protocol is Facebook's watchman so perhaps including support for that would be beneficial?

@calmh
Member

calmh commented Mar 18, 2014

Indeed, that could be a solution.

@jedie
Contributor

jedie commented May 14, 2014

Until there is a better solution (inotify or something else): what about adding a "scan" button in the WebGUI for every repository? Maybe beside "edit"?

@jpjp
Contributor Author

jpjp commented May 14, 2014

@a-m-s

a-m-s commented May 27, 2014

I may be confused, but my reading of the inotify API suggests that it does not need a separate file descriptor for each directory. As far as I can tell, you can associate a whole list of watches against a single inotify instance and therefore a single descriptor. It is true that you need to add each subdirectory explicitly, however.

IMHO, syncthing should absolutely be using this feature, where available, especially where there are large numbers of directories to scan. Anything else is just going to annoy the hell out of me.

@calmh
Member

calmh commented May 27, 2014

It varies per platform. On Linux, as far as I can see, you're correct and this is governed by a separate /proc/sys/fs/inotify/max_user_watches limit. On BSD:s, it's governed by the file descriptor limit (which is usually a lot harsher than what you can max out the linux watches to). On Windows, I have no clue.

@jedie
Contributor

jedie commented May 27, 2014

For Python there exists watchdog, a platform-independent solution, see: https://pypi.python.org/pypi/watchdog

quote: Supported Platforms:

Linux 2.6 (inotify)
Mac OS X (FSEvents, kqueue)
FreeBSD/BSD (kqueue)
Windows (ReadDirectoryChangesW with I/O completion ports; ReadDirectoryChangesW worker threads)
OS-independent (polling the disk for directory snapshots and comparing them periodically; slow and not recommended)

There is also a note of the FreeBSD/BSD kqueue limits, see: https://github.com/gorakhargosh/watchdog#supported-platforms

EDIT: Found this Go project: https://github.com/howeyc/fsnotify

@a-m-s

a-m-s commented May 27, 2014

May I suggest a solution: add inotify/kqueue/etc. watchers to the most-recently-changed directories until the limit is reached. Then scan the remaining directories the hard way. When a non-watched directory is changed, replace the least-recently-used existing watch with the new one. Naturally, the user should be notified, in some suitably non-obnoxious way, that the limit has been reached (and how to increase the limit, if appropriate).

@calmh
Member

calmh commented May 27, 2014

Guys... You're not adding new information. Look back at the posts from January and March and the links. The point is not that it's impossible, or that there is nothing cross platform. It is that the cross platform stuff doesn't scale, and the stuff that does scale is not cross platform. As far as I can tell there exist solutions that will work for Linux (requiring the user to tweak their stuff in /proc), Mac (entirely different API) and Windows (again, different API). For other BSD:s, there seems to exist nothing that scales.

Implementing it is certainly possible, but it's a significant amount of work, will require falling back to the current algorithm in a bunch of cases, and adds no significant value compared to a lot of other stuff that needs to be done.

@a-m-s

a-m-s commented May 27, 2014

Significant for whom? This is a deal-breaker for me. It might be that syncthing is not the right solution to my problem, but it looks like it could be perfect, with this feature. I've not found anything else. I could maybe glue inotifywatch to rsync with an expect script, or something, but I was hoping for something less hacky. Anyway, sorry to have bothered you.

@calmh
Member

calmh commented May 28, 2014

@a-m-s Now I'm curious. How is this a deal breaker? What significant visible change in syncthings behavior do you expect to see from this?

@jedie
Contributor

jedie commented May 28, 2014

Sync will be faster, because filesystem changes do not have to wait for the "Rescan Interval"... And rescanning the filesystem is slower.

But it seems that there is no usable cross-platform solution for a recursive watcher out there. see:

@rkfg

rkfg commented Sep 12, 2014

That's funny, I've made a shell script just for this and went here to post it and I'm late by 20 hours. Anyway, maybe it will be useful. Parameters are set in stwatch.conf, which should be placed in the same directory as the script. It may look like this:

APIKEY=SDFJHLEURH121412W2488HRHFF984FD
ADDRESS=192.168.1.2:8080
WATCHDIRS=(~/repo1 ~/another_repo)
WATCHREPOS=(repoid1 repoid2)

The only requirements are bash and inotify-tools. You can also override the timeout: TIMEOUTCHANGE defines the "cooldown" after a change is detected. The repo is updated only if nothing else changes within that number of seconds; otherwise it waits another 5 seconds (by default). This prevents many updates during file copying or another bulk operation.

@parity3

parity3 commented Nov 13, 2014

@rkfg I tweaked your script to work on cygwin with https://github.com/thekid/inotify-win. That version of inotifywait has no timeout setting, so I had to implement a workaround.

https://gist.github.com/parity3/6e0bd4979d9e808ecb86

I'm in agreement with the above comments that recursive path change monitoring is definitely in this project's core scope. No matter how non-cross-platform, multi-faceted, difficult, or performance hungry, and no matter how much build complexity it adds, this feature needs to be implemented in some fashion and bundled not as a plugin but built into the program. The good news is that this is largely independent of the rest of the program's inner workings.

'All we need' is someone proficient in integrating all existing OS file notification mechanisms so they are accessible in Go. If that's not possible, then start with these scripts and just manage the off-shoot process as a quick-and-dirty solution (that can be disabled via a check box in the folder config, right next to the scheduled sync time setting).

For me, it meant a longer install time. For someone else, they've already moved on.

@thenktor

@parity3 I agree that this is a must have feature. AFAIK it's the only way to let hard drives go to sleep mode while syncthing is running.
That's an important feature for users that have a NAS running 24/7 and still want to save some energy.

@owenson

owenson commented Dec 15, 2014

I also wanted to chime in that this is a deal breaker for me and I consider myself a typical use-case. Often you're working on a document on your laptop or PC, save and then shutdown to run to a meeting/event/etc - presently you have to wait a few minutes before shutting down. A major inconvenience, more so on laptops.

Also, whilst I don't have the data, I can't imagine scanning every 60 seconds is particularly battery friendly.

@parity3

parity3 commented Dec 16, 2014

FYI, no longer using the bash script I posted earlier. syncthing-inotify has evolved enough to be usable on windows. I don't see heavy resource usage so it seems to be using native fs monitoring hooks. It also handles telling syncthing specific files to rescan, but currently the documentation suggests to not fully unset rescan, but instead set it to 86400 (daily). Good enough for me.

@a-m-s

a-m-s commented Dec 16, 2014

Scanning every 60 seconds is not the most efficient thing it could do, but since that data is almost certainly memory cached, it's not quite as painful as it could be.

inotify is certainly an improvement, but it is possible to confuse it slightly (by moving files over the top of other files, etc), so yes, a daily watch-reset and rescan wouldn't do any harm.

@Zillode
Contributor

Zillode commented Dec 16, 2014

@a-m-s : in this case inotify would trigger two (simple) file scans, one for the removed file and one for the modified. That seems to be the intention, no?

A daily scan is still recommended because inotify might be started too late or have a bug I'm not aware of :)

@a-m-s

a-m-s commented Dec 16, 2014

@Zillode When you move a file over another, inotify only gives you the move event; the overwrite event is implicit. This probably isn't a problem for syncthing, which will just push up the new contents of that file, but other users of inotify hate it (the main problem is keeping a count of how many files there are, without keeping a detailed record of what their names are). There may be other corner cases syncthing does care about.

@Zillode
Contributor

Zillode commented Dec 16, 2014

@a-m-s : With "other users of inotify" I guess you mean inotify applications outside of the context of syncthing?
If you can figure out an example related to this project I'm all ears and happy to fix it.

So as I read it, due to the fact that the combination of syncthing and syncthing-inotify is exactly about keeping track of all filenames that have been changed, there should be no problem?

@a-m-s

a-m-s commented Dec 16, 2014

@Zillode Yes, I mean other projects that use inotify. I don't know of a specific flaw that hurts syncthing, apart from the OSX-specific "too many open files" problem.

If syncthing keeps an in-memory list of all files it knows about, which I'd imagine it does, then all is probably good.

@jkaberg

jkaberg commented Apr 7, 2015

@calmh @AudriusButkevicius @Zillode Have you guys ruled out inotify or is it something that might be implemented "later down the road"? If the latter, would you humor me with a rough ETA guess? 2015, 2016, 2017+?

I've already seen the various implementations using the API endpoints, and I'm sure they work but IMO Syncthing is worth something better 😃

@AudriusButkevicius
Member

The original problem discussed in this ticket still exists.

OS X is bad at performing file watches, and requires tweaking kernel settings to be usable, which is not an option for average joe, hence I guess why it's still not part of syncthing.

@rkfg

rkfg commented Apr 7, 2015

How do they work around it in Dropbox? It's available for OS X.

@calmh
Member

calmh commented Apr 7, 2015

Probably mainly by giving most users a small enough quota that it doesn't happen, and then switching to scanning if it does... But the plan has always been to integrate syncthing-inotify when it works well enough in most cases. I think it may be getting there.

@rkfg

rkfg commented Apr 7, 2015

They have a 100 GB paid plan, and scanning this much data, if it consists of many small files, may be pretty expensive. It could be interesting to investigate their methods further, as Dropbox is de facto the leading (or at least the most popular) product in its field. Maybe this suggests that file watches are used all the time?

@calmh
Member

calmh commented Apr 7, 2015

Actually, on Mac at least, the native fseventsd API can be used without any restriction on number of watched files (as far as I can tell). Windows and Linux probably have similar API:s.

@AudriusButkevicius
Member

Yeah but fsevents requires linking against CoreServices, hence requiring a special build environment.

@jkaberg

jkaberg commented Apr 7, 2015

Glad to hear it @calmh , looking forward to that day :-)

@lucb1e

lucb1e commented Aug 7, 2015

Was looking forward to moving away from the proprietary btsync, but this is a deal breaker for me as well. Nice that there is a workaround, but it should totally have been done in the main application.

I'm having a hard enough time moving people at school away from Dropbox, let alone also going "oh and if you don't want a performance hog and battery drain every single minute, you have to install and configure this external thing as well [which every other file sync tool on the planet has built-in]". Not being able to turn others over means I can't use this either.

The issue I read in the thread is that there is no good cross-platform thing that scales, but it's not that hard to write it for one platform (so that the code structure is there) and then let pull requests add and maintain other platforms. A few people in this thread offered to write code and I see various implementations to use the API. The desire and support is there. As long as automatic fallback to rescanning works well, there is no regression.

@owenson

owenson commented Aug 7, 2015

Just an informational comment to @lucb1e : I tried syncthing again, wondering if the scanning was actually an issue or not. It turns out there's another deal breaker: it can't punch through firewalls, which effectively rules it out completely at my workplace (some ports blocked but not all, and btsync works fine).

@Schroedingers-Cat

@lucb1e Why not use https://github.com/canton7/SyncTrayzor or https://github.com/syncthing/syncthing-gtk - both have inotify support and are easier to set up as they have an installer/repo.

@calmh
Member

calmh commented Aug 8, 2015

To be clear, a working solution will always be welcomed and merged. Syncthing-inotify is the closest I know of. Any effort towards putting in whatever finishing touches are necessary and getting it merged is welcome. Cross platform doesn't need to mean "works perfectly everywhere" - supporting Windows and Linux would cover almost 90% of the current user base and is a good start.

However, discussing it further here does no good whatsoever.

@syncthing syncthing locked and limited conversation to collaborators Aug 8, 2015
@calmh calmh modified the milestone: v0.14.51 Sep 11, 2018