
Real-time Filesystem Notifications - WIP #2807

Closed
wants to merge 21 commits into from

Conversation

@plouj (Contributor) commented Feb 28, 2016

This is my attempt at integrating inotify into syncthing.

Features

  • Buffer events for 500ms (fastNotifyDelay) before telling the folder model about them
  • Going into deep sleep if the filesystem isn't busy
  • Prevent syncthing from doing full scans (except for the initial one) if real-time notification is enabled
  • Filtering ignored files and staying on top of config changes
  • Smart aggregation of changes
  • Allow enabling/disabling without syncthing restart on a per-folder basis
  • Ignore as many events originating from syncthing itself as possible; can already skip some, more testing needed
  • Integrate with rofolder
  • Make the godeps (https://github.com/zillode/notify/) work; Help please!
  • Maximize open file limit
  • Gracefully deal with file handle errors
  • Delay a full scan for much longer (longRescanIntervalS, configurable, defaults to 3 hours) if realtime notification is available
  • Extensive testing on Linux
  • Extensive testing on Windows
  • Extensive testing on OSX

My current approach is to make as few changes and add as few dependencies to rwfolder/rofolder as possible. To that end I spawn an FsWatcher which communicates back to the syncthing model via fsWatchChan.

FsWatcher listens for events from the filesystem on fsEventChan and stores them in a map until its notifyTimer fires, at which point the folder model is asked to rescan the changed paths. I copied the flush-reset logic from st-inotify, which slows down the notifyTimer to fire only as often as slowNotifyDelay when there aren't any filesystem events; this saves CPU cycles and is laptop-friendly. FsWatcher also subscribes to the ItemFinished events from the model to avoid telling the model to scan files that it has just modified; this doesn't fully work yet.
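The buffering scheme described above can be sketched roughly as follows. The constants and field names echo the PR's vocabulary (fastNotifyDelay, slowNotifyDelay, fsEventChan), but this is an illustrative simplification, not the actual implementation:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

const (
	fastNotifyDelay = 500 * time.Millisecond // batch events briefly while busy
	slowNotifyDelay = 10 * time.Second       // idle: fire rarely to save CPU
)

type fsWatcher struct {
	fsEventChan chan string          // paths reported by the notify backend
	notifyTimer *time.Timer          // fires when the batch should be flushed
	notifyDelay time.Duration        // current delay (fast or slow)
	fsEvents    map[string]struct{}  // pending changed paths, deduplicated
	notifyModel func(paths []string) // ask the folder model to rescan these
}

// flushPaths turns the pending-event set into a sorted path list.
func flushPaths(events map[string]struct{}) []string {
	paths := make([]string, 0, len(events))
	for p := range events {
		paths = append(paths, p)
	}
	sort.Strings(paths) // deterministic order for the model
	return paths
}

func (w *fsWatcher) serve(stop chan struct{}) {
	w.notifyTimer = time.NewTimer(w.notifyDelay)
	defer w.notifyTimer.Stop()
	for {
		select {
		case path := <-w.fsEventChan:
			w.fsEvents[path] = struct{}{} // duplicates coalesce in the map
			if w.notifyDelay != fastNotifyDelay {
				w.notifyDelay = fastNotifyDelay // activity: flush soon
				w.notifyTimer.Reset(w.notifyDelay)
			}
		case <-w.notifyTimer.C:
			if len(w.fsEvents) > 0 {
				w.notifyModel(flushPaths(w.fsEvents))
				w.fsEvents = make(map[string]struct{})
			} else {
				w.notifyDelay = slowNotifyDelay // quiet: go into deep sleep
			}
			w.notifyTimer.Reset(w.notifyDelay)
		case <-stop:
			return
		}
	}
}

func main() {
	stop := make(chan struct{})
	w := &fsWatcher{
		fsEventChan: make(chan string, 16),
		notifyDelay: fastNotifyDelay,
		fsEvents:    make(map[string]struct{}),
		notifyModel: func(paths []string) { fmt.Println("rescan:", paths) },
	}
	go w.serve(stop)
	w.fsEventChan <- "a/b.txt"
	w.fsEventChan <- "a/b.txt" // duplicate is coalesced into one rescan
	time.Sleep(time.Second)
	close(stop)
}
```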

@canton7 (Member) commented Feb 28, 2016

You can't disable scanning completely. On Windows at least, events will be lost if many happen at the same time. On top of that, I'm sure there will be edge cases where events won't work at all.

@calmh (Member) commented Feb 28, 2016

I haven't even opened the diff tab (and won't do so until I'm back; and then this will be a prime candidate for reviews.syncthing.net, we'll sort that out) but I'll emphasize what canton7 says: don't disable the scanner. Unless I hear strong arguments to the contrary, it seems to me that the biggest win will be in implementing something very similar to what the current external inotify implementation does: drive the scanner and delay it, while letting it run now and then anyway in the absence of events, just to be sure.

@calmh (Member) commented Feb 28, 2016

Oh and 💯 ❤️ for actually going ahead with integrating this, I look very much forward to it and will work with you to get it done.

watcher.notifyTimer = *time.NewTimer(watcher.notifyDelay)
defer func() {
	watcher.notifyTimer.Stop()
}()

@AudriusButkevicius (Member) commented Feb 28, 2016

just defer watcher.notifyTimer.Stop()

@plouj (Author) commented Feb 28, 2016

Done.

}

func (watcher *FsWatcher) watchFilesystem() {
	watcher.notifyTimer = *time.NewTimer(watcher.notifyDelay)

@AudriusButkevicius (Member) commented Feb 28, 2016

Not sure why you need the pointer deref, why can't you just hold a pointer on the struct?

@plouj (Author) commented Feb 28, 2016

Well, it was easier to deref than change the types throughout at the time of writing. But I went ahead and changed relevant code to use pointers instead.


func setupNotifications(path string) (chan notify.EventInfo, error) {
	c := make(chan notify.EventInfo, maxFiles)
	if err := notify.Watch(path, c, notify.All); err != nil {

@AudriusButkevicius (Member) commented Feb 28, 2016

Probably need to close c if you hit this branch.

@plouj (Author) commented Feb 28, 2016

The way I understand it only the sender should close a channel even if I'm the one who creates it.

@plouj (Author) commented Feb 28, 2016

What I can do is call notify.Close() on it, although that doesn't seem to close the channel either.

@AudriusButkevicius (Member) commented Feb 28, 2016

Yeah, but if you return nil, nobody has a chance to close it.
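The convention plouj refers to can be illustrated generically (this is not syncthing code): the goroutine that sends on a channel is the one that closes it, and receivers detect completion via range terminating or the comma-ok receive:

```go
package main

import "fmt"

// produce streams paths on a channel and closes it when done; only the
// sending side closes, so receivers can never send on a closed channel.
func produce(paths []string) <-chan string {
	c := make(chan string)
	go func() {
		defer close(c) // sender closes, even on early return
		for _, p := range paths {
			c <- p
		}
	}()
	return c
}

func main() {
	for p := range produce([]string{"a", "b"}) { // range ends on close
		fmt.Println(p)
	}
	c := produce(nil)
	_, ok := <-c
	fmt.Println("open:", ok) // false: channel was closed with no values
}
```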

func (watcher *FsWatcher) storeFsEvent(event notify.EventInfo) {
	newEvent := watcher.newFsEvent(event.Path())
	if newEvent != nil {
		watcher.addEvent(*newEvent)

@AudriusButkevicius (Member) commented Feb 28, 2016

This calls a function which only performs 1 line of code, so might as well avoid the overhead of calling a function.
Also, unsure about the pointer deref here too.

@plouj (Author) commented Feb 28, 2016

I usually start by writing functions with informative names and optimize later. This has been changed already.

func (watcher *FsWatcher) skipPathChangedByUs(event events.Event) {
	path := event.Data.(map[string]interface{})["item"].(string)
	l.Debugf("Skipping notification for finished path: %s\n", path)
	watcher.removeEventIfPresent(path)

@AudriusButkevicius (Member) commented Feb 28, 2016

This can just be swapped with delete(watcher.fsEvents, path) to remove the call overhead.

I think you need to be a bit more clever here.
Have a list of files you are currently ignoring, and add them to the list upon events.ItermStarted and remove them upon events.ItemFinished.

@plouj (Author) commented Feb 28, 2016

Thanks, I'll keep this list of ignored paths suggestion in mind as I try to get a better understanding of what ItemStarted and ItemFinished events mean.
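The ignore-list suggestion above could look roughly like this: track paths syncthing itself is currently writing, adding on ItemStarted and removing on ItemFinished, and skip filesystem events for those paths. This is a hypothetical, simplified sketch, not the PR's code:

```go
package main

import (
	"fmt"
	"sync"
)

// inFlightSet tracks paths syncthing is currently writing itself.
type inFlightSet struct {
	mut   sync.Mutex
	paths map[string]struct{}
}

func newInFlightSet() *inFlightSet {
	return &inFlightSet{paths: make(map[string]struct{})}
}

// started is called on an events.ItemStarted event.
func (s *inFlightSet) started(path string) {
	s.mut.Lock()
	defer s.mut.Unlock()
	s.paths[path] = struct{}{}
}

// finished is called on an events.ItemFinished event.
func (s *inFlightSet) finished(path string) {
	s.mut.Lock()
	defer s.mut.Unlock()
	delete(s.paths, path)
}

// shouldIgnore reports whether a filesystem event for path was most
// likely caused by syncthing's own write and can be skipped.
func (s *inFlightSet) shouldIgnore(path string) bool {
	s.mut.Lock()
	defer s.mut.Unlock()
	_, ok := s.paths[path]
	return ok
}

func main() {
	s := newInFlightSet()
	s.started("docs/a.txt")                   // events.ItemStarted
	fmt.Println(s.shouldIgnore("docs/a.txt")) // true: our own write
	s.finished("docs/a.txt")                  // events.ItemFinished
	fmt.Println(s.shouldIgnore("docs/a.txt")) // false: external change
}
```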

@plouj force-pushed the plouj:plouj-fswatcher branch from bce839e to 55787d7 Feb 28, 2016

@plouj (Author) commented Feb 28, 2016

Regarding disabling of full scans: the way I understand it, the external inotify implementation actually does disable full scans indefinitely by repeatedly delaying them; at least the ones that happen due to scanTimer firing. However, it does sometimes tell syncthing to scan a folder with an empty list of subs, which effectively causes a full rescan. My current plan is to go in this direction: disable scanTimer-based full scans and try to minimize full scan requests to situations that really require them, e.g. too many files changed, event aggregation decides it's best to scan the whole folder, dropped events, an unreliable notification implementation, etc.

@canton7 (Member) commented Feb 28, 2016

The way I understand it the external inotify implementation actually does disable the full scans indefinitely by repeatedly delaying them; at least the ones that happen due to scanTimer firing.

If it does do this, then it has issues and we need to improve on it. Inotify on Windows has limitations - anyone who's worked with it realises this.

To be honest, since we're integrating this into Syncthing, we've got the opportunity to improve the user experience. At the moment it's not at all obvious that syncthing-inotify delays scans (and indeed other inotify extensions don't do this). My vote would be for defaulting to a scan interval of something like 1 hour if inotify is enabled, or whatever we default to now if not. This is reflected in the UI, and the user can change this through the UI.

We also need to make sure that running out of inotify handles is properly handled. At the moment this case is only encountered by people who are explicitly trying to get inotify working, and who are well placed to troubleshoot it. If we start bringing inotify to the masses, we're going to get a lot of confused newbies who don't know or care what inotify handles are, but who are going to see unexpected behaviour (especially if we disable scans...).

@calmh (Member) commented Mar 2, 2016

I think we should have something like two defaults for the scan interval... Possibly two user configurable values, even. In the first case we attempt to use fs notifications and a long scan interval. If we run out of fds, watches, frobbles or whatever magic is needed we fall back to regular scanning on a shorter interval (and log and indicate something useful in the GUI to signify we are not using notifications).

@calmh (Member) commented Mar 3, 2016

@plouj I invited you to the github team so you can push to the syncthing repo. Once you accept that, push this to a branch here and I'll set up a review on reviews.syncthing.net. You might want to create an account there as well in the meantime. :)

(I've yet to sort out being able to review pull requests there straight up, but we'll get there.)

@calmh (Member) commented Mar 4, 2016

https://reviews.syncthing.net/cru/CR-7

@plouj You'll need to do the reset-password dance with reviews.s.n.

@calmh added the in-review label Mar 5, 2016

@plouj force-pushed the plouj:plouj-fswatcher branch from 064af50 to 462a5da Jul 30, 2016

@plouj (Author) commented Jul 30, 2016

I've been bringing the branch up to date with the latest master and working on it. Right now I can't figure out how to give the new config variable LongRescanIntervalS a default value when a folder is created. Also, what do I need to do to add it to the advanced page? I tried editing gui/default/index.html, but there's probably a step I'm missing in re-generating something?

@plouj force-pushed the plouj:plouj-fswatcher branch from 462a5da to 6bd7f09 Jul 30, 2016

@plouj (Author) commented Jul 30, 2016

Even with the defaultConfig change I still get rescanIntervalS="60" longRescanIntervalS="0" ignorePerms="false" in config.xml for any newly added folder.

@plouj force-pushed the plouj:plouj-fswatcher branch from 6bd7f09 to d0468a2 Jul 30, 2016

@AudriusButkevicius (Member) commented Jul 30, 2016

There are defaults when adding a folder somewhere in JS land.
For existing ones you could do a config migration.

Alternatively, if the value is 0, assume it's not set, in which case replace it with the default in the places where you use it.
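The "treat 0 as unset" fallback suggested here can be sketched as follows; the function name and default value are illustrative, not syncthing's actual config code:

```go
package main

import "fmt"

// Default used when the config attribute was never written.
const defaultLongRescanIntervalS = 3600

// effectiveLongRescanIntervalS treats a zero (or negative) configured
// value as "not set" and substitutes the default at the point of use.
func effectiveLongRescanIntervalS(configured int) int {
	if configured <= 0 { // 0 means the XML attribute was never set
		return defaultLongRescanIntervalS
	}
	return configured
}

func main() {
	fmt.Println(effectiveLongRescanIntervalS(0))    // 3600: falls back to default
	fmt.Println(effectiveLongRescanIntervalS(7200)) // 7200: explicit value wins
}
```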

@AudriusButkevicius (Member) commented Jul 30, 2016

go run build.go assets rebuilds the assets

@plouj force-pushed the plouj:plouj-fswatcher branch from d0468a2 to 76b6d5b Aug 3, 2016

@calmh (Member) commented Aug 14, 2016

FWIW I would suggest going with a slightly increased scan interval (say, 60 minutes?) by default and letting inotify run causing scans "in parallel". Not all changes are picked up by inotify, and there can happen things to cause the scan to fail, so we still want the periodic scans with a not too long period in between. That also makes it easier, possibly, as it's closer to what already happens with the external inotify driver?

@plouj force-pushed the plouj:plouj-fswatcher branch from 76b6d5b to 9bc12b9 Aug 16, 2016

@plouj (Author) commented Aug 16, 2016

I decreased the full scan interval to 60 minutes (from 3 hours).

I don't really have the motivation to figure out what's causing the notify package to not build: https://build.syncthing.net/job/syncthing-pr/2658/console @Zillode?

@Zillode (Member) commented Aug 16, 2016

The build you link to is gone. However, the Jenkins-GitHub link shows:

==================
WARNING: DATA RACE
Write by goroutine 129:
  github.com/syncthing/syncthing/lib/model.(*rwFolder).Serve()
      c:/jenkins/workspace/syncthing-pr-windows/src/github.com/syncthing/syncthing/lib/model/rwfolder.go:186 +0x3bf
  github.com/syncthing/syncthing/vendor/github.com/thejerf/suture.(*Supervisor).runService.func1()
      c:/jenkins/workspace/syncthing-pr-windows/src/github.com/syncthing/syncthing/vendor/github.com/thejerf/suture/suture.go:556 +0x72

@plouj force-pushed the plouj:plouj-fswatcher branch from 9bc12b9 to 2da1aef Aug 16, 2016

@plouj (Author) commented Aug 16, 2016

The issue I saw before was:

...
+ go run build.go -goarch amd64 deb
rm -r syncthing
github.com/syncthing/syncthing/lib/upgrade
github.com/syncthing/syncthing/cmd/syncthing
+ fakeroot sh -c 'chown -R root:root deb ; dpkg-deb -b deb .'
dpkg-deb: building package 'syncthing' in './syncthing_0.14.3+14-g7a30161_amd64.deb'.
+ ./build.sh all
Building darwin-amd64
rm -r syncthing
github.com/syncthing/syncthing/lib/auto
github.com/syncthing/syncthing/lib/config
github.com/syncthing/syncthing/vendor/github.com/zillode/notify
github.com/syncthing/syncthing/lib/scanner
# github.com/syncthing/syncthing/vendor/github.com/zillode/notify
vendor/github.com/zillode/notify/watcher_fsevents.go:51: undefined: stream
github.com/syncthing/syncthing/lib/nat
github.com/syncthing/syncthing/lib/pmp
github.com/syncthing/syncthing/lib/upnp
github.com/syncthing/syncthing/lib/connections
exit status 2
exit status 1
Build step 'Execute shell' marked build as failure
...

But yes, I also need to fix the race.

@Zillode (Member) commented Aug 16, 2016

vendor/github.com/zillode/notify/watcher_fsevents.go:51: undefined: stream

That's ok, Jenkins is trying to build Darwin-specific code on a Debian chroot. FSEvents depends on the FSEvents Core Services Framework which is only available on an OSX machine.

For ST-Inotify, a full cross-platform build is only possible from an OSX Machine (Darwin, Linux and Windows).

@imsodin (Member) commented Jan 30, 2017

Does anyone see the reason the build step is failing? I can't find anything except the messages at the end.

@plouj (Author) commented Jan 30, 2017

I'm seeing these build errors in the notify library:

github.com/syncthing/syncthing/vendor/github.com/zillode/notify
vendor/github.com/zillode/notify/event_fen.go:23: undefined: fileAccess
vendor/github.com/zillode/notify/event_fen.go:24: undefined: fileModified
vendor/github.com/zillode/notify/event_fen.go:25: undefined: fileAttrib
vendor/github.com/zillode/notify/event_fen.go:26: undefined: fileDelete
vendor/github.com/zillode/notify/event_fen.go:27: undefined: fileRenameTo
vendor/github.com/zillode/notify/event_fen.go:28: undefined: fileRenameFrom
vendor/github.com/zillode/notify/event_fen.go:29: undefined: fileTrunc
vendor/github.com/zillode/notify/event_fen.go:30: undefined: fileNoFollow
vendor/github.com/zillode/notify/event_fen.go:31: undefined: unmounted
vendor/github.com/zillode/notify/event_fen.go:32: undefined: mountedOver
vendor/github.com/zillode/notify/event_fen.go:32: too many errors

@AudriusButkevicius (Member) commented Jan 30, 2017

I suspect it's because we need native builders, as this change kills cross-compiling due to platform deps.

@imsodin (Member) commented Jan 30, 2017

So is this the same as #2807 (comment) and can also be fixed with xgo (#2807 (comment))?

There is nothing (as far as I can see) that changed regarding platforms or the files that now fail between the old and new notify version: Diff between old and new hash

@AudriusButkevicius (Member) commented Jan 30, 2017

No idea, need to look at this via something that is not a phone, but I suspect dependency update to add solaris support?

@calmh (Member) commented Jan 30, 2017

Yeah, those symbols come from cgo for Solaris so can't be cross compiled. This is not a big deal in the grand scheme of things, ignore that for the moment and we can fix it up when it's merge time. Or comment out the Solaris build in jenkins/build-linux.sh in the meantime.

@calmh (Member) commented Feb 5, 2017

What's the general status here? The Solaris build failure we can ignore - there are a couple of unchecked checkboxes, but should I get involved in testing and nitpicking at this point or where are we :)

@imsodin (Member) commented Feb 5, 2017

The smart aggregation that is in this PR is only partially working. I rewrote that part and made additional improvements on timing/delays; I see I never linked to that here: plouj#15. I should probably describe what exactly I want to achieve with this code, as I am not sure whether it is obvious from code/comments/debug. However, I don't have much time right now (exam coming up). I would still welcome any feedback on it.
I intentionally delayed integration into all folder types because of #3780. The mentioned PR also includes some tests (mainly aimed at aggregation, but they run an entire FsWatcher, so other aspects are tested as well). The enabling/disabling without restart aspect is probably not something in the scope of this PR, as any folder change (whether it concerns the fswatcher or not) currently restarts the entire folder (right?).

@calmh (Member) commented Feb 5, 2017

OK, gotcha. Yeah, it just needs to start/stop with the folder. But I guess it's important that it does that, so that it releases the "watches" or whatever it is it uses when the folder stops.

@calmh (Member) commented Feb 5, 2017

@st-jenkins retest this please

@calmh (Member) commented Feb 5, 2017

I was hoping all the things would turn green, but they didn't. The good news is that none of those test failures are your fault, so it's in principle green. 👍 😕

@imsodin (Member) commented Feb 12, 2017

@plouj @AudriusButkevicius @calmh @canton7 @anyoneelse:
I added some explanation in prose to plouj#15 (not sure how helpful it is). From my side I am done with that and thus waiting for comments/questions now. Still no rush on my part as I am busy until the end of the coming week, but if you have some time, I am looking forward to your reviews.

What it introduces in short:

  • aggregation
  • deactivatable and configurable delay before scanning
  • timer refactor (mostly for first point)
  • Some tests

@calmh (Member) commented Feb 12, 2017

@AudriusButkevicius (Member) commented Feb 12, 2017

I think we should merge the functionality but not enable it by default, at least for the first release, and have a checkbox. People can then use the checkbox and stop their local instances, and over time we can enable it by default.

@imsodin (Member) commented Feb 12, 2017

I agree and I would even generally disable it for existing folders. This solves the problem for external services doing fs watching by putting them in control. Either the user drops the external service and activates manually or the service drops fs watching by itself and activates it in syncthing.

For new folders it might be a good idea to (eventually) activate it by default, as it should reduce scanning load in combination with a longer full rescan interval while syncing quicker.

@canton7 (Member) commented Feb 13, 2017

As the author of one of these external tools, I've got the opposite view: a weird crossover phase where some folders are handled by an external watcher and some by Syncthing is going to be a confusing nightmare for everyone. I think moving responsibility for filesystem notifications to Syncthing as quickly as possible is the right thing to do, even if there's a slightly painful cross over period.

In other words, enable it by default for all new and existing folders. External tools can disable their support, confident that Syncthing will take over.

Only enabling it for new folders is an impossible situation to manage. External tools can't remove their support, and they've somehow got to keep track of which folders are new, for which support should be disabled.

I don't think there's that much harm in enabling it by default: at worst we stat some files twice (but I don't think we can end up hashing some file twice?). They shouldn't end up conflicting with each other in any meaningful way.

On top of this we could watch for posts to the scan endpoint when filesystem notifications are enabled and show one of those acknowledgeable notifications saying "hey, you've got filesystem notifications coming from an external tool, but Syncthing does this by itself now, so you might want to disable the external tool and configure Syncthing to your liking, or disable it in Syncthing".

The only place where enabling it for all existing folders would be harmful is if someone isn't using an external tool, and is in a situation where inotify would do bad things. Those people exist, but I think they're in the minority.

@AudriusButkevicius (Member) commented Feb 13, 2017

Actually, fair point, the worst case of a double scan is not that bad.

@calmh (Member) left a review comment

Here are some initial review comments. Trying to just look at bigger picture things for the moment. We will need some sort of integration towards the REST API and the GUI for telling whether filesystem notifications are available at all (they are not on all OSes I think?) and currently running or not. Also not sure what happens when things start up fine, but we run into "out of watches" or similar errors later?

@@ -369,6 +369,10 @@ <h4 class="panel-title">
<th><span class="fa fa-fw fa-refresh"></span>&nbsp;<span translate>Rescan Interval</span></th>
<td class="text-right">{{folder.rescanIntervalS}} s</td>
</tr>
<tr ng-if="folder.longRescanIntervalS != 3600">

@calmh (Member) commented Feb 13, 2017

I don't think we should expose this here, especially not with a cryptic camelcased name. See also below about the settings modal.

@@ -73,6 +73,13 @@
<span translate ng-if="!folderEditor.rescanIntervalS.$valid && folderEditor.rescanIntervalS.$dirty">The rescan interval must be a non-negative number of seconds.</span>
</p>
</div>
<div class="form-group" ng-class="{'has-error': folderEditor.longRescanIntervalS.$invalid && folderEditor.longRescanIntervalS.$dirty}">

@calmh (Member) commented Feb 13, 2017

I don't think we should expose this here. I would suggest something like a checkbox "Use Filesystem Notifications" in addition to the existing "Rescan Interval" box. When checked, rescan interval is greyed out. The long rescan interval lives in the advanced/hidden options only.

Challenge: GUI needs to know whether filesystem notifications are supported at all, in order to show the checkbox...

"syscall"
)

func interpretNotifyWatchError(err error, folder string) error {

@calmh (Member) commented Feb 13, 2017

I'd typically expect this as an isWatchesTooFew(err error) bool that could be used appropriately by someone inspecting a returned error, instead of something that transforms the error, cf. os.IsNotExist and similar.

fsEventChan, err := watcher.setupNotifications()
notifyModelChan := make(chan FsEventsBatch)
watcher.notifyModelChan = notifyModelChan
if err == nil {

@calmh (Member) commented Feb 13, 2017

Unless there is very good reason otherwise, please always prefer to do the err != nil check with early return and letting the success path be the default one.
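The requested convention, shown generically (setup is a stand-in, not watcher.setupNotifications): handle err != nil first with an early return, so the success path reads straight down without nesting:

```go
package main

import (
	"errors"
	"fmt"
)

// setup is a stand-in for a fallible initializer like setupNotifications.
func setup(fail bool) (chan string, error) {
	if fail {
		return nil, errors.New("watch setup failed")
	}
	return make(chan string, 1), nil
}

// start demonstrates the early-return style: the error path exits first,
// and the success path is the unindented default.
func start(fail bool) error {
	c, err := setup(fail)
	if err != nil {
		return fmt.Errorf("starting watcher: %v", err)
	}
	c <- "ready" // success path reads straight down
	fmt.Println(<-c)
	return nil
}

func main() {
	if err := start(true); err != nil {
		fmt.Println("fallback to periodic scanning:", err)
	}
	_ = start(false)
}
```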

if err == nil {
	watcher.WatchingFs = true
	watcher.fsEventChan = fsEventChan
	go watcher.watchFilesystem()

@calmh (Member) commented Feb 13, 2017

This whole thing should probably be a suture.Service instead, at least in the long run. Don't rush to change that until it's one of the last things remaining, probably.

func (watcher *FsWatcher) Stop() {
	if watcher.WatchingFs {
		watcher.WatchingFs = false
		watcher.stop <- struct{}{}

@calmh (Member) commented Feb 13, 2017

I haven't proved it yet but gut feeling says there is probably a data race here as the bool is checked in a bunch of places probably running different goroutines. Also - can a given watcher be restarted once stopped? Otherwise a close of the stop channel should be enough and more idiomatic I think.

Someone from the outside interested in whether the watcher is stopped or not can call

func IsStopped() bool {
	select {
	case <-watcher.stop:
		return true
	default:
		return false
	}
}

@imsodin (Member) commented Feb 13, 2017

Currently an FsWatcher instance is never restarted, as folders are restarted on any configuration change. However, by exposing StartWatchingFilesystem and Stop it is implied that it could be restarted (for example if in the future syncthing starts to only restart/reset the parts that are actually needed on a configuration change). The WatchingFs bool is only used in Stop and StartWatchingFilesystem, so currently there should be no race, but one could be created by running those exported functions concurrently.

Your proposition is to remove the WatchingFs member of FsWatcher and use the stop channel to determine whether watching is currently active, correct?

@@ -288,13 +302,20 @@ func (f *sendReceiveFolder) Serve() {
	default:
		l.Infoln("Completed initial scan (rw) of", f.Description())
		close(f.initialScanCompleted)
		if fsWatcher.WatchingFs {
			f.delayFullScan()

@calmh (Member) commented Feb 13, 2017

This only runs after the first scan - on the next scan it'll never be rescheduled, so we only ever get two full scans. We need to reschedule a full scan one long interval after each full scan, unless I misunderstand.

@imsodin (Member) commented Feb 13, 2017

I wouldn't scrutinize the entire rwfolder.go part too much (apart from how an FsWatcher is started). There are some issues there, and I had improvements in a PR together with the rofolder integration, but dropped that due to incompatibility with the sendreceive PR. I will redo it once that gets finalized.

@calmh (Member) commented Feb 13, 2017

I did some quick testing and noticed that it seems to drop events, at least on my computer and setup. I used the s1/s2 setup that is in the test directory, configured s2 with a long scan and long-scan interval (so we're mostly depending on events), and copied in some data.

After copying in a large directory (~/src/github.com/calmh, about 40k files and directories, lots of git stuff) and letting it sync to s1, s1 was missing part of the hierarchy. With our current default config it would get propagated within a minute (plus sync time), in this case it would get picked up after the next "long scan", i.e. an hour. I suspect we might see bug reports on this.

Is there some way we could detect that we're likely dropping events and finish off with a full scan? Or do it as a precautionary move after every x thousand events or something?

@imsodin (Member) commented Feb 13, 2017

For your tests, did you do them with or without plouj#15? Don't do tests without it, my previous commit regarding aggregation is at best lacking some features and at worst failing. I had no tests at that time.

I just realized that even with the new aggregation there is a (probably small) chance of dropped events: if the notify backend sends events faster than they are aggregated, the buffered chan might overflow. I will add a check to detect that and issue a full scan in that case.
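One way the overflow check described here can be sketched: if the buffered event channel is ever found completely full while draining, events may already have been dropped by the backend, so the safe response is a full folder scan. The channel size and names are illustrative:

```go
package main

import "fmt"

// Illustrative buffer size; the real value would be much larger.
const maxEventsInFlight = 4

// drain empties the event channel without blocking and reports whether
// the buffer was full on entry, in which case the notify backend may
// already have dropped events and a full scan should be scheduled.
func drain(c chan string) (paths []string, overflow bool) {
	if len(c) == cap(c) {
		overflow = true // buffer was full: events may have been lost
	}
	for {
		select {
		case p := <-c:
			paths = append(paths, p)
		default:
			return paths, overflow
		}
	}
}

func main() {
	c := make(chan string, maxEventsInFlight)
	for i := 0; i < maxEventsInFlight; i++ {
		c <- fmt.Sprintf("file%d", i)
	}
	paths, overflow := drain(c)
	fmt.Println(len(paths), overflow) // 4 true: schedule a full rescan
}
```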

@calmh (Member) commented Feb 13, 2017

@imsodin (Member) commented Feb 13, 2017

I did my tests on this PR, merged with master.

This PR as in this main PR to the syncthing repo or my PR on plouj's branch (plouj#15) that extends this PR?

The workflow until now was that I made separate PRs to plouj's branch, where these changes were reviewed and eventually added to this PR by plouj.

I added a check for channel overflow in that PR, so I hope there should not be any more event loss.

@calmh (Member) commented Feb 13, 2017

@imsodin (Member) commented Feb 13, 2017

The thing is this PRs branch of origin lives in plouj's syncthing fork. So I cannot commit to it directly. I could post an entirely new PR to the syncthing repo including my changes, but that would mean abandoning this PR (and feels like plagiarizing plouj's work).
I am not quite sure how to proceed.

@calmh (Member) commented Feb 13, 2017

Both you and plouj can push directly to syncthing/syncthing. In this case I'd suggest just pushing this to a new branch there, with the changes you've added. Create a new PR from that branch and link to this one in the description. Whoever has time and inclination to work on it can add more commits there without issue. We'll sort out the final blame when we merge it so everyone involved gets proper credit.

I generally prefer that people keep their branches in their own forks or we'll get overrun by litter, but this seems like a perfectly legitimate case for the branch to live in the main repo.

@calmh closed this Jun 25, 2017

@syncthing locked and limited conversation to collaborators Jun 26, 2018
