Allow skipping Full Rescan on startup if Interval has not yet been exceeded. #5353

Open
SkyLined opened this issue Dec 7, 2018 · 15 comments

4 participants
@SkyLined
commented Dec 7, 2018

I've set up a very long Full Rescan Interval for my folders, but when I restart Syncthing, all folders are scanned even if they have already been scanned within this interval. If this is intentional, please add a setting to disable it.

Background: I am running Syncthing on a slow Raspberry Pi. Full Rescans consume 100% CPU, RAM and I/O, which makes the WebUI (and SSH) almost impossible to use. I am trying to avoid them by setting a very large interval but when I restart Syncthing, I basically cannot use the WebUI for a few hours because of all the unwanted rescans taking place.

Version Information

Syncthing Version: v0.14.54, Linux (ARM)
OS Version: Linux 4.14.69+ on Raspberry Pi
Browser Version: n/a

@desbma

Contributor

commented Dec 7, 2018

> Full Rescans consume 100% CPU, RAM and I/O

Maybe you can improve that by lowering the CPU and I/O priority, or, even better, by using a different scheduling policy.

For example, if you use the syncthing@ systemd service with a syncthing user:

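# Create a drop-in directory for the syncthing@syncthing unit
sudo mkdir -p /etc/systemd/system/syncthing@syncthing.service.d
# Run Syncthing with idle CPU and I/O scheduling
echo '[Service]
CPUSchedulingPolicy=idle
IOSchedulingClass=idle' | sudo tee /etc/systemd/system/syncthing@syncthing.service.d/low-priority.conf
# Reload unit files and restart the service so the drop-in takes effect
sudo systemctl daemon-reload
sudo systemctl restart syncthing@syncthing.service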

For the other part of your issue, I am not familiar with that part of the code, but I suspect the initial scan cannot be avoided, because Syncthing does not know whether your files were modified while it was not running.

@SkyLined

Author

commented Dec 7, 2018

Thanks, that suggestion sounds like it should solve some of my problems, but it might make the WebUI sluggish as it will also be throttled down... so I'll give it a try to see how well it works.

Wrt. not knowing whether a rescan is needed: in that case, this bug would become a feature request for Syncthing to store a timestamp of the last scan on disk and read it at startup, to avoid a rescan when none is needed.

@calmh

Member

commented Dec 7, 2018

Doing a scan at startup to kick off the schedule is intentional. With the default setting, the difference isn't relevant. However, in cases like yours, where rescans should happen about once a day, I can see how rescanning everything on startup might be annoying. We could probably fix that.

@calmh calmh added the enhancement label Dec 7, 2018

@SkyLined SkyLined changed the title Full Rescan Interval not applied on startup Allow skipping Full Rescan on startup if Interval has not yet been exceeded. Dec 7, 2018

@SkyLined

Author

commented Dec 7, 2018

Fair enough, make this a feature request then :)

@desbma

Contributor

commented Dec 7, 2018

> Wrt. not knowing whether a rescan is needed: in that case, this bug would become a feature request for Syncthing to store a timestamp of the last scan on disk and read it at startup, to avoid a rescan when none is needed.

How would you handle the case where the local files have been modified while the service (and the inotify watch) is not running?

@desbma

Contributor

commented Dec 7, 2018

> Thanks, that suggestion sounds like it should solve some of my problems, but it might make the WebUI sluggish as it will also be throttled down...

If the scan is currently CPU-bound, then yes, it is likely to slow down the web UI too. However, if it is I/O-bound, the UI should remain responsive; only the background scan may take longer to run.

@calmh

Member

commented Dec 7, 2018

I think this could only apply in the case where filesystem notifications are disabled. If we are running with notifications, we need the initial scan as a baseline for them.

If we are running without notifications and have a rescan interval of an hour, it doesn't hurt to wait half an hour or whatever remains of the rescan interval on restart.

@SkyLined

Author

commented Dec 7, 2018

> How would you handle the case where the local files have been modified while the service (and the inotify watch) is not running?

Syncthing would not know about it until it did a full rescan less than a day later. But this is no different from running Syncthing with "Watch for changes" disabled, so I do not see a problem with that (unless I am missing something).

@SkyLined

Author

commented Dec 7, 2018

> If we are running without notifications and have a rescan interval of an hour, it doesn't hurt to wait half an hour or whatever remains of the rescan interval on restart.

Indeed; in my case it is a day, but I do not mind if my backup is out of sync for a day.

@AudriusButkevicius

Member

commented Dec 7, 2018

I recall there were other reasons to run a scan, mostly to prime caches to do with completion, but perhaps these days this is not important anymore.

@desbma

Contributor

commented Dec 7, 2018

Sorry if this has already been answered elsewhere, but on Linux, when inotify is used (the default), why is a periodic rescan even needed?
If Syncthing first sets up the inotify watches and then does the initial scan (in that order, to avoid races), why isn't that sufficient to be immediately notified of all future filesystem changes?

EDIT: The docs say:

> Even with watcher enabled it is advised to keep regular full scans enabled, as it is possible that some changes aren’t picked up by it.

But they do not explain how changes can be missed.

@calmh

Member

commented Dec 8, 2018

Not all types of changes generate events, and events may be dropped due to overflow.

@desbma

Contributor

commented Dec 8, 2018

Care to elaborate?

I use inotify in another project and I am interested in its limitations. I have only found some on FUSE filesystems.

If you are referring to the /proc/sys/fs/inotify/max_* limits, then Syncthing knows when it hits that limit and could then fall back to the heavyweight "full rescan to poll changes" approach (with a big fat warning informing users that raising the limit will improve performance).

@calmh

Member

commented Dec 8, 2018

IIRC, a chmod does not generate an event that we see.

I don't mean hitting max watches, which is handled as you say. I mean there being a flood of events faster than we can process them. There is best-effort detection of this, but IIRC (again, I'm not super involved in the low-level notify packages) we might not always know we missed events. Keep in mind there are a lot of underlying APIs (inotify, kqueue, whatever the thing is on Windows, etc.) and the behavior of all is abstracted together.

All in all, the notify thing is something like a 99% solution. It mostly does what you expect, but for correctness we should still do periodic scans now and then.

@desbma

Contributor

commented Dec 9, 2018

> IIRC, a chmod does not generate an event that we see.

According to the inotify man page, it should:

> IN_ATTRIB (*)
> Metadata changed—for example, permissions (e.g., chmod(2)), timestamps (e.g., utimensat(2)), extended attributes (setxattr(2)), link count (since Linux 2.6.25; e.g., for the target of link(2) and for unlink(2)), and user/group ID (e.g., chown(2)).

http://man7.org/linux/man-pages/man7/inotify.7.html

> I don't mean hitting max watches, which is handled as you say. I mean there being a flood of events faster than we can process them. There is best-effort detection of this, but IIRC (again, I'm not super involved in the low-level notify packages) we might not always know we missed events.

You are probably referring to this code:

// When next scheduling a scan, do it on the entire folder as events have been lost.

If this is not sufficient to detect missed events, then I assume events can only be lost before that point, in the syncthing/notify code.
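A minimal Go sketch of the pattern in the quoted comment, assuming a hypothetical `scanPlanner` rather than Syncthing's actual event aggregator: changed paths are collected for a partial scan, and once the watcher reports lost events (for inotify, an IN_Q_OVERFLOW event, per inotify(7)) the next scan covers the whole folder. This only helps when the loss is detected; undetected losses are what the periodic full scans guard against.

```go
package main

import (
	"fmt"
	"sort"
)

// scanPlanner collects changed paths and tracks whether events were lost.
type scanPlanner struct {
	dirty    map[string]struct{}
	fullScan bool
}

func newScanPlanner() *scanPlanner {
	return &scanPlanner{dirty: make(map[string]struct{})}
}

// onEvent records a changed path unless a full scan is already pending.
func (p *scanPlanner) onEvent(path string) {
	if !p.fullScan {
		p.dirty[path] = struct{}{}
	}
}

// onOverflow is called when the watcher reports that events were lost.
// The pending paths are no longer trustworthy, so a full scan replaces them.
func (p *scanPlanner) onOverflow() {
	p.fullScan = true
	p.dirty = make(map[string]struct{})
}

// next returns what the next scan should cover and resets the planner.
// full == true means "scan the entire folder"; otherwise paths is sorted.
func (p *scanPlanner) next() (full bool, paths []string) {
	full = p.fullScan
	if !full {
		for path := range p.dirty {
			paths = append(paths, path)
		}
		sort.Strings(paths)
	}
	p.fullScan = false
	p.dirty = make(map[string]struct{})
	return full, paths
}

func main() {
	p := newScanPlanner()
	p.onEvent("photos/a.jpg")
	p.onEvent("docs/b.txt")
	fmt.Println(p.next()) // false [docs/b.txt photos/a.jpg]

	p.onEvent("docs/c.txt")
	p.onOverflow()
	fmt.Println(p.next()) // true []
}
```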

> Keep in mind there are a lot of underlying APIs (inotify, kqueue, whatever the thing is on Windows, etc.) and the behavior of all is abstracted together.

Yes, other APIs may have limitations; I am only interested in Linux + inotify.

> All in all, the notify thing is something like a 99% solution. It mostly does what you expect, but for correctness we should still do periodic scans now and then.

This still strikes me as suboptimal, but I am opening a new issue to discuss this instead of polluting this one.
