Option to limit max simultaneous scans #2760

Closed
entonio opened this Issue Feb 5, 2016 · 18 comments

@entonio commented Feb 5, 2016

Is there a way to limit the number of concurrent scans? I have a NAS that works reasonably well when processing a single folder, but performance drops sharply when syncthing tries to rescan more than one at the same time, which is to be expected given disk thrashing (I suppose flash storage wouldn't suffer from this issue). Is there any way to tell syncthing (on Mac) to rescan only one folder at a time, rather than to try them all at once? If not, could that be implemented?

@AudriusButkevicius (Member) commented Feb 5, 2016

There is no way, apart from managing rescans yourself, or making sure that the scan ranges somehow don't overlap.

That could be implemented, but it needs someone who is interested in implementing it.

@AudriusButkevicius AudriusButkevicius added this to the Unplanned (Contributions Welcome) milestone Feb 5, 2016

@entonio commented Feb 5, 2016

Thanks! (I'd have added the tag myself, but only the owners can do it, I believe.)

Is there any way to avoid the rescan when (re)starting syncthing? With that, I could manage the rescans myself, but without it I can't, because they start automatically (a number of those folders are 'write only', being mirrors of remote read-only folders, so properly speaking they only need the initial scan).

@AudriusButkevicius (Member) commented Feb 5, 2016

No, a scan on startup is mandatory, as we need to reconcile any changes that happened while we were offline.

@entonio commented Feb 7, 2016

I understand that rationale, but once normal rescans can be disabled via the 0 interval, Syncthing is not monitoring the folder anyway - anything can happen to it and Syncthing won't know. From that perspective, if the user has disabled rescanning, that should apply to startup as well. There's always manual scanning available if need be.
Picture 10 TB folders which are only used as mirrors ('write only') and having to rescan them with every restart for no gain. The only useful scan is on creation, because the user may have initialized the folder with a copy of the remote contents, but all subsequent ones are of no use (unless some problem comes up, but for that there's manual rescan). I do realise you didn't set out to write Syncthing for the purpose of mirroring, but it has turned out as an excellent choice for it.

@sisu4syncthing commented Feb 7, 2016

@entonio
For really big (trillions of files) folders (which almost never change in a week), I have set the rescan interval to 36000 seconds, which is 10 hours.
So maybe you can add a zero and let the folder rescan every 100 hours?

@entonio commented Feb 7, 2016

@sisu4syncthing, thanks. I already have it set to 0, which means Syncthing never rescans them (it's not that they don't change often, they just aren't meant to change at all). But Syncthing does still rescan them on startup, for the reason @AudriusButkevicius pointed out, and what I'm arguing is that the same rationale behind the '0' interval also applies to the startup scan (whether via an additional setting or not is a different matter).

@entonio commented Feb 7, 2016

To wit, afaict there are 3 types of scans. For the specific case I'm thinking of (mirroring):

  • On creating the folder. It makes no sense to disable this one because Syncthing does have to initialise its knowledge of the local contents:
    • if the folder is created empty, then the scan won't take long anyway
    • if the folder is pre-populated then Syncthing has to know what's already there
  • Regularly, according to the rescan interval. It is currently possible to disable this altogether, based on our knowledge that the folder should not change. But it's possible that changes are made; Syncthing won't know about them. If the user knows about the changes and wants them to be taken into account, they can do a manual rescan.
  • On Syncthing startup. Every folder is rescanned. My opinion is that all the reasons behind allowing the 0 interval to disable rescanning also apply here, so it should be possible to disable scanning a folder on startup. It could be a consequence of the 0 interval, it could be a separate setting, whatever.

Of course, in a pro server environment Syncthing is hardly ever restarted, but (1) in that case those folders never get rescanned anyway, and (2) Syncthing is so good/empowering that it's being used successfully in plenty of amateur environments.

@AudriusButkevicius (Member) commented Feb 7, 2016

Rescans of already-indexed folders should cost close to nothing, as they are just a bunch of stat calls which take next to no CPU, hence I still don't understand the rationale behind not scanning on startup.
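
To make the "just a bunch of stat calls" point concrete, here is a minimal sketch (not Syncthing's actual scanner; the indexEntry type is invented for illustration) of how a rescan of an already-indexed folder can decide which files need re-hashing by comparing size and modification time against stored index entries, without reading any file contents:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// indexEntry is a stand-in for what a scanner might remember about a file.
type indexEntry struct {
	Size    int64
	ModTime time.Time
}

// changedFiles walks root and returns paths whose size or mtime differ from
// the stored index; only those would need to be re-read and re-hashed.
func changedFiles(root string, index map[string]indexEntry) ([]string, error) {
	var changed []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		info, err := d.Info() // a stat call; no file contents are read
		if err != nil {
			return err
		}
		prev, ok := index[path]
		if !ok || prev.Size != info.Size() || !prev.ModTime.Equal(info.ModTime()) {
			changed = append(changed, path)
		}
		return nil
	})
	return changed, err
}

func main() {
	index := map[string]indexEntry{} // empty index: everything counts as changed
	changed, err := changedFiles(".", index)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d files would need re-hashing\n", len(changed))
}
```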

@entonio commented Feb 7, 2016

Then my issue may lie elsewhere. The folders have been indexed, but the rescans are full scans, not just stat. lsof seems to show Syncthing doing a full read of every file. Just to clarify, these 'mirror', 'write-only' folders are stored on a NAS, mounted via SMB on a Mac, where Syncthing is running.

I know little about SMB; could it be the case that a mere stat implies a full read?

Or could there be some problem that stops the index being written to disk? I have seen nothing in the logs to that effect, but I don't know what to look for.

@AudriusButkevicius (Member) commented Feb 7, 2016

Well, you can run with STTRACE=scanner, and it will probably tell you that your files have changed, because SMB does something funky with modification timestamps or permissions.

@entonio commented Feb 7, 2016

I'll try that out the next time I have to restart Syncthing.
The timestamps stuff isn't news... FastCopy (http://ipmsg.org/tools/fastcopy.html.en) goes some way towards dealing with it (under 'Tolerance at the timestamp comparison').
I'm inclined to think that may be it, since permissions wouldn't at first sight require scanning the contents, whereas an imprecise timestamp may suggest the file has changed. But I'll only know when the time comes for a restart.
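
Purely as an illustration of the "tolerance at the timestamp comparison" idea mentioned above (this is not what Syncthing does, and the two-second window is just an assumed value in the spirit of coarse FAT/SMB timestamp granularity), a tolerant comparison treats modification times that differ by less than a small threshold as unchanged:

```go
package main

import (
	"fmt"
	"time"
)

// mtimeTolerance is a hypothetical allowance for filesystems (or network
// shares) that store modification times with coarse or rounded precision.
const mtimeTolerance = 2 * time.Second

// sameModTime reports whether two modification times should be considered
// equal, ignoring differences smaller than the tolerance.
func sameModTime(a, b time.Time) bool {
	d := a.Sub(b)
	if d < 0 {
		d = -d
	}
	return d <= mtimeTolerance
}

func main() {
	indexed := time.Date(2016, 2, 7, 12, 0, 0, 0, time.UTC)
	onDisk := indexed.Add(1 * time.Second) // e.g. the share rounded the stored mtime

	fmt.Println("strict comparison says changed:  ", !onDisk.Equal(indexed))
	fmt.Println("tolerant comparison says changed:", !sameModTime(onDisk, indexed))
}
```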

@joaofl commented Jul 19, 2016

I could try to contribute to that. Could anyone point me to a starting point? I haven't studied the code yet.

My idea is to have a queue for scanning; it could be one queue per drive. Any idea how practical that solution would be?
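
A rough sketch of that queueing idea (not Syncthing's implementation; the folder names and the limit of one concurrent scan are made up for illustration): a buffered channel works as a counting semaphore, so at most maxConcurrentScans folder scans run at once while the rest wait their turn:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// scanFolder is a placeholder for whatever work a real scan does.
func scanFolder(name string) {
	fmt.Println("scanning", name)
	time.Sleep(time.Second) // pretend this is the expensive walk + hash
	fmt.Println("finished", name)
}

func main() {
	const maxConcurrentScans = 1 // the "max simultaneous scans" knob

	sem := make(chan struct{}, maxConcurrentScans) // counting semaphore
	folders := []string{"photos", "music", "backups"}

	var wg sync.WaitGroup
	for _, f := range folders {
		wg.Add(1)
		go func(folder string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a scan slot (blocks when full)
			defer func() { <-sem }() // release the slot when done
			scanFolder(folder)
		}(f)
	}
	wg.Wait()
}
```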

@AudriusButkevicius (Member) commented Jul 19, 2016

Per drive is a bit of a long shot, as on Linux you can't easily check which drive a particular folder belongs to, because you can have bind mounts pointing to a completely different drive nested somewhere deep inside.

The scanning code is mostly in model/model.go and model/rwfolder.go.

@GitHubGeek commented Jul 23, 2016

Can a variable be added to advanced settings, "Max. number of shares to scan concurrently"?

Or, let the user "group" shares, so that only one share per group is scanned at a time.

As you can see from the iotop screenshot below, hashing two shares concurrently on the same HDD spindle really hurts performance (around 130 MB/s would be expected for single-threaded scanning).

[Screenshot: iotop output showing syncthing disk throughput]

@calmh (Member) commented Jul 23, 2016

I didn't see it mentioned upthread, so throwing it out here: in the meantime, make sure to set hashers=1 on the folder to limit concurrent scanning per folder, and perhaps set a slightly longer scan interval on the folders. As the scan interval is slightly randomized, you will most often not get several folders scanning at the same time, at least for a small number of folders.
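
For context on why hashers=1 helps on a single spindle, here is a simplified sketch (not Syncthing's actual hashing pipeline) in which the hashers value bounds the number of worker goroutines reading and hashing files in parallel; with a value of 1, files are read strictly one after another:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"sync"
)

// hashFile reads a file once and returns its SHA-256 digest.
func hashFile(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

// hashAll hashes the given files using at most `hashers` parallel workers;
// with hashers=1 the files are read one at a time, which is easy on spinning disks.
func hashAll(files []string, hashers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < hashers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				if sum, err := hashFile(path); err == nil {
					fmt.Printf("%x  %s\n", sum, path)
				}
			}
		}()
	}
	for _, f := range files {
		jobs <- f
	}
	close(jobs)
	wg.Wait()
}

func main() {
	hashAll(os.Args[1:], 1) // hashers=1: sequential reads
}
```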

lkwg82 added a commit to lkwg82/syncthing that referenced this issue Jan 1, 2018

@calmh calmh removed this from the Unplanned (Contributions Welcome) milestone Feb 11, 2018

AudriusButkevicius added a commit to AudriusButkevicius/syncthing that referenced this issue Apr 17, 2018

@calmh calmh closed this in ff2cde4 Dec 5, 2018

@calmh calmh added this to the v0.14.54 milestone Dec 5, 2018

@calmh calmh changed the title Max simultaneous scans? Option to limi max simultaneous scans Dec 11, 2018

@calmh calmh changed the title Option to limi max simultaneous scans Option to limit max simultaneous scans Dec 11, 2018

@Catfriend1 commented Jan 1, 2019

Related: #4888 (as I didn't find the reference)

@uok (Contributor) commented Jan 4, 2019

@calmh this works great; it cuts scan time to less than a third on a device with ~20 folders and ~2 million files 👍
Maybe this could be made the default?
Afaik there is no downside for fast computers (with SSDs), but it's a big improvement for slower devices with spinning disks. It also helps when folder A is scanning and folder B's regular scan interval fires before A has finished.

@Ferroin commented Jan 4, 2019

On the note of this possibly being a default, the only case I can see where it might truly be a bad thing is big systems with storage devices that really can handle massively parallel access (NVMe maybe?) and have large core counts (so that they can reliably hash in parallel too).

Testing on all of my systems (a Ryzen 7 with conventional spinning hard disks, a Core i7 with a nice SATA SSD, single-core VPS nodes with good SSDs, and my phone), all of them showed at minimum improved responsiveness of the rest of the system during the startup scan, and everything but the Core i7 system showed much better behaviour (slightly faster scan time overall with lower total CPU usage throughout) during regular operation.
