
Multicore repo scanning #293

Closed
ramesh45345 opened this issue May 26, 2014 · 15 comments
Labels: enhancement (New features or improvements of some kind, as opposed to a problem/bug), frozen-due-to-age (Issues closed and untouched for a long time, together with being locked for discussion)

Comments

@ramesh45345 commented May 26, 2014

Strange request for enhancement: is it possible for syncthing to be multithreaded when performing a scan? My Core i7 appears to use only one core during a scan; it would probably see speed improvements using all 8...

@jpjp (Contributor) commented May 26, 2014

Are you running a recent version? calmh@21335d6

@calmh (Member) commented May 26, 2014

A single repo is still scanned on only one thread.

@calmh changed the title from "Multithreading" to "Multicore repo scanning" on May 26, 2014
@ramesh45345 (Author) commented May 26, 2014

Yeah, I'm running 0.8.10 gc6ba020.

I suppose the walk function used by ScanRepo would have to perform walkAndHashFiles in the same sort of goroutine? Then again, this doesn't look like C...

@calmh (Member) commented May 26, 2014

Yeah, what needs to happen to distribute load is to fire up multiple goroutines to do the reading/hashing and then distribute the files or blocks to them with a queue. It's not rocket science, just complexity that hasn't been necessary yet.
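
For illustration, a minimal sketch of that shape (hashFile and numWorkers are made-up names for this sketch, not syncthing's actual code): a fixed pool of goroutines drains a channel of file paths and hashes them.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"sync"
)

// hashFile reads one file and returns its SHA-256 digest.
func hashFile(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

func main() {
	paths := make(chan string) // the queue: the walker feeds it, workers drain it
	var wg sync.WaitGroup

	const numWorkers = 4 // in reality, runtime.NumCPU() or a tunable
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				sum, err := hashFile(p)
				if err != nil {
					fmt.Fprintln(os.Stderr, "error:", p, err)
					continue
				}
				fmt.Printf("%x  %s\n", sum, p)
			}
		}()
	}

	// Stand-in for the directory walker: hash whatever is on the command line.
	for _, p := range os.Args[1:] {
		paths <- p
	}
	close(paths)
	wg.Wait()
}
```

Distributing whole files is the simplest queue; distributing blocks would balance better when a repo holds a few very large files.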

@calmh (Member) commented May 26, 2014

Should probably do some benchmarking to see how much CPU a SHA256 hasher needs anyway. I think I was going by the assumption that in most cases the disks were going to be the bottleneck. Of course, fast SSDs and RAID setups shift that balance.
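
As a sketch of how to measure that, a standard Go benchmark (the 128 KiB buffer is my assumption for the block size, which I believe matches what syncthing hashes per block):

```go
// hashbench_test.go: a rough measure of raw SHA-256 throughput per core.
package hashbench

import (
	"crypto/sha256"
	"testing"
)

func BenchmarkSHA256(b *testing.B) {
	buf := make([]byte, 128<<10) // one 128 KiB block of zeroes
	b.SetBytes(int64(len(buf)))  // makes the benchmark report MB/s
	h := sha256.New()
	for i := 0; i < b.N; i++ {
		h.Reset()
		h.Write(buf)
		h.Sum(nil)
	}
}
```

Run with `go test -bench .`; if the reported MB/s per core comfortably exceeds the disk's sequential read speed, the disk is the bottleneck and extra hashing threads mainly help by keeping more reads in flight.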

@ramesh45345 (Author) commented May 26, 2014

Hmm. I just ran sha256sum on some files while syncthing was running. The CPU isn't being taxed much more, but it seems like the two processes together saturated the disk I/O even more.

Syncthing: CPU: 92% (out of 800% for 8 cores), disk I/O: 55 MB/s
sha256sum: CPU: 32%, disk I/O: 33 MB/s

The disk I/O was measured with iotop, so I'm not sure how accurate it is. It seems like multithreading can help more with disk I/O than with CPU, since even sha256sum wasn't pegging the CPU much...

@calmh (Member) commented May 26, 2014

Note also that if/when this is implemented, it will immediately generate a counter-bug-report saying that CPU usage is too high and that this affects media playback or whatever. So it'll need to be tunable, etc.

@jedie (Contributor) commented May 26, 2014

Too much CPU usage is what `nice` is for, and too much I/O usage is what `ionice` is for. So IMHO syncthing doesn't need any tunable settings for this.
What should be tunable is RAM consumption. That can be a problem...

just my 2 cents.

@calmh (Member) commented May 26, 2014

Windows.

@jpjp (Contributor) commented May 26, 2014

Who says syncthing should sync as fast as possible? I want it to stay out of the way as much as possible, not turn my laptop into a battery-draining heat monster :)

@calmh (Member) commented May 26, 2014

^ There we go! :)

@jedie (Contributor) commented May 26, 2014

Windows also has priorities for CPU and I/O ;)
But OK, you need extra tools to set them ;)

@AlexDaniel commented

I agree that it should stay out of the way, but what if we add a "Thread count" option?

@calmh (Member) commented Jun 5, 2014

Note that this actually already exists. As part of the Go runtime, you can set the environment variable GOMAXPROCS to the maximum number of CPU-bound threads syncthing should start. It defaults to the number of cores in your box, and is in any case limited by how much work syncthing schedules on the CPU. As it is, 200% isn't unusual when scanning two repos, syncing data to fast nodes, etc. If you set GOMAXPROCS=1, syncthing will be limited to one CPU core and so should not exceed 100% usage. Unfortunately it's not more fine-grained than that, so you can't limit it to, say, 75% of a core.
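
So the usage is simply launching as `GOMAXPROCS=1 syncthing`. For completeness, a small sketch of what that maps to inside the Go runtime (plain runtime calls, nothing syncthing-specific):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) queries the current limit without changing it.
	fmt.Println("cores:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))

	// Equivalent in effect to launching the process with GOMAXPROCS=1
	// in the environment: cap CPU-bound execution to a single core.
	runtime.GOMAXPROCS(1)
}
```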

@calmh closed this as completed in 2be1218 on Jul 30, 2014
@calmh (Member) commented Jul 30, 2014

Syncthing doing an initial scan (no index) of two repos, prior to this commit, one core per repo:

[screenshot: cpu-single, CPU usage graph showing one busy core per repo]

With the parallel hasher, many cores per repo:

[screenshot: cpu-multi, CPU usage graph showing many busy cores per repo]

Note that it still doesn't manage to saturate all cores when scanning a single repo (the period from ~12 to 26 seconds). I guess it needs more threads doing I/O as well to be able to get the most data from disks. Anyway, good enough for now. As above, use GOMAXPROCS to cap CPU usage if this is undesirable... If this becomes a thing, we could add a GUI-accessible setting for "Max CPU Threads".

@st-review added the frozen-due-to-age label on Jun 17, 2017
@syncthing locked and limited conversation to collaborators on Jun 17, 2017