
Memory and CPU usage is prohibitively high #468

Closed
pedrocr opened this issue Aug 4, 2014 · 86 comments
Labels
enhancement: New features or improvements of some kind, as opposed to a problem (bug)
frozen-due-to-age: Issues closed and untouched for a long time, together with being locked for discussion

Comments

@pedrocr

pedrocr commented Aug 4, 2014

I've been testing syncthing across 3 machines, a laptop with 8GB of RAM and two NAS-style servers with 1GB and 2GB of RAM. My repositories have the following sizes:

  • 19 items, 536 KiB
  • 83471 items, 16.2 GiB
  • 181482 items, 387 GiB

To sync these three repositories, syncthing 0.9.0 uses a bit over 700 MB of RAM, and while syncing it continuously pegs the CPU at 150% on all nodes.

While I could tolerate the CPU usage during the initial sync, the memory usage is simply too high. A typical NAS server like the two I have holds >4 TB of storage. At the current level of usage that would require ~8 GB of memory just for syncthing.

Without looking at the code, I assume an index is being kept in memory for all the repository contents. 700 MB works out to ~2.6 KB per item on disk, which seems way too high. The index should only really need to store filename, parent, permissions, size and timestamp. On average (it depends on filename sizes) that should only be 50-60 bytes per item, which would only be 13 MB. Moving that index to disk would also make a lot more sense.
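The back-of-envelope estimate above can be checked with a short sketch. The struct fields and the 25-byte average filename are assumptions for illustration, not syncthing's actual schema:

```go
package main

import "fmt"

// A minimal per-file index entry, as sketched in the comment above.
// Field names and types are illustrative, not syncthing's.
type indexEntry struct {
	name    string // file name
	parent  uint32 // index of the parent directory entry
	mode    uint32 // permissions
	size    int64  // file size in bytes
	modTime int64  // Unix timestamp
}

func main() {
	const avgName = 25 // assumed average filename length in bytes
	perItem := avgName + 4 + 4 + 8 + 8
	items := 19 + 83471 + 181482 // the three repos above
	fmt.Printf("~%d bytes/item, ~%.1f MB total\n",
		perItem, float64(perItem*items)/1e6)
	// → ~49 bytes/item, ~13.0 MB total
}
```

So the 50-60 bytes/item figure quoted in the comment is plausible for metadata alone, before any per-block hashes or database overhead.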

I assume the CPU usage is hashing lots of files. There I assume using librsync might be a better option.

@calmh
Member

calmh commented Aug 4, 2014

The memory usage should not scale linearly with the repo size, precisely because the index is not kept in RAM. Perhaps there's something going on here related to the initial sync. The sync will use as much CPU as it can, for hashing, compressing and encrypting. You can limit this in various ways.

@pedrocr
Author

pedrocr commented Aug 4, 2014

The CPU usage I can live with, even though it seems too high compared with ssh+rsync, which provides a similar primitive. The rsync protocol may be much cheaper, though. Using librsync for syncing would probably be a good bet, as you'd also get fast syncing of changes to large files.
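For context, the rsync-style delta transfer alluded to here hinges on a weak rolling checksum that can be slid over a file one byte at a time, so matching blocks can be found without rehashing every window from scratch. A minimal sketch (not librsync's actual API):

```go
package main

import "fmt"

// weak computes an rsync-style weak checksum over a window:
// a is the plain byte sum, b weights earlier bytes more heavily.
// Both are kept mod 2^16.
func weak(data []byte) (a, b uint32) {
	n := len(data)
	for i, x := range data {
		a += uint32(x)
		b += uint32(n-i) * uint32(x)
	}
	return a & 0xffff, b & 0xffff
}

// roll slides the window one byte to the right: drop out, add in.
// Unsigned wraparound plus masking implements the mod arithmetic.
func roll(a, b uint32, out, in byte, window int) (uint32, uint32) {
	a = (a - uint32(out) + uint32(in)) & 0xffff
	b = (b - uint32(window)*uint32(out) + a) & 0xffff
	return a, b
}

func main() {
	data := []byte("hello rolling world")
	const w = 8
	a, b := weak(data[:w])
	for i := w; i < len(data); i++ {
		a, b = roll(a, b, data[i-w], data[i], w)
	}
	a2, b2 := weak(data[len(data)-w:])
	fmt.Println(a == a2 && b == b2) // true: rolling matches recomputing
}
```

The cheap rolling hash finds candidate blocks; only candidates are then confirmed with a strong hash, which is why rsync's CPU cost per byte can be much lower than hashing everything.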

The memory usage is an actual bug then. Maybe it's leaking somehow (although Go has GC, I think).

@pedrocr
Author

pedrocr commented Aug 4, 2014

I just tried restarting the syncthing instances and both of them went down to ~400MB and are now rising again. There's nothing suspicious in the logs.

@calmh
Member

calmh commented Aug 4, 2014

For the record, this is steady state for me with a comparable amount of data:

[Screenshot: syncthing GUI status, 2014-08-04 21:15]

(In Swedish unfortunately, but you'll recognize the fields.) I'd be interested in seeing how it looks for you once indexing is complete; I'm not sure what state you're in currently.

@pedrocr
Author

pedrocr commented Aug 4, 2014

The ~400MB is after it's already started syncing so any indexing should be complete. Yours also shows ~1kB/item which seems much too high. At 400MB mine is at ~1.5kB per item so it's comparable. My data just has a lot more smaller files than yours.

@pedrocr
Author

pedrocr commented Aug 4, 2014

The syncing also seems CPU bound, as the receiving node is constantly pegged at 150% while only sporadically receiving data. Most of the time the network connection is idle. I've left it on for 24 hours and it's only synced ~28 GB of ~400 GB across two repos.

@calmh
Member

calmh commented Aug 4, 2014

Yours also shows ~1kB/item which seems much too high.

No. It's not that simple.

I've left it on for 24hours and it's only synced ~28GB of ~400 across two repos.

That's exceedingly slow, indeed. These NAS-style servers, are they very underpowered?

@pedrocr
Author

pedrocr commented Aug 4, 2014

No. It's not that simple.

On average, that's what your syncthing process is using in RAM per item. The usage may be in other things, though.

That's exceedingly slow, indeed. These NAS-style servers, are they very underpowered?

Not particularly. Both of them are AMD64 machines. The one currently online and syncing slowly is an "AMD Athlon(tm) II Neo N36L Dual-Core Processor". Not particularly fast but also not the slow ARM stuff that's common in NAS servers.

@calmh
Member

calmh commented Aug 4, 2014

Yeah, that's not very slow under the circumstances, so then I don't really know why it syncs that slowly.

About the memory, I understand averages, but the fact is only very little memory is spent on file indexes since the index is not in RAM. Slices of it are, at times (up to a few thousand items) but the rest is buffers of various kinds (for holding blocks in transfer, compression, encryption), database cache, all other kinds of state kept everywhere, etc. Multiply by 1.5 or so for GC.

Obviously there's something going on there with your memory usage though...

@pedrocr
Author

pedrocr commented Aug 4, 2014

Obviously there's something going on there with your memory usage though...

There doesn't seem to be anything particular about my case. Your machine is roughly in line. If this is all fixed cost and doesn't grow if the repository is 1 or 2 TB then it's less serious. It's still quite high though. I'll have to read the code to understand this better.

@pedrocr
Author

pedrocr commented Aug 5, 2014

I've given up on this for now. rsync is able to saturate the line speed with negligible memory usage and pretty low CPU usage (only SSH really uses CPU and not that much).

Edit: Forgot to mention, after two days it's only been able to sync 50GB while pegging the CPU constantly and using 500MB of RAM on the sending side and 700MB on the receiving side.

@calmh
Member

calmh commented Aug 6, 2014

Yeah, no idea what's going on there, sorry. You mentioned the sync was running over a tinc VPN; perhaps there's some bad interaction there. If nothing else, it's SSL-over-TCP-over-IP-over-SSL-over-TCP-over-IP, which is suboptimal for sure.

@calmh calmh closed this as completed Aug 6, 2014
@pedrocr
Author

pedrocr commented Aug 6, 2014

If you'd like to isolate the factor I can run a test without tinc. tinc adds a CPU cost because of its own encryption, and that penalty is also paid by rsync, which runs fine. It may be that the receiving side isn't able to hash fast enough to keep up with line speed though. Since rsync doesn't hash it wouldn't hit that issue.

As for memory usage, that shouldn't have anything to do with tinc. The memory is used even when not syncing, and your example shows similar memory usage. 1-2MB per GB of repo size is way too high.

I see that you've closed the issue though, so let me know if you want help tracking down what's happening with the CPU and memory usage.

@calmh
Member

calmh commented Aug 7, 2014

There is no memory cost of "1-2MB per GB of repo size", it should stabilize around 100-150 megs or so maximum, irrespective of repo size. More than that is a bug, which is not reproducible in my installations. I'd like to get to the bottom of it though.

(There's a bug in the memory reporting in the GUI in v0.9.0, it'll show a little more than the truth in some cases so comparing actual RSS values from top or so would be nice, but I don't think it accounts for more than a few percent of the total.)

@calmh
Member

calmh commented Aug 7, 2014

Some more info here, since I've done some testing. I created a test repo with 200 GB in 250,000 files and measured memory usage. With v0.9.1 at idle (syncthing having done nothing for several minutes; this is relevant, more on that later) syncthing uses about 110 MB of RAM. I'd say this is acceptable, but a bit on the high side.

But the GUI is a problem. Actually loading or reloading the GUI drives the memory usage up to 520 MB. That's obviously crappy. The reason is that the GUI does a bunch of REST calls in parallel that all result in essentially linear scans of the entire DB, counting all files and sizes etc. This drives up the peak usage a lot, and the memory is only slowly returned to the OS once the GC mechanism figures we're not going to need it again anytime soon. Hence the note about idle above.

I implemented some tweaks to trigger GC earlier when doing expensive database operations, which helped a bit. With these changes I get a peak of about 180 MB when reloading the GUI and an idle usage of around 50 MB:

[Screenshot: syncthing GUI status, 2014-08-07 23:02]

There's more to be done here, particularly things that should be kept calculated so that we don't need to scan the full db to figure it out when we load the GUI...
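The "keep it calculated" idea could look roughly like this (a sketch with hypothetical names, not syncthing's code): maintain running totals as index entries change, so a GUI load reads two integers instead of scanning the whole DB.

```go
package main

import "fmt"

// repoStats holds precomputed aggregates for one repository,
// updated incrementally as the index changes.
type repoStats struct {
	files int64 // total number of files
	bytes int64 // total size in bytes
}

// add records a new file of the given size.
func (s *repoStats) add(size int64) { s.files++; s.bytes += size }

// remove records a deleted file of the given size.
func (s *repoStats) remove(size int64) { s.files--; s.bytes -= size }

func main() {
	var s repoStats
	s.add(1024)
	s.add(2048)
	s.remove(1024)
	// A GUI status call now costs O(1) instead of an O(n) DB scan.
	fmt.Println(s.files, s.bytes) // 1 2048
}
```

The trade-off is that the counters must be kept consistent with every index update (including crash recovery), which is why the full scan was the simpler first implementation.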

@pedrocr
Author

pedrocr commented Aug 7, 2014

There is no memory cost of "1-2MB per GB of repo size", it should stabilize around 100-150 megs or so maximum, irrespective of repo size. More than that is a bug, which is not reproducible in my installations. I'd like to get to the bottom of it though.

Your example showed similar per-item values (not per-GB) to what I was seeing; I mixed up the per-item values with the per-GB values. My repo has a lot of small files, which apparently is a corner case here. A 100-150 MB fixed cost would be fine for me.

With v0.9.1 at idle (syncthing having done nothing for several minutes; this is relevant, more on that later) syncthing uses about 110 MB of RAM. I'd say this is acceptable, but a bit on the high side. But the GUI is a problem. Actually loading or reloading the GUI drives the memory usage up to 520 MB. That's obviously crappy.

Ah, that makes more sense. Since I was testing this out I always had the GUI open. Why is being idle relevant? Does syncing drive the memory usage up? I'll have to try to monitor the usage without the GUI to see what happens in my setup.

Thanks for looking into it.

@calmh
Member

calmh commented Aug 7, 2014

Actually syncing files will use a bit more memory than idle, yeah, but probably not as much as loading the GUI. Anyway, that feels a little bit more legitimate since it's actually doing something then.

The idleness part is relevant because of how Go's garbage collector works. Basically, it keeps an area of memory for objects. The regular garbage collection cycle marks "free" memory as reusable for new objects, but doesn't return free memory to the OS. Only when a given chunk of memory has been unused for a few minutes is it returned to the OS.

So in syncthing's case, you'll drive up memory usage by opening the GUI. But if syncthing isn't actually doing anything, then neither is the GUI (even while it's still open), and after a few minutes you'll see the memory usage drop back to previous levels.

tintamarre pushed a commit to tintamarre/syncthing that referenced this issue Aug 8, 2014
@pedrocr
Author

pedrocr commented Aug 9, 2014

Actually syncing files will use a bit more memory than idle, yeah, but probably not as much as loading the GUI. Anyway, that feels a little bit more legitimate since it's actually doing something then.

After thinking about this my opinion is that using ~100MB when idle, and going up to 300MB-600MB when scanning and using the GUI is 10x to 100x worse than it should be. All the cases for using that much memory and for using more memory when scanning should be fixable:

  • When doing the initial scan, the process should just be a recursive walk down the tree saving info to disk; there's no reason not to be able to do this with constant memory.
  • When scanning for changes to sync, do what modern rsync does and create an incremental file list. Whenever there are N (say 100) items in the list, stop scanning and sync them until only M items (say 20) remain, then resume the scan.
  • When scanning to produce the info for the interface, just run through the tree calculating the sizes and number of items. There's no reason not to be able to do this with constant memory either.
  • If the problem is just that a lot of intermediate objects are created and then need to be GC'd, then according to a recent Hacker News discussion about Go's GC[1] that should be fixable by either stack allocation or object pools, so that the GC doesn't need to run to reclaim the space; the same stack or pool space just gets reused without any GC at all.

There shouldn't really be anything stopping syncthing from having a maximum memory usage of ~10MB irrespective of repository size and that would be really useful in multi-user multi-TB environments.

[1] https://news.ycombinator.com/item?id=8148666

@calmh
Member

calmh commented Aug 9, 2014

There shouldn't really be anything stopping syncthing from having a maximum memory usage of ~10MB irrespective of repository size

Your pull request will be accepted with gratitude.

@avignat

avignat commented Aug 23, 2014

On a Raspberry Pi with no repos it's fine, but with just one repo the average CPU usage is between 90 and 100%.

@alex2108
Contributor

This should only happen during the initial scan, while syncing, or when the GUI is open. When the GUI is closed and nothing is happening I get almost zero usage on my Pi, and only a bit higher during the periodic rescan.

@seidler2547

Syncthing seems to be leaking memory a lot recently. My 0.9.8 instance is up to 440 MB of memory usage, whereas with 0.9.2 it never got over 180 MB. How can I debug this?

@AudriusButkevicius
Member

It's a garbage-collected language, so it's quite hard to leak stuff.
What are your stats? Number of repos, number of files per repo, number of nodes the repos are shared with?

@calmh
Member

calmh commented Aug 28, 2014

@seidler2547 Is that memory as reported by syncthing, or by some other method?

@jpjp
Contributor

jpjp commented Aug 28, 2014

I ran syncthing on a Pi, it really does max out the hardware. CPU was the problem for me.

@pedrocr
Author

pedrocr commented Sep 11, 2014

After restarting the two nodes it seems it's finally been able to sync the three nodes. The end result is the same as before:

  • The laptop where I do most file changes is sitting at ~800MB of RAM
  • The two NAS servers are each sitting at ~500MB of RAM

The RAM usage seems to go up just by using the web interface. I can run the same tests monitoring only with top if that is helpful.

The issue has been closed but it's definitely not fixed for me. Please let me know if you still want input or if this is really WONTFIX so I can move on.

@seidler2547

About the memory usage, I have finally let syncthing run for long enough to get some (hopefully) usable heap profiles, please find the tar.xz file here: https://yttr.co/o/vlt5q63r.xz

I hope this will help to see that the memory usage continues to increase over time with no apparent reason. The initially good 40-50MB memory usage is at 216MB again on my NAS.

To summarize heap increases:
Sep 21 14:24 startup, very low memory usage
Sep 21 14:36 settle-in/repo scans finished, good memory usage
Sep 21 15:24 first bump, maybe first sync kicked in
Sep 21 17:09-21:23 gradual increase in memory usage, possibly GUI was used and/or lots of sync
Sep 22 15:44 45% increase in memory usage, no apparent reason
Sep 23 19:59 35% increase in memory usage, no apparent reason

I'll keep on monitoring and will send more heap profiles if it increases more.

@crobarcro

I know this issue is closed, but on my Raspberry Pi, using syncthing armv6-0.10.9, it uses 75-90% CPU virtually all the time. There is only one repo, it only has 5 files making up around 500 kB, and they are unchanged. In my config.xml file I have:

<configuration version="6">
    <folder id="default" path="/home/pi/Sync" ro="false" rescanIntervalS="600" ignorePerms="false">
        <device id="BLAH-BLAH-1"></device>
        <device id="BLAH-BLAH-2"></device>
        <versioning></versioning>
        <lenientMtimes>false</lenientMtimes>
    </folder>
    <device id="BLAH-BLAH-1" name="rpi" compression="false" introducer="false">
        <address>dynamic</address>
    </device>
    <device id="BLAH-BLAH-2" name="work-laptop" compression="false" introducer="false">
        <address>dynamic</address>
    </device>
    <gui enabled="false" tls="false">
        <address>127.0.0.1:8080</address>
    </gui>
    <options>
        <listenAddress>0.0.0.0:22000</listenAddress>
        <globalAnnounceServer>announce.syncthing.net:22026</globalAnnounceServer>
        <globalAnnounceEnabled>true</globalAnnounceEnabled>
        <localAnnounceEnabled>true</localAnnounceEnabled>
        <localAnnouncePort>21025</localAnnouncePort>
        <localAnnounceMCAddr>[ff32::5222]:21026</localAnnounceMCAddr>
        <maxSendKbps>0</maxSendKbps>
        <maxRecvKbps>0</maxRecvKbps>
        <reconnectionIntervalS>60</reconnectionIntervalS>
        <startBrowser>false</startBrowser>
        <upnpEnabled>true</upnpEnabled>
        <upnpLeaseMinutes>0</upnpLeaseMinutes>
        <upnpRenewalMinutes>30</upnpRenewalMinutes>
        <urAccepted>0</urAccepted>
        <restartOnWakeup>true</restartOnWakeup>
        <autoUpgradeIntervalH>0</autoUpgradeIntervalH>
        <keepTemporariesH>24</keepTemporariesH>
        <cacheIgnoredFiles>true</cacheIgnoredFiles>
    </options>
</configuration>

Memory usage is not that high, though. But I had to nice syncthing just to get access to the Pi over SSH, as it was constantly eating all the CPU.

@AudriusButkevicius
Member

Was this caused by the upgrade from 0.10.8 to 0.10.9?
I guess you need to run CPU profiling to see what's actually burning the CPU.
-help explains how to do that to some extent.

@crobarcro

Actually, on further investigation, I think this might be related to a corrupt index database. I ran from the command line and saw an error message about the db, so I deleted it so syncthing would rebuild it. Now syncthing only uses a tiny amount of CPU when idle, as expected.

@AudriusButkevicius
Member

@calmh, we should try and recover from these ourselves as you suggested.

@calmh
Member

calmh commented Nov 30, 2014

Yeah. Just need a good spot to do the leveldb.Recover call.

@ropeladder

Syncthing is baking my CPU on my new quad-core i7 Windows 8.1 laptop, with 400-600% load any time everything is not fully synced. I just made a .pprof file with the STCPUPROFILE environment variable. Does anyone want it?

@AudriusButkevicius
Member

Well, it will be hashing that's burning your CPU, so the pprof won't say much.

@ropeladder

So it's normal that it's using so much CPU? For hours on end?

@AudriusButkevicius
Member

Depends on which state it's in.
If it's scanning for the first time, yes, as it has to hash all the files.
Given the files are not changing, it should go back to idle. The moment it notices files have changed, it will have to hash them again, bringing the CPU load back up.

It also hashes when downloading files, so CPU load might go up then as well, but I'd guess the network speed should be the bottleneck in that case, leaving the CPU underutilized.

@ropeladder

It's finished with the initial scan, so I guess it's hashing while downloading. But my download speed is generally in the B/s range (possibly because it's a lot of very small files?), so should the CPU still be getting so much use?

@AudriusButkevicius
Member

There is a known bug on some platforms causing slow speeds while the web UI is open (#867), hence you might want to check speeds with some external tool.

I wasn't able to reproduce it, and I am not sure under what circumstances it happens exactly.
What device are you running on?

@ropeladder

The home computer with the high CPU usage I'm referring to above is a new i7 Win 8.1 laptop.

Unrelated: is there an easy way to have Syncthing write a new log file each time using the -logfile switch, instead of just overwriting?

I've been playing around on my work computer and noticed a few things:

  • The work computer had been giving several warnings (pausing the puller for 1m, I think because it couldn't find the file). The log file said it was unable to delete directory X because it is not empty. I don't think it should be deleting any of the directories; I'm assuming this deleting is related to #1221 somehow.
  • When I restart Syncthing and it reconnects to my home computer and tries to finish syncing (it's around 98%), the CPU use is very high (around 300% on an i3 desktop) even though none of the directories say they are scanning and only one of them is syncing.
  • On my latest restart of Syncthing it has been unable to connect for some reason, but after the scanning finishes the CPU use does decrease to around zero (in the absence of any syncing).

I'll try it with the GUI closed as well.

@bigbear2nd
Contributor

May I ask which folders you are syncing?
Are there any programs running on your home-pc which change the files in your sync folder?

@ropeladder

The main folder that seems to be having trouble is the one for the ScrapBook plugin for Firefox. It saves websites in subfolders inside a data directory and has some .rdf index files that are infrequently updated (I think it only rebuilds the index when you search it, which is rare). I'm running the extension in Firefox right now and none of the core files have been modified since last week.

@pedrocr
Author

pedrocr commented Jan 21, 2015

I was curious whether new releases had made this any better, so I tried 0.10.20 on the same set of files. I'm still seeing ~800 MB RAM usage on ~430 GB/400k files. The CPU (i5-3320M) is also completely pegged on the sending node even after all the scanning is done (so no more hashing should actually be needed). While this is happening the sync is making very little progress (~40 kB/s transfer rates).

@alex2108
Contributor

Close the GUI and it should be better; there is already an issue about that (#867).
The RAM usage should also go down after a while of not having the GUI open and not syncing.

@roberth1990

How on earth is it acceptable that syncthing uses 10-25% CPU when synchronizing?

[Screenshot: syncthing CPU usage graph]

@AudriusButkevicius
Member

Sorry but crypto is not free. Read the FAQ.

@pgrm

pgrm commented Oct 24, 2015

@AudriusButkevicius crypto isn't free, but it can be much cheaper: https://en.wikipedia.org/wiki/AES_instruction_set. Or is syncthing already using a library that takes advantage of AES-NI?

@AudriusButkevicius
Member

No, because Go's crypto is implemented mostly in Go rather than in assembly. If you can point me to a cross-platform, cross-arch, statically linkable library that uses the AES instruction set and can be cross-compiled, we'll swap right away.

@pgrm

pgrm commented Oct 24, 2015

So if Go's crypto is being used, the one mentioned here: golang/go#11929, then it should already take advantage of AES-NI, and according to that issue it should get even faster with 1.6. Or am I getting something wrong?

@AudriusButkevicius
Member

Potentially for some parts; TLS is not the only thing we use. We also use SHA256, which obviously has a cost.

Also, the CL only seems to relate to amd64, so it's not an improvement across the board, but probably for newer CPUs (post-2010), which aren't maxed out that much anyway.

@AudriusButkevicius
Member

Actually, SHA256 already seems to be written in ASM, so perhaps that's about as good as it gets ;)

@pgrm

pgrm commented Oct 25, 2015

@AudriusButkevicius I actually came here because I had a huge problem with syncthing running as a Docker container. It was constantly eating 100% of the one CPU I restricted it to (this went on for a few weeks, even though syncthing had nothing to do).

Turns out, if you limit syncthing to less memory than it needs, it goes berserk. I had my container limited to 512 MiB of memory, because before that it was eating gigabytes of RAM. After removing the limit and letting it run for a few hours, the container is using ~750 MiB of memory and syncthing reports using only half as much, 360 MiB. That part is OK, since I have roughly 215k files in total. Anyhow, the CPU is now at most 20% during syncing and usually <1% when nothing is going on.

Any idea why it was constantly consuming 100% CPU when it was running out of memory? Syncing worked, just my computer was slightly unusable.

@AudriusButkevicius
Member

So I think there are two factors. First, cgroups don't change the amount of memory the machine reports, so things such as the database probably try to allocate more than is allowed, causing issues. Secondly, asking to allocate more memory, with mallocs failing, potentially kicks off the GC repeatedly.

A CPU profile (check -help) will probably tell you what's actually consuming the CPU.

@st-review st-review added the frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion label Jun 17, 2017
@syncthing syncthing locked and limited conversation to collaborators Jun 17, 2017