Reduce memory usage #3318

Closed
whyrusleeping opened this issue Oct 18, 2016 · 75 comments
Labels
status/deferred Conscious decision to pause or backlog

Comments

@whyrusleeping
Member

whyrusleeping commented Oct 18, 2016

As part of our resource consumption reduction milestone, let's make an effort to get the idle memory usage of an ipfs node below 100MB.

Things that could help here are:

  • peerstore written to disk
  • providers garbage collection smarter
  • fewer goroutines per peer connection
  • bitswap wantlists to disk

READ BEFORE COMMENTING

Please make sure to upgrade to the latest version of go-ipfs before chiming in. Memory usage still needs to be reduced but this gets better every release.

@chevdor

chevdor commented Nov 4, 2016

Yes please! On a small instance, it looks like this:
[Image: screen shot 2016-11-04 at 17 56 42]

Guess who is the big one taking all the room? :)

@matthewrobertbell

A nice test for this would be running IPFS on a 512MB VPS. I tried it on Ubuntu 14.04 with the daemon running in the background: when I ran "ipfs get" on an 83MB file, the daemon was OOM killed. Testing that the get completes successfully would be nice.

I also got this, which I think is already noted as a bug:

15:19:13.466 ERROR flatfs: too many open files, retrying in 600ms flatfs.go:121

@chevdor

chevdor commented Nov 24, 2016

@mattseh the chart I included above is from a VPS with 512 MB running pretty much only ipfs and nginx, with no swap. IPFS takes way too much memory, so enabling swap is sadly required.

@matthewrobertbell

Agreed, I only tried running it on the tiny VPS to see what the simplest thing was that would kill the daemon.

@Kubuxu
Member

Kubuxu commented Mar 9, 2017

The Node.js implementation is separate; please report that on the js-ipfs repo.

This repo is about the go-ipfs implementation.

@skorokithakis

I'm seeing 700 MB RAM usage on my VPS instance as well, it would be great if this could be lowered.

@timthelion
Contributor

I get OOM killed even on a system with 4 gigs of free memory.

root@hobbs:/var/log# dmesg | egrep -i 'killed process'
[764907.341661] Out of memory in UB 2046: OOM killed process 31956 (ipfs) score 0 vm:6157228kB, rss:3875436kB, swap:0kB
[2499620.020001] Out of memory in UB 2046: OOM killed process 25440 (ipfs) score 0 vm:4503708kB, rss:3905584kB, swap:0kB
root@hobbs:/var/log# ipfs version
ipfs version 0.4.10
root@hobbs:/var/log# 

Is there any limit on how much memory ipfs uses? How does the ipfs.io gateway stay alive? Do you just restart it every time it dies?

@Kubuxu
Member

Kubuxu commented Jul 30, 2017

Our gateways are mostly stable. As a note we are working on connection closing which should solve most of this issue.

@pors

pors commented Aug 11, 2017

I have exactly the same issue on a VPS of the same size. I have swapping on and that is what is happening. My VPS provider is complaining :)

@Kubuxu is there an ETA? I'm happy to help test an early version.

@dokterbob
Contributor

The only way I have been able to more-or-less stably run ipfs in production was inside a memory-constrained container (systemd cgroup), restarting it every time it crashed for not having 'enough' memory. This was about half a year ago.

Perhaps this should be considered higher priority than some newer features, as it fundamentally affects the stability and performance of IPFS in a very bad way.

@timthelion
Contributor

timthelion commented Sep 13, 2017 via email

@kpcyrd
Contributor

kpcyrd commented Sep 13, 2017

I'd love to see some improvements as well, I'm currently running high memory instances for ipfs. :)

My memory usage is between 1G and 2G.

@pors

pors commented Sep 14, 2017

Hey @dokterbob, we meet again :)

@skorokithakis

Don't I get any love, @pors?

@pors

pors commented Sep 14, 2017

@skorokithakis huh? Scary shit :)

@whyrusleeping
Member Author

Hey everyone, we identified and resolved a pretty gnarly memory leak in the dht code. The fix was merged into master here: #4251

If you're having issues with memory usage, please try out the latest master (and the soon-to-be-tagged 0.4.11-rc2) and let us know how things go.

@skorokithakis

skorokithakis commented Sep 20, 2017

Can we get a build uploaded somewhere, for us plebs? Also, how confident are you that this doesn't contain any show-stopping bugs? We're really really wanting to put a less leaky version to production, but we obviously don't love crashes either :/

@whyrusleeping
Member Author

Yeah, builds will be uploaded once I cut the next release candidate. We are quite confident there are no show-stopping bugs (otherwise we wouldn't have merged it), but to err on the safe side it's best to wait for the final release of 0.4.11.

@whyrusleeping
Member Author

Once DNS finishes propagating, the 0.4.11-rc2 builds will be here: https://dist.ipfs.io/go-ipfs/v0.4.11-rc2

The non-DNS URL is: https://ipfs.io/ipfs/QmXYxv8gK4SE3n1imq1YAyMGVoUDiCPgaSynMqNQXbAEzm/go-ipfs/v0.4.11-rc2

@dokterbob
Contributor

@pors Nice to run into you again! Still would like to have a proper look at hackpad. How may I contact you? IRC or something?

@whyrusleeping Thanks for another rc. Let's see how this runs. ^^

@pors

pors commented Sep 20, 2017

@dokterbob you can email me at mark at pors dot net. And we can change to Dutch :)

@kpcyrd
Contributor

kpcyrd commented Sep 20, 2017

Can we please keep this on topic?

@skorokithakis

I've been running rc2 all day, and memory usage seems much better than before. It's at 16% now whereas it was at 35% before the upgrade, but, given the nature of leaks, we won't know until after a week or so.

@whyrusleeping
Member Author

@skorokithakis Thanks! Please let us know if you notice any perf regressions. Fixing this properly meant putting a bit more logic in a synchronous hot path, and we aren't yet sure if it will be an issue in real-world scenarios.

@Calmarius

Calmarius commented Oct 9, 2017

More than 4 GB RAM usage here with 0.4.11, according to top:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                     
15384 calmari+  20   0 4797260 2,487g    188 S   3,3 86,2  25:54.47 ipfs                                                                                        
   33 root      20   0       0      0      0 S   0,3  0,0   0:10.12 kswapd0                                                                                     
 1114 mysql     20   0 1246948  12356      0 S   0,3  0,4   0:09.18 mysqld                                                                                      
    1 root      20   0  119780   1304    580 S   0,0  0,0   0:01.96 systemd                                                                                     
    2 root      20   0       0      0      0 S   0,0  0,0   0:00.00 kthreadd                                                                                    
    3 root      20   0       0      0      0 S   0,0  0,0   0:00.64 ksoftirqd/0                                                                                 
    5 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/0:0H                                                                                
    7 root      20   0       0      0      0 S   0,0  0,0   0:05.87 rcu_sched                                                                                   
    8 root      20   0       0      0      0 S   0,0  0,0   0:00.00 rcu_bh                                                                                      
    9 root      rt   0       0      0      0 S   0,0  0,0   0:00.01 migration/0                                                                                 
   10 root      rt   0       0      0      0 S   0,0  0,0   0:00.07 watchdog/0                                                                                  
   11 root      rt   0       0      0      0 S   0,0  0,0   0:00.07 watchdog/1                                                                                  
   12 root      rt   0       0      0      0 S   0,0  0,0   0:00.01 migration/1                                                                                 
   13 root      20   0       0      0      0 S   0,0  0,0   0:01.08 ksoftirqd/1                                                                                 

The SSH session became unresponsive, and I had to kill the daemon to regain control.

It should be noted that my node is pinning a few popular JS frameworks, like jQuery and Mathjax. It might be the cause, but I'm not sure.

I cannot run the node all the time this way.

@burdakovd

burdakovd commented Sep 25, 2018

Here is another datapoint. On a machine with 1.7 GB of RAM and 3G of swap, running only IPFS daemon in server mode and nginx, after 4 days we see 1.6 GB of RAM used and 550 MB of swap space used.

             total       used       free     shared    buffers     cached
Mem:       1730344    1654536      75808         36      15120      54728
-/+ buffers/cache:    1584688     145656
Swap:      3014648     616008    2398640

Version is docker image jbenet/go-ipfs:latest, 0.4.17.

[Image]

@klueq

klueq commented Sep 26, 2018

Can we have a flag to limit the number of peers and some smart logic to discard bad peers and get good ones? That seems to be the unavoidable reason for high memory footprint.
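
The kind of peer limit asked for here can be sketched as a high/low watermark trim: do nothing until the connection count passes a high mark, then keep only the best-scored peers down to a low mark. This is a minimal illustration and not go-ipfs or libp2p code; the `peer` type, its `Score` field, and the `trim` function are hypothetical names, and how a real node would score a peer as "good" or "bad" is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical record; Score is an assumed usefulness metric
// (higher means more worth keeping).
type peer struct {
	ID    string
	Score int
}

// trim leaves the peer set alone until it grows past high, then keeps
// only the low best-scored peers and returns the rest for closing.
func trim(peers []peer, low, high int) (keep, drop []peer) {
	if low > high {
		low = high
	}
	if low < 0 {
		low = 0
	}
	if len(peers) <= high {
		return peers, nil
	}
	// Copy before sorting so the caller's slice order is untouched.
	sorted := append([]peer(nil), peers...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	return sorted[:low], sorted[low:]
}

func main() {
	peers := []peer{{"a", 5}, {"b", 1}, {"c", 3}, {"d", 4}}
	keep, drop := trim(peers, 2, 3)
	fmt.Println("keep:", keep)
	fmt.Println("drop:", drop)
}
```

Using two marks instead of a single limit avoids closing a connection every time one new peer arrives at the limit.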

@dokterbob
Contributor

@klueq

klueq commented Sep 26, 2018

Cool. Looks like it's already there.

Another suggestion: why not use UDP? From my limited understanding, all those 800 TCP connections with peers are idle 99% of the time, but they have real memory buffers and other overhead on both sides. Instead, we could send a UDP ping from time to time to check if the peer is online. If we need to transfer some data reliably, we send another UDP message, the peer acks it, and we create a temporary TCP channel.

@Kubuxu
Member

Kubuxu commented Sep 26, 2018

There is an experimental QUIC transport that uses UDP.
It should roll out more widely soon-ish.
https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#quic

@whyrusleeping
Member Author

The TCP buffers aren't the dominating consumer of memory here. Plus, re-establishing a new connection is very much non-trivial.

@Stebalien
Member

The memory issue comes from our internal buffers/state. We're constantly working on improving this but it'll take time. (semi related: libp2p/go-libp2p#438)

@rob-deutsch
Contributor

rob-deutsch commented Oct 4, 2018

This is something I've been interested in recently (see #5530), and a reality that we have to contend with is that Go is memory hoggish, and is getting more so (golang/go#23687).

[note: I edited this post a lot as I got closer to the heart of the issue, apologies!]

Go is very reluctant to give any memory back to the OS unless the OS very much needs it.

I think that something that could be considered for go-ipfs is: What does it mean when we say 'idle memory less than 100MB'?

This is an important question because it determines how much work should be done on how go-ipfs uses memory, not just how much memory go-ipfs uses.

Some examples are:

  1. Steady-state memory usage is less than 100MB.
    E.g. loaded binary = 20MB, stacks = 20MB, heap = 60MB
    But the Go runtime will wait until the heap doubles before running the GC. So maybe...
  2. Allocated memory is less than 100MB.
    E.g. loaded binary = 20MB, stacks = 20MB, heap = 30MB, garbage = 30MB.
    But the Go runtime will hold on to some memory after cleaning garbage, to make future alloc's quicker...
  3. Allocated plus alloc'd cache memory is less than 100MB.
    E.g. loaded binary = 20MB, stacks = 20MB, heap = 24MB, garbage = 24MB, alloc'd cache = 12MB.
    But the Go runtime doesn't truly give any memory back to the OS. It just marks it MADV_FREE which the OS doesn't act upon straight away....
  4. Resident memory is less than 100MB.
    E.g. loaded binary = 20MB, stacks = 20MB, heap = 15MB, garbage = 15MB, alloc'd cache = 10MB, MADV_FREE cache = 20MB.

Or, in table form (all values in MB):

Limit on      Binary  Stacks  Heap  Garbage  Alloc'd cache  MADV_FREE cache  Total
Steady-state      20      20    60       60             30               60    250
Allocated         20      20    30       30             15               30    145
Allocated+        20      20    24       24             12               24    124
Resident          20      20    15       15             10               20    100

@Calmarius

I'm running IPFS 0.4.17 on my Raspberry Pi (using the low power profile, if I remember correctly). When it starts it has low memory usage, but that memory usage slowly climbs as the hours pass. Within 1-2 days the OOM killer usually kills it. The number of connections is low: the output of "ipfs swarm peers" doesn't fill the terminal window.

So I think something is leaking there. Or in the Go runtime.

@whyrusleeping
Member Author

@Calmarius Try 0.4.18. There has been significant work towards reducing memory usage. Likely not fully resolved, but should be noticeably better.

@dokterbob
Contributor

dokterbob commented Nov 27, 2018 via email

@Calmarius

Yes! Updated to 0.4.18 and it has been running without problems for 2 weeks on my rpi. Great progress indeed!

@voxsoftware

0.4.20, using more than 3GB RAM; the problem is still there.

@lordcirth

0.4.20, using more than 3GB RAM; the problem is still there.

Under what load, and after running for how long? Does it cap at 3GB for you, or continue to grow?

@voxsoftware

voxsoftware commented May 27, 2019

It has been running for about 2 days, and I'm not really using it much yet. It continues to grow and is now at 3.7GB RAM. This is the go-ipfs package on Ubuntu 18.04. I am on a machine with 9.5 GB RAM, and processes are getting killed because of go-ipfs.

@whyrusleeping
Member Author

I really wonder what's causing this. mars (our first bootstrapper node, and arguably the most connected-to ipfs node) peaks at just over 4GB, but every time I check, that much memory is not actually in use; it's just the Go runtime refusing to return the memory to the OS.
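
The gap between memory in use and memory held by the runtime can be observed, and the release forced, with standard library calls. The following is a sketch under assumptions, not something go-ipfs is known to do: `retained`, `churn`, and `releaseIdle` are hypothetical helper names, and the exact numbers depend on the Go version and the OS.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink is package-level so the allocation below is not optimized away.
var sink []byte

// retained reports idle heap bytes the runtime has not returned to the OS.
func retained() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapIdle - m.HeapReleased
}

// churn allocates and touches n bytes, then drops them and runs a GC,
// leaving freed spans behind in the runtime.
func churn(n int) {
	sink = make([]byte, n)
	for i := range sink {
		sink[i] = 1
	}
	sink = nil
	runtime.GC()
}

// releaseIdle forces a GC and asks the runtime to hand back as much
// idle memory as it can, reporting retained bytes before and after.
func releaseIdle() (before, after uint64) {
	before = retained()
	debug.FreeOSMemory()
	after = retained()
	return before, after
}

func main() {
	churn(64 << 20)
	before, after := releaseIdle()
	fmt.Printf("retained before: %d KB, after: %d KB\n", before>>10, after>>10)
}
```

Calling `debug.FreeOSMemory` costs a full collection, so it is better suited to diagnosis than to a hot path.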

@Stebalien can we try running that memory profile dumper thing on some machines? https://gist.github.com/whyrusleeping/b0431561b23a5c1d8b2dfce5526751aa

@voxsoftware

The daemon was killed just now. This makes go-ipfs unusable. Does js-ipfs have the same problem?

@Stebalien
Member

@voxsoftware try disabling the DHT by running the daemon with ipfs daemon --routing=dhtclient. Also, I'd consider upgrading to the latest RC (go-ipfs 0.4.21-rc3) or just waiting for the release (likely tonight or tomorrow).

@voxsoftware

I started it with --routing=dhtclient. After about 1 hour, it is now at 850MB of memory. I was evaluating ipfs for use in a desktop app, but with this I see it is still really unusable for that purpose.

@skorokithakis

Same here. I'm thinking of having systemd restart the daemon once a day, which is, unfortunately, the last thing I'm going to try before giving up on IPFS altogether...

@lordcirth

Same here. I'm thinking of having systemd restart the daemon once a day, which is, unfortunately, the last thing I'm going to try before giving up on IPFS altogether...

I set a memory cap in systemd. The memory pressure keeps its usage down, and if it still goes over, it gets automatically killed and restarted. Works well enough.

@voxsoftware

Same here. I'm thinking of having systemd restart the daemon once a day, which is, unfortunately, the last thing I'm going to try before giving up on IPFS altogether...

I set a memory cap in systemd. The memory pressure keeps its usage down, and if it still goes over, it gets automatically killed and restarted. Works well enough.

Can you give an example, please? And this works OK for a Linux backend, but how about using it in a desktop app, for example on Windows?

@lordcirth

Same here. I'm thinking of having systemd restart the daemon once a day, which is, unfortunately, the last thing I'm going to try before giving up on IPFS altogether...

I set a memory cap in systemd. The memory pressure keeps its usage down, and if it still goes over, it gets automatically killed and restarted. Works well enough.

Can you give an example, please? And this works OK for a Linux backend, but how about using it in a desktop app, for example on Windows?

Like so:
https://gist.github.com/lordcirth/378ae7c3a8d2786874d00867098cbad1

As for Windows, dunno. Haven't used it much in a long time.

@skorokithakis

@lordcirth this is extremely helpful, thank you.

@Stebalien
Member

Closing as stale (most of the issues raised here have been addressed, or are recorded in more specific issues).
