
Resource Constraints + Limits #1482

Open
jbenet opened this Issue Jul 15, 2015 · 56 comments

jbenet commented Jul 15, 2015

We need a number of configurable resource limits. This issue will serve as a meta-issue to track them all and discuss a consistent way to configure/handle them.

I'm going to use a notation like thingA.subthingB.subthingC. We don't have to keep this at all; it just helps us bind scoped names to things. (I'm using . instead of /, as the . could reflect JSON hierarchy in the config, but it doesn't have to; e.g. repo.storage_max and repo.datastore.storage_gc_watermark could appear in the config as Repo.StorageMax and Repo.StorageGC, or something.)

Possible Limits

This is a list of possible limits. I don't think we need all of them, as other tools could impose some of these limits, particularly in server scenarios. But please keep in mind that some users/use cases of ipfs demand that we have some limits in place ourselves, as many end users cannot be expected to even know what a terminal is (e.g. if they run ipfs as an Electron app or as a browser extension).

  • node.repo.storage_max: this affects the physical storage that a repo takes up. this must include all the storage, datastore + config file size (ok to pre-allocate more if needed), so that people can set a maximum. (MUST be user configurable) #972
    • node.repo.datastore.storage_max: hard limit on datastore storage size. could be computed as repo.storage_max - configsize, where configsize could be live, or could be a reasonable bound. #972
    • node.repo.datastore.storage_gc_watermark: soft limit on datastore storage size. after passing this threshold, automatically run GC. could be computed as node.repo.datastore.storage_max - 1MB or something. #972
  • node.network_bandwidth_max: limit on network bandwidth used.
    • node.gateway.bandwidth_max: limit on bandwidth allocated to running the gateway. this could be calculated from node.network_bandwidth_max - all other bandwidth use. #1070
    • node.swarm.bandwidth_max: limit on network bandwidth allocated to running the ipfs protocol. this could be calculated from node.network_bandwidth_max - all other bandwidth use.
    • node.dht.bandwidth_max: limit on network bandwidth allocated to running the dht protocol. this could be calculated from node.network_bandwidth_max - all other bandwidth use.
    • node.bitswap.bandwidth_max: limit on network bandwidth allocated to running the bitswap protocol. this could be calculated from node.network_bandwidth_max - all other bandwidth use.
  • node.swarm.connections: soft limit on the number of ipfs protocol network connections to make. the reason for this limit is that there is overhead to every connection kept alive. the node could try to stay within this limit.
  • node.gateway.ratelimit: a number of requests per second. with this limit, the user could reduce the load accepted by the gateway. #1070
  • node.memlimit: a limit on the memory allocated to ipfs. could try to use smaller buffers when under tighter constraints. this is hard to do, probably won't be used end-user-side, and is likely easier to achieve with external tooling sysadmin-side (Docker, etc.).
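
The storage_max / storage_gc_watermark interplay above could be sketched roughly as follows. This is a hypothetical illustration (the constants, `putBlock`, and `errDiskFull` are made-up names, not go-ipfs code): GC runs past the soft limit, and writes past the hard limit surface as a write error, just as a full OS disk would.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical constants mirroring node.repo.datastore.storage_max
// and node.repo.datastore.storage_gc_watermark.
const (
	storageMax  = 10 << 30            // 10 GiB hard limit
	gcWatermark = storageMax * 9 / 10 // soft limit: run GC at 90%
)

var errDiskFull = errors.New("datastore: storage_max reached")

// putBlock sketches the decision flow: run GC past the watermark,
// reject writes past the hard limit.
func putBlock(used, size int64, runGC func()) (int64, error) {
	if used+size > storageMax {
		return used, errDiskFull // surfaces up the stack as a write error
	}
	used += size
	if used > gcWatermark {
		runGC() // soft limit crossed: reclaim space
	}
	return used, nil
}

func main() {
	used, err := putBlock(0, 1<<20, func() { fmt.Println("gc triggered") })
	fmt.Println(used, err)
}
```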

note on config: the above keys need not be the config keys, but we should figure out some keys that make sense hierarchically.
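
To illustrate how the dotted names might map into the JSON config hierarchy (all key names below are purely illustrative, not a committed schema):

```json
{
  "Repo": {
    "StorageMax": "10GB",
    "Datastore": {
      "StorageMax": "9GB",
      "StorageGCWatermark": 90
    }
  },
  "Swarm": {
    "BandwidthMax": "1MB/s",
    "ConnectionsMax": 200
  }
}
```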

What other things are we interested in limiting?

jbenet commented Jul 15, 2015

The most pressing are:

  • node.repo.storage_max
  • node.network_bandwidth_max

jbenet commented Jul 15, 2015

@rht would this be an issue you could work on? it's needed sooner rather than later, particularly node.repo.storage_max (+ running GC if we get close to it) and node.network_bandwidth_max.

@whyrusleeping your help will be needed no matter who implements this.

whyrusleeping commented Jul 15, 2015

@jbenet yep. My concern is that before we even think about configurable limits and such, we need to determine how the system behaves when you run out of a certain resource, whether that's open connections, disk space, or memory. Once we determine how a limit manifests in the application, we can start setting those limits.

jbenet commented Jul 15, 2015

We already know how some of those would behave. For example, disk: trigger GC after a threshold, and stop accepting blocks after the limit.


whyrusleeping commented Jul 15, 2015

okay, when we stop accepting blocks, how does that affect the user? Do we just start returning 'error disk full' up the stack everywhere? (probably)

jbenet commented Jul 16, 2015

yeah, it's a write error. same would happen if the OS's disk got full.


@rht rht self-assigned this Jul 16, 2015

@jbenet jbenet referenced this issue Jul 27, 2015

Closed

Sprint Q #23

15 of 43 tasks complete

@jbenet jbenet referenced this issue Aug 3, 2015

Closed

Sprint R - July 27 #24

20 of 35 tasks complete

davidar commented Sep 14, 2015

👍 the daemon keeps consuming my meager ADSL upload bandwidth

jbenet commented Sep 14, 2015

These are a big deal, we should get back on these.

slothbag commented Nov 8, 2015

My VPS runs out of RAM pretty quickly, with IPFS consuming 80% of it (this is not while adding, just idling); other daemons start to shut down due to running out of memory.

Granted, my VPS has only 128 or 256 MB (can't remember which), but still, I would think it's possible to seed some content with minimal resources.

jbenet commented Nov 10, 2015

agreed. we should start adding memory constraints as tests for long-running ipfs nodes

rht commented Nov 24, 2015

Update here:

  • Datastore.StorageMax and Datastore.StorageGCWatermark have been implemented. However, I'd say it would consume far fewer resources to simply calculate / keep track of the number of hashes stored in the datastore.
  • For network bandwidth, I haven't found a battle-scarred rate-limiting lib to use (there are plenty, but I haven't reviewed them). Meanwhile, I propose that a unit-less constraint can be implemented with golang.org/x/net/netutil, to limit the number of simultaneous connections to the http api/gateway.
  • swarm bandwidth has been indirectly constrained through the fd limit (const concurrentFdDials = 160) -- if this fd constraint didn't exist, would limiting the number of swarm connections indirectly limit the number of fd dials, @whyrusleeping? If so, it would be more intuitive to just limit the swarm connections, and expose this in the config.
  • memory: I don't need to run the ipfs node for long before it requires a double C-c to kill it (is this evidence of zombie goroutines?). More systematic mem-leak reports would open a path here.

jbenet commented Nov 30, 2015

Thanks for the update @rht.

Re limits, I think people will mostly want to set hard BW caps in explicit KB/s.

SCBuergel commented Dec 15, 2015

What other things are we interested in limiting?

I just randomly found this discussion while trying to limit overall output traffic (per day / month). I think limiting output traffic could be an interesting thing (especially with respect to Filecoin one day), as egress traffic is typically what gets billed in cloud settings like AWS or Azure. There I am fine with temporary spikes of high bandwidth as long as my output traffic stays within some bounds per unit of time. Setting a limit per hour / day / month might make sense, to prevent blowing a month's volume in a day or an hour.

PlanetPlan commented Jan 17, 2016

Hi, thanks very much for IPFS.

I did not carefully read the above, so some of the following may be duplicates. These are all long-term things to think about, nothing that is a headache for me right now. The following are some usage models that may suggest features for controlling resources:

  • My normal network connection is slow by many standards. When I am using the network interactively, I'd like IPFS to avoid/reduce background traffic, though still serve my foreground file requests at full bandwidth. When I am idle (not interactive), I'd like IPFS to ramp up network usage so my system can be a friendly member of the caching/serving community.
  • A similar comment applies to IPFS disk bandwidth and CPU usage: back off when I am interactive, use freely when I am idle.
  • I want actual files to be cached someplace other than ~/.ipfs so they are not part of my backup state.
  • On a laptop, I have some network connections that are pay-per-byte. I'd like to leave IPFS enabled so I can use it, but I'd like to be an "unfriendly" member of the community because network traffic costs are quite high. Conversely, when I am on a fast/cheap network, I'd like to build up "credit" so I get good service when I am on a high-price network and being "momentarily unfriendly".
  • A similar comment applies to removable media: I have limited built-in storage on a laptop and so often plug in a removable drive when relatively "stationary". It would be useful to have both a "for sure" area for IPFS on the built-in drive plus an "optional" area on removable drives.

clownfeces commented Jul 2, 2016

For VPN users, being able to limit the maximum number of connections is a very important feature, since many VPNs automatically disconnect you if you have too many open connections (it's probably some sort of protection against spammers and DDoSers). IPFS by default creates hundreds of connections, so it's barely usable unless you don't mind regularly getting disconnected.

davidak commented Aug 6, 2016

I want to report some resource usage stats:

I have an ipfs node version 0.4.2 running on a VM with 1 core and 1 GB RAM. No files added or pinned!

[screenshots: netdata memory and CPU usage graphs, 2016-08-06]

It uses 465 MB RAM just to keep connections to 214 peers open. (Is that all the running nodes?)

Kubuxu commented Aug 7, 2016

It means that it is directly connected to 214 peers; those are live nodes in the network. We might want to start limiting that. Deluge (a torrent client) by default allows 200 connections with only 50 active at a time, but it uses uTP, which we were unable to adopt because the uTP lib for Go kept hanging.

@davidak is that netdata collector for IPFS? Looks nice, have you published it somewhere?

davidak commented Aug 7, 2016

@Kubuxu the IPFS netdata plugin just got merged some minutes ago ;)

netdata/netdata#761

fiatjaf commented Aug 8, 2016

What bothers me is the network usage: it makes even SSHing to my VPS horribly slow.

slothbag commented Aug 8, 2016

I've had some luck using the Linux `tc` command to throttle IPFS down to about 10 KB/s outbound; this has the side effect of dropping incoming traffic to about 15-20 KB/s.

I can see IPFS is using 100% of its allocated 10 KB/s all day, every day, but at least I can calculate how much bandwidth that is per month to ensure I don't go over my quotas.

And a nice bonus is that it significantly reduces memory usage, which is now hovering around 50-100 MB.

jbenet commented Aug 8, 2016

@slothbag does it work in that condition?

zekesonxx commented Sep 21, 2016

@whyrusleeping I have awful Internet: 130 KiB/s peak down and 20 KiB/s peak up (mutually exclusive). I think IPFS is massively interesting, but any idle bandwidth usage is not acceptable for my connection.

whyrusleeping commented Sep 21, 2016

@zekesonxx Yeah, having a connection like that makes it difficult to properly utilize ipfs right now, due to our use of the DHT for content routing. The vast majority of the idle traffic comes from the DHT: participating in it means that you are helping store routing and peer information for the network, and also responding to lookup requests for that information. As I mentioned in a previous comment, we are working on future solutions that will allow nodes to be a part of the ipfs network without having to run a DHT.

loadletter commented Oct 1, 2016

@whyrusleeping Wouldn't it be possible to run the DHT over plain, connectionless, unreliable UDP, with a compact format/compression, and only use TCP for transfers/fallback, something like /ip4/0.0.0.0/udp/4001/dht?
Both eMule/Kad and the BitTorrent DHT manage to work with very little overhead (running eMule on a 56k connection with thousands of files and 2K+ DHT entries comes to mind).

whyrusleeping commented Oct 1, 2016

@loadletter Yeah, using UDP for the DHT is something I want to try at some point, but it's not a magic bullet. The benefit of UDP is that we don't care too strongly if packets get dropped, but what we lose relative to TCP (or similar) is congestion control. If we switched the DHT to UDP, there's a real chance it would actually make things worse. When comparing ipfs to BitTorrent's mainline DHT (or any other form of DHT), the primary difference is that ipfs provides random access to subsets of files. With torrents and similar systems, it's more 'all or none': you're part of this torrent and/or part of that torrent, and pieces of one torrent never get shared with another. Some of the advantages of our approach are that it becomes much easier to view and work with subsets of large datasets, to share and help pin only the pieces you're interested in, and, if different datasets share segments of data, users who have either dataset can serve that data. The cost is that we need to do a good chunk more content routing to accomplish it. Using the DHT for content routing the same way other systems do just isn't scalable long term, which is why we're researching newer ways to provide this 'random access' with less DHT traffic (or with no DHT at all).

In the short term though, I have just merged a new, still very experimental feature that allows your node to choose not to serve DHT traffic while still being able to make requests. To try this out, build latest master from source and run the daemon with `ipfs daemon --routing=dhtclient`. And please report any issues you have with running it in that mode.

loadletter commented Oct 1, 2016

Using dhtclient, the idle bandwidth usage does seem to decrease, though it picks up again for a few minutes after retrieving some files from another node.

Also, when running the normal DHT, or during spikes with dhtclient, the bandwidth usage looks pretty symmetrical.

mib-kd743naq commented Dec 5, 2016

 node.repo.storage_max: this affects the physical storage that a repo takes up. this must include all the storage, datastore + config file size (ok to pre-allocate more if needed), so that people can set a maximum. (MUST be user configurable)

I think given #3444 one also needs to add a config for maximum data entries. Without such a limit it is trivial to hard-DoS a node by simply asking it to get a DAG with 1 million 1-byte raw data nodes.

/cc @matthiasbeyer @Kubuxu

lgierth commented Dec 18, 2016

Small note, I unchecked the storage_max todo in this thread's root comment -- there is a Datastore.StorageMax option, but it's currently only taken into account with regard to GC. It doesn't currently set a hard limit on storage usage.

pataquets commented Aug 28, 2017

Where applicable, different bandwidth limits for pinned items would be a nice feature to have. Users might be more inclined to provide bandwidth for files they find important enough to pin.


dokterbob commented Sep 13, 2017

+1 for node.memlimit

Although @jbenet suggests we can have this done at a higher level, a long-running, actively used IPFS daemon will currently eat all the memory available on a system, which basically means that without memory constraints it will not be stable.

Obviously, the memory footprint (#3318) could be reduced, but given that the project moves forward very fast feature-wise, new kinds of memory waste will keep popping up.

haasn commented Nov 1, 2017

ipfs for me has several hundred open connections, which triggers a number of warning mechanisms, including many dozens of TCP resets per second, and makes it look like a network scan.

Connecting to this many peers seems insane for a p2p network. Being able to limit this would be a high priority for me.


gwpl commented Jan 17, 2018

I also need a limit on maximum open files! (causes: #4589)

KrzysiekJ commented Jan 23, 2018

@whyrusleeping: go-ipfs v0.4.13 still maintains several hundred open connections.

whyrusleeping commented Jan 23, 2018

@KrzysiekJ Yeah, DHTs need to maintain a decent number of open connections to function properly. You can tweak it lower in your configuration file; look for Swarm.ConnMgr.
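
For reference, the connection-manager knobs live under Swarm.ConnMgr in the config file; the values below are just examples to tune:

```json
{
  "Swarm": {
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 100,
      "HighWater": 200,
      "GracePeriod": "20s"
    }
  }
}
```

The manager starts trimming connections above HighWater, trims down toward LowWater, and leaves connections younger than GracePeriod alone.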

EternityForest commented Feb 5, 2018

Does the DHT actually need to maintain large numbers of connections to work? It seems like you need to know the locations of a good number of DHT peers, but why actually connect to them?

Can't we just keep a list of a few thousand peers, and figure out if they're still up if/when they're needed?

Connectionless DHT queries should only take 1 UDP round trip per hop if you don't use a handshake or encryption, and it's not like you can't monitor someone pretty easily as is (connect to them and watch their wantlist broadcasts).

Congestion doesn't seem like it should be that much of an issue, especially if you limit retries: if they aren't there after 3 or 4 attempts, you just assume they aren't online anymore and try a different path.

An advantage of connectionless is that you can potentially store the last known IP of millions of nodes, meaning most of the network can be within 2 or 3 hops.

That has the issue of concentrating traffic on a few nodes for popular content, but I suspect there's ways of managing that.

Stebalien commented Feb 7, 2018

Does the DHT actually need to maintain large numbers of connections to work? It seems like you need to know the locations of a good number of DHT peers, but why actually connect to them?

Correct. Unfortunately, we don't have any working UDP-based protocols at the moment anyway. However, we're working on supporting QUIC. While this wouldn't be a connectionless protocol, connections won't take up file descriptors, and we can save memory/bandwidth by "suspending" unused connections (remember the connection's session information but otherwise go silent).

In the future, we'd like a real packet transport system, but we aren't there yet. The tricky part, getting the abstractions right, will take a bit of work because we try to make all parts of the IPFS/libp2p stack pluggable.

Connectionless DHT queries should only take 1 UDP round trip per hop if you don't use a handshake or encryption, and it's not like you can't monitor someone pretty easily as is(Connect to them, and watch their wantlist broadcasts).

The encryption isn't just about monitoring, it also prevents middle boxes from being "smart". However, as we generally don't care about replay or perfect forward secrecy for DHT messages, we may be able to encrypt these requests without creating a connection (although that gets expensive if we send more than one message). Again, the tricky part will be getting the abstractions correct (and, in this case, not creating a security footgun).

An advantage of connectionless is that you can potentially store the last known IP of millions of nodes, meaning most of the network can be within 2 or 3 hops.

Unfortunately, IPFS nodes tend to go offline/online all the time. Having connections open helps us keep track of which ones are online. However, the solution here is to just not have flaky nodes act as DHT nodes.

andrewchambers commented Apr 9, 2018

FWIW: many operating systems provide facilities for limiting all of those things; e.g. consider using Linux containers and separate disk partitions. It is then up to ipfs just to handle the error conditions returned by the OS properly.


Macil commented Apr 9, 2018

If you make the OS / Docker limit the memory that ipfs uses, will ipfs be careful to use less than that amount? If not, ipfs might just keep charging headfirst into the limit and get regularly killed/restarted by the system.


Kubuxu commented Apr 10, 2018

We would hard-limit the amount of memory used if Go allowed for it, but it does not.
This means we can only chase bugs and try to fix them to limit memory usage.

Macil commented Apr 10, 2018

I don't want limits in order to limit the impact of bugs; I'm worried about limiting the amount of memory that ipfs uses under arbitrarily high load. I want to do things like set ipfs to refuse or queue new connections if it's processing too many right now, etc.

Kubuxu commented Apr 11, 2018

@AgentME this isn't a problem right now. Currently, AFAIK, most memory issues are due to bugs.

CocoonCrash commented Jun 5, 2018

Bugs happen, and no one should rely solely on the assumption that no problems will occur once known ones are corrected. I think most of the limits mentioned by @jbenet are as necessary as a seatbelt while driving.

Go can't have resource consumption limits set, but a "breathing sleep" of some milliseconds can be coded so that an end user doesn't "lose control" of their device, for example. And/or the number of effective TCP connections / used bandwidth could also be limited, as those are part of the software design.

My personal understanding is that one of the numerous goals of IPFS is efficiency, so consuming a lot of resources (CPU, memory, bandwidth) on edge devices while in idle mode is not an option, as it could be seen as "uncontrolled" software. Would you want a computer that, when connected to the internet, couldn't be used because it's busy ensuring everything is working well? It reminds me of antivirus software running on Windows years ago.

I'm far from an IPFS/libp2p expert, but maybe each node could implement a pub/sub-like scheme, opening only one connection to listen for heartbeats sent from other nodes referencing it. And when a node's heartbeat has been missing for too long, it could trigger the DHT routing table to be renewed the regular TCP way. That would be a compromise between UDP and TCP as discussed by @loadletter and @whyrusleeping earlier.

This could also be used to optimise/adapt routing, as it could offer pseudo-latency or workload/availability monitoring shared between nodes, even if libp2p already implements many similar things, such as node auto-discovery on a common network, or IPFS being intended to work even if part of the network gets split into subnetworks, etc.

I really hope this will get improved, as I think it currently is an adoption barrier. IPFS is a really great and promising thing, and I really thank every designer/contributor for all the work done, but I would also really love to see it spread to the whole universe ;)

theduke commented Oct 5, 2018

As a note of reference, I had problems with ipfs-daemon consistently killing my WiFi connection after a few minutes. I had to disconnect and reconnect manually. (OS: Arch Linux + NetworkManager.)

After limiting the maximum connections to 300 (with Swarm.ConnMgr.HighWater), it works fine now, but this is really bad for the average user, who might just not understand why their internet is suddenly so slow or not working correctly.

The default setup should be very conservative with resources used.
