
feat: add error log when resource manager throttles crawler #772

Merged: 1 commit, May 26, 2022

Conversation

guseggert (Contributor)

Since this could leave the Accelerated DHT client in a degraded state (it may be unable to completely populate its routing table), we want to signal this to the user until we can modify the client to degrade more gracefully when hitting resource limits.

Note that this logs only once per crawl, to avoid spamming the user.
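For illustration, here is a minimal sketch of that once-per-crawl gating. The names (`crawlPeers`, `dial`) and the import layout are assumptions for the sketch, not this PR's actual code; older go-libp2p releases used `github.com/libp2p/go-libp2p-core` instead of the monorepo paths.

```go
// Illustrative sketch only: a fresh sync.Once per crawl gates the error log,
// so hitting resource manager limits on many peers in one crawl produces a
// single message.
package crawl

import (
	"errors"
	"sync"

	logging "github.com/ipfs/go-log/v2"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
)

var logger = logging.Logger("fullrtdht")

// crawlPeers dials every peer; limitErrOnce is created per invocation, so the
// warning fires at most once per crawl even if thousands of dials are throttled.
func crawlPeers(peers []peer.ID, dial func(peer.ID) error) {
	var limitErrOnce sync.Once
	for _, p := range peers {
		if err := dial(p); err != nil && errors.Is(err, network.ErrResourceLimitExceeded) {
			limitErrOnce.Do(func() {
				logger.Errorf("resource manager limits hit while crawling; routing table may be incomplete")
			})
		}
	}
}
```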

@aschmahmann (Contributor) left a comment


Seems reasonable to me; I don't have time to test this out at the moment (somewhat necessary given the abysmal test coverage of the accelerated DHT client 😬), but I should be able to later today or tomorrow.

If you want another set of hands to run tests to see how helpful this is at finding the limits, @Jorropo may be free.

fullrt/dht.go (Outdated)

```go
func(p peer.ID, err error) {
	if errors.Is(err, network.ErrResourceLimitExceeded) {
		limitErrOnce.Do(func() {
			logger.Errorf("Accelerated DHT client was unable to fully refresh its routing table due to Resource Manager limits, which may degrade content routing. Consider increasing resource limits. See debug logs for details.")
		})
	}
}
```
aschmahmann (Contributor) commented on this line:

How does it look when testing this out within go-ipfs? Have you been able to use this mechanism to discover what we should raise the default limits to such that this should not trigger for people?

> See debug logs for details.

What types of logs are you expecting people are going to examine? For example, do they need to look at the dht-crawler logs rather than the fullrt ones, do they need to look at rcmgr logs, etc.

I'm not sure there's enough info here for a user to "figure it out" beyond just filing an issue on this repo. If we need to add text to the README or put a link to a wiki for this repo we can do that; otherwise, if there's some simple text we can use, we can use that.

guseggert (Contributor, Author) replied:

> How does it look when testing this out within go-ipfs? Have you been able to use this mechanism to discover what we should raise the default limits to such that this should not trigger for people?

So the logs look like this (next to ipfs/kubo#8980):

2022-05-18T11:34:24.234-0400	ERROR	fullrtdht	fullrt/dht.go:304	Accelerated DHT client was unable to fully refresh its routing table due to Resource Manager limits, which may degrade content routing. Consider increasing resource limits. See debug logs for details.
2022-05-18T11:34:32.201-0400	ERROR	resourcemanager	libp2p/rcmgr_logging.go:51	Resource limits were exceeded 55405 times, consider inspecting logs and raising the resource manager limits.

There aren't details about which kind of limit was hit (outbound connections, streams, memory, FDs) or at which scope, so that error alone is not enough to know which limits need to be raised. I can add more brains to ipfs/kubo#8980 to try to come up with some recommendation, but that will take some time, since we'd need to register DHT interactions with libp2p as a proper resource manager "system". I figured we could release it like this and see how frequently this happens (with raised default limits).

> What types of logs are you expecting people are going to examine? For example, do they need to look at the dht-crawler logs rather than the fullrt ones, do they need to look at rcmgr logs, etc.

For these specific errors related to refreshing the accelerated DHT client's routing table, yeah, it's just dht-crawler. I can add that to the error message, since this is specific to that case.

The accelerated DHT client more broadly uses at least three logging subsystems (dht-crawler, fullrtdht, dht) where RM errors might surface.
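For anyone chasing these down, here's a minimal sketch (not code from this PR) of turning those subsystems up to debug programmatically with go-log; whether each name is registered depends on which packages your build imports. With kubo you can get the same effect by setting the GOLOG_LOG_LEVEL environment variable before starting the daemon, e.g. GOLOG_LOG_LEVEL="dht-crawler=debug,fullrtdht=debug".

```go
// Illustrative sketch: raise the log level of the subsystems mentioned above
// so resource manager errors surface with more context.
package main

import (
	"fmt"

	logging "github.com/ipfs/go-log/v2"
)

func main() {
	// Subsystem names as they appear in the log output quoted earlier.
	for _, subsystem := range []string{"dht-crawler", "fullrtdht", "dht"} {
		if err := logging.SetLogLevel(subsystem, "debug"); err != nil {
			fmt.Printf("could not set %s to debug: %v\n", subsystem, err)
		}
	}
}
```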
