Stall when fetching locally available content #6386
Comments
Mutex profile might be helpful here:
|
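For readers unfamiliar with mutex profiles, here is a minimal, generic Go sketch of how one can be captured with the standard library. It is not how go-ipfs exposes its profiles; the helper names `captureMutexProfile` and `contend` are hypothetical and exist only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

// captureMutexProfile (hypothetical helper) enables mutex contention
// sampling, runs work, and returns the profile in pprof's binary format
// (a gzip-compressed protobuf).
func captureMutexProfile(work func()) ([]byte, error) {
	// A fraction of 1 samples every contention event; restore the old rate after.
	prev := runtime.SetMutexProfileFraction(1)
	defer runtime.SetMutexProfileFraction(prev)

	work()

	var buf bytes.Buffer
	if err := pprof.Lookup("mutex").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// contend (hypothetical workload) makes several goroutines fight over one lock.
func contend() {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				mu.Lock()
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
}

func main() {
	prof, err := captureMutexProfile(contend)
	if err != nil {
		fmt.Fprintln(os.Stderr, "mutex profile failed:", err)
		os.Exit(1)
	}
	fmt.Printf("captured %d bytes of mutex profile\n", len(prof))
}
```

The resulting bytes can be written to a file and inspected with `go tool pprof`.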
Here's a mutex profile as requested, along with another cpuprof / stacks. FYI: the problem resolved itself a few minutes after I captured these, so I can't be 100% positive they will help. Hopefully they do. |
Here's a new log set for a different, but similar dataset being retrieved that caused this error: This hash was still bugging out at the time of posting this. |
It should be noted that restarting the node in question fixes this issue instantly (for a while). |
Some more information: when this bug is encountered, the node can lag or stall. We've had to resort to resetting our nodes when encountering this. |
@magik6k @hannahhoward |
This might be related to the issue in #6442, mind updating to master to see if it got fixed? |
@magik6k this doesn't appear to have helped unfortunately |
Further information that may or may not be of any help: the root directory hash seems to load fine, but once I click into one of the subdirectories, the child directories inside it won't load. To explain better, the data looks like: Root Directory -> child directories -> grandchild directories -> files. The gateway gets stuck after we jump into one of the child directories and attempt to jump into one of the grandchild directories. |
Update: We believed this may have been caused by a useless task consuming all the CPU time in bitswap but that doesn't appear to be the case. This can be reproduced on master as of the beginning of July. |
@obo20 have you seen this since? |
Version information:
Description:
ipfs pin verify $HASH works.
curl $LOCAL_GATEWAY/ipfs/$HASH stalls.
Additionally, this dataset includes ~50e6 files, sharded up into multiple sub-directories.
See logs.zip for the stack traces and a CPU profile taken while reproducing the issue.
It looks like there's some kind of live-lock (and maybe a deadlock?) in go-bitswap.
Note: Unfortunately, the stack traces are truncated due to Go's 64MiB pprof stack trace limit.
Originally reported by @obo20.