Probable goroutine leak running 1.0.2-beta #1002
Comments
Also, this node is not exposed publicly to clients; furthermore, it receives little to no RPC calls.

Perfect @cffls, By the way, removing the However,

Ok, fix deployed, I'll let you know in a few hours.

I don't see the goroutine leak anymore. Will report back again later to see how it goes.

That's great news. Thank you @maoueh for testing it out!

It ran the full weekend without showing any signs of a goroutine leak; the fix was the right one, thanks guys!

Thanks @maoueh for the update. Will close this issue.
System information
Bor client version: v1.0.2-beta
Heimdall client version:
OS & Version: Linux
Environment: Polygon Mumbai
Type of node: Full node
Additional Information: Going via cmd/geth (see the Details section below for extra provided details)

Overview of the problem
At some point in time, the process starts to drift, unable to import new chain segments fast enough to keep up with the network. This problem happened yesterday at ~13:00 EST and again this morning at ~06:00 EST.
We have two full nodes syncing; both start lagging (but a bit differently, the timings are not exactly the same, ~1h apart).
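As a side note on diagnosis: a lightweight way to confirm that this kind of drift correlates with goroutine growth is to poll runtime.NumGoroutine over time. The sketch below is illustrative only; the helper name and the polling interval are assumptions, not something taken from the node itself.

```go
package main

import (
	"log"
	"runtime"
	"time"
)

// watchGoroutines is a hypothetical helper: it logs the goroutine
// count periodically, so a leak shows up as a steadily growing number.
func watchGoroutines(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		log.Printf("goroutines: %d", runtime.NumGoroutine())
	}
}

func main() {
	go watchGoroutines(30 * time.Second)
	select {} // stand-in for the node's real work
}
```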
Reproduction Steps
Sync with:
Details

Note

So, a disclaimer: we are still using the cmd/geth entrypoint right now for legacy reasons; I don't think the problem is related to that. In any case, I'm going to test using cmd/cli as the entrypoint to see if there is a difference today.

So, this morning when the problem happened, I took pprof profiles of different elements, namely the goroutine dump. It appears there are ~200K goroutines active when the node is crawling on its knees. 50% of them are coming from reportMetrics and 50% from github.com/JekaMas/workerpool.(*WorkerPool).dispatch.
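For reference, a goroutine dump like this can be grabbed from a geth-based node's HTTP pprof endpoint (exposed under /debug/pprof/ when the node runs with --pprof), e.g. `go tool pprof http://localhost:6060/debug/pprof/goroutine` with geth's default pprof port. Below is a minimal in-process sketch using the standard runtime/pprof package; the output filename is an arbitrary choice for this example.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	// The filename is arbitrary; .pb.gz matches the compressed
	// protobuf format that `go tool pprof` reads.
	f, err := os.Create("goroutine.pb.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// debug=0 emits the compressed protobuf profile;
	// debug=2 would emit a full human-readable stack dump instead.
	if err := pprof.Lookup("goroutine").WriteTo(f, 0); err != nil {
		log.Fatal(err)
	}
}
```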
Probably both are related to each other, I imagine. Stack for reportMetrics:

Source:
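For illustration only (this sketch is hypothetical and is not the actual reportMetrics or workerpool code): the classic shape of such a leak is a dispatcher that starts a goroutine per task, where each goroutine then blocks forever on a channel nobody drains, so every call permanently adds one goroutine.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakyDispatch is a contrived example: each call starts a goroutine
// that sends on an unbuffered channel with no receiver, so the
// goroutine blocks forever and is never reclaimed.
func leakyDispatch() {
	results := make(chan int) // nothing ever receives from this
	go func() {
		results <- 42 // blocks forever -> leaked goroutine
	}()
}

func main() {
	for i := 0; i < 1000; i++ {
		leakyDispatch()
	}
	time.Sleep(100 * time.Millisecond)
	// All the blocked senders are still alive.
	fmt.Println("goroutines:", runtime.NumGoroutine())
}
```

In a goroutine profile, this pattern shows up as thousands of identical stacks parked on the same chan send (or select) line, which matches the ~200K-goroutine symptom described above.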
Logs / Traces / Output / Error Messages
Not attaching anything for now, let me know what you need. I have:

- sf_problem_mumbai_0.log (node's log)
- sf_problem_mumbai_0_goroutine.pb.gz
- sf_problem_mumbai_0_goroutine_full.txt
- sf_problem_mumbai_0_heap.pb.gz
- sf_problem_mumbai_0_iostat.txt
- sf_problem_mumbai_0_profile.pb.gz