High CPU usage in seed node on setzling #772
Comments
This suggests that there are a lot of connections — is that the case?
There's zero traffic on port 1234. Confirmed that by running tcpdump and sending a dummy handcrafted packet, which was detected. No other traffic has been observed on this port, all the while CPU usage is at 100% in those 4 threads. It's the same on port 80: zero traffic until I issue a request myself.
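For reference, a capture along these lines would confirm the absence of traffic; the interface name here is an assumption about the host, not taken from the report:

```shell
# Print every packet arriving on port 1234; -n skips DNS resolution
# so output appears immediately. Interface name (eth0) is a guess.
sudo tcpdump -n -i eth0 port 1234
```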
Are you saying it enters this state while not doing anything? Can this be reproduced? Is it specific to the environment?
It seems so.
It happens by itself several hours after startup.
So far I've only seen it on setzling. What's different about setzling is that it's currently running seed compiled in debug mode, precisely to aid tracking down issues like this one, although we could have debug symbols in release mode too via a Cargo setting.
Ah wonderful :/
The stack traces suggest that more than one thread is trying to acquire a write lock on the same DashMap shard, which I would expect a concurrent map to be able to handle.
It turns out DashMap uses a homebrewed RwLock built on a spin lock, which spins forever. A PR moving to parking_lot instead has seen no activity in more than two months.
So I guess we need a proper concurrent map. Any suggestions?
That's really surprising given the popularity of the crate. As for suggestions, it really depends on the required concurrency semantics. For instance, is the hashmap used for blocking on a specific key until another thread inserts it, or are all operations required to be lock-free (in the progress-guarantees sense, not in the sense of "not using locks")? If the latter, there's evmap, for instance.
Ah wait, this was introduced in 760d310 (net: store the actual connection in conntrack, 2021-07-29), which makes the same thread try to acquire a lock it already holds. Which is verboten with DashMap. See
https://github.com/radicle-dev/radicle-link/blob/master/librad/src/net/quic/connection/tracking.rs#L221-L223
I am wondering btw where the
yaaaaaakkkkkk
I'm observing what looks like a livelock/high contention on setzling.
The radicle-bins repo git SHA it's happening on is d3f366ca0965d80e892d924613a715fc26fc3733. Granted, it's not master, but I've observed the same behavior before, when someone else's branch was deployed there. Note that the PIDs in the screenshot above are readily correlated with thread ids in the stack traces: 29571, 29572, 29573, 29583.
If it helps, here are stack traces taken yesterday, when a different branch was deployed.