-
Notifications
You must be signed in to change notification settings - Fork 560
Unstable eth websocket event subscriptions #1745
Comments
Hi, ref. https://github.com/evmos/evmos/releases/tag/v12.1.2 This did not solve the problem |
Public Evmos RPC websocket server is broken. Does not emit all Uniswap2 sync events: Replication: then do a test trade on Diffusion |
I have been able to break the
I had not been subbing to the public RPC lately. So I think you waited too long to try this. If you open up 2-3 multiple concurrent subscriptions the public rpc will break after a few hours |
May I ask if there's event received after reconnect? If yes, it would be subscription issue in client side. If no, it would be websocket issue in server side. |
Upon re-connect while "broken" you can send the eth_subscribe, and you will obtain a receipt subscription id in response. So the connection is working fine. However, after obtaining the receipt id, you will not get any further logs pushed into the channel. If you are working on this, please pm me directly at chris.rossel@gmail.com so that I can help you move quickily to get it solved. I am having production interruptions on every Ethermint chain, whereas no issues on any geth, erigon, moonbeam, polkadot-based nodes. I am sure this is damaging Cosmos defi significantly. |
The lack of events after confirmed subscription id is replicated across multiple listning software, such as direct wscat query, or with Python websockets or Rust tokio tungstenite. I am 100% certain this issue is server side. Remember that this error only occurs after an hour or more of polling. Typically a node restart will help for a period. |
I went to log level 'error' and got these: May 09 13:47:09 v2202211186789206564 cantod[10998]: 1:47PM ERR failed to add event topic to event bus error="topic already registered" api=websocket-server module=websocket-client topic="tm.event='Tx' AND message.module='evm'" May 09 13:54:17 v2202211186789206564 cantod[11161]: 1:54PM ERR failed to add event topic to event bus error="topic already registered" api=websocket-server module=websocket-client topic="tm.event='Tx' AND message.module='evm'" After these errors, no more events were pushed out |
I think you found the correct error message ch := make(chan coretypes.ResultEvent)
es.topicChans[f.event] = ch
if err := es.eventBus.AddTopic(f.event, ch); err != nil {
es.logger.Error("failed to add event topic to event bus", "topic", f.event, "error", err.Error())
} Look, Just few lines of comment, I'll let someone else fix it if my guess is correct. |
Great! Hope that it. I am currently debugging on my Canto node which is running an older version. It seems newheads subscription work while logs do not, so the problem is likely isolated to topic subscriptions |
@CRossel87a May I ask the steps to verify not solved? |
@mmsqe Same steps as described above. However, I think the reason you have not been able to re-produce it is because it seems to happen if you have a program that is attempting to create multiple subscriptions at the same time, such as when an app attempts to subscribe to 3 topics at once over 3 separate connections. I am now attempting to add time between subscriptions to see if this produces stability. Testing a hypothesis that there is some multi-threaded collision |
How about also adding |
Or backup existing (if any) then restore in case of |
I am a bit out of my element with Golang :) Just hacking around |
If channel get changed somewhere when concurrent subscribe happens, your fix should work. I'll try add more test for it |
Maybe I’m misunderstanding the context of the last few messages, but just want to be sure to note that this bug shouldn’t have anything to do with a collision on concurrent subscriptions. Just a single subscribe (with no other connections) will fail after a time. |
@mmsqe @VictorTrustyDev I think we have the solution! Persistent and stable websocket event subscription for two days running |
Nice, how did u fixed it? |
This issue is stale because it has been open 45 days with no activity. Remove |
Hi,
I have re-produced this bug with Evmos, Cronos and Canto clients, and have found Ethermint to be the common factor.
For my app I use multiple long-duration event subscriptions to newheads and topics. It is a consistent and provable problem across multiple versions, multiple binaries, multiple compiles (external, and my own). It occurs both when the client has a dedicated server, and when it is shared with other clients.
When a problem occurs, the clients will accept a new subscription and respond with a subscription id, but then not actually push any events. This problem will typically start occurring after running for 2-6 hours. It also seems like the problem occurs more frequently when event activity on-chain is lower. Re-connecting to the client on every 2minute time-outs do not solve the problem. Only a forced client restart will do it.
One temporary fix I have used is to program a periodic restart of the clients every 2-3h. That seems to "minimize the damage".
Steps to Reproduce
New blocks:
wscat -c ws://127.0.0.1:8546
{"jsonrpc":"2.0","id":1,"method":"eth_subscribe","params":["newHeads"]}
For Uniswap2 Sync events:
wscat -c ws://127.0.0.1:8546
{"jsonrpc":"2.0","id":1,"method":"eth_subscribe","params": ["logs", {"topics": ["0x1c411e9a96e071241c2f21f7726b17ae89e3cab4c78be50e062b03a9fffbbad1"]}]}
The text was updated successfully, but these errors were encountered: