Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Websocket protocol errors cause full node halt/stall/crash #3124

Closed
drewstone opened this issue Jul 16, 2019 · 16 comments
Closed

Websocket protocol errors cause full node halt/stall/crash #3124

drewstone opened this issue Jul 16, 2019 · 16 comments

Comments

@drewstone
Copy link
Contributor

We're running a custom runtime against master from ~July 14th. Some nodes are configured with rpc-cors "*". Oftentimes, the following happens which requires a full restart of the node. The node doesn't respond to ctrl + c, it must be shut down externally (closing the window, etc.)

2019-07-15 09:24:13 Imported #12867 (0x8bde…c736)
2019-07-15 09:24:14 Idle (14 peers), best: #12867 (0x8bde…c736), finalized #6819 (0x937e…4184), ⬇ 23.8kiB/s ⬆ 20.1kiB/s
2019-07-15 09:24:17 Discovered new external address for our node: /ip4/10.12.2.10/tcp/30333/p2p/QmRuqvRv3Uudf81vXtSmVfD6bwWTgmjwLd4PG6PSh2Q2oP
2019-07-15 09:24:19 Idle (14 peers), best: #12867 (0x8bde…c736), finalized #6819 (0x937e…4184), ⬇ 17.9kiB/s ⬆ 20.3kiB/s
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
@drewstone
Copy link
Contributor Author

And from another node

2019-07-15 14:36:49 Imported #15991 (0x70cb…4e59)
2019-07-15 14:36:49 Idle (10 peers), best: #15991 (0x70cb…4e59), finalized #6819 (0x937e…4184), ⬇ 12.7kiB/s ⬆ 14.0kiB/s
2019-07-15 14:36:52 Node PeerId("QmYKdhwso9p8iLwxWTm7DpXn8Ejo49VziaCnJFfWg2jpzM") has reported more than 30 addresses; it is identified by "/substrate/1.0" and "cennznet-node/v0.9.24-beta-13d4ed5-x86_64-windows-msvc (unknown)"
2019-07-15 14:36:54 Idle (10 peers), best: #15991 (0x70cb…4e59), finalized #6819 (0x937e…4184), ⬇ 15.5kiB/s ⬆ 15.4kiB/s
2019-07-15 14:36:55 Imported #15992 (0x21e4…bdf4)
2019-07-15 14:36:59 Idle (10 peers), best: #15992 (0x21e4…bdf4), finalized #6819 (0x937e…4184), ⬇ 12.8kiB/s ⬆ 14.1kiB/s
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.

@tomusdrw
Copy link
Contributor

If it's reproducible it would be really helpful to gather logs with -lws=trace

@drewstone
Copy link
Contributor Author

drewstone commented Jul 16, 2019

Ok, will try with that from now on @tomusdrw

EDIT: @tomusdrw, it happened again and there were no extra logs (not even the WSS errors in trace mode).

@drewstone
Copy link
Contributor Author

drewstone commented Jul 19, 2019

@tomusdrw here are detailed logs from trace. Let me know if you want me to post more, but I think this is just being replicated from a downed node.

https://gist.github.com/drewstone/6b53510604755f18443880a6b6cbba88

EDIT: The node is only outputting info of the sort contained in the gist, it has dropped off of the network.

@bkchr
Copy link
Member

bkchr commented Jul 22, 2019

@tomaka maybe any of your future rework that could lead to this problem?

@tomaka
Copy link
Contributor

tomaka commented Jul 22, 2019

@bkchr The RPC runs on separate threads that aren't covered by any of my rework. The logic of these threads is entirely contained within the jsonrpc-ws-server crate.

@tomusdrw
Copy link
Contributor

@drewstone unfortunately these logs don't really tell me anything cause I can't correlate them with any other activity happening on that node (is it pre-crash, right after crash, etc). Could you please send a complete log that shows both ws trace, the WebSocket Protocol Error: Unable to parse WebSocket key. and then regular substrate logs?

@bkchr I also don't think it's related. This error seems purely sending an incorrect WS upgrade request we should simply be rejected, but I'm really interested why it interferes with regular node operations.

@drewstone
Copy link
Contributor Author

@tomusdrw Here are more extensive logs, my tmux has clipped it though so I cannot see how it originates. On the topic, the node never crashes per se, but it does drop connection from the rest of the network.. instead the node is just stuck outputting these logs.

https://gist.github.com/drewstone/32432cd67a0e55f8ea4a8707e8170105

@tomusdrw
Copy link
Contributor

So the error doesn't seem to be related:

2019-07-22 08:46:01.896  TRACE ws::connection  Handshake request received:
POST /wp-cron.php?doing_wp_cron=1563785161.8742790222167968750000 HTTP/1.1
Connection: upgrade
Host: 127.0.0.1:9944
Content-Length: 0
User-Agent: WordPress/5.1.1; https://www.prontogc.com.br
Accept: */*
Accept-Encoding: deflate, gzip
Referer: https://www.prontogc.com.br/wp-cron.php?doing_wp_cron=1563785161.8742790222167968750000
Content-Type: application/x-www-form-urlencoded


2019-07-22 08:46:01.897  TRACE ws::io  Encountered error while reading: WebSocket Protocol Error: Unable to parse WebSocket key.
2019-07-22 08:46:01.897  ERROR ws::handler  WS Error <Protocol>: Unable to parse WebSocket key.
2019-07-22 08:46:01.897  TRACE ws::io  Scheduling connection to 127.0.0.1:39100 as Writable

It seems like some kind of automated wordpress exploiter script that tries to connect to your WS server, but the server correctly closes the connection and the server keeps operating after that afaict.

Can you post -lrpc=trace logs to see what kind of queries are being invoked on your node and what are the timinings? Maybe there is some expensive query / lock contention that causes the node to halt?

@drewstone
Copy link
Contributor Author

Yep, will report back once it halts again.

@gterzian
Copy link
Contributor

gterzian commented Jul 25, 2019

One way to look at this, which isn't going to solve this particular problem but might still be worth thinking about, is that we could try to run different components in their own process, so that crashes of one don't bring down the entire node.

There is precedence for this in ongoing refactorings in web browsers, see for example: https://www.chromium.org/servicification

@0xthreebody
Copy link

0xthreebody commented Dec 3, 2019

I got the same error. When I use the Load Balance to configuration the https(wss):9944 -> http(ws):9944.

The substrate node got:

Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.

@tomusdrw
Copy link
Contributor

tomusdrw commented Dec 3, 2019

@jiangfuyao This error is not critical - it just means that an unexpected request has been made to the WebSockets server (for instance a HTTP post). It's just a regular node operation.

Did your node stop working after that? If not then it's not related to this issue. Also I'm going to close this as it seems it has been 6 months without a halt ("Yep, will report back once it halts again."). Please re-open if the issue happens again.

@tomusdrw tomusdrw closed this as completed Dec 3, 2019
@0xthreebody
Copy link

0xthreebody commented Dec 3, 2019

The node is work, but the wss can't connect. And stop the Load Balance service, the error will disappear.

@tomusdrw
Copy link
Contributor

tomusdrw commented Dec 3, 2019

@jiangfuyao How is this service working? Maybe it expects some liveness endpoint to be open on the WS server? It tries to ping that endpoint, fails (you get the error message), and then assumes the service is dead, hence it doesn't work.

I think we are running nginx reverse proxy with ssl-termination without issues for substrate, so that's perhaps some misconfiguration.

@JoshOrndorff
Copy link
Contributor

For anyone who lands here in the future.

I got this websocket error (but not a node crash or anything like that) when I was using curl to submit RPC requests to the ws port (9944 by default) rather than the correct http port (9933 by default).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants