Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket protocol errors cause full node halt/stall/crash #3124

Open
drewstone opened this issue Jul 16, 2019 · 11 comments

Comments

@drewstone
Copy link
Contributor

commented Jul 16, 2019

We're running a custom runtime against master from ~July 14th. Some nodes are configured with rpc-cors "*". Oftentimes, the following happens which requires a full restart of the node. The node doesn't respond to ctrl + c, it must be shut down externally (closing the window, etc.)

2019-07-15 09:24:13 Imported #12867 (0x8bde…c736)
2019-07-15 09:24:14 Idle (14 peers), best: #12867 (0x8bde…c736), finalized #6819 (0x937e…4184), ⬇ 23.8kiB/s ⬆ 20.1kiB/s
2019-07-15 09:24:17 Discovered new external address for our node: /ip4/10.12.2.10/tcp/30333/p2p/QmRuqvRv3Uudf81vXtSmVfD6bwWTgmjwLd4PG6PSh2Q2oP
2019-07-15 09:24:19 Idle (14 peers), best: #12867 (0x8bde…c736), finalized #6819 (0x937e…4184), ⬇ 17.9kiB/s ⬆ 20.3kiB/s
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
@drewstone

This comment has been minimized.

Copy link
Contributor Author

commented Jul 16, 2019

And from another node

2019-07-15 14:36:49 Imported #15991 (0x70cb…4e59)
2019-07-15 14:36:49 Idle (10 peers), best: #15991 (0x70cb…4e59), finalized #6819 (0x937e…4184), ⬇ 12.7kiB/s ⬆ 14.0kiB/s
2019-07-15 14:36:52 Node PeerId("QmYKdhwso9p8iLwxWTm7DpXn8Ejo49VziaCnJFfWg2jpzM") has reported more than 30 addresses; it is identified by "/substrate/1.0" and "cennznet-node/v0.9.24-beta-13d4ed5-x86_64-windows-msvc (unknown)"
2019-07-15 14:36:54 Idle (10 peers), best: #15991 (0x70cb…4e59), finalized #6819 (0x937e…4184), ⬇ 15.5kiB/s ⬆ 15.4kiB/s
2019-07-15 14:36:55 Imported #15992 (0x21e4…bdf4)
2019-07-15 14:36:59 Idle (10 peers), best: #15992 (0x21e4…bdf4), finalized #6819 (0x937e…4184), ⬇ 12.8kiB/s ⬆ 14.1kiB/s
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
Encountered an error: WebSocket Protocol Error: Unable to parse WebSocket key.
Enable a logger to see more information.
@tomusdrw

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2019

If it's reproducible it would be really helpful to gather logs with -lws=trace

@drewstone

This comment has been minimized.

Copy link
Contributor Author

commented Jul 16, 2019

Ok, will try with that from now on @tomusdrw

EDIT: @tomusdrw, it happened again and there were no extra logs (not even the WSS errors in trace mode).

@drewstone

This comment has been minimized.

Copy link
Contributor Author

commented Jul 19, 2019

@tomusdrw here are detailed logs from trace. Let me know if you want me to post more, but I think this is just being replicated from a downed node.

https://gist.github.com/drewstone/6b53510604755f18443880a6b6cbba88

EDIT: The node is only outputting info of the sort contained in the gist, it has dropped off of the network.

@bkchr

This comment has been minimized.

Copy link
Contributor

commented Jul 22, 2019

@tomaka maybe any of your future rework that could lead to this problem?

@tomaka

This comment has been minimized.

Copy link
Member

commented Jul 22, 2019

@bkchr The RPC runs on separate threads that aren't covered by any of my rework. The logic of these threads is entirely contained within the jsonrpc-ws-server crate.

@tomusdrw

This comment has been minimized.

Copy link
Contributor

commented Jul 22, 2019

@drewstone unfortunately these logs don't really tell me anything cause I can't correlate them with any other activity happening on that node (is it pre-crash, right after crash, etc). Could you please send a complete log that shows both ws trace, the WebSocket Protocol Error: Unable to parse WebSocket key. and then regular substrate logs?

@bkchr I also don't think it's related. This error seems purely sending an incorrect WS upgrade request we should simply be rejected, but I'm really interested why it interferes with regular node operations.

@drewstone

This comment has been minimized.

Copy link
Contributor Author

commented Jul 22, 2019

@tomusdrw Here are more extensive logs, my tmux has clipped it though so I cannot see how it originates. On the topic, the node never crashes per se, but it does drop connection from the rest of the network.. instead the node is just stuck outputting these logs.

https://gist.github.com/drewstone/32432cd67a0e55f8ea4a8707e8170105

@tomusdrw

This comment has been minimized.

Copy link
Contributor

commented Jul 22, 2019

So the error doesn't seem to be related:

2019-07-22 08:46:01.896  TRACE ws::connection  Handshake request received:
POST /wp-cron.php?doing_wp_cron=1563785161.8742790222167968750000 HTTP/1.1
Connection: upgrade
Host: 127.0.0.1:9944
Content-Length: 0
User-Agent: WordPress/5.1.1; https://www.prontogc.com.br
Accept: */*
Accept-Encoding: deflate, gzip
Referer: https://www.prontogc.com.br/wp-cron.php?doing_wp_cron=1563785161.8742790222167968750000
Content-Type: application/x-www-form-urlencoded


2019-07-22 08:46:01.897  TRACE ws::io  Encountered error while reading: WebSocket Protocol Error: Unable to parse WebSocket key.
2019-07-22 08:46:01.897  ERROR ws::handler  WS Error <Protocol>: Unable to parse WebSocket key.
2019-07-22 08:46:01.897  TRACE ws::io  Scheduling connection to 127.0.0.1:39100 as Writable

It seems like some kind of automated wordpress exploiter script that tries to connect to your WS server, but the server correctly closes the connection and the server keeps operating after that afaict.

Can you post -lrpc=trace logs to see what kind of queries are being invoked on your node and what are the timinings? Maybe there is some expensive query / lock contention that causes the node to halt?

@drewstone

This comment has been minimized.

Copy link
Contributor Author

commented Jul 22, 2019

Yep, will report back once it halts again.

@gterzian

This comment has been minimized.

Copy link
Contributor

commented Jul 25, 2019

One way to look at this, which isn't going to solve this particular problem but might still be worth thinking about, is that we could try to run different components in their own process, so that crashes of one don't bring down the entire node.

There is precedence for this in ongoing refactorings in web browsers, see for example: https://www.chromium.org/servicification

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.