Surface connection problems in the WASM to the Javascript #435
Comments
There are two categories of reasons why we can fail to connect to a node in general:
The bootnodes should, as much as possible, be treated the same way as all the other nodes on the network. Failing to connect to a bootnode because of one of these "normal situations" should in my opinion not be reported to the user. However, the "abnormal situations" aren't supposed to happen in the case of bootnodes. You put a bootnode in your chain specs because you know that it's compatible with your node, and incompatibilities should indeed be reported to the user. Instead of having a callback for connection errors, I would suggest that you emit an 'error' whenever smoldot emits an error in its logs, and smoldot would emit an error whenever there's an "abnormal situation" with respect to the bootnodes.
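A minimal sketch of that suggestion, assuming smoldot's logs reach the JavaScript side through a callback that receives a numeric level, a target and a message, with level 1 meaning "error" (that callback shape is an assumption here). `SketchProvider` and `handleLog` are hypothetical names for illustration, not the real SmoldotProvider:

```typescript
type Listener = (payload: unknown) => void;

// Hypothetical stand-in for a provider; not the real SmoldotProvider.
class SketchProvider {
  private listeners = new Map<string, Listener[]>();

  // Register a listener for a named event such as 'error'.
  on(event: string, listener: Listener): void {
    const list = this.listeners.get(event) || [];
    list.push(listener);
    this.listeners.set(event, list);
  }

  private emit(event: string, payload: unknown): void {
    for (const listener of this.listeners.get(event) || []) {
      listener(payload);
    }
  }

  // Assumed log callback shape: numeric level (1 = error), target, message.
  // Only error-level lines are surfaced as an 'error' event.
  handleLog(level: number, target: string, message: string): void {
    if (level === 1) {
      this.emit("error", new Error(`[${target}] ${message}`));
    }
  }
}
```

A consumer would then subscribe with `provider.on("error", ...)` and decide what UI to show, without having to parse log text itself.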
You're right, this is indeed more subtle than I gave it credit for in the description.

What if smoldot cannot connect to any bootnodes?
I'm not sure I agree with this. The client is not usable from the perspective of the consumer if the bootnodes are not connectable and the chain is not syncing. While the user being offline is perfectly "normal" from the perspective of smoldot, the consumer will want to know about it so that they can, for example, display some sort of "client offline" message.

What if smoldot can connect to some but not all bootnodes?
Agreed, this suggests a JavaScript consumer needs more context when smoldot reports bootnode connectivity problems. You can see in the devtools when something is wrong (e.g. the recent 502 Bad Gateway responses, or when a TLS connection is not supported), but there is currently no way to catch these conditions programmatically or to make UI decisions about what to report to the user and what not to report. Do we think that we might want to display some sort of degraded-performance UI when some but not all bootnodes are available for syncing? Currently on westend the implication is that only one bootnode is available and therefore syncing will be severely hampered as we trigger the DoS protection on the node, but maybe that will not be the case long term and on other networks.

What if the consumer is building some sort of network monitor tool?
Maybe the devops team of a substrate chain wants to build a JavaScript dashboard and is using a WASM client to display network status? ... ok, this is probably not realistic as they'd be better off using other tools for monitoring, but maybe they want to have a nicer UI for a dashboard.

What if smoldot discovers other non-bootnodes from the bootnode and fails to connect to them?
In the future, more than just the bootnodes for a given chain may have public TLS-secured websockets. I can think of scenarios where we would like to be able to visualise the adoption of pure browser WASM clients across that network to demonstrate the effectiveness of unstoppable apps (maybe also as a marketing tool to encourage adoption). "How many nodes do I know about and how many did I fail to connect to?"

What other node connectivity use cases are there? What do we want to support? cc @Stefie @goldsteinsveta @wirednkod
Not sure what we do about this. Can you give an example of why this would happen, @tomaka?
I categorise this as programmer error. I.E. the consumer made a mistake doing the integration. We should unceremoniously throw an error.
I think what we show depends on if it is one or all of the bootnodes and the consumer needs to make the decision. I.E. is their UI interested in partial bootnode connectivity problems?
This would work, but log grepping feels brittle to me, especially if we want to provide more context as outlined in the "What if smoldot can connect to some but not all bootnodes?" scenario above.
This could be done by querying the number of peers smoldot is connected to (e.g. through the
Not being connected to any bootnode, but instead to other nodes is a completely normal situation. In fact, in the long term, we probably want to disconnect from bootnodes as soon as we are connected to non-bootnodes, as a polite way to give room to other nodes connecting.
I'd suggest that they build their own alternative to what is in While the content of This design choice influences more than just logging: old blocks are discarded, storage entries aren't cached, we assume that the finalized chain is close to the head of the chain, etc.
For what it's worth, discovering other nodes is disabled (because there's no other node on the network at the moment anyway), but can be added by adding the missing ~10 lines of code. However, the discovery code as it exists would just randomly discover parts of the network. Discovering the entire network is a bigger beast.
Well, a bug in the code.
It can also be caused by a mistake in the bootnode's configuration. We shouldn't make smoldot crash if a bootnode is misconfigured. One of the selling points of the light client is that it should continue to work and can't be hacked even if bootnodes are hacked.
The reason why I'm suggesting logs is that the entire purpose of error and warn logs is to indicate to the user that something is wrong. Even if we add a mechanism separate from logs to report problems when connecting to bootnodes, it would still be desirable to show error/warn logs to the user anyway.
So you suggest SmoldotProvider reports an error event if I also don't think it is the responsibility of the consumer of the smoldot NPM package to keep track of whether it ever thought smoldot managed to connect to a peer (I.E. peer count was non-zero at some point in time) because "client offline" is a common scenario.
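To make the "peer count was non-zero at some point in time" bookkeeping concrete, here is a rough sketch of what tracking it would involve, wherever that logic ends up living. `PeerCountTracker`, `update` and the three state names are hypothetical; how the peer count is obtained is left out entirely:

```typescript
// Hypothetical connectivity states derived purely from observed peer counts.
type Connectivity = "never-connected" | "connected" | "disconnected";

class PeerCountTracker {
  private everConnected = false;
  private state: Connectivity = "never-connected";

  // Feed in each observed peer count.
  // Returns the new state when it changes, or null when nothing changed.
  update(peerCount: number): Connectivity | null {
    let next: Connectivity;
    if (peerCount > 0) {
      next = "connected";
      this.everConnected = true;
    } else {
      // Zero peers: distinguish "offline from the start" from "lost peers".
      next = this.everConnected ? "disconnected" : "never-connected";
    }
    if (next === this.state) return null;
    this.state = next;
    return next;
  }
}
```

The distinction between "never-connected" and "disconnected" is the part under debate in this thread; a tracker that collapses both into a single "no connectivity" state would be simpler still.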
Ok 👍
Is there a tracking issue where we can talk about plans and ideas for peer discovery? I'm sure there will be lots to think about!
Ok - see next reply, we should treat it the same
I didn't mean crash smoldot; I agree it should never crash. We should throw a JavaScript error in the glue code that lives in the smoldot repo. Bugs in the code / incorrect usage should cause the JavaScript application to crash.
I think, given what you've said above, logs are reasonable beyond the initial bootstrapping phase, and for more complex scenarios a different custom WASM should be built. However, we still need to know about "I never managed to connect to any nodes", and I don't think that can reasonably be done by a consumer. So do we do that in the smoldot JavaScript code, or is it the responsibility of the Rust code? If it is in the JavaScript glue code then that would be the
I think that whenever
I don't understand why you want a different treatment between before and after we have managed to contact bootnodes for the first time. If we didn't manage to connect to any bootnode because of a bug/problem/error, then it will be reported in the logs. If we didn't manage to connect to any bootnode because of a connectivity issue (the "normal situation"), then what is the point of telling the user "you were connected to the Internet before, but now it doesn't work anymore", compared to just "no connectivity"?
Peer discovery is already fully designed and has been fully working for years now. Smoldot only implements the existing system, so there's no concrete plan/idea. The latest development is that W3F has plans to contact academic researchers to explore possible improvements, but that's not tracked by an issue.
The point is the same: substrate-connect/smoldot should continue to be usable normally even if someone manages to get into the Parity servers (or RandomBlockchainCompany's servers) and alter the bootnodes in some way.
Because if the client fails to connect to any bootnodes, it can never make a successful RPC call to find out that it isn't connected to any peers. I.e. if it doesn't connect to any peers, it has no way of knowing programmatically that it never did so, and cannot choose to show UI saying that it's not connected. PolkadotJS makes some RPC calls (metadata etc.) to bootstrap the client, but they aren't triggered by us in the provider, so if they error we can't catch the errors. The failures also don't cause the Api to fail to instantiate. So the current behaviour is that no errors or events are thrown to indicate that the client isn't connected: the PolkadotJS Api instantiates successfully and the only indication given by smoldot is out-of-band, in log messages. For comparison, in the pre-existing scenario where the PolkadotJS API connects to a remote RPC node, it emits an error when it fails to connect the websocket to that node, so the consumer can handle that and show appropriate UI. Either PolkadotJS or smoldot needs to change if we want to be able to handle "offline at instantiation" like we can with the existing
Cool, if you have any pointers for more reading that would be much appreciated 😄 I got the impression from gav's comments in element that there was more to be done in this area: "done Truly Right, there should be nothing in there that identifies anybody, not even IP addresses. [S]o that would mean that it'd need to connect to some DHT/kademlia style tracker network to download the current peers." This is what I was thinking about when I asked if this has a place for discussion.
Ok, I need to think about this more to get my head round the scenarios. Let's focus this issue on being able to detect not connecting to any bootnodes on instantiation.
I just tested this by putting FF offline and actually with the default
What is supposed to happen if at initialization smoldot manages to connect to one bootnode, then immediately disconnects from it 10 milliseconds later? Again, my opinion is that the equivalent of "connected to the JSON-RPC" in the
According to the context, I think he's referring to GitHub not being able to track your IP address.
Ok I see what you're saying and that makes sense. The idea of connectivity is just fundamentally different. So that suggests that I add
So based on what we discussed: we shouldn't have uncaught browser errors, i.e. handle them gracefully in the
Well, if the error is about the node being unreachable, I'd say not to print a log message. In both cases, yes, indeed, it shouldn't go straight to the console because of the browser.
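The normal/abnormal split discussed in this thread could be sketched roughly as below. The failure kinds, `ConnectionFailure` and `classify` are all hypothetical names chosen for illustration, and treating abnormal failures from non-bootnodes as ignorable is an assumption rather than something settled here:

```typescript
// Hypothetical failure kinds; the names are illustrative only.
type FailureKind = "unreachable" | "timeout" | "protocol-mismatch" | "bad-handshake";

interface ConnectionFailure {
  address: string;
  isBootnode: boolean;
  kind: FailureKind;
}

type Action = "ignore" | "log-error";

// "Normal" failures (node unreachable, timeout) are never reported.
// "Abnormal" failures are reported only for bootnodes, since a bootnode
// listed in the chain spec is expected to be compatible.
function classify(failure: ConnectionFailure): Action {
  const isNormal = failure.kind === "unreachable" || failure.kind === "timeout";
  if (isNormal) return "ignore";
  // Assumption: abnormal failures from non-bootnodes are not surfaced.
  return failure.isBootnode ? "log-error" : "ignore";
}
```

Keeping the decision in one pure function like this means the handling code can catch every connection error and consult the classifier, rather than letting anything reach the browser console uncaught.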
Consumers of the WASM light client JavaScript need to know about connection errors to bootnodes, as they will want to either alert the user or show a different UI if the client cannot sync. It shouldn't be a hard crash but a connection_error_callback function. I would handle this and emit an 'error' event on the SmoldotProvider (consistent with other PolkadotJS providers).