Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add light client platform WASM compatible #1026

Merged
merged 39 commits into from
Jul 18, 2023

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Jun 23, 2023

This PR exposes a pure-rust code base for light-client interaction in WASM environments.

  • The low-level light-client implementation is moved to a dedicated crate part of the subxt workspace. This is done for modularity and better feature flag support while abstracting the environment interface.
  • The low-level smoldot_light::PlatformRef is implemented for both Native and WASM. Although the Native could use the smoldot_light::DefaultPlatform, this is reused with minimal code for better light-client code coverage. The main advantage of this approach is pure-rust code for WASM environments without gluing JavaScript code. A downside is that the common code of the platform needs maintenance.
  • A minimal dependency low-level WebSocket is implemented using wasm-bindgen
    • This takes inspiration from gloo-net, a crate that we also use in jsonrpsee for WASM environments
    • Most of the code is boilerplate for registering JavaScript closures/callbacks for open, message, error and close event
    • Besides minimal dependencies, the in-house WebSocket allows us to efficiently implement AsyncRead and AsyncWrite by levering a VecDeque

To enable the light-client users need to enable the "unstable-light-client" feature flag.
For wasm support, the "web" feature flag is also expected, which keeps consistency across subxt flags.

Testing Done

Similarly to the native light-client, a WASM compatible client syncs with the Polkadot live chain and subscribes to a few finalized chain blocks.

Builds upon: #965
Closes #964.

@lexnv lexnv requested a review from a team as a code owner June 23, 2023 17:00
@@ -0,0 +1,161 @@
// Copyright 2019-2023 Parity Technologies (UK) Ltd.
Copy link

@tomaka tomaka Jun 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already asked this question to the Cumulus team, and I'm repeating it here: why rewrite this implementation of Platform?

Smoldot provides an implementation, and #965 uses it. Why decide to copy-paste its code here (without caring for the licensing (which is incompatible with Parity's), but let's forget that) when there's already a fully working and maintained version provided by smoldot itself?

To me you (and Cumulus) are really shooting yourself in the foot by doing this, and I just don't understand it.

We all know that this code will rot over time. You have the choice between the simple option that will work forever, and the complicated option that will likely require effort to maintain, and you go for the complicated one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I've implemented the low level Platform here to expose a pure-rust based implementation for wasm. While at it, I've also ported the platform for native (using tokio sockets) to allow us to reuse the same code for both platforms for better coverage. The differences are mainly in the connect function which uses a custom websocket with minimal dependencies that also implements AsyncRead / AsyncWrite.

Yep, I also felt the frustration with the wasm-bindgen, fortunately, we are using it for the custom socket. The socket is an oversimplification of gloo-net (this is what we use by default with jsonrpsee for wasm environments).
I'm not quite sure if we could have a pure-rust based implementation without some types of bindings for the websocket 🤔 I would be up for transitioning to something different

I also want to point out that the code in this PR doesn't support WebRTC.

Yep, we'd need to maintain the platform in the meanwhile if we pick this road in the end and look for alternatives of webrtc

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure if we could have a pure-rust based implementation without some types of bindings for the websocket

You can't. All that wasm-bindgen does is generate these bindings automatically, but they are always bindings.

Copy link
Collaborator

@jsdw jsdw Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lexnv what are the major blockers to using @tomaka's Platform implementation? I agree that it would be nicer to rely on an implementation that was supported and kept uptodate for us, so would be good to understand exactly the issues with doing so before we commit to our own impl :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pierre's platform works perfectly for native environments and there's also an alternative for WASM.
To make it fully compatible with our WASM needs the functions from lightclient/src/platform/wasm.rs are added to this PR.
For WASM we need: spawn, now_from_unix_epoch, now, sleep, and most importantly connect together with a few types, while for native we (including Cumulus) need just the connect to use a different socket.

Maybe we could leverage 90% of the smoldot Platform by having an extension trait.
The parts of the Platform that handles buffers are common between environments (lightclient/src/platform/default.rs), and the trickiest to implement right. Then we would need to implement just the connect for native and avoid duplicating the code here and in Cumulus.

// Extension trait to reuse the common parts of the Platform that handles buffers internally
trait PlatformRefBuffered {
 type Instant;
 type Delay;
 
 fn spawn() ..
 fn now() ..
 }

impl<T: PlatformRefBuffered> Platform for T {
...
}

Smoldot Wasm Node

There's a crate in smoldot that implements this.
The crate requires the javascript realm to expose these bindings.

#[link(wasm_import_module = "smoldot")]
extern "C" {
  pub fn buffer_copy(buffer_index: u32, target_pointer: u32);
  ...

These are expected to be placed under the linking module "smoldot".
Before manually implementing the Platform here I also tried to create these bindings in rust, then reuse the wasm platform from this crate.

However, I was unable to expose any external functions under the smoldot module.
The WAT binary would expect the bindings as:

(import "smoldot" "buffer_copy" (func $_ZN18smoldot_light_wasm8bindings11buffer_copy17hf21d3e4eee7463c1E (type 6)))
(import "smoldot" "advance_execution_ready" (func $_ZN18smoldot_light_wasm8bindings23advance_execution_ready17h100f38f424c0220dE (type 0)))
(import "smoldot" "connection_new" (func $_ZN18smoldot_light_wasm8bindings14connection_new17hab2362b6b1ccdf39E (type 28)))

While regardless of where I tried to implement the functions they would be exported from rust as

(export "buffer_copy" (func $buffer_copy))
(export "buffer_size" (func $buffer_size))
(export "json_rpc_responses_non_empty" (func $json_rpc_responses_non_empty))

We could overcome this eventually by modifying smoldot and removing #[link(wasm_import_module = "smoldot")].
However, all the binding functions are designed to work within the JavaScript world.
For example, the buffer_copy function takes as parameter buffer_index: u32 which would mean we'd also need to implement in Rust a way to map a buffer's memory region to an index.
There's an example in timers.rs from the wasm-node, which uses some static variables to handle the mappings.

Just for reference, here is the JavaScript counterpart of the wasm platform.
An alternative to this would be writing the JavaScript code then binding to it in rust. Or trying to bind directly to the JavaScript code from wasm-platform.

That felt a bit more involved than having our slim socket bindings and using that common parts of the platform and I would be up for adjusting/modifying this to make things better for us 🙏

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parts of the Platform that handles buffers are common between environments (lightclient/src/platform/default.rs), and the trickiest to implement right

What I'm suggesting is to simply de-duplicate the code.

Sure the code is the same as in smoldot, but the choice here is between:

  • Copy-pasting part of smoldot's Platform implementation into subxt and use it for both native and wasm sockets.
  • Copy-pasting part of smoldot's Platform implementation into subxt and use only for wasm sockets.

I don't really see how solution 1 (which you're going for) is in any way better than solution 2.

Before manually implementing the Platform here I also tried to create these bindings in rust,

That's not possible and I don't think it's a good approach nonetheless.

My personal intuition about the way forward is that I should try improve the Platform trait to be easier to implement, and/or maybe allow easily reusing smoldot's JavaScript.
But I can't really guarantee in any way that I'll succeed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That totally makes sense! 🙏

Indeed removing the native Platform from here and keeping only the WASM part will simplify the code for us.
The main advantage here would be that most users will be using a fully tested Platform provided by smoldot that we can easily update.
I was considering the native Platform from this PR to lay the foundation for testing and give us a bit more confidence, but I think we could overcome that if we get manage to get a bit more WASM testing.
And since we'll release this under an experimental/unstable feature flag in the beginning, I think we are fine for this PR.

My personal intuition about the way forward is that I should try improve the Platform trait to be easier to implement, and/or maybe allow easily reusing smoldot's JavaScript.

Thanks! That would help us indeed! Just a crazy idea, maybe initially having a small platform extension for native only would also simplify the work from Cumulus side. I think Cumulus just needs a way to change the socket from fn connect

Copy link

@tomaka tomaka Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we get manage to get a bit more WASM testing

IMO you need WASM-specific testing anyway, because browser APIs have many weird corners. I don't think it's a good idea to assume that things "just work" if you just plug APIs together.

For example, browsers famously throw an exception if you try to connect to a non-secure WebSocket from an HTTPS website. I don't know how you can handle that properly with wasm-bindgen but it should be handled.

Thanks! That would help us indeed! Just a crazy idea, maybe initially having a small platform extension for native only would also simplify the work from Cumulus side. I think Cumulus just needs a way to change the socket from fn connect

I'm planning to rework the connect function quite a lot, for example to pass an already-parsed multiaddr, and to let smoldot do the WebSocket implementation on top of TCP (in other words the native implementation would just need to support TCP). But again these are just ideas and there are many more things on my to-do list that have priority.

@tomaka
Copy link

tomaka commented Jun 25, 2023

While it's not my problem, I just want to warn you about wasm-bindgen.
I tried for around six months to use wasm-bindgen, and after completely pulling my hair off out of frustration I have decided to get rid of it and write JavaScript code manually on top.

Smoldot notably is capable of running itself in a worker thread (which is important when you have CPU-heavy operations), throttling itself to respect a CPU rate limit, compressing the .wasm file for faster loading, and runs perfectly well on all platforms and bundlers, which are things that are complicated (to say the least) with wasm-bindgen. Maybe the situation has improved nowadays, but I highly doubt it.

I also want to point out that the code in this PR doesn't support WebRTC. As soon as WebRTC lands in Substrate (which, granted, is more than 6 months late already) and is working reasonably well, we want to deprecate the hell out of WebSockets.

Base automatically changed from lexnv/light_client_support to master June 26, 2023 09:10
@lexnv lexnv requested a review from a team as a code owner June 26, 2023 09:10
lexnv added 10 commits June 26, 2023 12:57
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv force-pushed the lexnv/light_client_support_with_wasm branch from 83af93c to 2e660bb Compare June 26, 2023 11:01
lexnv added 14 commits June 26, 2023 14:05
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…modularity

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Cargo.toml Outdated Show resolved Hide resolved
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv changed the title Add light client platform WASM and native compatible Add light client platform WASM compatible Jun 29, 2023
Copy link
Collaborator

@jsdw jsdw Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to have this as a separate crate!

'static,
Result<PlatformConnection<Self::Stream, Self::Connection>, ConnectError>,
>;
type StreamUpdateFuture<'a> = future::BoxFuture<'a, ()>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh GATs!


fn update_stream<'a>(&self, stream: &'a mut Self::Stream) -> Self::StreamUpdateFuture<'a> {
Box::pin(future::poll_fn(|cx| {
let Some((read_buffer, write_buffer)) = stream.buffers.as_mut() else { return Poll::Pending };
Copy link
Collaborator

@jsdw jsdw Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poll::Pending here would get stuck forever (nothing would ever wake it up again except by chance); does it make more sense to return an error of some kind? Basically, when would None be returned here and what does it mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the Poll::Pending here will never wake up, since we don't store the context anywhere.

The stream's buffers can only be empty if the connection is in the Reset state. The fn connect(&self, multiaddr: &str) -> Self::ConnectFuture method is called first to establish the connection that initializes the buffers here.

Looking a bit at smoldot, my understanding is:

  • A task is spawned to handle incoming connection attempts (here)
  • The task checks in a loop that progress can be made on the connect; and updates some internal state machine
  • The loop is ended when it cannot send a message back to the background task; or when the socket is closed
  • The last step of that loop waits for one of the following to happen
    • stream update (this future) to complete
    • receive a message from the coordinator (some internal detail of smoldot)
    • a timeout to happen

I believe that we drop this future after some period of time when we return Poll::Pending here.

The smoldot's type StreamUpdateFuture<'a>: Future<Output = ()> + Unpin + Send + 'a; forces us to return () and considering the above we might introduce a busy wait loop if we could return Poll::Ready(Err()))

// cc @tomaka would love to hear your thoughts on this 🙏

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to read smoldot's source code, you just have to read the documentation of the function you're implementing: https://docs.rs/smoldot-light/latest/smoldot_light/platform/trait.PlatformRef.html#tymethod.update_stream

It says:

Returns a future that becomes ready when “something” in the state has changed.

Once a connection is in the reset state, it state never changes, and thus returning an infinite future is correct.

Copy link

@tomaka tomaka Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why the connections-related functions in the Platform trait are so different from AsyncRead and AsyncWrite is that AsyncRead and AsyncWrite are full of weird corner cases.

What errors can actually be returned and what do they represent? For example, can you suddely get a PermissionDenied error?
Can the reading side return errors and the writing side still work, or vice versa?
Are some errors recoverable? Can you continue reading/writing again despite an error being returned?
If you call poll_close and it returns Pending, can you start calling poll_write again after?
And so on.

So to solve all of this I created my own abstraction which follows what I think are straight forward rules. There are no errors, and just 5 possibles states (open, read open write closed, read closed write open, both closed, and reset).
It's a bit annoying to plug into existing APIs, but I'm going to provide helpers in the future.

if update_stream_future_ready {
Poll::Ready(())
} else {
Poll::Pending
Copy link
Collaborator

@jsdw jsdw Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever Poll::Pending is returned, the Context (cx) must at some point be used to wake the future up again, so that it continues to make progress. in at least one case, it looks like we return Pending here when nothing would be planning to ever call wake() on the context to wake it up again.

I'd have another look at this whole function and see how it can be structured so that we only ever return Pending if something like .poll_close, .poll_flush, .poll_read etc returns Pending (because these things each take in the cx and are all expected to wake the future when progress can possibly be made again). You can never return Pending "on your own" without it coming from some other thing, unless you also wake the future up appropriately etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, after looking a bit at the code I believe we might have some room for improvement here.

As you mentioned, when the read, write, close or flush cannot make progress, we simply return Poll:Pending without waking this up.

And since in smoldot this call is coupled with:

  • receive a message from the coordinator (some internal detail of smoldot)
  • wait for a timeout to happen

We could potentially wake this future up again before receiving a msg from the coordinator, or waiting for the timeout. I'm not sure how complicated the code would look / if its worth doing.

I would keep it in sync with the platform's behavior for now (code from smoldot).

Have added some comments around it

// cc for reference @tomaka

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thankyou for adding the comments; as a reviewer it's scary to see the Pending's without context wakers being used/stored/whatever, but I think I understand how it's working a bit better now!

cx: &mut Context<'_>,
buf: &mut [u8],
) -> Poll<Result<usize, io::Error>> {
let mut inner = self.inner.lock().expect("Mutex is poised; qed");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another Pending/Waker thing:

Because this WasmSocket is behind an Arc<Mutex<..>>, a question I have is; what happens if we make, say, 3 copies of this, and then for each copy, in some different task, we .read().await?

Each copy would call poll_read, and then the waker given to each poll_read would be saved into inner to be woken up. But the wakers would overwrite eachother; 2 of the wakers would therefore never be used, and 2 of the three tasks would never make progress.

I had to consider this in some personal thing where I implemented a small MPSC style channel (https://github.com/jsdw/asyncified/blob/main/src/channel.rs), and there I had each clone of the sender allocate itself a new ID, and then I stored all wakers in a HashMap<ID, waker>) so that I would remove them at needed on cleanup, and wake them all up on progress.

That all said, pub struct WasmSocket is not itself clonable, so, does it need the Arc inside it at all? (or the mutex for that matter)? Because if you could clone the socket and read from it, different clones would be getting different parts of the output anyway which probably isn't what we want.

So maybe we can just remove the Arc<Mutex>? Or at least the Arc? And then the type system enforces that it can only be used in one place, with one waker, at a time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smoldot requires this object to be Send, which may be a side-effect of the libp2p to have the Transport layer as send.

By default, the websys WASM socket is not Send. To accommodate the interface I've used the SendWrapper in a few places which should give us a bit of a stronger guarantees.

I've tried removing the Arc but somehow we'd still need to wake up the waker variable from inside the closures (similar to gloo-net) and have this Send.

There's also a plan to add support to libp2p for a rust based code for WASM that uses something similar to an Arc<Mutex<shared>> (from transports/websocket-websys/Cargo.toml).

I've changed the Arc<Mutex with an Rc<RefCell + the SendWrapper. While at it, I have also moved the socket and callbacks out of the RefCell (prev Mutex).
I'm not quite sure how we could improve it further, would love to hear your thoughts 🙏

Copy link
Collaborator

@jsdw jsdw Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oohhh sorry, so the reason for the Arc<Mutex<..>> was because this thing needed sharing across some different closures below; I missed that bit! And when compiled to WASM the mutex will disappear anyway since it's single threaded.

I think given that the outer struct is not cloneable, I'd remove the SendWrapper thing (and crate) and just put the Arc<Mutex<..>> back (it feels cleaner than this SendWrapper<Rc<RefCell<..>>> thing). Just maybe a dev comment on WasmSocket that the thing cannot impl Clone would do I think :)

Copy link
Collaborator

@jsdw jsdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looking good; just a few questions and thoughts mainly around Poll::Pending uses which could be problematic :)

lexnv added 11 commits July 17, 2023 14:50
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…support_with_wasm

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Copy link
Collaborator

@jsdw jsdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work @lexnv; thanks for looking into those Pending things and checking that they are ok (ie smoldot is dropping the future or whatever anyway in those cases).

I'm happy to see this merge now and we can iterate on it from there if we run into things :)

@lexnv lexnv merged commit 4bda673 into master Jul 18, 2023
8 checks passed
@lexnv lexnv deleted the lexnv/light_client_support_with_wasm branch July 18, 2023 11:27
@jsdw jsdw mentioned this pull request Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[light-client] Wasm compatible light-client
5 participants