Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

Add WebAssembly support and the ability to Dial from browsers #51

Closed
wants to merge 24 commits into from

Conversation

albrow
Copy link
Contributor

@albrow albrow commented Aug 8, 2019

This PR is still a work in progress as there are some details to work out (especially around error handling). I should probably also do one more round of cleanup. However, the basics of the transport are working. This PR also includes an integration test which covers basic communication over a stream between a server (running the vanilla Go transport) and a headless Chrome browser (running the WebAssembly/JavaScript version).

After doing some more investigation, I think it makes more sense to introduce direct WebAssembly support here rather than adding it to the gorilla/websocket dependency. The main reason is that the gorilla/websocket API has a relatively large surface area, their own set of configuration options, and their own way of establishing connections that don't translate well to Wasm/JavaScript. The subset of the gorilla/websocket API that is used in this package is much easier to translate.

The Listen method is not supported Wasm/JavaScript environments and calling it will always return an error. Browsers cannot listen for incoming WebSocket connections; they can only dial.

I'm open to suggestions about how to organize the files, but right now I'm using the file name suffixes _go and _js to denote files that only work in vanilla Go or JavaScript/Wasm environments, respectively. Files without any suffix work in both environments.

So for example, conn.go has been split into three files:

  • conn.go contains code that is shared between vanilla Go implementation and the JavaScript bindings.
  • conn_go.go contains the vanilla Go implementation.
  • conn_js.go contains the JavaScript bindings.

While the implementations defer, the exported API in _go and _js files is the same and a lot of code is shared.

@albrow albrow mentioned this pull request Aug 8, 2019
@albrow
Copy link
Contributor Author

albrow commented Aug 8, 2019

CI is failing because it uses Go 1.11. My PR is targeting the latest go version as of now which is 1.12. Can we update Travis to use the latest version? If you want to run the browser integration tests in Travis, we might also need to install Chrome.

conn_js.go Outdated
closeSignal chan struct{}
}

func (c *Conn) Read(b []byte) (int, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate it if someone could take a closer look at this implementation of Read. I'm not super well-versed in the semantics/behavior libp2p expects.

conn_js.go Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member

Is there any reason to merge this here instead of to simply create an entirely new transport? This is basically just a different implementation of the websocket transport and there isn't any shared code.

We'd then conditionally import the correct transport from go-libp2p.

@albrow
Copy link
Contributor Author

albrow commented Aug 8, 2019

@Stebalien there is a decent amount of shared code. Every file without a build tag at the top is shared. This includes:

  • addrs.go
  • addrs_test.go
  • websocket.go
  • conn.go

In terms of functionality, the shared code:

  • Specifies the multiaddress format for the transport and converts to/from a WebSocket net.Addr.
  • Specifies the protocol and codec (WsProtocol and WsCodec).
  • Uses a tptu.Upgrader to upgrade a multiaddr-net connection into a full libp2p-transport connection.
  • Implements most of the methods in the Transport interface: CanDial, Protocols, Proxy, and Dial (partially at least; the environment-specific maDial method does most of the heavy-lifting).

There is a benefit to keeping this shared code in one repository so that if we ever need to change it in the future, we only need to change it in one place.

There is another big advantage to putting the WebAssembly-related code here: it allows us to write integration tests that ensure compatibility. The goal here is to make the API and behavior the same regardless of whether you are compiling to WebAssembly or a binary executable, and integration tests are an important part of making sure we adhere to that goal. It's possible to have integration tests if the code lives in two different repositories, but again that would make it more difficult to manage if we ever made changes in the future.

Finally, keeping this code in the same place makes it easier for developers who want to enable/disable specific transports instead of just using the libp2p defaults. In that case, they can just import github.com/libp2p/go-ws-transport and it will work in both vanilla Go and JavaScript/Wasm environments. This the same approach that we followed in go-libp2p-webrtc-direct and github.com/pion/webrtc and it works really well. If the code lived in two different repositories, developers who want to build isomorphic apps would need to create two different files with different build tags and import two different packages.

@Stebalien
Copy link
Member

My general concern is that it means anyone wanting to modify the normal go transport needs to make sure they don't accidentally break the wasm transport. However, you're right. It's a lot simpler to explain "here's the websocket transport" instead of "here's the websocket transport for platform A, here's the websocket transport for platform B".

Note, in practice:

  1. The same config won't quite work as most users will want to enable TCP (which can't work in WASM).
  2. We can select the correct defaults for the platform in go-libp2p.

However, this is still probably the best way to do it.

@Stebalien
Copy link
Member

I'm open to suggestions about how to organize the files, but right now I'm using the file name suffixes _go and _js to denote files that only work in vanilla Go or JavaScript/Wasm environments, respectively.

Maybe _browser versus _system (or _native)? This doesn't technically involve any javascript.

@Stebalien
Copy link
Member

The Listen method not supported Wasm/JavaScript environments and calling it will always return an error. Browsers cannot listen for incoming WebSocket connections; they can only dial.

That should be fine. The user will just get an error on construction if they ask the browser to listen on a websocket address.

@albrow
Copy link
Contributor Author

albrow commented Aug 9, 2019

My general concern is that it means anyone wanting to modify the normal go transport needs to make sure they don't accidentally break the wasm transport.

I understand that concern and I'm happy to help field any issues that might come up 😄This package is critical for my project at work so I can commit some time to help maintain it. I'd also be happy to add some documentation here to help get contributors get oriented since most people will not have prior experience with Wasm.

The same config won't quite work as most users will want to enable TCP (which can't work in WASM).

Yeah that's a good point. There will be some difference in configuration but it's still easier if you don't have to import different packages for different platforms. And you're right it's probably still a good idea to have go-libp2p choose the appropriate defaults.

@albrow
Copy link
Contributor Author

albrow commented Aug 9, 2019

I should've shared this before. For anyone unfamiliar with how WebAssembly works in Go, these are good resources to get started:

conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
@albrow
Copy link
Contributor Author

albrow commented Aug 14, 2019

@Stebalien I addressed all your feedback. I still need to add more robust error handling. Feel free to take another look now or I'll ping you again after error handling has been improved.

conn_browser.go Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
conn_js.go Outdated Show resolved Hide resolved
This allows us to avoid the rather obtuse FileReader API and makes it
easier to read incoming data syncronously.
@albrow
Copy link
Contributor Author

albrow commented Aug 20, 2019

@Stebalien I took a stab at adding proper error handling. It was definitely tricky to work with the event-driven JavaScript API. It also has some other weird quirks like the fact that the error event doesn't return any useful information about what the error actually was (you have to use the close event for that).

The basic idea is:

  1. If the error or close event triggers while we are trying to open the connection, return an error in Dial.
  2. If the error or close event triggers after the connection is already open, first allow Conn.Read to read any remaining data and then return the error on the next call to Conn.Read.

@albrow albrow changed the title WIP: Add WebAssembly support and the ability to Dial from browsers Add WebAssembly support and the ability to Dial from browsers Aug 20, 2019
@albrow
Copy link
Contributor Author

albrow commented Aug 20, 2019

I removed the WIP prefix from the title. @Stebalien I believe I've addressed all your feedback and this PR is ready for final review. Ideally we can also get the tests working in TravisCI.

@albrow
Copy link
Contributor Author

albrow commented Aug 28, 2019

@Stebalien just a friendly reminder that I believe I've addressed all your feedback and am waiting for you to take another look. Let me know if I missed anything or if you have any new thoughts.

@raulk raulk requested a review from Stebalien August 29, 2019 10:51
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. We should be getting close.

(and sorry for the delay).

browser_integration_browser_test.go Show resolved Hide resolved
websocket_browser.go Show resolved Hide resolved
conn_browser.go Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
c.currDataMut.RUnlock()
if n == 0 {
if c.firstErr != nil {
return 0, c.firstErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's firstErr? If there's a connection error (not just a polite close), we should abort immediately without bothering to read the rest of the buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien Apparently, due to security concerns most browsers always use code 1006: connection unexpectedly closed whenever a WebSocket connection is closed. I don't think there is any way for us to differentiate between a polite close and an unexpected error in the underlying WebSocket connection.

Given that constraint, what do you want to do here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing anything to suggest that. From the looks of it, 1006 means "something went wrong". We just need to distinguish between that case and "the connection was closed correctly".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see https://stackoverflow.com/a/53340067. That's a bug in chrome and there's nothing we can do about it. Where are you seeing the talk about this being a security feature?

Copy link
Contributor Author

@albrow albrow Sep 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the link I shared above:

However, Chrome will rarely report any close code 1006 reasons to the javascript side. This is likely due to client security rules in the WebSocket spec to prevent abusing websocket. (such as using it to scan for open ports on a destination server, or for generating lots of connections for a denial-of-service attack).

Empirically, 1006 is the only error code I've observed from Chrome and I don't see a way to get other information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. After reading this again, the quote doesn't exactly back up what I said. It just says that if there is a code 1006 error there is no way to get additional information. Let me do some more research. Maybe the reason I've been seeing 1006 has to do with how the connection is closed.

Copy link
Contributor Author

@albrow albrow Sep 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing some more testing, I have observed Chrome reporting status code 1005 (also with an empty reason field). However, this doesn't really help us much. I can't find any way to produce code 1000 which is what we would expect for a polite close. This might be because of the bug in Chrome that you mentioned or because gorilla/websocket isn't sending the appropriate close frame?

There is that comment you shared from StackOverflow that claims Chrome never reports code 1000 but I did find these tests in Chromium which suggest there are at least some cases where it does report code 1000.

I'm not sure how to proceed and I have two open questions:

  1. Can we rely on close codes to differentiate polite closes from unexpected errors?
  2. How can we rectify the asynchronous JavaScript API (which uses the onclose event to communicate most errors) with the largely synchronous interface expected by libp2p (e.g. errors should be returned from Read and Write when they occur)? The way I do this now is by saving the first error that is reported via onclose in the firstErr field and then returning it at the next opportunity. It sounds like maybe you aren't satisfied with this approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. Turns out, we aren't sending a nice close message: #54

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like maybe you aren't satisfied with this approach?

I'm fine with an async approach. The network itself is async so there's nothing we can do about that.

My concern is:

  1. When we receive an error, we should start failing reads/writes ASAP. This doesn't need to be synchronous, I just don't want to bother finishing reading.
  2. When we receive an EOF (nice close), we should continue reading to the end.

Note: We're mostly getting into pedantic correctness edge-cases here.

conn_browser.go Outdated Show resolved Hide resolved
conn_browser.go Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
conn_browser.go Show resolved Hide resolved
conn_browser.go Outdated Show resolved Hide resolved
albrow referenced this pull request in nhooyr/websocket Sep 23, 2019
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, actually almost there now.

c.currDataMut.RUnlock()
if n == 0 {
if c.firstErr != nil {
return 0, c.firstErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. Turns out, we aren't sending a nice close message: #54

c.currDataMut.RUnlock()
if n == 0 {
if c.firstErr != nil {
return 0, c.firstErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like maybe you aren't satisfied with this approach?

I'm fine with an async approach. The network itself is async so there's nothing we can do about that.

My concern is:

  1. When we receive an error, we should start failing reads/writes ASAP. This doesn't need to be synchronous, I just don't want to bother finishing reading.
  2. When we receive an EOF (nice close), we should continue reading to the end.

Note: We're mostly getting into pedantic correctness edge-cases here.

c.signalClose()
c.mut.Lock()
// Store the error in c.firstErr. It will be returned by Read later on.
c.firstErr = errorEventToError(args[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we're taking the lock when setting this but not when reading it. We should either:

  1. Take the lock when reading it.
  2. Guard it with the channel:
  3. Set it within signalClose(err).
  4. Set it before closing.
  5. Only read it after observing that the close-signaling channel is actually closed.

browser_integration_browser_test.go Show resolved Hide resolved

// checkOpen returns an error if the connection is not open. Otherwise, it
// returns nil.
func (c *Conn) checkOpen() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just wait for the "close" event to happen then check the closeSignal channel after that? That should save us a round-trip through the webasm runtime.

select {
  case <-c.closeSignal:
    return true
  default:
    return false
}

}

// arrayBufferToBytes converts a JavaScript ArrayBuffer to a slice of bytes.
func arrayBufferToBytes(buffer js.Value) []byte {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... go 1.13 is out. Want to go ahead and use CopyBytesToGo and CopyBytesToJs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference would be to release a version that works with Go 1.12 and then include this optimization. There are some speed bumps with upgrading to Go 1.13 so some projects (0x Mesh included) might not have updated yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. Note: libp2p is likely to update to 1.13 before too long as we need it for ed25519+TLS.

@albrow
Copy link
Contributor Author

albrow commented Sep 30, 2019

@Stebalien Thanks for your latest feedback and for investigating that graceful close issue. I'm sorry for the delay on my end. At 0x we've been gearing up for our next big protocol upgrade (version 3) so I've had to temporarily de-prioritize WebAssembly/browser integration. I'll get back to this PR as soon as I can.

@Stebalien
Copy link
Member

Np. And you never need to apologize for a delay. If it comes to it, I can address my own feedback and merge.

Thanks for the heads up and the awesome work!

@Stebalien
Copy link
Member

@albrow I rebased this and fixed the last few nits (and got the tests to pass): #55

(we were pretty close and I didn't want it to bitrot)

@Stebalien
Copy link
Member

@albrow I've merged #55. Please let me know if I broke something.

@Stebalien Stebalien closed this Oct 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants