proposal: cmd/compile: add go:wasmexport directive #65199

Open
johanbrandhorst opened this issue Jan 22, 2024 · 66 comments
Labels: arch-wasm (WebAssembly issues), Proposal


@johanbrandhorst
Member

johanbrandhorst commented Jan 22, 2024

Background

#38248 defined a new compiler directive, go:wasmimport, for interfacing with host-defined functions. This allowed calling from Go code into host functions, but it’s still not possible to call from the WebAssembly (Wasm) host into Go code.

Some applications have adopted the practice of allowing themselves to be extended by calling into Wasm-compiled code according to some well-defined ABI. Examples include Envoy, Istio, VS Code and others. Go cannot be used to write extensions for these applications, as the only function exported from a module compiled by Go is _start, which maps to the main function in a main package.

Despite this, some users are designing custom plugin systems using this interface, utilizing standard in and standard out for communicating with the Wasm binary. This shows a desire for exporting Go functions in the community.

There have been historical discussions on implementing this before (including #42372, #25612 and #41715), but none of them have reached a consensus on a design and implementation. In particular, #42372 had a long discussion (and design doc) that never provided a satisfying answer for how to run exported functions in the Go runtime. Instead of reviving that discussion, this proposal will attempt to build on it and answer the questions posed. This proposal supersedes #42372.

Exporting functions to the Wasm host is also a necessity for a hypothetical GOOS=wasip2 targeting preview 2 of the WASI specification. This could be implemented as a special case in the compiler, but since this is a feature requested by users, it could reuse that functionality (similar to go:wasmimport today).

Proposal

Repurpose the -buildmode build flag value c-shared for the wasip1 port. It now signals to the compiler to replace the _start function with an _initialize function, which performs runtime and package initialization.

Add a new compiler directive, go:wasmexport, which is used to signal to the compiler that a function should be exported using a Wasm export in the resulting Wasm binary. Using the compiler directive will result in a compilation failure unless the target GOOS is wasip1.

There is a single optional parameter to the directive, defining the name of the exported function:

//go:wasmexport [name]

The directive is only allowed on functions, not methods.

Discussion

Parallel with -buildmode=c-shared and CGO

The proposed implementation is inspired by the implementation of C references to Go functions. When an exported function is called, a new goroutine (G) is created, which executes on a single thread (M), since Wasm is a single-threaded architecture. The runtime will wake up and resume scheduling goroutines as necessary, with the exported function being one of the goroutines available for scheduling. Any other goroutines started during package initialization or left over from previous exported function executions will also be available for scheduling.

Why a "-buildmode" option?

The wasi_snapshot_preview1 documentation states that a _start function and an _initialize function are mutually exclusive. Additionally, at the end of the current _start functions as compiled by Go, proc_exit is called. At this point, the module is considered done, and cannot be interacted with. Given these conditions, we need some way for a user to declare that they want to build a binary specifically for exporting one or more functions, with an _initialize function for package and runtime initialization.

We also considered using a GOWASM option instead, but this feels wrong since that environment variable is used to specify options relating to the architecture (existing options are satconv and signext), while this export option is dependent on the behavior of the "OS" (what functions to export, what initialization pattern to expect).

What happens to func main when exports are involved?

Go code compiled to a wasip1 Wasm binary can be either a "Command", which includes the _start function, or a "Reactor/Library", which includes the _initialize function.

When using -buildmode=c-shared, the resulting Wasm binary will not contain a _start function, and will only contain the _initialize function and any exported functions. The Go main function will not be exported to the host. The user can choose to export it like any other function using the //go:wasmexport directive. The _initialize function will not automatically call main. The main function will not initialize the runtime.

When the -buildmode flag is unset, the _start function and any exported functions will be exported to the host. Using //go:wasmexport on the main function in this mode will result in a compilation error. In this mode, only _start will initialize the runtime, so it must be the first export called by the host. Other exported functions may only be called reentrantly, that is, by host functions that Go calls into during the execution of _start. Once _start has returned, no other exports may be called on the same instance.

Why not reuse //export?

//export is used to export Go functions to C when using -buildmode=c-shared. Using //export puts restrictions on the file: its C preamble may contain only declarations, not definitions. It’s also something of an ugly duckling among compiler directives in that it doesn’t use the now established go: prefix. A new directive removes the need for users to define functions separately from their declarations, has a nice symmetry with go:wasmimport, and uses the well-established go: prefix.

Handling Reentrant Calls and Panics

Reentrant calls happen when the Go application calls a host import, and that invocation calls back into an exported function. Reentrant calls are handled by creating a new goroutine. If a panic reaches the top-level of the go:wasmexport call, the program crashes because there are no mechanisms allowing the guest application to propagate the panic to the Wasm host.
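Since a panic that escapes an exported function crashes the program, an exported function that wants to survive a failure has to recover inside its own body. The sketch below shows one possible user-level pattern, not something this proposal defines: a hypothetical `guard` helper turns a panic into an `int32` status code, which stays within the supported types. The names `guard`, `process`, `handle` and `lastErr` are illustrative, and the `//go:wasmexport` directive is left off so the sketch also builds on non-Wasm targets. Fatal runtime errors, such as a deadlock, cannot be recovered this way.

```go
package main

import "fmt"

// lastErr holds the message of the most recently recovered panic. A host
// could fetch it through another export; that export is hypothetical.
var lastErr string

// guard runs f and converts a panic into status 1, so the panic never
// reaches the top of the exported call. It returns 0 on success.
func guard(f func()) (status int32) {
	defer func() {
		if r := recover(); r != nil {
			lastErr = fmt.Sprint(r)
			status = 1
		}
	}()
	f()
	return 0
}

// total is some state the exported function updates.
var total int32

// process is ordinary Go code that may panic.
func process(n int32) {
	if n < 0 {
		panic(fmt.Sprintf("negative input %d", n))
	}
	total += n
}

// handle stands in for a function a wasip1 build would mark with
// //go:wasmexport; it reports failure through its return value.
func handle(n int32) int32 {
	return guard(func() { process(n) })
}

func main() {
	status := handle(5)
	fmt.Println(status, total) // 0 5
	status = handle(-1)
	fmt.Println(status, lastErr) // 1 negative input -1
}
```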

Naming exports

When the name of the Go function matches that of the desired Wasm export, the name parameter can be omitted.

For example:

//go:wasmexport add
func add(x, y int32) int32 {
    return x + y
}

Is equivalent to

//go:wasmexport
func add(x, y int32) int32 {
    return x + y
}

The names _start and _initialize are reserved and not available for user exported functions.

Third-party libraries

Third-party libraries will need to be able to define exports, as WASI functionality such as wasi-http requires calling into exported functions, which would be provided by the third party library in a user-friendly wrapper. Any exports defined in third party libraries are compiled to exported Wasm functions.

Module names

The current Wasm architecture doesn’t define a module name of the compiled module, and this proposal does not suggest adding one. Module names are useful to namespace different compiled Wasm binaries, but they can usually be configured by the runtime or with post-processing tools on the binaries. Future proposals may suggest some way to build this into the Go build system, but this proposal suggests not naming it for simplicity.

Conflicting exports

If the compiler detects multiple exports using the same name, a compile error will occur and warn the user that multiple definitions are in conflict. This may have to happen at link time. If this happens in third-party libraries the user has no recourse but to avoid using one of the libraries.
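A link-time duplicate check could be as simple as grouping exports by name. The `export` record, the `conflicts` function and the example package paths below are hypothetical; a real check would live in the compiler or linker and work from its own symbol data.

```go
package main

import (
	"fmt"
	"sort"
)

// export records where a //go:wasmexport name was declared.
type export struct {
	Name string // Wasm export name
	Pkg  string // import path of the declaring package
	Func string // Go function name
}

// conflicts groups exports by name and returns one message for every name
// declared more than once, sorted for deterministic output.
func conflicts(exports []export) []string {
	byName := make(map[string][]export)
	for _, e := range exports {
		byName[e.Name] = append(byName[e.Name], e)
	}
	var errs []string
	for name, decls := range byName {
		if len(decls) < 2 {
			continue
		}
		msg := fmt.Sprintf("duplicate wasm export %q:", name)
		for _, d := range decls {
			msg += fmt.Sprintf(" %s.%s", d.Pkg, d.Func)
		}
		errs = append(errs, msg)
	}
	sort.Strings(errs)
	return errs
}

func main() {
	errs := conflicts([]export{
		{"handle", "example.com/plugin", "Handle"},
		{"handle", "example.com/other", "Serve"},
		{"add", "main", "add"},
	})
	for _, e := range errs {
		fmt.Println(e)
	}
}
```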

Supported Types

The go:wasmimport directive allows the declaration of host imports by naming the module and function that the application depends on. The directive restricts the types that can be used in the function signatures to fixed-size integers and floats and unsafe.Pointer, which allows simple mapping rules between Go and Wasm types. The go:wasmexport directive will use the same type restrictions. Any future relaxing of this restriction will be subject to a separate proposal.
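The mapping rules can be illustrated with reflection. This sketch assumes the wasm32 convention that unsafe.Pointer is a 32-bit address, and it switches on a type's kind, so a defined type such as `type Handle uint32` would also be accepted; whether the compiler's check treats defined types the same way is an assumption. `wasmType`, `signature` and the sample functions are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
	"unsafe"
)

// wasmType maps a Go parameter or result type to a Wasm value type.
func wasmType(t reflect.Type) (string, error) {
	switch t.Kind() {
	case reflect.Int32, reflect.Uint32, reflect.UnsafePointer:
		return "i32", nil
	case reflect.Int64, reflect.Uint64:
		return "i64", nil
	case reflect.Float32:
		return "f32", nil
	case reflect.Float64:
		return "f64", nil
	}
	return "", fmt.Errorf("type %s is not allowed in a //go:wasmexport signature", t)
}

// signature renders the Wasm function type of fn, or reports the first
// parameter or result whose type falls outside the allowed set.
func signature(fn any) (string, error) {
	t := reflect.TypeOf(fn)
	if t == nil || t.Kind() != reflect.Func {
		return "", fmt.Errorf("%T is not a function", fn)
	}
	convert := func(n int, at func(int) reflect.Type) ([]string, error) {
		out := make([]string, 0, n)
		for i := 0; i < n; i++ {
			w, err := wasmType(at(i))
			if err != nil {
				return nil, err
			}
			out = append(out, w)
		}
		return out, nil
	}
	params, err := convert(t.NumIn(), t.In)
	if err != nil {
		return "", err
	}
	results, err := convert(t.NumOut(), t.Out)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("(%s) -> (%s)", strings.Join(params, ", "), strings.Join(results, ", ")), nil
}

// Hypothetical candidate exports.
func add(x, y int32) int32 { return x + y }

func store(p unsafe.Pointer, n uint32) {}

func length(s string) int { return len(s) }

func main() {
	for _, fn := range []any{add, store, length} {
		sig, err := signature(fn)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println(sig)
	}
}
```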

Spawning Goroutines from go:wasmexport functions

The proposal considers scenarios where the go:wasmexport call spawns new goroutines. In the absence of threading or stack switching capability in Wasm, the simplest option is to document that all goroutines still running when the invocation of the go:wasmexport function returns will be paused until the control flow re-enters the Go application.

In the future, we anticipate that Wasm will gain the ability to either spawn threads or integrate with the event loop of the host runtime (e.g., via stack-switching) to drive background goroutines to completion after the invocation of a go:wasmexport function has returned.

Blocking in go:wasmexport functions

When the goroutine running the exported function blocks for any reason, the function will yield to the Go runtime. The Go runtime will schedule other goroutines as necessary. If there are no other goroutines, the application will crash with a deadlock, as there is no way to proceed, and Wasm code cannot block.

Authors

@johanbrandhorst, @achille-roussel, @Pryz, @dgryski, @evanphx, @neelance, @mdlayher

Acknowledgements

Thanks to all participants in the go:wasmexport discussion at the Go contributor summit at GopherCon 2023, without whom this proposal would not have been possible.

CC @golang/wasm @cherrymui

@ydnar

ydnar commented Jan 22, 2024

Thanks for putting this together—this is exciting.

Generating a module that can act as a reactor and a command sounds like a great idea. I noticed this might conflict with how Node interprets a module. If a module exports both _start and _initialize, it will throw an exception: https://nodejs.org/api/wasi.html

  1. One could argue this is undesired behavior, and Node could change.
  2. What happens if a host detects and calls both _initialize and _start?
  3. Exporting one or the other, but not both, implies some kind of configuration or detection.

@ydnar

ydnar commented Jan 22, 2024

The directive is only allowed on functions, not methods.

Using //go:wasmimport on methods has been helpful for mapping Component Model resource methods in WASI Preview 2:

From https://github.com/ydnar/wasm-tools-go/blob/9b4707e054a8b528b27240cba6c05557c4e26a53/wasi/io/error/error.wit.go:

// ToDebugString represents the method "wasi:io/error.error#to-debug-string".
//
// Returns a string that is suitable to assist humans in debugging
// this error.
//
// WARNING: The returned string should not be consumed mechanically!
// It may change across platforms, hosts, or other implementation
// details. Parsing this string is a major platform-compatibility
// hazard.
func (self Error) ToDebugString() string {
	var ret string
	self.to_debug_string(&ret)
	return ret
}

//go:wasmimport wasi:io/error@0.2.0-rc-2023-11-10 [method]error.to-debug-string
func (self Error) to_debug_string(ret *string)

Subjectively, using methods seems better aligned with the Component Model semantics than the equivalent:

//go:wasmimport wasi:io/error@0.2.0-rc-2023-11-10 [method]error.to-debug-string
func error__to_debug_string(self Error, ret *string)

Given that resources are opaque i32 handles, the same could be true for implementing exported methods via //go:wasmexport.

@johanbrandhorst
Member Author

Thanks for putting this together—this is exciting.

Generating a module that can act as a reactor and a command sounds like a great idea. I noticed this might conflict with how Node interprets a module. If a module exports both _start and _initialize, it will throw an exception: https://nodejs.org/api/wasi.html

1. One could argue this is undesired behavior, and Node could change.

2. What happens if a host detects and calls both _initialize and _start?

3. Exporting one or the other, but not both, implies some kind of configuration or detection.
   
   * In this TinyGo PR I experimented with detecting lack of main.main as the trigger for "reactor" mode with _initialize as the entry point: [runtime, builder: WebAssembly reactor mode tinygo-org/tinygo#4082](https://github.com/tinygo-org/tinygo/pull/4082)

Thank you for the information about Node's behavior here, I wasn't aware. That is certainly troubling. I will try to see what if any other precedent there is for this behavior in the ecosystem to see whether we or Node are in the wrong.

If a host calls both _initialize and _start, it will run initialization once (initialization has to be protected with something like a sync.Once to be idempotent) and then run func main(). Just calling _start will accomplish the same thing.
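A minimal model of that idempotent initialization, assuming a sync.Once guard. `initialize`, `start` and `initCount` are hypothetical stand-ins for the real _initialize and _start exports and for runtime and package initialization.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	initOnce  sync.Once
	initCount int // counts how many times initialization actually ran
)

// initialize models _initialize: the first call runs initialization,
// later calls do nothing.
func initialize() {
	initOnce.Do(func() {
		initCount++ // stands in for runtime and package initialization
	})
}

// start models _start: it initializes (at most once) and then runs main.
func start(mainFn func()) {
	initialize()
	mainFn()
}

func main() {
	initialize() // host calls _initialize
	start(func() { // host then calls _start
		fmt.Println("main running")
	})
	fmt.Println("init ran", initCount, "time(s)") // init ran 1 time(s)
}
```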

Indeed, if we do need some way to allow users to choose whether to build a command (executing func main()) or library (just initializing and exporting functions), this proposal would need to add some way for users to turn that knob. I don't want to prejudice that discussion until we know if we need it.

@johanbrandhorst
Member Author

Given that resources are opaque i32 handles, the same could be true for implementing exported methods via //go:wasmexport.

This may be true, but I still think this proposal serves as an MVP that we can enhance with method support in a subsequent proposal once the initial hurdles have been overcome.

@ydnar

ydnar commented Jan 22, 2024

Thank you for the information about Node's behavior here, I wasn't aware. That is certainly troubling. I will try to see what if any other precedent there is for this behavior in the ecosystem to see whether we or Node are in the wrong.

If a host calls both _initialize and _start, it will run initialization once (initialization has to be protected with something like a sync.Once to be idempotent) and then run func main(). Just calling _start will accomplish the same thing.

Maybe it’s a bigger question about what is defined behavior. Is having both _initialize and _start valid, or undefined? Having only one entry point is less ambiguous: the host can only call that one, rather than calling both or having to choose between them (which could be contrary to the user’s expectation).

@ydnar

ydnar commented Jan 22, 2024

Indeed, if we do need some way to allow users to choose whether to build a command (executing func main()) or library (just initializing and exporting functions), this proposal would need to add some way for users to turn that knob. I don't want to prejudice that discussion until we know if we need it.

We have had previous discussions about a -buildmode=wasm-reactor to mirror -buildmode=c-shared.

@johanbrandhorst
Member Author

I created an issue to ask the NodeJS devs for the source of this design decision: nodejs/node#51544

@cjihrig

cjihrig commented Jan 22, 2024

Hey. Node developer that implemented that design decision here. 👋

That change was nearly four years ago, and I have since forgotten the exact motivation. However, I was able to dig this up: WebAssembly/WASI@d8b286c. At that point in time, WASI commands had a _start() function, and WASI reactors had an _initialize() function. Commands and reactors were mutually exclusive.

WASI has changed a good bit since then. I no longer work on WASI, so I don't know if that design decision is still valid or not. I would recommend checking with the folks in the WASI repos.

@zetaab

zetaab commented Jan 22, 2024

WebAssembly/wasi-http#95 contains a discussion about using the _initialize function. If that is not possible in Go, it would be difficult.

@johanbrandhorst
Member Author

Any user created func init() would be run in _initialize, is this not sufficient?

@johanbrandhorst
Member Author

Hey. Node developer that implemented that design decision here. 👋

That change was nearly four years ago, and I have since forgotten the exact motivation. However, I was able to dig this up: WebAssembly/WASI@d8b286c. At that point in time, WASI commands had a _start() function, and WASI reactors had an _initialize() function. Commands and reactors were mutually exclusive.

WASI has changed a good bit since then. I no longer work on WASI, so I don't know if that design decision is still valid or not. I would recommend checking with the folks in the WASI repos.

Thanks so much for providing your input and this reference. It seems this doc now lives at https://github.com/WebAssembly/WASI/blob/a7be582112b35e281058f1df7d8628bb30a69c3f/legacy/application-abi.md. I wonder, given that this is now under the legacy heading, whether this statement is still true:

These kinds are mutually exclusive; implementations should report an error if asked to instantiate a module containing exports which declare it to be of multiple kinds.

If so, this design would need to change to allow the user to choose whether to compile a Command or a Library (Reactor). @sunfishcode perhaps you could provide some guidance here?
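For reference, the rule quoted above amounts to a host-side check along these lines. This is a hypothetical sketch over a list of export names; treating a module with neither export as a reactor is an assumption, not something the quoted text states.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind is a WASI Preview 1 application kind.
type Kind string

const (
	Command Kind = "command"
	Reactor Kind = "reactor"
)

// classify inspects a module's export names the way a strict host might:
// _start marks a command, _initialize a reactor, and both is an error.
func classify(exports []string) (Kind, error) {
	var hasStart, hasInit bool
	for _, name := range exports {
		switch name {
		case "_start":
			hasStart = true
		case "_initialize":
			hasInit = true
		}
	}
	switch {
	case hasStart && hasInit:
		return "", errors.New("module exports both _start and _initialize")
	case hasStart:
		return Command, nil
	default:
		return Reactor, nil // assumption: no _start means reactor
	}
}

func main() {
	for _, exports := range [][]string{
		{"_start", "memory"},
		{"_initialize", "add", "memory"},
		{"_start", "_initialize"},
	} {
		kind, err := classify(exports)
		fmt.Println(exports, kind, err)
	}
}
```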

@sunfishcode

sunfishcode commented Jan 23, 2024

The _start and _initialize functions and legacy/application-abi.md file are all Preview 1 things. Many Preview 1 Wasm engines recognize _start for commands, and some recognize _initialize as an entrypoint for reactors.

Preview 2 is based on the Wasm component model.

  • For the command world, there is an exported run function which is the command entrypoint (analogous to what _start was in Preview 1).
  • There isn't an export for reactor-style wasm programs. The component model itself has a mechanism to call functions on initialization. Unlike core-Wasm's start section, the component-model's start section doesn't need to worry about memory not being exported yet, so it can be used for arbitrary initialization (analogous to what _initialize was in Preview 1).
    • If the tooling you use to go from a core-wasm module to a component supports it, the core-wasm _initialize function may be automatically wired up to the component-model start section.

Edit: I was mistaken about the component-model start function. It's not permitted to call imports, so it's not usable for arbitrary initialization code. There are ongoing discussions about this.

@johanbrandhorst
Member Author

Thanks for the explanation. This proposal targets our existing wasm implementations, js/wasm and wasip1/wasm. We'll have a think about the best way to go about this that doesn't paint us into a corner when it comes to adding support for wasip2 down the line.

@ydnar

ydnar commented Jan 23, 2024

  • If the tooling you use to go from a core-wasm module to a component supports it, the core-wasm _initialize function may be automatically wired up to the component-model start section.

What’s an example of tooling that converts a module to a component that supports the component model start section?

Wasmtime seems to not support the start section? https://github.com/bytecodealliance/wasmtime/blob/e9d580776ee27f4ed59ba334765aacbcc22fa6e4/crates/environ/src/component/translate.rs#L623

@johanbrandhorst
Member Author

johanbrandhorst commented Jan 24, 2024

In light of the discussion around NodeJS's behavior and the documented separation between _initialize and _start in wasip1, we've updated the proposal to include a new -buildmode=wasip1-reactor, used to instruct the compiler to produce a Wasm binary with an _initialize function in place of the _start function. The use of go:wasmexport is limited to this new build mode, which is only available for GOOS=wasip1.

@cherrymui
Member

Thanks for the proposal! Looks good overall.

-buildmode=wasip1-reactor

Is there something similar for js/wasm? Or is the library/export mechanism very different? Also, will the mechanism be similar for later wasip2, or eventual wasi? If so, maybe we can choose a more general name like wasm-library, so we don't need to have a different build mode for each of them? (For a start, it is okay to implement it only on wasip1, just like the c-shared build mode is not implemented on all platforms.)

_initialize

Is _initialize required to be called before any exported functions can be called? Or, the first time it calls into Go _initialize is called if not already? Or the Wasm execution engine always automatically calls _initialize on module load time, so it is guaranteed to be called first?

In the absence of threading or stack switching capability in Wasm, the simplest option is to document that all goroutines still running when the invocation of the go:wasmexport function returns will be paused until the control flow re-enters the Go application.

So it sounds like, at the end of the exported function, the Go runtime will not try to schedule other goroutines but will return directly to Wasm? I assume this might be okay. But js.FuncOf seems to take a different approach. This is also related to the discussion in #42372. Could you explain the reason for choosing this approach?

GODEBUG=wasmgoroutinemon=1

I'm not sure we want this debug mode. As you mentioned, it is probably not uncommon to have background goroutines. If one wants to ensure there is no goroutine at the time of exported function exiting, one probably can check it with runtime.NumGoroutine.

Thanks.

@johanbrandhorst
Member Author

-buildmode=wasip1-reactor

Is there something similar for js/wasm? Or is the library/export mechanism very different? Also, will the mechanism be similar for later wasip2, or eventual wasi? If so, maybe we can choose a more general name like wasm-library, so we don't need to have a different build mode for each of them? (For a start, it is okay to implement it only on wasip1, just like the c-shared build mode is not implemented on all platforms.)

Any Wasm module can declare exports, but we don't anticipate that exporting functions like this is generally useful to users of js/wasm: we have js.FuncOf today to make Go code callable from JS, and making it callable from Wasm doesn't seem nearly as useful for that platform.

For wasip2, as illustrated by Dan's reply above, it's not clear what the export mechanism would look like yet. The name wasip1-reactor is chosen to be deliberately specific to wasip1. The exact functionality in this proposal would be limited to wasip1 forever, and any hypothetical wasip2 proposal would likely have to explain how/if wasmexport will be available for that target initially.

_initialize

Is _initialize required to be called before any exported functions can be called? Or, the first time it calls into Go _initialize is called if not already? Or the Wasm execution engine always automatically calls _initialize on module load time, so it is guaranteed to be called first?

The expectation within the greater wasip1 ecosystem seems to be that if _initialize is exported by a module, it will be called before any exported functions are called. Our implementation wouldn't automatically call _initialize if it hasn't been called; it would likely just crash horribly.

In the absence of threading or stack switching capability in Wasm, the simplest option is to document that all goroutines still running when the invocation of the go:wasmexport function returns will be paused until the control flow re-enters the Go application.

So it sounds like, at the end of the exported function, the Go runtime will not try to schedule other goroutines but will return directly to Wasm? I assume this might be okay. But js.FuncOf seems to take a different approach. This is also related to the discussion in #42372. Could you explain the reason for choosing this approach?

Yes, once the exported function returns, we would not schedule other available goroutines but return to the host. The reason for this is that we believe it's what users would expect to happen, since the runtime and various standard libraries maintain their own goroutines that would make it hard to predict the behavior and runtime of exported functions. If you believe that to be an incorrect assumption we're happy to reconsider this. Note that this also includes goroutines started by the exported function itself.

GODEBUG=wasmgoroutinemon=1

I'm not sure we want this debug mode. As you mentioned, it is probably not uncommon to have background goroutines. If one wants to ensure there is no goroutine at the time of exported function exiting, one probably can check it with runtime.NumGoroutine.

This is a fair point, and we could certainly slim down the proposal by removing this and consider it as a future addition. Thanks!

@cherrymui
Member

Sounds good, thanks.

I guess it might be fine to return to the host when the exported function returns. I guess one question is when the "background" goroutines run. If the exported functions get called and return, but none of them explicitly wait for the background goroutines, the background goroutines will probably never run? Would that be a problem for, say, timers?

@johanbrandhorst
Member Author

The background goroutines could run again if the exported function gets called again. I think ideally users who want concurrent work in exported functions would utilize something like a sync.WaitGroup to ensure work is completed during the execution of the function. A future proposal might be able to tackle this by exposing something like _gosched to run all goroutines until asleep, but this proposal does not account for such a feature. Also, since Threads is stable in Wasm, we may be able to just spawn new threads in the near future, which could execute in parallel to the exported function.
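A sketch of the sync.WaitGroup approach, assuming an exported function that fans work out to goroutines. `sumSquares` is hypothetical, and the `//go:wasmexport` directive is omitted so the code also runs on non-Wasm targets. Because wg.Wait blocks, the exported function's goroutine yields to the scheduler, so the workers can run before control returns to the host.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares stands in for a function a wasip1 build would mark with
// //go:wasmexport. It waits for every goroutine it starts, so no work is
// left paused when control goes back to the host.
func sumSquares(n int32) int64 {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int64
	)
	for i := int32(1); i <= n; i++ {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			mu.Lock()
			total += v * v
			mu.Unlock()
		}(int64(i))
	}
	wg.Wait() // blocks, letting the scheduler run the workers to completion
	return total
}

func main() {
	fmt.Println(sumSquares(4)) // 30
}
```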

@achille-roussel
Contributor

The problem of having goroutines blocked after the export call returned isn't much different from what happens when invoking an import. When a WebAssembly module calls a host import, it yields control to the WebAssembly runtime; no goroutines can execute during that time.

The issue is amplified with exports because the WebAssembly runtime could keep the module paused for extended periods of time, and the expectation is that imports usually return shortly after they were invoked, but it isn't fundamentally different.

Despite the limitations, we can still deliver incremental value to Go developers by allowing them to declare exports.

@inliquid

@johanbrandhorst when it comes to background goroutines, do you know if the proposed solution is different from TinyGo, which supports exported functions?

@ydnar

ydnar commented Jan 30, 2024

@johanbrandhorst when it comes to background goroutines, do you know if the proposed solution is different from TinyGo, which supports exported functions?

We have a separate PR to TinyGo that prototypes the same model, suspending and resuming goroutines on an export call.

@cherrymui
Member

The background goroutines could run again if the exported function gets called again.

If the exported function (or another exported function) gets called again, and that function returns without explicitly synchronizing or rescheduling, the background goroutine may still not run? I think blocking for a little while is not a problem, but it might be a problem if it never gets to run (while the exported functions get called again and again)?

As you mentioned, once we have thread supports, it may not be a problem.

@johanbrandhorst
Member Author

It's true that goroutines may never get to run if there's no point in the exported function to yield to the runtime. I think that's still what I would expect to happen if I wrote my exported function this way. All alternatives would be more confusing I think (waiting before returning or maybe running the scheduler before executing the exported function).

As you say, we can hopefully improve this with threads support in the future.

@cherrymui
Member

Okay. This is probably fine. We can change it later if there is any problem. Thanks.

@johanbrandhorst
Member Author

I've removed the GODEBUG option. We can add that as an enhancement later, and suggest users use runtime.NumGoroutine() for their debugging needs for now.
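A debugging helper in that spirit might compare runtime.NumGoroutine before and after a call. `leftover` is a hypothetical name; on wasip1, any positive count would be goroutines that stay paused until control re-enters Go. The example goroutine is deliberately left blocked to show the leak.

```go
package main

import (
	"fmt"
	"runtime"
)

// leftover calls f and returns how many more goroutines exist afterwards
// than before the call.
func leftover(f func()) int {
	before := runtime.NumGoroutine()
	f()
	return runtime.NumGoroutine() - before
}

func main() {
	block := make(chan struct{}) // never closed: the goroutine stays blocked
	n := leftover(func() {
		go func() { <-block }()
	})
	fmt.Println("goroutines left running:", n) // goroutines left running: 1
}
```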

@cherrymui
Member

Given the similarity between the proposed build mode and the c-archive build mode on other platforms, could we just use the c-archive build mode to mean this on Wasm? The Go module is probably called from Wasm that is compiled from C. So c-archive makes some sense. Thanks.

@johanbrandhorst
Member Author

Given the similarity between the proposed build mode and the c-archive build mode on other platforms, could we just use the c-archive build mode to mean this on Wasm? The Go module is probably called from Wasm that is compiled from C. So c-archive makes some sense. Thanks.

I'm hesitant to overload the meaning of the c-archive build mode in this case, for three primary reasons:

  1. I don't know that the assertion that the Go module will be called from C compiled to Wasm is true. At least one primary use case I have in mind for this is to call Go-compiled Wasm from Go programs running Wazero, as a way to provide plugins for arbitrary Go applications. This is one of the most popular use cases for wasip1 as far as I can tell (see examples in the background of this proposal). It would be a disservice to the Wasm ecosystem, which is fundamentally origin-language-agnostic, to tie this functionality to the c-archive build mode.
  2. The behavior of the exported Wasm binary is different from that of a C archive. There is no shared memory space and there are no threads, so it might confuse users about what they can expect from the compiled binary.
  3. Using c-archive for wasip1 would set a precedent for future Wasm export implementations (e.g. a hypothetical wasip2) to continue using this build mode while they may differ significantly in behavior from that of wasip1.

I'm sympathetic to the concern of build mode bloat, especially as this build mode would not be reused for a hypothetical wasip2 port, but I do believe it to be in the best interest of the user.

@ydnar

ydnar commented Feb 9, 2024

Further, a hypothetical GOOS=wasip2 would likely use something akin to -buildmode=wasm-component, which would emit a component, effectively a superset of a Wasm module.

Today, in our work to support WASI Preview 2 in TinyGo, the build process has three phases: 1) compile a Wasm module, 2) decorate the module with WIT metadata, and 3) convert the Wasm module to a component:

tinygo build -target=wasip2 -x -o main.wasm ./cmd/wasip2-test
wasm-tools component embed -w wasi:cli/command $(tinygo env TINYGOROOT)/lib/wasi-cli/wit/ main.wasm -o embedded.wasm
wasm-tools component new embedded.wasm -o component.wasm

Currently the second and third phases are implemented in Rust in the wasm-tools program. I suspect we’d like to implement that functionality directly in the Go toolchain so it can natively generate a component.

To color this bikeshed, I’d advocate for -buildmode=wasm-module or -buildmode=wasm-reactor, not tying it to a specific GOOS.

@cherrymui
Member

not tying it to a specific GOOS.

I'm also leaning towards this, even if we don't reuse c-archive. I think it is possible to use the same build mode on wasip1, wasip2, and possibly eventually wasi. The implementation can be slightly different. As long as they are not vastly different, it would be fine.

I don't know that the assertion that the Go module will be called from C compiled to Wasm is true.

c-archive doesn't have to be called from C. It could be called from code compiled from other languages, as long as that code uses the C ABI.

@johanbrandhorst
Member Author

I'm worried that trying to name something now to reuse in future WASI ports is going to be a futile endeavor because there is still so much unknown about wasip2 and wasi and how they will relate to Go. To give some examples:

  1. A "wasm module" is defined by the Wasm spec. It has exports, funcs, even a start section, which is used to initialize the state of a module (similar to the _initialize export in wasip1). But this is not used by wasip1 or WASI Preview 2 to my knowledge.
  2. A "wasm component" is defined by WASI Preview 2 (I'm struggling to find an exact definition of "component" in this documentation). It has a different ABI from a "wasm core module" (AKA "wasm module").
  3. A "wasm reactor" is a strictly WASI preview 1 concept and the terminology has been abandoned for future WASI versions.

So which name to choose that makes sense to users now and in the future? Decisions like "should we commit to wasm modules since they're part of the core spec, or wasm components since they're part of the wasip2 component model?" are things I'd rather defer until a future proposal that has to consider wasip2 in its entirety, once the dust has settled on the new ABI. This is why I think it's going to be difficult to name this anything but a very wasip1-specific name. Our first implementation of wasip2 might not even support exports.

@ydnar

ydnar commented Feb 9, 2024

> I'm worried that trying to name something now to reuse in future WASI ports is going to be a futile endeavor because there is still so much unknown about wasip2 and wasi and how they will relate to Go.

Here is a working implementation of WASI Preview 2 in Go: https://github.com/ydnar/wasm-tools-go/tree/main/wasi

> So which name to choose that makes sense to users now and in the future? Decisions like "should we commit to wasm modules since they're part of the core spec, or wasm components since they're part of the wasip2 component model?" are things I'd rather defer until a future proposal that has to consider wasip2 in its entirety, once the dust has settled on the new ABI. This is why I think it's going to be difficult to name this anything but a very wasip1-specific name. Our first implementation of wasip2 might not even support exports.

In a sense, a Wasm component is just a "reactor" that conforms to a specific export contract. The wasi:cli/command world exports a single function run that could call main.main using the exact go:wasmexport machinery proposed here.

Currently the wasm-tools toolchain converts a Wasm module with imports and exports conforming to a specific contract into a component. While I don’t think it’s ideal in the long term for Go to depend on a third-party tool to emit a valid WASI Preview 2 program, it’s a bridge that works today.

@ianlancetaylor
Contributor

Build modes are scattered throughout all the tools, so I'm slightly reluctant to define a new WASM-specific build mode, especially given that I don't understand how it would differ from c-archive.

@johanbrandhorst
Member Author

johanbrandhorst commented Feb 9, 2024

If we were to reuse c-archive, would we also add a build tag that is set when this build mode is set? I think it would be important to let users write Wasm libraries that can both be used by normal programs running main (AKA "Commands") and by programs exporting specific functions to the host (AKA "Reactors"). Since the proposal suggests causing compilation errors when //go:wasmexport is used when compiling "Commands" (since we do not export functions to the host in this mode), we'd need some way to exclude files defining these exports for library authors, easiest of which would be a build tag. But is introducing a build tag for an existing build mode going to cause any problems?

@ianlancetaylor
Contributor

I don't see any major difficulty to adding a build tag if necessary.

@ydnar

ydnar commented Feb 9, 2024

Given Ian’s comments, maybe it’s worth exploring a mechanism other than build mode.

@johanbrandhorst: For GOOS=wasip1, setting aside the presence of any user //go:wasmexport directives…is the difference between command and reactor mode simply whether the program exports _start (initializes runtime, calls main.main) vs _initialize (which initializes runtime, but does not call main.main)?

@johanbrandhorst
Member Author

Essentially, yes. I'm happy to consider other mechanisms, but I do want it to be explicit, and there is something to be said for the parallel to the existing build mode c-shared.
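Setting user exports aside, the two modes therefore differ only in which entry point is exported and what it does. Below is a toy model of that difference, assuming (as Johan notes later in the thread) that Go calls proc_exit at the end of _start. The `module` type and its `initRuntime`, `mainMain` and `procExit` hooks are hypothetical; they record the steps instead of performing them.

```go
package main

import "fmt"

// module is a toy model of a wasip1 instance built from Go. It records the
// steps each entry point would take instead of performing them.
type module struct {
	trace []string
}

// Hypothetical hooks standing in for runtime setup, main.main and proc_exit.
func (m *module) initRuntime()      { m.trace = append(m.trace, "init runtime") }
func (m *module) mainMain()         { m.trace = append(m.trace, "main.main") }
func (m *module) procExit(code int) { m.trace = append(m.trace, fmt.Sprintf("proc_exit(%d)", code)) }

// start models the Command entry point (_start): initialize the runtime,
// run main.main, then exit, after which the instance is finished.
func (m *module) start() {
	m.initRuntime()
	m.mainMain()
	m.procExit(0)
}

// initialize models the Reactor entry point (_initialize): initialize the
// runtime and return to the host without calling main.main, so exports can
// be called afterwards.
func (m *module) initialize() {
	m.initRuntime()
}

func main() {
	command, reactor := &module{}, &module{}
	command.start()
	reactor.initialize()
	fmt.Println(command.trace) // [init runtime main.main proc_exit(0)]
	fmt.Println(reactor.trace) // [init runtime]
}
```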

@achille-roussel
Contributor

achille-roussel commented Feb 9, 2024

Would we use c-archive or c-shared for the name? c-archive was first suggested, but the proposal and Johan's last message mentioned c-shared.

C archives are usually used during compilation. WASM modules are closer to C shared libraries in concept since they are loaded and linked at runtime.

The c-archive and c-shared build modes also use //export directives to locate the symbols exported in the build artifact. We would also update the build mode documentation to mention that either //export or //go:wasmexport is used depending on the target architecture.

@cherrymui
Member

> Since the proposal suggests causing compilation errors when //go:wasmexport is used when compiling "Commands"

Based on the discussion above #65199 (comment), I think we agreed that we eventually want to support wasmexport for command. It would be great if we could just support both library and command at the same time (personally I think the implementation would not be very different so it shouldn't be too hard). But if you prefer supporting library first, then command later, that is probably fine. But maybe we don't want that to cause an error, which complicates things (like you mentioned, you may want a build tag). Maybe we document that wasmexport for command will be supported in the future but is ignored for now.

Given that Wasm's execution model is very different from other architectures, I think either c-archive or c-shared is probably fine. Is it possible for a Wasm module (command) to start running and, while it is running, dynamically load another module (library/reactor)? Or do all modules have to be loaded before any starts to run? If it is the former, I agree that it may be more similar to c-shared.

@ydnar

ydnar commented Feb 10, 2024

> Essentially, yes. I'm happy to consider other mechanisms, but I do want it to be explicit, and there is something to be said for the parallel to the existing build mode c-shared.

Maybe GOWASM=reactor?

@inliquid

inliquid commented Feb 10, 2024

@johanbrandhorst

> Since the proposal suggests causing compilation errors when //go:wasmexport is used when compiling "Commands" (since we do not export functions to the host in this mode), we'd need some way to exclude files defining these exports for library authors, easiest of which would be a build tag.

I think that would be different from how TinyGo works? It allows compiling "Commands" with -target=wasi that then export functions to be used by the embedder. This mechanism is used in the wild for plugins, for instance here: https://github.com/knqyf263/go-plugin

Moreover, we already use it in production, and we could eventually switch to the Go compiler if it behaves in a similar way.

There is a main function that sets up some globals (in order to "register" the plugin), and it effectively compiles to _start, which is exported by the module:
[image]

The runtime is wazero.

@johanbrandhorst
Member Author

> Based on the discussion above #65199 (comment), I think we agreed that we eventually want to support wasmexport for command. It would be great if we could just support both library and command at the same time (personally I think the implementation would not be very different so it shouldn't be too hard). But if you prefer supporting library first, then command later, that is probably fine. But maybe we don't want that to cause an error, which complicates things (like you mentioned, you may want a build tag). Maybe we document that wasmexport for command will be supported in the future but is ignored for now.

Yeah, this would avoid the build tag question altogether. Reading through that again, I think it would mean that all exports would have to initialize the runtime when called, and all would have to call proc_exit before returning (seeing as a Command can be called at most once). In this form, all exports become equivalent to a main() function. I think my previous interpretation about reentrant calls is wrong given

> Command instances may assume that they will be called from the environment at most once.

This seems to imply to me that only a single call from the host will take place. I don't know what that means for reentrant calls. If implemented as described above the reentrant call would call proc_exit before returning. Perhaps that is fine?

I'll think a little more about this and consider making changes to the proposal to remove the restriction of only allowing exports for reactors.

> Given that Wasm's execution model is very different from other architectures, I think either c-archive or c-shared is probably fine. Is it possible for a Wasm module (command) to start running and, while it is running, dynamically load another module (library/reactor)? Or do all modules have to be loaded before any starts to run? If it is the former, I agree that it may be more similar to c-shared.

I'll have to ask around to answer this question but I'd think it's the latter.

> Maybe GOWASM=reactor?

There's a section in the proposal about this:

> We also considered using a GOWASM option instead, but this feels wrong since that environment variable is used to specify options relating to the architecture (existing options are satconv and signext), while this export option is dependent on the behavior of the "OS" (what functions to export, what initialization pattern to expect).

I still don't think that GOWASM is the right option, and I'd sooner see us reuse -buildmode=c-shared if we have to drop the custom build mode.

> I think that would be different from how tinygo works? It allows compiling "Commands" with -target=wasi which then export functions to be used by the embedder.

I'm considering removing this restriction from the proposal, as Cherry suggested. The exact TinyGo implementation is a great source of data on how users are using exports but I don't think compatibility with TinyGo is a high priority for this proposal.

@4ad
Member

4ad commented Feb 12, 2024

Using c-archive is a mistake. Wasm binaries are not static archives; they are dynamically loaded objects that cannot be used at "compile-time" in any meaningful way.

Using c-shared makes sense (but see below). Wasm modules of the type described in this proposal are shared libraries loaded at runtime. The fact that they are for a different ISA (Wasm) instead of the host ISA does not change this fundamental fact.

So we can reuse c-shared for now, but an issue with c-shared is that it is not future proof. When we get component model support, CM modules will be incompatible with core modules, and these core modules described in this proposal will be obsolete.

It seems impractical to support component model without having to introduce a new build mode in the future. Whether that is an argument for using c-shared vs. something custom now, I don't know.

Another argument against c-shared is that it implies some sort of compatibility with C (or its ABI), which is not really the case here.

@cherrymui
Member

> Reading through that again, I think it would mean that all exports would have to initialize the runtime when called, and all would have to call proc_exit before returning (seeing as a Command can be called at most once).

I don't think we want to do this. For a command, it is still expected to call _start to start the command, which will initialize the runtime and call main.main. The exports are used for wasm to call back to Go from a wasmimport host function. It is in the same instance. And calling an exported function before _start (or _initialize) is still considered an error. (For a library, we could consider calling an exported function before _initialize will initialize the runtime first, not sure if this is worth doing.)

> Maybe GOWASM=reactor?

I agree with @johanbrandhorst that this is probably not the right approach. GOWASM is for "architecture" features. Reactor/library is not.

For each WASI version, will there be multiple ways for building a "library"? Or there will be predominately one? (It could be different for each WASI version, like, say, reactor for wasip1, component for wasip2?)

@johanbrandhorst
Member Author

> I don't think we want to do this. For a command, it is still expected to call _start to start the command, which will initialize the runtime and call main.main. The exports are used for wasm to call back to Go from a wasmimport host function. It is in the same instance. And calling an exported function before _start (or _initialize) is still considered an error. (For a library, we could consider calling an exported function before _initialize will initialize the runtime first, not sure if this is worth doing.)

I like this interpretation, but I don't know if it's correct. The exact wording (from https://github.com/WebAssembly/WASI/blob/256b651a3108610c076a12ec1915d9f9ca46e6b9/legacy/application-abi.md#current-unstable-abi) is:

> _start is the default export which is called when the user doesn't select a specific function to call. Commands may also export additional functions, (similar to "multi-call" executables), which may be explicitly selected by the user to run instead.

It sounds to me like the user (via the host) can call any exported function, not just _start, and not just through reentrant calls into the same instance. I'd prefer we just support calling _start as you suggest but I'm worried that this would surprise users. What do you think?

> For each WASI version, will there be multiple ways for building a "library"? Or there will be predominately one? (It could be different for each WASI version, like, say, reactor for wasip1, component for wasip2?)

For wasip1 I think there will be only one way, which is using _initialize as described in this proposal. It's unclear as yet to me what the exact options will be for wasip2. I don't know of anyone running "Wasm core modules" instead of "component model modules" but the standard is so new it's hard to say how it will be received by the ecosystem. I'd prefer not to make any decisions today about how to solve this in the future. I actually think using c-shared might be the best way to do that, since it leaves us open both to reusing c-shared for wasip2 and inventing our own new build-mode, if we want to use c-shared for Wasm core modules in wasip2.

I intend to update the proposal to support exports in Commands and switching to c-shared for the build mode.

@cherrymui
Member

> I'd prefer we just support calling _start as you suggest but I'm worried that this would surprise users.

I think that is fine. I think usually it is expected that a command has a single entry point. If there is a strong need for multiple entry points, we could add the support later. But a single entry point should be fine for now.

> I actually think using c-shared might be the best way to do that, since it leaves us open both to reusing c-shared for wasip2 and inventing our own new build-mode, if we want to use c-shared for Wasm core modules in wasip2.

SGTM. Thanks.

@johanbrandhorst
Member Author

I've updated the proposal to support exports in both "Reactors" and "Commands" and using the c-shared build mode to toggle the compilation of a "Reactor". The use of exports in "Commands" is restricted to only be allowed during the execution of the _start function through reentrant calls to host functions. I've removed references to a build tag as the wasip1 build tag is now sufficient to gate any use of go:wasmexport.
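The rule for when a host may call an export can be sketched as a small state machine. This is a hypothetical model, not a runtime API: the `instance` type, its states and error values are invented for illustration. It treats a call before _start/_initialize as an error (as suggested in the thread) and also rejects calls after _start returns, although the thread leaves open whether a real runtime would refuse such calls or simply leave them undefined.

```go
package main

import (
	"errors"
	"fmt"
)

// state tracks where a toy wasip1 instance is in its lifecycle.
type state int

const (
	fresh       state = iota // no entry point called yet
	inStart                  // Command: _start is running
	exited                   // Command: _start returned (proc_exit was called)
	initialized              // Reactor: _initialize has returned
)

var (
	errNotStarted = errors.New("export called before _start/_initialize")
	errExited     = errors.New("export called after _start returned")
)

// instance is a hypothetical model of the rules above, not a runtime API.
type instance struct{ st state }

// start runs a Command's _start. body stands in for main.main, during which
// a host function may call back into an export (a reentrant call).
func (in *instance) start(body func()) {
	in.st = inStart
	body()
	in.st = exited
}

// initialize runs a Reactor's _initialize.
func (in *instance) initialize() { in.st = initialized }

// canCallExport reports whether the host may invoke a //go:wasmexport
// function in the instance's current state.
func (in *instance) canCallExport() error {
	switch in.st {
	case fresh:
		return errNotStarted
	case exited:
		return errExited
	default: // inStart (reentrant call) or initialized (Reactor)
		return nil
	}
}

func main() {
	cmd := &instance{}
	fmt.Println(cmd.canCallExport()) // export called before _start/_initialize
	cmd.start(func() {
		fmt.Println(cmd.canCallExport()) // <nil>
	})
	fmt.Println(cmd.canCallExport()) // export called after _start returned

	lib := &instance{}
	lib.initialize()
	fmt.Println(lib.canCallExport()) // <nil>
}
```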

@inliquid

> The use of exports in "Commands" is restricted to only be allowed during the execution of the _start function through reentrant calls to host functions.

Not sure if I missed anything, but AFAIK current implementations allow calling exports after _start is done. For example, wazero will call _start (or other start functions, depending on config) when you invoke InstantiateModule, and you are allowed to call module exports after that. This is basically how the plugin pattern works. What's the point of restricting that?

@johanbrandhorst
Member Author

At the end of _start, we call proc_exit. From the point of view of the host, that means the "program is terminated": https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md#-proc_exitrval-exitcode. The behavior of any calls to the instance at that point should probably be considered "undefined behavior". I'm not sure if we'll have to add something to actually refuse running functions after _start has returned, but it is not expected that users should call anything. Note also how the Application ABI doc explicitly states that

> Command instances may assume that they will be called from the environment at most once. Command instances may assume that none of their exports are accessed outside the duration of that call.

If a user wants to call into an instance repeatedly, they will want to compile a "Reactor" through the c-shared build mode. Is there some use case that is not covered by this functionality?

@inliquid

I think this behavior is more or less explained by wazero's RATIONALE.md:

> Why do we only return a sys.ExitError on a non-zero exit code?
>
> It is reasonable to think an exit error should be returned, even if the code is success (zero). Even on success, the module is no longer functional. For example, function exports would error later. However, wazero does not. The only time sys.ExitError is on error (non-zero).
>
> This decision was to improve performance and ergonomics for guests that both use WASI (have a _start function), and also allow custom exports. Specifically, Rust, TinyGo and normal wasi-libc, don't exit the module during _start. If they did, it would invalidate their function exports. This means it is unlikely most compilers will change this behavior.
>
> GOOS=wasip1 from Go 1.21 does exit during _start. However, it doesn't support other exports besides _start, and _start is not defined to be called multiple times anyway.
>
> Since sys.ExitError is not always returned, we added Module.IsClosed for defensive checks. This helps integrators avoid calling functions which will always fail.

We use TinyGo to compile Go to Wasm with -target=wasi, so that modules can be used as plugins. I'm not sure whether the reactor pattern is supported by TinyGo at all.
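The host-side consequence of that rationale can be modeled with the standard library alone. This is only an illustration of the behavior described in the quote, not wazero's API: `ExitError`, `guest`, `procExit` and `callExport` are hypothetical stand-ins for wazero's sys.ExitError and Module.IsClosed.

```go
package main

import "fmt"

// ExitError is a stand-in for the idea behind wazero's sys.ExitError;
// it is not the wazero type.
type ExitError struct{ Code uint32 }

func (e *ExitError) Error() string { return fmt.Sprintf("module exited with code %d", e.Code) }

// guest is a hypothetical stand-in for an instantiated module.
type guest struct{ closed bool }

// procExit closes the module either way, but, per the rationale quoted
// above, only reports an error for a non-zero exit code.
func (g *guest) procExit(code uint32) error {
	g.closed = true
	if code != 0 {
		return &ExitError{Code: code}
	}
	return nil
}

// IsClosed mirrors the defensive check wazero added as Module.IsClosed.
func (g *guest) IsClosed() bool { return g.closed }

// callExport refuses to run once the module is closed, even if _start
// returned no error.
func (g *guest) callExport(name string) error {
	if g.IsClosed() {
		return fmt.Errorf("cannot call %q: module is closed", name)
	}
	return nil // a real host would invoke the export here
}

func main() {
	g := &guest{}
	err := g.procExit(0)                   // _start finished successfully
	fmt.Println(err, g.IsClosed())         // <nil> true
	fmt.Println(g.callExport("plugin_fn")) // cannot call "plugin_fn": module is closed
}
```

The point of the model is that a nil error from _start does not imply the exports are still usable, which is why a defensive IsClosed-style check is needed for guests that exit during _start.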

@johanbrandhorst
Member Author

The plugin pattern I think you're referring to will be perfectly supported by the functionality introduced in this proposal. Note that it's also possible to do with a "Command" today (see the references in the Background section of the proposal), and will remain possible if this proposal is implemented.

@rsc
Contributor

rsc commented Feb 14, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to add support for -buildmode=c-shared in wasm, and to add

//go:wasmexport [name]

as a directive analogous to what //export does when using cgo.

@x1unix

x1unix commented Feb 20, 2024

@rsc I assume that it would work for libraries, but will it work for programs that run in the background?

For example, consider a program that acts as a server worker and accepts requests from a page.
Let's assume that the client calls a Wasm-exported function to send a request to the Wasm server.
The server processes the request and calls the client back to pass a result.

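```go
package main

import "context"

// requests is shared by the exported function and the worker goroutine.
var requests = make(chan []int)

// Wasm imports accept only fixed-size types such as int32, not int.
//go:wasmimport myNamespace submitResponse
func submitResponse(result int32)

//go:wasmexport receiveRequest
func receiveRequest(a, b int32) {
	requests <- []int{int(a), int(b)}
}

func main() {
	ctx := context.TODO()
	// Start the worker on the package-level channel (no shadowing).
	go listen(requests)
	// Block forever: a TODO context is never cancelled.
	<-ctx.Done()
}

func listen(requests <-chan []int) {
	for req := range requests {
		_ = req // do some logic with req, then call the client.
		submitResponse(42)
	}
}
```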

Will that case work?

@johanbrandhorst
Member Author

The proposal makes it clear what can be expected in this situation: When receiveRequest is called from the host (presumably to handle some request), the runtime will wake up (having either previously handled some other export and returned to the host, or been initialized using _initialize and returned to the host), a new goroutine calling receiveRequest will be created, and the scheduler will run. Any of the available goroutines to run at this point (which may include goroutines started by runtime initialization, during previous export invocations, or the goroutine just created to handle the request) may be scheduled.

In this case, because receiveRequest blocks on a channel send, the function would yield back to the runtime, the scheduler would eventually schedule listen, and the request would be handled by submitResponse, which calls back into the host to submit the response. This host call is blocking and no goroutines are scheduled while it is running. Once it returns, listen will go back to blocking on reading from requests and yield back to the runtime, which will schedule receiveRequest, which is now unblocked. Once it returns, the runtime will return back to the host, and no other goroutines will be handled.

That is my understanding of the runtime and scheduler behavior, which is admittedly hazy.
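The flow described above can be approximated natively, with no Wasm involved, using ordinary goroutines. This is a hypothetical simulation: the //go:wasmimport and //go:wasmexport functions are replaced by plain Go functions, the `responses` channel stands in for the host, and runtime.GOMAXPROCS(1) only roughly approximates Wasm's single thread. It does not reproduce the wasm runtime's exact scheduling order. Unlike the example above, the worker returns a + b so the result can be checked.

```go
package main

import (
	"fmt"
	"runtime"
)

// requests carries (a, b) pairs from the exported entry point to the worker.
var requests = make(chan []int)

// responses plays the host's side of the hypothetical submitResponse import.
// It is buffered so the simulated host call returns immediately.
var responses = make(chan int, 1)

// submitResponse stands in for the host call; on wasm it would be a
// //go:wasmimport function.
func submitResponse(result int) { responses <- result }

// receiveRequest plays the //go:wasmexport function: the host calls it and
// it blocks until the worker goroutine receives the request.
func receiveRequest(a, b int) { requests <- []int{a, b} }

// listen is the long-lived worker started from main.
func listen() {
	for req := range requests {
		submitResponse(req[0] + req[1])
	}
}

func main() {
	// Roughly approximate Wasm's single thread on a native host.
	runtime.GOMAXPROCS(1)
	go listen()

	// The "host" calls the export, then collects the callback's result.
	receiveRequest(40, 2)
	fmt.Println(<-responses) // 42
}
```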
