The proposal adds a new set of interface types to WebAssembly that describe high-level values. The proposal is semantically layered on top of the WebAssembly core spec and can be implemented in terms of an unmodified core wasm engine.
This proposal assumes the multi-value and module linking proposals as well as the `let` instruction of the function references proposal.
- Motivation
- Additional Requirements
- Proposal
- An End-to-End Example
- Adapter Fusion
- Use Cases Revisited
- FAQ
- TODO
As a compiler target, WebAssembly provides only low-level types that aim to be as close to the underlying machine as possible, allowing each source language to implement its own high-level types efficiently in terms of the low-level types. However, when modules from multiple languages need to communicate with each other or the host, there remains the question of how to exchange high-level values.
To help motivate the proposed solution, we consider 4 use cases. After the proposal is introduced, the use cases are revisited below.
While many WASI signatures can be expressed in terms of `i32`s and, in the future, references to type imports, there is still a need for WASI functions to take and return compound value types like strings or lists. Currently, these values are specified in terms of linear memory. For example, in `path_open`, a string is passed as a pair of `i32`s which represent the offset and length, respectively, of the string in linear memory.
However, specifying a single, fixed representation of data types like strings will become problematic when:
- with the GC proposal, the caller wants to pass a `ref array u8`;
- the host has an opaque native string type which WASI would ideally accept without copying into and out of wasm linear or GC memory;
- more than one string encoding is supported.
Another problem with passing `i32` offsets into linear memory is that the calling convention implicitly requires the callee to have access to the caller's memory. Thus, all WASI client modules currently must export their memory, which is then accessed by the WASI implementation in various ad hoc ways.
Ideally, WASI APIs would be expressed in terms of representation-agnostic high-level value types and the caller wouldn't need to export their memory.
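To make the contrast concrete, the two styles of signature can be sketched as follows (the import names and shapes here are illustrative, not actual WASI definitions):

```wasm
;; Today: a string argument is an (offset, length) pair into the
;; caller's exported linear memory (illustrative, not a real WASI API).
(import "wasi" "log" (func $log_raw (param i32 i32)))

;; Ideally: the string is passed as an abstract value, with no implicit
;; dependency on the caller's memory (hypothetical adapter signature).
(import "wasi" "log" (adapter_func $log (param string)))
```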
A long-standing source of friction between WebAssembly and the rest of the Web platform (and the original motivating use case for this proposal) is that calling Web APIs from WebAssembly usually requires thunking through JavaScript, which hurts performance and adds extra complexity for developers. Technically, since they are exposed as JavaScript functions, Web IDL-defined functions can be directly imported and called by WebAssembly through the JS API. However, Web IDL functions usually take and produce high-level Web IDL types like `DOMString` and `Sequence` while WebAssembly can only pass numbers. With the newly-added `externref`, WebAssembly can import JavaScript functions that create JavaScript values that can be passed to the Web IDL functions via the ECMAScript Binding, but the performance of this approach may be even worse than the JIT-optimized glue code.
Ideally, Web API calls would be statically type checked and passed Web IDL values created directly by wasm.
To fully realize the potential of WebAssembly, it should be possible for a WebAssembly module author to target an output profile that maximizes the number of clients that can reuse their module (where a "client" can be another WebAssembly module, a native language runtime that embeds WebAssembly or some other kind of host system). As part of this maximum-reuse profile, the language and toolchain that a module author uses should be kept an encapsulated module implementation detail, allowing module clients to independently choose their own language and toolchain. This in turn allows for the emergence of a single, unified WebAssembly ecosystem of maximally-reusable modules. In contrast, the path of least resistance, and thus the default outcome if no effort is made to the contrary, is for the WebAssembly ecosystem to be fragmented along the lines of language and toolchain.
It's important to scope this use case to avoid aiming for automatic seamless integration between arbitrary languages, as this has generally been shown to be an intractable problem. Instead, we can observe the general success of shared-nothing architectures in allowing unrelated languages to interoperate. Popular examples include Unix pipes connecting separate processes and HTTP APIs connecting separate (micro)services. A shared-nothing architecture partitions a whole application into multiple isolated units that encapsulate their mutable state; shared mutable state is either banned or significantly restricted. When multiple languages are used, then, it's natural to put separate languages into separate isolated units. In raw WebAssembly terms, a natural shared-nothing unit is a module which only imports and exports functions and immutable values, but not memories, tables or mutable globals.
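In raw text-format terms, such a shared-nothing unit might look like the following sketch (names illustrative): only functions cross the module boundary, while the linear memory stays private:

```wasm
(module
  ;; Only functions cross the boundary in each direction.
  (import "peer" "consume" (func $consume (param i32)))
  ;; Private linear memory: neither imported nor exported, so no other
  ;; module can observe or mutate this module's state.
  (memory 1)
  (func (export "produce") (result i32)
    ;; core code
    i32.const 42
  )
)
```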
While existing shared-nothing architectures tend to incur communication overhead due to system calls, context switches and extra copying, WebAssembly can leverage its lightweight sandbox and use synchronous function calls to avoid these sources of overhead (leaving asynchronous communication to be provided by the host via API). Thus, instead of pipe or channel APIs, shared-nothing WebAssembly modules can use function imports to directly call across shared-nothing boundaries. In OS terms, this is analogous to the synchronous IPC used for performance in some microkernels. However, this approach has an obvious challenge in WebAssembly today: without a shared linear memory, how can a function call pass values larger than the fixed-size core wasm types (`i32`, `i64`, etc.)?
One possible solution would be to leverage the upcoming GC proposal to pass immutable GC objects across shared-nothing boundaries. However, this would add an unnecessary dependency on GC when the clients and producers did not otherwise require GC. Moreover, such an approach would require needless extra copying and garbage when both sides used linear memory or mutable GC memory. Lastly, while the GC proposal has types like `struct` and `array` that are higher-level than linear memory, the GC proposal still seeks to be as close to an assembly language as safety, portability and performance allow. Thus, language and toolchain details are still likely to bubble up into a module's public interface. Ideally, shared-nothing modules would be able to define their function imports and exports using expressive high-level value types without requiring GC or incurring extra copies or garbage.
To properly address the above motivating use cases, there are some additional requirements which impact the set of available solutions:
- A solution should allow a wide variety of value representations, avoiding the need for intermediate (de)serialization steps. Ideally, this would include even lazily-generated data structures (e.g., via iterators, generators or comprehensions).
- A solution should not force the exclusive use of linear or GC memory.
- A solution should allow efficient conversion to and from opaque host values that exist outside of any wasm memory when the host values are compatible.
- A solution should not require O(n)-sized engine-internal allocations of temporary values.
- A solution should not have subtle performance cliffs or depend on too much compiler magic.
- A solution should allow efficient and robust backwards-compatible evolution of API signatures.
- The proposal should still allow for the use of "shared-everything" linking (analogous to native dynamic linking) as a low-level mechanism for factoring common library code out of shared-nothing modules.
The proposal first introduces the new types, then the new modules and functions that can contain the new types, and lastly the new instructions that can be used in the new functions for producing and consuming the new types.
To complement core wasm's set of low-level value types (`i32`, `i64`, ...), this proposal defines a new set of high-level value types called interface types. In the core spec, core value types are formally defined by the `valtype` grammar. For interface types, the analogous set is `intertype`, defined by the following grammar:
```
intertype ::= f32 | f64
            | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64
            | char
            | (list <intertype>)
            | (record (field <name> <id>? <intertype>)*)
            | (variant (case <name> <id>? <intertype>?)*)
```
`f32` and `f64` are the same types that appear in `valtype` and thus `valtype` and `intertype` intersect at these two types. `char` is defined to be a Unicode scalar value (i.e., a non-surrogate code point).
Unlike core wasm's `valtype`, which, starting with function references, allows cyclic type definitions, `intertype` type definitions are required to be acyclic. This restriction follows from the fact that interface types are meant to be used at inter-language boundaries to copy values, and matches the restrictions of other IPC/RPC schema languages.
Additionally, while core wasm's `valtype` contains reference values which refer to mutable state, `intertype` compound values are transitively immutable. One important consequence of this distinction is that, while `valtype` reference subtyping is non-coercive and constrained by low-level memory layout considerations, `intertype` subtyping can be highly coercive, thereby allowing flexible, backwards-compatible evolution of interface-typed APIs. Since copying is inherent in the use of interface types, coercions can be folded into the copy. A summary of the allowed coercions is given by the following table:
| Type | Coercion allowed |
| --- | --- |
| `f32`, `f64` | `f32` to `f64` |
| `s8`, `u8`, `s16`, `u16`, `s32`, `u32`, `s64`, `u64` | if the source range is included in the destination range |
| `char` | none |
| `list` | if the list's element type can be coerced |
| `record` | if the record's fields' types can be coerced, matching fields by name regardless of order and allowing source fields to be ignored |
| `variant` | if the variant's cases' types can be coerced, matching cases by name regardless of order and allowing destination cases to be ignored |
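As a hypothetical illustration of how these coercions support API evolution, the following pair of record types would be compatible under the rules above: during the copy, the `"id"` field widens from `u8` to `u16` and the extra `"debug"` field supplied by the source is ignored (field names here are made up for illustration):

```wasm
;; The type of the value as produced by the caller (source):
(record (field "id" u8) (field "debug" bool))

;; The type expected by the callee (destination): "id" is widened from
;; u8 to u16 and the source-only "debug" field is dropped by the copy.
(record (field "id" u16))
```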
While these types are sufficiently general to capture a number of more-specialized types that often arise in interfaces, it can still be beneficial to have the specialized types explicitly represented in API signatures. For example, this allows specialized source-language types to be used in automatically-generated source-level API declarations. Thus, the proposal contains several text-format and binary-format abbreviations. In the notation below, the types on the left-hand side of the `≡` are expanded during parsing/decoding into the abstract syntax of the right-hand side. Thus, validation and execution are purely defined in terms of the general type; the abbreviations only exist in the concrete formats.
```
string ≡ (list char)
(tuple <intertype>*) ≡ (record ("i" <intertype>)*) for i=0,1,...
(flags <name>*) ≡ (record (field <name> bool)*)
bool ≡ (variant (case "false") (case "true"))
(enum <name>*) ≡ (variant (case <name>)*)
(option <intertype>) ≡ (variant (case "none") (case "some" <intertype>))
(union <intertype>*) ≡ (variant (case "i" <intertype>)*) for i=0,1,...
(expected <intertype>? (error <intertype>)?) ≡ (variant (case "ok" <intertype>?) (case "error" <intertype>?))
```
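For instance, a signature written with the specialized abbreviations expands during decoding into the general types; both spellings below denote the same abstract signature:

```wasm
;; As written in the text format:
(adapter_func (param (option string)) (result bool))

;; As understood by validation, after expanding the abbreviations:
(adapter_func
  (param (variant (case "none") (case "some" (list char))))
  (result (variant (case "false") (case "true"))))
```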
Thus far, interface types are fairly normal and what you might expect to see in a functional language or protocol schema. However, in support of the particular use cases and requirements of this proposal, interface types have some atypical properties described next.
One of the additional requirements listed above is to avoid intermediate O(n) copies (in time or memory use) of interface values. This requirement may seem to be at odds with the above introduction of a new set of immutable `intertype` values, since a standard eager interpretation of `intertype` would imply the need to make a temporary copy at the point where the `intertype` value was produced. While one could hope that the copy could be optimized away by a sufficiently-smart compiler, such optimizations would almost never apply in practice due to the typical interleaving of arbitrary core wasm execution between producing and consuming instructions.
To reliably satisfy the performance requirements, `intertype` values are instead given a lazy interpretation. Thus, `intertype` values have the form `(instruction operands*)`, where `instruction` is one of the producer instructions introduced below and `operands*` is a list of core wasm values. When an `intertype` value is consumed by one of the consumer instructions (also introduced below), only then is the `instruction` applied to its `operands*`. Thus, with no possible intervening side effects, a compiler will never have to create an intermediate O(n) copy.
As an example, a lazy `string` value might have the form:

```
(list.lift string $liftChar (i32.const 100) (i32.const 20))
```

where `list.lift` is an instruction that produces a list given a set of immediates and operands that will be explained below. When this lazy `string` value is consumed (e.g., by `list.lower`), only then will the `list.lift` instruction be executed to produce the abstract sequence of characters.
Laziness is described in more detail below in conjunction with individual lifting and lowering instructions.
While lazy evaluation solves the problem of intermediate copies, it creates a new problem when the lazy value refers to dynamic allocations that must be released after being read. The solution is to give all lazy values an additional, optional "destructor" function that is called when the lazy value is popped (either by being explicitly consumed or implicitly popped as part of a control flow instruction). Thus, the form of a lazy value becomes `(instruction operands* destructor?)`. When `destructor` is called, it is also passed `operands*`, allowing the `operands*` to serve as the destructor's "closure state". From a C++ or Rust perspective, interface values are like locally-scoped objects with destructors.
Destructors are described in more detail below in conjunction with individual lifting and lowering instructions.
Given these laziness and ownership properties, interface types are restricted to be affine, meaning that their values can't be duplicated or consumed multiple times. This restriction solves two problems:
- Copying a lazy interface value should not make an O(n) copy, but having two lazy values refer to the same underlying dynamic allocation would lead to a double free when both destructors are called.
- The instructions executed lazily for interface values can have arbitrary side effects and thus the ability to consume an interface value multiple times can lead to surprising behaviors on both the producer and consumer ends, reducing overall ecosystem robustness.
Affine typing is described in more detail below in conjunction with adapter functions.
Interface types have one last restriction, above and beyond affinity: interface types cannot be used as the `param` of a `loop` block signature. This restriction ensures that interface values only flow "forward", which allows an implementation to easily snapshot the `operands*` of an interface value into function locals, instead of needing to pass around the `operands*` as a first-class tuple.
The use of this property is discussed more below in the context of a simple compilation scheme for erasing laziness at compile-time.
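For example, a (hypothetical) adapter function like the following would be rejected by validation, since the `string` parameter of the `loop` would let an interface value flow backward to the loop header:

```wasm
(adapter_func $invalid (param string)
  (loop (param string)  ;; invalid: interface type in a loop's parameters
    ...
  )
)
```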
This proposal is layered on top of the core wasm spec and thus interface types can't be used directly in core modules. Instead, the proposal defines a new kind of module that is able to use interface types and may also nest core modules. This new kind of module is called an adapter module and it contains adapter functions that are allowed to use interface types and instructions. Adapter modules embed core modules using the concepts introduced by the module linking proposal.
Adapter functions are structured the same as core functions, with three main differences:
- Adapter functions are a different kind of definition than core functions so:
  - In the text format, adapter functions start with `adapter_func` instead of `func`.
  - In the binary format, adapter functions go into a new adapter function section and populate a new adapter function index space.
- In place of core wasm's `valtype`, adapter functions allow `adaptertype`.
- In place of core wasm's `instr`, adapter functions allow `adapterinstr`.
where `adaptertype` and `adapterinstr` are defined as:

```
adaptertype ::= valtype | intertype
adapterinstr ::= instr | ... new instructions introduced below
```
Thus, adapter functions are allowed to contain a mix of both core and interface types and instructions and can be considered a superset of core functions.
The validation rules for core functions are also extended to validate the new interface types and instructions. An important constraint that must be preserved by all rules is the affinity of interface types mentioned above. This is achieved by the following validation rules:
- Interface types are not allowed in locals (function- or `let`-scoped), since locals can be read multiple times.
- Function parameters are changed from being implicit locals to describing the initial contents of the operand stack, just like block parameters. Consequently, `(param)` declarations in functions lose their identifier.
Other than affine typing restrictions, interface types may be used with the parametric instructions and control instructions. For example, the following adapter function uses interface types in conjunction with the core `if`, `return` and `drop`:
```wasm
(adapter_func $return_one_of (param string string i32) (result string)
  i32.eqz
  (if (param string string) (result string)
    (then return)
    (else drop return))
)
```
Since locals are used not just for duplicating values, but also for reordering the stack, adapter functions introduce a new `rotate` instruction for moving a value to the top of the stack:

```
rotate $n : [ Tn+1 ... T0 ] -> [ Tn ... T0 Tn+1 ]
```

`rotate` is a pure restriction over what can be achieved with `let`, but could be a valuable addition to core wasm due to its significantly smaller encoding, reduced validation cost and improved static use-def information (for a first-tier compiler that doesn't build SSA). (See also design/#1381.)
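As a hypothetical illustration of the signature above, with a stack of `[ string i32 i32 ]` (the `string` deepest), `rotate 1` moves the value at depth 2 to the top:

```wasm
(adapter_func $stringToTop (param string i32 i32) (result i32 i32 string)
  ;; [ string i32 i32 ] -> [ i32 i32 string ]
  rotate 1
)
```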
Since adapter functions occupy a distinct index space from core functions, they cannot be called or indirectly referenced with `call`, `call_indirect`, `ref.func`, `call_ref`, etc. Instead, a new `call_adapter` instruction is added for calling adapter functions:

```
call_adapter $f : [ P* ] -> [ R* ]
```

where:
- `$f : (adapter_func (param P*) (result R*))`
- the index of `$f` is less than the index of the containing adapter function
The core instructions for indirectly calling functions are not replicated for adapter functions. The reason for only allowing direct, non-recursive adapter calls is so that, at instantiation time, the wasm engine can have a complete, precise callgraph of all adapter function calls, allowing the engine to predictably compile lazy evaluation into eager evaluation at instantiation-time. This is discussed more below.
Like adapter functions, adapter modules have the same structure as core modules in both their text and binary format. The main differences are:
- Adapter modules are a different kind of definition than core modules so:
  - In the text format, adapter modules start with `adapter_module` instead of `module`.
  - In the binary format, an adapter module preamble sets a bit indicating that this is an adapter module (backwards-compatibly reframing the 32-bit `version` field as two 16-bit `version` and `kind` fields).
- Adapter modules may not contain `func`, `memory`, `table`, `global`, `elem` or `data` definitions. Only `type`, `import`, `export`, `module`, `instance` and `alias` definitions are allowed.
- Adapter modules may additionally contain `adapter_func`, `adapter_module` and `adapter_instance` definitions.
For example, a trivial adapter module is:
```wasm
(adapter_module
  (import "duplicate" (adapter_func $dup (param string) (result string string)))
  (import "print" (adapter_func $print (param string)))
  (adapter_func (export "print_twice") (param string)
    call_adapter $dup
    call_adapter $print
    call_adapter $print
  )
)
```
While this example shows that it's possible to write a pure adapter module, the main point of an adapter module is to adapt the imports or exports of a core module. To do this, adapter modules use and extend the concepts added by the module linking proposal to import and nest core modules and then locally instantiate these core modules, passing in adapter functions as imports and calling exports from adapter functions.
For example, the following adapter module adapts a core module export (with the `...` to be filled in with the new instructions introduced below):
```wasm
(adapter_module
  (module $CORE
    (func $transform (export "transform") (param i32 i32) (result i32 i32)
      ;; core code
    )
    (memory $memory (export "memory") 1)
  )
  (instance $core (instantiate $CORE))
  (adapter_func (export "transform") (param (list u8)) (result (list u8))
    ...
    call $core.$transform
    ...
  )
)
```
Imports can also be adapted, although doing so requires a bit more plumbing due to the acyclic nature of instantiation:
```wasm
(adapter_module
  (import "print" (adapter_func $print (param string)))
  (adapter_module $ADAPTER
    (import "print" (adapter_func $originalPrint (param string)))
    (adapter_func $print (export "print") (param i32 i32)
      ...
      call_adapter $originalPrint
    )
  )
  (adapter_instance $adapter (instantiate $ADAPTER (adapter_func $print)))
  (module $CORE
    (import "print" (func $print (param i32 i32)))
    (func $run (export "run")
      ;; core code
    )
  )
  (instance $core (instantiate $CORE (adapter_func $adapter.$print)))
  (adapter_func (export "run")
    call $core.$run
  )
)
```
Here, the inner `$ADAPTER` module is instantiated first so that its export (`$adapter.$print`) can be used to instantiate the inner `$CORE` module. Notice that the core `instantiate` instruction has been extended to additionally accept adapter functions.
Just as modules have a module type, adapter modules have an adapter-module type. For example, the type of the preceding adapter module is:
```wasm
(adapter_module
  (import "print" (adapter_func (param string)))
  (export "run" (adapter_func))
)
```
Similarly, just as module instances have an instance type, adapter modules' instances have an adapter-instance type:
```wasm
(adapter_instance
  (export "run" (adapter_func))
)
```
In the above example, because the `run` adapter function does no actual adaptation, the underlying `$core.$run` function could just as well have been exported directly:

```wasm
(export "run" (func $core.$run))
```
According to module linking, the dotted notation `$core.$run` is syntactic sugar for an alias definition:

```wasm
(alias $core_run (func $core $run))
(export "run" (func $core_run))
```
With this change to the above adapter module, the adapter-module type would become:
```wasm
(adapter_module
  (import "print" (adapter_func (param string)))
  (export "run" (func))
)
```
This example demonstrates how adapter-module types can contain a mix of both core and adapter definitions. Similarly, core memories, tables and globals can be imported and exported.
Along with each new interface type, the proposal adds new instructions for producing and consuming that interface type. These instructions fall into two categories:
- Lifting instructions produce interface types, going from lower-level types to higher-level types.
- Lowering instructions consume interface types, going from higher-level types to lower-level types.
By having the conversion between `valtype` and `intertype` explicit and programmable, this proposal allows a wide variety of low-level representations to be lifted and lowered without any intermediate serialization step. This includes even data structures that are lazily- or dynamically-generated from constructs such as iterators, generators or comprehensions.
The proposal includes a matrix of integral conversion instructions which allow converting between the 2 sign-less core integer types and the 8 explicitly-signed interface integer types:
```
ct ::= i32 | i64
it ::= u8 | s8 | u16 | s16 | u32 | s32 | u64 | s64

<it>.lift_<ct> : [ ct ] -> [ it ]
<ct>.lower_<it> : [ it ] -> [ ct ]
```

where:
- `bitwidth(ct) >= bitwidth(it)`
For lifting, if `bitwidth(ct)` is greater than `bitwidth(it)`, then only the least-significant `bitwidth(it)` bits of the `ct` value are used. For lowering, the bitwidth validation-time restriction ensures that out-of-range errors cannot occur. In both cases, the signedness of `ct` is implied by the sign of `it`.
Since there can be no dynamic allocation read lazily by integer lifting, there is no destructor immediate.
As an example usage, the following adapter module converts the implicitly-signed `i32` into the explicitly-unsigned `u32`:
```wasm
(adapter_module $ADAPTER
  (module $CORE
    (func $get_num (export "get_num") (result i32)
      (i32.const 0xffffffff)
    )
  )
  (instance $core (instantiate $CORE))
  (adapter_func (export "get_num") (result u32)
    (u32.lift_i32 (call $core.$get_num))
  )
)
```
Were `$CORE.$get_num` to be called directly by the JS API, the `i32` value `0xffffffff` would be interpreted by `ToJSValue` as the signed value -1. With the adapter module, however, `get_num` explicitly returns a `u32` and thus `ToJSValue` would know to interpret the bits as 2³²−1.
A `char` is strictly defined to be a Unicode Scalar Value (USV), which effectively means a positive integer value in either the range [0, 0xD7FF] or [0xE000, 0x10FFFF], inclusive. Thus, lifting and lowering simply map between `i32` and `char`:
```
char.lift : [ i32 ] -> [ char ]
char.lower : [ char ] -> [ i32 ]
```
If the `i32` passed to `char.lift` is outside the USV range, the instruction traps. Thus, uses of `char.lower` can statically assume that the produced `i32` is a valid, in-range USV. Even though interface types are generally lazy, the `char.lift` range check is performed eagerly.
Since there can be no dynamic allocations read lazily by `char.lift`, the instruction has no destructor immediate.
While it may initially appear that these instructions are implicitly assuming a 4-byte (UTF-32) encoding, that is not the case. Rather, `char.lift` is to be used after a code point has been decoded from linear memory and `char.lower` is to be used before encoding the code point into linear memory. Thus the containing adapter function is in full control of the encoding used.
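As a sketch of that division of labor (a hypothetical function, assuming the adapter function can load from an aliased core memory as in the later examples): for ASCII-range bytes, UTF-8 decoding is just a one-byte load, after which `char.lift` eagerly checks the USV range:

```wasm
(adapter_func $liftAsciiChar (param i32) (result char)
  (let (local $ptr i32)
    ;; Bytes below 0x80 decode to themselves in UTF-8; a full decoder
    ;; would instead branch on the leading byte before lifting.
    (return (char.lift (i32.load8_u (local.get $ptr)))))
)
```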
More interesting than how individual characters are passed between modules, though, is how whole strings of characters are passed. Doing this requires passing a `(list char)` and using the list lifting and lowering instructions which are introduced next.
Since lists contain interface-typed elements, list lifting and lowering instructions must recursively define how to lift and lower their elements. Lifting is performed by the `list.lift` instruction:
```
list.lift $List $done $liftElem $destructor? : [ T:<valtype>* ] -> [ $List ]
```

where:
- `$List` refers to a `(list E:<intertype>)`
- `$done : [ T* ] -> [ done:i32, U* ]`
- `$liftElem : [ U* ] -> [ E, T* ]`
- `$destructor : [ T* ] -> []`, if present
To produce a `$List`, `list.lift` will repeatedly perform the following:
- Call `$done`, to determine whether the iteration is complete.
- Call `$liftElem`, to lift the next element to append to the list.
The `T*` and `U*` tuples define the loop-carried state of the iteration and can be used to hold indices, pointers or general data structure state. The loop-carried state is passed as follows:
- The initial `T*` operands passed to `list.lift` are passed to the first call to `$done`.
- The `U*` results of `$done` are passed to `$liftElem` (if `$done` doesn't return a nonzero `done`).
- The `T*` results of `$liftElem` are fed into the next call to `$done`.
With this design, `list.lift` allows a wide variety of list representations to be lifted directly without any preparatory serialization:
- a homogeneous array, with `$liftElem` incrementing an index or pointer;
- a linked list, with `$liftElem` following a pointer to the next node;
- a `string`, with an arbitrary variable-length encoding of its `char` elements;
- an ordered tree, with `$liftElem` pushing and popping a stack of parents;
- a list comprehension, with `$liftElem` calling into arbitrary user-defined code.
List lowering is somewhat simpler, since `$lowerElem` is simply called once for each element of the incoming list:

```
list.lower $List $lowerElem : [ T:<valtype>*, $List ] -> [ T* ]
```

where:
- `$List` refers to a `(list E:<intertype>)`
- `$lowerElem : [ E, T* ] -> [ T* ]`
As with `list.lift`, `T*` describes the loop-carried state, with the `T*` values being threaded through each invocation of `$lowerElem`. By having `T*` in both the inputs and outputs of `list.lower`, an adapter function can lower into a wide variety of list implementations, including those with up-front allocation and incremental (re)allocation during lowering.
Since interface values are computed lazily, `list.lift` is evaluated lazily. This means that, when `list.lift` is executed in program order, it semantically produces a tuple containing the `list.lift` instruction (with its immediates) and a copy of the `T*` operands. It is only when a `list` is consumed by `list.lower` that `list.lift` will start calling `$liftElem`. Thus, any linear memory allocations read by `$liftElem` must be kept valid until `list.lower` completes. If needed, `list.lift`'s `$destructor` immediate can be used to promptly release dynamic allocations once the list is popped.
As an example usage, a `(list s32)` can be lifted from a dynamically-allocated contiguous array as follows:
```wasm
(adapter_func $done (param i32 i32) (result i32 i32 i32)
  (let (local $ptr i32) (local $end i32)
    (return (i32.eq (local.get $ptr) (local.get $end))
            (local.get $ptr)
            (local.get $end)))
)
(adapter_func $liftElem (param i32 i32) (result s32 i32 i32)
  (let (local $ptr i32) (local $end i32)
    (return (s32.lift_i32 (i32.load (local.get $ptr)))
            (i32.add (local.get $ptr) (i32.const 4))
            (local.get $end)))
)
(adapter_func $free (param i32 i32)
  (let (local $ptr i32) (local $end i32)
    (call $libc.$free (local.get $ptr)))
)
(adapter_func $liftArray (param i32 i32) (result (list s32))
  (let (local $ptr i32) (local $end i32)
    (return (list.lift (list s32) $done $liftElem $free (local.get $ptr) (local.get $end))))
)
```
and lowered into a linked list of `i32`s as follows:
```wasm
(adapter_func $lowerElem (param s32 i32) (result i32)
  (let (param s32) (local $prevNext i32) (result i32)
    i32.lower_s32
    (call $core.$malloc (i32.const 8))
    (let (local $srcVal i32) (local $dst i32)
      (i32.store (local.get $dst) (local.get $srcVal))
      (i32.store (local.get $prevNext) (local.get $dst))
      (i32.add (local.get $dst) (i32.const 4))))
)
(adapter_func $lowerLinkedList (param (list s32)) (result i32)
  (call $core.$malloc (i32.const 4))
  (let (param (list s32)) (local $container i32)
    (list.lower (list s32) $lowerElem (local.get $container))
    (i32.store (i32.const 0))
    (return (local.get $container)))
)
```
If `$liftArray` is called in adapter module A to produce a `(list s32)` value that is passed to `$lowerLinkedList` in adapter module B, the net result will be to convert from A's array representation into B's linked-list representation and then free A's memory.
While the above lifting and lowering instructions allow a wide variety of list representations, this generality comes at the cost of performance. To see why, consider what would happen if we wanted to instead lower the above `(list s32)` into a contiguous array. Because we don't know the size up-front, we'll need some sort of dynamic reallocation strategy. While the cost of a geometric reallocation strategy is amortized O(n), in practical terms, it will still be slower than if we could have simply allocated the required memory up-front. However, requiring all lists to supply a length up-front would be problematic for various list representations like generators, iterators and strings.
Thus, the following two instructions are added to give list producers the option of supplying an up-front element count:
list.lift_count $List $liftElem $destructor? : [ T*, count:i32 ] -> [ $List ]
where:
- $List refers to a (list E:&lt;intertype&gt;)
- $liftElem : [ T* ] -> [ E, T* ]
- $destructor : [ T* ] -> [], if present
Because count
defines the number of iterations up-front, the $done
function
immediate isn't necessary and the entire instruction gets much simpler. For
example, the previous $liftArray
adapter function could be rewritten more
succinctly as:
(adapter_func $liftElem (param i32) (result s32 i32)
(let (local $ptr i32)
(return (s32.lift_i32 (i32.load (local.get $ptr)))
(i32.add (local.get $ptr) (i32.const 4))))
)
(adapter_func $liftArray (param i32 i32) (result (list s32))
(let (local $ptr i32) (local $end i32)
(return (list.lift_count (list s32) $liftElem
(local.get $ptr)
(i32.shr_u (i32.sub (local.get $end) (local.get $ptr)) (i32.const 2)))))
)
On the consumer end, list
consumers cannot depend on list.lift_count
. Rather,
consumers can query whether a count is present with list.has_count
:
list.has_count : [ (list T) ] -> [ (list T), maybe_count:i32, condition:i32 ]
If condition
is nonzero, then maybe_count
contains the count
supplied to
list.lift_count
. Note that, by passing the (list T)
through unmodified,
list.has_count
does not "consume" the list from the perspective of the
affine type rules.
Using list.has_count
, a (list s32)
can be more-efficiently lowered into a
contiguous array as follows:
(adapter_func $lowerElem (param s32 i32) (result i32)
(let (param s32) (local $dst i32)
(i32.store (local.get $dst) (i32.lower_s32))
(return (i32.add (local.get $dst) (i32.const 4))))
)
(adapter_func $lowerArray (param (list s32)) (result i32 i32)
list.has_count
if (param (list s32) i32) (result i32 i32)
(let (param (list s32)) (local $count i32)
(call $core.$malloc (i32.mul (local.get $count) (i32.const 4)))
(let (param (list s32)) (local $dst i32)
(list.lower (list s32) $lowerElem (local.get $dst))
(return (local.get $dst) (local.get $count))))
else
drop ;; pop maybe_count
... ;; use a dynamic reallocation strategy with list.lower
end
)
Here, the then-branch takes advantage of the i32
$count
produced by
list.has_count
to pre-allocate the $dst
array. Without a count, the
else-branch must use a dynamic reallocation strategy (which is elided here for
brevity, but shown in full in the end-to-end example below).
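The shape of this two-path lowering can be sketched in Python (illustrative only; `lower_array` and its arguments are stand-ins, not proposal API):

```python
# Sketch: the two lowering paths selected by list.has_count, modeled over a
# plain Python iterable standing in for the lazy list value.
def lower_array(elements, count=None):
    if count is not None:                  # then-branch: allocate exactly once
        buf = [None] * count               # models $core.$malloc(count * 4)
        for i, v in enumerate(elements):
            buf[i] = v                     # models $lowerElem storing at $dst
        return buf
    buf = []                               # else-branch: dynamic reallocation
    for v in elements:
        buf.append(v)                      # amortized-O(n) growth
    return buf
```

Either path yields the same contents; the count merely lets the consumer skip the reallocation machinery.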
There is one remaining performance-critical case that needs to be addressed:
strings. Recalling above, string
is an abbreviation for
(list char)
and thus, to lift or lower a string
, adapter functions can use
a combination of list.lift
+char.lift
or list.lower
+char.lower
. This
handles the general case of differing producer/consumer string encodings quite
well with the net result being a direct string transcoding. However, in the
common case where the producer and consumer have the same representation, the
transcoding will be suboptimal because strings in most cases will not be able
to use list.lift_count
(whose unit is number-of-USVs). Thus, strings will
need to use a dynamic reallocation strategy despite the fact that the
destination's allocation size could have been simply taken from the source's
allocation size.
To motivate the solution, let's first consider another desirable optimization:
when a list
is passed to a host API (c.f. the Web API
and WASI use cases), we'd like
to define a single, canonical representation of the list
in linear memory to
which the host can align its own internal representation such that, when the
canonical representation is used, no copying is necessary. In theory,
a sufficiently-smart compiler could pattern-match the $liftElem
code of a
list.lift
to detect when a zero-copy-compatible representation was used, but
such an optimization would be very brittle.
To achieve both these optimizations, one last set of optional instructions is added:
list.lift_canon $List <memidx>? $destructor? : [ T*, offset:i32, byte_length:i32 ] -> [ $List ]
where:
- $List is a (list E), and E is a scalar type (float, integral or char)
- $destructor : [ T* ] -> [], if present
list.is_canon : [ (list E) ] -> [ (list E), maybe_byte_length:i32, condition:i32 ]
- $List is a (list E), and E is a scalar type (float, integral or char)
list.lower_canon <memidx>? : [ offset:i32, (list E) ] -> []
These instructions work as follows:
list.lift_canon
produces a list from the given<memidx>
(defaulting to memory 0) at the givenoffset
, readingbyte_length
bytes, calling the given$destructor
when the list is popped.list.is_canon
queries whether the given(list E)
was lifted bylist.lift_canon
and, if so, returns the originalbyte_length
.list.lower_canon
writes the given list into the given<memidx>
(defaulting to memory 0) at the givenoffset
.
An example usage of these instructions is given below.
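As a rough, non-normative mental model, the canonical path amounts to a lazy view over the producer's bytes plus a single contiguous copy. A Python sketch, with byte arrays standing in for the two linear memories and hypothetical helper names:

```python
# Sketch: the canonical-list fast path modeled as a byte-slice copy between
# two bytearrays standing in for producer and consumer linear memories.
producer_mem = bytearray(b"\x00" * 4 + b"abcdefgh")
consumer_mem = bytearray(16)

def lift_canon(mem, offset, byte_length):
    return ("canon", mem, offset, byte_length)      # lazy view, no copy yet

def is_canon(lst):
    # returns (list, maybe_byte_length, condition) like list.is_canon
    return (lst, lst[3], 1) if lst[0] == "canon" else (lst, 0, 0)

def lower_canon(mem, offset, lst):
    _, src, src_off, n = lst
    mem[offset:offset + n] = src[src_off:src_off + n]   # a single memcpy

lst = lift_canon(producer_mem, 4, 8)
lst, byte_length, cond = is_canon(lst)
lower_canon(consumer_mem, 2, lst)
```

The key point is that lifting allocates nothing; the only O(n) work is the one copy performed by `lower_canon`.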
As part of the specification of these instructions, a canonical representation
of $List
is precisely specified. While in theory a canonical representation
could be defined for all kinds of lists, this raises interesting questions in
the case of compound element types (lists of lists, variants and records).
Thus, the proposal starts by conservatively only allowing the "scalar" types
(numbers and characters), reserving the option to allow the compound types at a
later time.
While the canonical representations of all the numeric types are obvious,
due to their fixed-power-of-2 sizes, char
requires the proposal to choose
an arbitrary character encoding. To match the core wasm spec's choice of UTF-8,
and the more general trend of "UTF-8 Everywhere", this
proposal also chooses UTF-8. By choosing a canonical string encoding happy path
while providing a graceful fallback to efficient transcoding, Interface Types
provides a gentle pressure to eventually converge without performance cliffs in
the meantime.
It should be noted that, while many languages are permanently tied to
potentially ill-formed UTF-16, a common optimization among these languages'
runtimes is to represent strings as a union of either a full two-byte
WTF-16 code unit or a one-byte representation when the string doesn't
contain any code points outside the one-byte-representable range. While this
one-byte representation today is commonly Latin-1, which is not
UTF-8-compatible, pure 7-bit ASCII, which is UTF-8-compatible, could
instead be used, allowing the use of list.lift_canon
for all strings
containing only ASCII.
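This compatibility claim is easy to check: every 7-bit ASCII byte sequence is valid UTF-8, while Latin-1 bytes above 0x7F are not. A small Python demonstration (illustrative only):

```python
# Sketch: why an ASCII one-byte representation could be handed to
# list.lift_canon directly, while a Latin-1 one could not.
ascii_bytes = "hello".encode("ascii")      # pure 7-bit ASCII
latin1_bytes = "héllo".encode("latin-1")   # 0xE9 for 'é'

# Every ASCII byte sequence is already valid UTF-8...
assert ascii_bytes.decode("utf-8") == "hello"

# ...but Latin-1 bytes above 0x7F are not valid UTF-8 on their own:
try:
    latin1_bytes.decode("utf-8")
    utf8_compatible = True
except UnicodeDecodeError:
    utf8_compatible = False
```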
Like lists, records are compound types that contain interface-typed fields that
are recursively lifted and lowered. The signature of record.lift
is:
record.lift $Record $liftFields $destructor? : [ T:<valtype>* ] -> [ $Record ]
where:
- $Record refers to a (record (field <name> <id>? F:<intertype>)*)
- $liftFields : [ T* ] -> [ F* ]
- $destructor : [ T* ] -> [], if present
As with list.lift
, the T*
tuple captures the core wasm state passed into
$liftFields
to lift the individual fields and $destructor
when the lifted
record is popped. Record lowering is symmetric:
record.lower $Record $lowerFields : [ T:<valtype>*, $Record ] -> [ U:<valtype>* ]
where
- $Record refers to a (record (field <name> <id>? F:<intertype>)*)
- $lowerFields : [ T*, F* ] -> [ U* ]
Having separate T*
inputs and U*
outputs gives record.lower
maximum
flexibility to allocate before or during $lowerFields
.
As with list
, record
has a lazy interpretation and thus record.lift
is
evaluated lazily. Thus, any linear memory allocations read by $liftFields
must be kept valid until record.lower
, using $destructor
to free the
allocation thereafter, if necessary.
As an example, given the following type definition of a record type:
(type $Coord (record (field "x" s32) (field "y" s32)))
a $Coord
value can be lifted from a C struct
containing two i32
s in
linear memory by the following adapter functions:
(adapter_func $liftFields (param i32) (result s32 s32)
(let (local $ptr i32) (result s32 s32)
(s32.lift_i32 (i32.load (local.get $ptr)))
(s32.lift_i32 (i32.load offset=4 (local.get $ptr))))
)
(adapter_func $liftCoord (param i32) (result $Coord)
record.lift $Coord $liftFields
)
and lowered into a different linear memory representation, for example, as two
i64
s in reverse order:
(adapter_func $lowerFields (param i32 s32 s32)
rotate 2 ;; move the i32 to the top of the stack
(let (param s32 s32) (local $ptr i32)
(i64.store (local.get $ptr) (i64.lower_s32))
(i64.store offset=8 (local.get $ptr) (i64.lower_s32)))
)
(adapter_func $lowerCoord (param i32 $Coord)
record.lower $Coord $lowerFields
)
When passing the result of $liftCoord
to $lowerCoord
, the net effect will
be to perform a copy that swaps and sign-extends.
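The net effect of this lift/lower pair can be modeled in Python, with `bytes` values standing in for the source and destination linear memories (the function name is a stand-in, not proposal API):

```python
import struct

# Sketch: the net effect of $liftCoord followed by $lowerCoord, modeled on
# byte buffers standing in for the two modules' linear memories.
def lift_then_lower_coord(src: bytes) -> bytes:
    # $liftFields: read two signed i32s ("x", then "y") from the source struct
    x, y = struct.unpack("<ii", src)
    # $lowerFields: store them as two i64s in reverse order (sign-extending)
    return struct.pack("<qq", y, x)

src = struct.pack("<ii", -2, 7)   # x = -2, y = 7
dst = lift_then_lower_coord(src)
```

Unpacking `dst` as two signed i64s yields `(7, -2)`: the fields are swapped and the negative value is sign-extended.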
Similar to the other compound types, variants recursively lift and lower their
cases' payloads' interface types. The signature of variant.lift
is:
variant.lift $Variant $case $liftCase? $destructor? : [ T:<valtype>* ] -> [ $Variant ]
where
- $Variant refers to a (variant (case <name> <id>? C:<intertype>?)*)
- $liftCase : [ T* ] -> [ C[$case] ] iff C[$case] is present
- $destructor : [ T* ] -> [], if present
By lifting only a single case, the responsibility for branching to select the
case is left up to the containing adapter function which can use core control
instructions like br_if
and br_table
.
In contrast, the variant.lower
instruction takes control of the dynamic
branching itself by taking one lowering function per case:
variant.lower $Variant $lowerCase* : [ T:<valtype>*, $Variant ] -> [ U:<valtype>* ]
where
- $Variant refers to a (variant (case <name> <id>? C:<intertype>?)*)
- for each case: $lowerCase : [ T*, C? ] -> [ U* ]
Having separate T*
inputs and U*
outputs gives variant.lower
maximum
flexibility to allocate before or during $lowerCase
.
As with list
and record
, variant
has a lazy interpretation and thus
variant.lift
is evaluated lazily. Thus, any linear memory allocations read by
$liftCase
must be kept valid until variant.lower
, using $destructor
to
optionally free the allocation when the variant value is popped.
As an example, given the following type definition of a variant type:
(type $MaybeAge (variant (case "has_age" $hasAge u8) (case "no_age" $noAge)))
a $MaybeAge
value can be lifted from a representation that is either null
or a pointer to an allocated object that must be freed after its contents
are read:
(adapter_func $free (param i32)
call $core.$free
)
(adapter_func $liftHasAge (param i32) (result u8)
i32.load
u8.lift_i32
)
(adapter_func $liftMaybeAge (param i32) (result $MaybeAge)
(let (local $ptr i32)
(if (result $MaybeAge) (i32.eqz (local.get $ptr))
(then (variant.lift $MaybeAge $hasAge $liftHasAge $free (local.get $ptr)))
(else (variant.lift $MaybeAge $noAge))))
)
and lowered into a packed-word encoding that uses a sentinel value of -1
to
represent the no_age
case:
(adapter_func $lowerHasAge (param u8) (result i32)
i32.lower_u8
)
(adapter_func $lowerNoAge (result i32)
i32.const -1
)
(adapter_func $lowerMaybeAge (param $MaybeAge) (result i32)
variant.lower $MaybeAge $lowerHasAge $lowerNoAge
)
When passing the result of $liftMaybeAge
to $lowerMaybeAge
, the net effect
will be to convert from the null-or-object encoding to the packed encoding and
then free the object in the non-null case.
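The same net effect can be sketched in Python, with a dict standing in for linear memory and a list recording destructor calls (all names are stand-ins, not proposal API):

```python
# Sketch: the null-or-pointer source representation flowing into the packed
# sentinel encoding used by $lowerMaybeAge.
heap = {0x100: 42}        # hypothetical linear memory: one allocated age cell
freed = []

def lift_then_lower_maybe_age(ptr: int) -> int:
    if ptr == 0:
        return -1                      # $lowerNoAge: sentinel for "no_age"
    age = heap[ptr]                    # $liftHasAge: i32.load + u8.lift_i32
    freed.append(ptr)                  # $free destructor runs when popped
    return age                         # $lowerHasAge: i32.lower_u8
```

Note how the destructor fires only on the non-null path, exactly when the lifted payload is consumed.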
Having introduced the building blocks, we can now look at a more-complete
example in which module $A
exports a function returning a (list u8)
that is
imported and called by module $B
. This example also demonstrates the ability
to share libc
code between $A
and $B
without sharing libc
state.
$A
is fairly straightforward, importing libc
as module so that a private
$libc
instance can be created and imported by the nested core module.
(This follows the general pattern of the dynamic linking example.)
(adapter_module $A
(type $Libc (instance
(export "memory" $memory (memory 1))
(export "free" $free (func (param i32)))
))
(import "libc" (module $LIBC (export $Libc)))
(instance $libc (instantiate $LIBC))
(alias (memory $libc $memory)) ;; make $libc.$memory the default memory
(module $CORE_A
(import "libc" (instance (type $Libc)))
(func $getBytes (export "get_bytes") (result i32 i32)
;; core code, returning (ptr, length) of allocated std::vector
)
)
(instance $core (instantiate $CORE_A (instance $libc)))
(adapter_func $freeVector (param i32 i32)
drop ;; drop the byte length
call $core.$free
)
(adapter_func $getBytes (export "get_bytes") (result (list u8))
call $core.$getBytes
list.lift_canon (list u8) $freeVector
)
)
Since $core.$getBytes
returns ownership of the internal buffer of a
std::vector
, a destructor ($freeVector
) is used to free the buffer when the
list
is popped. Without the destructor, get_bytes
would have no way to know
when it could release the std::vector
memory returned by $core.$getBytes
.
$B
is a bit more complicated: list lowering must handle both the
optimized and the general case, and import adapters require extra plumbing to
work around the lack of cyclic imports.
(adapter_module $B
(type $Libc (instance
(export "memory" $memory (memory 1))
(export "malloc" $malloc (func (param i32) (result i32)))
(export "free" $free (func (param i32)))
(export "realloc" $realloc (func (param i32) (param i32) (result i32)))
))
(import "libc" (module $LIBC (export $Libc)))
(instance $libc (instantiate $LIBC))
(import "./A.wasm" (adapter_module $A
(import "libc" (module (export $Libc)))
(adapter_func (export "get_bytes") $getBytes (result (list u8)))
))
(adapter_instance $a (instantiate $A (module $LIBC)))
(adapter_module $ADAPTER
(import "get_bytes" (adapter_func $originalGetBytes (result (list u8))))
(import "libc" (instance $libc (type $Libc)))
(alias (memory $libc $memory)) ;; make $libc.$memory the default memory
(adapter_func $growingLowerByte (param u8 i32 i32 i32) (result i32 i32 i32)
(let (param u8) (result i32 i32 i32)
(local $dstPtr i32) (local $length i32) (local $capacity i32)
(if (i32.eq (local.get $length) (local.get $capacity))
(then
(local.set $capacity (i32.mul (local.get $capacity) (i32.const 2)))
(local.set $dstPtr (call $libc.$realloc (local.get $dstPtr) (local.get $capacity)))))
(i32.store8 (local.get $dstPtr) (i32.lower_u8))
(i32.add (local.get $dstPtr) (i32.const 1))
(i32.add (local.get $length) (i32.const 1))
(local.get $capacity))
)
(adapter_func $getBytes (export "get_bytes") (result i32 i32)
(list.is_canon (call_adapter $originalGetBytes))
if (param (list u8) i32) (result i32 i32)
(let (result i32 i32) (local $byteLength i32)
(call $libc.$malloc (local.get $byteLength))
(let (result i32 i32) (local $dstPtr i32)
(list.lower_canon (list u8) (local.get $dstPtr))
local.get $dstPtr
local.get $byteLength))
else
drop ;; pop maybe_byte_length
(list.lower (list u8) $growingLowerByte
(call $libc.$malloc (i32.const 8)) (i32.const 0) (i32.const 8))
drop ;; pop capacity, leave [ptr, length] pushed
end
)
)
(adapter_instance $adapter (instantiate $ADAPTER (instance $libc) (adapter_func $a.$getBytes)))
(module $CORE_B
(import "libc" (instance (type $Libc)))
(import "get_bytes" (func (result i32 i32)))
;; core code
)
(instance $core (instantiate $CORE_B (instance $libc) (adapter_func $adapter.$getBytes)))
)
Another thing that this example illustrates is the complementary use of
"shared-everything" and "shared-nothing" linking described in the
Additional Requirements section. In particular,
$LIBC
is a shared-everything library that factors out the common code from
the two shared-nothing modules $A
and $B
. Note that only the stateless
code of $LIBC
is shared; $A
and $B
get their own private $LIBC
instances. Thus, from all external appearances, $A
and $B
produce
shared-nothing instances despite using shared-everything linking as an
implementation detail.
While the lazy evaluation scheme for lifting instructions and interface values described above avoids intermediate O(n) copies, we might worry that it comes at the cost of extra runtime implementation complexity and performance overhead. For example, to represent its lazy values, Haskell uses thunks and a runtime implementation oriented around thunks.
However, due to the combination of:
- affine typing restrictions, which ensure that interface values are consumed at most once,
- loop-parameter restrictions, which ensure that interface values only flow forward, and
- adapter calling restrictions, which ensure that all adapter function calls are direct and non-recursive,
the implementation of adapter function laziness can be greatly simplified: at instantiation time, when imports are known, and thus the complete adapter function callgraph is known, it is possible to compile all adapter functions and interface types down to core functions and core types. With this compilation scheme, called "adapter fusion", both runtime complexity and performance overhead are avoided.
A high-level sketch of the adapter fusion algorithm is as follows. This
process is described in terms of a single adapter function (henceforth called
the "root") that has been imported by a core instance (via adapter_function
operand to instantiate
). Thus, this algorithm is expected to be performed
repeatedly, once per imported adapter function.
- Adapter functions are recursively inlined into the root until no
call_adapter
instructions remain. - For each operand of each lifting instruction, synthesize a new
local
of the operand's type in the root. (Note: lifting instructions only have core operand types.) - Give each use of a lifting instruction a unique integer id.
- Rewrite each lifting instruction in-place into:
- for each operand, a
local.set
into thelocal
synthesized by step 2. - an
i32.const
that pushes the unique integer id assigned by step 3.
- for each operand, a
- Rewrite all interface types to
i32
(which, due to step 4, is now the core type of all lifting instructions). - Compute the Reaching Definitions for all lowering instructions. (Due to validation, lowering instructions will be transitively reached by zero or more type-compatible lifting instructions.)
- Rewrite each lowering instruction in-place into a
br_table
that switches on all reachingi32
lifting instruction identifiers (pushed by step 4.2 and discovered by step 6), with one case per lifting instruction containing:- for each operand of the lifting instruction's function immediate (i.e.,
$liftElem
/$liftFields
/$liftCase
), alocal.get
from the local synthesized by step 2 and initialized by step 4.1. - an inline fusion of the lifting instruction's function immediate and the lowering instruction's function immediate produced by recursively invoking this algorithm on the composition of the two.
- for each operand of the lifting instruction's function immediate (i.e.,
- At each point where an interface value is popped (lowering instructions,
drop
instructions and control flow instructions), emit abr_table
that switches on all reachingi32
lifting instruction identifiers, where each case body contains a call to the corresponding lifting instruction$destructor
, passing the closure state captured by step 4.2.
Thus, the algorithm erases almost all the dynamism of laziness by aggressively
inlining. Even the one remaining source of dynamism (the br_table
s) will
often be optimized away when a lowering instruction is reached by a single
lifting instruction (as is the case in the example below). For normal code,
this compilation strategy would risk significant code bloat, but the role of
adapter code is to be a thin layer between core modules. If code bloat does
become a problem, there is a spectrum of less-aggressively-specializing
compilation strategies available. In the limit, no inlining need be performed;
lazy values can be boxed into tuples holding function references.
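A toy Python model of steps 4 and 7 may help: each rewritten lifting site saves its operands into synthesized locals and returns an integer id, and each rewritten lowering site dispatches on the reaching ids, as the br_table would (the names and the two example representations are hypothetical):

```python
# Sketch: a toy model of adapter fusion's steps 4 and 7. Lifting sites become
# "store operands into locals, push an id"; lowering sites become a switch
# (the br_table) over the reaching lift ids.
locals_ = {}                           # locals synthesized by step 2, by id

def lift_array(ptr, length):           # rewritten lifting instruction
    locals_[1] = (ptr, length)         # step 4.1: local.set of each operand
    return 1                           # step 4.2: i32.const pushing the id

def lift_iterator(handle):             # a second lifting site, id 2
    locals_[2] = (handle,)
    return 2

def lower_list(lift_id):               # rewritten lowering instruction
    if lift_id == 1:                   # case for lift_array: fused copy loop
        ptr, length = locals_[1]
        return ("copied", ptr, length)
    elif lift_id == 2:                 # case for lift_iterator: fused iteration
        (handle,) = locals_[2]
        return ("iterated", handle)
```

When only one lift reaches a lower, the "switch" is a single known case and disappears entirely, which is the common situation the text describes.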
As a demonstration, the adapter modules $A
and $B
shown in the
last section would be fused into the $fused_root
function shown below. The containing module would also be created according to
similarly-automatic fusion rules. Note that the nested modules $CORE_A
and
$CORE_B
are identical to those originally nested in $A
and $B
; the fusion
algorithm only operates on adapter modules and leaves core modules untouched.
(module
(type $Libc (instance
(export "memory" $memory (memory 1))
(export "malloc" $malloc (func (param i32) (result i32)))
(export "free" $free (func (param i32)))
(export "realloc" $realloc (func (param i32) (result i32)))
))
(import "libc" (module $LIBC (export $Libc)))
(instance $libc_a (instantiate $LIBC))
(module $CORE_A
(import "libc" (instance (type $Libc)))
(func (export "get_bytes") (result i32 i32)
;; core code
)
)
(instance $core_a (instantiate $CORE_A (instance $libc_a)))
(module $GLUE
(import "" (func $core_a_get_bytes (result i32 i32)))
(import "libc_a" (instance $libc_a (type $Libc)))
(import "libc_b" (instance $libc_b (type $Libc)))
(func $fused_root (export "get_bytes") (result i32 i32)
(local $list_lift_ptr i32) ;; added by step 2 for list.lift_canon
(local $list_lift_len i32) ;; added by step 2 for list.lift_canon
;; $A.$getBytes:
call $core_a_get_bytes
local.set $list_lift_len ;; rewritten from list.lift_canon by step 4.1
local.set $list_lift_ptr ;; rewritten from list.lift_canon by step 4.1
;; i32.const added by step 4.2 eliminated as dead code
;; $B.$getBytes_:
;; list.is_canon is a constant expression because only one reaching lift
local.get $list_lift_len
;; `if` now has a constant expression, so the `if` and `else` are removed
let (result i32 i32) (local $byteLength i32)
(call $libc_b.$malloc (local.get $byteLength))
let (result i32 i32) (local $dstPtr i32)
;; emitted by step 7.2 for matching list.lift_canon/lower_canon
(memory.copy $libc_a.$memory $libc_b.$memory ;; using multi-memory proposal
(local.get $dstPtr)
(local.get $list_lift_ptr)
(local.get $byteLength))
;; emitted by step 8: inline $A.$freeVector
(call $libc_a.$free (local.get $list_lift_ptr))
;; end of $B.$getBytes then-branch
local.get $dstPtr
local.get $list_lift_len
end
end
)
)
(instance $libc_b (instantiate $LIBC))
(instance $glue (instantiate $GLUE (instance $libc_a) (instance $libc_b) (func $core_a.$getBytes)))
(module $CORE_B
(import "libc" (instance (type $Libc)))
(import "get_bytes" (func (result i32 i32)))
;; core code
)
(instance $core_b (instantiate $CORE_B (instance $libc_b) (func $glue.$fused_root)))
)
As demonstrated by this example, fusion is able to produce highly optimized core wasm code when both modules use the "canonical" representation of lists. However, even non-canonical lists will produce a single fused loop that simultaneously iterates over the source and destination, allowing direct copy without any intermediate buffer. In the case of nested lists, the nested lifting and lowering loops will also be fused so that there is never an intermediate copy.
In summary, adapter fusion converts multiple adapter modules which each
respectively encapsulate their own memory into a single fused core module that
contains all the memories (using multi-memory) and fused functions that
directly copy between memories. Thus, fusion can serve as an implementation
technique for Interface Types, wrapping the existing core wasm spec's
module_instantiate
procedure with a new adapter_instantiate
procedure that
performs adapter fusion and then feeds the resulting core module into
module_instantiate
. Moreover, if imports are known at build time (e.g., by
Webpack), then fusion can occur at build time, ultimately requiring no native
wasm engine support.
We now consider how this proposal can be used to address the use cases given above.
With this proposal, a WASI interface can be defined purely in terms of an
adapter-module type. From this type, written as a
.wit
/.witx
file, each source-language's toolchain can automatically
generate source-language declarations and adapter functions.
For example, the signature of fd_pwrite
could be defined as:
(adapter_instance
(type $FD (export "fd"))
(type $Errno (enum "acces" "badf" "busy" ...))
(adapter_func $fd_pwrite (export "fd_pwrite")
(param $fd (ref $FD))
(param $iovs (list u8))
(param $offset u64)
(result (expected u32 $Errno)))
)
and from this .wit
type, different source-language declarations and adapter
functions could be generated. For example, a C++ declaration generator might
emit something like:
template <class T> class handle { int table_index_; ... };
enum class errno { acces, badf, busy, ... };
std::expected<uint32_t, errno>
fd_pwrite(handle<FD> fd,
const std::vector<uint8_t>& iovs,
uint64_t offset);
Based on these signatures, the generated adapter function would be:
(adapter_func $lowerOk (param u32) (result i32 i32)
i32.const 0
i32.lower_u32
)
(adapter_func $lowerError (param $Errno) (result i32 i32)
i32.const 1
call_adapter $lowerErrno ;; $lowerErrno omitted for brevity
)
(adapter_func $fd_pwrite_adapter (param i32 i32 i64) (result i32 i32)
(let (result i32 i32) (local $fd i32) (local $iovs i32) (local $offset i64)
(call_adapter $fd_pwrite
(table.get $fds (local.get $fd))
(list.lift_canon (list u8) (i32.load (local.get $iovs)) (i32.load offset=4 (local.get $iovs)))
(u64.lift_i64 (local.get $offset)))
variant.lower (expected u32 $Errno) $lowerOk $lowerError)
)
where we assume std::vector
's first two words are the pointer and length and
std::expected
is represented as a boolean i32
and a payload i32
that can
be returned via multi-value. Calls in C++ to fd_pwrite
are thus compiled
into calls to the imported $fd_pwrite_adapter
. At instantiation-time, the
wasm engine can compile $fd_pwrite_adapter
into an efficient trampoline
which converts directly from the C++ linear memory representation of
fd_pwrite
's arguments into the callee's representation.
Using interface types, a wasm adapter module can directly call Web APIs, passing compatible high-level values. The process starts when an adapter module is instantiated by either the JS API or ESM-integration. In either case, the instantiate a WebAssembly module spec routine is invoked, passing in a set of imported JS values. This routine specifies how the incoming JS values are coerced to match the module's declared import types. With this proposal, these rules would naturally be extended to consider interface-typed imports:
- if the imported JS value is a built-in function created by Web IDL (e.g., for an operation, getter or setter) and its Web IDL signature "matches" the interface-typed import signature (in the sense described below), then an adapter function is synthesized from the two signatures that converts between interface values and Web IDL values;
- otherwise, the interface values are converted to JS values (as described below) and the Web IDL function is called as a JS function through the ECMAScript Binding.
An important requirement for these two coercion paths is that there should be
little to no observable difference between the two paths, allowing the former
path to be considered an optimization of the latter path. Ensuring 100%
semantic equivalence may not be possible due to JS semantic corner cases (e.g.,
NaN canonicalization of unrestricted double
or prototype-chain walking for
missing dictionary
properties). However, in non-pathological scenarios, this
equivalence must be maintained to ensure that JS virtualization of Web APIs
still works effectively (so that a JS function can polyfill, fix, attenuate,
censor, virtualize, etc. a Web IDL function).
Given that, the following Web IDL/interface types would match:
any
matchesexternref
void
matches the empty function result type- numeric types match numeric interface types
USVString
matchesstring
(these types are isomorphic)DOMString
matchesstring
(using the lossyDOMString
-to-USVString
conversion)ByteString
matchesstring
(using the lossyTextDecoder.decode
conversion)- Dictionary matches
record
- Sequence and Frozen array match
list
boolean
matches thebool
variant abbreviation- Enumeration types match the
enum
variant abbreviation - Nullable types match the
option
variant abbreviation - Union types match the
union
variant abbreviation - Interface types match TODO
ArrayBufferView
types match TODO- Callback types match
funcref
object
,symbol
,Promise
matchexternref
ArrayBuffer
matchesexternref
(rarely used outside ofBufferSource
)- Record matches
externref
(rarely used; not actually arecord
) - Annotated types recursively match the inner type (the annotations add callee-side runtime checks)
As shown by this list, externref
provides an escape hatch for
less-frequently-used or JS/Web-specific types. When externref
is used, an
adapter module can import helper functions for producing and consuming
externref
s. In the future, with type imports, the externref
s could be
replaced by (ref $Import)
s which could eliminate dynamic type checks in
adapter functions at instantiation-time.
To create a maximally-reusable module, a developer would produce an adapter module that exclusively uses interface types in its signature to create a shared-nothing interface. The value semantics of interface types provide the client language significant flexibility in how to coerce to and from its native language values.
For example, the JS API would be extended to provide the following coercions
(in ToJSValue
and ToWebAssemblyValue
):
- Numeric types other than
s64
andu64
convert to and from JS number values. s64
andu64
types convert to and from JS BigInt values.string
converts to and from JS string values (using theDOMString
-to-USVString
conversion for the "from" direction).- If the JS records and tuples proposal progresses, interface
record
andlist
types could convert to and from these new value types. - The
tuple
record abbreviation could also produce a JS tuple value. - The
bool
variant abbreviation converts to and from JS boolean values. - The
enum
variant abbreviation converts to and from JS string values based on theenum
's case labels. - The
option
variant abbreviation converts the JSnull
orundefined
to thenone
case and any other JS value to thesome
case. - The
result
variant abbreviation, when used as the sole result of a function call would produce an exception for theerror
case, and return the payload of theok
case. - The
union
variant abbreviation when converted to JS would convert the payload directly. Converting from a JS value to aunion
is highly ambiguous but could perhaps use the same ad hoc resolution scheme Web IDL'sunion
ECMASCript Binding. - The general
variant
interface type does not have a canonical mapping to JS. TypeScript initially supported tagged unions using an object with a specialkind
property as the discriminant, but this feature was later generalized to User-Defined Type Guards. Forvariant
, coercing to and from an object of the form{kind:'name', value:<payload>}
could potentially work.
Note that the same coercions can be applied both when WebAssembly is embedded in a native JavaScript engine and when the client is JavaScript running as WebAssembly (on a host that does not include a JavaScript engine). In a similar manner, other high-level languages can define their own coercion semantics between interface values.
This proposal has no direct impact on ESM-integration and adapter modules
should Just Work when imported via import
or <script type='module'>
. The
reason is that ESM-integration is responsible for selecting which wasm module
to compile and which JS import values to supply to the instantiate a WebAssembly module
procedure whereas this proposal changes what happens during these respective
steps.
With ESM-integration and Interface Types, WebAssembly is one step closer to enabling WebAssembly modules to have full access to the Web platform without the need for JS "glue code". The remaining capability needed to achieve this goal is something like the get-originals proposal which reflects JS and Web IDL APIs as built-in modules that can be directly imported.
- "outparams" to capture the
read(buffer)
/ typed-array-view use cases (#68) - lifting and lowering between interface types and opaque reference types, allowing zero-copy when used on both sides
- types for describing handles to resources that address resource lifetime (see also #87)
- optional/default-valued record fields or function parameters
- how precisely calls to host functions work
- ability to import/export interface values (as opposed to adapter functions), thereby allowing adapter modules to import configuration data and JSON modules