
The Syscall Experiment

Nevkontakte edited this page Mar 20, 2022 · 24 revisions

Note: this is a draft proposal / experiment. If all goes well, it may become an official GopherJS proposal, but it isn't yet.

Proposal draft

Motivation: https://github.com/gopherjs/gopherjs/issues/693.

  • OS interaction and syscalls are mainly supported via upstream wasm shims over Node APIs.
    • Optional: fallback to the node_syscall extension for other calls, if the extension is loaded.
  • All standard library packages are built with GOOS=js GOARCH=wasm to maximize code reuse with upstream.
  • All user code is built with GOOS=js GOARCH=ecmascript.
  • Build tag gopherjs is always set.
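
To make the effect of these tag sets concrete, here is a small sketch that evaluates a few `//go:build` lines against them with the standard `go/build/constraint` package. The two tag sets are an assumption derived from the bullets above (a real build also sets Go release tags and possibly others), and the `matches` helper is a hypothetical name, not part of GopherJS.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// tagSet is the set of build tags considered satisfied.
type tagSet map[string]bool

// Tag sets implied by the proposal. Assumption: nothing else is set.
var (
	stdlibTags = tagSet{"js": true, "wasm": true, "gopherjs": true}
	userTags   = tagSet{"js": true, "ecmascript": true, "gopherjs": true}
)

// matches reports whether a //go:build line is satisfied by the given tags.
func matches(line string, tags tagSet) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(func(tag string) bool { return tags[tag] }), nil
}

func main() {
	lines := []string{
		"//go:build js && wasm",
		"//go:build gopherjs",
		"//go:build js && !wasm",
		"//go:build linux || darwin",
	}
	for _, line := range lines {
		std, err := matches(line, stdlibTags)
		if err != nil {
			panic(err)
		}
		usr, err := matches(line, userTags)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-28s stdlib=%-5v user=%v\n", line, std, usr)
	}
}
```

Under these assumptions, a file guarded by `js && wasm` is only picked up for standard library packages, while a `gopherjs` guard applies to both standard library and user code.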

Pros

  • The wasm runtime has a lot in common with GopherJS, so we can reuse more upstream code.
  • Build results are independent of the host OS. Users can build code with gopherjs under Linux, Windows or OS X and get the same results (this also improves the support experience).
  • No need to build a C++ extension to do File I/O on NodeJS.

Cons

  • This can be considered a breaking change: build tags change, some syscalls may cease to work.

Open questions

  • How do we version the change? If we are to go to 2.x, we'll have to update all import paths.
  • What do we do with the node_syscall extension?
    • Deprecate and delete it?
    • Use it as a fallback for syscalls that wasm doesn't support?
    • Enable it with build tags or by default?
  • Do we want to support old GOOS/GOARCH combinations?

Experiment checklist

Code: https://github.com/nevkontakte/gopherjs/tree/syscall2

  • PoC: All standard library tests pass with js/wasm build tags.
  • Evaluate existing standard library overlays or skipped tests that can be removed.
    • net, net/http, os, syscall and syscall/js packages now pass tests. The same can probably be done for more packages.
  • [ ] net: Enable fake networking in tests only.
    • Fake network supports localhost connections, so it may be useful. This is now documented instead of being removed.
  • net/http: backport XMLHttpRequest transport. Allow disabling unneeded transports at build time. Make sure DCE works correctly.
    • XMLHttpRequest is used as a fallback if Fetch API is not available.
    • We drop our own implementation of Fetch API transport in favor of the one from standard library.
    • There is potential to reduce generated code size if we disable fallbacks to the default round tripper, but that violates the contract. We can investigate this later.
  • Check fallbacks in the browser (for file operations in particular).
  • Check if callback-based call stacks can be improved.
    • TL;DR: Not really. For details see "Appendix: async call stacks".
  • Disable CGo.
  • Set Compiler to gopherjs, always set gopherjs build tag.
  • Set correct build tags for stdlib and user code.
  • Investigate whether we can maintain some compatibility with the node syscall module for now.
  • Cleanup unnecessary package tweaks in the build system.
  • Cleanup overlays for targets other than js/wasm.
  • Cleanup FIXMEs.
  • Cleanup TestNativesDontImportExtraPackages.
  • [ ] Filter runtime stack trace up to goroutine scheduler.
    • This is out of scope for this change, since it was the case prior to it already.
  • Test builds under Windows.
  • Add a CircleCI smoke test for Windows and OS X.
  • [ ] Set up stdlib tests under old and new GOOS/GOARCH.
    • No longer relevant, since we pinned stdlib to js/wasm.
  • Make a comprehensive list of things that are fixed or broken by the change.

Appendix: async call stacks.

The stdlib wasm implementation uses the NodeJS callback-based FS API to implement file system access. In order to turn it into blocking Go calls, it uses the following wrapper (simplified):

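func fsCall(name string, args ...interface{}) (js.Value, error) {
	// Buffered so that the callback never blocks on send.
	c := make(chan js.Value, 1)
	f := js.FuncOf(func(this js.Value, args []js.Value) interface{} {
		// Node-style callbacks receive (err, result); error handling is omitted here.
		c <- args[1]
		return nil
	})
	defer f.Release()
	// The callback is passed as the last argument of the FS function.
	jsFS.Call(name, append(args, f)...)
	return <-c, nil // Block until the callback fires.
}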

Note that the callback that passes the result into the channel c executes in a separate goroutine (sort of), which receives control from the NodeJS runtime. That, in turn, passes control back to whichever goroutine made the file system call. Consider the following program:

package main

import (
	"os"
	"runtime/debug"
)

func main() {
	_, err := os.CreateTemp(os.TempDir(), "gopherjs_stack") // The file itself is unused; only the FS call matters.
	if err != nil {
		panic(err)
	}
	debug.PrintStack()
}

Under wasm it would print the following call stack:

$ GOOS=js GOARCH=wasm go run -exec /usr/local/go/misc/wasm/go_js_wasm_exec main.go
goroutine 1 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x6
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x2
main.main()
        /home/aleks/git/repro/029-channel-stack/main.go:13 +0xd

Under GopherJS, using the node-syscall extension for FS access, it prints the following:

$ (cd ~/git/gopherjs; go install --tags=gopherjsdev .) && gopherjs run main.go
    at Object.Stack (/runtime/gopherjs__runtime.go:398:3)
    at Stack (/runtime/debug/stack.go:24:4)
    at Object.PrintStack (/runtime/debug/stack.go:16:3)
    at main (/home/aleks/git/repro/029-channel-stack/main.go:13:3)
    at $init (/home/aleks/git/repro/029-channel-stack/main.go.3412395971:16200:9)
    at $goroutine (/home/aleks/git/repro/029-channel-stack/main.go.3412395971:1601:19)
    at $runScheduled (/home/aleks/git/repro/029-channel-stack/main.go.3412395971:1647:7)
    at $schedule (/home/aleks/git/repro/029-channel-stack/main.go.3412395971:1671:5)
    at $go (/home/aleks/git/repro/029-channel-stack/main.go.3412395971:1633:3)
    at Object.<anonymous> (/home/aleks/git/repro/029-channel-stack/main.go.3412395971:16212:1)
    at Object.<anonymous> (/home/aleks/git/repro/029-channel-stack/main.go.3412395971:16215:4)
    at Module._compile (internal/modules/cjs/loader.js:999:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)
    at internal/main/run_main_module.js:17:47

Notably, this call stack is a lot longer than the wasm one and shows many NodeJS and GopherJS runtime functions (e.g. $runScheduled). This is fine, since, unlike wasm, GopherJS doesn't have a separate call stack. However, the call stack here matches the intuitive expectation of what it would be if the main() function were executed synchronously. That is because it is, in fact, executed synchronously. In particular, we can see that the Go main() function received control (through several runtime functions) from Node's Module._compile().

This changes if we adopt the stdlib approach of using the callback-based API:

$ (cd ~/git/gopherjs; go install --tags=gopherjsdev .) && gopherjs run main.go
    at Object.Stack (/runtime/gopherjs__runtime.go:397:3)
    at Stack (/runtime/debug/stack.go:24:4)
    at Object.PrintStack (/runtime/debug/stack.go:16:3)
    at Object.main (/home/aleks/git/repro/029-channel-stack/main.go:13:3)
    at Object.$init [as $blk] (/home/aleks/git/repro/029-channel-stack/main.go.2386950363:14884:68)
    at fun (/home/aleks/git/repro/029-channel-stack/main.go.2386950363:1636:37)
    at $goroutine (/home/aleks/git/repro/029-channel-stack/main.go.2386950363:1634:19)
    at $runScheduled (/home/aleks/git/repro/029-channel-stack/main.go.2386950363:1680:7)
    at $schedule (/home/aleks/git/repro/029-channel-stack/main.go.2386950363:1704:5)
    at queueEntry (/home/aleks/git/repro/029-channel-stack/main.go.2386950363:1770:5)
    at $send (/home/aleks/git/repro/029-channel-stack/main.go.2386950363:1729:5)
    at $b (/home/aleks/git/repro/029-channel-stack/main.go.2386950363:6680:9)
    at $b (/syscall/js/gopherjs__js.go:83:5)
    at /home/aleks/git/repro/029-channel-stack/main.go.2386950363:65:72
    at FSReqCallback.oncomplete (fs.js:169:5)

Here we can see that the top part of the stack matches the intuition (PrintStack() is called from main(), which got control from $runScheduled()). However, the bottom of the stack is now different: it isn't rooted in the NodeJS loader, but in the FS callback handler. This can be confusing, since in Go we don't expect the bottom of the call stack to differ depending on whether we look at it before or after a file system call.

However, that accurately reflects the nature of the event loop in NodeJS and as such isn't a bug. It's an open question whether this violates the Go spec (it definitely differs from the wasm behavior), but fundamentally this issue is not new to GopherJS. For example, the following program produces a similar call stack on mainline GopherJS:

package main

import (
	"runtime/debug"
	"time"
)

func main() {
	time.Sleep(time.Second)
	debug.PrintStack()
}

$ (cd ~/git/gopherjs; go install --tags=gopherjsdev .) && gopherjs run main.go
    at Object.Stack (/runtime/gopherjs__runtime.go:398:3)
    at Stack (/runtime/debug/stack.go:24:4)
    at Object.PrintStack (/runtime/debug/stack.go:16:3)
    at Object.main (/home/aleks/git/repro/029-channel-stack/main.go:10:3)
    at Object.$init [as $blk] (/home/aleks/git/repro/029-channel-stack/main.go.61338053:15651:68)
    at fun (/home/aleks/git/repro/029-channel-stack/main.go.61338053:1603:37)
    at $goroutine (/home/aleks/git/repro/029-channel-stack/main.go.61338053:1601:19)
    at $runScheduled (/home/aleks/git/repro/029-channel-stack/main.go.61338053:1647:7)
    at $schedule (/home/aleks/git/repro/029-channel-stack/main.go.61338053:1671:5)
    at queueEntry (/home/aleks/git/repro/029-channel-stack/main.go.61338053:1737:5)
    at $close (/home/aleks/git/repro/029-channel-stack/main.go.61338053:1760:5)
    at /time/gopherjs__time.go:37:60
    at Timeout._onTimeout (/home/aleks/git/repro/029-channel-stack/main.go.61338053:1679:5)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7)

We could, perhaps, try to mask extraneous call frames, but this could plausibly make debugging harder.

We could maintain the current behavior with regard to file system calls by using the synchronous API, but I see several reasons not to:

  • This fixes only one instance of the unintuitive stack trace; the time.Sleep() case and many similar ones will remain.
  • Under Node, GopherJS programs with multiple goroutines doing FS I/O will become less efficient, since parallel async I/O operations won't be possible.
  • Under browsers, this may reduce interoperability with libraries like BrowserFS, which implement async APIs.
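
The efficiency point can be tried out in plain Go, without Node. In the sketch below, `asyncRead` is a hypothetical stand-in for a callback-based API, `blockingRead` wraps it with the same channel technique as the async fsCall() shown earlier, and `readAll` issues several reads from separate goroutines. All three names are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// asyncRead is a hypothetical stand-in for a callback-based API such as
// Node's fs.readFile: it returns immediately and invokes cb later.
func asyncRead(name string, delay time.Duration, cb func(data string)) {
	time.AfterFunc(delay, func() { cb("contents of " + name) })
}

// blockingRead turns asyncRead into a blocking call: the callback sends
// the result into a buffered channel and the caller waits on it.
func blockingRead(name string, delay time.Duration) string {
	c := make(chan string, 1)
	asyncRead(name, delay, func(data string) { c <- data })
	return <-c
}

// readAll issues one blocking read per name, each in its own goroutine.
func readAll(names []string, delay time.Duration) []string {
	results := make([]string, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			results[i] = blockingRead(name, delay)
		}(i, name)
	}
	wg.Wait()
	return results
}

func main() {
	start := time.Now()
	results := readAll([]string{"a.txt", "b.txt", "c.txt"}, 100*time.Millisecond)
	fmt.Println(results)
	fmt.Println("elapsed:", time.Since(start).Round(10*time.Millisecond))
}
```

Because each goroutine only waits on its own channel, the three simulated operations overlap and the total time stays close to a single delay. With a synchronous API each call would hold the single JavaScript thread until it completes, so the same reads would run one after another.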

For posterity, here's a version of fsCall() that uses a sync API:

func fsCall(name string, args ...interface{}) (val js.Value, err error) {
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		jsErr, ok := e.(js.Error)
		if !ok {
			panic(e) // Unexpected error type, re-raise it.
		}
		err = mapJSError(jsErr.Value)
	}()
	val = jsFS.Call(name+"Sync", args...)
	return val, err
}