Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/ssh: test consistently runs out of memory on js-wasm builder #32840

Open
bcmills opened this issue Jun 28, 2019 · 10 comments
Open

x/crypto/ssh: test consistently runs out of memory on js-wasm builder #32840

bcmills opened this issue Jun 28, 2019 · 10 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Jun 28, 2019

The golang.org/x/crypto/ssh test consistently fails on the js-wasm builder with the error message runtime: out of memory: cannot allocate 8192-byte block.

Can the test be made to fit within the runtime on that architecture? If not, it should be skipped, instead of continuing to fail and potentially masking more serious problems.

CC @hanwen @neelance @FiloSottile

@agnivade
Copy link
Contributor

@agnivade agnivade commented Jun 29, 2019

The code is allocating memory in a tight loop. Somehow the wasm runtime is unable to schedule a GC run since it runs on a single thread. Although GOMAXPROCS=1 with amd64 passes.

I believe this is a valid bug with wasm runtime (I have seen this happen once to someone in gophers slack), but for now, adding this patch fixes it -

diff --git a/ssh/handshake_test.go b/ssh/handshake_test.go
index 91d4935..742678a 100644
--- a/ssh/handshake_test.go
+++ b/ssh/handshake_test.go
@@ -455,6 +455,9 @@ func testHandshakeErrorHandlingN(t *testing.T, readLimit, writeLimit int, couple
                                        if err != nil {
                                                break
                                        }
+                                       if runtime.GOOS == "js" && i%10 == 0 {
+                                               runtime.GC()
+                                       }
                                }
                                wg.Done()
                                c.Close()

Perhaps we can file an issue and add this patch for now, pointing to the issue.

EDIT: or maybe even track this issue for it.

@hanwen
Copy link
Contributor

@hanwen hanwen commented Jul 1, 2019

the explicit GC call is kludge. As kludges go, I prefer to simply t.Skip() on wasm because it is more explicit.

@agnivade
Copy link
Contributor

@agnivade agnivade commented Jul 1, 2019

SGTM.

@gopherbot
Copy link

@gopherbot gopherbot commented Jul 1, 2019

Change https://golang.org/cl/184397 mentions this issue: ssh: skip testHandshakeErrorHandlingN on js/wasm

gopherbot pushed a commit to golang/crypto that referenced this issue Jul 1, 2019
The wasm runtime cannot schedule a GC run on tight loops.
Therefore it runs out of memory if such a loop allocates memory.

Skip such a test for now.

Updates golang/go#32840

Change-Id: I922b6e02710915776d2820573fd1584a5941185b
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/184397
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Han-Wen Nienhuys <hanwen@google.com>
@agnivade agnivade removed the Soon label Jul 1, 2019
@bcmills
Copy link
Member Author

@bcmills bcmills commented Jul 1, 2019

I think this is a bug in the test itself, not a bug in the wasm/js runtime.

As far as I can tell, the calls to c.writePacket end up going to an underlying memTransport, and its writePacket method enqueues a copy of the data without any sort of backpressure or flow-control. That implies that the cases that pass -1 for writeLimit can buffer an arbitrarily large amount of (reachable!) data without ever needing to yield execution.

I suspect that the explicit call to runtime.GC() appears to fix the test because it causes the goroutine to yield execution, not because it collects actual garbage. (If that hypothesis is correct, then substituting a call to runtime.Gosched() will likely have a similar effect.)

If this diagnosis is correct, then an appropriate fix may be to bound (*memTransport).writePacket to a fixed number of packets.

@agnivade
Copy link
Contributor

@agnivade agnivade commented Jul 1, 2019

I had thought about that. If that is so, shouldn't it fail in amd64 with GOMAXPROCS=1 ? What is different in the amd64 runtime that makes it to pass ?

Yes, it is a scheduling issue. runtime.Gosched() is a better substitute.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Jul 1, 2019

What is different in the amd64 runtime that makes it to pass?

I don't actually know, but in general the set of preemption points is different across different go versions. (And they're presumably going to get even more different when #24543 lands!)

My best guess (and, to be clear, it's only a guess) is that on amd64 the runtime checks for goroutine preemption any time it makes a stack check, whereas on js in only preempts when the program blocks. (Perhaps that's due to the relative costs on js, or perhaps there is no mechanism to asynchronously signal a running goroutine to yield?)

@agnivade
Copy link
Contributor

@agnivade agnivade commented Jul 1, 2019

Right, and when I said it's a bug in wasm runtime, I meant for something in needsInvestigation stage, not needsFix. i.e. - is it at all possible to cause a goroutine to prempt in these conditions, or this is something by design ? If it's the previous, then it is a bug worth investigating. If it's the latter, then we can just document it and move on with improving the test.

cc @neelance for some thoughts.

@hanwen
Copy link
Contributor

@hanwen hanwen commented Jul 2, 2019

I guess the problem is the unlimited write in https://go.googlesource.com/crypto/+/22d7a77e9e5f409e934ed268692e56707cd169e5/ssh/handshake_test.go#452 ?

you could try to substitute -1 for writeLimit with a large number (100 would probably do the trick) and see if that works better.

bored-engineer pushed a commit to bored-engineer/ssh that referenced this issue Oct 13, 2019
The wasm runtime cannot schedule a GC run on tight loops.
Therefore it runs out of memory if such a loop allocates memory.

Skip such a test for now.

Updates golang/go#32840

Change-Id: I922b6e02710915776d2820573fd1584a5941185b
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/184397
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Han-Wen Nienhuys <hanwen@google.com>
bored-engineer pushed a commit to bored-engineer/ssh that referenced this issue Oct 13, 2019
The wasm runtime cannot schedule a GC run on tight loops.
Therefore it runs out of memory if such a loop allocates memory.

Skip such a test for now.

Updates golang/go#32840

Change-Id: I922b6e02710915776d2820573fd1584a5941185b
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/184397
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Han-Wen Nienhuys <hanwen@google.com>
bored-engineer pushed a commit to bored-engineer/ssh that referenced this issue Oct 13, 2019
The wasm runtime cannot schedule a GC run on tight loops.
Therefore it runs out of memory if such a loop allocates memory.

Skip such a test for now.

Updates golang/go#32840

Change-Id: I922b6e02710915776d2820573fd1584a5941185b
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/184397
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Han-Wen Nienhuys <hanwen@google.com>
@neelance
Copy link
Member

@neelance neelance commented Aug 16, 2020

My best guess (and, to be clear, it's only a guess) is that on amd64 the runtime checks for goroutine preemption any time it makes a stack check, whereas on js in only preempts when the program blocks. (Perhaps that's due to the relative costs on js, or perhaps there is no mechanism to asynchronously signal a running goroutine to yield?)

This is correct. Preemption of goroutines requires that the sysmon function runs in a separate OS thread. Since wasm has no threads yet, we can not run sysmon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.