runtime: MOVUPS in duffcopy causing "suicide: sys: floating point in note handler" on Plan9 #12829

Closed
kennylevinsen opened this issue Oct 3, 2015 · 16 comments

@kennylevinsen

Go and OS versions

go: go version devel +5a2a556 Fri Oct 2 16:39:16 2015 +0100 darwin/amd64
executing OS: 9front amd64 current head
compiling OS: OS X 10.11

Test

package main

func main() {
    println("Hello World!")
}

Expected result

cpu% ./test
Hello World!
cpu%

This is also the result on 1.5.1

Actual result

cpu% ./test
Hello World!
test 9272: suicide: sys: floating point in note handler: 0x24abaa
test 9271: suicide: sys: floating point in note handler: 0x24abaa
test 9270: suicide: sys: floating point in note handler: 0x24abaa
cpu%

Debug info

cpu% acid 9270
/proc/9270/text:amd64 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/amd64

acid: stk()
runtime.duffcopy()+0x25a ?file?:0
runtime.sighandler(runtime.~r3=0x0,runtime.gp=0x862c480,runtime.note=0x7ffffeffeb70,runtime._ureg=0x7ffffeffebf8)+0x369 ?file?:0
runtime.sigtramp()+0x60 ?file?:0
0x7ffffeffebf8 ?file?:0

acid: asm(*PC)
runtime.duffcopy+0x25a 0x000000000024abaa   MOVUPS  0x0(SI),X0
runtime.duffcopy+0x25d 0x000000000024abad   ADDQ    $0x10,SI
runtime.duffcopy+0x261 0x000000000024abb1   MOVUPS  X0,0x0(DI)
runtime.duffcopy+0x264 0x000000000024abb4   ADDQ    $0x10,DI
runtime.duffcopy+0x268 0x000000000024abb8   MOVUPS  0x0(SI),X0
runtime.duffcopy+0x26b 0x000000000024abbb   ADDQ    $0x10,SI
runtime.duffcopy+0x26f 0x000000000024abbf   MOVUPS  X0,0x0(DI)
runtime.duffcopy+0x272 0x000000000024abc2   ADDQ    $0x10,DI
runtime.duffcopy+0x276 0x000000000024abc6   MOVUPS  0x0(SI),X0
runtime.duffcopy+0x279 0x000000000024abc9   ADDQ    $0x10,SI
runtime.duffcopy+0x27d 0x000000000024abcd   MOVUPS  X0,0x0(DI)
runtime.duffcopy+0x280 0x000000000024abd0   ADDQ    $0x10,DI
runtime.duffcopy+0x284 0x000000000024abd4   MOVUPS  0x0(SI),X0
runtime.duffcopy+0x287 0x000000000024abd7   ADDQ    $0x10,SI
runtime.duffcopy+0x28b 0x000000000024abdb   MOVUPS  X0,0x0(DI)
runtime.duffcopy+0x28e 0x000000000024abde   ADDQ    $0x10,DI
runtime.duffcopy+0x292 0x000000000024abe2   MOVUPS  0x0(SI),X0
runtime.duffcopy+0x295 0x000000000024abe5   ADDQ    $0x10,SI
runtime.duffcopy+0x299 0x000000000024abe9   MOVUPS  X0,0x0(DI)
runtime.duffcopy+0x29c 0x000000000024abec   ADDQ    $0x10,DI
runtime.duffcopy+0x2a0 0x000000000024abf0   MOVUPS  0x0(SI),X0
runtime.duffcopy+0x2a3 0x000000000024abf3   ADDQ    $0x10,SI
runtime.duffcopy+0x2a7 0x000000000024abf7   MOVUPS  X0,0x0(DI)
runtime.duffcopy+0x2aa 0x000000000024abfa   ADDQ    $0x10,DI
runtime.duffcopy+0x2ae 0x000000000024abfe   MOVUPS  0x0(SI),X0
runtime.duffcopy+0x2b1 0x000000000024ac01   ADDQ    $0x10,SI
runtime.duffcopy+0x2b5 0x000000000024ac05   MOVUPS  X0,0x0(DI)
runtime.duffcopy+0x2b8 0x000000000024ac08   ADDQ    $0x10,DI
runtime.duffcopy+0x2bc 0x000000000024ac0c   MOVUPS  0x0(SI),X0
runtime.duffcopy+0x2bf 0x000000000024ac0f   ADDQ    $0x10,SI

The trace above shows duffcopy being called from the signal handler (runtime.sighandler). Change 14836 (https://go-review.googlesource.com/#/c/14836/) switched runtime.duffcopy from MOVQ to MOVUPS for performance reasons, but Plan 9 does not permit floating point in note handlers. This includes instructions that access XMM registers, such as MOVUPS.

Almost two years ago, a similar issue was fixed, where runtime.memmove (which used MOVOU) was used accidentally in the plan9 signal handler: https://codereview.appspot.com/34640045/
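
To make the failure mode concrete, here is a minimal sketch of the kind of Go code that can end up in duffcopy. The block type and the copyBlock function are hypothetical names chosen for illustration; whether the compiler actually emits a duffcopy call depends on the Go version, the target architecture, and the size of the copy.

```go
package main

import "fmt"

// block is a hypothetical 256-byte value type used only for illustration.
type block struct {
	words [32]uint64
}

// copyBlock returns a copy of *src. A by-value copy of a struct this size is
// the sort of operation a compiler may lower to a call to runtime.duffcopy;
// that lowering is an assumption about the toolchain, not a guarantee.
//
//go:noinline
func copyBlock(src *block) block {
	return *src
}

func main() {
	var src block
	for i := range src.words {
		src.words[i] = uint64(i)
	}
	dst := copyBlock(&src)
	// The copy must be identical to the source regardless of how it was lowered.
	fmt.Println(dst == src, dst.words[31])
}
```

Cross-compiling with GOOS=plan9 GOARCH=amd64 and running go tool objdump -s copyBlock on the result is one way to check whether the copy was lowered to a CALL to runtime.duffcopy.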

Proposed solution

One solution would be to avoid duffcopy in this case, or to also provide a slower duffcopy that does not touch XMM registers. Reverting the optimized duffcopy for this reason alone seems silly to me.

@kennylevinsen kennylevinsen changed the title runtime: MOVUPS in duffcopy causing "Floating point in note handler" on Plan9 runtime: MOVUPS in duffcopy causing "suicide: sys: floating point in note handler" on Plan9 Oct 3, 2015
@bradfitz
Contributor

bradfitz commented Oct 3, 2015

If you want Plan 9 to stop regressing in this way, you should add a unit test which triggers it. Cause a bunch of notes to happen at a time/place where these copy routines would run?

Any fix for this would need a test anyway, so writing the test wouldn't be a waste of time.

/cc @0intro

@randall77
Contributor

We already turn off duffcopy when goos==Nacl. You can just turn duffcopy off when goos==plan9 also.
See cmd/compile/internal/amd64/cgen.go:blockcopy.
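
The suggestion above amounts to one extra case in the compiler's decision about when a block copy may use duffcopy. A rough sketch of that decision follows; canUseDuffCopy and the word-count thresholds are hypothetical and do not reproduce the real logic in cgen.go.

```go
package main

import "fmt"

// Illustrative size limits in machine words. These are assumptions made for
// this sketch, NOT the values used by the Go compiler.
const (
	minDuffWords = 4
	maxDuffWords = 128
)

// canUseDuffCopy is a hypothetical predicate: it reports whether a block copy
// of the given size may be lowered to duffcopy for the target OS.
func canUseDuffCopy(goos string, words int) bool {
	switch goos {
	case "nacl", "plan9":
		// nacl is already excluded per the comment above; plan9 is added
		// because duffcopy uses XMM registers, which Plan 9 does not
		// permit in note handlers.
		return false
	}
	return words >= minDuffWords && words <= maxDuffWords
}

func main() {
	for _, goos := range []string{"linux", "nacl", "plan9"} {
		fmt.Printf("%-5s duffcopy allowed for 32 words: %v\n", goos, canUseDuffCopy(goos, 32))
	}
}
```

Disabling duffcopy for the whole OS is coarser than strictly needed, since only code reachable from the note handler is affected, but it mirrors the existing nacl exclusion and keeps the change small.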

@davecheney
Contributor

Sgtm, but a test is still important.


@randall77
Contributor

Agreed.

@kennylevinsen
Author

A test:

package main

import (
    "fmt"
    "os"
    "testing"
)

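func TestNote(t *testing.T) {
    // Post a note to ourselves by writing to this process's note file.
    pid := os.Getpid()

    path := fmt.Sprintf("/proc/%d/note", pid)
    f, err := os.Create(path)
    if err != nil {
        t.Fatalf("unable to open note file: %v", err)
    }
    defer f.Close()

    // If the note handler works, this unrecognized note is simply ignored.
    if _, err := fmt.Fprint(f, "Hello, world!\n"); err != nil {
        t.Fatalf("unable to post note: %v", err)
    }
}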

I have run a non-test variant (turn TestNote into func main() and you have it), but not under go test, as I only have Go 1.4.2 available directly on my Plan 9 box. I can install a build from master later tonight when I get home and run it under go test directly. The one open question is whether go test would treat the suspended thread as a failure.

If the signal handler works in the _SigNotify case, then this should simply do nothing. Otherwise, the thread will have execution suspended in the "broken" run state, and "XYZ: pid: suicide: sys: floating point in note handler: pc=somewhere" printed to stderr. It does not validate the _SigPanic or _SigExit branches.

As a side note: because the Go runtime posts "go: exit" to all threads on shutdown, any terminating application acts as a test. Every thread is suspended and writes its own floating point line to stderr.

@bradfitz
Contributor

bradfitz commented Oct 5, 2015

@Joushou, if that test code causes it to crash, then why do the signal_plan9_test.go tests in the standard library already still pass? Its postNote code does the same thing.

@0intro
Member

0intro commented Oct 5, 2015

@bradfitz As far as I understand it, the tests pass on plan9/386, but not on plan9/amd64.
The recent duffcopy change is indeed amd64-specific.

The real issue is we don't have any plan9/amd64 builder currently.
I've set up a GCE image running the 9k (amd64) kernel, but I still have
to fix a major issue in the memory manager before it could be running reliably.

@kennylevinsen
Author

@0intro: Yes, it's amd64-specific. 386 uses MOVL, whereas amd64 uses MOVUPS with an XMM register.

@bradfitz
Contributor

bradfitz commented Oct 5, 2015

We really need a plan9/amd64 builder before we can even discuss changes for plan9/amd64. In fact, I'm actually a little tempted to delete all the plan9/amd64 code, since it's been (way!) over four weeks (per golang.org/wiki/PortingPolicy) since we last saw a successful build result for plan9/amd64. The porting policy requires that we have a running builder.

If 9k is a buggy kernel, are there any other non-buggy amd64 kernels which can at least run in other environments? VMware? AWS?

@0intro
Member

0intro commented Oct 5, 2015

The former plan9/amd64 builder was running the 9front kernel.
I should be able to set up an old-style plan9/amd64 builder running 9front.
Would it be fine?

@kennylevinsen
Author

I was about to suggest that. 9front doesn't seem to have known issues with the amd64 kernel.

I would be very sad if the plan9/amd64 code got pulled.

@bradfitz
Contributor

bradfitz commented Oct 5, 2015

I would be very sad if the plan9/amd64 code got pulled.

@Joushou, maybe you can help run a builder. :)

@bradfitz
Contributor

bradfitz commented Oct 5, 2015

@0intro, I don't know anything about 9k vs 9front. Old style builders can't be trybots, but that's fine. Email me for a build key.

@kennylevinsen
Author

@bradfitz I might be able to donate some processing power myself to the task in the near future, but my internet connection is wireless and unstable until some legal matters get sorted out. Until then, I can only help with others' builders, and try to see if I can find and fix other plan9 issues. :)

@0intro
Member

0intro commented Oct 5, 2015

@Joushou Any help is much appreciated :)

A plan9/amd64 (9front) builder is now running.

@gopherbot
Contributor

CL https://golang.org/cl/15421 mentions this issue.

@0intro 0intro closed this as completed in 50ad337 Oct 6, 2015
@golang golang locked and limited conversation to collaborators Oct 9, 2016