
runtime: TestAsyncPreempt fails on openbsd/arm64 (Apple M1 running single CPU) #47530

Closed
4a6f656c opened this issue Aug 4, 2021 · 4 comments

4a6f656c commented Aug 4, 2021

OpenBSD's arm64 port runs on Apple M1 hardware; however, it is currently only stable when running with a single-processor kernel:

$ uname -a
OpenBSD armadillo.sing.id.au 6.9 GENERIC#1196 arm64
$ sysctl hw
hw.machine=arm64
hw.model=Apple Icestorm r1p1
hw.ncpu=1
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=sd0:c9c422693d5abe86
hw.diskcount=1
hw.product=Apple Mac mini (M1, 2020)
hw.physmem=16390758400
hw.usermem=16390742016
hw.ncpufound=8
hw.allowpowerdown=1
hw.ncpuonline=1

The Go openbsd/arm64 port works as expected on this machine, with the exception of TestAsyncPreempt, which fails due to a timeout (note that this is not a new regression - the same failure occurs with both Go tip and Go 1.16). The strange thing is that this test passes on the openbsd/arm64 builder (running on a Pine64+ board) with both a single-processor and a multi-processor kernel (worth noting that the builder runs a slightly older OpenBSD version, but I'm not currently aware of any kernel changes that would impact this).

Adding some debugging, we see async preempt signals (SIGURG) being delivered to the process; however, we do not preempt, as isAsyncSafePoint returns false via the if test at line 401 of src/runtime/preempt.go (in other words, the value up returned by pcdatavalue2 is not _PCDATA_UnsafePointSafe).

Building src/runtime/testdata/testprog manually and running it with the AsyncPreempt argument shows that we keep calling isAsyncSafePoint with a pc of 0xcf408, which is part of the main.frameless function:

(gdb) disass 0xcf408
Dump of assembler code for function main.frameless:
   0x00000000000cf400 <+0>:     mov     x0, #0x0                        // #0
   0x00000000000cf404 <+4>:     b       0xcf43c <main.frameless+60>
   0x00000000000cf408 <+8>:     adrp    x27, 0x1f7000 <runtime.mheap_+88864>
   0x00000000000cf40c <+12>:    add     x27, x27, #0xf88
   0x00000000000cf410 <+16>:    ldr     x1, [x27]
   0x00000000000cf414 <+20>:    mul     x2, x0, x0
   0x00000000000cf418 <+24>:    mul     x2, x2, x0
   0x00000000000cf41c <+28>:    mul     x2, x2, x0
   0x00000000000cf420 <+32>:    mul     x2, x0, x2
   0x00000000000cf424 <+36>:    mov     x3, #0x3039                     // #12345
   0x00000000000cf428 <+40>:    madd    x1, x2, x3, x1
   0x00000000000cf42c <+44>:    adrp    x27, 0x1f7000 <runtime.mheap_+88864>
   0x00000000000cf430 <+48>:    add     x27, x27, #0xf88
   0x00000000000cf434 <+52>:    str     x1, [x27]
   0x00000000000cf438 <+56>:    add     x0, x0, #0x1
   0x00000000000cf43c <+60>:    mov     x27, #0x4000000000000000        // #4611686018427387904
   0x00000000000cf440 <+64>:    cmp     x0, x27
   0x00000000000cf444 <+68>:    b.lt    0xcf408 <main.frameless+8>
   0x00000000000cf448 <+72>:    ret
   0x00000000000cf44c <+76>:    .inst   0x00000000 ; undefined
End of assembler dump.

I do not see any reason why that should be an unsafe point; if it is indeed safe, then presumably the PCDATA table has incorrect information. Also, from manual inspection, this function disassembles to the same instructions on both machines.

Any hints in tracking this down further would be greatly appreciated.

cherrymui commented Aug 4, 2021

It may be because it uses R27 (REGTMP). The injected call, runtime.asyncPreempt, clobbers R27. So if R27 is live it cannot be async preempted. However, R27 is only live within those three instructions. If we are preempted there we can just restart at the first instruction. So it should be marked restartable (_PCDATA_Restart1 or _PCDATA_Restart2), and it indeed is. The code in isAsyncSafePoint seems to return too early for those values (the code below actually expects those values). I'll fix.

That said, I would expect the signal does not always land on that instruction. Some signals would land in a preemptible region (e.g. on a MUL instruction) and preemption could proceed. Maybe signal delivery on OpenBSD on that machine is somehow biased?


gopherbot commented Aug 4, 2021

Change https://golang.org/cl/340011 mentions this issue: runtime: accept restartable sequence pcdata values in isAsyncSafePoint


4a6f656c commented Aug 5, 2021

> It may be because it uses R27 (REGTMP). The injected call, runtime.asyncPreempt, clobbers R27. So if R27 is live it cannot be async preempted. However, R27 is only live within those three instructions. If we are preempted there we can just restart at the first instruction. So it should be marked restartable (_PCDATA_Restart1 or _PCDATA_Restart2), and it indeed is. The code in isAsyncSafePoint seems to return too early for those values (the code below actually expects those values). I'll fix.

Ah! REGTMP was the bit that I was missing.

> That said, I would expect the signal does not always land on that instruction. Some signals would land in a preemptible region (e.g. on a MUL instruction) and preemption could proceed. Maybe signal delivery on OpenBSD on that machine is somehow biased?

So this is an interesting and valid point. Obviously this relies on a SIGURG being delivered to a specific thread and that is only going to happen when the thread performs a system call (none in this case) or a context switch. The context switch in turn will be triggered via a timer interrupt being delivered, resulting in a trap into the kernel. Hacking up some equivalent C code, I see the SIGURG being delivered on one of three instructions in the loop on this machine, whereas when run on a Pine64+ machine it lands on almost any instruction in the loop. I suspect that what we're seeing here may be architectural, with the trap occurring at particular points in pipeline execution - I'll ask some people who are more familiar with the M1 hardware.

