New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: windows-arm build fails with "unexpected return pc for fmt.(*pp).handleMethods called from 0x10" #28854

Open
alexbrainman opened this Issue Nov 18, 2018 · 10 comments

Comments

Projects
None yet
5 participants
@alexbrainman
Member

alexbrainman commented Nov 18, 2018

windows-arm build is broken

https://build.golang.org/log/18c3850b705ffd52fbb0caacb3d2a89aacfaf456

runtime: unexpected return pc for fmt.(*pp).handleMethods called from 0x10
stack: frame={sp:0x1144ec10, fp:0x1144ec58} stack=[0x1144c000,0x11450000)
1144eb90:  00000000  1146c090  114b00f4  1144eb6c 
1144eba0:  00b9a990  00d20e58  00000000  00000000 
1144ebb0:  00abe1bc <fmt.(*pp).handleMethods+304>  00b9a990  00d20e58  00b4ac80 <html/template.(*Error).Error+20> 
1144ebc0:  114b00e0  00b4ac80 <html/template.(*Error).Error+20>  00abe1bc <fmt.(*pp).handleMethods+304>  00d25a4c 
1144ebd0:  00a2c0b4 <runtime.deferproc+40>  000001ff  00000001  114b00e0 
1144ebe0:  1146c090  00d259f0  00b97f88  00b96b98 
1144ebf0:  00c01718  114bec1c  00abe294 <fmt.(*pp).handleMethods+520>  00a2c154 <runtime.deferproc+200> 
1144ec00:  1146c0bc  00abe1a0 <fmt.(*pp).handleMethods+276>  1146c0ac  114bec24 
1144ec10: <00000010  114bec20  00abe620 <fmt.(*pp).printArg+472>  00000000 
1144ec20:  00bd473c  114a2000  00b96b98  00000000 
1144ec30:  00000076  00000000  00c01718  317002a0 
1144ec40:  00000000  00000000  00b96b98  00000000 
1144ec50:  00d15a31  00000000 >00000000  00000000 
1144ec60:  00ac1b44 <fmt.(*pp).doPrintf+316>  114a2000  00000076  00000001 
1144ec70:  00000000  00000000  00a5bbd4 <sync.(*Pool).getSlow+220>  114a0114 
1144ec80:  00000000  00000001  00a5b640 <sync.(*Mutex).Unlock+40>  114a0114 
1144ec90:  00000001  00000000  00000001  00000000 
1144eca0:  00ac284c <fmt.(*pp).doPrintf+3652>  114a0114  00abb784 <fmt.Sprintf+72>  114a2000 
1144ecb0:  00b96b98  00000000  00000076  114a0114 
1144ecc0:  00000004  00000004  00000001  00000000 
1144ecd0:  00bb3358  00000002 
fatal error: unknown caller pc

runtime stack:
runtime.throw(0xbc23cc, 0x11)
	e:/workdir/go/src/runtime/panic.go:608 +0x5c
runtime.gentraceback(0xffffffff, 0xffffffff, 0x0, 0x114b00e0, 0x0, 0x0, 0x7fffffff, 0xbd4a5c, 0x315ff938, 0x0, ...)
	e:/workdir/go/src/runtime/traceback.go:275 +0x171c
runtime.copystack(0x114b00e0, 0x4000, 0x1)
	e:/workdir/go/src/runtime/stack.go:881 +0x1c4
runtime.newstack()
	e:/workdir/go/src/runtime/stack.go:1050 +0x280
runtime.morestack()
	e:/workdir/go/src/runtime/asm_arm.s:420 +0x60

goroutine 242 [copystack]:
runtime.call16(0x0, 0xbd473c, 0x1146c0ac, 0x10, 0x10)
	e:/workdir/go/src/runtime/asm_arm.s:516 +0x7c fp=0x1144eb68 sp=0x1144eb68 pc=0xa547e8
panic(0xb9a990, 0xd20e58)
	e:/workdir/go/src/runtime/panic.go:513 +0x194 fp=0x1144ebb0 sp=0x1144eb68 pc=0xa2cfbc
runtime.panicmem(...)
	e:/workdir/go/src/runtime/panic.go:82
runtime.sigpanic()
	e:/workdir/go/src/runtime/signal_windows.go:210 +0x1d0 fp=0x1144ebc4 sp=0x1144ebb0 pc=0xa3f214
fmt.(*pp).handleMethods(0x114bec20, 0xabe620, 0x0)
	e:/workdir/go/src/fmt/print.go:608 +0x130 fp=0x1144ec10 sp=0x1144ebc8 pc=0xabe1bc
runtime: unexpected return pc for fmt.(*pp).handleMethods called from 0x10
stack: frame={sp:0x1144ec10, fp:0x1144ec58} stack=[0x1144c000,0x11450000)
1144eb90:  00000000  1146c090  114b00f4  1144eb6c 
1144eba0:  00b9a990  00d20e58  00000000  00000000 
1144ebb0:  00abe1bc <fmt.(*pp).handleMethods+304>  00b9a990  00d20e58  00b4ac80 <html/template.(*Error).Error+20> 
1144ebc0:  114b00e0  00b4ac80 <html/template.(*Error).Error+20>  00abe1bc <fmt.(*pp).handleMethods+304>  00d25a4c 
1144ebd0:  00a2c0b4 <runtime.deferproc+40>  000001ff  00000001  114b00e0 
1144ebe0:  1146c090  00d259f0  00b97f88  00b96b98 
1144ebf0:  00c01718  114bec1c  00abe294 <fmt.(*pp).handleMethods+520>  00a2c154 <runtime.deferproc+200> 
1144ec00:  1146c0bc  00abe1a0 <fmt.(*pp).handleMethods+276>  1146c0ac  114bec24 
1144ec10: <00000010  114bec20  00abe620 <fmt.(*pp).printArg+472>  00000000 
1144ec20:  00bd473c  114a2000  00b96b98  00000000 
1144ec30:  00000076  00000000  00c01718  317002a0 
1144ec40:  00000000  00000000  00b96b98  00000000 
1144ec50:  00d15a31  00000000 >00000000  00000000 
1144ec60:  00ac1b44 <fmt.(*pp).doPrintf+316>  114a2000  00000076  00000001 
1144ec70:  00000000  00000000  00a5bbd4 <sync.(*Pool).getSlow+220>  114a0114 
1144ec80:  00000000  00000001  00a5b640 <sync.(*Mutex).Unlock+40>  114a0114 
1144ec90:  00000001  00000000  00000001  00000000 
1144eca0:  00ac284c <fmt.(*pp).doPrintf+3652>  114a0114  00abb784 <fmt.Sprintf+72>  114a2000 
1144ecb0:  00b96b98  00000000  00000076  114a0114 
1144ecc0:  00000004  00000004  00000001  00000000 
1144ecd0:  00bb3358  00000002 
fmt.(*pp).handleMethods(0x0, 0xac1b44, 0x114a2000)
	e:/workdir/go/src/fmt/print.go:608 +0x130 fp=0x1144ec58 sp=0x1144ec10 pc=0xabe1bc
created by testing.(*T).Run
	e:/workdir/go/src/testing/testing.go:913 +0x2ac

goroutine 1 [chan receive]:
testing.(*T).Run(0x1177a000, 0xbbe8bf, 0xa, 0xbd4814, 0x1)
	e:/workdir/go/src/testing/testing.go:914 +0x2cc
testing.runTests.func1(0x114b4000)
	e:/workdir/go/src/testing/testing.go:1154 +0x68
testing.tRunner(0x114b4000, 0x1145bf14)
	e:/workdir/go/src/testing/testing.go:862 +0xa0
testing.runTests(0x11498020, 0xd23f50, 0x3c, 0x3c, 0x0)
	e:/workdir/go/src/testing/testing.go:1152 +0x238
testing.(*M).Run(0x114a8000, 0x0)
	e:/workdir/go/src/testing/testing.go:1069 +0x130
main.main()
	_testmain.go:234 +0x12c
FAIL	html/template	1.534s

@jordanrh1 please fix the breakage. According to https://github.com/golang/go/wiki/PortingPolicy

Once the CLs are all submitted, all.bash must pass, so that the builder reports "ok" in the dashboard.

Any port started during a release cycle must be finished (all.bash passing, builder reporting "ok") before the corresponding release freeze, or else the code will be removed at the freeze.

We did not have windows-arm builder pass complete build yet. Unfortunately I do not have time to debug this.

Thank you.

Alex

@jordanrh1

This comment has been minimized.

Contributor

jordanrh1 commented Nov 21, 2018

The html/template error is very strange. The failing test is TestErrors. An access violation occurs in html/template.(*Error).Error(SB) when it tries to dereference a NULL pointer.

TEXT html/template.(*Error).Error(SB) /home/jordanrh/go/src/html/template/error.go
	  error.go:216          0x54ac88                e59a1008                MOVW 0x8(R10), R1
	  error.go:216          0x54ac8c                e15d0001                CMP R1, R13
	  error.go:216          0x54ac90                9a00009e                B.LS 0x54af10
	  error.go:216          0x54ac94                e52de050                MOVW.W R14, -0x50(R13)
	  error.go:218          0x54ac98                e59d0054                MOVW 0x54(R13), R0
--->	  error.go:218          0x54ac9c                e5901008                MOVW 0x8(R0), R1  <--- faulting instruction (R0=0) (e.Line) 
	  error.go:218          0x54aca0                e5902004                MOVW 0x4(R0), R2  (e.Node)
	  error.go:218          0x54aca4                e3520000                CMP $0, R2
	  error.go:218          0x54aca8                1a00006c                B.NE 0x54ae60
	  error.go:221          0x54acac                e5901014                MOVW 0x14(R0), R1
	  error.go:221          0x54acb0                e3510000                CMP $0, R1


0:000> r
	 **r0=00000000**  r1=1142cb70  r2=1142d2dc  r3=00000010  r4=11408a80  r5=114b4280
	 r6=1141863c  r7=11418648  r8=114522ec  r9=00000007 r10=11408a80 r11=1142d2dc
	r12=114522ef  sp=1142d280  lr=0023e1d8  pc=002cac9c psr=20000010 --C-- ARM

Corresponding GO code:

      func (e *Error) Error() string {
	        switch {
	        case e.Node != nil:
	                loc, _ := (*parse.Tree)(nil).ErrorContext(e.Node)
	                return fmt.Sprintf("html/template:%s: %s", loc, e.Description)
--->	        case e.Line != 0:  <-- load of e.Line causes fault
	                return fmt.Sprintf("html/template:%s:%d: %s", e.Name, e.Line, e.Description)
	        case e.Name != "":
	                return fmt.Sprintf("html/template:%s: %s", e.Name, e.Description)
	        }
	        return "html/template: " + e.Description
	}

The stack location that e comes from is indeed null:

0:000> dd sp+0x54
	1142d2d4  **00000000** 00354748 114b4280 00316b98

I do not believe the caller passes a null Error object and I don't think there is any defect in the code being tested. Interestingly the crash only occurs on the following input, but all the other testcases must be present in order for the crash to happen. If you comment out just two other testcases, it passes. This suggests the bug is dependent on memory layout.

                        {
	                        `<script>reverseList = [{{template "t"}}]</script>` +
	                                // Missing " after recursive call.
	                                `{{define "t"}}{{if .Tail}}{{template "t" .Tail}}{{end}}{{.Head}}",{{end}}`,
	                        `: cannot compute output context for template t$htmltemplate_stateJS_elementScript`,
	                },

I think something is going wrong with stack growth. Stack growth uses tracebacks to update stack pointers. There are known issues with tracebacks so I think tracebacks may be at the root of this.

@alexbrainman

This comment has been minimized.

Member

alexbrainman commented Nov 21, 2018

Pinging Dr @aclements. Do you have any suggestions about how to debug this?

Thank you.

Alex

@jordanrh1

This comment has been minimized.

Contributor

jordanrh1 commented Nov 21, 2018

I found the source of the error on Windows/ARM. The synthetic call to sigpanic was not being constructed correctly for ARM. The code was written for x86/amd64 which does not have a link register:

func exceptionhandler
         ...
        if r.ip() != 0 {
                sp := unsafe.Pointer(r.sp())
                sp = add(sp, ^(unsafe.Sizeof(uintptr(0)) - 1)) // sp--
                *((*uintptr)(sp)) = r.ip()
                r.setsp(uintptr(sp))
        }

The correct code for ARM is:

        if r.ip() != 0 {
                sp := unsafe.Pointer(r.sp())
                sp = add(sp, ^(unsafe.Sizeof(uintptr(0)) - 1)) // sp--
                *((*uintptr)(sp)) = r.lr()   // *** LR must be saved to stack instead of PC
                r.lrr = uint32(r.ip())        // *** LR must be set to faulting PC
                r.setsp(uintptr(sp))
        }

This allows traceback to unwind the stack correctly.

As for the access violation in html/template, it looks like this occurs on all platforms. It was just the catalyst for exposing the incorrect exception handling code on Windows. The access violation shows up when you run the test under the debugger (linux/amd64):

(gdb) run -test.v -test.run TestErrors
Starting program: /home/jordanrh/work/go/template.test -test.v -test.run TestErrors
[New LWP 5828]
[New LWP 5829]
[New LWP 5830]
[New LWP 5831]
[New LWP 5832]
=== RUN   TestErrors

Thread 6 "template.test" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 5832]
0x000000000055bd77 in html/template.(*Error).Error (e=0x0, ~r0=...) at /home/jordanrh/go/src/html/template/error.go:218
218             case e.Node != nil:
(gdb) info registers
rax            0x0      0
rbx            0x20     32
rcx            0xc00006c600     824634164736
rdx            0xc0000fe848     824634763336
rsi            0xc0000fe848     824634763336
rdi            0xc000124260     824634917472
rbp            0xc0000fe828     0xc0000fe828
rsp            0xc0000fe790     0xc0000fe790
r8             0x20     32
r9             0xc000028f88     824633888648
r10            0x0      0
r11            0xc0000291d0     824633889232
r12            0x80     128
r13            0x40     64
r14            0x3      3
r15            0x80     128
rip            0x55bd77 0x55bd77 <html/template.(*Error).Error+55>
eflags         0x10206  [ PF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
---Type <return> to continue, or q <return> to quit---
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb)

@stjj89 @mvdan is it by design that html/template generates access violations?

@jordanrh1

This comment has been minimized.

Contributor

jordanrh1 commented Nov 21, 2018

More helpful stack trace:

#0  0x0000000000152d90 in html/template.(*Error).Error (e=0x0, ~r0=...) at /home/jordanrh/go/src/html/template/error.go:218
#1  0x00000000000c3864 in fmt.(*pp).handleMethods (p=0x40000920c0, verb=118, handled=true) at /home/jordanrh/go/src/fmt/print.go:608
#2  0x00000000000c3ce4 in fmt.(*pp).printArg (p=0x40000920c0, arg=..., verb=118) at /home/jordanrh/go/src/fmt/print.go:697
#3  0x00000000000c70bc in fmt.(*pp).doPrintf (p=0x40000920c0, format=..., a=...) at /home/jordanrh/go/src/fmt/print.go:1014
#4  0x00000000000c0e84 in fmt.Sprintf (format=..., a=..., ~r2=...) at /home/jordanrh/go/src/fmt/print.go:214
#5  0x000000000015186c in html/template.context.String (c=..., ~r0=...) at /home/jordanrh/go/src/html/template/context.go:29
#6  0x0000000000170920 in html/template.(*context).String (~r0=...) at <autogenerated>:1
#7  0x00000000000c3934 in fmt.(*pp).handleMethods (p=0x4000092000, verb=118, handled=true) at /home/jordanrh/go/src/fmt/print.go:614
#8  0x00000000000c3ce4 in fmt.(*pp).printArg (p=0x4000092000, arg=..., verb=118) at /home/jordanrh/go/src/fmt/print.go:697
#9  0x00000000000c70bc in fmt.(*pp).doPrintf (p=0x4000092000, format=..., a=...) at /home/jordanrh/go/src/fmt/print.go:1014
#10 0x00000000000c0e84 in fmt.Sprintf (format=..., a=..., ~r2=...) at /home/jordanrh/go/src/fmt/print.go:214
#11 0x0000000000155740 in html/template.errorf (k=<optimized out>, node=..., f=..., args=..., line=<optimized out>) at /home/jordanrh/go/src/html/template/error.go:232
#12 html/template.join (a=..., b=..., node=..., nodeName=..., ~r4=...) at /home/jordanrh/go/src/html/template/escape.go:461
#13 0x0000000000155860 in html/template.join (a=..., b=..., node=..., nodeName=..., ~r4=...) at /home/jordanrh/go/src/html/template/escape.go:454
#14 0x0000000000155c14 in html/template.(*escaper).escapeBranch (e=0x4000011040, c=..., n=0x4000011000, nodeName=..., ~r3=...) at /home/jordanrh/go/src/html/template/escape.go:484
#15 0x00000000001537ec in html/template.(*escaper).escape (e=0x4000011040, c=..., n=..., ~r2=...) at /home/jordanrh/go/src/html/template/escape.go:128
#16 0x0000000000155df8 in html/template.(*escaper).escapeList (e=0x4000011040, c=..., n=0x40000bd020, ~r2=...) at /home/jordanrh/go/src/html/template/escape.go:493
#17 0x00000000001560ac in html/template.(*escaper).escapeListConditionally (e=0x4000094298, c=..., n=0x40000bd020, filter={void (html/template.escaper *, html/template.context, bool *)} 0x40000b76a8, ~r3=..., ~r4=<optimized out>)
    at /home/jordanrh/go/src/html/template/escape.go:508
#18 0x0000000000157004 in html/template.(*escaper).escapeTemplateBody (e=0x4000094298, c=..., t=0x4000010f80, ~r2=..., ~r3=<optimized out>) at /home/jordanrh/go/src/html/template/escape.go:626
#19 0x0000000000156d34 in html/template.(*escaper).computeOutCtx (e=0x4000094298, c=..., t=0x4000010f80, ~r2=...) at /home/jordanrh/go/src/html/template/escape.go:587
#20 0x0000000000156788 in html/template.(*escaper).escapeTree (e=0x4000094298, c=..., node=..., name=..., line=0, ~r4=..., ~r5=...) at /home/jordanrh/go/src/html/template/escape.go:580
#21 0x00000000001530ac in html/template.escapeTemplate (tmpl=0x40000bcfc0, node=..., name=..., ~r3=...) at /home/jordanrh/go/src/html/template/escape.go:22
#22 0x000000000015b2e8 in html/template.(*Template).escape (t=0x40000bcfc0, ~r0=...) at /home/jordanrh/go/src/text/template/template.go:47
#23 0x000000000015b428 in html/template.(*Template).Execute (t=0x40000bcfc0, wr=..., data=..., ~r2=...) at /home/jordanrh/go/src/html/template/template.go:119
#24 0x0000000000167a14 in html/template.TestErrors (t=0x40000a6100) at /home/jordanrh/go/src/html/template/escape_test.go:1069
#25 0x00000000000e2eb0 in testing.tRunner (t=0x40000a6100, fn={void (testing.T *)} 0x40000b7fd0) at /home/jordanrh/go/src/testing/testing.go:862
#26 0x00000000000659e4 in runtime.goexit () at /home/jordanrh/go/src/runtime/asm_arm64.s:1128
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
@gopherbot

This comment has been minimized.

gopherbot commented Nov 21, 2018

Change https://golang.org/cl/150957 mentions this issue: runtime: windows/arm fix tracebacks printed from sigpanic

gopherbot pushed a commit that referenced this issue Nov 26, 2018

runtime: windows/arm fix tracebacks printed from sigpanic
The exception handler modifies the stack and continuation context so
it looks like the faulting code calls sigpanic() directly. The call was
not set up correctly on ARM, because it did not handle the link register
correctly. This change handles the link register correctly for ARM.

Updates #28854

Change-Id: I7ccf838adfc05cd968a5edd7d19ebba6a2478360
Reviewed-on: https://go-review.googlesource.com/c/150957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@stjj89

This comment has been minimized.

Contributor

stjj89 commented Nov 26, 2018

I believe the problem is that context.String prints context.err using the %v format specifier. This causes (*Error).Error to be called with a nil receiver when context.err is `nil.

The proper fix should be to add a nil-receiver switch case to (*Error).Error, since it is the Error's responsibility to print itself properly even if it is nil.

I don't really understand why this error has gone undetected so long on linux builds. Any clue what is masking this issue on standard go test runs on linux systems?

@jordanrh1

This comment has been minimized.

Contributor

jordanrh1 commented Nov 27, 2018

It looks like there's code in fmt/print to recover from panics:

                // If a string is acceptable according to the format, see if
                // the value satisfies one of the string-valued interfaces.
                // Println etc. set verb to %v, which is "stringable".
                switch verb {
                case 'v', 's', 'x', 'X', 'q':
                        // Is it an error or Stringer?
                        // The duplication in the bodies is necessary:
                        // setting handled and deferring catchPanic
                        // must happen before calling the method.
                        switch v := p.arg.(type) {
                        case error:
                                handled = true
                                defer p.catchPanic(p.arg, verb)
                                p.fmtString(v.Error(), verb)
                                return

                        case Stringer:
                                handled = true
                                defer p.catchPanic(p.arg, verb)
                                p.fmtString(v.String(), verb)
                                return
                        }
                }
@gopherbot

This comment has been minimized.

gopherbot commented Nov 27, 2018

Change https://golang.org/cl/151501 mentions this issue: html/template: print nil Error pointers correctly

@jordanrh1

This comment has been minimized.

Contributor

jordanrh1 commented Dec 11, 2018

@alexbrainman Since this issue is fixed can it be closed?

I'm working on resolving the remaining test failures. I will use #26148.

@alexbrainman

This comment has been minimized.

Member

alexbrainman commented Dec 12, 2018

@alexbrainman Since this issue is fixed can it be closed?

I think this issue became about fixing bug in html/template package now. The issue will get closed once https://go-review.googlesource.com/c/go/+/151501 is submitted (CL 151501 mentions this issue in its commit message).

I'm working on resolving the remaining test failures. I will use #26148.

SGTM.

Alex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment