runtime: split stack overflow on *-386 #35349

Open
FiloSottile opened this issue Nov 4, 2019 · 12 comments

@FiloSottile FiloSottile commented Nov 4, 2019

https://storage.googleapis.com/go-build-log/2a4850e8/linux-386_a62e5e86.log from the https://golang.org/cl/205057 TryBot.

runtime: newstack sp=0x8e98e8c stack=[0x8e99000, 0x8e9a000]
	morebuf={pc:0x8126ddd sp:0x8e98e94 lr:0x0}
	sched={pc:0x8127353 sp:0x8e98e90 lr:0x0 ctxt:0x0}
runtime: gp=0x8c5a2a0, goid=37, gp->status=0x2
 runtime: split stack overflow: 0x8e98e8c < 0x8e99000
fatal error: runtime: split stack overflow

/cc @ianlancetaylor @aclements
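
For readers following the log, here is a minimal sketch of the comparison behind the "split stack overflow" line, using only the numbers printed above. The helper name `stackOverflowBy` is made up for illustration and is not a runtime function.

```go
package main

import "fmt"

// stackOverflowBy reports how many bytes sp lies below the stack's low
// bound lo, or 0 if sp is still inside the stack. Hypothetical helper.
func stackOverflowBy(sp, lo uint32) uint32 {
	if sp < lo {
		return lo - sp
	}
	return 0
}

func main() {
	// Values copied from the "runtime: newstack" line in the log.
	const (
		sp = 0x8e98e8c
		lo = 0x8e99000
		hi = 0x8e9a000
	)
	fmt.Printf("stack size: %#x bytes\n", uint32(hi-lo))
	fmt.Printf("sp is %#x bytes below the low bound\n", stackOverflowBy(sp, lo))
}
```

In this log the stack is 0x1000 bytes and sp sits 0x174 bytes below its low bound.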

@FiloSottile FiloSottile commented Nov 4, 2019

The only hit on greplogs in recent months which does not look like #22553 is 2019-09-23T16:50:00-a14efb1/plan9-386-0intro.

@ianlancetaylor ianlancetaylor commented Nov 4, 2019

Well that's strange. I can't think of any reason why that error would not always happen. There aren't any random elements that I can see.

@bcmills bcmills commented Nov 7, 2019

> There aren't any random elements that I can see.

With a garbage collector, there are always random elements. (Is it possible that this has something to do with the GC's stack-shrinking?)

@bcmills bcmills commented Nov 8, 2019

@bcmills changed the title from "runtime: split stack overflow" to "runtime: split stack overflow o6" on Nov 8, 2019
@bcmills changed the title from "runtime: split stack overflow o6" to "runtime: split stack overflow on linux-386" on Nov 8, 2019
@bcmills changed the title from "runtime: split stack overflow on linux-386" to "runtime: split stack overflow on *-386" on Nov 8, 2019

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

In each case the stack trace looks approximately like

cmd/compile/internal/gc.(*ssafn).Log(0xa0678a0, 0x0)
	/workdir/go/src/cmd/compile/internal/gc/ssa.go:6792 +0x24 fp=0x985a564 sp=0x985a560 pc=0x887bc04
cmd/compile/internal/ssa.(*Func).Log(...)
	/workdir/go/src/cmd/compile/internal/ssa/func.go:624
cmd/compile/internal/ssa.Compile(0xa004cc0)
	/workdir/go/src/cmd/compile/internal/ssa/compile.go:32 +0x6a fp=0x985d878 sp=0x985a564 pc=0x820c31a
cmd/compile/internal/gc.buildssa(0x988a0d0, 0x0, 0x0)
	/workdir/go/src/cmd/compile/internal/gc/ssa.go:444 +0xa71 fp=0x985d948 sp=0x985d878 pc=0x884ad31
cmd/compile/internal/gc.compileSSA(0x988a0d0, 0x0)
	/workdir/go/src/cmd/compile/internal/gc/pgen.go:298 +0x52 fp=0x985d9e4 sp=0x985d948 pc=0x881c132

@bcmills bcmills commented Nov 8, 2019

That makes it even stranger, because that function is literally trivial:

func (e *ssafn) Log() bool {
	return e.log
}

@bcmills bcmills commented Nov 8, 2019

So, maybe something to do with inlining?

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

That function is trivial, but that's just the function where the problem is noticed. The function that is pushing the stack too low is cmd/compile/internal/ssa.Compile, which has quite a large stack frame. You can see this by looking at the changes in sp values in the traceback.
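
To make the sp comparison concrete, here is a small sketch that computes each frame's size as fp minus sp, with the values copied from the traceback quoted earlier in the thread. The `frame` type is only for illustration.

```go
package main

import "fmt"

// frame holds the fp and sp values printed for one traceback entry.
type frame struct {
	name   string
	fp, sp uint32
}

// size is the number of bytes between the frame pointer and stack pointer.
func (f frame) size() uint32 { return f.fp - f.sp }

func main() {
	frames := []frame{
		{"gc.(*ssafn).Log", 0x985a564, 0x985a560},
		{"ssa.Compile", 0x985d878, 0x985a564},
		{"gc.buildssa", 0x985d948, 0x985d878},
		{"gc.compileSSA", 0x985d9e4, 0x985d948},
	}
	for _, f := range frames {
		fmt.Printf("%-16s %#6x bytes\n", f.name, f.size())
	}
}
```

ssa.Compile comes out at 0x3314 bytes, far larger than the other frames in the trace.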

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

OK, let's consider the possibility of an ill-timed shrinkstack. The total available stack is 0x2000 bytes. Of that, the ssa.Compile function needs 0x3314 bytes, which overflows. Each time the stack shrinks, it is cut in half. So let's say that the stack was originally 0x4000 bytes, which would leave enough room for ssa.Compile. So we would be in trouble if the stack shrank after ssa.Compile decided it had enough space but before ssa.Compile actually adjusted the stack pointer.

The prologue of ssa.Compile looks like this:

  compile.go:29         0x82104d0               658b0d00000000          MOVL GS:0, CX                                                   
  compile.go:29         0x82104d7               8b89fcffffff            MOVL 0xfffffffc(CX), CX                                         
  compile.go:29         0x82104dd               8b7108                  MOVL 0x8(CX), SI                                                
  compile.go:29         0x82104e0               81fedefaffff            CMPL $0xfffffade, SI                                            
  compile.go:29         0x82104e6               0f8433130000            JE 0x821181f                                                    
  compile.go:29         0x82104ec               8d842480030000          LEAL 0x380(SP), AX                                              
  compile.go:29         0x82104f3               29f0                    SUBL SI, AX                                                     
  compile.go:29         0x82104f5               3d10360000              CMPL $0x3610, AX                                                
  compile.go:29         0x82104fa               0f861f130000            JBE 0x821181f                                                   
  compile.go:29         0x8210500               81ec10330000            SUBL $0x3310, SP                                                

So it seems to me that we would be in trouble if something shrank the stack while executing from address 0x82104f3 through address 0x82104fa.

But I don't see any way that could happen. The runtime will not shrink the stack of a goroutine that was preempted due to a signal.

I have not been able to recreate the problem on my laptop.
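
As a rough model of the prologue check above, the sketch below replays the comparison in Go using the constants visible in the disassembly. The addresses are invented, and it assumes the stack guard sits 0x380 bytes above the stack's low bound; neither comes from the logs. It shows why a check that passed against a 0x4000-byte stack would be wrong for a 0x2000-byte one.

```go
package main

import "fmt"

// Constants read off the disassembly; the names are illustrative.
const (
	stackPreempt = 0xfffffade // CMPL $0xfffffade, SI
	guardOffset  = 0x380      // LEAL 0x380(SP), AX
	threshold    = 0x3610     // CMPL $0x3610, AX
	frameSize    = 0x3310     // SUBL $0x3310, SP
)

// needsMoreStack mirrors the two branches to 0x821181f: the JE on the
// preempt marker and the JBE after the unsigned comparison.
func needsMoreStack(sp, stackguard uint32) bool {
	if stackguard == stackPreempt {
		return true
	}
	ax := sp + guardOffset - stackguard // unsigned, like LEAL + SUBL
	return ax <= threshold
}

func main() {
	// Hypothetical low bounds; assumes guard = low bound + guardOffset.
	const oldLo, newLo = 0x10000000, 0x20000000
	const used = 0x100 // bytes already in use at the top of the stack

	oldSP := uint32(oldLo + 0x4000 - used)
	fmt.Println("0x4000 stack needs morestack:", needsMoreStack(oldSP, oldLo+guardOffset))

	newSP := uint32(newLo + 0x2000 - used)
	fmt.Println("0x2000 stack needs morestack:", needsMoreStack(newSP, newLo+guardOffset))

	// If the SUBL ran anyway on the smaller stack, sp would end up this
	// far below the low bound.
	fmt.Printf("overflow without the check: %#x bytes\n", uint32(newLo)-(newSP-frameSize))
}
```

So the failure needs the comparison to run against a bound that does not match the stack the SUBL is applied to, whatever the cause of the mismatch turns out to be.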

@cherrymui cherrymui commented Nov 11, 2019

Just a guess:

  compile.go:29         0x82104d0               658b0d00000000          MOVL GS:0, CX                                                   
  compile.go:29         0x82104d7               8b89fcffffff            MOVL 0xfffffffc(CX), CX                                         

This loads G from TLS. If the goroutine is preempted between these two instructions, parked, and then resumed on a different thread, the TLS address in CX is stale (it still points to the old thread's TLS), so the second instruction may load the wrong G and therefore the wrong stack bound.

If this is the case, two-instruction TLS access probably needs to be marked nonpreemptible.
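
To illustrate the guess, here is a toy simulation of the two-step load. It is not runtime code: `g`, `thread`, and `staleTLSLoad` are simplified stand-ins invented for this sketch, and the stack guard values are arbitrary.

```go
package main

import "fmt"

// g stands in for a goroutine descriptor; only the field the prologue
// reads is modelled.
type g struct {
	id         int
	stackguard uint32
}

// thread models an OS thread whose TLS slot holds the g it is running.
type thread struct {
	tlsSlot *g
}

// staleTLSLoad replays the sequence: g1 reads the TLS address on thread a,
// is preempted, and finishes the load after resuming on thread b.
func staleTLSLoad() (loaded, running *g) {
	g1 := &g{id: 1, stackguard: 0x10000380}
	g2 := &g{id: 2, stackguard: 0x20000380}
	a := &thread{tlsSlot: g1}
	b := &thread{}

	// Instruction 1 (MOVL GS:0, CX): g1 obtains the address of a's TLS.
	tlsBase := &a.tlsSlot

	// Hypothetical preemption: a switches to g2, g1 resumes on b.
	a.tlsSlot = g2
	b.tlsSlot = g1

	// Instruction 2 (MOVL 0xfffffffc(CX), CX): the load goes through the
	// stale address and returns whatever thread a is running now.
	return *tlsBase, b.tlsSlot
}

func main() {
	loaded, running := staleTLSLoad()
	fmt.Printf("running g%d, but loaded g%d (stackguard %#x)\n",
		running.id, loaded.id, loaded.stackguard)
}
```

In this model the goroutine compares its stack pointer against another goroutine's guard, which is the kind of mismatch that could let an oversized frame through the check.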

@gopherbot gopherbot commented Nov 13, 2019

Change https://golang.org/cl/206903 mentions this issue: cmd/internal/obj/x86: mark 2-instruction TLS access nonpreemptible
