Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: windows/386 SEH corruption #7325

Closed
alexbrainman opened this issue Feb 13, 2014 · 31 comments
Closed

runtime: windows/386 SEH corruption #7325

alexbrainman opened this issue Feb 13, 2014 · 31 comments
Milestone

Comments

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Feb 13, 2014

Do this:

C:\>cd %GOROOT%\test

C:\go\root\src>hg id
ee32f4cd6b44 tip

C:\go\root\src>hg st

C:\go\root\test>go build divmod.go

C:\go\root\test>divmod.exe

I can see "Windows Just In Time Debugger" come up, offering to debug offending
program. If I proceed, it says "Unhandled exception at 0x00402d8e in divmod.exe:
0xC0000094: Integer division by zero."

I can also see similar problem on windows-386 builder
http://build.golang.org/log/10fc34c246e5a31e8909cad746568b396f62d4a8

...
# ..\test
Build complete, duration 1h0m0.013s. Result: error: timed out after 1h0m0s

The "go run divmod.go" crashed in a similar way and the builder waiting for
someone to take action on "... Just In Time Debugger ...". It gives up after 1
hour. And we end-up with multiple crashed programs on our builder.

divmod.exe is attached.

Alex

Attachments:

  1. divmod.exe (463872 bytes)
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 14, 2014

Comment 1:

One of the things that divmod tests is that a division zero causes a panic, a panic that
can be caught by a deferred call to recover.  Evidently that is not working on Windows.
I think it's supposed to work by having runtime·minit call
runtime·install_exception_handler which installs an SEH handler that calls
runtime·sigtramp.

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Feb 14, 2014

Comment 2:

I just checked recent history, and I can see everything works fine up until (including):
changeset:   19088:19f24bb3c8f8
user:        Nicolas Owens <mischief@offblast.org>
date:        Thu Feb 13 10:26:16 2014 -0500
summary:     net: only return unique hosts during hostname lookup on plan 9
version from
changeset:   19089:a7a59cfa61d2
user:        David du Colombier <0intro@gmail.com>
date:        Thu Feb 13 16:35:51 2014 +0100
summary:     cmd/gc: catch notes on Plan 9
and up until (including)
changeset:   19091:223d1c05e7c2
user:        Russ Cox <rsc@golang.org>
date:        Thu Feb 13 11:10:31 2014 -0500
summary:     runtime: introduce MSpan.needzero instead of writing to span data
don't build on windows. But next version after
changeset:   19092:c3319d654b61
user:        Dmitriy Vyukov <dvyukov@google.com>
date:        Thu Feb 13 20:15:19 2014 +0400
summary:     cmd/gc: fix windows build
already has this defect.
Alex

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Feb 14, 2014

Comment 3:

I applied
changeset:   19092:c3319d654b61
user:        Dmitriy Vyukov <dvyukov@google.com>
date:        Thu Feb 13 20:15:19 2014 +0400
summary:     cmd/gc: fix windows build
right onto
changeset:   19089:a7a59cfa61d2
user:        David du Colombier <0intro@gmail.com>
date:        Thu Feb 13 16:35:51 2014 +0100
summary:     cmd/gc: catch notes on Plan 9
to fix the build. And I can see now that this issue is started with:
changeset:   19090:a8b97f205736
user:        Dmitriy Vyukov <dvyukov@google.com>
date:        Thu Feb 13 19:36:45 2014 +0400
summary:     runtime: fix concurrent GC sweep
Alex

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Feb 14, 2014

Comment 4:

Dmitry, I wonder, if some of our threads (GC / prof / ...) don't have exception handler
installed?
Alex

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Feb 17, 2014

Comment 5:

I noticed our windows-amd64 builder crashes with different error. I can reproduce it
here on my windows-amd64 pc:
c:\go\root\test>divmod.exe
runtime: newstack framesize=0x0 argsize=0x18 sp=0x28fb50 stack=[0x290000, 0x292fa0]
        morebuf={pc:0x421074 sp:0x28fb60 lr:0x0}
        sched={pc:0x410f20 sp:0x28fb58 lr:0x0 ctxt:0x291508}
runtime: split stack overflow: 0x28fb50 < 0x290000
fatal error: runtime: split stack overflow
runtime stack:
runtime.throw(0x467589)
        c:/go/root/src/pkg/runtime/panic.c:464 +0x74
runtime.newstack()
        c:/go/root/src/pkg/runtime/stack.c:247 +0x1ff
runtime.morestack()
        c:/go/root/src/pkg/runtime/asm_amd64.s:228 +0x58
goroutine 16 [stack split]:
runtime.sighandler(0x2907a0, 0x2902b0, 0xc082012000)
        c:/go/root/src/pkg/runtime/os_windows_amd64.c:36 fp=0x28fb60
runtime: unexpected return pc for runtime.sigtramp called from 0x2907a0
runtime.sigtramp()
        ?:0 +0x54 fp=0x28fb68
created by _rt0_go
        c:/go/root/src/pkg/runtime/asm_amd64.s:97 +0x121
goroutine 17 [syscall]:
runtime.notetsleepg(0x2bff60, 0xdf8475800)
        c:/go/root/src/pkg/runtime/lock_sema.c:263 +0x83
runtime.MHeap_Scavenger()
        c:/go/root/src/pkg/runtime/mheap.c:531 +0xac
runtime.goexit()
        c:/go/root/src/pkg/runtime/proc.c:1438
created by runtime.main
        c:/go/root/src/pkg/runtime/proc.c:191
goroutine 18 [runnable]:
runtime.gosched()
        c:/go/root/src/pkg/runtime/proc.c:1412 +0x2a
bgsweep()
        c:/go/root/src/pkg/runtime/mgc0.c:1903 +0x46
runtime.goexit()
        c:/go/root/src/pkg/runtime/proc.c:1438
created by runtime.gc
        c:/go/root/src/pkg/runtime/mgc0.c:2187
It looks like stack is suddenly empty. How can that be? And why it only happens on
windows? The only windows difference I can think of, is StackSystem (extra room for
windows exception handler code)? Is it possible that some of our new code does not
account for that? Another possibility is memory corruption? I'm stumped. Looking for
suggestions.
Alex

@mattn
Copy link
Member

@mattn mattn commented Feb 18, 2014

Comment 6:

I can reproduced it.
http://go-gyazo.appspot.com/c013d0a359029305.png

@mattn
Copy link
Member

@mattn mattn commented Feb 18, 2014

Comment 7:

I can reproduce it.
http://go-gyazo.appspot.com/c013d0a359029305.png

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Feb 18, 2014

Comment 8:

I can do too.
Alex

@slimsag
Copy link

@slimsag slimsag commented Feb 24, 2014

Comment 9:

With GOMAXPROCS=1 I can reproduce the issue as well, but only randomly (every few times
I run divmod.exe it crashes, but not every time).

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Feb 26, 2014

Comment 10:

Labels changed: added release-go1.3.

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 4, 2014

Comment 11:

I think this CL should be fixed now by:
changeset:   19326:2750cd9fc49b
tag:         tip
user:        Russ Cox <rsc@golang.org>
date:        Mon Mar 03 19:55:40 2014 -0500
files:       src/cmd/gc/typecheck.c test/fixedbugs/issue7310.go
description:
cmd/gc: fix internal crash
TBR=ken2
CC=golang-codereviews
https://golang.org/cl/70200053

Status changed to Fixed.

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 4, 2014

Comment 12:

My mistake. I was meant to say: fixed by
runtime: fix traceback on Windows
The exception handler runs on the ordinary g stack,
and the stack copier is now trying to do a traceback
across it. That's never been needed before, so it was
unimplemented. Implement it, in all its ugliness.
Fixes windows/amd64 build.
TBR=khr
CC=golang-codereviews
https://golang.org/cl/71030043

@rsc
Copy link
Contributor

@rsc rsc commented Mar 7, 2014

Comment 13:

[Hi Hector! Question for you at the bottom.]
This is still happening on the windows/386 builder. I was able to connect to a broken
divmod with gdb and take a look. 
For future reference, when you're looking for thread-local storage on 32-bit windows, it
seems that the thread structures are allocated 0x7ffde000 moving downward: 0x7ffde000 is
thread 1, 0x7ffdd000 is thread 2, and so on: 0x7ffdb000 is thread 4 is as far as I got
in this program. Or maybe gdb numbers the threads in reverse and the thread structures
are actually allocated upward. 
Anyway, there is a divide by zero happening (as expected) but the SEH handler for the Go
thread has been removed from the SEH chain. That is, the chain should look like
0(FS) = m->seh = 0x31ff78
seh at 0x31ff78 = {0x31ffc4, 0x41ffb0}
seh at 0x31ffc4 = {0x31ffe4, 0x77459ac2}
seh at 0x31ffe4 = {0xffffffff, 0x774f7b76}
The 0x41ffb0 is our handler. the others are windows handlers earlier in the chain.
In the broken divmod program, 0(FS) is 0x31ffc4: the go SEH block has been removed. But
who removed it? The only code in Go that removes a SEH block is used during calls from C
back into Go, and that is not happening here.
So it seems like Windows removed it. Why? The only time Windows is supposed to remove a
SEH structure is if it declines to handle an exception and then an outer SEH does handle
it. The assumption is that the code is returning to something near the outer SEH, so the
inner code is gone and the inner SEH should be gone too. The only thing
runtime.sighandler passes on is EXCEPTION_BREAKPOINT:
    switch(info->ExceptionCode) {
    case EXCEPTION_BREAKPOINT:
        r->Eip--;   // because 8l generates 2 bytes for INT3
        return 1;
    }
If Windows sent an EXCEPTION_BREAKPOINT, then Go's handler would pass, an outer handler
would handle it, and Go's handler would be removed. But I do not understand why Windows
would send an EXCEPTION_BREAKPOINT. But also if this didn't happen in practice it's hard
to believe the code would be here at all. 
Hector, this code was in the very first implementation of Windows exception handling, in
https://golang.org/cl/4079041 aka b897d583e187. Clearly you had hit this,
because otherwise you wouldn't have thought to do the r->Eip--. 
How did this come up? Can it come up if a debugger is not involved and the program is
not crashing?
If it's not the breakpoint explanation, then I have no idea why Windows would remove the
SEH block. Maybe there is just a Windows kernel bug, but that's not encouraging. I ran
the test (divmod.exe) over and over and over by itself and couldn't make it fail. But it
fails somewhat regularly (maybe 1/4 of the time) during the builder all.bat tests.
Thread 1 (the main thread, I can tell) is not running Go code at the moment but also has
a broken SEH. It is in Windows code that may have pushed its own SEH, but the SEH link
pointer it has doesn't make sense, and it does not lead back to the Go SEH frame for
that m.
It is hard to see what Go could be doing wrong here other than the breakpoint thing, but
it is also hard to see how the breakpoint thing could happen. If it's really a Windows
bug, our options for working around this seem limited. One possibility would be to
always update the SEH head pointer at 0(FS) during runtime.gogo, so that on entry to any
goroutine, we know we have the right SEH setting. That's the best I can come up with
right now.
Anyone else? Hector?
Thanks.
Russ

Status changed to Accepted.

@rsc
Copy link
Contributor

@rsc rsc commented Mar 7, 2014

Comment 14:

Also, Alex, were you seeing this on another windows/386 machine? Are you still? What
version of Windows? I am wondering if it is a Windows 2000-specific bug.

@rsc
Copy link
Contributor

@rsc rsc commented Mar 7, 2014

Comment 15:

This issue was updated by revision b2fa6f4.

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/72590044

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 7, 2014

Comment 16 by hectorchu:

I use Linux these days (and no Windows on my computer), so it's hard for me to do any
investigation. And besides I'd have to pick it all up again. But the breakpoint handler
is just so that when a debugger is attached and a breakpoint is set in the code, it will
break into the debugger. The IP correction is so that the debugger points at the right
location (otherwise it'd be off-by-one).

@rsc
Copy link
Contributor

@rsc rsc commented Mar 7, 2014

Comment 17:

Thanks. I will restore the breakpoint handler once we see a builder failure
without it. I expect we will.

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 7, 2014

Comment 18:

Russ,
I saw this bug on my windows XP. Also see comments in this issue from others. Also our
builder is not windows 2000. So it is not specific to windows 2000. I didn't see this
bug since your last fix, but I didn't try hard enough. I will try again, but I have no
competes here. I will try next week.
Alex

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 7, 2014

Comment 19:

Russ, I also think we need to consider scenario of exception happens while running go
exception handler. Out exception handler can do anything our normal program does. None
of exception handler functions are marked as no-grow-stack. Perhaps it is good idea to
rewrite handler to do very minimum.
Alex

@rsc
Copy link
Contributor

@rsc rsc commented Mar 10, 2014

Comment 20:

I've looked at a handful of these on the builder. The typical failure trace looks like
this:
(gdb) thread 3
[Switching to thread 3 (Thread 664.0x9a4)]
#0  0x774b5f4f in ntdll!_chkstk () from C:\Windows\system32\ntdll.dll
(gdb) where
#0  0x774b5f4f in ntdll!_chkstk () from C:\Windows\system32\ntdll.dll
#1  0x75f60262 in OpenProfileUserMapping ()
   from C:\Windows\system32\kernel32.dll
#2  0x003a06dc in ?? ()
#3  0x75f3fa12 in KERNEL32!GetApplicationRestartSettings ()
   from C:\Windows\system32\kernel32.dll
#4  0x75f5e080 in OpenProfileUserMapping ()
   from C:\Windows\system32\kernel32.dll
#5  0xffffffff in ?? ()
#6  0x003a0708 in ?? ()
#7  0x75f4042e in UnhandledExceptionFilter ()
   from C:\Windows\system32\kernel32.dll
#8  0x003a07e4 in ?? ()
#9  0x774c80f9 in ntdll!EtwpNotificationThread ()
   from C:\Windows\system32\ntdll.dll
#10 0x77491696 in ntdll!RtlInitializeNtUserPfn ()
   from C:\Windows\system32\ntdll.dll
#11 0x00000000 in ?? ()
(gdb)
Many of these symbols are bogus: gdb only knows about symbols in the dynamic link
tables, so many of these are just the last exported symbol before a large chunk of
unexported symbols. However, ntdll!_chkstk is accurate, as is
KERNEL32!GetApplicationRestartSettings.
ntdll!_chkstk is what Windows calls from any function that allocates a large stack frame
(>4kB). It works by touching one word per new page created, so that if you have just
a single 4kB guard page for your stack, you can't skip over it. The problem here is that
this code is running on a Go stack, presumably in response to the divide-by-zero. The
function mislabeled as OpenProfileUserMapping is trying to allocate a 4300-byte stack
frame, but Go only leaves 2 kB for Windows (StackSystem in runtime.h). The hope was that
2k is "big enough" but apparently it is not.
This happens in divmod with some regularity because I wrote divmod to generate test
cases using a deeply recursive function. Eventually this check ends up running near the
bottom of a stack that happens to have nothing mapped just before it in memory, and
_chkstk works as intended and faults. The new copying stacks exacerbate the problem, but
it could happen in the old stacks too. I believe it started happening reliably when we
reenabled concurrent sweep only because concurrent sweep created a new couple stack
frames and therefore disrupted the usual allocation patterns.
I would like to see a disassembly of the KiUserExceptionDispatcher implementation [1]
along with the things it uses, but I have been unable to produce one. Gdb seems not to
know enough about the symbols, and windbg is inscrutable.
I have also observed that the SEH information is often corrupt in the thread with the
problem. It is possible that this stack overflow is a distraction and only happens
because the Go SEH registration has been lost and we're stuck in Windows SEH routines.
It is also possible that this stack overflow - happening on some other thread, earlier,
undetected - is why the SEH registrations are being lost. I think that's not too likely
because it makes the most sense for GetApplicationRestartSettings to be called during a
crash, and we are trying to understand how the crash came to be, not bugs revealed
during its handling. Of course, it could be some other stack overflow. What else might
Windows be running at the bottom of the goroutine stacks that we're unaware of?
On the SEH registration problem, one theory is that when Windows code runs on the Go
stack, it registers and then deregisters its own SEH handlers. Normally we make sure
that the SEH chain is on the m (aka g0, aka OS) stack, but this Windows code is
executing on an arbitrary Go stack. It may be confused by seeing SEH registrations not
on the OS stack. In particular, the thread information block contains two words giving
the bounds of the OS stack, and some sources online claim that if Windows sees SEH
entries outside those bounds it will ignore/reject them somehow [2]:
    The OS is pretty paranoid about corrupt stacks during this chain traversal.
    It checks that all chain entries are within the bounds of the stack. 
    (These bounds are also recorded in the TEB).  The OS also checks that all
    entries are in ascending order on the stack.  If you violate these rules,
    the OS will consider the stack to be corrupt and will be unable to process
    exceptions.  This is one of the reasons that a Win32 application cannot
    break its stack into multiple disjoint segments as an innovative technique
    for dealing with stack overflow.
We have certainly not observed that, however, and I think if it were true we'd have seen
more problems.
I wonder if the deregistration of handlers during an exception is based on stack
address. If so, suppose that the Go SEH handler is registered on the OS stack at address
0x6fff0, and then we make a Go stack at 0x1010000, and then Windows code running on that
Go stack pushes a SEH handler or two around 0x100fff0, and then an exception happens. It
is possible that the unwind of the SEH stack during the exception is a while loop that
waits until it finds an entry larger than the stack address being returned to. (That's
similar to what we do in panic.) Windows would see the 0x6fff0 address as "lower" on the
stack than 0x100fff0, instead of realizing that they are two different stacks entirely,
and incorrectly remove the top-level SEH handler we've registered along with the others.
The next exception would not be caught by the now-removed Go SEH, and we end up seeing
the failure we see today.
It is possible that we could bypass some of this by switching to Vectored Exception
Handling [3]. VEH is nice because it applies to the whole program, not just one thread,
so they are not constantly being registered and deregistered. We can register it once
and not worry about the handler not being called or being overridden by Windows code due
to confusion about stack addresses. Go's VEH handler could pick off just the entries
with a PC in the Go text segment and let the others fall through to the other handlers
and then eventually to SEH. This is similar to what we do on windows/amd64, except that
on amd64 the handler information is in the executable header instead of registered at
run time. If the VEH implementation is low-level enough and doesn't call too many other
functions, it might fit in the 2k we are currently reserving. Using VEH would exclude
Windows 2000 (requiring Windows XP/Windows Server 2003 or later). 
Another possibility is to have a separate SEH chain for every goroutine stack, and make
the goroutine switch update all three words in the thread information block: stack
bottom, stack top, stack, SEH chain. This would mean that the Windows per-thread SEH
handlers would be excluded from executing (ever), which might break debug breakpoints
(the only time Go's SEH handler ever declines an opportunity to handle an exception).
It is also possible that we are causing this problem ourselves, similar to issue #7470.
However, we do next to no manipulation of the SEH ourselves, and the code that was buggy
in 7470 is not involved at all in test/divmod.go (no cgo).
I really don't know what to do here. None of the options are great. Using VEH is
probably simplest, if we can drop win2k and if it helps, but there's no guarantee it
will fix the general problem. It's easy to try though, I guess.
[1] http://www.microsoft.com/msj/0197/exception/exception.aspx
[2] http://blogs.msdn.com/b/cbrumme/archive/2003/10/01/51524.aspx
[3] http://msdn.microsoft.com/en-us/magazine/cc301714.aspx
(gdb) x/20i 0x75f60253
   0x75f60253 <OpenProfileUserMapping+20435>:   mov    %edi,%edi
   0x75f60255 <OpenProfileUserMapping+20437>:   push   %ebp
   0x75f60256 <OpenProfileUserMapping+20438>:   mov    %esp,%ebp
   0x75f60258 <OpenProfileUserMapping+20440>:   mov    $0x10cc,%eax
   0x75f6025d <OpenProfileUserMapping+20445>:
    call   0x75f60fc5 <OpenProfileUserMapping+23877>
   0x75f60262 <OpenProfileUserMapping+20450>:   mov    0x75f6e4ac,%eax
   0x75f60267 <OpenProfileUserMapping+20455>:   xor    %ebp,%eax
   0x75f60269 <OpenProfileUserMapping+20457>:   mov    %eax,-0x4(%ebp)
   0x75f6026c <OpenProfileUserMapping+20460>:   push   %ebx
   0x75f6026d <OpenProfileUserMapping+20461>:   mov    0x8(%ebp),%ebx
NOTE: 75f6e4ac is a PLT entry jumping to 0x774b5f28
(gdb) x/30i 'ntdll!_chkstk'
   0x774b5f28 <ntdll!_chkstk>:  push   %ecx
   0x774b5f29 <ntdll!_chkstk+1>:        lea    0x4(%esp),%ecx
   0x774b5f2d <ntdll!_chkstk+5>:        sub    %eax,%ecx
   0x774b5f2f <ntdll!_chkstk+7>:        sbb    %eax,%eax
   0x774b5f31 <ntdll!_chkstk+9>:        not    %eax
   0x774b5f33 <ntdll!_chkstk+11>:       and    %eax,%ecx
   0x774b5f35 <ntdll!_chkstk+13>:       mov    %esp,%eax
   0x774b5f37 <ntdll!_chkstk+15>:       and    $0xfffff000,%eax
   0x774b5f3c <ntdll!_chkstk+20>:       cmp    %eax,%ecx
   0x774b5f3e <ntdll!_chkstk+22>:       jb     0x774b5f4a <ntdll!_chkstk+34>
   0x774b5f40 <ntdll!_chkstk+24>:       mov    %ecx,%eax
   0x774b5f42 <ntdll!_chkstk+26>:       pop    %ecx
   0x774b5f43 <ntdll!_chkstk+27>:       xchg   %eax,%esp
   0x774b5f44 <ntdll!_chkstk+28>:       mov    (%eax),%eax
   0x774b5f46 <ntdll!_chkstk+30>:       mov    %eax,(%esp)
   0x774b5f49 <ntdll!_chkstk+33>:       ret
   0x774b5f4a <ntdll!_chkstk+34>:       sub    $0x1000,%eax
   0x774b5f4f <ntdll!_chkstk+39>:       test   %eax,(%eax)
   0x774b5f51 <ntdll!_chkstk+41>:       jmp    0x774b5f3c <ntdll!_chkstk+20>
   0x774b5f53 <ntdll!_chkstk+43>:       nop
   0x774b5f54 <ntdll!_chkstk+44>:       nop
   0x774b5f55 <ntdll!_chkstk+45>:       nop
   0x774b5f56 <ntdll!_chkstk+46>:       nop
   0x774b5f57 <ntdll!_chkstk+47>:       nop
   0x774b5f58 <ntdll!_chkstk+48>:       nop
   0x774b5f59 <ntdll!_chkstk+49>:       nop

@rsc
Copy link
Contributor

@rsc rsc commented Mar 10, 2014

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 10, 2014

Comment 22:

Russ,
Remember we had similar problem on windows/amd64? So we introduced
runtime·setstacklimits. Perhaps we can do the same for 386?
Alex

@rsc
Copy link
Contributor

@rsc rsc commented Mar 10, 2014

Comment 23:

Thanks for the pointer to setstacklimits. I was trying to find it earlier but
misremembered it being in the 386 port. I tried setting the bounds in
install_exception_handler and that produced reliable crashes much earlier than before. I
think the problem is that Windows uses those numbers to decide when it is okay to
allocate more stack on demand, so if they are lying and Windows faults on the OS stack,
Windows thinks something terrible happened instead of just growing the stack. We could
flip them back when running on the OS stack, of course, but that's a more involved
change to try.
If we find something like this that fixes the divmod problem, then maybe we just back
away slowly and call it fixed, but I'm still a bit worried about the inversion of memory
addresses along the SEH chain. I expect that is involved somehow, but perhaps I'm wrong.
This is one of those times when on every other system you can just go to lxr/fxr and
look at the source code. Oh well.

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 11, 2014

Comment 24:

[2] http://blogs.msdn.com/b/cbrumme/archive/2003/10/01/51524.aspx
looks like what we need. We might even use it for both 386 / amd64. Is it something we
want to try and do now?
Alex

@rsc
Copy link
Contributor

@rsc rsc commented Mar 11, 2014

Comment 25:

The cbrumme blog post is very long. What specifically do you think we should do?

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 11, 2014

Comment 26:

Sorry, Russ, I meant to say we should use AddVectoredExceptionHandler just like in
[3] http://msdn.microsoft.com/en-us/magazine/cc301714.aspx
We should be able to use it for for both 386 and amd64. Define handler once per program:
will work for all threads and callbacks. Hopefully we don't have to worry about
AddVectoredExceptionHandler stack, like we do here.
Maybe we can even leave most of current SEH code alone. It can get called for win2k. It
is broken, but it was broken before.
Alex

@rsc
Copy link
Contributor

@rsc rsc commented Mar 11, 2014

Comment 27:

Using AddVectoredExceptionHandler sounds good to me. If it works I'd like to remove the
SEH code instead of leaving it for win2k. It won't be tested and it has been a source of
problems in the past (and continues to be). Much better to pull it out and stop wasting
our development resources maintaining code for systems that even Microsoft doesn't
support anymore.

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 12, 2014

Comment 28:

Russ,
Sounds like a plan. I will try and start working on this (you've wasted enough time on
this) when I have time. I think this can wait (maybe disable broken test), but ,if you
think this urgent, then do it yourself.
Alex

@rsc
Copy link
Contributor

@rsc rsc commented Mar 12, 2014

Comment 29:

Thanks Alex. I am trying to get the builders unflaky so that I can depend
on them when making some other changes, so I looked into doing this. It was
pretty quick to set up. CL 74790043.

@alexbrainman
Copy link
Member Author

@alexbrainman alexbrainman commented Mar 14, 2014

Comment 30:

A diff related to https://golang.org/cl/74790043/
Alex

Attachments:

  1. a.diff (18371 bytes)

@rsc
Copy link
Contributor

@rsc rsc commented Mar 25, 2014

Comment 31:

This issue was closed by revision 3750904.

Status changed to Fixed.

@rsc rsc added this to the Go1.3 milestone Apr 14, 2015
@rsc rsc removed the release-go1.3 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants