
runtime: failure in TestRaiseException on windows-amd64-2012 #49681

Open
bcmills opened this issue Nov 19, 2021 · 34 comments
Labels: ExpertNeeded, NeedsInvestigation, OS-Windows
Milestone

Comments

@bcmills (Member) commented Nov 19, 2021

--- FAIL: TestRaiseException (0.02s)
    crash_test.go:106: C:\Users\gopher\AppData\Local\Temp\1\go-build1560626842\testprog.exe RaiseException exit status: exit status 2989
    syscall_windows_test.go:636: No stack trace: 
FAIL
FAIL	runtime	68.703s

greplogs --dashboard -md -l -e 'FAIL: TestRaiseException'

2021-11-18T02:19:50-f1cc529/windows-amd64-2012

The test dates from 2014. I can find only one such failure in the logs, and it is very recent, so I am marking this as a release-blocker until we can determine whether it is a regression (CC @bufflig).

@bcmills added the NeedsInvestigation, OS-Windows, and release-blocker labels on Nov 19, 2021
@bcmills added this to the Go1.18 milestone on Nov 19, 2021
@toothrot added the okay-after-beta1 label on Nov 19, 2021
@bcmills (Member, Author) commented Nov 23, 2021

One more failure this week. This is looking like a regression of some sort.

greplogs --dashboard -md -l -e 'FAIL: TestRaiseException' --since=2021-11-19

2021-11-22T22:36:03-7456b94/windows-amd64-2012

@bufflig (Contributor) commented Nov 24, 2021

Hmmm, this might have been flaky for a while. I didn't realize it at the time; I thought it was related to the crashdump change and that it "went away" after a sync, but it was probably the same flakiness. My guess is that the recent regression is caused by changed timing due to the scavenger changes, but that's just a guess. The flakiness should be fixed either way; I'll put it on the list.

@bufflig self-assigned this on Nov 24, 2021
@alexbrainman (Member) commented Nov 25, 2021

crash_test.go:106: C:\Users\gopher\AppData\Local\Temp\1\go-build1560626842\testprog.exe RaiseException exit status: exit status 2989
syscall_windows_test.go:636: No stack trace:

I spent some time staring at this. 2989 is 0xbad, from

proc.Call(0xbad, EXCEPTION_NONCONTINUABLE, 0, 0)

I also ran runtime.TestRaiseException, and it would not fail for me. I added a t.Logf in the test to print the stack trace; when the test succeeds, the child process always returns 2.

The Go runtime calls runtime.exit(2) from our Windows exception handler.

I suspect that in this failure the runtime.exit(2) call was never reached: either the runtime crashed in the exception handler, or Windows did not call our exception handler for some reason.

Alex.

@bufflig (Contributor) commented Nov 29, 2021

I was suspecting that something was spooky here. The exception technically comes from non-Go code, so I was wondering whether that is simply the reason we don't get a stack trace, and, for some reason I don't understand, the test for non-Go code did not come to the same conclusion before. But I could be misunderstanding it completely. When I tested, an exception I caused "directly", e.g. by reading a protected memory page, gave stack dumps, but when I raised the exception by calling RaiseException, it became "flaky" somewhere between 1.17 and 1.18.

@alexbrainman (Member) commented Dec 1, 2021

> I was suspecting that something was spooky here
> The exception technically comes from non-go code, so I was wondering if that's simply the reason we don't get a stack trace and - for some reason I don't understand - the test for non-go-code did not come to the same conclusion before? But I could completely misunderstand it too.

I don't understand what you are saying. runtime.isgoexception should always return false in our scenario, because the function only returns true if the exception code is _EXCEPTION_ACCESS_VIOLATION, _EXCEPTION_INT_DIVIDE_BY_ZERO, or one of the other constants in that switch statement. But we pass the value 0xbad to RaiseException, and 0xbad does not match any of those constants.

> When I tested, an exception I caused "directly" by e.g. reading a protected memory page gave stack dumps, but when I raised the exception by calling RaiseException, it became "flaky" somewhere between 1.17 and 1.18.

I guess your testing confirms that we do have a new problem here. How do I reproduce your "flaky" testing?

Thank you.

Alex
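The filtering Alex describes can be sketched platform-neutrally. This is a simplified model, not the runtime's actual code: the constant values are the standard Windows NTSTATUS codes, the list is abbreviated, and the real isgoexception also checks where the faulting PC lies.

```go
package main

import "fmt"

// Standard Windows exception codes (NTSTATUS values) that the runtime
// treats as potentially Go-generated; the real list is longer.
const (
	_EXCEPTION_ACCESS_VIOLATION   = 0xc0000005
	_EXCEPTION_INT_DIVIDE_BY_ZERO = 0xc0000094
)

// isGoExceptionCode sketches the code-based part of the check: any code
// outside the known set, such as the test's 0xbad, returns false, so the
// handler falls through to _EXCEPTION_CONTINUE_SEARCH.
func isGoExceptionCode(code uint32) bool {
	switch code {
	case _EXCEPTION_ACCESS_VIOLATION, _EXCEPTION_INT_DIVIDE_BY_ZERO:
		return true
	}
	return false
}

func main() {
	fmt.Println(isGoExceptionCode(_EXCEPTION_ACCESS_VIOLATION)) // true
	fmt.Println(isGoExceptionCode(0xbad))                       // false
}
```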

@bufflig (Contributor) commented Dec 2, 2021

> > I was suspecting that something was spooky here
> > The exception technically comes from non-go code, so I was wondering if that's simply the reason we don't get a stack trace and - for some reason I don't understand - the test for non-go-code did not come to the same conclusion before? But I could completely misunderstand it too.
>
> I don't understand what you are saying. runtime.isgoexception should always return false in our scenario. Because the function only returns true if exception code is _EXCEPTION_ACCESS_VIOLATION, _EXCEPTION_INT_DIVIDE_BY_ZERO and other consts in that case statement. But we pass 0xbad value to RaiseException, and 0xbad does not match any of the consts in the case.

You're totally right and I'm clearly wrong. I think you didn't understand what I was saying because what I was saying made no sense :) I was specifically throwing EXCEPTION_ACCESS_VIOLATION in my test program, which is why my theory seemed a little less far-fetched back then...

> > When I tested, an exception I caused "directly" by e.g. reading a protected memory page gave stack dumps, but when I raised the exception by calling RaiseException, it became "flaky" somewhere between 1.17 and 1.18.
>
> I guess your testing confirms that we do have a new problem here. How do I reproduce your "flaky" testing?

I just used a small program calling RaiseException through cgo:

package main

// #include <windows.h>
// #include <stdio.h>
// #include <stdlib.h>
// #include <errhandlingapi.h>
//
// static void raise() {
//   RaiseException(EXCEPTION_ACCESS_VIOLATION, 0,0,0);
// }
import "C"

import (
	"time"
)

func main() {
	time.Sleep(time.Second)
	C.raise()
	return
}

During 1.18 development it became much less likely to succeed in dumping a stack trace, but as you can see, it already needed a sleep to work in the first place, so it was never watertight.

However, after trying to reproduce it now, I think it's a red herring: after @randall77's https://go-review.googlesource.com/c/go/+/364174, the program above consistently works, even without the sleep. In fact, after that change, none of my sometimes-flaky test programs that create exceptions in different ways ever fail to dump a stack. That makes no sense to me, looking at the change itself. It is also the complete opposite of what shows up on the builders, where the failures rather started after that change, or a few days after.

So now I have no idea, not even a bad one.


Thanks!

Patrik

@alexbrainman (Member) commented Dec 7, 2021

> I just used a small program calling RaiseException through cgo:

runtime.TestRaiseException does not use cgo. I expect the cgo and non-cgo runtimes are very different from each other. If your small cgo program works fine (dumps a stack trace every time), that does not prove anything.

Alex

@bufflig (Contributor) commented Dec 7, 2021

Yea, probably not. I'll see if I can make it happen on a gomote with the test case itself, to see if there's any way to reproduce it.

@bufflig (Contributor) commented Dec 8, 2021

I managed to get some repros with some logging. It's not conclusive in any way, but, as you had mostly concluded already, the process does reach exceptionhandler (in signal_windows.go), which returns _EXCEPTION_CONTINUE_SEARCH; but then neither firstcontinuehandler nor lastcontinuehandler is called. As far as I can see they are not crashing; they are not called at all.

@bufflig (Contributor) commented Dec 8, 2021

I don't know if it's really relevant, but it appears that the change I mentioned above, CL 364174, also makes this error much more probable. I was not able to reproduce it even once before that change, but by spamming the gomotes I can reproduce it within a couple of minutes after that change. I still have no clue why that change would affect exception handling, of course. It might just change the timing or something.

@cherrymui (Member) commented Dec 8, 2021

It seems that in a successful run, exceptionhandler is called first, returning because isgoexception is false; then firstcontinuehandler is called, then lastcontinuehandler, which prints the stack trace. In a failed run, though, exceptionhandler is called and returns as usual, but firstcontinuehandler and lastcontinuehandler are not called. Instead, the process exits with error code 2989 (0xbad, instead of the usual 2). I don't know why the continue handlers are sometimes not called.
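The sequencing described above can be written down as a toy model (purely illustrative; the handler names mirror the runtime's, and the exit codes are the ones observed in the logs):

```go
package main

import "fmt"

// dispatch models the observed call order. In a successful run the VEH
// (exceptionhandler) returns CONTINUE_SEARCH and the two continue handlers
// still run, with lastcontinuehandler printing the trace and exiting with 2.
// In the failing run, dispatch stops after exceptionhandler and the process
// exits with the raw exception code (0xbad = 2989).
func dispatch(continueHandlersRun bool) (handlers []string, exitCode int) {
	handlers = append(handlers, "exceptionhandler")
	if !continueHandlersRun {
		return handlers, 0xbad
	}
	handlers = append(handlers, "firstcontinuehandler", "lastcontinuehandler")
	return handlers, 2
}

func main() {
	h, code := dispatch(true)
	fmt.Println(h, code) // [exceptionhandler firstcontinuehandler lastcontinuehandler] 2
	h, code = dispatch(false)
	fmt.Println(h, code) // [exceptionhandler] 2989
}
```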

@bufflig (Contributor) commented Dec 8, 2021

Yes. I don't know if the trampoline functions in assembly are called, though. There is some code there beyond my comprehension in sys_windows_amd64.s:sigtramp (no surprise there :)), also using TLS, which was vaguely related to change 364174. It does do some work to make sure it's running on the g0 stack; I don't know if that could be where the flakiness comes from?

@cherrymui (Member) commented Dec 8, 2021

I think sigtramp is also not called (for firstcontinuehandler and lastcontinuehandler). I added some code to print in sigtramp and I didn't see it print.

Yeah, I also cannot reproduce before CL 364174, or at tip with just CL 364174 reverted. Weird. That code runs very early at startup; I wonder how it could cause a nondeterministic failure.

It also seems to fail more often if I set GOGC=1. I tried disabling async preemption, but it doesn't seem to have any effect.

@bufflig (Contributor) commented Dec 8, 2021

Ah, interesting. I tried (somewhat arbitrarily) re-registering the exception handlers before returning _EXCEPTION_CONTINUE_SEARCH from exceptionhandler; that did not affect what happened at all (not sure what I expected, though). It almost seems as if the Windows exception handling machinery is not seeing the _EXCEPTION_CONTINUE_SEARCH return value, but something else?

@cherrymui (Member) commented Dec 8, 2021

It seems that if I increase the 5*8 to 5*8+7 at https://cs.opensource.google/go/go/+/master:src/runtime/asm_amd64.s;l=152 (it was 4*8+7 before that CL), I can no longer reproduce the failure. Reading the code, I don't see where it clobbers anything on the stack, though. Maybe there was something right below SP at the entry of rt0_go that should not be clobbered?

@bufflig (Contributor) commented Dec 8, 2021

Heh, there's definitely something going on there. If I make that specific change, the cgo program that started emitting stack dumps after 364174 stops dumping stacks again. The change on that line clearly affects the exception handlers, but I have no idea how or why. Could it just be some arbitrary memory corruption, where the slightly different stack layout makes one outcome more probable than another while something else is actually broken?

I'm completely confused by the switching between ABIs in sigtramp, and I don't really understand which ABI is used to call the Go code. Is it called using ABI0? What keeps it from being called using ABIInternal? Wrappers? I might need an "ABI for dummies" lesson :)

@cherrymui (Member) commented Dec 8, 2021

firstcontinuetramp is called using the system ABI. It passes the PC of the ABI0 symbol of firstcontinuehandler (there is no <ABIInternal> marker) to sigtramp, so the ABI0 function is called. That function is actually a wrapper, which calls the real ABIInternal Go function firstcontinuehandler and returns its value back in ABI0.

@cherrymui (Member) commented Dec 9, 2021

I added some prints and it doesn't seem we clobbered anything. For both a successful run and a failed run, the word right below SP at entry of rt0_go is 0; in both cases we store the same value there (this store: https://cs.opensource.google/go/go/+/master:src/runtime/asm_amd64.s;l=155), and the first few words above that never change and are the same in both cases. So it is not that we clobbered something on that stack. Still weird.

@bufflig (Contributor) commented Dec 9, 2021

Thanks for checking and explaining the ABI and the wrapper! At least now I understand sigtramp, which seems to do what's expected. This is indeed a weird one...

@cherrymui (Member) commented Dec 9, 2021

I also tried changing sigtramp to always return 0 (which is not always correct, but is all we need for this test) and it can still fail. So it is not the return value.

@bufflig (Contributor) commented Dec 10, 2021

Yea, I printed out AX at the end of sigtramp and it showed the same thing, so I can't see any way we're returning the wrong value. Unfortunately...

I don't know what's up here. One thing to note is that the handlers that are not run are "continue handlers", while the one that is run is an "exception handler". According to the docs I could find they are simply run in sequence, but there might be some criterion for skipping the continue handlers that we hit. I couldn't find anything, and the docs are sparse. The MS docs I found basically say the exception handlers run, and then, if the last one returns EXCEPTION_CONTINUE_SEARCH, the continue handlers run. But maybe there's some other criterion, like the stack having to look a certain way (which would be kind of strange, as the handlers allegedly get their own stack frame). Maybe someone with access to the source could shed some light? @jstarks, do you by any chance have time to check whether there's some criterion that will skip the continue handlers?

Also, I've had problems reproducing with the latest dev version lately, even though as far as I can see nothing that should affect this has been checked in recently. I will keep digging.

@cherrymui removed the okay-after-beta1 label on Dec 14, 2021
@hanchaoqun (Contributor) commented Dec 17, 2021

@cherrymui @bufflig
The root cause of the problem may be this scenario:
Ring 3 on Windows x64 has three exception handling mechanisms: VEH (RtlCallVectoredExceptionHandlers), SEH (RtlpUnwindInternal), and VCH (RtlCallVectoredContinueHandlers). Since isgoexception always returns false here, the first chance to run the VCH never happens, and SEH runs between the VEH and the second VCH call (the second call to RtlCallVectoredContinueHandlers in RtlDispatchException). To process SEH, the stack must be unwound from the RIP and RSP in the current ContextRecord; but the goroutine's RSP may not satisfy Windows' unwind requirements, so nested exception handling occurs (RtlpUnwindInternal raises another uncatchable exception), the process crashes prematurely inside SEH, and the VCH is never executed.
Reference: https://doxygen.reactos.org/d6/dea/sdk_2lib_2rtl_2amd64_2except_8c_source.html#l00087

 BOOLEAN
 NTAPI
 RtlDispatchException(
     _In_ PEXCEPTION_RECORD ExceptionRecord,
     _In_ PCONTEXT ContextRecord)
 {
     BOOLEAN Handled;
 
     /* Perform vectored exception handling for user mode */
     if (RtlCallVectoredExceptionHandlers(ExceptionRecord, ContextRecord))
     {
         /* Exception handled, now call vectored continue handlers */
         RtlCallVectoredContinueHandlers(ExceptionRecord, ContextRecord);
 
         /* Continue execution */
         return TRUE;
     }
 
     /* Call the internal unwind routine */
     Handled = RtlpUnwindInternal(NULL, // TargetFrame
                                  NULL, // TargetIp
                                  ExceptionRecord,
                                  0, // ReturnValue
                                  ContextRecord,
                                  NULL, // HistoryTable
                                  UNW_FLAG_EHANDLER);
 
     /* In user mode, call any registered vectored continue handlers */
     RtlCallVectoredContinueHandlers(ExceptionRecord, ContextRecord);
 
     return Handled;
 }

I did the following test: in the exceptionhandler function, after the isgoexception call returns, I modified the RIP, RBP, and RSP of the exception context:

// ADD FOR TEST
// Read Stackbase from PEB
TEXT runtime·getStackBase(SB),NOSPLIT,$0
	MOVQ	0x8(GS), AX
	MOVQ	AX, ret+0(FP)
	RET

// ADD FOR TEST
// Read StackLimit from PEB
TEXT runtime·getStackLimit(SB),NOSPLIT,$0
	MOVQ	0x10(GS), AX
	MOVQ	AX, ret+0(FP)
	RET

...
func exceptionhandler(info *exceptionrecord, r *context, gp *g) int32 {
	if !isgoexception(info, r) {
                // ADD FOR TEST
		r.rip = 0
		r.rbp = getStackBase()
		r.rsp = getStackLimit()
		return _EXCEPTION_CONTINUE_SEARCH
	}
...
}

The behavior then meets expectations, but this modification is obviously not the right fix for this issue; I am still working on a proper fix...

@hanchaoqun (Contributor) commented Dec 20, 2021

I dumped the TIB & PEB and also the VEH & VCH lists before the call to RaiseException(EXCEPTION_ACCESS_VIOLATION, 0, 0, 0); note that stackBase/stackLimit differ greatly from the exception RSP (e.g. RSP=0xc000047f30):

C PEB: 0xbd31221000
C CrossProcessFlags: 0xc 
C CrossProcessFlags: 0xc
C exceptionList: 0x0
C stackBase: 0xbd31600000
C stackLimit: 0xbd315fd000
ntdll: 0x00007FF9539B0000
RtlpAddVectoredHandler: 0x00007FF953A31680
RtlpAddVectoredHandler off1: 0x00007FF953A31690
RtlpAddVectoredHandler off2: 0x00007FF953A317A4
VCH&VEH list is: 0x00007FF953B2F3E8
ProcessCookie : 0x00000000F7F2A674
VEH First exception entry: 0x0000022F60D66800
VEH Last  exception entry: 0x0000022F60D66800
VCH First continue  entry: 0x0000022F60D66830
VCH Last  continue  entry: 0x0000022F60D66880
Dumping the entries:
VEH : 0x00007FF953B2F3F0
  VEH handler = 0x0000022F60D60840  (exceptiontramp)
VCH : 0x00007FF953B2F408
  VEH handler = 0x0000022F60D66860  (firstcontinuetramp)
  VEH handler = 0x0000022F60D66AC0 (lastcontinuetramp)

@hanchaoqun (Contributor) commented Dec 21, 2021

Perhaps we can add an additional VEH handler to the current test case, one that always returns EXCEPTION_CONTINUE_SEARCH, which triggers the first call to the VCH (before RtlpUnwindInternal is called, preventing interruption by RtlpUnwindInternal)?

testprog/syscall_windows.go:

//go:linkname stdcall2 runtime.stdcall2
func stdcall2(fn unsafe.Pointer, a0, a1 uintptr) uintptr

//go:linkname _AddVectoredExceptionHandler runtime._AddVectoredExceptionHandler
var _AddVectoredExceptionHandler unsafe.Pointer

func lastexceptiontramp()

func lastexceptiontramphandler(info uintptr, r uintptr, gp uintptr) int32 {
	return -0x1
}

func RaiseException() {
	stdcall2(_AddVectoredExceptionHandler, 0, abi.FuncPCABI0(lastexceptiontramp))
	const EXCEPTION_NONCONTINUABLE = 1
	mod := syscall.MustLoadDLL("kernel32.dll")
	proc := mod.MustFindProc("RaiseException")
	proc.Call(0xbad, EXCEPTION_NONCONTINUABLE, 0, 0)
	println("RaiseException should not return")
}

testprog/syscall_windows_amd64.s:

TEXT main·lastexceptiontramp(SB),NOSPLIT|NOFRAME,$0
	PUSH_REGS_HOST_TO_ABI0()
	ADJSP	$48
	
	MOVQ	0(CX), BX // ExceptionRecord*
	MOVQ	8(CX), CX // Context*
	MOVQ	BX, 0(SP)
	MOVQ	CX, 8(SP)
	MOVQ	$0, 16(SP)
	CALL    main·lastexceptiontramphandler(SB)
	MOVL	24(SP), AX
done:
	ADJSP	$-48
	POP_REGS_HOST_TO_ABI0()

	RET
	

@heschi (Contributor) commented Jan 12, 2022

Ping: this is marked as a release blocker. Any news?

@bcmills (Member, Author) commented Jan 12, 2022

@hanchaoqun, you had some suggestions on a possible fix — do you have the bandwidth to mail a CL for review?

This has still been reproducing at a fairly high rate on the windows-amd64-2012 builder specifically; curiously, it has not reproduced on other Windows builders.

greplogs --dashboard -md -l -e 'FAIL: TestRaiseException' --since=2021-11-24

2021-12-02T15:48:58-28ec0fd/windows-amd64-2012
2021-12-02T05:25:23-00dbcb3/windows-amd64-2012
2021-12-01T17:13:34-08ecdf7/windows-amd64-2012
2021-11-29T22:57:42-3ca57c7/windows-amd64-2012
2021-11-29T22:02:50-bc32dd1/windows-amd64-2012
2021-11-24T20:53:54-a3b8f62/windows-amd64-2012

@bufflig (Contributor) commented Jan 12, 2022

As far as I understand it, this is not really a new problem. It just "happened" to work before and now "happens" not to work, so even though I think it's something that should be fixed, I'm not sure it's a release blocker.

@hanchaoqun - do you agree? Is it a correct assumption that this is not really a new problem?

@bcmills (Member, Author) commented Jan 12, 2022

> even though I think it's something that should be fixed, I'm not sure it's a release blocker.

I think that's reasonable, but if we go that route should we add a testenv.SkipFlaky on the windows-amd64-2012 builder to reduce noise on the dashboard in the meantime?

Better still, if the test only happened to work before, is there something we can do to increase the failure rate on the other builders? (IMO a skipped test with a high failure rate is more valuable than a flaky one with a low failure rate.)

@bufflig (Contributor) commented Jan 13, 2022

> > even though I think it's something that should be fixed, I'm not sure it's a release blocker.
>
> I think that's reasonable, but if we go that route should we add a testenv.SkipFlaky on the windows-amd64-2012 builder to reduce noise on the dashboard in the meantime?

Sounds reasonable.

> Better still, if the test only happened to work before, is there something we can do to increase the failure rate on the other builders? (IMO a skipped test with a high failure rate is more valuable than a flaky one with a low failure rate.)

I can't come up with a way to increase the probability either, unfortunately. Maybe someone else has an idea?

@gopherbot commented Jan 13, 2022

Change https://golang.org/cl/378254 mentions this issue: testing: skip flaky TestRaiseException on windows-amd64-2012

@hanchaoqun (Contributor) commented Jan 13, 2022

@bufflig, @bcmills: yes, I agree this is not a new problem. I think it may be caused by this specific version of Windows not being friendly enough to the Go stack. So I think it is not reliable to report the C exception from lastcontinuehandler; maybe we should report the exception and print the stack immediately when we detect the C exception in exceptionhandler, and no longer rely on the VCH. Do you agree?
The problem can be reproduced reliably by manually adjusting the stack (the positions of RIP and RSP):

func exceptionhandler(info *exceptionrecord, r *context, gp *g) int32 {
	if !isgoexception(info, r) {
		sp := unsafe.Pointer(uintptr(r.rsp))
		for {
			vsp := uint64(*((*uintptr)(sp)))
			if vsp < getStackBase() && vsp >= getStackLimit() {
				r.rip = *((*uint64)(unsafe.Pointer(uintptr(sp) - 8)))
				r.rsp = *((*uint64)(unsafe.Pointer(sp)))
			}
			delta := uintptr(sys.StackAlign)
			sp = add(sp, delta)
			if uint64(uintptr(sp)) > uint64(uintptr(add(unsafe.Pointer(r.sp()), 8*50))) ||
				uint64(uintptr(sp)) >= getStackBase() {
				sp = add(sp, -delta)
				break
			}
		}
		return _EXCEPTION_CONTINUE_SEARCH
	}
	...

D:\gobuild\golang\go1.18\src\runtime>go test -run=^TestRaiseException$ -v
=== RUN   TestRaiseException
    crash_test.go:124: running go build -o C:\Users\han\AppData\Local\Temp\go-build1862937472\testprog.exe
    testenv.go:361: [C:\Users\han\AppData\Local\Temp\go-build1862937472\testprog.exe RaiseException] exit status: exit status 2989
    syscall_windows_test.go:636: No stack trace:
--- FAIL: TestRaiseException (0.94s)
FAIL
exit status 1
FAIL    runtime 0.991s

gopherbot pushed a commit that referenced this issue Jan 13, 2022
This is in relation to #49681

Change-Id: I32ad8b506cf8fb0a94b15c3cc8b1eaf5af728c59
Reviewed-on: https://go-review.googlesource.com/c/go/+/378254
Run-TryBot: Patrik Nyblom <pnyb@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Trust: Patrik Nyblom <pnyb@google.com>
@bufflig (Contributor) commented Jan 14, 2022

@hanchaoqun Yea, I was wondering whether dumping the stack trace directly when !isgoexception was possible. I don't really know why we use both exception handlers and continue handlers, but I assume there's a reason. Does anyone know the history behind that?

@bcmills - I did add skipping the test case as flaky, should we remove the release blocker tag?

@bcmills modified the milestone from Go1.18 to Backlog on Jan 14, 2022
@alexbrainman (Member) commented Jan 23, 2022

@bufflig

> ... I was wondering if dumping the stack trace directly when !isgoexception was possible. I don't really know the reason we're using both exceptionhandlers and continuehandlers, but I assume there's a reason. Anyone has any knowledge of the history behind that?

Do you mean printing the stack trace and exiting in the exceptionhandler function? I don't see how that is possible.

exceptionhandler receives ALL process exceptions. Some of those exceptions are raised and handled in code external to Go.

For example, a DLL raises some exception, and then the DLL's exception handling code handles it and stops passing it further up the exception chain. Currently exceptions like that are ignored by the Go runtime. Do you propose we crash the Go process instead?

I hope that explains well enough.

Alex
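The chain argument above can be sketched as a toy model (illustrative only, not the Windows API; the 0xe06d7363 code is the well-known MSVC C++ exception code, used here as a stand-in for a DLL-internal exception):

```go
package main

import "fmt"

// handler reports whether it handles (and swallows) the exception.
type handler func(code uint32) bool

// search walks the handler chain and returns the index of the handler
// that handled the exception, or -1 if none did. Go's handler passing on
// an unknown code is what lets a DLL's own handler get its turn.
func search(chain []handler, code uint32) int {
	for i, h := range chain {
		if h(code) {
			return i
		}
	}
	return -1
}

func main() {
	goHandler := func(code uint32) bool { return false } // not a Go exception: continue search
	dllHandler := func(code uint32) bool { return code == 0xe06d7363 }
	chain := []handler{goHandler, dllHandler}
	fmt.Println(search(chain, 0xe06d7363)) // 1: the DLL's handler handled it
	fmt.Println(search(chain, 0xbad))      // -1: nobody claims it
}
```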

@bufflig (Contributor) commented Jan 24, 2022

Ah, right, I see how my idea would not work. Thanks!

jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
This is in relation to golang#49681

Change-Id: I32ad8b506cf8fb0a94b15c3cc8b1eaf5af728c59
Reviewed-on: https://go-review.googlesource.com/c/go/+/378254
Run-TryBot: Patrik Nyblom <pnyb@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Trust: Patrik Nyblom <pnyb@google.com>
@bufflig removed their assignment on Jun 28, 2022

8 participants