
runtime: possible memory corruption caused by CL 304470 "cmd/compile, runtime: add metadata for argument printing in traceback" #49075

Open
katiehockman opened this issue Oct 19, 2021 · 88 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@katiehockman
Contributor

katiehockman commented Oct 19, 2021

OSS-Fuzz reported an issue a few weeks ago that we suspect is memory corruption caused by the runtime. This started on August 16th, so it is likely a Go 1.17 issue.

A "slice bounds out of range" panic is being reported from calls to regexp.MustCompile(`,\s*`).Split.
However, the panic is not reproducible with the inputs provided by OSS-Fuzz, so we suspect something else is going on.

Below are some of the panic logs:

panic: runtime error: slice bounds out of range [:18416820578376] with length 59413

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c0001c801f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x61c
github.com/google/gonids.(*Rule).option(0x10c000068000, {0x100c000096970, {0x10c0001c8016, 0x8}}, 0x10c00029a040)
	github.com/google/gonids/parser.go:675 +0x36cf
github.com/google/gonids.parseRuleAux({0x10c0001c8000, 0x630000350400}, 0x0)
	github.com/google/gonids/parser.go:943 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule({0x630000350400, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.1689543426.go:21

panic: runtime error: slice bounds out of range [628255583:13888]

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c00033601f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x617
github.com/google/gonids.(*Rule).option(0x10c00026cc00, {0x100c00026e190, {0x10c000336016, 0x7}}, 0x10c0001a4300)
	github.com/google/gonids/parser.go:675 +0x36cf
github.com/google/gonids.parseRuleAux({0x10c000336000, 0x62f00064a400}, 0x0)
	github.com/google/gonids/parser.go:943 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule({0x62f00064a400, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.1689543426.go:21
AddressSanitizer:DEADLYSIGNAL

panic: runtime error: slice bounds out of range [473357973:29412]

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c0002a001f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x617
github.com/google/gonids.(*Rule).option(0x10c0001b0180, {0x100c000280100, {0x10c0002a0016, 0xb}}, 0x10c0001ae040)
	github.com/google/gonids/parser.go:675 +0x36cf
github.com/google/gonids.parseRuleAux({0x10c0002a0000, 0x632000930800}, 0x0)
	github.com/google/gonids/parser.go:943 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule({0x632000930800, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.1689543426.go:21

From rsc@:

The relevant code is processing the [][]int returned from regexp.(*Regexp).FindAllStringIndex.
That [][]int is prepared by repeated append:

func (re *Regexp) FindAllStringIndex(s string, n int) [][]int {
    if n < 0 {
        n = len(s) + 1
    }
    var result [][]int
    re.allMatches(s, nil, n, func(match []int) {
        if result == nil {
            result = make([][]int, 0, startSize)
        }
        result = append(result, match[0:2])
    })
    return result
}

Each of the match[0:2] being appended is prepared in regexp.(*Regexp).doExecute by:

dstCap = append(dstCap, m.matchcap...)

appending to a zero-length, non-nil slice to copy m.matchcap.

And each of the m.matchcap is associated with the *regexp.machine m, which is kept in a sync.Pool for reuse.

The specific corruption is that the integers in the [][]int are clear non-integers (like pointers),
which suggests that either one of the appends is losing the reference accidentally during GC
or something in sync.Pool is wonky.
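
For illustration, here is a minimal, self-contained sketch of the data flow described above, with illustrative names rather than the real regexp internals; it does not reproduce the corruption, it only shows where the suspect integers come from (a pooled buffer copied out via append):

package main

import (
    "fmt"
    "sync"
)

// machine stands in for regexp's per-execution state; matchcap holds the
// candidate indexes that eventually land in the [][]int result.
type machine struct {
    matchcap []int
}

var pool = sync.Pool{New: func() interface{} { return &machine{matchcap: make([]int, 2)} }}

func findAll(n int) [][]int {
    var result [][]int
    for i := 0; i < n; i++ {
        m := pool.Get().(*machine)
        m.matchcap[0], m.matchcap[1] = i, i+1 // pretend these are match bounds

        // Copy matchcap into a separate slice via append (the real code
        // appends to a zero-length, non-nil dstCap in doExecute)...
        var dstCap []int
        dstCap = append(dstCap, m.matchcap...)
        pool.Put(m) // ...and return the machine to the pool for reuse.

        // Mirrors FindAllStringIndex: keep only the first two ints.
        result = append(result, dstCap[0:2])
    }
    return result
}

func main() {
    // In the OSS-Fuzz crashes, entries like these held pointer-sized garbage
    // instead of small in-bounds indexes.
    fmt.Println(findAll(3)) // [[0 1] [1 2] [2 3]]
}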

This could also be something strange that OSS-Fuzz is doing, and doesn't necessarily represent a real-world use case.

/cc @golang/security

@katiehockman katiehockman added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 19, 2021
@randall77
Contributor

randall77 commented Oct 19, 2021

How often does this happen? Is there any way we could reproduce?
If reproducible, we could turn off the sync.Pool usage and see if anything changes.

@cherrymui cherrymui added this to the Go1.18 milestone Oct 19, 2021
@rsc
Contributor

rsc commented Oct 20, 2021

@catenacyber says it is still happening multiple times a day on OSS-Fuzz.

Philippe, do you have any hints about an easy way to get a reproduction case on our own machines?
I looked briefly at the instructions for running oss-fuzz itself and they were a bit daunting.

@catenacyber
Contributor

catenacyber commented Oct 20, 2021

How often does this happen?

Around 50 times a day on oss-fuzz

Is there any way we could reproduce?

I did not manage to reproduce it myself, did not try very hard though...

I looked briefly at the instructions for running oss-fuzz itself and they were a bit daunting.

Well, the hard thing is that this bug does not reproduce for a specific input.
But running it should be ok cf https://google.github.io/oss-fuzz/getting-started/new-project-guide/#testing-locally
That is

  • install docker
  • cd /path/to/oss-fuzz
  • python infra/helper.py build_image gonids
  • python infra/helper.py build_fuzzers gonids
  • python infra/helper.py run_fuzzer --corpus-dir=<path-to-temp-corpus-dir> gonids fuzz_parserule

Then, I guess you need to wait one hour, and relaunch the fuzzer if it did not trigger the bug, until it does

we could turn off the sync.Pool usage and see if anything changes.

Is there some environment variable to do so?

@catenacyber
Contributor

catenacyber commented Oct 20, 2021

Maybe oss-fuzz uses -fork=2 as an extra argument to run_fuzzer

@randall77
Contributor

randall77 commented Oct 20, 2021

we could turn off the sync.Pool usage and see if anything changes.

Is there some environment variable to do so?

No, you'd have to edit the code to replace pool allocations with new or make.
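
For anyone attempting that edit, here is a minimal sketch of the shape of the change; the names below are illustrative stand-ins, not the actual regexp source, and the flag just switches between pool reuse and plain allocation:

package regexpdebug

import "sync"

type machine struct{ matchcap []int }

// machinePool is an illustrative stand-in for regexp's internal sync.Pool.
var machinePool sync.Pool

// disablePool is the kind of switch being suggested: when true, every
// execution gets a fresh machine and nothing is ever recycled.
const disablePool = true

func getMachine() *machine {
    if disablePool {
        return new(machine) // always a fresh allocation
    }
    if m, ok := machinePool.Get().(*machine); ok {
        return m
    }
    return new(machine)
}

func putMachine(m *machine) {
    if disablePool {
        return // drop it; the GC reclaims it instead of the pool
    }
    machinePool.Put(m)
}

If the corruption disappears with the pool bypassed, that points at reuse or lifetime issues; if it persists, the pool is probably not the culprit.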

@catenacyber
Contributor

catenacyber commented Oct 20, 2021

Is there some environment variable to do so?

No, you'd have to edit the code to replace pool allocations with new or make.

So rebuild the standard library?

@randall77
Contributor

randall77 commented Oct 20, 2021

Just edit it; the rebuild will be automatic.

@catenacyber
Contributor

catenacyber commented Oct 20, 2021

So, I did google/oss-fuzz#6623 with regexp.go not using the sync package

@rsc
Contributor

rsc commented Oct 20, 2021

google/oss-fuzz#6623 looks worth a shot. Thanks.

@catenacyber
Contributor

catenacyber commented Oct 22, 2021

It looks like the bug is still happening but much less often.

One last stack trace is

panic: runtime error: slice bounds out of range [:107271103185152] with length 45246

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c00010801f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1260 +0x61c
github.com/google/gonids.(*Rule).option(0x10c00034a180, {0x100c0007c4350, {0x10c000108016, 0x6}}, 0x10c000334080)
	github.com/google/gonids/parser.go:675 +0x36cf
github.com/google/gonids.parseRuleAux({0x10c000108000, 0x62e0000d8400}, 0x0)
	github.com/google/gonids/parser.go:943 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule({0x62e0000d8400, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.3230035416.go:21
AddressSanitizer:DEADLYSIGNAL

regexp.go:1260 seems to prove that this is the modified regexp.go file without sync, right?

Any more clues?
Any more debug assertions to insert in regexp.go?

@josharian
Contributor

josharian commented Oct 22, 2021

This started on August 16th, so is likely a Go 1.17 issue.

Could you bisect to a particular commit that introduced the corruption?

@catenacyber
Contributor

catenacyber commented Oct 22, 2021

Could you bisect to a particular commit that introduced the corruption?

oss-fuzz uses the latest Go release, so it switched from 1.16.x to 1.17 on August 16th, but we do not know exactly which commit in this major release introduced the buggy behavior...

@randall77
Contributor

randall77 commented Oct 22, 2021

I have another possible theory:

regexp.(*Regexp).Split(0x10c0000b2640, {0x10c0001c801f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x61c

Is there any way you can get at the regexp and the contents of the string? In this case, 0x10c0000b2640 is the address of a Regexp, the first field of which is a string I'd like to see. Also, we'd need the very long string at address 0x10c0001c801f for 0x76dbc0 bytes (7MB+!).
If you have any way for us to reproduce that execution state, we could grab those values ourselves.

My theory is that regexp is using the code in internal/bytealg pretty hard, and maybe it's tripping up on some weird corner case. The string pointers always end in 0x1f, which is maybe one of those corner cases. That might explain the intermittency - if the same test case gets allocated in a different place, it might not trigger the bug.
Not sure why we would hit it only for 1.17. https://go-review.googlesource.com/c/go/+/310184 seems to be the only CL of note in internal/bytealg for 1.17, and it looks reasonable to me.
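
As a quick way to see why bytealg is in play for this particular pattern, here is a small sketch (assumptions: the pattern from the fuzz target, `,\s*`, whose one-byte literal prefix lets the matcher skip ahead with IndexByte-style scans):

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`,\s*`)

    // Any match must start with a literal ",", so the matcher can use
    // strings.Index/IndexByte (which dispatch to internal/bytealg) to find
    // candidate match positions.
    prefix, complete := re.LiteralPrefix()
    fmt.Printf("literal prefix %q, complete=%v\n", prefix, complete)

    s := strings.Repeat("x", 1<<20) + ", tail"
    fmt.Println(re.FindStringIndex(s)) // one long byte scan before the first match
}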

@catenacyber
Contributor

catenacyber commented Oct 22, 2021

I put the hex-encoded version of one string here:
https://catenacyber.fr/stringhex.txt

@dgryski
Contributor

dgryski commented Oct 22, 2021

If it's an alignment issue, could we try to reproduce by mmap'ing a large block and creating a string with all the different possible alignments / ending pointer bytes?

@randall77
Contributor

randall77 commented Oct 22, 2021

What is the regexp that is being used?

@catenacyber
Contributor

catenacyber commented Oct 22, 2021

regexp.MustCompile(`,\s*`).Split(fuzzedinput, -1)

@catenacyber
Contributor

catenacyber commented Oct 22, 2021

So, there are kind of a lot of answers to split...

@randall77
Contributor

randall77 commented Oct 22, 2021

Nothing obvious with the string+regexp you posted. Putting it at different alignments doesn't seem to trigger anything.
The string you posted doesn't seem long enough, though. It is only 227927 bytes, and all the failure traces above have a string that is 0x76dbc0 = 7789504 bytes.

@catenacyber
Contributor

catenacyber commented Oct 22, 2021

The string posted comes with this stack trace:

panic: runtime error: slice bounds out of range [699494522:64250]

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c000ac201f, 0x962c00}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x617
github.com/google/gonids.(*Rule).option(0x10c000066000, {0x0, {0x10c000ac2016, 0x10c000044000}}, 0x10c0001dc000)
	github.com/google/gonids/parser.go:675 +0x36c5
github.com/google/gonids.parseRuleAux({0x10c000ac2000, 0x37a78}, 0x0)
	github.com/google/gonids/parser.go:942 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:971
github.com/google/gonids.FuzzParseRule({0x7f369cb75800, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.3953748960.go:21

@catenacyber
Contributor

catenacyber commented Oct 22, 2021

When running manually, it seems my string is always aligned on 16 bytes, like 0x10c0001185a0.

@randall77
Contributor

randall77 commented Oct 22, 2021

That doesn't seem right. The string posted is only 227927 bytes, but the stack trace shows that it is 9841664 bytes.

To unalign a string, do:

s2 := (strings.Repeat("*", i) + s)[i:]
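
Building on that trick, here is a small self-contained sketch of the alignment sweep being discussed: it runs the same kind of search at every starting offset 0..63 and checks that the returned indexes stay in bounds. The payload string is a made-up stand-in (the real inputs come from the fuzzer), and this has not been observed to reproduce the crash:

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`,\s*`)
    s := strings.Repeat("alert tcp any, any -> any, any; ", 100000) // stand-in payload

    bogus := 0
    for i := 0; i < 64; i++ {
        // Shift the backing data so the string contents start at a different alignment.
        s2 := (strings.Repeat("*", i) + s)[i:]

        for _, m := range re.FindAllStringIndex(s2, -1) {
            // In the crashing runs these indexes were pointer-sized garbage.
            if m[0] < 0 || m[1] > len(s2) || m[0] > m[1] {
                fmt.Printf("offset %d: bogus match bounds %v (len %d)\n", i, m, len(s2))
                bogus++
            }
        }
    }
    fmt.Println("bogus match bounds seen:", bogus)
}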

@catenacyber
Contributor

catenacyber commented Oct 22, 2021

So, the string must have been corrupted before the call to regexp.(*Regexp).Split

@randall77
Contributor

randall77 commented Oct 22, 2021

I'm not sure I understand. How did you get the string, if not just dumping the 0x962c00 bytes starting at address 0x10c000ac201f ?

@catenacyber
Contributor

catenacyber commented Oct 23, 2021

I'm not sure I understand. How did you get the string, if not just dumping the 0x962c00 bytes starting at address 0x10c000ac201f ?

oss-fuzz provides the stack trace, and the input that triggered it.
The input that triggers it, when run through gonids.FuzzParseRule, ends up once at gonids.(*Rule).option calling regexp.(*Regexp).Split, and that is where I got the string.

@randall77
Contributor

randall77 commented Oct 24, 2021

Hm, then I guess the values in the stack traces are incorrect, which can happen, especially with the register ABI (regabi) introduced in 1.17.
I don't have any ideas on how to figure out the problem without bisecting through the 1.17 commits.
I am unable to reproduce on my laptop. It never crashes for me...
Speaking of which, it's always possible that it is a bad machine. Have you reproduced on different machines?

Another thing to try: run with GODEBUG=cpu.all=off. That will disable most of the complicated bytealg code.

@thepudds

thepudds commented Oct 24, 2021

I am unable to reproduce on my laptop. It never crashes for me...

FWIW, I was not able to reproduce either using the oss-fuzz local execution steps on a clean Linux VM (amd64, Ubuntu 20.04) following the directions outlined above in #49075 (comment), with 3 separate runs that totaled about 24 hours.

I also ran for about 24 hours using dvyukov/go-fuzz against the same gonids fuzz target, which also did not crash (although I don't think there is an enormous amount of signal from that result, given libFuzzer and go-fuzz have different process restart strategies, different mutations, different propensity for creating large inputs, and so on).

@catenacyber some questions for you:

  1. I am curious if you are able to reproduce in a clean environment using those steps you outlined for local execution?

  2. I wonder if some additional steps might be needed, such as perhaps downloading a snapshot of the oss-fuzz working corpus (whereas, if I followed correctly, the oss-fuzz local execution steps you outlined above result in only using a seed corpus downloaded from https://rules.emergingthreats.net?).

  3. I wonder if -max_len needs to be set, or alternatively, perhaps if the live oss-fuzz working corpus has larger inputs that push libFuzzer to use larger examples? Following the steps above, my local execution currently reports:

INFO: -max_len is not provided; libFuzzer will not generate inputs larger 
than 4096 bytes
INFO: seed corpus: files: 67174 min: 1b max: 3228b total: 19520296b rss: 57Mb

That said, it might be that size of inputs in the corpus is not meaningful if this turns out to be due to some corruption.

  4. If I followed correctly, it looks like the live gonids Dockerfile on the oss-fuzz repo is using the attempted workaround of removing the use of sync.Pool, which I think you said resulted in "It looks like the bug is still happening but much less often." If Keith or others are trying to reproduce, it might make sense not to use that attempted workaround when reproducing locally, in order to increase the frequency?

In any event, sorry if anything I wrote is off base, but curious for your thoughts.

@cherrymui
Member

cherrymui commented Jul 6, 2022

Looks like the regression ended up with commit 537cde0 ("cmd/compile, runtime: add metadata for argument printing in traceback") as the culprit.

I'm having a hard time seeing how that CL could cause memory corruption. The code changed in that CL doesn't write to memory at run time, so it shouldn't corrupt anything. One exception is when runtime.Stack is called, but even then it just writes to the given buffer, with bounds checks, via the builtin print function, which is not changed in that CL. Does the program call runtime.Stack (directly or indirectly)?

Maybe it is a miscompilation? That CL only adds new metadata, which shouldn't affect program execution. The only possibility would be that the added metadata corrupts some other data in the binary. In that case, I would expect it to reproduce more reliably, though.

Would it be possible to get one instance of a failing binary and the source code? Maybe we could see if it is a miscompilation by inspecting the binary. Maybe we could run it over and over again locally to reproduce.

@catenacyber
Contributor

catenacyber commented Jul 6, 2022

So, first, triple-checking:

1. Looking at the oss-fuzz builds, cf. https://oss-fuzz-build-logs.storage.googleapis.com/index.html#gonids

2. We can see that the build from July 4th https://oss-fuzz-build-logs.storage.googleapis.com/log-abf6907c-7bbb-48e5-b055-2addef901dc7.txt has Note: switching to 'd4aa72002e'.

3. The build from July 5th https://oss-fuzz-build-logs.storage.googleapis.com/log-fa1e00b7-57ab-4fe6-ac86-76a314a1c6c3.txt has Note: switching to '537cde0b4b'.

4. Then on oss-fuzz, https://oss-fuzz.com/testcase-detail/5194651375108096 (that may be access-restricted; I can provide screenshots) shows 68 crashes from build 202206300609, 32 from 202207050612, and 0 from the builds in between.

5. If d4aa720 is good and 537cde0 is bad, since there is no commit in between, 537cde0 introduced this behavior.

Does the program call runtime.Stack (directly or indirectly)?

git grep runtime returns no results for gonids.
How can I check the indirect way?

As a fuzzer, the program is supposed to panic sometimes and print the stack trace; is that what runtime.Stack does? (like the traces provided in this issue)

I do not know if this is relevant, but this fuzz target uses goroutines (which is not common for fuzz targets).
And as a fuzzer, the Go code is built as a static library with cgo and linked into a C++ program whose main is built with clang.

Maybe it is a miscompilation?

I will try to get one binary

@cherrymui
Member

cherrymui commented Jul 6, 2022

How can I check the indirectly way ?

The easiest is probably go tool nm <binary> | grep "runtime\.Stack".

As a fuzzer, the program is supposed to panic sometimes and print the stack trace, is that what runtime.Stack does ? (like the ones provided in this issue)

Not exactly. When an unrecoverable panic occurs, it will print a stack trace, which uses the code added in that CL. But an unrecoverable panic would have to occur first, and that CL doesn't change whether such a panic occurs.
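
For clarity, the call being asked about is the explicit runtime.Stack API (which user code or a dependency would have to invoke itself), not the traceback the runtime prints on a fatal panic. A minimal example of a direct call, so it is clear what to grep for:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    buf := make([]byte, 64<<10)
    // Formats the current goroutine's stack into buf (all goroutines' stacks
    // if the second argument is true) and returns the number of bytes written.
    n := runtime.Stack(buf, false)
    fmt.Printf("%s\n", buf[:n])
}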

I do not know if this is relevant, but this fuzz target uses goroutines (which is not common for fuzz targets).
And as a fuzzer, the Go code is built as a static library with cgo and linked into a C++ program whose main is built with clang.

Can you be sure that the memory corruption is not caused by the user code, or by the C++ code? Have you tried running under the race detector while fuzzing?
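
One possible way to get race-detector coverage without the full oss-fuzz/libFuzzer setup is to replay a saved corpus through the same entry point under go test -race. This is only a sketch: it assumes the go-fuzz/libFuzzer-style signature FuzzParseRule(data []byte) int seen in the traces, and the corpus directory path is hypothetical:

package gonids_test

import (
    "os"
    "path/filepath"
    "testing"

    "github.com/google/gonids"
)

// Run with: go test -race -run TestReplayCorpus
func TestReplayCorpus(t *testing.T) {
    files, err := filepath.Glob("testdata/corpus/*") // hypothetical corpus location
    if err != nil {
        t.Fatal(err)
    }
    for _, f := range files {
        data, err := os.ReadFile(f)
        if err != nil {
            t.Fatal(err)
        }
        gonids.FuzzParseRule(data) // same entry point the fuzzer drives
    }
}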

jonathanmetzman pushed a commit to google/oss-fuzz that referenced this issue Jul 7, 2022
@catenacyber
Contributor

catenacyber commented Jul 7, 2022

How can I check the indirectly way ?

The easiest is probably go tool nm <binary> | grep "runtime\.Stack".

No results

Can you be sure that the memory corruption is not caused by the user code, or by the C++ code?

How could I get such assurance?

New fact: d4aa72002e is not actually good, but the bug there produces a different stack trace:

panic: runtime error: slice bounds out of range [:107202383708288] with length 26259

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b6640, 0x10c000176a9f, 0x774bc0, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x61c
github.com/google/gonids.(*Rule).option(0x10c000084600, 0x1010000000050, 0x10c000176a96, 0x0, 0x7fb715876778)
	github.com/google/gonids/parser.go:675 +0x3705
github.com/google/gonids.parseRuleAux(0x10c000176a80, 0x62b0000cb200, 0x6600)
	github.com/google/gonids/parser.go:943 +0x6ce
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule(0x62b0000cb200, 0x0, 0x1)
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.834012382.go:21
AddressSanitizer:DEADLYSIGNAL

So, I will be back to bisecting b05903a..d3853fb

@cherrymui
Member

cherrymui commented Jul 7, 2022

How could I get such assurance?

Running the test under the race detector would be a step, if it works.

produces a different stack trace

The stack trace looks pretty much the same as the original one. It is the same stack of functions.

@bcmills
Member

bcmills commented Jul 7, 2022

Running the test under the race detector would be a step, if it works.

If the bug really was introduced in that commit, would the race detector really help? It looks like many of the code changes in that commit are in the runtime package, which IIRC is not currently instrumented under the race detector.

[Edit: I guess you're saying that since the stack trace is the same stack of functions, it's not that commit, and so the race detector may still help.]

@cherrymui
Member

cherrymui commented Jul 7, 2022

I don't think the bug was introduced in that CL. And the comment above #49075 (comment) also suggests that.

The race detector would be helpful to see if there is any issue in the user code. I don't think it is really clear yet that this is a runtime bug.

@catenacyber
Contributor

catenacyber commented Jul 7, 2022

The stack trace looks pretty much the same as the original one. It is the same stack of functions.

Indeed, but oss-fuzz/clusterfuzz fails to parse it the same way because of the missing braces {} in the regexp used to parse them; cf. the GOLANG_STACK_FRAME_FUNCTION_REGEX definition here: https://github.com/google/clusterfuzz/blob/08f52cd1b9c304cf39988561f1241cee9fd5673a/src/clusterfuzz/stacktraces/constants.py#L308

That led oss-fuzz to think the stack traces were different, hence the bugs were different, hence git bisect showed that the bug appeared with this formatting change.

The Go bug was not introduced by that commit, but by one in https://github.com/golang/go/commit/b05903a9f6408065c390ea6c62e523d9f51853a5..https://github.com/golang/go/commit/d3853fb4e6ee2b9f873ab2e41adc0e62a82e73e4; we have known good and bad revisions (even with the different stack trace).

So, see you in 10-ish days

@mknyszek
Contributor

mknyszek commented Jul 7, 2022

Here's the commit range more conveniently rendered at go.googlesource.com (not sure how to do it on GitHub): https://go.googlesource.com/go/+log/b05903a9f6408065c390ea6c62e523d9f51853a5..d3853fb4e6ee2b9f873ab2e41adc0e62a82e73e4

@tmm1
Contributor

tmm1 commented Jul 7, 2022

b05903a...d3853fb

@catenacyber
Contributor

catenacyber commented Jul 9, 2022

I am seeing other stack traces that may be related:

  • fatal error: stack growth not allowed in system call
  • fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

I can post the complete stack traces if needed

@aclements
Member

aclements commented Jul 12, 2022

Since we now know this isn't a recent regression, I'm moving this issue to 1.20.

Looking forward to further bisection. :)

@aclements aclements modified the milestones: Go1.19, Go1.20 Jul 12, 2022
@catenacyber
Contributor

catenacyber commented Jul 13, 2022

Current bisection leaves 4ce49b4...9dd71ba

@catenacyber
Contributor

catenacyber commented Jul 13, 2022

So @aclements I think the commit introducing the regression is yours 9dd71ba

@catenacyber
Contributor

catenacyber commented Jul 13, 2022

And so, I bet the bug was already present previously in some non-default setup?

@catenacyber
Contributor

catenacyber commented Jul 13, 2022

This looks like a pretty massive change...

Let me know if I can help by providing additional information:

  • try to compile with other settings
  • try to print out other information when the panic occurs...

@catenacyber
Contributor

catenacyber commented Jul 15, 2022

I confirm 9dd71ba is bad and 4ce49b4 is good

@catenacyber
Contributor

catenacyber commented Jul 21, 2022

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49328 is the new bug tracker with the fixed golang stack trace :-)

MartinPetkov pushed a commit to MartinPetkov/oss-fuzz that referenced this issue Aug 15, 2022
cf golang/go#49075

Try to git bisect this unreproducible bug
MartinPetkov pushed a commit to MartinPetkov/oss-fuzz that referenced this issue Aug 15, 2022
@catenacyber
Contributor

catenacyber commented Aug 21, 2022

Friendly ping @aclements: what can I do next to help understand this ABI bug? cf. #40724

@aclements
Member

aclements commented Nov 8, 2022

So @aclements I think the commit introducing the regression is yours 9dd71ba

Unfortunately, this just tells us it has something to do with register ABI, and not much else. You probably can't meaningfully bisect further back because register ABI didn't work much earlier than that commit.

That leaves us with old-fashioned debugging.

I would start with our usual GC issue debugging pattern:

  1. Can you reproduce it with GODEBUG=asyncpreemptoff=1?
  2. Can you reproduce it with GODEBUG=gccheckmark=1?
  3. Can you reproduce it with GODEBUG=gcshrinkstackoff=1?
  4. Can you reproduce it with GODEBUG=cgocheck=2?

@catenacyber
Contributor

catenacyber commented Nov 8, 2022

Thanks Austin for getting back to this.

Can you reproduce it with GODEBUG=asyncpreemptoff=1?

Can I set the env variable after I start the program?
(I do not think I have a way to control the environment variables before running the program.)
