Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: preemption-related deadlock when calling Go from C #35294

Closed
mdempsky opened this issue Oct 31, 2019 · 11 comments
Closed

runtime: preemption-related deadlock when calling Go from C #35294

mdempsky opened this issue Oct 31, 2019 · 11 comments
Assignees
Milestone

Comments

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Oct 31, 2019

While comparing dvyukov/go-fuzz with mdempsky/go114-fuzz-build, I noticed that fuzzing k8s.io/kubernetes/test/fuzz/yaml.FuzzSigYaml with libFuzzer seems to periodically run into timeouts. I haven't been able to reproduce this with Go 1.13, so this seems like a regression.

Tentatively marking release blocker, since this seems like it could be a subtle runtime and/or cgo regression.

I'm going to try bisecting to see if I can figure out what commit caused the problem.

@mdempsky mdempsky added this to the Go1.14 milestone Oct 31, 2019
@mdempsky

This comment has been minimized.

Copy link
Member Author

@mdempsky mdempsky commented Oct 31, 2019

I ran a bunch of times at master to see how long it takes to hang. Here's a record of the number of fuzz iterations before hang:

118784
274821
97466
41172
705368
374579
773952
135061
4180
34428
520221
199902
241640
306002
25566
160724
359761
205918
26332
518383
99168
316188

I would guess it's exponentially distributed?

Running until 2M iterations seems like a safe threshold for bisection.

Edit: Using a 1.5M iteration threshold because I'm impatient, and estimating the exponential distribution based on the above samples suggests to me the chance of 1.5M iterations without failure is 0.25%. (But I'm no statistician.)

@mdempsky

This comment has been minimized.

Copy link
Member Author

@mdempsky mdempsky commented Oct 31, 2019

/cc @aclements

Still bisecting, but it seems to be narrowing down on one of the CLs related to #10958 / #24543.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 31, 2019

Honestly I wouldn't be surprised if the culprit CL turns out to be https://golang.org/cl/171883, which turns on the new timers. I suspect there may be some case we are failing to handle.

It would be nice (for me) to hear that it was a different CL, though.

@mdempsky

This comment has been minimized.

Copy link
Member Author

@mdempsky mdempsky commented Oct 31, 2019

@ianlancetaylor You look safe for now. My current bisect interval is from 316fb95 (good) to 6058603 (bad), and that CL looks like it was merged outside of that window.

@mdempsky

This comment has been minimized.

Copy link
Member Author

@mdempsky mdempsky commented Oct 31, 2019

Git bisect identified 3f83411.

I double checked to make sure it does hang at that CL; and I'm up to 2M iterations without hanging for double checking the previous commit (which assuming I didn't mess up "git bisect bad" / "git bisect good" earlier should be at least 3.5M iterations in total).

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 31, 2019

@mdempsky

This comment has been minimized.

Copy link
Member Author

@mdempsky mdempsky commented Nov 1, 2019

I've been able to minimize the failure below, though it seems to reproduce the issue somewhat less reliably than actually using libfuzzer.

The program should run forever printing increasing numbers, but occasionally it hangs.

$ go build -buildmode=c-archive -o fuzz.a fuzz.go
$ cc -o driver driver.c fuzz.a -lpthread
$ ./driver

(Edit: See better repro below.)

$ cat fuzz.go
package main

// #include <stdint.h>
import "C"

import (
	"unsafe"
)

//export LLVMFuzzerTestOneInput
func LLVMFuzzerTestOneInput(data *C.uint8_t, size C.size_t) C.int {
	s := (*[1 << 30]byte)(unsafe.Pointer(data))[:size:size]

	for _, x := range s {
		_ = make([]byte, x)
	}
	return 0
}

func main() {
}
$ cat driver.c
// +build ignore

#include <stdio.h>

#include "fuzz.h"

uint8_t data[1] = { 64 };

int main() {
  int i = 0;
  for (;;) {
    i++;
    if ((i % 4096) == 0) printf("#%d\n", i);
    LLVMFuzzerTestOneInput(data, sizeof(data));
  }
}
@mdempsky

This comment has been minimized.

Copy link
Member Author

@mdempsky mdempsky commented Nov 1, 2019

Simpler, much more reliable repro:

$ cat fuzz.go
package main

import "C"

var sink []byte

//export Fuzz
func Fuzz() {
	sink = make([]byte, 4096)
}

func main() {
}
$ cat driver.c
// +build ignore

#include <stdio.h>

#include "fuzz.h"

int main() {
  int i = 0;
  for (;;) {
    i++;
    if ((i % 256) == 0) printf("#%d\n", i);
    Fuzz();
  }
}
@mdempsky

This comment has been minimized.

Copy link
Member Author

@mdempsky mdempsky commented Nov 1, 2019

I'm not able to make any more progress on this. The last repro is very reliable, but I don't sufficiently understand the runtime preemption logic, and my naive debugging skills aren't sufficient here.

@mdempsky mdempsky changed the title runtime: dvyukov/go-fuzz -libfuzzer timeouts at master, but not Go 1.13 runtime: preemption-related deadlock when calling Go from C Nov 1, 2019
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 2, 2019

I found the problem. Working on adding a test.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 2, 2019

Change https://golang.org/cl/204957 mentions this issue: runtime: clear preemptStop in dropm

@gopherbot gopherbot closed this in 8de0bb7 Nov 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.