runtime: add test for syscall failing to create new OS thread during syscall.Exec #20822

Open
jvshahid opened this Issue Jun 28, 2017 · 14 comments

@jvshahid
Contributor

jvshahid commented Jun 28, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8 linux/amd64 (same behavior with 1.8.3)

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"          
GOBIN=""                
GOEXE=""                
GOHOSTARCH="amd64"      
GOHOSTOS="linux"        
GOOS="linux"            
GOPATH="/home/jvshahid/codez/gocodez"           
GORACE=""               
GOROOT="/home/jvshahid/.gvm/gos/go1.8"          
GOTOOLDIR="/home/jvshahid/.gvm/gos/go1.8/pkg/tool/linux_amd64"                                  
GCCGO="gccgo"           
CC="gcc"                
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build588313748=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"               
CGO_ENABLED="1"         
PKG_CONFIG="pkg-config" 
CGO_CFLAGS="-g -O2"     
CGO_CPPFLAGS=""         
CGO_CXXFLAGS="-g -O2"   
CGO_FFLAGS="-g -O2"     
CGO_LDFLAGS="-g -O2"    

What did you do?

Run this app in a while loop, e.g. while true; do go run main.go; done
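For readers without access to the linked program, a hypothetical stand-in might look like the sketch below. It assumes the program does nothing but replace itself with a shell command through syscall.Exec (later comments mention echo $PWD). The helper name execArgs and the choice of sh are illustrative and are not taken from the original code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// execArgs builds the argv passed to the shell. Hypothetical helper,
// introduced only to keep the sketch easy to check.
func execArgs(script string) []string {
	return []string{"sh", "-c", script}
}

func main() {
	// Resolve the shell's absolute path; syscall.Exec does not search PATH.
	path, err := exec.LookPath("sh")
	if err != nil {
		fmt.Fprintln(os.Stderr, "lookup failed:", err)
		os.Exit(1)
	}

	// syscall.Exec replaces the current process image, so it only
	// returns when the exec itself failed.
	err = syscall.Exec(path, execArgs("echo $PWD"), os.Environ())
	fmt.Fprintln(os.Stderr, "exec failed:", err)
	os.Exit(1)
}
```

Because the exec happens right after startup, it can overlap with the runtime still creating its own threads, which is the window the error output below appears to point at.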

What did you expect to see?

/path/to/pwd
/path/to/pwd
/path/to/pwd
/path/to/pwd
/path/to/pwd
/path/to/pwd

What did you see instead?

runtime: failed to create new OS thread (have 5 already; errno=11)                               
runtime: may need to increase max user processes (ulimit -u)                                     
fatal error: newosproc                                                                           

Kernel version (uname -a)

Linux amun 4.4.0-81-generic #104-Ubuntu SMP Wed Jun 14 08:17:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

There are a few issues that were opened in the past with the same error message. The most relevant comment I found among them is this comment, which suggests that this could be a kernel issue; its author was looking for a way to reproduce the problem. Some interesting notes:

  1. Setting GOMAXPROCS to 1 makes the problem hard to reproduce (it may even eliminate it).
  2. The Go runtime usually gets a chance to run for a while before the process threads are killed. That means the process will sometimes exec successfully and exit 0, and will sometimes exit with a non-zero status code after panicking.
@jvshahid

Contributor

jvshahid commented Jun 28, 2017

/cc @ianlancetaylor since i referenced his comment

@ianlancetaylor

Contributor

ianlancetaylor commented Jun 28, 2017

I'm not surprised that this fails, and I don't think it's a bug. Running go run main.go means starting the Go tool, which will look at main.go, check that all the imports are up to date, run the compiler, run the linker, and only then run your (simple) program. While it is doing that, your shell loop has plenty of time to loop around and start another instance of go run main.go. The number of go run main.go builds running in parallel will steadily increase, especially as the load on the system increases and each one takes longer and longer to complete. Soon you will hit your process limit (which you can check by running ulimit -u) and you will get the error you are reporting.

If you want to show a real problem, run go build main.go and then run ./main in a loop. Then you will be running a very simple program where there is a realistic possibility that the program can complete in the time it takes the shell to loop around. Even then I expect they will tend to stack up, but it should take a lot longer.

@jvshahid

Contributor

jvshahid commented Jun 28, 2017

This while loop runs go run main.go synchronously, i.e. it waits for each run to exit before starting the next. A simple way to verify that is to replace the echo $PWD with echo before && sleep 10 && echo after.

@jvshahid

Contributor

jvshahid commented Jun 28, 2017

Also worth noting: this is reproducible after a few runs (10 or 20). It is not consuming all the pids on the system.

@ianlancetaylor

Contributor

ianlancetaylor commented Jun 28, 2017

Ah, OK, sorry.

What does ulimit -u print on your system?

Immediately after the loop fails, what does ps print?

@jvshahid

Contributor

jvshahid commented Jun 28, 2017

$ ulimit -u
62821

It is really hard to make the loop fail, but I currently have 377 threads running. I don't imagine this loop adding enough processes and/or threads to exceed the limit:

$ ps -elF | wc -l
377
@jvshahid

Contributor

jvshahid commented Jun 28, 2017

Here are the system-wide limits:

$ cat /proc/sys/kernel/pid_max
32768
jvshahid@amun [~/codez/gocodez/src/github.com/jvshahid/testexec]
$ cat /proc/sys/kernel/threads-max
125642

I really doubt this has anything to do with limits
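The same two limits can be read from a Go program, which may be handy if the check has to run next to the reproducer. This is a minimal sketch that assumes a Linux /proc filesystem; parseLimit is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseLimit converts the raw contents of a /proc/sys file,
// e.g. "32768\n", into an integer.
func parseLimit(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

func main() {
	// The two files shown in the comment above.
	for _, name := range []string{"pid_max", "threads-max"} {
		data, err := os.ReadFile("/proc/sys/kernel/" + name)
		if err != nil {
			fmt.Fprintln(os.Stderr, "read failed:", err)
			continue
		}
		limit, err := parseLimit(string(data))
		if err != nil {
			fmt.Fprintln(os.Stderr, "parse failed:", err)
			continue
		}
		fmt.Printf("%s = %d\n", name, limit)
	}
}
```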

@bradfitz bradfitz changed the title from failed to create new OS thread during syscall.Exec to runtime: failed to create new OS thread during syscall.Exec Jun 28, 2017

@bradfitz

Member

bradfitz commented Jun 28, 2017

Any difference with Go 1.9beta2?

@ianlancetaylor

Contributor

ianlancetaylor commented Jun 28, 2017

Ah, you're right. This is #18146 for a program that doesn't use cgo. Sorry for forgetting about that.

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Jun 28, 2017

@jvshahid

Contributor

jvshahid commented Jun 28, 2017

@bradfitz yes, go1.9beta2 fixes the issue. I'm guessing it is 91139b8 that fixed it by introducing a lock. I was also curious whether you think setting GOMAXPROCS to 1 is a reasonable workaround in the meantime?

@bradfitz

Member

bradfitz commented Jun 28, 2017

Good to hear. So I guess what this bug needs now is a test.

I think we'd prefer you use go1.9beta2 as your fix rather than GOMAXPROCS=1 as a workaround. Go 1.9 has no known bugs compared to Go 1.8.

@bradfitz bradfitz added the Testing label Jun 28, 2017

@bradfitz bradfitz changed the title from runtime: failed to create new OS thread during syscall.Exec to runtime: add test for syscall failing to create new OS thread during syscall.Exec Jun 28, 2017

@jvshahid

Contributor

jvshahid commented Jun 28, 2017

@bradfitz do you think converting the bash loop into a Go test would be OK to merge? I'm concerned that it might be a flaky test. What do you think?

@bradfitz

Member

bradfitz commented Jun 28, 2017

Ideally the test should execute pretty quickly. And flaky tests are no good, but I don't see why this one would be flaky. Rather than expect a failure in, say, 10,000 iterations, just do 10,000 iterations and pass if you don't get a failure. Assuming you used to generally get a failure in 10,000 iterations.
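The "run N iterations and pass if none fails" idea could be sketched roughly as follows. This is only an illustration, not the test that would be submitted to the Go tree: runN is a hypothetical helper, the binary under test is assumed to be passed as the first command-line argument, and the iteration count is simply the figure mentioned above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runN calls run up to n times and stops at the first error.
// It returns the number of iterations that completed successfully.
func runN(n int, run func() error) (int, error) {
	for i := 0; i < n; i++ {
		if err := run(); err != nil {
			return i, err
		}
	}
	return n, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: looptest /path/to/binary")
		return
	}

	const iterations = 10000 // figure taken from the comment above

	// Each iteration starts the binary and waits for it to exit;
	// a non-zero exit status is reported as an error by Run.
	done, err := runN(iterations, func() error {
		return exec.Command(os.Args[1]).Run()
	})
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed after %d successful runs: %v\n", done, err)
		os.Exit(1)
	}
	fmt.Printf("%d runs without failure\n", done)
}
```

In an actual test the loop would presumably live inside a Test function and report through testing.T rather than exit codes; the pass/fail rule stays the same.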

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@odeke-em

Member

odeke-em commented Jun 23, 2018

Hello @jvshahid, might you be interested or available to submit a CL with the suggested test for Go1.11?

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Unplanned Jul 9, 2018
