-
Notifications
You must be signed in to change notification settings - Fork 18.8k
Closed
Description
I've been stalking a strange bug on "stress test" builder that runs under
Jenkins.
I have the Jenkins builder configured to kill a job if it runs for more than 60 minutes.
The current set of tests usually finish in about 20 to 30 minutes.
I had 4 builds between 4 February and 14 February that got killed after 60 minutes. At
first I thought that the machine was under high load or that the Jenkins timer was
misbehaving, but I added some timing logs to the build script and everything seemed fine.
I disabled the Jenkins build time limit. I caught one this morning that had been stuck
for a few hours.
There were two processes running that were part of the suspect build:
20397 ? Sl 0:48 go test std -short -timeout=120s
22515 ? S 0:00 go test std -short -timeout=120s
I could tell from their /proc/<PID>/environ file that they were both part of the
Jenkins job.
/proc/20397 had:
cwd -> /build/go.tip/go/src
exe -> /build/go.tip/go/bin/go
/build/go.tip/go is $GOROOT.
/proc/22515 had:
cwd -> /dev/shm/go-build017000760/sync/_test
exe -> /build/go.tip/go/bin/go
/dev/shm is TMPDIR.
This looks like it could be a process stuck between fork and exec?
The last thing the job had printed out before I killed it was:
ok log/syslog 2.187s
In previous cases I had:
+ go test -v -cpu 1,2,4 std
...
ok mime/multipart 0.642s
Build timed out (after 60 minutes). Marking the build as aborted.
+ go test -v -cpu 1,2,4 std
...
ok crypto/tls 1.182s
Build timed out (after 60 minutes). Marking the build as aborted.
and a very nice one:
+ go version
go version devel +d8e8634eec4e Thu Feb 14 15:04:22 2013 +1100 linux/amd64
++ seq 1 5 + for i in '`seq 1 5`'
+ export GOMAXPROCS=3
+ GOMAXPROCS=3
+ ./run.bash --no-rebuild
# Testing packages.
ok cmd/api 0.055s
? cmd/cgo [no test files]
ok cmd/fix 2.585s
ok cmd/go 0.058s
? cmd/godoc [no test files]
ok cmd/gofmt 0.103s
? cmd/vet [no test files]
? cmd/yacc [no test files]
ok archive/tar 0.041s
Build timed out (after 60 minutes). Marking the build as aborted.
so this doesn't seem to be related to any of the recent issues with log/syslog.
I tried to strace pid 22515 on the way out. I sent a SIGQUIT first, this didn't kill it.
Then I sent a SIGKILL.
22515 futex(0x7fafb4000020, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be
restarted if SA_RESTART is set)
22515 --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=13066, si_uid=0} ---
22515 futex(0xb68930, FUTEX_WAKE, 1) = 0
22515 rt_sigreturn() = 202
22515 futex(0x7fafb4000020, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be
restarted if SA_RESTART is set)
22515 --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=13066, si_uid=0} ---
22515 rt_sigreturn() = 202
22515 futex(0x7fafb4000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
22515 +++ killed by SIGKILL +++
I also straced pid 20397, but that trace is less enlightening.
Finally, I noticed that 22515 had quite a lot of file descriptors hanging around.
Doesn't look quite right at first glance:
# ls -la /proc/22515/fd
total 0
dr-x------ 2 jenkins jenkins 0 Feb 18 06:04 .
dr-xr-xr-x 8 jenkins jenkins 0 Feb 15 21:47 ..
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 0 -> /dev/null
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 1 -> pipe:[566509041]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 10 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 11 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 13 -> pipe:[566509040]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 14 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 15 -> pipe:[566530248]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 16 -> pipe:[566498220]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 17 -> pipe:[566506106]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 18 -> pipe:[566498220]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 19 -> /dev/null
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 2 -> pipe:[566433078]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 20 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 22 -> pipe:[566527240]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 23 -> pipe:[566513584]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 24 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 25 -> pipe:[566508526]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 26 -> pipe:[566508526]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 27 -> pipe:[566536197]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 28 -> /dev/null
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 29 -> pipe:[566530248]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 3 -> pipe:[566527242]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 30 -> pipe:[566506059]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 31 -> pipe:[566521230]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 32 -> /dev/null
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 33 -> pipe:[566506106]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 34 -> pipe:[566527240]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 35 -> pipe:[566521230]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 36 -> pipe:[566521227]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 37 -> pipe:[566509032]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 38 -> pipe:[566514464]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 39 -> pipe:[566521227]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 4 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 40 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 41 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 42 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 43 -> pipe:[566531204]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 44 -> pipe:[566508527]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 45 -> pipe:[566516473]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 46 -> pipe:[566529271]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 47 -> pipe:[566529271]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 48 -> pipe:[566529259]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 49 -> pipe:[566509043]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 5 -> pipe:[566521231]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 50 -> pipe:[566509043]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 51 -> pipe:[566511380]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 52 -> pipe:[566508527]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 53 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 54 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 55 -> pipe:[566520373]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 57 -> pipe:[566520373]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 59 -> pipe:[566525096]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 6 -> pipe:[566521231]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 60 -> pipe:[566525096]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 61 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 63 -> pipe:[566509038]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 65 -> /dev/null
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 66 -> pipe:[566509041]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 67 -> pipe:[566509041]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 68 -> pipe:[566521228]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 69 -> pipe:[566521213]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 7 -> pipe:[566536197]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 70 -> pipe:[566521228]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 73 -> pipe:[566514464]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 75 -> pipe:[566529270]
l-wx------ 1 jenkins jenkins 64 Feb 18 06:07 76 -> pipe:[566529270]
lr-x------ 1 jenkins jenkins 64 Feb 18 06:07 9 -> /dev/null
I've seen this issue with:
go version devel +b67523215571 Mon Feb 04 15:30:41 2013 +1100 linux/amd64
go version devel +a44171137af8 Sat Feb 09 17:36:31 2013 -0500 linux/amd64
go version devel +d8e8634eec4e Thu Feb 14 15:04:22 2013 +1100 linux/amd64
go version devel +537555f45dcc Fri Feb 15 13:37:43 2013 -0800 linux/amd64
Kernel is 3.4.7-1.fc16.x86_64.