Integration tests can hang dockerd, ps, and the container #511

Closed
jbarrick-mesosphere opened this issue Jul 5, 2019 · 4 comments · Fixed by #522
What happened:

My integration tests running in Docker hung:

ok  	github.com/kudobuilder/kudo/pkg/kudoctl/cmd	0.035s	coverage: 62.2% of statements
?   	github.com/kudobuilder/kudo/pkg/kudoctl/cmd/get	[no test files]
^C^R        
^C# github.com/kudobuilder/kudo/pkg/test/utils.test
SIGQUIT: quit
PC=0x459181 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x839ac8, 0x80, 0x0, 0x0, 0x0, 0x7f4a00000000, 0x0, 0x0, 0x7ffd4f72fdb8, 0x40a1e1, ...)
	/usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
runtime.futexsleep(0x839ac8, 0x7ffd00000000, 0xffffffffffffffff)
	/usr/local/go/src/runtime/os_linux.go:46 +0x4b
runtime.notesleep(0x839ac8)
	/usr/local/go/src/runtime/lock_futex.go:151 +0xa1
runtime.stopm()
	/usr/local/go/src/runtime/proc.go:1936 +0xc1
runtime.findrunnable(0xc00001c000, 0x0)
	/usr/local/go/src/runtime/proc.go:2399 +0x54a
runtime.schedule()
	/usr/local/go/src/runtime/proc.go:2525 +0x21c
runtime.park_m(0xc000001200)
	/usr/local/go/src/runtime/proc.go:2605 +0xa1
runtime.mcall(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:299 +0x5b

goroutine 1 [running]:
	goroutine running on other thread; stack unavailable

Sending SIGQUIT didn't reveal anything useful, and the process remained stuck.

Running ps faux on my system hangs:

➜  ~ ps faux
^C^C^C^C

docker stop is unable to stop it.

Running lsof against the stuck ps showed it was hung reading the container process's cmdline:

ps      13617  justin    6r   REG   0,4        0  3452416  /proc/5613/cmdline

cat confirmed this:

$ sudo cat /proc/5613/cmdline
^C^C^C^C^C^C^C

Based on this blog post, it seems this can happen when the OOM killer is disabled and a memory limit is hit, so the fix may be to set a memory limit on the container in Docker.
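
One way to check whether a container has any memory limit at all is docker inspect (Go template syntax; zealous_shannon is the container name that appears in the comments below):

➜  ~ docker inspect -f '{{.HostConfig.Memory}}' zealous_shannon
0

A value of 0 means unlimited, which matches the effectively-infinite limit_in_bytes shown in the cgroup output below.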

What you expected to happen:

Integration tests should pass.

How to reproduce it (as minimally and precisely as possible):

while ./test/run_tests.sh; do date; done
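
A bounded variant of the reproducer (timeout(1) is from GNU coreutils; the 30-minute cap is arbitrary) makes a hang show up as a failed iteration instead of stalling forever:

while timeout 30m ./test/run_tests.sh; do date; done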

jbarrick-mesosphere commented Jul 5, 2019

Possibly related upstream issues:

moby/moby#34579
moby/moby#15204

➜  ~ sudo cat /proc/5613/syscall
202 0x839ac8 0x80 0x0 0x0 0x0 0x0 0x7ffcae0ed328 0x459181
➜  ~ sudo cat /proc/5613/stack  
[<0>] call_rwsem_down_read_failed+0x14/0x30
[<0>] __do_page_fault+0x3bd/0x4c0
[<0>] do_page_fault+0x32/0x130
[<0>] page_fault+0x1e/0x30
[<0>] __clear_user+0x1a/0x50
[<0>] copy_fpstate_to_sigframe+0x7a/0x280
[<0>] do_signal+0x5bd/0x650
[<0>] exit_to_usermode_loop+0xbf/0xe0
[<0>] do_syscall_64+0x157/0x180
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff
➜  ~ 
➜  ~ cat /proc/5613/status
Name:	link
Umask:	0022
State:	D (disk sleep)
Tgid:	5613
Ngid:	0
Pid:	5613
PPid:	30344
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	64
Groups:	 
NStgid:	5613	7241
NSpid:	5613	7241
NSpgid:	30274	1
NSsid:	30274	1
VmPeak:	  447580 kB
VmSize:	  447580 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	  340540 kB
VmRSS:	  340540 kB
RssAnon:	  336520 kB
RssFile:	    4020 kB
RssShmem:	       0 kB
VmData:	  443232 kB
VmStk:	     132 kB
VmExe:	    2044 kB
VmLib:	       4 kB
VmPTE:	     740 kB
VmSwap:	       0 kB
HugetlbPages:	       0 kB
CoreDumping:	0
THP_enabled:	1
Threads:	6
SigQ:	2/63038
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000000000
SigCgt:	fffffffe7fc1feff
CapInh:	00000000a80425fb
CapPrm:	00000000a80425fb
CapEff:	00000000a80425fb
CapBnd:	00000000a80425fb
CapAmb:	0000000000000000
NoNewPrivs:	0
Seccomp:	2
Speculation_Store_Bypass:	thread force mitigated
Cpus_allowed:	ff
Cpus_allowed_list:	0-7
Mems_allowed:	00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	80
nonvoluntary_ctxt_switches:	34
➜  ~ 
➜  ~ cat /sys/fs/cgroup/memory/docker/99d762e604afcc06f7e34c76a63da6221733701c41bdf7deb6310cbabaa2b974/memory.oom_control 
oom_kill_disable 0
under_oom 0
oom_kill 0
➜  ~
➜  ~ cat /sys/fs/cgroup/memory/docker/99d762e604afcc06f7e34c76a63da6221733701c41bdf7deb6310cbabaa2b974/memory.usage_in_bytes 
661151744
➜  ~ cat /sys/fs/cgroup/memory/docker/99d762e604afcc06f7e34c76a63da6221733701c41bdf7deb6310cbabaa2b974/memory.soft_limit_in_bytes 
9223372036854771712
➜  ~ cat /sys/fs/cgroup/memory/docker/99d762e604afcc06f7e34c76a63da6221733701c41bdf7deb6310cbabaa2b974/memory.limit_in_bytes 
9223372036854771712
➜  ~ cat /sys/fs/cgroup/memory/docker/99d762e604afcc06f7e34c76a63da6221733701c41bdf7deb6310cbabaa2b974/memory.max_usage_in_bytes 
2289606656
➜  ~ 
➜  ~ cat /sys/fs/cgroup/memory/docker/99d762e604afcc06f7e34c76a63da6221733701c41bdf7deb6310cbabaa2b974/memory.stat              
cache 244879360
rss 384471040
rss_huge 6291456
shmem 0
mapped_file 1216512
dirty 0
writeback 0
swap 0
pgpgin 4633497
pgpgout 4481921
pgfault 5716161
pgmajfault 0
inactive_anon 82276352
active_anon 287907840
inactive_file 129687552
active_file 129634304
unevictable 0
hierarchical_memory_limit 9223372036854771712
hierarchical_memsw_limit 9223372036854771712
total_cache 244879360
total_rss 384471040
total_rss_huge 6291456
total_shmem 0
total_mapped_file 1216512
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 4633497
total_pgpgout 4481921
total_pgfault 5716161
total_pgmajfault 0
total_inactive_anon 82276352
total_active_anon 287907840
total_inactive_file 129687552
total_active_file 129634304
total_unevictable 0
➜  ~ 
➜  ~ cat /proc/5613/oom_score
6
➜  ~ cat /proc/5613/oom_score_adj 
0
➜  ~ cat /proc/5613/oom_adj 
0
➜  ~ 
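
For reference: syscall 202 on x86_64 is futex(2), which matches the runtime.futex frame in the trace above, and the 9223372036854771712 limit is effectively "no limit":

➜  ~ python3 -c 'print(2**63 - 9223372036854771712)'
4096

i.e. 2^63 bytes rounded down to the 4 KiB page size, while max_usage_in_bytes shows the tests peaked at about 2.3 GB.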

@jbarrick-mesosphere

➜  ~ uname -a
Linux box 5.1.5-arch1-2-ARCH #1 SMP PREEMPT Mon May 27 03:37:39 UTC 2019 x86_64 GNU/Linux
➜  ~ 

@jbarrick-mesosphere

➜  ~ docker version
Client:
 Version:           18.09.6-ce
 API version:       1.39
 Go version:        go1.12.4
 Git commit:        481bc77156
 Built:             Sat May 11 06:11:03 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.6-ce
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.12.4
  Git commit:       481bc77156
  Built:            Sat May 11 06:10:35 2019
  OS/Arch:          linux/amd64
  Experimental:     false
➜  ~ 


jbarrick-mesosphere commented Jul 5, 2019

I've set a memory limit on the running container:

➜  ~ docker update --memory 380000000 --memory-swap 380000000 zealous_shannon
zealous_shannon
➜  ~

And then make got killed by the kernel:

make: *** [integration-test] Killed
Makefile:30: recipe for target 'integration-test' failed

But the container has not exited.

Memory usage before:

➜  ~ cat /sys/fs/cgroup/memory/docker/99d762e604afcc06f7e34c76a63da6221733701c41bdf7deb6310cbabaa2b974/memory.usage_in_bytes 
661151744
➜  ~ 

Memory usage after:

➜  ~ cat /sys/fs/cgroup/memory/docker/99d762e604afcc06f7e34c76a63da6221733701c41bdf7deb6310cbabaa2b974/memory.usage_in_bytes 
16060416
➜  ~ 
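
If the kernel OOM killer did the killing, it logs the event; a generic check (not captured during this session) is:

➜  ~ sudo dmesg | grep -i 'killed process'

So the limit converts the reclaim stall into a clean kill of the test process, though the container itself still has to be cleaned up.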

@jbarrick-mesosphere changed the title from "Integration tests can hang the system" to "Integration tests can hang dockerd, ps, and the container" on Jul 5, 2019
jbarrick-mesosphere added a commit to jbarrick-mesosphere/kudo that referenced this issue Jul 8, 2019
… set --rm to remove containers after tests (kudobuilder#512), -it to support cancelling tests.
kensipe pushed a commit that referenced this issue Jul 9, 2019
…o remove containers after tests (#512), -it to support cancelling tests. (#522)

* Set Docker memory limit for tests to prevent hangs (#511), set --rm to remove containers after tests (#512), -it to support cancelling tests.

* Reduce memory limit to 2 gigs.
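
Based on the commit message above (the test image and script names are placeholders here, since the actual script isn't quoted in this issue), the fixed invocation presumably looks like:

➜  ~ docker run --rm -it --memory 2g kudo-test-image ./test/run_tests.sh

--memory 2g lets the OOM killer terminate a runaway test, --rm removes the container afterwards, and -it makes the run cancellable from the terminal.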