New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race conditions in libcontainerd process handling #35809

Merged
merged 2 commits into from Dec 16, 2017

Conversation

Projects
None yet
6 participants
@cpuguy83
Contributor

cpuguy83 commented Dec 15, 2017

  • Fix some missing synchronization in libcontainerd
  • Fix error handling for kill/process not found

With the contianerd 1.0 migration we now have strongly typed errors that
we can check for process not found.
We also had some bad error checks looking for ESRCH which would only
be returned from unix.Kill and never from containerd even though we
were checking containerd responses for it.

Fixes some race conditions around process handling and our error checks
that could lead to errors that propagate up to the user that should not.

Related to #35594

Fix error handling for kill/process not found
With the contianerd 1.0 migration we now have strongly typed errors that
we can check for process not found.
We also had some bad error checks looking for `ESRCH` which would only
be returned from `unix.Kill` and never from containerd even though we
were checking containerd responses for it.

Fixes some race conditions around process handling and our error checks
that could lead to errors that propagate up to the user that should not.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Dec 15, 2017

Contributor

LGTM

Contributor

crosbymichael commented Dec 15, 2017

LGTM

Fix some missing synchronization in libcontainerd
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@vieux

This comment has been minimized.

Show comment
Hide comment
@vieux

vieux Dec 16, 2017

Collaborator

LGTM

Collaborator

vieux commented Dec 16, 2017

LGTM

@vieux

vieux approved these changes Dec 16, 2017

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Dec 16, 2017

Member

z failure looks unrelated;

01:53:24 ----------------------------------------------------------------------
01:53:24 FAIL: check_test.go:366: DockerSwarmSuite.TearDownTest
01:53:24 
01:53:24 check_test.go:371:
01:53:24     d.Stop(c)
01:53:24 daemon/daemon.go:395:
01:53:24     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
01:53:24 ... Error: Error while stopping the daemon db005154c4e62 : exit status 130
01:53:24 
01:53:24 
01:53:24 ----------------------------------------------------------------------
01:53:24 PANIC: docker_api_swarm_service_test.go:201: DockerSwarmSuite.TestAPISwarmServicesUpdateStartFirst
01:53:24 
01:53:24 [db005154c4e62] waiting for daemon to start
01:53:24 [db005154c4e62] daemon started
01:53:24 
01:53:24 [db005154c4e62] daemon started
01:53:24 Attempt #2: daemon is still running with pid 12408
01:53:24 Attempt #3: daemon is still running with pid 12408
01:53:24 Attempt #4: daemon is still running with pid 12408
01:53:24 [db005154c4e62] exiting daemon
01:53:24 ... Panic: Fixture has panicked (see related PANIC)
01:53:24 
01:53:24 ----------------------------------------------------------------------

restarting for good measure

Member

thaJeztah commented Dec 16, 2017

z failure looks unrelated;

01:53:24 ----------------------------------------------------------------------
01:53:24 FAIL: check_test.go:366: DockerSwarmSuite.TearDownTest
01:53:24 
01:53:24 check_test.go:371:
01:53:24     d.Stop(c)
01:53:24 daemon/daemon.go:395:
01:53:24     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
01:53:24 ... Error: Error while stopping the daemon db005154c4e62 : exit status 130
01:53:24 
01:53:24 
01:53:24 ----------------------------------------------------------------------
01:53:24 PANIC: docker_api_swarm_service_test.go:201: DockerSwarmSuite.TestAPISwarmServicesUpdateStartFirst
01:53:24 
01:53:24 [db005154c4e62] waiting for daemon to start
01:53:24 [db005154c4e62] daemon started
01:53:24 
01:53:24 [db005154c4e62] daemon started
01:53:24 Attempt #2: daemon is still running with pid 12408
01:53:24 Attempt #3: daemon is still running with pid 12408
01:53:24 Attempt #4: daemon is still running with pid 12408
01:53:24 [db005154c4e62] exiting daemon
01:53:24 ... Panic: Fixture has panicked (see related PANIC)
01:53:24 
01:53:24 ----------------------------------------------------------------------

restarting for good measure

@thaJeztah thaJeztah merged commit 52656da into moby:master Dec 16, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 38427 has succeeded
Details
janky Jenkins build Docker-PRs 47162 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 7542 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 18679 has succeeded
Details
z Jenkins build Docker-PRs-s390x 7386 has succeeded
Details

@cpuguy83 cpuguy83 deleted the cpuguy83:fix_container_zombies branch Dec 18, 2017

@jchauncey

This comment has been minimized.

Show comment
Hide comment
@jchauncey

jchauncey Dec 18, 2017

So this doesnt seem to have fixed my problem =\

jchauncey commented Dec 18, 2017

So this doesnt seem to have fixed my problem =\

@jchauncey

This comment has been minimized.

Show comment
Hide comment
@jchauncey

jchauncey Dec 19, 2017

nevermind this has definitely solved the problem i was having with jenkins and docker.

jchauncey commented Dec 19, 2017

nevermind this has definitely solved the problem i was having with jenkins and docker.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Dec 19, 2017

Member

Thanks for testing @jchauncey 👍

Member

thaJeztah commented Dec 19, 2017

Thanks for testing @jchauncey 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment