Avoid double close of agentInitDone #1415

Merged
sanimej merged 1 commit into moby:master from mrjana:agent on Aug 24, 2016

Conversation

mrjana (Contributor) commented Aug 24, 2016

Avoid the double close by reinitializing the channel immediately after
closing it, within a lock. Also change the wait code to cache the channel
on the stack by retrieving it from the controller, and wait on the stack
copy of the channel.

Signed-off-by: Jana Radhakrishnan mrjana@docker.com
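For illustration, the wait-side change described in this commit message might look roughly like the sketch below (assuming the controller embeds a mutex; this is a sketch, not the exact diff):

```go
import "sync"

// Assumed minimal controller layout for this sketch.
type controller struct {
	sync.Mutex
	agentInitDone chan struct{}
}

// AgentInitWait takes a stack copy of the channel under the controller
// lock, then blocks on that copy so a concurrent reassignment of
// c.agentInitDone cannot race with the receive.
func (c *controller) AgentInitWait() {
	c.Lock()
	agentInitDone := c.agentInitDone
	c.Unlock()

	<-agentInitDone
}
```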

sanimej commented Aug 24, 2016

@mrjana There is only one close(c.agentInitDone) call, which happens in agentSetup(). How does this lead to two closes of the channel on daemon shutdown? Are we getting spurious events from ListenClusterEvents()?

mrjana (Contributor, Author) commented Aug 24, 2016

Yeah, we sometimes get more than one notification from the engine. That may or may not get fixed, but from libnetwork's point of view we should not panic on such events, so to make it robust we have to make one instance of the channel agentInitDone go through only one iteration of wait and close.
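For context, closing an already-closed channel panics in Go, which is why a duplicate cluster notification could crash the daemon:

```go
package main

func main() {
	done := make(chan struct{})
	close(done)
	close(done) // panic: close of closed channel
}
```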

Comment thread on agent.go (outdated)

```go
c.Lock()
if c.agent != nil {
	close(c.agentInitDone)                // wake any current waiters
	c.agentInitDone = make(chan struct{}) // reinitialize so a later close cannot panic
```

agentSetup() and SetupIngress() (the caller of AgentInitWait) can get called from different async paths. With the earlier behavior of only closing the channel, even if SetupIngress gets called later it would return immediately if the agent has already been set up. But by creating the channel right after closing it, won't it result in AgentInitWait waiting forever if SetupIngress gets called later?
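A minimal sketch of the interleaving raised here, using a simplified stand-in for the controller (assumed field layout): once the agentSetup path closes and immediately reinitializes the channel, a late waiter caches the fresh channel and blocks on something nobody will ever close.

```go
package main

import "sync"

// ctrl is a simplified stand-in for the libnetwork controller.
type ctrl struct {
	sync.Mutex
	agentInitDone chan struct{}
}

func main() {
	c := &ctrl{agentInitDone: make(chan struct{})}

	// agentSetup path runs first: close, then immediately reinitialize.
	c.Lock()
	close(c.agentInitDone)
	c.agentInitDone = make(chan struct{})
	c.Unlock()

	// A late SetupIngress -> AgentInitWait caches the *new* channel...
	c.Lock()
	agentInitDone := c.agentInitDone
	c.Unlock()

	// ...and blocks forever; the runtime reports
	// "fatal error: all goroutines are asleep - deadlock!".
	<-agentInitDone
}
```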

mrjana (Contributor, Author) replied:

Thanks for spotting this. Yeah, agreed that this could cause a problem. We may have to use c.agentInitDone as a tri-state (see the sketch after this list):

  • once init is done, close the channel and set c.agentInitDone to nil
  • if it is nil, it is already closed, so the waiter doesn't need to wait and can simply return
  • if it is not nil, cache it and wait
  • recreate the channel only on a leave
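A minimal sketch of that tri-state scheme, assuming the controller embeds a mutex; markAgentInitDone is a hypothetical helper name for the close side, not the merged code:

```go
package main

import "sync"

type controller struct {
	sync.Mutex
	// agentInitDone is non-nil while agent init is pending, and nil once
	// init has completed (i.e. the channel has already been closed).
	agentInitDone chan struct{}
}

// markAgentInitDone is a hypothetical close-side helper.
func (c *controller) markAgentInitDone() {
	c.Lock()
	if c.agentInitDone != nil {
		close(c.agentInitDone) // wake all current waiters, exactly once
		c.agentInitDone = nil  // nil now means "already closed"
	}
	c.Unlock()
}

// AgentInitWait returns immediately if init has already completed,
// otherwise waits on a stack copy of the channel taken under the lock.
func (c *controller) AgentInitWait() {
	c.Lock()
	agentInitDone := c.agentInitDone
	c.Unlock()

	if agentInitDone != nil {
		<-agentInitDone
	}
}

func main() {
	c := &controller{agentInitDone: make(chan struct{})}
	go c.markAgentInitDone()
	c.AgentInitWait() // returns once init completes
	c.AgentInitWait() // returns immediately: the channel is already nil
}
```

Recreating the channel on a leave (the last bullet) would be the inverse step: under the lock, set c.agentInitDone back to a fresh make(chan struct{}).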

sanimej commented Aug 24, 2016

LGTM

sanimej merged commit ae8d864 into moby:master on Aug 24, 2016