Bump ipam to 0.2.2 #2030

Merged
merged 3 commits into from Mar 29, 2017

jnummelin
Contributor

IPAM 0.2.2 introduced a mitigation for the zombie netns issues: it does not actually release addresses that still respond to ping. This PR bumps the agent to use IPAM 0.2.2.
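
To make the mitigation concrete, here is a minimal sketch of a release that is refused while the address still answers a liveness probe. This is not the IPAM service's actual code: `AddressPoolSketch`, `ZombieAddressError`, and the injected `probe` callable are all hypothetical names, and the real service pings the address and keeps its state elsewhere. Only the error message format is taken from the logs in this PR.

```ruby
# Hypothetical error type for a refused release (assumption, not the IPAM API).
class ZombieAddressError < StandardError; end

# Hypothetical in-memory pool, for illustration only.
class AddressPoolSketch
  # probe: callable taking an address and returning true if it still responds.
  def initialize(name, probe)
    @name = name
    @probe = probe
    @reserved = {}
  end

  def reserve(address)
    @reserved[address] = true
  end

  def reserved?(address)
    @reserved.key?(address)
  end

  # Refuse to release an address that still responds; otherwise free it.
  def release(address)
    if @probe.call(address)
      raise ZombieAddressError,
            "Skip zombie address=#{address} in pool=#{@name} that still responds to ping"
    end
    @reserved.delete(address)
    true
  end
end
```

The address stays reserved after a refused release, so it cannot be handed out to a new container while something still answers on it.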

@jnummelin jnummelin added this to the 1.1.6 milestone Mar 29, 2017
@SpComb
Contributor

SpComb commented Mar 29, 2017

Note that the release errors are logged, but they do also crash the Weave actor. Not 100% sure what the side effects of this are. I suspect that for 1.1.x in particular it would be better to just trap the exception and log it, without crashing the Weave actor.

I, [2017-03-29T07:38:18.760665 #1]  INFO -- Kontena::Workers::ServicePodWorker: terminating redis-daemon-1
I, [2017-03-29T07:38:18.798955 #1]  INFO -- Kontena::ServicePods::Terminator: terminating service: /redis-daemon-1
D, [2017-03-29T07:38:19.237155 #1] DEBUG -- Kontena::Workers::WeaveWorker: waited 0.1s until: network ready yielded true
I, [2017-03-29T07:38:19.237439 #1]  INFO -- Kontena::NetworkAdapters::Weave: Remove container=3fa47c19a89678055124e971222057707c4845b9691b2232145b1fed5ad15162 from network=kontena at cidr=10.81.128.51/16
D, [2017-03-29T07:38:19.237531 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: releasing address 10.81.128.51/16 for network kontena
D, [2017-03-29T07:38:19.247834 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: Request POST /IpamDriver.ReleaseAddress: 409 Conflict: {"Error":"Skip zombie address=10.81.128.51 in pool=kontena that still responds to ping"}
E, [2017-03-29T07:38:19.248242 #1] ERROR -- Kontena::Workers::WeaveWorker: failed to remove container: Kontena::NetworkAdapters::IpamError: Skip zombie address=10.81.128.51 in pool=kontena that still responds to ping
E, [2017-03-29T07:38:19.248301 #1] ERROR -- Kontena::Workers::WeaveWorker: /app/lib/kontena/network_adapters/ipam_client.rb:117:in `handle_error_response'
/app/lib/kontena/network_adapters/ipam_client.rb:76:in `rescue in release_address'
/app/lib/kontena/network_adapters/ipam_client.rb:66:in `release_address'
/app/lib/kontena/network_adapters/weave.rb:354:in `remove_container'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
(celluloid):0:in `remote procedure call'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:45:in `value'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/proxy/sync.rb:22:in `method_missing'
/app/lib/kontena/workers/weave_worker.rb:100:in `on_container_destroy'
/app/lib/kontena/workers/weave_worker.rb:65:in `on_container_event'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/async.rb:7:in `dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
E, [2017-03-29T07:38:19.248093 #1] ERROR -- : Actor crashed!
Kontena::NetworkAdapters::IpamError: Skip zombie address=10.81.128.51 in pool=kontena that still responds to ping
	/app/lib/kontena/network_adapters/ipam_client.rb:117:in `handle_error_response'
	/app/lib/kontena/network_adapters/ipam_client.rb:76:in `rescue in release_address'
	/app/lib/kontena/network_adapters/ipam_client.rb:66:in `release_address'
	/app/lib/kontena/network_adapters/weave.rb:354:in `remove_container'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
I, [2017-03-29T07:38:19.251000 #1]  INFO -- Kontena::NetworkAdapters::Weave: initialized
I, [2017-03-29T07:38:19.253478 #1]  INFO -- Kontena::NetworkAdapters::WeaveExecutor: initialized
I, [2017-03-29T07:38:19.254410 #1]  INFO -- Kontena::NetworkAdapters::WeaveExecutor: initialized

In particular, scaling down a lot of containers at once seems to trigger side effects:

kontena-agent | E, [2017-03-29T07:45:55.003913 #1] ERROR -- : Actor crashed!
kontena-agent | Kontena::NetworkAdapters::IpamError: Skip zombie address=10.81.128.79 in pool=kontena that still responds to ping
kontena-agent | 	/app/lib/kontena/network_adapters/ipam_client.rb:117:in `handle_error_response'
kontena-agent | 	/app/lib/kontena/network_adapters/ipam_client.rb:76:in `rescue in release_address'
kontena-agent | 	/app/lib/kontena/network_adapters/ipam_client.rb:66:in `release_address'
kontena-agent | 	/app/lib/kontena/network_adapters/weave.rb:354:in `remove_container'
kontena-agent | 	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
kontena-agent | 	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
kontena-agent | 	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
kontena-agent | 	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
kontena-agent | 	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
kontena-agent | 	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
kontena-agent | 	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
kontena-agent | 	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
kontena-agent | E, [2017-03-29T07:45:55.029713 #1] ERROR -- Kontena::Workers::EventWorker: Celluloid::DeadActorError: attempted to call a dead actor: adapter_image?
E, [2017-03-29T07:46:02.173576 #1] ERROR -- Kontena::Workers::WeaveWorker: failed to remove container: Celluloid::DeadActorError: attempted to call a dead actor: network_ready?
E, [2017-03-29T07:46:02.188209 #1] ERROR -- Kontena::Workers::WeaveWorker: /usr/lib/ruby/gems/2.3.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/responses.rb:29:in `value'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:45:in `value'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/proxy/sync.rb:22:in `method_missing'
/app/lib/kontena/helpers/weave_helper.rb:23:in `block in wait_network_ready?'
/app/lib/kontena/helpers/wait_helper.rb:29:in `block in wait_until'
/app/lib/kontena/helpers/wait_helper.rb:28:in `loop'
/app/lib/kontena/helpers/wait_helper.rb:28:in `wait_until'
/app/lib/kontena/helpers/wait_helper.rb:50:in `wait_until!'
/app/lib/kontena/helpers/weave_helper.rb:22:in `wait_network_ready?'
/app/lib/kontena/workers/weave_worker.rb:98:in `on_container_destroy'
/app/lib/kontena/workers/weave_worker.rb:65:in `on_container_event'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/async.rb:7:in `dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
E, [2017-03-29T07:46:02.188073 #1] ERROR -- Kontena::Workers::EventWorker: Celluloid::DeadActorError: attempted to call a dead actor: adapter_image?

@jnummelin jnummelin added enhancement and removed bug labels Mar 29, 2017
@jakolehm
Contributor

@SpComb @jnummelin I agree that the actor should catch and log that error rather than crash.

@jnummelin
Contributor Author

Yes, I'll put some rescue around that case

@jnummelin
Contributor Author

jnummelin commented Mar 29, 2017

Actors do not crash anymore on this.

kontena-agent | I, [2017-03-29T08:27:18.940863 #1]  INFO -- Kontena::NetworkAdapters::Weave: Remove container=daab113e2869b2796d9ed3d2a9a27f4fd4ef98319c19903b82bb522afdc4582b from network=kontena at cidr=10.81.128.13/16
kontena-agent | D, [2017-03-29T08:27:18.940941 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: releasing address 10.81.128.13/16 for network kontena
kontena-agent | W, [2017-03-29T08:27:18.947233 #1]  WARN -- Kontena::NetworkAdapters::IpamClient: {"Error":"Skip zombie address=10.81.128.13 in pool=kontena that still responds to ping"}
kontena-agent | D, [2017-03-29T08:27:19.053661 #1] DEBUG -- Kontena::Workers::WeaveWorker: waited 0.1s until: network ready yielded true
kontena-agent | I, [2017-03-29T08:27:19.056578 #1]  INFO -- Kontena::NetworkAdapters::Weave: Remove container=0e14bc38bdc1d22edf8c554c6ad28846102a3f466b792760fc6f88a3b09c4836 from network=kontena at cidr=10.81.128.80/16
kontena-agent | D, [2017-03-29T08:27:19.056700 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: releasing address 10.81.128.80/16 for network kontena
kontena-agent | W, [2017-03-29T08:27:19.065343 #1]  WARN -- Kontena::NetworkAdapters::IpamClient: {"Error":"Skip zombie address=10.81.128.80 in pool=kontena that still responds to ping"}

handle_error_response(error)
  if error.response.status == 409
    # IPAM returns 409 when the address still responds to ping (the zombie case)
    warn error.response.body
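
For context, the approach in this hunk can be sketched as a self-contained example: a 409 is logged as a warning and swallowed, while any other status still raises. `IpamClientSketch`, `Response`, and `HttpError` are hypothetical stand-ins for the agent's real classes, the behaviour of the non-409 branch is an assumption based on the stack traces above, and warnings are collected in an array here instead of going through the agent's logger.

```ruby
# Hypothetical stand-ins for the agent's error and HTTP response types.
class IpamError < StandardError; end

Response = Struct.new(:status, :body)

class HttpError < StandardError
  attr_reader :response

  def initialize(response)
    @response = response
    super("HTTP #{response.status}")
  end
end

class IpamClientSketch
  attr_reader :warnings

  def initialize
    @warnings = []
  end

  def handle_error_response(error)
    if error.response.status == 409
      # Zombie case: the address still responds to ping, so only warn.
      @warnings << error.response.body
      nil
    else
      # Assumption: every other error status is surfaced as an IpamError.
      raise IpamError, error.response.body
    end
  end
end
```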
Contributor

Would it make more sense to handle this a bit higher up in the stack, in the Weave actor?

def remove_container(...)

rescue IpamError => error
  # leave it for later cleanup
  warn "Failed to release container=#{container_id} from network=#{overlay_network} at cidr=#{overlay_cidr}: #{error}"
end
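
A runnable sketch of this suggestion, under stated assumptions: `WeaveSketch` and `StubIpamClient` are hypothetical stand-ins (the real `Weave` adapter is a Celluloid actor and does more than release the address), and warnings are collected in an array rather than written through the agent's logger. It shows that a failed release is logged and left for later cleanup instead of propagating out of `remove_container`.

```ruby
# Hypothetical stand-in for Kontena::NetworkAdapters::IpamError.
class IpamError < StandardError; end

# Hypothetical stub: always refuses the release, like the 409 zombie response.
class StubIpamClient
  def release_address(network, cidr)
    address = cidr.split('/').first
    raise IpamError,
          "Skip zombie address=#{address} in pool=#{network} that still responds to ping"
  end
end

class WeaveSketch
  attr_reader :warnings

  def initialize(ipam_client)
    @ipam_client = ipam_client
    @warnings = []
  end

  # Returns true if the address was released, false if the release failed.
  def remove_container(container_id, overlay_network, overlay_cidr)
    @ipam_client.release_address(overlay_network, overlay_cidr)
    true
  rescue IpamError => error
    # Leave it for later cleanup; do not let the error crash the caller.
    @warnings << "Failed to release container=#{container_id} " \
                 "from network=#{overlay_network} at cidr=#{overlay_cidr}: #{error}"
    false
  end
end
```

Because the rescue sits in `remove_container`, every release error is handled the same way, not only the 409 zombie case.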

Contributor Author

I was thinking that this is a bit of a special case and all the "real" errors would let the weave actor crash this way

Contributor

I would think that it would be preferable to just drop any release errors. AFAIK, letting the current weave actor crash has no advantages; it doesn't really affect any recovery.

Contributor Author

Valid point. I'll move the code in that direction.

@jnummelin
Contributor Author

Now IpamClient raises on unexpected statuses and Weave rescues in #remove_container:

kontena-agent | I, [2017-03-29T08:58:45.858441 #1]  INFO -- Kontena::NetworkAdapters::Weave: Remove container=9c9fd6f5d6d6cb7effd003af15355846bf196b483a906a1e74ae70d6ef69a257 from network=kontena at cidr=10.81.128.44/16
kontena-agent | D, [2017-03-29T08:58:45.858503 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: releasing address 10.81.128.44/16 for network kontena
kontena-agent | D, [2017-03-29T08:58:45.867021 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: Request POST /IpamDriver.ReleaseAddress: 409 Conflict: {"Error":"Skip zombie address=10.81.128.44 in pool=kontena that still responds to ping"}
kontena-agent | W, [2017-03-29T08:58:45.867131 #1]  WARN -- Kontena::NetworkAdapters::Weave: Failed to release container=9c9fd6f5d6d6cb7effd003af15355846bf196b483a906a1e74ae70d6ef69a257 from network=kontena at cidr=10.81.128.44/16: Skip zombie address=10.81.128.44 in pool=kontena that still responds to ping

Contributor

@SpComb SpComb left a comment

LGTM

I, [2017-03-29T09:58:55.051653 #1]  INFO -- Kontena::NetworkAdapters::Weave: Remove container=d2e48c1c2a8aa84c9abb934d4155c3019b8e81946a455b5d5ed8021460b6853d from network=kontena at cidr=10.81.128.110/16
D, [2017-03-29T09:58:55.051722 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: releasing address 10.81.128.110/16 for network kontena
D, [2017-03-29T09:58:55.068567 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: Request POST /IpamDriver.ReleaseAddress: 409 Conflict: {"Error":"Skip zombie address=10.81.128.110 in pool=kontena that still responds to ping"}
W, [2017-03-29T09:58:55.068776 #1]  WARN -- Kontena::NetworkAdapters::Weave: Failed to release container=d2e48c1c2a8aa84c9abb934d4155c3019b8e81946a455b5d5ed8021460b6853d from network=kontena at cidr=10.81.128.110/16: Skip zombie address=10.81.128.110 in pool=kontena that still responds to ping

The warning message is slightly repetitive now, but that's okay.

W, [2017-03-29T09:59:23.167807 #1]  WARN -- Addresses::Cleanup: Skip zombie address=10.81.128.110 in pool=kontena that still responds to ping

@SpComb SpComb merged commit 288c6be into master Mar 29, 2017
@SpComb SpComb deleted the fix/ipam-0.2.2 branch March 29, 2017 10:01
kke pushed a commit that referenced this pull request Mar 31, 2017
* bump ipam to 0.2.2

* handle 409 zombie responses in IpamClient

* let IpamClient raise error in case of 409 and weave actor rescue it in release
kke added a commit that referenced this pull request Apr 3, 2017
* Bump IPAM to version 0.2.2 (#2030)

* bump ipam to 0.2.2

* handle 409 zombie responses in IpamClient

* let IpamClient raise error in case of 409 and weave actor rescue it in release

* Bump to 1.1.6

* Fix header