New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump ipam to 0.2.2 #2030
Bump ipam to 0.2.2 #2030
Conversation
Note that the release errors are logged, but they do also crash the
Particularly something like scaling down a lot of containers seems to trigger side-effects:
|
@SpComb @jnummelin I agree that actor should catch/log that error and not crash. |
Yes, I'll put some rescue around that case |
Actors do not crash anymore on this.
|
handle_error_response(error) | ||
if error.response.status == 409 | ||
# IPAM return 409 in case the address still responds to ping, case zombies | ||
warn error.response.body |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to handle this a bit higher up in the stack, in the Weave
actor?
def remove_container(...)
rescue IpamError => error
# leave it for later cleanup
warn "Failed to release container=#{container_id} from network=#{overlay_network} at cidr=#{overlay_cidr}: #{error}"
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that this is a bit of a special case and all the "real" errors would let the weave actor crash this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that it would be preferable to just drop any release errors. AFAIK letting the current weave actor crash is not going to have any advantages, it doesn't really affect any recovery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point. I'll move the code to that direction
Now
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I, [2017-03-29T09:58:55.051653 #1] INFO -- Kontena::NetworkAdapters::Weave: Remove container=d2e48c1c2a8aa84c9abb934d4155c3019b8e81946a455b5d5ed8021460b6853d from network=kontena at cidr=10.81.128.110/16
D, [2017-03-29T09:58:55.051722 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: releasing address 10.81.128.110/16 for network kontena
D, [2017-03-29T09:58:55.068567 #1] DEBUG -- Kontena::NetworkAdapters::IpamClient: Request POST /IpamDriver.ReleaseAddress: 409 Conflict: {"Error":"Skip zombie address=10.81.128.110 in pool=kontena that still responds to ping"}
W, [2017-03-29T09:58:55.068776 #1] WARN -- Kontena::NetworkAdapters::Weave: Failed to release container=d2e48c1c2a8aa84c9abb934d4155c3019b8e81946a455b5d5ed8021460b6853d from network=kontena at cidr=10.81.128.110/16: Skip zombie address=10.81.128.110 in pool=kontena that still responds to ping
The warning message is slightly repetitive now, but that's okay.
W, [2017-03-29T09:59:23.167807 #1] WARN -- Addresses::Cleanup: Skip zombie address=10.81.128.110 in pool=kontena that still responds to ping
* bump ipam to 0.2.2 * handle 409 zombie reponses in IpamClient * let IpamClient raise error in case of 409 and weave actor rescue it in release
* Bump IPAM to version 0.2.2 (#2030) * bump ipam to 0.2.2 * handle 409 zombie reponses in IpamClient * let IpamClient raise error in case of 409 and weave actor rescue it in release * Bump to 1.1.6 * Fix header
Ipam 0.2.2 introduced mitigation to zombie netns issues by not actually releasing addresses that still respond to ping. This PR bumps agent to take ipam 0.2.2 into use.