[ansible-mitogen] 'delegate_to' logic #340
Comments
Hi, thanks for reporting! Is this using Connection Delegation (mitogen_via=...)? If so, it might be another instance of #251. I haven't been looking at bugs for the past week as I'm working hard on the next development branch :) But I will hopefully get to this on Saturday or Sunday. Sorry for the delay.
Yes, it could be another instance of #251. I'm using Connection Delegation twice: (E is a ProxyJump in .ssh/config for H, but that's not relevant here)
Just to note that I've reproduced this locally; the relevant code looks entirely wrong. Hopefully I will have a fix for this before the end of the weekend.
Hello @dw,
Hi Florent! Thank you for persisting with this :) Can you please share:
One possibility is that it's spinning up so many interpreters that the target machine starts IO thrashing and slows way down. Another is that, because delegation is single-threaded, simply targeting many, many containers (possibly in conjunction with a slow machine) means tasks begin timing out while waiting for the single thread to become available. I can arrange to fix this as part of 0.2; it's a relatively simple change. A final possibility is some odd race condition, but usually they do not manifest as simple timeouts. Thanks again!
Ah! One final data point would be useful -- does the timeout appear to occur after DEFAULT_TIMEOUT seconds have elapsed? If it is earlier, that would be a strong indicator of some bug :)
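To make the single-thread hypothesis above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name, the per-target setup time and the timeout value are all made up for illustration, and it does not reflect how Mitogen actually schedules connection setup. It assumes every target asks for a connection at the same moment and one worker sets them up strictly one after another.

```python
def simulate_single_threaded_setup(num_targets, setup_seconds, timeout_seconds):
    """Toy model: one worker sets up connections one at a time.

    All targets are assumed to request a connection at t=0. A request
    succeeds only if its setup finishes within timeout_seconds.
    Returns (connected, timed_out) as lists of target indices.
    """
    connected, timed_out = [], []
    clock = 0.0  # time at which the single worker becomes free
    for index in range(num_targets):
        finish = clock + setup_seconds
        if finish <= timeout_seconds:
            connected.append(index)
            clock = finish
        else:
            # The worker could not reach this target in time.
            timed_out.append(index)
    return connected, timed_out


# Hypothetical numbers: 40 containers, 1 s of setup each, 30 s timeout.
connected, timed_out = simulate_single_threaded_setup(40, 1.0, 30.0)
print(len(connected), "connected;", len(timed_out), "timed out")
```

In this model nothing fails before the full timeout has elapsed, which is why a timeout that fires earlier than DEFAULT_TIMEOUT would point at a different cause.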
When inventory name did not match remote_addr, it would attempt to SSH to the inventory name.
PlayContext.delegate_to is the unexpanded template, and Ansible doesn't keep a copy of the expanded value around anywhere convenient. We either need to re-expand it or take the expanded version that was stored on the Task, which is what is done here.
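The two points in the commit message above can be sketched roughly as follows. This is not the actual patch: FakePlayContext, FakeTask, delegate_target and connection_address are hypothetical stand-ins invented for illustration, and the host variable lookup assumes the address is supplied through ansible_host.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakePlayContext:
    """Hypothetical stand-in: delegate_to exactly as written in the playbook."""
    delegate_to: Optional[str]


@dataclass
class FakeTask:
    """Hypothetical stand-in: delegate_to after variable expansion."""
    delegate_to: Optional[str]


def delegate_target(play_context: FakePlayContext, task: FakeTask) -> Optional[str]:
    """Prefer the expanded value on the task over the raw template."""
    if task.delegate_to:
        return task.delegate_to
    return play_context.delegate_to


def connection_address(inventory_name: str, host_vars: Dict[str, str]) -> str:
    """Connect to the configured address if one is set, else the inventory name."""
    return host_vars.get("ansible_host", inventory_name)


# The play context still holds the template; the task holds the result.
target = delegate_target(FakePlayContext("{{ cert_host }}"), FakeTask("E"))
print(target)
print(connection_address(target, {"ansible_host": "10.0.0.5"}))
```

The design point is simply that the template string is never used as a hostname, and that the inventory name is only a fallback when no explicit address is configured.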
Hello David, Anyway, here we go:
When the issue appeared in case 2, the "timed out" instances of the task were not run on E. It was not the Anyway, more details ASAP when/if I can reproduce it. |
Hello again David,
I think we can close #340, as it was initially about delegation, which seems to work now. It would be good if someone else could confirm that async delegated tasks are working fine on their side.
Hi David, sorry, I forgot to test and report back: so I updated mitogen to master version (v2.5 + a couple of commits including fix for #548) and updated ansible to 2.7.5. As a side note, I still have timeout issues (not 100% reproducible) with Thank you for your help. |
Hi, let me start by saying this project is quite impressive (goals, design, initial documentation). I spent half a day reading, understanding (or trying to) and trying to get it working as an alternative to the sshjail plugin (which is also great, in other respects).
Anyway, I got pretty much what I wanted: H is a FreeBSD host with jails, like the J jail. As I use sudo, I had to use mitogen_sudo, which was quite a journey :). Anyway, as I said, my existing playbooks work on J and its siblings. Except one: at some point, I run a command on an external host E (which is OK when tested with the Ansible ping module) that renews many TLS certificates in one run, so I use the Ansible delegate_to logic to achieve that.
Let's see:
And that's where the logic is strange to me, as I look at the error I get:
As I understand it, H (hosting J) is trying to connect via SSH to E, when it should be the Ansible control machine that does so. Could this be a case where Mitogen differs from vanilla Ansible behaviour? I've defined a simple "tree" of connections (meaning my control machine can reach them all), not a complete P2P web allowing any connection from any point. Is this fundamentally doomed to fail, from Mitogen's point of view? Because I don't plan to allow J or H to reach E anytime soon (different security networks).
Version information:
Meanwhile, I'll continue to dig.