Ansible Copy module stops (no subsequent task) #351
Comments
I tried the obvious patch, attached here, but that doesn't work. It removes the traceback, but ansible still stops dead.
|
Attaching threadstacks. Keeps looping with "no change since last time" |
When this happens, there are no ssh processes running any longer, but ansible still has 300+ open fd's to various sockets and pipes. |
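For anyone wanting to reproduce this kind of observation, here is a rough sketch of one way to tally a process's open descriptors by type. It assumes Linux with /proc mounted, and the helper name `summarize_fds` is made up for illustration; it is not part of mitogen or Ansible.

```python
import os
from collections import Counter


def summarize_fds(pid):
    """Tally open file descriptors of `pid` by kind, using /proc (Linux only)."""
    kinds = Counter()
    fd_dir = "/proc/%d/fd" % pid
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink such as "socket:[12345]" or "pipe:[678]".
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        else:
            kinds["other"] += 1
    return kinds
```

Calling `summarize_fds(<pid of the hung ansible-playbook process>)` should then show whether the leftover descriptors are mostly sockets or pipes.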
This one looks exciting! threadstacks.log shows worker-0, 1, 2, 3 and 4 in the process of connection attempts, but you say there are no SSH processes? Regarding the "if/when SSH disconnects", is this usual for your environment? How many hosts were you targeting in this run before the hang? Thanks for reporting! |
This may at least in part be a manifestation of the missing downstream propagation from https://github.com/dw/mitogen/issues/76 |
Am I right in saying the inline tracebacks are from Git, but the threadstacks.log is from 0.2.2? |
All the logs are from git, but they may have been from me patching it, so I'll try and redo them. |
I was targeting about 150 hosts. It is normal for a small percentage of ssh connections to disconnect. I've so far found the following reasons for that:
|
Including a threadstack that's confirmed from git master (18685da) |
This sounds like you're hitting the same bug as #352, where an EC2 instance went down for retirement and the code didn't notice. I hope to have this fixed in 0.2.3; if not, it will land in a 0.2.4 that follows shortly afterwards, as 0.2.3 is huge already :) |
Sounds like it's the same problem to me too: some connection errors aren't processed correctly. Looking forward to testing 0.2.3 and 0.2.4. |
If you're looking for an easy way to test: in ansible.cfg, add
and in /path/to/timessh deliberately stop ssh:
then any task in a playbook that takes longer than 3 seconds will trigger the bug. |
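To illustrate the idea only: a wrapper that stands in for ssh and kills it after a fixed delay could look roughly like the sketch below. This is not the script from the comment above. The function name `run_with_deadline`, the choice of Python, and the 3-second default (taken from the "longer than 3 seconds" remark) are all assumptions.

```python
import subprocess
import sys


def run_with_deadline(argv, deadline=3.0):
    """Run argv and kill it if it is still alive after `deadline` seconds.

    Returns the child's exit status; on POSIX this is the negative signal
    number when the child had to be killed.
    """
    # stdin/stdout/stderr are inherited, so the caller talks to the child directly.
    child = subprocess.Popen(argv)
    try:
        return child.wait(timeout=deadline)
    except subprocess.TimeoutExpired:
        child.kill()  # simulate an abrupt ssh disconnect
        return child.wait()
```

A wrapper script would then call something like `run_with_deadline(["ssh"] + sys.argv[1:])` and exit with a non-zero status if the result is non-zero.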
Hi Berend, Sorry for such a huge delay on this one -- I forgot there were related bugs. Your attached test case is indeed a reproducer for the class of issues resolved in #76. This is now on master. Thanks so much for reporting this, and most of all, thanks a ton for a reproducer that made the work equivalent to another ticket. :) This is now on the master branch and will make it into the next release. To be updated when a new release is made, subscribe to https://networkgenomics.com/mail/mitogen-announce/ Thanks for reporting this! |
Ansible "hangs" -- waits forever to move to the next task. This seems to always happen on a copy task if/when an ssh process closes unexpectedly. The remaining hosts continue the copy task, but no subsequent task starts. The playbook stops (doesn't finish).
There's a traceback in mitogen:
and then later Ansible notices it: