Streams disconnected in network-intense Ansible task #593
We had a specific Ansible task fail because a stream was disconnected.
We hit this in our test environment with 19 hosts, during a task where all hosts pull a roughly 1 GB Docker image from the registry running on the first host. This nearly saturates the first host's 1 Gbps NIC with outgoing traffic for about 70 seconds. In each attempt a few of the 19 hosts failed, and the set of failing hosts changed between attempts.
Debug messages around the error are below (IPs were replaced):
One suspicious thing was that this error came along with the message
We tried adding
Some more background info:
We really liked the speed-up with Mitogen in our cloud tests and hope to get this solved so we can use it on-prem. Any suggestions for configs or things to look at would be appreciated.
This is an excellent bug report, thanks so much! Indeed, Mitogen by default configures SSH for 15-second heartbeats and times out when 3 have not been received, which is roughly 45 seconds of silence and shorter than the ~70 seconds of saturation you describe. This setting has never been exposed to Ansible, and the default may be far too aggressive in any case.
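To make the arithmetic behind the timeout concrete, here is a minimal sketch. Only the numbers 15, 3 and ~70 seconds come from this thread; the helper names are hypothetical, and the mapping onto OpenSSH's `ServerAliveInterval` / `ServerAliveCountMax` options is an assumption about how such heartbeats are typically configured, not a quote of Mitogen's code.

```python
# Illustrative only: helper names are hypothetical and not part of Mitogen.

def keepalive_window(interval_s: int, count: int) -> int:
    """Approximate seconds of silence tolerated before SSH gives up."""
    return interval_s * count


def ssh_keepalive_args(interval_s: int, count: int) -> list:
    """Build OpenSSH client options for the given heartbeat settings.

    Assumption: heartbeats are expressed via the standard OpenSSH
    ServerAliveInterval / ServerAliveCountMax client options.
    """
    return [
        "-o", f"ServerAliveInterval {interval_s}",
        "-o", f"ServerAliveCountMax {count}",
    ]


DEFAULT_INTERVAL_S = 15   # from the thread: 15-second heartbeats
DEFAULT_COUNT = 3         # from the thread: give up after 3 missed
SATURATION_S = 70         # from the report: NIC maxed out for ~70 s

window = keepalive_window(DEFAULT_INTERVAL_S, DEFAULT_COUNT)
print(window)                 # 45
print(window < SATURATION_S)  # True: the link can stay silent longer than SSH tolerates

# One hypothetical combination that would give a 5-minute window.
print(keepalive_window(30, 10))  # 300
```

With the defaults, a host whose heartbeats are starved for the whole saturated period would be dropped well before the pull finishes, which would be consistent with a few random hosts failing on each attempt.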
For now, consider editing 'mitogen/ssh.py' and setting
I will add new variables to control these values, and perhaps bump the default up to something more reasonable, like 5 minutes.
Thanks again :)
This is now on the master branch and will make it into the next release. To be notified when a new release is made, subscribe to https://networkgenomics.com/mail/mitogen-announce/
Thanks for reporting this!