can test.ping via salt-call from the minion, but can't ping same minion from master #27237

Closed
scoates opened this issue Sep 18, 2015 · 6 comments

scoates commented Sep 18, 2015

I'm having a strange problem with many of my minions right now.
It's difficult to search for, so please feel free to link this issue to another one if it truly overlaps with something that already exists.

The short version is that on a minion, I can run `salt-call test.ping` and get `True`, but if I run `salt minionname test.ping` from the master, I get `Minion did not return. [Not connected]`. `salt-run manage.up` also does not show this minion.

Digging in a bit, if I restart the minion (luckily (?), I have several in this state), I can then ping from either direction. At least for a while.

The minion is on 2015.5.2, and the master is on a dev version of 2015.8 (2015.8.0rc3-165-ge3aec14). If there's evidence that this was fixed after these versions, I'll accept "you need to update" as the answer, but I'd like to avoid upgrading right now unless it's very likely to be the fix.

For more detail: I am indeed running ZMQ 4.0.5 on both ends.

Additionally, if I watch the connection with tcpdump, when initiating from the master, I don't see any traffic, but when I initiate from the minion, there's session traffic.

Nothing obvious in the minion or master logs.

scoates commented Sep 18, 2015

One additional detail that might help (or might not, but I figure it's worth including): my minions do this in cron:

*/5 * * * * salt-call test.ping > /dev/null || systemctl restart salt-minion

This was added to try to make our (volatile; they're on laptops that move between networks a lot) minions reconnect quickly.

scoates commented Sep 18, 2015

I upgraded one of the failing minions to 2015.8.0-39-g253ac5e and waited an hour.
Same behavior: the minion can `salt-call test.ping`, but the master can't `salt minionname test.ping`.

scoates commented Sep 18, 2015

Also of note, the salt minion node shows this connection:

salt-mini 5234          root   26u  IPv4  25150      0t0  TCP 10.0.2.15:36435->REDACTED.compute-1.amazonaws.com:4505 (ESTABLISHED)

But nothing for :4506 (we actually run on different ports, but these are the equivalents).
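
For anyone who wants to repeat this check across several minions, here is a minimal sketch that scans lsof-style output for established connections to the master's ports. It assumes the default ports 4505 (publish) and 4506 (return) and output shaped like the line above; the function name `established_ports` is hypothetical and not part of Salt.

```python
import re

# Default Salt master ports (an assumption; this deployment uses different ones).
PUBLISH_PORT = 4505
RETURN_PORT = 4506


def established_ports(lsof_output):
    """Return the set of remote ports that have an ESTABLISHED TCP connection."""
    ports = set()
    for line in lsof_output.splitlines():
        # Match the "->remote-host:port (ESTABLISHED)" tail of an lsof line.
        match = re.search(r"->\S+:(\d+)\s+\(ESTABLISHED\)", line)
        if match:
            ports.add(int(match.group(1)))
    return ports


# The connection line quoted in this comment, used as sample input.
sample = (
    "salt-mini 5234          root   26u  IPv4  25150      0t0  TCP "
    "10.0.2.15:36435->REDACTED.compute-1.amazonaws.com:4505 (ESTABLISHED)"
)

found = established_ports(sample)
print("publish connected:", PUBLISH_PORT in found)
print("return connected:", RETURN_PORT in found)
```

A missing return-port connection is not necessarily a fault on its own, so treat the result as one data point rather than a diagnosis.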

scoates commented Sep 18, 2015

More new info:

# salt-run jobs.list_jobs | grep test.ping | wc -l
1609

So, this makes some sense: the minion runs the job without the queue, and the master enqueues a job (that it can never run, somehow).

However, I don't know what's filling the queue (it does grow over time), nor why it's not being emptied.

jobs.active takes ~35 seconds and returns nothing.

The jobs returned by jobs.list_jobs have a start time, but no end time. e.g.:

20150918191004093172:
    ----------
    Arguments:
    Function:
        test.ping
    StartTime:
        2015, Sep 18 19:10:04.093172
    Target:
        sean.gateway.fkvpn.net
    Target-type:
        glob
    User:
        root
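
To see which function and target account for the growth, the listing can be tallied. The following is a rough sketch that assumes the nested text layout shown above (job ID at column 0, field names indented four spaces, values indented further); `parse_jobs` is a hypothetical helper, and the second job in the sample is made up for illustration.

```python
from collections import Counter

# First entry is the job quoted above; the second is a hypothetical repeat.
SAMPLE = """\
20150918191004093172:
    ----------
    Arguments:
    Function:
        test.ping
    StartTime:
        2015, Sep 18 19:10:04.093172
    Target:
        sean.gateway.fkvpn.net
    Target-type:
        glob
    User:
        root
20150918191504093172:
    ----------
    Arguments:
    Function:
        test.ping
    StartTime:
        2015, Sep 18 19:15:04.093172
    Target:
        sean.gateway.fkvpn.net
    Target-type:
        glob
    User:
        root
"""


def parse_jobs(text):
    """Parse the nested text listing into {jid: {field: value}}."""
    jobs = {}
    jid = None
    field = None
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if not stripped or set(stripped) == {"-"}:
            continue  # skip blank lines and the "----------" separator
        indent = len(line) - len(line.lstrip())
        if indent == 0 and stripped.endswith(":"):
            jid = stripped[:-1]          # a new job ID starts a new record
            jobs[jid] = {}
            field = None
        elif jid is not None and indent == 4 and stripped.endswith(":"):
            field = stripped[:-1]        # field name; value may be empty
            jobs[jid][field] = ""
        elif jid is not None and field is not None and indent > 4:
            jobs[jid][field] = stripped  # value line for the current field
    return jobs


jobs = parse_jobs(SAMPLE)
counts = Counter((job["Function"], job["Target"]) for job in jobs.values())
for (function, target), n in counts.most_common():
    print(n, function, target)
```

Feeding it the real `salt-run jobs.list_jobs` output (saved to a file) would show whether the backlog is all `test.ping` against one target or spread across minions.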

scoates commented Sep 18, 2015

Digging even deeper…

This minion node is a virtual machine. I decided to check tcpdump on my workstation (the VM's host) to see if it was seeing the salt master's message. It seemed silent at first and then crashed tcpdump:

tcpdump: pktap_filter_packet: pcap_add_if_info(vboxnet1, 1) failed: pcap_add_if_info: pcap_compile_nopcap() failed

I can get this to happen fairly reliably.
Switching my blame from Salt to VirtualBox. No idea how to write a bug for them, though.

Sorry for the noise. Thanks to anyone who looked at this.
Also, hopefully this helps someone who runs into the same situation.

(Closing.)

scoates closed this as completed Sep 18, 2015

@sonoracle

Hi, I am facing the same problem. Can you please explain the steps you followed to resolve this issue?

Thanks in advance.
