Presence detection for minions behind NAT not working #51764

Open
brejoc opened this issue Feb 22, 2019 · 21 comments
Labels
Bug, Confirmed, Core, Pending-Discussion, severity-medium
Milestone
Blocked
Comments

@brejoc
Contributor

brejoc commented Feb 22, 2019

Description of Issue/Question

Presence detection for minions behind a NAT is not working. In minions.py we fetch all of the IPv4 addresses of the minions from their grains. For minion1 these are the addresses the master can see. But for minion2 they are the private addresses the minion gets behind the NAT. Those addresses are then compared to the source addresses of the connections the master sees, which for minion2 is the address of the NAT gateway.
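
To illustrate the mismatch with made-up addresses (a minimal sketch, not Salt's actual code):

```python
# Minimal sketch of the mismatch, not Salt's actual code; addresses are made up.
reported_ipv4_grain = {
    "minion1": {"198.51.100.10"},  # directly connected: master sees this address
    "minion2": {"192.168.1.23"},   # private address assigned behind the NAT
}

observed_source_ip = {
    "minion1": "198.51.100.10",
    "minion2": "203.0.113.5",      # the NAT gateway's public address
}

for minion, grain_ips in reported_ipv4_grain.items():
    status = "matches" if observed_source_ip[minion] in grain_ips else "does NOT match"
    print(minion, status)
# minion1 matches, minion2 does NOT match -> minion2 is treated as not present
```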

I'm not sure how to address this. Any hints welcome!

Setup

minion1 is directly connected to the master. minion2 is behind a NAT.

Steps to Reproduce Issue

Let both minions connect and fire a salt-run manage.alived.

Then you will see this:

$ salt-run manage.alived
- minion1

While this is what was expected, since both minions are connected:

$ salt-run manage.alived
- minion1
- minion2

Versions Report

Salt Version:
           Salt: 2018.3.0
 
Dependency Versions:
           cffi: 1.5.2
       cherrypy: 3.6.0
       dateutil: 2.4.2
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.29.0
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: 2.10
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.13 (default, Jan 11 2017, 10:56:06) [GCC]
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 14.0.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.4
 
System Versions:
           dist: SuSE 12 x86_64
         locale: UTF-8
        machine: x86_64
        release: 4.4.73-5-default
         system: Linux
        version: SUSE Linux Enterprise Server  12 x86_64

But since this was introduced in 2015.8.0, it should be broken all the way back to that release.

@garethgreenaway garethgreenaway added this to the Blocked milestone Feb 22, 2019
@garethgreenaway garethgreenaway added the Pending-Discussion label Feb 22, 2019
@garethgreenaway
Contributor

@saltstack/team-core Any thoughts on this one?

@waynew
Contributor

waynew commented Apr 10, 2019

Should manage.alive(d?) not just operate roughly like test.version (or test.ping) where it asks the minion to report back that it's alive?

@brejoc
Contributor Author

brejoc commented Apr 11, 2019

@waynew No, alived is different in that regard. A test.ping puts some amount of pressure on the system/network if the installation is really big. So having something that works in a passive way (which alived does) is preferred in this case.

@waynew
Contributor

waynew commented Apr 15, 2019

Ah, I see. So it sounds like the problem is really just that the master is missing some bit of information - perhaps there should be another grain set from the minion corresponding to the visible IP from the master?

I'm fairly new to that corner of the world, but given what I just read about alived:

Print a list of all minions that are up according to Salt's presence detection (no commands will be sent to minions)

That, combined with the source that you linked to, leads me to believe that there's a grain that should be there that isn't.

Of course we may not have thought about such a layout when we were building that feature, so we may never have written code to support that. But we should 😀

@DmitryKuzmenko
Contributor

@brejoc if you append the minion's external IP address to the ipv4 grain, alived should work. Could you run salt minion2 grains.append ipv4 1.2.3.4, where 1.2.3.4 is the external address of your minion, and then re-run the manage.alived runner?
I'm not saying it's designed to be used this way, but from reading the code I bet it will work.
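
For reference, a sketch of the same workaround driven from the master via Salt's Python API (1.2.3.4 again stands in for the minion's real external address):

```python
# Sketch of the suggested workaround via Salt's Python API, run on the master.
# "1.2.3.4" stands in for minion2's real external address.
import salt.client
import salt.config
import salt.runner

# Append the external address to minion2's ipv4 grain so that it matches the
# source address the master sees for the connection.
local = salt.client.LocalClient()
local.cmd("minion2", "grains.append", ["ipv4", "1.2.3.4"])

# Re-run the presence check.
opts = salt.config.master_config("/etc/salt/master")
runner = salt.runner.RunnerClient(opts)
print(runner.cmd("manage.alived", []))
```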

@brejoc
Contributor Author

brejoc commented Apr 18, 2019

@DmitryKuzmenko This workaround should indeed work, but might also cause additional problems when relying on that grain for other purposes.

@brejoc
Contributor Author

brejoc commented Jan 8, 2020

I just did a test run with two minions behind the same NAT. Both got the IPv4 address of the NAT as the ipv4 grain. As expected, both of them are in the list when salt-run manage.alived is executed, even if one of them is turned off. So this is not a viable workaround.

@stale

stale bot commented Feb 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Feb 7, 2020
@sagetherage
Contributor

@waynew or @DmitryKuzmenko can one of you please do a bit of follow-up here?

@stale

stale bot commented Feb 10, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale

stale bot commented Mar 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Mar 11, 2020
@brejoc
Contributor Author

brejoc commented Mar 12, 2020

Bad bot!

@stale

stale bot commented Mar 12, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Mar 12, 2020
@DmitryKuzmenko DmitryKuzmenko added the Bug, Confirmed, severity-medium, P2, P4, and Core labels and removed the P2 label Apr 6, 2020
@DmitryKuzmenko
Contributor

DmitryKuzmenko commented Apr 6, 2020

Okay, here is how it works now. The master builds two lists:

  1. The cached minion data, from which it takes the minion IDs and the IP addresses the minions themselves know about from their grains.
  2. The list of IPs connected to the master's interface:port.

The master returns the intersection of these two sets. For obvious reasons, no minions behind NAT ever show up in it.
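
A condensed sketch of that logic (illustrative only, not the actual minions.py code):

```python
# Condensed, illustrative sketch of the current behaviour; not the actual
# minions.py code.
def alived(cached_grain_ips, connected_ips):
    """cached_grain_ips: {minion_id: set of ipv4 grain values from the cache}
    connected_ips: set of source IPs currently connected to the master."""
    present = set()
    for minion_id, grain_ips in cached_grain_ips.items():
        # A minion only counts as present if an address it reports in its
        # grains is also the source address of an open connection.
        if grain_ips & connected_ips:
            present.add(minion_id)
    return sorted(present)

# For a NATed minion the grain holds the private address while the master
# sees the NAT gateway's address, so the intersection is empty:
print(alived({"minion1": {"198.51.100.10"}, "minion2": {"192.168.1.23"}},
             {"198.51.100.10", "203.0.113.5"}))   # -> ['minion1']
```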

I have no quick ideas for a solution. I'll think about it. Sorry for missing this issue.
Maybe someone else from @saltstack/team-core has an idea?

@sagetherage sagetherage removed the P4 label Jun 3, 2020
@brejoc
Contributor Author

brejoc commented Jul 1, 2020

I think presence detection can't be done solely by checking the connections. Could we somehow get the necessary information from the ZeroMQ connections?

@brejoc
Contributor Author

brejoc commented Jul 6, 2020

@DmitryKuzmenko Do you maybe have an idea if this could be done at the application level without putting additional pressure on the network? The TCP connections just can't provide the information we need on this.

@DmitryKuzmenko
Contributor

If I remember correctly, ZeroMQ doesn't provide any useful information about the opposite side of the connection for request-reply sockets. You can get the source IP from a TCP socket, but not from ZeroMQ.
As for the minions, it looks like we need to add some logic to support minion presence behind NAT, but I still have no good idea how to do that right now.
Keepalives from minions? That would flood the network in big installations. Other ideas?
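
For reference, the source address is visible on a plain TCP listener (a minimal sketch, unrelated to Salt's transport code):

```python
# Minimal sketch, unrelated to Salt's transport code: a plain TCP listener sees
# the peer address of each accepted connection, which is exactly the piece of
# information ZeroMQ's request-reply sockets do not expose to the application.
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("0.0.0.0", 4506))  # 4506 is Salt's default return port
srv.listen(1)

conn, (peer_ip, peer_port) = srv.accept()
# For a minion behind NAT this is the NAT gateway's address and the translated
# source port, not the address in the minion's ipv4 grain.
print("connection from", peer_ip, "port", peer_port)
conn.close()
srv.close()
```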

@afletch
Contributor

afletch commented Sep 17, 2021

Also seeing this as an issue. None of the minions behind NAT are shown as present.
Is this issue still getting attention @sagetherage?

@afletch
Contributor

afletch commented Oct 7, 2021

Having given various ideas some thought here, I think the minion data cache on the master needs to be populated with the address from which the minion is connecting and the source port. These can then be compared in the presence detection (which already uses the data cache) to determine whether a minion behind NAT is alive.
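
A rough sketch of the comparison this would enable, assuming a hypothetical connected_addr entry in the cached minion data (Salt does not currently store this):

```python
# Hypothetical sketch only: assumes the master records the observed source
# address and source port of each minion connection in the data cache under a
# made-up "connected_addr" key; Salt does not currently do this.
def alived_via_cache(cached_minion_data, open_connections):
    """cached_minion_data: {minion_id: {"connected_addr": (ip, port), ...}}
    open_connections: set of (ip, port) pairs currently seen on the master."""
    present = set()
    for minion_id, data in cached_minion_data.items():
        addr = data.get("connected_addr")
        # Compare the recorded connection source rather than the minion's
        # grains, so a NATed minion is matched by what the master actually sees.
        if addr and tuple(addr) in open_connections:
            present.add(minion_id)
    return sorted(present)
```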

@afletch
Contributor

afletch commented Dec 13, 2023

This is still an issue; minions behind NAT are not considered to be alive/joined according to Salt's presence-detection system. Can @sagetherage @waynew please offer some input on how this can be moved forward?

@florianendrich

Any news here? This would really help with setting up other services that need the IP address the minions are connecting from.
