Description of Issue/Question
The connection between the salt-masters, which run behind an AWS ELB, and the salt-minions is flaky: sometimes commands go through, but most of the time they don't.
I would like to know whether there is some flaw in my setup that I am not seeing, or whether Salt only works with a load balancer such as HAProxy instead of an ELB.
Or maybe Salt doesn't work behind an ELB at all?
Setup
I am running the following setup at AWS:
- An Elastic Load Balancer in front of two EC2 instances (Amazon Linux), each running the salt-master inside a Docker container
- Two EC2 instances with salt-minions installed
- The 'master' value in the minion config is set to the DNS name of the load balancer (SaltMaster-env-vpc-test.szfegmankg.us-east-1.elasticbeanstalk.com)
- The ELB accepts all traffic from the minions (a sketch of the intended listener setup follows the minion config below)
- The Salt-masters accept all traffic from the ELB as well as from the minions
- The Salt-masters' PKI folder is shared between the two masters
- The Salt-masters have the same private+public keys
- The Salt-masters run on 2017.7.1
- The Salt-minions run on 2016.11.5 (I tried it with 2017.7.1, but got the same results)
- The Salt-minions accept all traffic from the ELB as well as from the masters
- The master config looks as follows:
worker_threads: 20
auto_accept: True
log_level: error
log_level_logfile: debug
extension_modules: srv/salt/ext

rest_cherrypy:
  port: 8000
  disable_ssl: True
  debug: True

external_auth:
  pam:
    saltdev:
      - .*
      - '@runner'

# Setting the job_cache to redis.
# The redis config settings are generated at the start of the docker container and
# will be written into /etc/salt/master.d/redis.conf
master_job_cache: redis
cache: redis
pki_dir: /etc/salt/pki/master/efs
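The generated /etc/salt/master.d/redis.conf is not shown above. Roughly, the container entrypoint writes something like the following sketch, based on the documented redis returner and redis_cache options (the hostname, port and db values here are placeholders, not my actual settings):
# Sketch only: written by the container entrypoint at startup.
# redis.example.internal and the db number are placeholders.
cat > /etc/salt/master.d/redis.conf <<'EOF'
# Options used by master_job_cache: redis (redis returner)
redis.host: redis.example.internal
redis.port: 6379
redis.db: '0'
# Options used by cache: redis (redis_cache minion data cache)
cache.redis.host: redis.example.internal
cache.redis.port: 6379
cache.redis.db: '0'
EOF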
The minion config looks as follows:
id: WIN-AB3GO7BJ72I
log_file: C:\salt.log
multiprocessing: False
log_level_logfile: debug
pki_dir: /conf/pki/minion
master: SaltMaster-env-vpc-test.szfegmankg.us-east-1.elasticbeanstalk.com
master_type: str
master_alive_interval: 30
open_mode: True
root_dir: c:\salt
ipc_mode: tcp
recon_default: 1000
recon_max: 199000
recon_randomize: True
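For clarity on the load balancer: the ELB is a classic, internal load balancer that forwards the two Salt ZeroMQ ports (4505 for the publisher, 4506 for the request server) as plain TCP to both masters. A rough AWS CLI sketch of that listener setup (the load balancer name, subnet, security group and instance IDs are placeholders):
# Sketch only: classic internal ELB with TCP passthrough for the Salt ports.
aws elb create-load-balancer \
  --load-balancer-name salt-master-elb \
  --scheme internal \
  --subnets subnet-xxxxxxxx \
  --security-groups sg-xxxxxxxx \
  --listeners "Protocol=TCP,LoadBalancerPort=4505,InstanceProtocol=TCP,InstancePort=4505" \
              "Protocol=TCP,LoadBalancerPort=4506,InstanceProtocol=TCP,InstancePort=4506"

# Attach both salt-master instances (instance IDs are placeholders).
aws elb register-instances-with-load-balancer \
  --load-balancer-name salt-master-elb \
  --instances i-xxxxxxxxxxxxxxxxx i-yyyyyyyyyyyyyyyyy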
In the master log files, I can see on both masters:
2017-09-05 10:06:18,118 [salt.utils.verify][DEBUG ][35] This salt-master instance has accepted 2 minion keys.
Running salt-key -L on both masters yields the same result:
Accepted Keys:
WIN-AB3GO7BJ72I
WIN-EDMP9VB716B
Denied Keys:
Unaccepted Keys:
Rejected Keys:
So it looks like everything is fine and should work. However, a test.ping is extremely flaky: sometimes it works, but most of the time it doesn't. In the failing cases, neither master gets any return from the minion, and the minion log shows that the minion never receives the command to execute 'test.ping' from the master.
Example 1:
test.ping from Master1:
root@d7383ff8f8bf:/# salt 'WIN-EDMP9VB716B' test.ping
[ERROR ] Exception raised when processing __virtual__ function for salt.loaded.int.cache.consul. Module will not be loaded: 'module' object has no attribute 'Consul'
[ERROR ] An un-handled exception was caught by salt's global exception handler:
KeyError: 'redis.ls'
Traceback (most recent call last):
File "/usr/bin/salt", line 10, in <module>
salt_main()
File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 476, in salt_main
client.run()
File "/usr/lib/python2.7/dist-packages/salt/cli/salt.py", line 173, in run
for full_ret in cmd_func(**kwargs):
File "/usr/lib/python2.7/dist-packages/salt/client/__init__.py", line 805, in cmd_cli
**kwargs):
File "/usr/lib/python2.7/dist-packages/salt/client/__init__.py", line 1597, in get_cli_event_returns
connected_minions = salt.utils.minions.CkMinions(self.opts).connected_ids()
File "/usr/lib/python2.7/dist-packages/salt/utils/minions.py", line 577, in connected_ids
search = self.cache.ls('minions')
File "/usr/lib/python2.7/dist-packages/salt/cache/__init__.py", line 244, in ls
return self.modules[fun](bank, **self._kwargs)
File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1113, in __getitem__
func = super(LazyLoader, self).__getitem__(item)
File "/usr/lib/python2.7/dist-packages/salt/utils/lazy.py", line 101, in __getitem__
raise KeyError(key)
KeyError: 'redis.ls'
Traceback (most recent call last):
File "/usr/bin/salt", line 10, in <module>
salt_main()
File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 476, in salt_main
client.run()
File "/usr/lib/python2.7/dist-packages/salt/cli/salt.py", line 173, in run
for full_ret in cmd_func(**kwargs):
File "/usr/lib/python2.7/dist-packages/salt/client/__init__.py", line 805, in cmd_cli
**kwargs):
File "/usr/lib/python2.7/dist-packages/salt/client/__init__.py", line 1597, in get_cli_event_returns
connected_minions = salt.utils.minions.CkMinions(self.opts).connected_ids()
File "/usr/lib/python2.7/dist-packages/salt/utils/minions.py", line 577, in connected_ids
search = self.cache.ls('minions')
File "/usr/lib/python2.7/dist-packages/salt/cache/__init__.py", line 244, in ls
return self.modules[fun](bank, **self._kwargs)
File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1113, in __getitem__
func = super(LazyLoader, self).__getitem__(item)
File "/usr/lib/python2.7/dist-packages/salt/utils/lazy.py", line 101, in __getitem__
raise KeyError(key)
KeyError: 'redis.ls'
I am aware that the redis error will be fixed soon (#43295).
Example 2:
test.ping from Master1, ~1 minute after Example 1:
root@d7383ff8f8bf:/# salt 'WIN-EDMP9VB716B' test.ping
WIN-EDMP9VB716B:
True
Also during my tests, a test.ping from Master2 never succeeded.
Steps to Reproduce Issue
- Create an internal ELB at AWS
- Launch two salt-masters and attach them to the ELB
- Launch two salt-minion machines and configure the ELB DNS name as their master
- Create a Redis cluster and configure the salt-masters' job cache and minion data cache to use Redis
- Create an EFS volume and mount it on both salt-masters
- Configure the masters' PKI directory to live on the mounted EFS volume
- Ensure that the network communication between the minions, the ELB and the masters is OK (see the connectivity check sketch after this list)
- Try to send test.ping commands from each of the salt-masters
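For the network check, something like the following can be run from a Linux box in the minions' subnet (the minions themselves are Windows hosts, where Test-NetConnection would be the equivalent of nc):
# Sketch only: verify the TCP path through the ELB (replace the DNS name with your own).
nc -vz SaltMaster-env-vpc-test.szfegmankg.us-east-1.elasticbeanstalk.com 4505
nc -vz SaltMaster-env-vpc-test.szfegmankg.us-east-1.elasticbeanstalk.com 4506

# On each master: list which minions that master currently sees as up/down.
salt-run manage.status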
Versions Report
Salt Version:
    Salt: 2017.7.1

Dependency Versions:
    cffi: Not Installed
    cherrypy: unknown
    dateutil: 2.4.2
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    ioflo: Not Installed
    Jinja2: 2.8
    libgit2: Not Installed
    libnacl: Not Installed
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack-pure: Not Installed
    msgpack-python: 0.4.6
    mysql-python: Not Installed
    pycparser: Not Installed
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: Not Installed
    Python: 2.7.12 (default, Nov 19 2016, 06:48:10)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.2.0
    RAET: Not Installed
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: Ubuntu 16.04 xenial
    locale: ANSI_X3.4-1968
    machine: x86_64
    release: 4.9.43-17.38.amzn1.x86_64
    system: Linux
    version: Ubuntu 16.04 xenial