publish.publish flaky-ness issues #4993
The solution here is to add a routine that gathers the data about the peer run from the job cache before returning, to ensure that any missed returns are caught.

Ooooo, tell me more (docs page?)

Heh, more like: take a look at the minion_publish method in master.py :)

@thatch45 are you saying to set the data on the minion itself? Perhaps I'm not understanding...

No, the data needs to be checked against the job cache on the master before the minion_publish method returns. So we need to add a method to the local client that does this and then call it from the minion_publish method. This is entirely master-side.
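A rough sketch of that idea, for illustration only: this is not Salt's actual minion_publish or local client code. The names `publish_with_reconcile` and `JobCache`, and the plain dict standing in for the master's job cache, are hypothetical. The point it shows is the ordering: after the normal wait for returns ends, look up the same job ID in the job cache and merge in any returns that arrived late, instead of handing back a partial result.

```python
import time
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for the master's job cache: jid -> {minion_id: return}
JobCache = Dict[str, Dict[str, object]]


def publish_with_reconcile(
    jid: str,
    expected: Iterable[str],
    live_returns: Dict[str, object],
    job_cache: JobCache,
    timeout: float = 5.0,
    poll: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, object]:
    """Merge live returns with late ones found in the job cache.

    live_returns is whatever arrived during the normal publish wait.
    Missing minions are looked up in job_cache[jid], polling every
    `poll` seconds until all expected minions are present or `timeout`
    seconds have passed. Minions that never return are simply absent.
    """
    ret = dict(live_returns)
    missing = set(expected) - set(ret)
    deadline = time.monotonic() + timeout
    while missing:
        cached = job_cache.get(jid, {})
        for minion in list(missing):
            if minion in cached:
                ret[minion] = cached[minion]
                missing.discard(minion)
        if not missing or time.monotonic() >= deadline:
            break
        sleep(poll)  # give slow minions time to land in the cache
    return ret


# Usage: db1's return missed the live wait but is already in the job cache.
job_cache = {"jid-1": {"web1": {"roles": ["web"]}, "db1": {"roles": ["db"]}}}
live = {"web1": {"roles": ["web"]}}
result = publish_with_reconcile("jid-1", ["web1", "db1"], live, job_cache, timeout=0)
print(sorted(result))  # ['db1', 'web1']
```

The `sleep` parameter is injectable only so the polling can be exercised without real waits; a master-side version would read from whatever job cache the master is actually configured with.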
@thatch45 any progress on this front?
I'm still seeing flakiness in publish.publish. I've added a check against a local cache using the data module, but publish.publish only sometimes returns the proper data. What I mean is:
Those commands were run one right after another. I can't build infrastructure on results this nondeterministic... help?

Updated informer.py:

```python
"""
This enables us to call the minions and search for a specific role
"""
import logging

# Import salt libs
import salt.utils
import salt.payload

log = logging.getLogger(__name__)

# __salt__ is injected by the Salt loader when this runs as an execution module.


def load_cache():
    """Load the cache"""
    return __salt__['data.load']()


def save_cache(data):
    """Save the data"""
    return __salt__['data.dump'](data)


def clear_cache():
    """Clear the cache"""
    __salt__['data.clear']()


def get_roles(role, *args, **kwargs):
    """
    Return the minions whose 'roles' grain contains the given role
    """
    ret = []
    cache = load_cache()
    key = "roles_{0}".format(role)
    if key in cache:
        print("--------------------------> CACHE: {0}".format(cache[key]))
        return cache[key]
    nodes = __salt__['publish.publish']('*', 'grains.item', 'roles')
    print("-------------------------------> NODES {0}".format(nodes))
    for name, found_roles in nodes.items():
        # Default to [] so a minion without a roles grain doesn't raise TypeError
        if role in found_roles.get('roles', []):
            ret.append(name)
    __salt__['data.update'](key, ret)
    return ret


def all_by_roles(*args, **kwargs):
    """
    Get all the hosts by their roles
    """
    ret = {}
    cache = load_cache()
    key = "all_by_roles"
    if key in cache:
        print("-------------------> CACHE {0}".format(cache[key]))
        return cache[key]
    nodes = __salt__['publish.publish']('*', 'grains.item', 'roles')
    print("-------------------------------> NODES {0}".format(nodes))
    for name, found_role in nodes.items():
        roles = found_role.get('roles', [name])
        log.info("found {0} {1}".format(name, roles))
        # Publish the caller's function (args/kwargs) to this one minion
        node = __salt__['publish.publish'](name, *args, **kwargs)
        if node:
            for _, value in node.items():
                print("------> {0}".format(value))
                ip = value.pop()
                if 'saltmaster' in roles:
                    log.info("Informer found: [saltmaster] = {0}".format(ip))
                    ret['saltmaster'] = ip
                else:
                    for role in roles:
                        log.info("Informer found: [{0}] = {1}".format(role, ip))
                        ret[name] = ip
    cache[key] = ret
    save_cache(cache)
    return ret
```
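One thing worth noting about the module above: both get_roles and all_by_roles cache whatever the first publish.publish call returned, so a run that happens to miss a minion can store that partial answer and keep serving it from the cache. A minimal guard, sketched here under the assumption that the caller has some known list of expected minion IDs (how that list is obtained is left open; `cache_if_complete` and `expected` are hypothetical and not part of the module or of Salt), is to write to the cache only when every expected minion answered:

```python
from typing import Dict, Iterable, MutableMapping


def cache_if_complete(
    cache: MutableMapping[str, object],
    key: str,
    nodes: Dict[str, object],
    expected: Iterable[str],
) -> bool:
    """Store nodes under key only if every expected minion returned.

    Returns True when the result was cached. Returns False (and leaves
    the cache untouched) when some expected minion is missing, so the
    next call queries the minions again instead of reusing a partial set.
    """
    missing = set(expected) - set(nodes)
    if missing:
        return False
    cache[key] = nodes
    return True


# Usage: db1 did not answer this time, so nothing is cached yet.
cache = {}
nodes = {"web1": {"roles": ["web"]}}
print(cache_if_complete(cache, "all_by_roles", nodes, ["web1", "db1"]))  # False
print("all_by_roles" in cache)  # False
```

Inside the module, the persisted copy would still be written with save_cache afterwards; the sketch only decides whether a result is complete enough to keep.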
+1, this one caught me as well; it made publish.publish unsuitable for a project.

@auser and @KB1JWQ, Tom pushed a fix for this last night to the develop branch. Can you guys test against develop?

Duplicate of #4402.
I'm using a custom module (posted here before) that calls publish.publish. When watching the module's output, what it gathers differs within the same highstate run.
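To make that run-to-run difference concrete when debugging, a small helper can compare two publish.publish results and report which minions showed up in only one of them, or returned different data. This is a hypothetical aid (`diff_returns` is not part of Salt); it assumes only that each result is a dict keyed by minion ID, which is how the informer module iterates over them.

```python
from typing import Dict, Set, Tuple


def diff_returns(
    first: Dict[str, object], second: Dict[str, object]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (only_in_first, only_in_second, changed) minion IDs."""
    only_first = set(first) - set(second)
    only_second = set(second) - set(first)
    # Minions present in both runs whose returned data differs
    changed = {m for m in set(first) & set(second) if first[m] != second[m]}
    return only_first, only_second, changed


# Usage: db1 answered in the first call but not in the second.
run_a = {"web1": {"roles": ["web"]}, "db1": {"roles": ["db"]}}
run_b = {"web1": {"roles": ["web"]}}
print(diff_returns(run_a, run_b))  # ({'db1'}, set(), set())
```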