
publish.publish flaky-ness issues #4993

Closed
auser opened this issue May 12, 2013 · 10 comments

Comments

@auser
Contributor

auser commented May 12, 2013

I'm using a custom module (below) that calls publish.publish. Watching the module's output, I see that the data it gathers differs between calls within the same highstate run.

    """
    This enables us to call the minions and search for a specific role
    """

    import logging
    import salt.utils

    log = logging.getLogger(__name__)

    def get_roles(role, *args, **kwargs):
        """
        Return the IDs of all minions whose 'roles' grain contains `role`
        """
        ret = []
        # Ask every minion (via the peer system) for its 'roles' grain
        nodes = __salt__['publish.publish']('*', 'grains.item', 'roles')
        print("-------------------------------> NODES {0}".format(nodes))
        for name, found_roles in nodes.items():
            # Minions without a 'roles' grain would otherwise raise TypeError
            if role in found_roles.get('roles', []):
                ret.append(name)

        return ret

    def all_by_roles(*args, **kwargs):
        """
        Get all the hosts by their roles
        """
        ret = {}
        nodes = __salt__['publish.publish']('*', 'grains.item', 'roles')
        print("-------------------------------> NODES {0}".format(nodes))
        for name, found_role in nodes.items():
            roles = found_role.get('roles', [name])
            log.info("found {0} {1}".format(name, roles))
            # Run the requested function (*args/**kwargs) on this minion only
            node = __salt__['publish.publish'](name, *args, **kwargs)
            if node:
                for _, value in node.items():
                    ip = value.pop()
                    if 'saltmaster' in roles:
                        log.info("Informer found: [saltmaster] = {0}".format(ip))
                        ret['saltmaster'] = ip
                    else:
                        for role in roles:
                            log.info("Informer found: [{0}] = {1}".format(role, ip))
                            ret[name] = ip

        return ret
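The filtering step in get_roles can be pulled into a pure function to show why its result varies between calls. In this sketch the name `filter_by_role` and the sample data are hypothetical (the minion names are borrowed from later in this thread): a minion whose return does not come back before publish.publish stops waiting is simply absent from `nodes`, so it silently drops out of the result.

```python
def filter_by_role(nodes, role):
    """Return the sorted minion IDs whose 'roles' grain contains `role`.

    `nodes` is assumed to have the shape returned by
    publish.publish('*', 'grains.item', 'roles'): {minion_id: {'roles': [...]}}.
    Minions that did not answer in time are simply missing from the dict.
    """
    return sorted(
        name for name, grains in nodes.items()
        # Treat a None/empty return or a missing grain as "no roles"
        if role in (grains or {}).get('roles', [])
    )


# Two hypothetical publish returns; in the second, hadoop2 answered too late.
full = {
    'hadoop1': {'roles': ['hadoop_master']},
    'hadoop2': {'roles': ['hadoop_slave']},
    'hadoop3': {'roles': ['hadoop_slave']},
}
partial = {k: v for k, v in full.items() if k != 'hadoop2'}

print(filter_by_role(full, 'hadoop_slave'))     # ['hadoop2', 'hadoop3']
print(filter_by_role(partial, 'hadoop_slave'))  # ['hadoop3']
```

The function itself is deterministic; the variation comes entirely from which returns make it into `nodes`.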
@thatch45
Member

The solution here is to add a routine that gathers the data about the peer run from the job cache before returning, to ensure that any missed returns are caught.
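A minimal sketch of that idea, under stated assumptions: `initial` stands for whatever the peer publish collected before its timeout, and `job_cache` is a plain dict standing in for the master's job cache, keyed by job ID (jid) and then by minion ID. The function name and this data layout are hypothetical; the real master code and job cache interface differ.

```python
def collect_with_cache(initial, job_cache, jid):
    """Merge returns that reached the job cache after the publish timed out.

    initial   -- {minion_id: return_data} gathered before the timeout
    job_cache -- hypothetical stand-in for the master job cache:
                 {jid: {minion_id: return_data}}
    jid       -- the job ID of the peer publish
    """
    # Start from everything the cache holds for this job...
    merged = dict(job_cache.get(jid, {}))
    # ...and let returns already in hand win over cached copies.
    merged.update(initial)
    return merged


initial = {'hadoop1': {'roles': ['hadoop_master']}}
job_cache = {
    '20130512120000123456': {
        'hadoop1': {'roles': ['hadoop_master']},
        'hadoop2': {'roles': ['hadoop_slave']},  # arrived after the timeout
    }
}
print(sorted(collect_with_cache(initial, job_cache, '20130512120000123456')))
# ['hadoop1', 'hadoop2']
```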

@auser
Contributor Author

auser commented May 12, 2013

Ooooo tell me more (docs page?)

@thatch45
Copy link
Member

heh, more like, take a look at the minion_publish method in master.py :)

@auser
Contributor Author

auser commented May 13, 2013

@thatch45 are you saying to set the data on the minion itself? Perhaps I'm not understanding...

@thatch45
Member

No, the data needs to be checked against the job cache on the master before the minion_publish method returns. So we need to add a method to the local client that does this and then call it from minion_publish. This is entirely master-side.
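The master-side flow described here can be sketched as a polling loop: after sending the job, keep re-reading the cache until every targeted minion has a return or a deadline passes. Everything below is hypothetical (`CacheBackedClient`, `get_cache_returns`, and this `minion_publish` signature are illustrative names, not Salt's LocalClient API), and the cache is again a plain dict of `{jid: {minion_id: return_data}}`.

```python
import threading
import time


class CacheBackedClient:
    """Hypothetical stand-in for the master's local client.

    `cache` mimics the job cache: {jid: {minion_id: return_data}}.
    Minion returns are written into it as they arrive.
    """

    def __init__(self, cache):
        self.cache = cache

    def get_cache_returns(self, jid):
        """Return a snapshot of everything the cache holds for `jid`."""
        return dict(self.cache.get(jid, {}))


def minion_publish(client, jid, expected, timeout=5.0, interval=0.1):
    """Wait until every expected minion has a return in the cache, or give up.

    Sending the job and working out `expected` (the minions the target
    matched) are assumed to have happened already. On timeout, whatever
    is in the cache at that point is returned.
    """
    deadline = time.monotonic() + timeout
    returns = client.get_cache_returns(jid)
    while not set(expected) <= set(returns) and time.monotonic() < deadline:
        time.sleep(interval)
        returns = client.get_cache_returns(jid)
    return returns


# Usage: hadoop2's return lands in the cache 0.2 s after we start waiting.
cache = {'jid1': {'hadoop1': {'roles': ['hadoop_master']}}}
client = CacheBackedClient(cache)
threading.Timer(0.2, cache['jid1'].update,
                args=({'hadoop2': {'roles': ['hadoop_slave']}},)).start()
print(sorted(minion_publish(client, 'jid1', ['hadoop1', 'hadoop2'])))
# ['hadoop1', 'hadoop2']
```

Polling against a known target list is what distinguishes "this minion has not answered yet" from "this minion does not exist", which a single timed read cannot do.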

@auser
Contributor Author

auser commented May 21, 2013

@thatch45 any progress on this front?

@auser
Contributor Author

auser commented May 21, 2013

I'm still seeing flakiness in publish.publish. I've added a local cache using the data module, but publish.publish only sometimes returns the complete data. For example:

    salt \* informer.all_by_roles # => {'saltmaster': {'roles': ['master']}, 'hadoop3': {'roles': ['hadoop_slave']}, 'hadoop1': {'roles': ['hadoop_master']}}
    salt \* informer.all_by_roles # => {'saltmaster': {'roles': ['master']}, 'hadoop3': {'roles': ['hadoop_slave']}, 'hadoop2': {'roles': ['hadoop_slave']}, 'hadoop1': {'roles': ['hadoop_master']}}

Those commands were run one right after another, yet hadoop2 is missing from the first result. I can't build infrastructure on top of nondeterministic results... help?
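Until a master-side fix is available, one possible caller-side workaround is to repeat the publish and merge the results until every minion you expect has answered. This sketch rests on an explicit assumption: the caller already knows which minion IDs should respond. `publish_until_complete` and its parameters are hypothetical names, not part of Salt.

```python
import time


def publish_until_complete(publish, expected, attempts=3, delay=1.0):
    """Call `publish` until every minion in `expected` has answered.

    publish  -- zero-argument callable returning {minion_id: data}, e.g.
                lambda: __salt__['publish.publish']('*', 'grains.item', 'roles')
    expected -- minion IDs that should answer (assumed known to the caller)
    Results from all attempts are merged, so a minion only has to answer
    once. Returns (merged_results, sorted_missing_minion_ids).
    """
    merged = {}
    missing = set(expected)
    for attempt in range(attempts):
        merged.update(publish())
        missing = set(expected) - set(merged)
        if not missing:
            break
        if attempt < attempts - 1:
            time.sleep(delay)  # give slow minions another chance
    return merged, sorted(missing)


# Usage with a fake publish that misses hadoop2 on the first call.
responses = iter([
    {'hadoop1': {'roles': ['hadoop_master']}},
    {'hadoop1': {'roles': ['hadoop_master']},
     'hadoop2': {'roles': ['hadoop_slave']}},
])
result, missing = publish_until_complete(
    lambda: next(responses), ['hadoop1', 'hadoop2'], delay=0)
print(sorted(result), missing)  # ['hadoop1', 'hadoop2'] []
```

This only narrows the window: a minion missing from `expected` can never be detected as absent, and the returned `missing` list should be checked rather than ignored.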

Updated informer.py:

    """
    This enables us to call the minions and search for a specific role
    """

    import logging

    # Import salt libs
    import salt.utils
    import salt.payload

    log = logging.getLogger(__name__)

    def load_cache():
        """Load the cache"""
        return __salt__['data.load']()

    def save_cache(data):
        """Save the data"""
        return __salt__['data.dump'](data)

    def clear_cache():
        """Clear the cache"""
        __salt__['data.clear']()

    def get_roles(role, *args, **kwargs):
        """
        Return the IDs of all minions whose 'roles' grain contains `role`
        """
        ret = []
        cache = load_cache()
        key = "roles_{0}".format(role)
        # Serve the cached answer if this role was looked up before
        if key in cache:
            print("--------------------------> CACHE: {0}".format(cache[key]))
            return cache[key]
        nodes = __salt__['publish.publish']('*', 'grains.item', 'roles')
        print("-------------------------------> NODES {0}".format(nodes))
        for name, found_roles in nodes.items():
            # Minions without a 'roles' grain would otherwise raise TypeError
            if role in found_roles.get('roles', []):
                ret.append(name)

        # Persist the answer under the same key checked above
        __salt__['data.update'](key, ret)
        return ret

    def all_by_roles(*args, **kwargs):
        """
        Get all the hosts by their roles
        """
        ret = {}
        cache = load_cache()
        key = "all_by_roles"
        if key in cache:
            print("-------------------> CACHE {0}".format(cache[key]))
            return cache[key]
        nodes = __salt__['publish.publish']('*', 'grains.item', 'roles')
        print("-------------------------------> NODES {0}".format(nodes))
        for name, found_role in nodes.items():
            roles = found_role.get('roles', [name])
            log.info("found {0} {1}".format(name, roles))
            # Run the requested function (*args/**kwargs) on this minion only
            node = __salt__['publish.publish'](name, *args, **kwargs)
            if node:
                for _, value in node.items():
                    print("------> {0}".format(value))
                    ip = value.pop()
                    if 'saltmaster' in roles:
                        log.info("Informer found: [saltmaster] = {0}".format(ip))
                        ret['saltmaster'] = ip
                    else:
                        for role in roles:
                            log.info("Informer found: [{0}] = {1}".format(role, ip))
                            ret[name] = ip
        cache[key] = ret
        save_cache(cache)
        return ret

@QuinnyPig
Contributor

+1, this one caught me as well; it made publish.publish unsuitable for one of my projects.

@UtahDave
Copy link
Contributor

@auser and @KB1JWQ, Tom pushed a fix for this to the develop branch last night. Can you guys test against develop?

@thatch45
Copy link
Member

duplicate of #4402

4 participants