
salt schedule doesn't return jobs result info to master #12653

Closed
pengyao opened this issue May 9, 2014 · 19 comments
Labels: Feature (new functionality including changes to functionality and code refactors, etc.)
Comments

@pengyao (Contributor) commented May 9, 2014

The salt scheduler doesn't return job result info to the master, and it doesn't fire an event on the event bus. This makes it inconvenient to debug scheduled functions or check their status.

@basepi (Contributor) commented May 12, 2014

That seems very strange to me. It should definitely be returning the results to the master, so that you can check the results in the job cache. What version of salt are you using?

@basepi (Contributor) commented May 12, 2014

Also, how are you verifying that the events are not firing, and have you checked the job cache for the results in question?

basepi added this to the Approved milestone May 12, 2014
@pengyao (Contributor, Author) commented May 13, 2014

salt version:

salt-minion-01.example.com:
  saltversion: 2014.1.3

Schedule configuration:

salt-minion-01.example.com:
    ----------
    __mine_interval:
        ----------
        function:
            mine.update
        jid_include:
            True
        maxrunning:
            2
        minutes:
            60
    test-ping:
        ----------
        function:
            test.ping
        jid_include:
            True
        maxrunning:
            1
        seconds:
            30

Minion log entries for the schedule (at debug level):

2014-05-13 07:39:18,710 [salt.utils.schedule                         ][DEBUG   ] schedule.handle_func: adding this job to the jobcache with data {'fun': 'test.ping', 'jid': '20140513073918695801', 'pid': 2697, 'id': 'salt-minion-01.example.com'}

Checking the job on the master:

salt-run jobs.lookup_jid 20140513073918695801

returns nothing.

@basepi (Contributor) commented May 13, 2014

I can reproduce this. After asking around, it turns out this was never designed to return to the master, so I'm actually going to mark this as a feature request. Right now it only adds the job to the /var/cache/salt/minion/proc job cache. We should add a scheduler argument for returning to the master.
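
For reference, a schedule entry using such an option might look like the sketch below. (This option did not exist in 2014.1; in later releases the docs quoted further down this thread describe a return_job parameter that defaults to True.)

schedule:
  test-ping:
    function: test.ping
    seconds: 30
    return_job: True    # send the job result back to the master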

basepi added the Feature label and removed the Bug label May 13, 2014
@stbenjam commented

Having an option for this would be really useful for the http://github.com/theforeman/foreman_salt plugin!

For now, we'll be loading state.highstate results via the job cache on the master -- which means only those launched from the master get imported. Returners might be a better option, but that would require things like client X.509 certificates on the minions to hook into Foreman's infrastructure directly.

@pruiz (Contributor) commented May 10, 2015

As an alternative workaround, I am using returners as a way to send the scheduler's results back to the master via events. Something like the following may work for you too:

'''
Return data to the master as an event.
'''

# Import python libs
import logging

# Import salt libs
import salt.minion

log = logging.getLogger(__name__)

# Cache of the master minion used by this returner
MMINION = None


def _mminion():
    '''
    Create a single MasterMinion for this module to use, instead of reloading it all the time.
    '''
    global MMINION

    if MMINION is None:
        MMINION = salt.minion.MasterMinion(__opts__)

    return MMINION


def convert(ret):
    '''
    Pass the return data through unchanged (placeholder for any custom conversion).
    '''
    return ret


def returner(ret):
    '''
    Return data to the master as an event.
    '''
    _mminion().functions['event.send']('sample-event', ret)
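
To use a returner like this from the scheduler, it has to be synced to the minion (for example from a _returners directory in file_roots, via saltutil.sync_returners or a highstate) and referenced by name in the schedule entry. A rough sketch, with an illustrative module name:

schedule:
  test-ping:
    function: test.ping
    seconds: 30
    returner: event_returner    # illustrative name: the custom returner file above, minus .py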

@arthurzenika (Contributor) commented

I'd be interested in this too. I'm looking into using the custom returner posted by @pruiz.

@enblde commented Jan 18, 2016

+1 My use case: schedule state.highstate test=True on all minions once every hour, and set up a reactor listening for job-finish events that passes the event data to a custom email runner, which notifies me about all pending highstate changes. Without the scheduler job events this pipeline doesn't work.

Edit: I'm willing to contribute, but I would need some beginner mentoring from a more experienced person.
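
A rough sketch of that pipeline, assuming a custom returner like the one above fires an event to the master (the returner name, event tag, reactor SLS path, and email runner are all placeholders):

# Minion schedule (e.g. via pillar)
schedule:
  hourly-highstate-check:
    function: state.highstate
    kwargs:
      test: True
    hours: 1
    returner: event_returner              # placeholder: custom returner that fires an event

# Master config
reactor:
  - 'sample-event':                       # tag fired by the returner
    - /srv/reactor/notify-pending-changes.sls   # placeholder SLS that calls the email runner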

@arthurzenika (Contributor) commented

@enblde that's a great scenario, and I think a lot of people could benefit from such a workflow. We're using the master returner above, which seems to work quite well on 2015.8 but not so well on 2015.5 (so the Wheezy machines are still there...). I haven't found the time to debug it yet.

@scubahub commented Sep 9, 2016

+1 I am also using a custom returner, custom event, reactor, and runner so I can alert when scheduled jobs fail. Just FYI for others doing this, there are several related issues that you have to deal with (at least on 2015.8):

#35812
#36114
#36187

@scubahub commented Sep 9, 2016

For everyone else wanting this, I thought I should post my full solution for sending alerts via Slack when a scheduled job fails.

In the schedule pillar I use a custom returner:

  run-ops-state-every-quarter-hour-before-midnight-pst:
    function: state.sls
    run_on_start: False
    args:
      - common.ops
    kwargs:
      saltenv: base
      test: False
    splay: 600
    seconds: 900 {#- 15 minutes #}
    range:
      start: 9:00pm
      end: 11:59pm
    returner: schedule-returner
    maxrunning: 3

In the state directory (listed in file roots) I have a "_returners" directory with schedule-returner.py (thanks @pruiz):

'''
Jobs run by the salt scheduler do not return events to the master (https://github.com/saltstack/salt/issues/12653).  This custom
returner creates a custom event "scheduled-job-return" with key data about the job and sends it to the master so it can be
handled by a reactor.

A reactor listening to the event will receive the following in the data object:

  data:
    stamp: time stamp (when returner was invoked)
    jid: the job id for the schedule job
    job_name: the name of the schedule job
    id: the minion id
    success: True if job executed successfully
    grains: Note: grains items only appear if the grain exists on the minion
      teams: list of teams
      servicerole: service role
      flavor: flavor
      cluster: cluster name
      environment: salt environment
      owner: minion owner
      pod: minion pod
      alert_channel: name (or list of names) of slack channel to notify on scheduled job failures
'''

# Import python libs
import logging
from datetime import datetime

# Import salt libs
import salt.minion

log = logging.getLogger(__name__)

# Cache of the master minion used by this returner
MMINION = None

def _mminion():
  '''
  Create a single master minion for this module to use, instead of reloading all the time.
  '''
  global MMINION

  if MMINION is None:
    MMINION = salt.minion.MasterMinion(__opts__)

  return MMINION

def returner(ret):
  '''
  Return data to the master as an event.

  :param dict ret: Data provided to the returner.  Use debug-returner to determine what keys/values are present.
  '''
  event_data = {}
  event_data["stamp"] = str(datetime.utcnow())
  event_data["job_name"] = ret["schedule"]
  event_data["id"] = ret["id"]
  event_data["success"] = ret["success"]
  event_data["return"] = ret["return"]
  event_data["jid"]  = ret["jid"]

  _mminion().functions['event.send']('scheduled-job-return', data=event_data,
                                     with_grains=[
                                       "teams",
                                       "servicerole",
                                       "flavor",
                                       "cluster",
                                       "environment",
                                       "owner",
                                       "pod",
                                       "alert_channel",
                                     ])

It is important to note that if you have multiple salt environments, the _returners directory must exist in the file roots for every salt environment or it will not be distributed to all minions.
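
For example, with two environments the master's file_roots might look like this (paths are illustrative), with _returners present under each root:

file_roots:
  base:
    - /srv/salt/base/state    # contains _returners/schedule-returner.py
  prod:
    - /srv/salt/prod/state    # must also contain _returners/schedule-returner.py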

On the master in /etc/salt/master.d/reactors I have:

reactor:
  - 'scheduled-job-return':
    - /srv/salt/base/state/reactors/scheduled-job-return-reactor.sls

The scheduled-job-return-reactor.sls file contains:

{#-
Don't do anything complex or time-consuming in the reactor itself; reactor rendering is single threaded, so only one reaction
can be processed at a time.  Instead, use the reactor to execute a runner, which will be launched on its own thread.

Jobs run by the salt scheduler do not return events to the master (https://github.com/saltstack/salt/issues/12653).  So we use
our custom schedule-returner to generate our custom scheduled-job-return event.  This reactor passes the event data on to our
scheduled-job-return-runner runner for processing.
#}

handle-scheduled-job-return-event:
  runner.scheduled-job-return-runner.handle:
    - {{ data }}

Finally the scheduled-job-return-runner.py file (which lives in the $EXTENSION_MODULES/runners directory, in my case /srv/salt/base/extension_modules/runners/):

'''
Runner for handling our custom scheduled-job-return event.  This runner sends alerts to the appropriate Slack channels when a job
fails.  Since Slack is not secure, full details of failures are written to a log file rather than being included in the alert.
In case of any failure in alerting, the runner will fall back to the "#REDACTED" channel, then to writing to the runner
log, and finally to writing to the master log.
'''
# Import python libs
import collections
import json
import logging
import traceback

# Import salt libs
import salt.config
import salt.loader

__LOG = logging.getLogger(__name__)

FALLBACK_ALERT_CHANNEL = "#REDACTED"

MODULES = None

def _modules():
  '''
  Create a single list of modules instead of loading them every time.
  '''
  global MODULES

  if MODULES is None:
    MODULES = salt.loader.minion_mods(__opts__)

  return MODULES


def _send_alert(channels, message):
  '''
  Send an alert to a list of slack channels.

  :param list of str channels: List of slack channels to send the alert to.
  :param message: The alert message.
  :return: None
  '''
  for c in channels:
    _modules()["slack.post_message"](c, message, "REDACTED:add slack bot user name", "REDACTED:slack api key for bot")


def _normalize_alert_channel(alert_channel, target_minion_id):
  '''
  The alert_channel grain must exist and can either be a string or a list of strings.  Each value must start with a "#" as
  the slack api does not appear to work with direct messages to user accounts.  If there are no valid entries then the fallback
  alert channel is used.

  :param (str or list of str) alert_channel: channel(s) to be alerted for this minion
  :param str target_minion_id: the minion
  :return: list of validated channels to be alerted for this minion
  :rtype: list of str
  '''
  nv = []
  if not alert_channel or alert_channel == "None": # Due to https://github.com/saltstack/salt/issues/36187 have to test for "None"
    alert_channel = [FALLBACK_ALERT_CHANNEL,]
    _send_alert(alert_channel, "Minion *{}* does not have the *alert_channel* grain (or it is empty).".format(target_minion_id))

  if isinstance(alert_channel, basestring):
    alert_channel = [alert_channel,]
  elif not isinstance(alert_channel, list):
    alert_channel = [FALLBACK_ALERT_CHANNEL,]
    _send_alert(alert_channel, "Minion *{}* has an invalid value type for the *alert_channel* grain (must be string or list).".format(target_minion_id))

  for c in alert_channel:
    if not c or c == "None": # Due to https://github.com/saltstack/salt/issues/36187 have to test for "None"
      _send_alert([FALLBACK_ALERT_CHANNEL,], "Minion *{}* has an invalid value in the *alert_channel* grain list ({}).".format(
          target_minion_id, c))
    elif not isinstance(c, basestring):
      _send_alert([FALLBACK_ALERT_CHANNEL,], "Minion *{}* has an invalid value type for an entry in the the *alert_channel* "
                                            "grain list (must be string or list).".format(target_minion_id))
    elif not c.startswith("#"):
      _send_alert([FALLBACK_ALERT_CHANNEL,], "Minion *{}* has an invalid value in the *alert_channel* grain list ({} does not "
                                            "start with #).".format(target_minion_id, c))
    else:
      nv.append(c)
  return nv


def handle(data, log_file = "/opt/app/var/log/salt/runner.log"):
  '''
  Generate any appropriate response to a scheduled-job-return event.

  :param dict data: Data included with the scheduled-job-return event.
  :param Optional[str] log_file: log file to use for detailed event data.  If None then nothing will be written.
  :return: True if successful.
  :rtype: bool
  '''
  minion_config = salt.config.client_config('/etc/salt/minion')
  try:
    job_success = data["success"]
    ts = data["stamp"]
    jid = data["jid"]
    job_return = data["return"]
    job_name = data["job_name"]
    target_minion_id = data["id"]
    grains = data["grains"]
    alert_channel = grains["alert_channel"] if "alert_channel" in grains else None

    alert_channels = _normalize_alert_channel(alert_channel, target_minion_id)

    failure_messages = []
    if job_success:
      # is it really?
      # Due to https://github.com/saltstack/salt/issues/35812 need to check the value of return
      # Due to https://github.com/saltstack/salt/issues/36114 need to check individual state returns when a scheduled job
      # executes multiple states (which is indicated by the return data being a dict).

      # Try to populate failure_messages with relevant details re what failed (but be mindful of sensitive info).

      if job_return and isinstance(job_return, collections.Sequence) and not isinstance(job_return, basestring):
        entry = job_return[0]
        if entry and isinstance(entry, basestring):
          if entry.startswith("No matching sls found for") \
              or entry.startswith("The function 'state.apply' is running as PID")\
              or (entry.startswith("Specified SLS") and "is not available on the salt master" not in entry):
            job_success = False
            failure_messages.append(entry)

      if isinstance(job_return, collections.Mapping):
        for state, result in job_return.iteritems():
          if not result["result"]:
            job_success = False
            failure_messages.append(result["comment"])
            break

    if not job_success:
      message = "--------------------\n*{}*\n".format(ts)
      message += "\tScheduled job *{}* *FAILED*\n".format(job_name)
      message += "\tid: {}\n".format(target_minion_id)
      for g in grains.keys():
        message += "\t{}: {}\n".format(g, grains[g])
      if failure_messages:
        message += "\tClues:\n"
        for m in failure_messages:
          message += "\t\t{}\n".format(m)
      try:
        if log_file:
          pre = "{} [{}][{}] ".format(ts, jid, target_minion_id)
          message += "For detailed failure info check the log {} on {} for jid {}\n".format(log_file, minion_config["master"], jid)
          failure_message = ""
          try:
            if isinstance(job_return, basestring):
              failure_message = job_return
            else:
              failure_message = json.dumps(job_return, sort_keys=True, indent=4, separators=(',', ': '))
            with open(log_file, "a") as f:
              f.write(pre + message + failure_message + "\n")
          except Exception as ex:
            message += "Failed to write details to {}, check the master log instead\n".format(log_file)
            __LOG.error(pre + message + failure_message + "Failed to write to log: {}".format(ex))
            __LOG.error(traceback.format_exc())
      finally:
        _send_alert(alert_channels, message)

  except Exception as ex:
    message = "Runner scheduled-job-return-runner.py encountered an exception processing scheduled job.\n"
    message += "\tjob: {}\n".format(job_name)
    message += "\tminion: {}\n".format(target_minion_id)
    message += "\texception: {}".format(ex)
    message += "{}\n".format(traceback.format_exc())
    message += "For detailed failure info check the master log on {}\n".format(minion_config["master"])
    __LOG.error(message)
    if log_file:
      with open(log_file, "a") as f:
        f.write(message + "\n")
        f.write(traceback.format_exc() + "\n")
      message += "Detailed failure info also available in the log {} on {} for jid {}\n".format(log_file, minion_config["master"], jid)
    _send_alert([FALLBACK_ALERT_CHANNEL], message)
    return False

  return True

@arthurzenika (Contributor) commented

@scubahub thanks for sharing your code. Maybe we could pool our needs together to contribute a master returner.

The first thing I see would be to make the tag configurable, and maybe to determine what we want to return (or again, make that configurable).

I use the function name in the tag by doing the following:

def returner(ret):
    '''
    Return data to the master as an event
    '''
    fun = ret.get('fun','').replace('.','/')
    _mminion().functions['event.send']('master/event_returner/{0}'.format(fun), ret)
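
A reactor on the master can then match those dynamic tags with a glob (the SLS path here is illustrative):

reactor:
  - 'master/event_returner/*':
    - /srv/reactor/scheduled-job.sls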

@scubahub commented

@arthurlogilab Sounds like a great idea. I would want the event to be more configurable; in my use case 'fun' is not sufficient, but I could make it work if #36278 were resolved.

Also, a note on the code: I am having some issues with the reactor blowing up due to YAML issues with the ret data in some cases. I just realized I can write the reactor in Python rather than YAML (SLS), which avoids the json->yaml->json issues. I'll post the alternate reactor when I get it done and tested.

@scubahub commented Sep 28, 2016

Here is the promised update. To get around the json->yaml->json round trip (from returner to reactor to runner), which sometimes results in a YAML render failure in the reactor, I have reworked the bits and rewritten the reactor in pure Python.

Here is the updated schedule-returner.py file (with code to handle #35784):

'''
Jobs run by the salt scheduler do not return events to the master (https://github.com/saltstack/salt/issues/12653).  This custom
returner creates a custom event "scheduled-job-return" with key data about the job and sends it to the master so it can be
handled by a reactor.

'''

# Import python libs
import logging
from datetime import datetime

# Import salt libs
import salt.minion

log = logging.getLogger(__name__)

# Cache of the master minion used by this returner
MMINION = None

def _mminion():
  '''
  Create a single master minion for this module to use, instead of reloading all the time.
  '''
  global MMINION

  if MMINION is None:
    MMINION = salt.minion.MasterMinion(__opts__)

  return MMINION

def returner(ret):
  '''
  Return data to the master as an event.

  :param dict ret: Data provided to the returner.  Use debug-returner to determine what keys/values are present.
  '''

  # Due to https://github.com/saltstack/salt/issues/35784 we have to remove recursion from the fun_args parameter before
  # passing it to event.send, or we will get a recursion depth exception from verylong_encoder.
  data = dict(ret)
  new_fun_args = []
  if ("fun_args" in ret and ret["fun_args"]):
    for i in ret["fun_args"]:
      if not isinstance(i, dict):
        new_fun_args.append(i)
      else:
        new_dict = {}
        for k,v in i.iteritems():
          if k == "__pub_fun_args":
            new_dict[k] = "redacted due to https://github.com/saltstack/salt/issues/35784"
          else:
            new_dict[k] = v
        new_fun_args.append(new_dict)
  data["fun_args"] = new_fun_args

  _mminion().functions['event.send']('scheduled-job-return', data=data,
                                     with_grains=[ #list whatever grains you are interested in here
                                       "teams",
                                       "servicerole",
                                       "flavor",
                                       "cluster",
                                       "environment",
                                       "owner",
                                       "pod",
                                       "alert_channel",
                                     ])

Here is the updated scheduled-job-return-reactor.sls file:

#!py

def run():
  __salt__['saltutil.runner']("scheduled-job-return-runner.handle", data=data, __env__=__env__)
  return {}

And here is the updated scheduled-job-return-runner.py file (with code to handle #36187, #35812 and #36114):

'''
Runner for handling our custom scheduled-job-return event.  This runner sends alerts to the appropriate Slack channels when a job
fails.  Since Slack is not secure, full details of failures are written to a log file rather than being included in the alert.
In case of any failure in alerting, the runner will fall back to the "#REDACTED" channel, then to writing to the runner
log, and finally to writing to the master log.
'''
# Import python libs
import collections
import json
import logging
import traceback

# Import salt libs
import salt.config
import salt.loader

__LOG = logging.getLogger(__name__)

FALLBACK_ALERT_CHANNEL = "#REDACTED"

MODULES = None

def _modules():
  '''
  Create a single list of modules instead of loading them every time.
  '''
  global MODULES

  if MODULES is None:
    MODULES = salt.loader.minion_mods(__opts__)

  return MODULES


def _send_alert(channels, message):
  '''
  Send an alert to a list of slack channels.

  :param list of str channels: List of slack channels to send the alert to.
  :param message: The alert message.
  :return: None
  '''
  for c in channels:
    _modules()["slack.post_message"](c, message, "REDACTED-BOT-NAME", "REDACTED-API-KEY")


def _normalize_alert_channel(alert_channel, target_minion_id):
  '''
  The alert_channel grain must exist and can either be a string or a list of strings.  Each value must start with a "#" as
  the slack api does not appear to work with direct messages to user accounts.  If there are no valid entries then the fallback
  alert channel is used.

  :param (str or list of str) alert_channel: channel(s) to be alerted for this minion
  :param str target_minion_id: the minion
  :return: list of validated channels to be alerted for this minion
  :rtype: list of str
  '''
  nv = []
  if not alert_channel or alert_channel == "None": # Due to https://github.com/saltstack/salt/issues/36187 have to test for "None"
    alert_channel = [FALLBACK_ALERT_CHANNEL,]
    _send_alert(alert_channel, "Minion *{}* does not have the *alert_channel* grain (or it is empty).".format(target_minion_id))

  if isinstance(alert_channel, basestring):
    alert_channel = [alert_channel,]
  elif not isinstance(alert_channel, list):
    alert_channel = [FALLBACK_ALERT_CHANNEL,]
    _send_alert(alert_channel, "Minion *{}* has an invalid value type for the *alert_channel* grain (must be string or list).".format(target_minion_id))

  for c in alert_channel:
    if not c or c == "None": # Due to https://github.com/saltstack/salt/issues/36187 have to test for "None"
      _send_alert([FALLBACK_ALERT_CHANNEL,], "Minion *{}* has an invalid value in the *alert_channel* grain list ({}).".format(
        target_minion_id, c))
    elif not isinstance(c, basestring):
      _send_alert([FALLBACK_ALERT_CHANNEL,], "Minion *{}* has an invalid value type for an entry in the the *alert_channel* "
                                             "grain list (must be string or list).".format(target_minion_id))
    elif not c.startswith("#"):
      _send_alert([FALLBACK_ALERT_CHANNEL,], "Minion *{}* has an invalid value in the *alert_channel* grain list ({} does not "
                                             "start with #).".format(target_minion_id, c))
    elif not _modules()["slack.find_room"](c, api_key="REDACTED-API-KEY"):
      _send_alert([FALLBACK_ALERT_CHANNEL,], "Minion *{}* has a room name in the *alert_channel* grain list that does not exist ("
                                             "{})".format(target_minion_id, c))
    else:
      nv.append(c)
  return nv


def handle(data, log_file = "/opt/app/var/log/salt/runner2.log"):
  '''
  Generate any appropriate response to a scheduled-job-return event.

  :param dict data: Data included with the scheduled-job-return event.
  :param Optional[str] log_file: log file to use for detailed event data.  If None then nothing will be written.
  :return: True if successful.
  :rtype: bool
  '''
  minion_config = salt.config.client_config('/etc/salt/minion')
  job_name = "unknown"
  target_minion_id = "unknown"
  jid = "unknown"
  try:
    job_info = data["data"]
    job_success = job_info["success"]
    ts = data["_stamp"]
    jid = job_info["jid"]
    job_return = job_info["return"]
    job_name = job_info["schedule"]
    target_minion_id = job_info["id"]
    grains = job_info["grains"]
    alert_channel = grains["alert_channel"] if "alert_channel" in grains else None

    alert_channels = _normalize_alert_channel(alert_channel, target_minion_id)

    failure_messages = []
    if job_success:
      # is it really?
      # Due to https://github.com/saltstack/salt/issues/35812 need to check the value of return
      # Due to https://github.com/saltstack/salt/issues/36114 need to check individual state returns when a scheduled job
      # executes multiple states (which is indicated by the return data being a dict).

      # Try to populate failure_messages with relevant details re what failed (but be mindful of sensitive info).

      if job_return and isinstance(job_return, collections.Sequence) and not isinstance(job_return, basestring):
        entry = job_return[0]
        if entry and isinstance(entry, basestring):
          if entry.startswith("No matching sls found for") \
              or entry.startswith("The function 'state.apply' is running as PID") \
              or (entry.startswith("Specified SLS") and "is not available on the salt master" in entry) \
              or entry.startswith("Pillar failed to render"):
            job_success = False
            failure_messages.append(entry)
            if len(job_return) > 1 and isinstance(job_return[1], basestring):
              failure_messages.append("\n\t{}".format(job_return[1:]))

      if isinstance(job_return, collections.Mapping):
        for state, result in job_return.iteritems():
          if not result["result"]:
            job_success = False
            failure_messages.append(result["comment"])
            break

    if not job_success:
      message = "--------------------\n*{}*\n".format(ts)
      message += "\tScheduled job *{}* *FAILED*\n".format(job_name)
      message += "\tid: {}\n".format(target_minion_id)
      for g in grains.keys():
        message += "\t{}: {}\n".format(g, grains[g])
      if failure_messages:
        message += "\tClues:\n"
        for m in failure_messages:
          message += "\t\t{}\n".format(m)
      try:
        if log_file:
          pre = "{} [{}][{}] ".format(ts, jid, target_minion_id)
          message += "For detailed failure info check the log {} on {} for jid {}\n".format(log_file, minion_config["master"], jid)
          failure_message = ""
          try:
            if isinstance(job_return, basestring):
              failure_message = job_return
            else:
              failure_message = json.dumps(job_return, sort_keys=True, indent=4, separators=(',', ': '))
            with open(log_file, "a") as f:
              f.write(pre + message + failure_message + "\n")
          except Exception as ex:
            message += "Failed to write details to {}, check the master log instead\n".format(log_file)
            __LOG.error(pre + message + failure_message + "Failed to write to log: {}".format(ex))
            __LOG.error(traceback.format_exc())
      finally:
        _send_alert(alert_channels, message)

  except Exception as ex:
    message = "Runner scheduled-job-return-runner.py encountered an exception processing scheduled job.\n"
    message += "\tjob: {}\n".format(job_name)
    message += "\tminion: {}\n".format(target_minion_id)
    message += "\texception: {}".format(ex)
    message += "{}\n".format(traceback.format_exc())
    message += "For detailed failure info check the master log on {}\n".format(minion_config["master"])
    __LOG.error(message)
    if log_file:
      try: # in case json barfs for some reason
        data_msg = "data: {}\n".format(json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')))
      except Exception as ex1:
        data_msg = "data: <exception using json.dumps on the data> {}".format(ex1)
      with open(log_file, "a") as f:
        f.write(message + "\n")
        f.write(data_msg + "\n")
        f.write(traceback.format_exc() + "\n")
      message += "Detailed failure info also available in the log {} on {} for jid {}\n".format(log_file, minion_config["master"],
                                                                                                jid)
    _send_alert([FALLBACK_ALERT_CHANNEL], message)
    return False

  return True

@cro (Contributor) commented Sep 28, 2016

@scubahub In the meantime this would be a great item for the salt-contrib repo. Are you interested?

@scubahub commented

@cro Sure, haven't run into salt-contrib before.

@MorphBonehunter (Contributor) commented

I stumbled over this case and wonder if there is a documentation fault here: Job Management

By default, data about jobs runs from the Salt scheduler is returned to the master.
Setting the return_job parameter to False will prevent the data from being sent
back to the Salt master.

So I implemented a schedule which runs once per minute (verified in the minion log that it runs), but I cannot find any entries with salt-run jobs.list_jobs. Am I misunderstanding the docs?
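
For reference, based on the docs quoted above, return_job can also be set explicitly on the schedule entry (it defaults to True on versions that have the feature):

schedule:
  my-job:
    function: test.ping
    minutes: 1
    return_job: True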

@vutny (Contributor) commented Jun 21, 2017

This is fixed and could be closed. FYI @rallytime @Ch3LL

Ch3LL closed this as completed Jun 21, 2017
@TheBirdsNest commented

Hi all, I am currently running Salt 3004 and still seem to have this issue.
When running 'test.ping' at 1-minute intervals, I can see in the minion debug log that the job is executing, but nothing appears on the event bus and nothing is returned to the master.

Note this is a Proxy Minion, not a standard Minion.

Schedule Config:

/run # salt \* schedule.list
c73b4f53-2528-4ff9-af51-82837bfd18e2:
    schedule:
      enabled: true
      example:
        cron: 0 * * * *
        enabled: true
        function: test.ping
        name: example
      switchport_utilization:
        cron: 0 10,14 * * 1-5
        enabled: true
        function: interface.utilization_summary
        name: switchport_utilization
        splay: 1800

Job Cache (note nothing for test.ping over several minutes):

20221101170052670517:
    ----------
    Arguments:
        - 20221101170037613363
    Function:
        saltutil.find_job
    StartTime:
        2022, Nov 01 17:00:52.670517
    Target:
        - c73b4f53-2528-4ff9-af51-82837bfd18e2
    Target-type:
        list
    User:
        root
20221101170222905275:
    ----------
    Arguments:
    Function:
        saltutil.refresh_pillar
    StartTime:
        2022, Nov 01 17:02:22.905275
    Target:
        *
    Target-type:
        glob
    User:
        root
20221101170413944875:
    ----------
    Arguments:
    Function:
        schedule.get
    StartTime:
        2022, Nov 01 17:04:13.944875
    Target:
        *
    Target-type:
        glob
    User:
        root
20221101170450383361:
    ----------
    Arguments:
    Function:
        schedule.list
    StartTime:
        2022, Nov 01 17:04:50.383361
    Target:
        *
    Target-type:
        glob
    User:
        root

The Proxy Minion debug log shows this every 60 seconds:

[DEBUG   ] LazyLoaded interface.utilization_summary
[DEBUG   ] LazyLoaded test.ping
[DEBUG   ] LazyLoaded mine.update
[DEBUG   ] Subprocess Thread-249-Schedule-__proxy_keepalive cleaned up
