
schedule not changed on pillar update after minion restart #38523

Closed
MorphBonehunter opened this issue Jan 3, 2017 · 15 comments
Labels
Bug, Core, P2, severity-medium, stale
Comments

@MorphBonehunter
Contributor

Description of Issue/Question

If you define a schedule in pillar data, the schedule is implemented.
If you then change the schedule interval, the running schedule is updated but the
"_schedule.conf" file is not.

Setup

Defined a schedule in "pillars/pandora/init.sls":

schedule:
    underworld_pki_run_state:
        function: state.apply
        minutes: 1
        args:
            - underworld-pki

So the schedule is implemented:

# salt 'pandora*' schedule.list
pandora.underworld.lan:
    schedule:
      underworld_pki_run_state:
        args:
        - underworld-pki
        enabled: true
        function: state.apply
        minutes: 1

and is running:

# tail -f /var/log/salt/minion | grep "Running scheduled job: underworld_pki_run_state"
2017-01-03 10:03:37,611 [salt.utils.schedule][INFO    ][23384] Running scheduled job: underworld_pki_run_state
2017-01-03 10:04:37,611 [salt.utils.schedule][INFO    ][23384] Running scheduled job: underworld_pki_run_state
2017-01-03 10:05:37,611 [salt.utils.schedule][INFO    ][23384] Running scheduled job: underworld_pki_run_state

The "_schedule.conf" shows:

schedule:
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, return_job: false}
  underworld_pki_run_state:
    args: [underworld-pki]
    function: state.apply
    minutes: 1

Steps to Reproduce Issue

Now try to alter the interval in "pillars/pandora/init.sls":

schedule:
    underworld_pki_run_state:
        function: state.apply
        minutes: 2
        args:
            - underworld-pki

Update pillars:

# salt 'pandora*' saltutil.refresh_pillar && date
pandora.underworld.lan:
    True
Di 3. Jan 10:06:50 CET 2017

So "schedule.list" shows the right values:

# salt 'pandora*' schedule.list
pandora.underworld.lan:
    schedule:
      underworld_pki_run_state:
        args:
        - underworld-pki
        enabled: true
        function: state.apply
        minutes: 2

and the schedule is running as expected:

# tail -f /var/log/salt/minion | grep "Running scheduled job: underworld_pki_run_state"
2017-01-03 10:08:37,610 [salt.utils.schedule][INFO    ][23384] Running scheduled job: underworld_pki_run_state
2017-01-03 10:10:37,610 [salt.utils.schedule][INFO    ][23384] Running scheduled job: underworld_pki_run_state
2017-01-03 10:12:37,611 [salt.utils.schedule][INFO    ][23384] Running scheduled job: underworld_pki_run_state

But the "_schedule.conf" shows still:

schedule:
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, return_job: false}
  underworld_pki_run_state:
    args: [underworld-pki]
    function: state.apply
    minutes: 1

So, now restart the minion:

# systemctl restart salt-minion && date
Di 3. Jan 10:16:39 CET 2017

and look at the logs:

# tail -f /var/log/salt/minion | grep "Running scheduled job: underworld_pki_run_state"
2017-01-03 10:16:43,950 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state
2017-01-03 10:17:43,768 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state
2017-01-03 10:18:43,768 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state

The schedule falls back to the 1-minute interval, while "schedule.list" still shows the right values:

pandora.underworld.lan:
    schedule:
      underworld_pki_run_state:
        args:
        - underworld-pki
        enabled: true
        function: state.apply
        minutes: 2

A pillar refresh does not help either:

# salt 'pandora*' saltutil.refresh_pillar && date
pandora.underworld.lan:
    True
Di 3. Jan 10:20:13 CET 2017

# tail -f /var/log/salt/minion | grep "Running scheduled job: underworld_pki_run_state"
2017-01-03 10:20:43,768 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state
2017-01-03 10:21:43,768 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state
2017-01-03 10:22:43,767 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state

The bigger problem is that changing the interval in "pillars/pandora/init.sls" does not do anything anymore:

schedule:
    underworld_pki_run_state:
        function: state.apply
        minutes: 5
        args:
            - underworld-pki

The pillar looks good:

# salt 'pandora*' saltutil.refresh_pillar && date
pandora.underworld.lan:
    True
Di 3. Jan 10:23:38 CET 2017

# salt 'pandora*' schedule.list
pandora.underworld.lan:
    schedule:
      underworld_pki_run_state:
        args:
        - underworld-pki
        enabled: true
        function: state.apply
        minutes: 5

But it does not take effect:

2017-01-03 10:23:43,767 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state
2017-01-03 10:24:43,767 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state
2017-01-03 10:25:43,767 [salt.utils.schedule][INFO    ][24129] Running scheduled job: underworld_pki_run_state

Content of "_schedule.conf" is still:

schedule:
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, return_job: false}
  underworld_pki_run_state:
    args: [underworld-pki]
    function: state.apply
    minutes: 1

Versions Report

Installed Salt in Arch Linux.

Salt Version:
           Salt: 2016.11.1
 
Dependency Versions:
           cffi: 1.9.1
       cherrypy: 8.5.0
       dateutil: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.24.0
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: 2.17
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.13 (default, Dec 21 2016, 07:16:46)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.4.2
            ZMQ: 4.2.0
 
System Versions:
           dist:   
        machine: x86_64
        release: 4.8.13-1-ARCH
         system: Linux
        version: Not Installed

Workaround

The only way to get this "reset":

systemctl stop salt-minion && rm /etc/salt/minion.d/_schedule.conf && systemctl start salt-minion
@gtmanfred
Contributor

Setting the schedule in the pillar does not change the /etc/salt/minion.d/_schedule.conf file, it just operates as a different place to store the schedule options.

You would need to use schedule.absent or schedule.disabled to change that.

https://docs.saltstack.com/en/latest/ref/states/all/salt.states.schedule.html#salt.states.schedule.absent

The _schedule.conf file is really only there to make sure that the mine gets updated periodically. Nothing else in salt knows about it unless you use the schedule.present/absent states.
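
For reference, a rough sketch (just the Python-client equivalent of the CLI, not something you have to do) of removing the persisted job from the master side with the schedule execution module instead of a state; salt.client.LocalClient and schedule.delete are standard Salt APIs, and the target and job name are simply the ones from this issue:

import salt.client

# equivalent to: salt 'pandora*' schedule.delete underworld_pki_run_state
local = salt.client.LocalClient()
result = local.cmd('pandora*', 'schedule.delete', ['underworld_pki_run_state'])
print(result)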

@gtmanfred added the Pending-Discussion label Jan 3, 2017
@gtmanfred added this to the Blocked milestone Jan 3, 2017
@MorphBonehunter
Contributor Author

@gtmanfred so if

The _schedule.conf file is really only there to make sure that the mine gets updated periodically. Nothing else in salt knows about it unless you use the schedule.present/absent states.

why is the "_schedule.conf" then updated with:

  underworld_pki_run_state:
    args: [underworld-pki]
    function: state.apply
    minutes: 1

Another question: since I did not need a "schedule.present" to implement the schedule (as you can see, I only changed the pillar data and ran saltutil.refresh_pillar), why must I use schedule.absent to get rid of it?
And the original question persists: why is the schedule interval not updated when I change it in the pillar?

@MorphBonehunter
Contributor Author

MorphBonehunter commented Jan 3, 2017

Actually the schedule can be removed, but it reappears after a minion restart.
When I delete the schedule entry in the pillar data, I get the following after a "saltutil.refresh_pillar":

# salt 'pandora*' schedule.list && date
pandora.underworld.lan:
    ----------
    schedule:
        ----------
Di 3. Jan 18:48:37 CET 2017

The schedule is not executed anymore.
But after a restart of the salt-minion the schedule reappeared (note the data is NOT in the pillar):

# salt 'pandora*' schedule.list
pandora.underworld.lan:
    schedule:
      underworld_pki_run_state:
        args:
        - underworld-pki
        enabled: true
        function: state.apply
        jid_include: true
        maxrunning: 1
        minutes: 1
        name: underworld_pki_run_state

and got executed:

2017-01-03 18:51:53,980 [salt.utils.schedule][INFO    ][12830] Running scheduled job: underworld_pki_run_state
2017-01-03 18:52:53,980 [salt.utils.schedule][INFO    ][12830] Running scheduled job: underworld_pki_run_state
2017-01-03 18:53:53,980 [salt.utils.schedule][INFO    ][12830] Running scheduled job: underworld_pki_run_state

But if I now delete the "_schedule.conf", the schedule disappears after a restart of the minion (there are still NO changes in the pillar data).
In the whole process I do not use any scheduler state or module call.

@gtmanfred
Contributor

Hrm, I was unaware of this capability.

Can you provide the output of running pillar.items?

@MorphBonehunter
Contributor Author

MorphBonehunter commented Jan 3, 2017

Yes, sure (I reduced the output to the relevant sections because of sensitive data).
With the following in the pillar file "pillars/pandora/init.sls":

schedule:
    underworld_pki_run_state:
        function: state.apply
        minutes: 1
        args:
            - underworld-pki

I get the following pillar item:

# salt 'pandora*' pillar.item schedule
pandora.underworld.lan:
    ----------
    schedule:
        ----------
        __update_grains:
            ----------
            args:
                |_
                  ----------
                - grains_refresh
            function:
                event.fire
            minutes:
                60
            name:
                __update_grains
        underworld_pki_run_state:
            ----------
            args:
                - underworld-pki
            function:
                state.apply
            minutes:
                1
            name:
                underworld_pki_run_state

After deleting the entry in the pillar file I get:

daniel@pandora ~/repos/underworld/saltstack [master *] $ salt 'pandora*' pillar.item schedule
pandora.underworld.lan:
    ----------
    schedule:

Hmm... I wonder where the "__update_grains" entry comes from...

For completeness, if I change the interval, the pillar item is updated correctly to:

pandora.underworld.lan:
    ----------
    schedule:
        ----------
        __update_grains:
            ----------
            args:
                |_
                  ----------
                - grains_refresh
            function:
                event.fire
            minutes:
                60
            name:
                __update_grains
        underworld_pki_run_state:
            ----------
            args:
                - underworld-pki
            function:
                state.apply
            minutes:
                5
            name:
                underworld_pki_run_state

@gtmanfred
Contributor

I believe the __update_grains comes from /etc/salt/master.d/_schedule.conf on the master

@MorphBonehunter
Contributor Author

Hmm... strange, the only occurrence of "_schedule.conf" I can find is "/etc/salt/minion.d/_schedule.conf".
That may be because the system where the master runs also runs a minion which connects to that master... I don't know if this is important...

@gtmanfred
Contributor

Actually, I was mistaken; this is inserted into the schedule here:

    def _refresh_grains_watcher(self, refresh_interval_in_minutes):
        '''
        Create a loop that will fire a pillar refresh to inform a master about a change in the grains of this minion
        :param refresh_interval_in_minutes:
        :return: None
        '''
        if '__update_grains' not in self.opts.get('schedule', {}):
            if 'schedule' not in self.opts:
                self.opts['schedule'] = {}
            self.opts['schedule'].update({
                '__update_grains':
                    {
                        'function': 'event.fire',
                        'args': [{}, 'grains_refresh'],
                        'minutes': refresh_interval_in_minutes
                    }
            })

in salt/minion.py

@MorphBonehunter
Contributor Author

OK, I see this in the code... I was not aware that this shows up in the pillar data of the minion,
and it only shows up there if another schedule is defined in the pillar.
I read a little bit of the code and found that the only occurrence of "_schedule.conf" is in salt/utils/schedule.py:

    def persist(self):
        '''
        Persist the modified schedule into <<configdir>>/minion.d/_schedule.conf
        '''
        config_dir = self.opts.get('conf_dir', None)
        if config_dir is None and 'conf_file' in self.opts:
            config_dir = os.path.dirname(self.opts['conf_file'])
        if config_dir is None:
            config_dir = salt.syspaths.CONFIG_DIR

        minion_d_dir = os.path.join(
            config_dir,
            os.path.dirname(self.opts.get('default_include',
                                          salt.config.DEFAULT_MINION_OPTS['default_include'])))

        if not os.path.isdir(minion_d_dir):
            os.makedirs(minion_d_dir)

        schedule_conf = os.path.join(minion_d_dir, '_schedule.conf')
        log.debug('Persisting schedule')
        try:
            with salt.utils.fopen(schedule_conf, 'wb+') as fp_:
                fp_.write(
                    salt.utils.to_bytes(
                        yaml.dump({'schedule': self.option('schedule')})
                    )
                )
        except (IOError, OSError):
            log.error('Failed to persist the updated schedule',
                      exc_info_on_loglevel=logging.DEBUG)

So if I understand this right, persistent schedules (the default is true) are placed in this file.
As I'm not a programmer I cannot dig further into why the first definition writes the job into this file while later changes in the pillar data do not update it.
I reckon that if I define "persist: false" the job is not included in the file and everything works as expected, except for "run_on_start".
I will try to verify this later today.
Maybe this is related to #24502.
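
To check what the scheduler actually persisted, independent of what the pillar currently says, here is a small sketch that just reads the file back with PyYAML (the path is the one from the workaround above):

import yaml

# read what persist() last wrote; this keeps whatever interval was current
# at that time, e.g. 'minutes: 1' even after the pillar was changed to 2
with open('/etc/salt/minion.d/_schedule.conf') as fp:
    persisted = yaml.safe_load(fp)

print(persisted['schedule'].get('underworld_pki_run_state'))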

@MorphBonehunter
Contributor Author

So if I insert "persist: False" into the pillar data, the pillar is:

# salt 'pandora*' pillar.item schedule
pandora.underworld.lan:
    ----------
    schedule:
        ----------
        __update_grains:
            ----------
            args:
                |_
                  ----------
                - grains_refresh
            function:
                event.fire
            minutes:
                60
            name:
                __update_grains
        underworld_pki_run_state:
            ----------
            args:
                - underworld-pki
            function:
                state.apply
            jid_include:
                True
            maxrunning:
                1
            minutes:
                1
            name:
                underworld_pki_run_state
            persist:
                False

What I did not expect is that the "_schedule.conf" is still updated:

schedule:
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, return_job: false}
  underworld_pki_run_state:
    args: [underworld-pki]
    function: state.apply
    minutes: 1
    persist: false

If I now delete the pillar data and refresh the pillar, the schedule stops but the entry persists.
So after a restart of the minion the schedule is executed again and I must delete the file to get rid of it.

But here comes another strange behavior! The "_schedule.conf" is only updated when I do the following steps:

  • have an "old" entry in the pillar
  • change this to an updated schedule with "persist: yes"
  • stop salt-minion, delete _schedule.conf, start minion

If I do a fresh start:

  • delete pillar data
  • refresh pillar
  • stop salt-minion, delete _schedule.conf, start minion
  • add pillar data
  • refresh pillar

there are no new entries in the "_schedule.conf"!
So in this state I can remove the pillar data and the schedule stops; I can restart the minion without the schedule reappearing; all is fine...

This changes when you restart the salt-minion with active pillar data...
After the restart the "_schedule.conf" is updated with the non-persistent schedule and the problem reappears.

@MorphBonehunter
Contributor Author

@gtmanfred this issue is currently marked as Blocked and Pending-Discussion.
As the behavior is in my eyes a bug, what can I do to get this issue reclassified?

@gtmanfred
Contributor

Yup, sorry, I have been busy and forgot to update this.

@gtmanfred modified the milestones: Approved, Blocked Jan 6, 2017
@gtmanfred added the Bug, Core, severity-medium, P2 and TEAM Core labels and removed the Pending-Discussion label Jan 6, 2017
@MorphBonehunter
Contributor Author

Hehe... no problem. Thanks!

@sp1r

sp1r commented Mar 1, 2017

Hello everybody, I have the same trouble with schedules, and it looks like all this scheduler business is a complete mess.

The scheduler has two sources of jobs, config and pillar, merged with config.merge, but it tries to modify the resulting dict (for example with schedule.add_job). This brings several problems:

# salt/utils/schedule.py

    def add_job(self, data, persist=True):

...skipped...

        schedule = self.option('schedule')
        if new_job in schedule:
            log.info('Updating job settings for scheduled '
                     'job: {0}'.format(new_job))
        else:
            log.info('Added new job {0} to scheduler'.format(new_job))

        schedule.update(data)

...skipped...

    def option(self, opt):
        if 'config.merge' in self.functions:
            return self.functions['config.merge'](opt, {}, omit_master=True)
        return self.opts.get(opt, {})
  1. Since config.merge prefers data from the configuration files, when two jobs with the same ID are present in _schedule.conf and in the pillar, the pillar one is completely ignored. This is the behaviour @MorphBonehunter has reported (see the plain-dict sketch after the snippet below).
# salt/modules/config.py

def merge(value,
          default='',
          omit_opts=False,
          omit_master=False,
          omit_pillar=False):
    ret = None
    if not omit_opts:
        if value in __opts__:
            ret = __opts__[value]
            if isinstance(ret, str):
                return ret

...skipped...

    if not omit_pillar:
        if value in __pillar__:
            tmp = __pillar__[value]
            if ret is None:
                ret = tmp
                if isinstance(ret, str):
                    return ret
            elif isinstance(ret, dict) and isinstance(tmp, dict):
                tmp.update(ret)
                ret = tmp
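
The effect of this merge order can be reproduced with plain dicts; this is only a sketch, not Salt code, using the values from this issue. The copy of the job that config.merge finds in the opts (i.e. the one loaded from _schedule.conf) shadows the updated pillar copy:

# 'opts' stands for the job as loaded from _schedule.conf,
# 'pillar' for the freshly rendered pillar data
opts_schedule = {'underworld_pki_run_state': {'function': 'state.apply',
                                              'minutes': 1,
                                              'args': ['underworld-pki']}}
pillar_schedule = {'underworld_pki_run_state': {'function': 'state.apply',
                                                'minutes': 2,
                                                'args': ['underworld-pki']}}

# config.merge: tmp = pillar value, then tmp.update(opts value) -> opts wins
merged = dict(pillar_schedule)
merged.update(opts_schedule)

print(merged['underworld_pki_run_state']['minutes'])  # -> 1, the stale value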
  2. When persisting the job configuration to disk, the scheduler grabs the pillar data and stores it as well. After this no more changes can be made to this job via the pillar. This happens any time a function of the schedule module is called with the parameter persist=True.
# salt/utils/schedule.py

    def persist(self):

...skipped...

            with salt.utils.fopen(schedule_conf, 'wb+') as fp_:
                fp_.write(
                    salt.utils.to_bytes(
                        yaml.dump({'schedule': self.option('schedule')})
                    )
                )
  3. During initialization the minion adds the job "__mine_interval" for the mine functions to the schedule (if the mine is enabled) using schedule.add_job with persist=True, i.e. it triggers the previous point. This is presumably why a minion restart with schedule data in the pillar ends up writing that pillar job into _schedule.conf.
# salt/minion.py

    @tornado.gen.coroutine
    def _post_master_init(self, master):

...skipped...

	# add default scheduling jobs to the minions scheduler
        if self.opts['mine_enabled'] and 'mine.update' in self.functions:
            self.schedule.add_job({
                '__mine_interval':
                {
                    'function': 'mine.update',
                    'minutes': self.opts['mine_interval'],
                    'jid_include': True,
                    'maxrunning': 2,
                    'return_job': self.opts.get('mine_return_job', False)
                }
            }, persist=True)
            log.info('Added mine.update to scheduler')
        else:
            self.schedule.delete_job('__mine_interval', persist=True)
  4. More interesting now! When there is schedule configuration in the pillar, the job "__mine_interval" created during initialization also shows up in the (cached?) pillar and can be seen with salt '*' pillar.get schedule. If the pillar then changes on the master, this job disappears after a refresh_pillar.

  5. The schedule.list function prefers pillar data over _schedule.conf and can therefore return a misleading response (see the sketch after the snippet below).

# salt/utils/schedule.py

    def list(self, where):
        schedule = {}
        if where == 'pillar':
            if 'schedule' in self.opts['pillar']:
                schedule.update(self.opts['pillar']['schedule'])
        elif where == 'opts':
            schedule.update(self.option('schedule'))
        else:
            schedule.update(self.option('schedule'))
            if 'schedule' in self.opts['pillar']:
                schedule.update(self.opts['pillar']['schedule'])
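
And the default branch of list() does the merge the other way around, so the reported schedule and the schedule that is actually evaluated disagree; again only a plain-dict sketch with the values from this issue:

# option('schedule') already resolved in favour of _schedule.conf (see above),
# but list() then overlays the pillar on top, so the pillar value wins here
option_schedule = {'underworld_pki_run_state': {'minutes': 1}}
pillar_schedule = {'underworld_pki_run_state': {'minutes': 2}}

shown = {}
shown.update(option_schedule)
shown.update(pillar_schedule)

# schedule.list reports 2 minutes while the job keeps firing every minute
print(shown['underworld_pki_run_state']['minutes'])  # -> 2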

I believe there are more issues.

I am quite new to the project and don't know enough about its internal structure to propose the best solution, but it seems to me that these issues could be resolved if an explicit separation of config data management from pillar data were implemented.

PS: All code snippets are from the repository branch "develop".

@stale

stale bot commented Sep 12, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

stale bot added the stale label Sep 12, 2018
stale bot closed this as completed Sep 19, 2018