
[2018.3] Fixes to schedule maxrunning on master #50130

Conversation (6 participants)
@garethgreenaway (Member) commented Oct 19, 2018

What does this PR do?

Adding some master-specific functions to utils/master.py to determine whether a Salt process is running. Updating utils/schedule.py to use the appropriate running function, either from utils/master.py or utils/minion.py, depending on where the scheduled job is running. Adding tests for maxrunning in scheduled jobs on both the minion and the master.
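The role-based dispatch described above can be sketched as follows; the helper names and signatures here are hypothetical stand-ins for the functions in salt/utils/master.py and salt/utils/minion.py, not Salt's actual API:

```python
# Hypothetical sketch of the dispatch: the schedule selects a
# "running jobs" helper depending on whether it runs on the master
# or on a minion. Salt daemons carry their role in opts['__role'].

def get_running_fn(opts, master_running, minion_running):
    '''Return the appropriate running-jobs helper for this daemon.'''
    if opts.get('__role') == 'master':
        return master_running
    return minion_running

# Usage: the schedule would then call the selected helper with opts.
fn = get_running_fn({'__role': 'master'},
                    master_running=lambda opts: ['master-job'],
                    minion_running=lambda opts: ['minion-job'])
```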

What issues does this PR fix or reference?

#49957

Tests written?

Yes

Commits signed with GPG?

Yes/No

Please review Salt's Contributing Guide for best practices.

See GitHub's page on GPG signing for more information about signing commits with GPG.

@garethgreenaway garethgreenaway requested a review from saltstack/team-core as a code owner Oct 19, 2018

@salt-jenkins salt-jenkins requested a review from saltstack/team-suse Oct 19, 2018


@garethgreenaway garethgreenaway force-pushed the garethgreenaway:49957_master_schedule_ignoring_maxrunning branch from f77dc70 to e6efe48 Oct 19, 2018

@isbm (Contributor) left a comment

I am not sure I like the whole approach and design of this code. And it is not portable. 😞

@isbm (Contributor) commented Oct 20, 2018

DISCLAIMER: The code below does not have to be exactly this way. It is just a way to express the idea of the solution.

What I would propose here is a different approach to this problem. Our task is to make maxrunning respected, so the whole purpose of this particular routine is to properly count the maximum number of parallel running jobs, and nothing else. To that end, I would:

  1. Avoid the heuristic approach of figuring out "what could this process be?". Instead, I would solve it from the other direction: each job should already be prepared with the info/metadata that will be needed about it, and only then fired. That protocol could be very small, e.g. a single boolean byte attached to the job data. This approach gives us the following advantages:

    • No more need to filter supported OSes; the whole thing is portable.
    • The kind of job/process is now known with certainty.
    • More maintainable and structured code: it does only what it is supposed to, and it lives in one place.
  2. Remove any kind of PID cleanup helpers. That is simply another task for another place. Either the job scheduler should do it properly (and the data from No. 1 can happily be reused), or the job itself can clean up its post-mortem data, as is done with typical threading/multiprocessing based on mutexes/signals. All the PID cleaning should live there, as a completely separate logical flow, so that:

    • Bugs, if any, are much easier to locate
    • There is no overlapping "spaghetti" code

So conceptually, in pseudo-code, I would express the solution idea and its interface as follows. I would personally prefer a class, so one can instantiate it and start working with jobs, or just refer to a module-level instance (YMMV):

```python
class JobTrackerUtility(object):
    '''
    Class that tracks jobs in cache.
    This should also be extended to
    https://github.com/saltstack/salt/pull/50078
    '''
    def register(self, job):
        '''
        This registers a job and works with the cache/serialiser.
        '''
        ...
        return True or False  # Registered or not

    def start_job(self, payload):
        '''
        We just start one job and register it.
        '''
        command, other_things = payload.parse_in_some_way()
        job = Job()  # Wrapper to a job in cache.
                     # Can be just a dict as well,
                     # but objects are just more convenient.
        job.command = command
        job.is_parallel = other_things.get_is_countable()  # This is a conceptual idea
        self.register(job)  # This job is not yet started!

    def get_all_running_jobs(self):
        '''
        This just reads what is in the cache, along
        with the whole metadata.
        '''
        # At this point we no longer care whether this is
        # Windows or Linux or Solaris.
        ...
        return available_jobs

    def running_jobs(self):
        '''
        Get only the jobs that we respect for maxrunning.
        '''
        jobs_we_need = []
        for job in self.get_all_running_jobs():
            if job.we_are_interested_in():  # This is where we use the info prepared beforehand
                jobs_we_need.append(job)

        return jobs_we_need

    def cleanup_jobs(self):
        '''
        Cleanup jobs.
        '''
        # This internally would get all running jobs,
        # see if each job is actually running, and clean up
        # PIDs or leave them as is.
        for job in self.get_all_running_jobs():
            if not job.is_running():
                job.cleanup()  # This Job() instance would remove the pid,
                               # check what is needed,
                               # clean the cache, etc.
```
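For concreteness, here is a minimal runnable toy version of the pre-registration idea, using a plain dict as a stand-in for the job cache; every name here is hypothetical, not part of Salt:

```python
# Runnable toy version of the idea above: each job is registered with
# a "countable" flag before it starts, so enforcing maxrunning becomes
# a portable count over cached metadata, with no OS-specific process
# inspection. Everything here is a stand-in for Salt's job cache.

class ToyJobTracker:
    def __init__(self):
        self._jobs = {}  # jid -> metadata; stands in for the cache

    def register(self, jid, countable):
        # Pre-register metadata before the job is fired.
        self._jobs[jid] = {'countable': countable, 'running': True}

    def running_jobs(self):
        # Only jobs flagged at registration count toward maxrunning.
        return [jid for jid, meta in self._jobs.items()
                if meta['running'] and meta['countable']]

    def can_start(self, maxrunning):
        return len(self.running_jobs()) < maxrunning

    def finish(self, jid):
        # Separate cleanup flow: just mark the cached entry done.
        self._jobs[jid]['running'] = False

tracker = ToyJobTracker()
tracker.register('job-1', countable=True)
tracker.register('job-2', countable=False)  # e.g. an unrelated process
```

The point of the design, as argued above, is that `running_jobs()` never has to guess what a process is: the answer was recorded when the job was registered.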

There is already an RFC, #50078, with a Job Retry idea. I described an alternative approach there, and it seems better to work with caches on the master/minion. We could also reuse the approach above there, and then in Salt SSH for multiple simultaneous requests. The job tracker above fits there to control, view and manage jobs for similar purposes: retrying jobs from the cache, polling them, adding them from another process or a local client, etc.

The only downside is that it would require more development. 😉

@cachedout @terminalmage thoughts?


@garethgreenaway garethgreenaway force-pushed the garethgreenaway:49957_master_schedule_ignoring_maxrunning branch from 4e2592c to 42711a2 Oct 22, 2018

@rallytime rallytime requested a review from cachedout Oct 30, 2018

@cachedout (Collaborator) commented Oct 31, 2018

@garethgreenaway Review comments?

@garethgreenaway garethgreenaway force-pushed the garethgreenaway:49957_master_schedule_ignoring_maxrunning branch from 7bcfeb1 to 2c2d82d Oct 31, 2018

@rallytime (Contributor) commented Nov 8, 2018

@garethgreenaway You're going to need to update filename_map for this one. That is why that test is failing.

@garethgreenaway (Author, Member) commented Nov 8, 2018

@rallytime I'll take a look.

@garethgreenaway garethgreenaway force-pushed the garethgreenaway:49957_master_schedule_ignoring_maxrunning branch 3 times, most recently from 5b9db4f to 1166569 Nov 12, 2018

@cachedout (Collaborator) left a comment

Why are there changes to doc/conf.py in here? Is that intentional?

@garethgreenaway (Author, Member) commented Nov 13, 2018

@cachedout Definitely not. Fixing that up.


```python
ret = []
proc_dir = os.path.join(opts['cachedir'], 'proc')
if not os.path.isdir(proc_dir):
```

@cachedout (Collaborator) commented Nov 13, 2018

Wouldn't this be considered an error? I'm wondering if we should log this at least at info level, if not error?

@garethgreenaway (Author, Member) replied Nov 13, 2018

Perhaps a case where no jobs have run and the directory doesn't exist yet?
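Following that reasoning, one way the case could be handled is to treat a missing proc directory as "no jobs have run yet" while still leaving a trace in the logs. A sketch under that assumption, not the PR's actual code:

```python
import logging
import os

log = logging.getLogger(__name__)

def read_proc_dir(cachedir):
    '''Return job entries from the proc dir, tolerating its absence.'''
    proc_dir = os.path.join(cachedir, 'proc')
    if not os.path.isdir(proc_dir):
        # Before any job has run, the directory may simply not exist,
        # so an empty result is not necessarily an error; log at debug
        # so operators can still see why nothing was found.
        log.debug('proc directory %s does not exist yet', proc_dir)
        return []
    return os.listdir(proc_dir)
```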

@cachedout (Collaborator) commented Nov 13, 2018

@isbm Are you all right with this going in as it sits? If so, could you please approve it? Thanks.

@isbm approved these changes Nov 14, 2018

@isbm (Contributor) commented Nov 14, 2018

@cachedout It still seems to me that this figures out running processes after the fact, and thus still looks error-prone, instead of pre-registering them and getting the reliability an OS would normally provide. Nothing in that area has changed from a design perspective. But since you are asking me to approve, I am approving...

@cachedout (Collaborator) commented Nov 20, 2018

@garethgreenaway There are test failures here.

garethgreenaway added some commits Oct 19, 2018

Adding some master specific functions to uitls/masters.py to determine if a Salt process is running. Updating utils/schedule.py to use the appropriate running function either from utils/master.py or utils/minion.py depending on where the scheduled job is running. Adding tests to test maxrunning in scheduled jobs for both the minion and master.

garethgreenaway added some commits Oct 31, 2018

@garethgreenaway garethgreenaway force-pushed the garethgreenaway:49957_master_schedule_ignoring_maxrunning branch from 4439662 to 2f6dac9 Nov 24, 2018

@cachedout cachedout merged commit 5b7ab35 into saltstack:2018.3 Nov 26, 2018

10 checks passed:

- WIP: Ready for review
- continuous-integration/jenkins/pr-merge: This commit looks good
- jenkins/pr/docs: The docs job has passed
- jenkins/pr/lint: The lint job has passed
- jenkins/pr/py2-centos-7: The py2-centos-7 job has passed
- jenkins/pr/py2-ubuntu-1604: The py2-ubuntu-1604 job has passed
- jenkins/pr/py2-windows-2016: The py2-windows-2016 job has passed
- jenkins/pr/py3-centos-7: The py3-centos-7 job has passed
- jenkins/pr/py3-ubuntu-1604: The py3-ubuntu-1604 job has passed
- jenkins/pr/py3-windows-2016: The py3-windows-2016 job has passed