
Fix inconsistency with "jid" on minion scheduled jobs and the returners output #47471

Merged
merged 1 commit into saltstack:2017.7 from meaksh:2017.7-fix-inconsistent-scheduled-jid-with-returners on May 9, 2018

Conversation

6 participants
@meaksh
Member

meaksh commented May 4, 2018

What does this PR do?

This PR proposes a fix for an inconsistency that currently exists between scheduled jobs and the returner output for those jobs.

Currently the "jid" which is included in the returner output is generated by the standalone job created on the minion, but it's not generated and stored on the Salt master. Salt does the following for scheduled jobs on minions:

1- A jid is generated by the standalone schedule job on the minion.
2- Once the job is completed, the output is provided to the returner.
3- Now the output is passed back to the Salt master, but the jid is internally set to 'req' to indicate that the "jid" needs to be created on the master side. I guess the reason for this is to avoid jid collisions, since those jids were generated on the minion instead of on the master.

This behavior produces inconsistent references to "jid" between the Salt event bus (which includes the "jid" generated on the master) and the returner output (with a different one generated by the minion) for the same Salt job.
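To make the flow above concrete, here is a minimal, self-contained sketch of the three steps. The function names and the jid format are illustrative only; the real implementation lives in salt/utils/schedule.py:

import datetime

def gen_jid():
    # Salt jids are timestamp-based strings such as '20180504111741349996'.
    return datetime.datetime.now().strftime('%Y%m%d%H%M%S%f')

def run_scheduled_job(execute, local_returner, send_to_master):
    # 1. The standalone schedule job on the minion generates its own jid.
    ret = {'jid': gen_jid(), 'fun': 'test.ping'}

    # 2. Once the job completes, the output is handed to the returner
    #    configured on the scheduled job, carrying the minion-generated jid.
    ret['return'] = execute()
    local_returner(ret)

    # 3. The result is then sent back to the master with jid set to 'req',
    #    so the master generates (and stores) a brand new jid -- the one
    #    that shows up on the event bus.
    mret = ret.copy()
    mret['jid'] = 'req'
    send_to_master(mret)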

Previous Behavior

Given this scheduled job:

$ salt-call schedule.list
local:
    schedule:
      job1:
        enabled: true
        function: test.ping
        jid_include: true
        maxrunning: 1
        name: job1
        returner: rawfile_json
        seconds: 30

I got this returner output:

{"fun_args": [], "jid": "20180504111741349996", "return": true, "retcode": 0, "success": true, "schedule": "job1", "pid": 16959, "fun": "test.ping", "id": "min-sles12sp3.tf.local"}

While I get this on the Salt event bus:

salt/job/20180504111741855597/ret/min-sles12sp3.tf.local	{
    "_stamp": "2018-05-04T09:17:41.857059", 
    "arg": [], 
    "cmd": "_return", 
    "fun": "test.ping", 
    "fun_args": [], 
    "id": "min-sles12sp3.tf.local", 
    "jid": "20180504111741855597", 
    "pid": 16959, 
    "retcode": 0, 
    "return": true, 
    "schedule": "job1", 
    "success": true, 
    "tgt": "min-sles12sp3.tf.local", 
    "tgt_type": "glob"
}

Notice that the two jids are different for the same Salt job.

New Behavior

The proposed behavior is to include a new schedule_jid attribute on the scheduled job return, referencing the jid that was used on the minion while executing the job (which is the one referenced in the returner output):

salt/job/20180504111939850120/ret/min-sles12sp3.tf.local	{
...
    "fun": "test.ping",
    "jid": "20180504112008951172", 
    "pid": 17938,
    "schedule": "job1", 
    "schedule_jid": "20180504111939347461", 
...
}

and this returner output:

{"fun_args": [], "jid": "20180504111939347461", "return": true, "retcode": 0, "success": true, "schedule": "job1", "pid": 17938, "fun": "test.ping", "id": "min-sles12sp3.tf.local"}

I'm targeting this PR at the 2017.7 branch since this could be considered a bug. The downside is that this would change the output schema of the job return event. Let me know if I should point this directly at develop.

Once this PR is merged, a change on https://github.com/SUSE/salt-netapi-client/blob/master/src/main/java/com/suse/salt/netapi/event/JobReturnEvent.java will be needed to reflect the new attribute there.

What do you think about this proposal? Is this feasible or do you think we should use another approach?

Tests written?

No

Commits signed with GPG?

Yes

@rallytime rallytime requested review from s0undt3ch and garethgreenaway May 4, 2018

@thatch45


Member

thatch45 commented May 4, 2018

I am not a big fan of this solution, as it adds a field that will lead to confusion in the future. But we do need to figure out a solution here.
Sorry that I don't recall off the top of my head where this is, but where is the event created? We should figure out how to make sure that generate_jid is only called once during the return pipeline. If the event is generating a separate jid, then that is the root of the problem and should be addressed.

@s0undt3ch

I defer my review to @thatch45

@thatch45

I think we need to get to the root of this problem instead

@garethgreenaway


Member

garethgreenaway commented May 8, 2018

I spent some time looking into the root cause. The jids are inconsistent because we're basically using two returners. One returner is the one defined in the scheduled job, which generates its own jid, and the other is the local_cache returner, which generates another jid because jid is being set to req. If we only set jid to req when we're not overriding the returner in the scheduled job, then the jids will be consistent.

diff --git a/salt/utils/schedule.py b/salt/utils/schedule.py
index 33e1359304..6f09b0b654 100644
--- a/salt/utils/schedule.py
+++ b/salt/utils/schedule.py
@@ -941,11 +941,13 @@ class Schedule(object):
                 else:
                     # Send back to master so the job is included in the job list
                     mret = ret.copy()
-                    mret['jid'] = 'req'
-                    if data.get('return_job') == 'nocache':
-                        # overwrite 'req' to signal to master that
-                        # this job shouldn't be stored
-                        mret['jid'] = 'nocache'
+                    # No returners defined, so we're only sending back to the master
+                    if not data_returner and not self.schedule_returner:
+                        mret['jid'] = 'req'
+                        if data.get('return_job') == 'nocache':
+                            # overwrite 'req' to signal to master that
+                            # this job shouldn't be stored
+                            mret['jid'] = 'nocache'
                     load = {'cmd': '_return', 'id': self.opts['id']}
                     for key, value in six.iteritems(mret):
                         load[key] = value

@meaksh meaksh force-pushed the meaksh:2017.7-fix-inconsistent-scheduled-jid-with-returners branch from fbe7199 to f079939 May 9, 2018

@meaksh


Member

meaksh commented May 9, 2018

@garethgreenaway - Thanks a lot for taking a look at this issue! I really appreciate your time here.

I've just tested your proposed patch and it works perfectly, so I've updated the PR to include it.

Ping @thatch45 in order to get a final approval for the proposed patch. Thanks everyone!

@thatch45

Yes! That is the right way :)

@rallytime rallytime merged commit d3121fc into saltstack:2017.7 May 9, 2018

6 of 9 checks passed

jenkins/PR/salt-pr-linode-cent7-py3 Pull Requests » Salt PR - Linode CentOS 7 - PY3 #4792 — FAILURE
jenkins/PR/salt-pr-rs-cent7-n Pull Requests » Salt PR - RS CentOS 7 #18847 — FAILURE
default Build started sha1 is merged.
WIP ready for review
jenkins/PR/salt-pr-clone Pull Requests » Salt PR - Clone #24967 — SUCCESS
jenkins/PR/salt-pr-docs-n Pull Requests » Salt PR - Docs #17085 — SUCCESS
jenkins/PR/salt-pr-linode-ubuntu14-n Pull Requests » Salt PR - Linode Ubuntu14.04 #22722 — SUCCESS
jenkins/PR/salt-pr-linode-ubuntu16-py3 Pull Requests » Salt PR - Linode Ubuntu16.04 - PY3 #9761 — SUCCESS
jenkins/PR/salt-pr-lint-n Pull Requests » Salt PR - Code Lint #21705 — SUCCESS
@isbm


Contributor

isbm commented May 11, 2018

@garethgreenaway awesome! Thanks for looking into that! 💯

@meaksh meaksh deleted the meaksh:2017.7-fix-inconsistent-scheduled-jid-with-returners branch May 14, 2018
