'salt-run jobs.active' doesn't list all minions running the job #9526
Comments
Is this consistently reproducible? My bet in this instance is that the minion timed out on the master's query for the job. Additionally, what OS is the minion running? |
Ah, just noticed the |
I'm a gentoo newb -- what is that |
Yup, Gentoo. Well, I ssh'ed into srv3 and verified with I can tell you next Thursday if it happens again (it was reproducible last week). |
PS: When the job was done, srv3 showed up in |
Yep, I'm almost certain it's a premature timeout issue. When I find time to finish making the secondary timeout configurable, I think that will resolve this issue. |
Good news, that secondary timeout is configurable now! It's not in 2014.1.0rc3, but it will be in 2014.1.0, out later this week. Just put |
Now that 2014.1.0 is shipped, can we close this issue? https://github.com/saltstack/salt/releases/tag/v2014.1.0 |
Sounds good to me! |
@basepi I think the gather_job_timeout default should be higher. I had to set it to 5 to prevent timeouts while running pkg.latest_version on Debian minions. I assume the load generated by running apt-get upgrade etc. caused a spike in I/O which prevented find_job from returning fast enough. |
I agree. I'm going to bump it to 5 by default, 2 is too short for a |
#11062 is in. |
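The timeout effect discussed above can be illustrated with a minimal sketch. It is not Salt's implementation: the function name `minions_seen_running` and the reply latencies are hypothetical, and only the timeout values 2 and 5 come from this thread. It assumes that a minion whose find_job reply arrives after the timeout is simply left out of the result.

```python
def minions_seen_running(latencies, timeout):
    """Return the minions whose reply arrives within `timeout` seconds.

    `latencies` maps a minion name to its (hypothetical) reply time in seconds.
    """
    return sorted(name for name, secs in latencies.items() if secs <= timeout)


# Hypothetical reply times: srv3 is slow because it is busy with package work.
latencies = {"srv2": 0.4, "srv3": 3.8}

# With a timeout of 2 seconds the busy minion is dropped from the list.
print(minions_seen_running(latencies, timeout=2))

# With a timeout of 5 seconds both minions are reported.
print(minions_seen_running(latencies, timeout=5))
```

Under these assumptions, raising the timeout trades a slower answer from the runner for a more complete list of minions.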
I have to re-open the issue. srv1 (master):
srv2 (minion):
srv3 (minion):
I started a long running job from the master: After ~8-10sec I pressed
That's OK. Now I run
That's not OK:
That looks like an empty, unresolved value, doesn't it? After a few minutes I got
Still an active job without a running minion? Well, the I thought that this should work in 2014.1.0 (and the master is running 2014.1.1 now). I have also set Can I help you debug this? |
Some debug logs:
srv1 (master) debug log:
srv3 (minion, now also running 2014.1.1) debug log:
|
Hrm, that is strange behavior. I'll reopen and we can do more investigation. |
For the record, I just tested this and my single minion in my test was also missing from the
So something does seem to be amiss there. Haven't had a chance to code dive to see what the "Returned" piece is supposed to be. |
* Fixed output of running minions (saltstack#9526) * Correctly identify returned minions.
* Bugfix from saltstack/salt#9526 added
Thank you @thatch45! I see that one of the fixes is to skip over the 'jid' header; in salt/salt/runners/jobs.py, def active(): was changed to: @Whissi, I wonder if you still see the 'empty' jid in the "Returned" section after the fix. I was also puzzled about what it is; I presume it's a header. |
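As a rough illustration of the "skip over the 'jid' header" idea, here is a minimal sketch. It is not the code from the actual fix: the helper name `returned_minions` is hypothetical, and the layout (one directory entry per returned minion, plus a `jid` file) is an assumption made for the example.

```python
import os
import tempfile


def returned_minions(jid_dir):
    """List the minion IDs that have an entry under `jid_dir`.

    Assumes (for this sketch only) one entry per returned minion,
    plus a 'jid' header file that must not be reported as a minion.
    """
    returned = []
    for entry in sorted(os.listdir(jid_dir)):
        # Skip hidden files and the 'jid' header; neither is a minion.
        if entry.startswith(".") or entry == "jid":
            continue
        returned.append(entry)
    return returned


# Build a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as jid_dir:
    open(os.path.join(jid_dir, "jid"), "w").close()
    os.mkdir(os.path.join(jid_dir, "srv3.example.org"))
    print(returned_minions(jid_dir))
```

Without the `entry == "jid"` check, the header would be listed alongside the real minions, which would be consistent with the odd extra entry in the "Returned" section reported earlier.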
@basepi, I don't understand why this fix was not picked for 2014.1.3 or 2014.1.4, and now it is also missing in 2014.1.5. Can you help or explain? Thanks. |
@Whissi because no one ever pinged me that it needed to be cherry-picked. If there's a fix that goes in, and you want to make sure it gets into the next bugfix release, you need to make sure it gets a "Bugfix - Cherry-Pick" label (usually by pinging me). Pull requests that have been cherry-picked, and so will be in the next bugfix release (or are already in a bugfix release) have the label "Bugfix - [Done] Cherry-Pick". I have labeled the pull request for cherry-picking. |
* Fixed output of running minions (#9526) * Correctly identify returned minions.
Hi,
I am missing "srv3.example.org" in the list. And yes, srv3 is running the job: