publish.publish should return same info as running on salt-master #4402
Hmm, this is serious. We may need to update some of the peer system; I will take care of this.
If there's anything I can do, please let me know.
Thanks, I think that this interface probably just needs to be updated with some of the changes in the event bus. It is mostly a matter of getting to it time-wise for me; I might be a few more days. If you want to take a look, it is in the minion_publish method in master.py.
I found minion_publish. Can you point me at the relevant changes to the event bus?
The question is what method is being used in the LocalClient. The object is called self.client or self.local, and it should use a function that uses the event bus, like cmd_iter.
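For context, cmd_iter publishes a job once and then yields each minion's return as it arrives on the event bus, instead of polling the job cache on disk. A minimal sketch only, assuming the 0.15-era salt.client API, a running salt-master, and sufficient permissions (the target and function here are just the ones used elsewhere in this thread):

```python
# Sketch only: requires a running salt-master; not runnable standalone.
import salt.client

client = salt.client.LocalClient()

# cmd_iter yields returns incrementally via the event bus rather than
# waiting on (or polling) the job cache like the legacy get_returns calls.
for ret in client.cmd_iter('*', 'network.ip_addrs',
                           timeout=30, expr_form='glob'):
    for minion_id, data in ret.items():
        print(minion_id, data)
```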
It calls:

```python
if ret_form == 'clean':
    return self.local.get_returns(  # 2nd call
        jid,
        self.ckminions.check_minions(
            clear_load['tgt'],
            expr_form
        ),
        timeout
    )
elif ret_form == 'full':
    ret = self.local.get_full_returns(  # 3rd call
        jid,
        self.ckminions.check_minions(
            clear_load['tgt'],
            expr_form
        ),
        timeout
    )
    ret['__jid__'] = jid
    return ret
```

I don't see a […]. There's also a […]. I traced through into the various […]. Next steps? Where can I read about event bus stuff?
Ouch, yes, this needs to be changed over. There are some docs on the event system: […]. The local client has some legacy commands that use filesystem polling. I thought I had gotten them all out, but these get commands use filesystem polling.
Fixed by 029bdad
So, I just upgraded to 0.15.0 across all boxes. Still have this issue. There are currently 51 boxes, and here I have only 31 responses.
Very odd indeed; I could by no means reproduce this once I made the changes. We will revisit this!
Let me know if there's anything I can do to help repro the issue. We have a pretty vanilla deploy, but maybe it's environmental? How would I nail that down?
I wonder if bumping up the worker_threads on the master might help? What are they at now?
```yaml
#worker_threads: 5
```

Commented out in the config (the default). What would be a reasonable setting? Any other settings I should also consider adjusting?
The peer system can hold onto workers, so it can cause some things to be dropped here. Let's try bumping it up to 15 and see if that helps.
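For anyone following along, the change is a one-line edit to the master config (commonly /etc/salt/master), followed by a restart of the salt-master service. A minimal fragment, assuming the default file layout:

```yaml
# /etc/salt/master -- uncomment and raise worker_threads, then
# restart the salt-master service for it to take effect
worker_threads: 15
```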
I bumped to 15 and that seems to reduce the number of missing results: I got 45 answers. I bumped to 20 worker threads and got 46 answers pretty consistently. I note that my master's logfile has […]. I'm guessing there's some tuning I need to do here. Is there a wiki page with instructions for this?
Oh, those are always there; I need to change those log messages to info. I am wondering if there is a timeout issue somewhere. How long does it take for the data to return?
I fixed that error message by putting `ulimit 100000` in the /etc/init/salt-master.conf script stanza. That'd be a pretty easy thing to throw into the .debs. Who can I talk to about that?

Running:

```shell
time sudo salt-call publish.publish '*' network.ip_addrs '' 'glob' 30 | grep ':$' | sort | wc -l
```

I'm getting between 11s and 15s pretty consistently.
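A side note for anyone applying the same fix: on Upstart, the conventional way to raise the open-files limit is a `limit nofile` stanza rather than an in-script `ulimit` (a bare `ulimit 100000` sets the file-size limit, not the descriptor count). A sketch only; the exec path and values are examples, not taken from the actual packaged job file:

```
# /etc/init/salt-master.conf (Upstart) -- illustrative sketch
description "Salt master"
start on runlevel [2345]
stop on runlevel [!2345]

# Raise soft and hard open-file limits before the daemon starts,
# so salt's max_open_files setting can actually take effect
limit nofile 100000 100000

exec /usr/bin/salt-master
```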
Hmm, so it is hitting the timeouts. This is good information, although it just confirms that the minions are not all being caught. It might be a case of salt being too fast for salt as well, although I thought we got these covered. After the peer call, does the command show up in […]? This is important to hunt down, but once 0.15.1 comes out with the mine interface fixed up, it might be a faster and more reliable interface to use here.
```
ahammond@staging05:~$ time sudo salt-call publish.publish '*' network.ip_addrs '' 'glob' 30 | grep ':$' | sort | wc -l
[INFO    ] Configuration file path: /etc/salt/minion
[INFO    ] Package debconf-utils is not installed.
[INFO    ] Publishing 'network.ip_addrs' to tcp://209.114.36.150:4506
46

real    0m12.815s
user    0m0.890s
sys     0m0.380s
```

Appears in list_jobs:

```
'20130507013136745102':
    Arguments: []
    Function: network.ip_addrs
    Start Time: 2013, May 07 01:31:36.745102
    Target: '*'
    Target-type: glob
    User: root
```

And the output matches what I'm getting when I run the command directly on the master:

```
root@salt:/srv/salt# salt-run jobs.lookup_jid 20130507013136745102 | grep ':$' | wc -l
48
root@salt:/srv/salt# salt \* network.ip_addrs | grep ':$' | wc -l
48
```

So... apparently it's a matter of getting the results back to the minion that published the call?
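Aside: the `grep ':$' | wc -l` pipeline used throughout this thread counts responding minions by matching the minion-ID lines in salt's default outputter, which end with a colon. A tiny self-contained illustration with made-up output (the minion IDs and addresses are invented):

```shell
# Fabricate a two-minion chunk of salt's default outputter format.
cat > /tmp/sample_salt_out.txt <<'EOF'
minion1:
    - 10.0.0.1
minion2:
    - 10.0.0.2
EOF

# Minion ID lines end with ':', so counting them counts responders.
grep -c ':$' /tmp/sample_salt_out.txt
```

This prints 2, one per responding minion, regardless of how many lines of return data each minion produced.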
Very good data, this!
OK, this is going to require a new method to be added to the LocalClient class, so I will have this soon :)
Any news on this? Can I help?
I believe this has been fixed in the develop branch. Can you test that, @ahammond?
I'm willing to do some testing. Do I need to upgrade just the master or all the minions, too?
You'd probably have to upgrade both the master and the minions. Is that incorrect, @thatch45?
Yes, I just pushed updates to git while I was on the plane that should fix this, but I have only been able to test it a little. You should only need to update the master.
I upgraded the master to 665965f this afternoon. And now my publish.publish call from the minion is getting zero results: […]

I checked […].
Yes, I just fixed this yesterday; there was a case where it was working and a case where it was failing that I did not see initially. It should be better now.
Looks good, although I see that […]. That strikes me as a little bit odd / awkward. Will I need to filter it out of other […]?
Thanks, nice catch
Thanks for fixing this! Now I've just got to get the […] issue resolved, and my (rather complicated, but hopefully clever and useful) hosts management state will be working. 😄
When I run a command directly via salt on the salt-master, I get responses from all my minions. When I run the same command on a minion via salt-call publish.publish, I get responses from only about 3/4 of my minions.
I updated salt and 0mq to latest stable: […]

And also the master: […]

I'm using the following command on the master for testing: […]

Which looks like it's getting results from all minions. However, […].
Even when I increase the timeout for publish.publish, I still am not getting responses from all systems.
I ran this command a number of times and saved the output, looking for commonalities in which minions are not responding, but all of the minions responded at least once. As near as I can tell, which minions are included in the output is random, but it always hovers around 36. Changing the command to test.ping has no effect on the number of responding minions, nor which minions respond.
I checked logfiles for both minion and master. The only error I see is "The value for the 'max_open_files' setting, 100000, is higher than what the user running salt is allowed to raise to, 4096. Defaulting to 4096."