Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syndic strips vital parts of events (such as 'retcode' and 'success') #34992

Closed
szjur opened this issue Jul 27, 2016 · 3 comments
Closed

Syndic strips vital parts of events (such as 'retcode' and 'success') #34992

szjur opened this issue Jul 27, 2016 · 3 comments
Assignees
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt P3 Priority 3 Salt-Syndic severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@szjur
Copy link

szjur commented Jul 27, 2016

Description of Issue/Question

I am using salt-api to send asynchronous commands to SaltStack and listen to the events with their results. After going over from a single master setup to 2x master + 2x syndic architecture I noticed that syndics blatantly strip some vital parts of events.

See this return event for cmd.run execution on a minion connected directly to the top level master:

{u'tag': u'salt/job/20160727122554918097/ret/srv5.my.domain.com', u'data': {u'fun_args': [u'rm /xxx'], u'jid': u'20160727122554918097', u'return': u"rm: cannot remove `/xxx': No such file or directory", u'retcode': 1, u'success': True, u'cmd': u'_return', u'_stamp': u'2016-07-27T10:25:54.971787', u'fun': u'cmd.run', u'master_id': u'srv5.my.domain.com', u'id': u'srv5.my.domain.com'}}

And this is what it looks like when it goes through a syndic:

{u'tag': u'salt/job/20160727122920664709/ret/srv7.my.domain.com', u'data': {u'fun_args': None, u'jid': u'20160727122920664709', u'return': u"rm: cannot remove `/xxx': No such file or directory", u'_stamp': u'2016-07-27T10:29:21.204630', u'fun': u'cmd.run', u'master_id': u'srv5.my.domain.com', u'id': u'srv7.my.domain.com'}}

As you can see 'retcode' and 'success' are gone. I analysed master.py and minion.py code and it seems that all they care in case of events passed from a syndic is 'return'. The method _syndic_return() in class AESFuncs in master.py leaves no doubts to that. Also the sending part of the code seems to lose that information and obviously doesn't send it. While debugging I saw that _process_event() in minion.py gets the proper contents of the event but by the time it reaches _return_pub() it is already gone.

This is really problematic. If you use cmd.run to execute remote commands you will no longer be able to see their exit code when you use master-syndic architecture.

Setup

Salt 2016.03.1. 4 servers - 2 masters (sharing the same key) and 2 syndics (also sharing the same key - different than the top level masters) connecting to both of them. A bunch of minions connecting to the syndics in failover mode.

Steps to Reproduce Issue

Run

salt directly.connected.minion.name cmd.run rm /nonexistent

on any minion directly connected to the master. You will see the command output and some info that the result code was != 0 (by the way the exit code of that execution will most likely be 11, not the real code of the remote command...).

Then from the top level master run

salt minion.connected.via.a.syndic cmd.run rm /nonexistent

on any minion connected via a syndic. You will no longer see the info that the result code of the command was not 0.

Versions Report

2016.03.1

@Ch3LL
Copy link
Contributor

Ch3LL commented Jul 29, 2016

@szjur thank you very much for pointing this out. This indeed appears to be a bug and looks like we need to ensure the event data is the same regardless of the data being sent to a master or through a syndic. Will label this a bug so we can get this fixed up. Thanks

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior P3 Priority 3 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around labels Jul 29, 2016
@Ch3LL Ch3LL added this to the Approved milestone Jul 29, 2016
@Ch3LL Ch3LL added Core relates to code central or existential to Salt Salt-Syndic labels Jul 29, 2016
@szjur
Copy link
Author

szjur commented Jul 29, 2016

I debugged it for hours, as I had never analysed SaltStack code before and came up with a simple hack for that. I can't share the exact diff for that due to legal reasons, but I think I can safely give you some ideas. Basically in minion.py in _process_event() at some point you do

jdict[event['data']['id']] = event['data']['return']

ignoring 'retcode', 'success' and probably a few more.

Then in master.py _syndic_return() the following code:
ret = {'jid': load['jid'],
'id': key,
'return': item}

also cares about 'return' only. I'm sure you guys can fix it in a more professional way than I did, having not much clue about what is going on in all that code.

@meggiebot meggiebot modified the milestones: C 5, Approved Aug 8, 2016
@DmitryKuzmenko DmitryKuzmenko added the fixed-pls-verify fix is linked, bug author to confirm fix label Aug 18, 2016
@DmitryKuzmenko DmitryKuzmenko removed the fixed-pls-verify fix is linked, bug author to confirm fix label Aug 24, 2016
@szjur
Copy link
Author

szjur commented Aug 24, 2016

Excellent guys! :-) Now make it go to the job cache on the master (#35691) and I'll start loving SaltStack ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt P3 Priority 3 Salt-Syndic severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests

4 participants