
Syndic stops forwarding job results if the local salt-master is restarted #34973

Closed
szjur opened this issue Jul 26, 2016 · 2 comments

szjur commented Jul 26, 2016

Description of Issue/Question

I built a setup with 2 masters and 2 syndics connecting to both of them. The problem is that if the salt-master service is restarted on a syndic node, the syndic simply stops forwarding job results to the top-level masters. Jobs are still published and their results come back to the local salt-master, which is clearly visible in its log in debug mode. However, salt-syndic no longer forwards those results to the top-level master that originally published the job.
Restarting the salt-syndic process fixes the problem, but you need to remember to do it every time salt-master is restarted on a syndic. If you don't, you may no longer even be able to reach the minions behind that syndic to remediate it.
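
The workaround, for now, is simply this (assuming the stock RHEL 6 init scripts; adjust for your service manager):

    # on the syndic node, any time the local master is bounced
    service salt-master restart
    service salt-syndic restart   # without this, returns stop flowing upstream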

Setup

Salt 2016.3.1. 4 servers: 2 masters (sharing the same key) and 2 syndics (also sharing a key, different from the top-level masters') connecting to both of them. Some minions connect to the syndics in failover mode.
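
For reference, the relevant pieces of the config (hostnames are placeholders; these are the standard options for this topology, not my literal files):

    # /etc/salt/master on the top-level masters
    order_masters: True

    # /etc/salt/master on the syndics
    syndic_master:
      - master1
      - master2

    # /etc/salt/minion on the minions
    master:
      - syndic1
      - syndic2
    master_type: failover
    master_alive_interval: 30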

Steps to Reproduce Issue

Do a simple setup as described above, then restart salt-master on both syndic servers and watch the complete outage of your environment.
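
Concretely (service names as in the RHEL packages):

    # from a top-level master -- works fine before the restart
    salt '*' test.ping

    # on each syndic node
    service salt-master restart

    # from the top-level master again -- the job is published and the
    # minions answer their local master, but no returns ever reach the
    # top level, so the command just times out
    salt '*' test.ping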

Versions Report

2016.3.1 on RHEL 6.5. All required packages taken from the SaltStack repository.


Ch3LL commented Jul 28, 2016

@szjur thanks for letting us know about this issue. This may be a difficult problem to solve, and it has been reported to work only intermittently. We will label it as a bug so we can get it into our backlog of things to look into further.

Would you mind testing this with the tcp transport and posting back the results?
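
Switching should just be a matter of setting the transport option in the master and minion configs on both tiers (a sketch, not verified against your setup):

    # /etc/salt/master and /etc/salt/minion, on the syndics and top-level masters
    transport: tcp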

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior P3 Priority 3 severity-high 2nd top severity, seen by most users, causes major problems Salt-Syndic labels Jul 28, 2016
@Ch3LL Ch3LL added this to the Approved milestone Jul 28, 2016
@Ch3LL Ch3LL added the Core relates to code central or existential to Salt label Jul 28, 2016

szjur commented Jul 28, 2016

Sure, I can do some tests. But where would you like me to change the transport - between the syndics and the masters only? For the end nodes I've already set up ZMQ filtering - which is very poorly documented, by the way, particularly for a multi-tier setup. Filtering is apparently not possible with the TCP transport. What also worries me is that the TCP transport is listed at https://docs.saltstack.com/en/latest/ref/configuration/master.html as experimental.
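
For the record, the filtering setup I mean is just the documented flag on both ends (which has no TCP equivalent as far as I can tell):

    # /etc/salt/master and /etc/salt/minion
    zmq_filtering: True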

This problem is actually something you can live with and work around. The more serious issue I found with syndics is that they mutilate events coming back from minions (#34992). That completely ruined my use case, which started out in a single-master setup.

@meggiebot meggiebot modified the milestones: C 5, Approved Aug 8, 2016
@DmitryKuzmenko DmitryKuzmenko added the fixed-pls-verify fix is linked, bug author to confirm fix label Aug 26, 2016
@DmitryKuzmenko DmitryKuzmenko removed the fixed-pls-verify fix is linked, bug author to confirm fix label Aug 29, 2016