Master failover in minion configuration not working as expected #57660

Closed
doesitblend opened this issue Jun 12, 2020 · 2 comments · Fixed by #57699
Labels
Bug broken, incorrect, or confusing behavior Magnesium Mg release after Na prior to Al severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around ZD The issue is related to a Zendesk customer support ticket.

Comments

@doesitblend
Contributor

Description
If the name of a configured Salt master cannot be resolved in DNS, the minion silently falls back to 127.0.0.1. That might be OK in some cases, but it definitely isn’t OK in our case, where we use ‘master_type=failover’.

With the configuration below, we would expect that if the minion cannot resolve master1, it would try master2 and then master3. Instead, it resolves master1 to 127.0.0.1 and tries to connect there. After a timeout it proceeds to the next master and repeats the procedure: it tries to resolve the name and, if DNS resolution fails, tries 127.0.0.1, and so on. This creates the following issues for us:

  • Unnecessary delays on minion start and salt-call operations when their primary master is down
  • On hosts that run both a minion and a master, this leads to unwanted communication between the local minion and the local master. The local master does not always trust the local minion, and in that case the local minion fails completely if its primary master is unresolvable.

The problem resides in the resolve_dns function in the minion.py module. It has a second parameter, “fallback=True”, that controls fallback to 127.0.0.1 for an address that cannot be resolved. When the minion code loops through the list of configured masters, it calls that function without the second parameter and builds master_uri as tcp://127.0.0.1:4506 for any master that cannot be resolved.
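To make the failure mode concrete, here is a simplified model of the behavior described above. It is not Salt's actual code: `resolve_with_fallback`, `build_master_uris`, and `failing_resolver` are hypothetical names, and the resolver is injected so the example runs without real DNS.

```python
import socket


def resolve_with_fallback(host, resolver=socket.gethostbyname, fallback=True):
    """Simplified stand-in for the described fallback (hypothetical, not Salt's resolve_dns)."""
    try:
        return resolver(host)
    except socket.gaierror:
        if fallback:
            # The silent fallback this issue is about.
            return "127.0.0.1"
        raise


def build_master_uris(masters, port=4506, resolver=socket.gethostbyname):
    # The fallback argument is left at its default, mirroring the call
    # made without the second parameter.
    return [
        "tcp://{}:{}".format(resolve_with_fallback(m, resolver), port)
        for m in masters
    ]


def failing_resolver(host):
    # Simulates a DNS lookup that always fails.
    raise socket.gaierror("cannot resolve " + host)


uris = build_master_uris(["master1", "master2"], resolver=failing_resolver)
print(uris)
```

In this model every unresolvable master ends up pointing at the local host, so each failover attempt targets 127.0.0.1 instead of moving on to a different machine.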

Instead, it should skip such a master altogether and raise an exception if no master can be reached. Alternatively, the “fallback” behavior should be configurable through the minion configuration file.
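A minimal sketch of the requested behavior, assuming the check happens at name-resolution time. This only illustrates the idea and is not the change merged in #57699. `NoResolvableMasterError`, `resolvable_master_uris`, and `fake_dns` are hypothetical names, and the sketch covers only DNS resolution, not whether a resolved master is actually reachable.

```python
import socket


class NoResolvableMasterError(Exception):
    """Hypothetical error raised when none of the configured masters resolves."""


def resolvable_master_uris(masters, port=4506, resolver=socket.gethostbyname):
    uris = []
    for host in masters:
        try:
            address = resolver(host)
        except socket.gaierror:
            # Skip this master instead of falling back to 127.0.0.1.
            continue
        uris.append("tcp://{}:{}".format(address, port))
    if not uris:
        raise NoResolvableMasterError(
            "none of the configured masters could be resolved: {}".format(masters)
        )
    return uris


def fake_dns(host):
    # Stand-in resolver: only master2 resolves in this example.
    table = {"master2": "10.0.0.2"}
    if host not in table:
        raise socket.gaierror("cannot resolve " + host)
    return table[host]


uris = resolvable_master_uris(["master1", "master2", "master3"], resolver=fake_dns)
print(uris)
```

With this approach the order from the minion configuration is preserved, unresolvable entries add no connection timeout, and a minion with no resolvable master fails loudly rather than talking to a local master.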

Setup
Add the following configuration to your minion configuration:

# master failover configuration
master:
- master1
- master2
- master3
master_failback: False
random_master: False
master_shuffle: False
master_type: failover
master_alive_interval: 600
# master_type failover wants retry_dns 0
retry_dns: 0

Expected behavior
Masters should be attempted in order specified in minion configuration.

Versions Report

Salt versions: 2018.3.2, 3000.2
@doesitblend doesitblend added Bug broken, incorrect, or confusing behavior ZD The issue is related to a Zendesk customer support ticket. CS-R2 labels Jun 12, 2020
@doesitblend
Contributor Author

ZD-5282

@sergeyfd
Contributor

Thanks for creating it.

@sagetherage sagetherage added this to the Approved milestone Jun 15, 2020
@sagetherage sagetherage added the severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around label Jun 15, 2020
@sagetherage sagetherage added this to Planning in Magnesium Jun 15, 2020
@sagetherage sagetherage added the Magnesium Mg release after Na prior to Al label Jun 15, 2020
@sagetherage sagetherage moved this from Planning to Commit in Magnesium Jun 24, 2020
@sagetherage sagetherage moved this from Commit to In progress in Magnesium Jul 14, 2020
@sagetherage sagetherage modified the milestones: Approved, Magnesium Jul 14, 2020
Magnesium automation moved this from In progress to Done Oct 8, 2020