Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not fallback to 127.0.0.1 for un-resolvable masters in failover mode #57699

Merged
merged 23 commits into from Oct 8, 2020

Conversation

sergeyfd
Copy link
Contributor

What does this PR do?

Do not fall back to 127.0.0.1 in DNS resolution for master nodes in master_type is `failover'

What issues does this PR fix or reference?

Fixes: #57660

Merge requirements satisfied?

Commits signed with GPG?

No

@sergeyfd sergeyfd requested a review from a team as a code owner June 17, 2020 17:34
@ghost ghost requested review from xeacott and removed request for a team June 17, 2020 17:35
@sagetherage sagetherage added Bug broken, incorrect, or confusing behavior Magnesium Mg release after Na prior to Al ZD The issue is related to a Zendesk customer support ticket. labels Jun 18, 2020
DmitryKuzmenko
DmitryKuzmenko previously approved these changes Jun 22, 2020
@DmitryKuzmenko
Copy link
Contributor

@sergeyfd could you please write a test for your change?

@DmitryKuzmenko DmitryKuzmenko added the Needs-Testcase PR needs test cases written, or the issue is about a bug/feature that needs test cases label Jun 22, 2020
@sergeyfd
Copy link
Contributor Author

Yes, I'll try. For some reason "setup.py test" fails in my setup with import error for saltfactories.

@DmitryKuzmenko
Copy link
Contributor

DmitryKuzmenko commented Jun 22, 2020

@sergeyfd have you tried nox? Like this:

nox -e 'pytest-zeromq-3.7(coverage=False)' -- -v tests/integration/shell/test_minion.py

You also can specify the test you want to run with colons:

nox -e 'pytest-zeromq-3.7(coverage=False)' -- -v tests/integration/shell/test_minion.py::MinionTest::test_exit_sigterm

@sergeyfd
Copy link
Contributor Author

That works, thanks! Will add a couple of unit tests later this week.

@DmitryKuzmenko
Copy link
Contributor

@sergeyfd thank you! Glad to help! Ping me anytime here or in community Slack

@sergeyfd
Copy link
Contributor Author

@DmitryKuzmenko , done. Please take a look. Should have done it from the beginning, the patch was slightly incorrect.

self.assertEqual(opts["master"], "master2")
return MockPubChannel()

io_loop = salt.ext.tornado.ioloop.IOLoop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a tornado testing package that makes things a way easier for testing async code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it deserves it because it doesn't test any async functionality. io_loop here isn't needed here at all, so I removed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DmitryKuzmenko Are we Ok now?

@DmitryKuzmenko
Copy link
Contributor

@sergeyfd thank you for your work! Could you please take a look at the mentioned tornado testing package and update the tests correspondingly? The package is already used in some tests you can just grep AsyncTestCase in tests folder for examples.

DmitryKuzmenko
DmitryKuzmenko previously approved these changes Jul 7, 2020
@DmitryKuzmenko
Copy link
Contributor

I'm good now, thank you!
I've restarted 3 timed out builds. Let's wait for results.

@DmitryKuzmenko
Copy link
Contributor

@sergeyfd it looks timeout failures are related:

[2020-07-07T16:22:54.453Z]          test_master_type_failover (unit.test_minion.MinionTestCase)
[2020-07-07T16:22:55.063Z]        [CPU:0.0%|MEM:26.0%]  ... 16:22:54,414 [salt.loader                                                                                                                                                                   :870 ][CRITICAL] Failed to load grains defined in grain file pending_reboot.pending_reboot in function <function pending_reboot at 0x000001BA9F282CA8>, error:
[2020-07-07T16:22:55.063Z]        Traceback (most recent call last):
[2020-07-07T16:22:55.063Z]          File "c:\users\administrator\appdata\local\temp\kitchen\testing\.nox\runtests-parametrized-3-coverage-true-crypto-none-transport-zeromq\lib\site-packages\win32com\client\dynamic.py", line 89, in _GetGoodDispatch
[2020-07-07T16:22:55.063Z]            IDispatch = pythoncom.connect(IDispatch)
[2020-07-07T16:22:55.063Z]        pywintypes.com_error: (-2147221021, 'Operation unavailable', None, None)
[2020-07-07T16:22:55.063Z]        
[2020-07-07T16:22:55.063Z]        During handling of the above exception, another exception occurred:
[2020-07-07T16:22:55.063Z]        
[2020-07-07T16:22:55.063Z]        Traceback (most recent call last):
[2020-07-07T16:22:55.063Z]          File "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\kitchen\\testing\salt\loader.py", line 859, in grains
[2020-07-07T16:22:55.063Z]            ret = funcs[key](**kwargs)
[2020-07-07T16:22:55.063Z]          File "C:\Users\ADMINI~1\AppData\Local\Temp\kitchen\testing\salt\grains\pending_reboot.py", line 31, in pending_reboot
[2020-07-07T16:22:55.064Z]            return {"pending_reboot": salt.utils.win_system.get_pending_reboot()}
[2020-07-07T16:22:55.064Z]          File "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\kitchen\\testing\salt\utils\win_system.py", line 522, in get_pending_reboot
[2020-07-07T16:22:55.064Z]            if check():
[2020-07-07T16:22:55.064Z]          File "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\kitchen\\testing\salt\utils\win_system.py", line 488, in get_pending_windows_update
[2020-07-07T16:22:55.064Z]            return salt.utils.win_update.needs_reboot()
[2020-07-07T16:22:55.064Z]          File "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\kitchen\\testing\salt\utils\win_update.py", line 1095, in needs_reboot
[2020-07-07T16:22:55.064Z]            obj_sys = win32com.client.Dispatch("Microsoft.Update.SystemInfo")
[2020-07-07T16:22:55.064Z]          File "c:\users\administrator\appdata\local\temp\kitchen\testing\.nox\runtests-parametrized-3-coverage-true-crypto-none-transport-zeromq\lib\site-packages\win32com\client\__init__.py", line 95, in Dispatch
[2020-07-07T16:22:55.064Z]            dispatch, userName = dynamic._GetGoodDispatchAndUserName(dispatch,userName,clsctx)
[2020-07-07T16:22:55.064Z]          File "c:\users\administrator\appdata\local\temp\kitchen\testing\.nox\runtests-parametrized-3-coverage-true-crypto-none-transport-zeromq\lib\site-packages\win32com\client\dynamic.py", line 114, in _GetGoodDispatchAndUserName
[2020-07-07T16:22:55.064Z]            return (_GetGoodDispatch(IDispatch, clsctx), userName)
[2020-07-07T16:22:55.064Z]          File "c:\users\administrator\appdata\local\temp\kitchen\testing\.nox\runtests-parametrized-3-coverage-true-crypto-none-transport-zeromq\lib\site-packages\win32com\client\dynamic.py", line 91, in _GetGoodDispatch
[2020-07-07T16:22:55.064Z]            IDispatch = pythoncom.CoCreateInstance(IDispatch, None, clsctx, pythoncom.IID_IDispatch)
[2020-07-07T16:22:55.064Z]        pywintypes.com_error: (-2147024891, 'Access is denied.', None, None)
[2020-07-07T16:22:55.064Z]        16:22:54,429 [salt.minion                                                                                                                                                                   :651 ][CRITICAL] 'master_type' set to 'failover' but 'retry_dns' is not 0. Setting 'retry_dns' to 0 to failover to the next master on DNS errors.
[2020-07-07T16:53:34.095Z] Cancelling nested steps due to timeout

@DmitryKuzmenko
Copy link
Contributor

Sorry I can't say anything useful by a quick look. I'll return to this later to analyze the failures.

@krionbsd
Copy link
Contributor

krionbsd commented Jul 8, 2020

rerun windows

@krionbsd
Copy link
Contributor

krionbsd commented Jul 8, 2020

re-run windows2016

@krionbsd
Copy link
Contributor

krionbsd commented Jul 8, 2020

re-run windows2019

@sergeyfd
Copy link
Contributor Author

sergeyfd commented Jul 8, 2020

@krionbsd I don't think re-run will help. It constantly times out on these new unit tests in Windows. Don't know why.

@garethgreenaway garethgreenaway removed the Needs-Testcase PR needs test cases written, or the issue is about a bug/feature that needs test cases label Jul 15, 2020
@sagetherage
Copy link
Contributor

We are looking into the Windows tests as they are a problem more globally.

@dwoz dwoz merged commit 6a30ab4 into saltstack:master Oct 8, 2020
@welcome
Copy link

welcome bot commented Oct 8, 2020

Congratulations on your first PR being merged! 🎉

@sergeyfd sergeyfd deleted the failover-fix branch October 22, 2020 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior has-failing-test Magnesium Mg release after Na prior to Al ZD The issue is related to a Zendesk customer support ticket.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Master failover in minion configuration not working as expected
7 participants