Fix a race condition in manage runner #44958

terminalmage · 2017-12-13T08:49:03Z

The runner was not connecting the client event listener to the event bus. As a result, any events fired between the `client.run_job()` call and the point where `client.get_cli_event_returns()` began polling were lost. Before `get_cli_event_returns` (via LocalClient's `get_iter_returns`) starts polling for events, it first checks whether the job cache has a record of the jid it is querying. With a custom returner for the job cache, even one with only a little latency, return events from minions may slip past before the jid lookup completes, so `get_iter_returns` never returns them and the manage runner assumes the minion did not respond.

Connecting to the event bus before we run the test.ping ensures that we do not miss any of these events.
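To make the ordering problem concrete, here is a minimal toy model of the race, not Salt's actual code. `ToyEventBus`, `ping`, and the event dictionary are all hypothetical names invented for illustration; the only assumption carried over from the description is that a listener sees only events published after it has connected.

```python
class ToyEventBus:
    """Fire-and-forget bus: an event reaches only listeners connected at publish time."""

    def __init__(self):
        self._queues = []

    def connect(self):
        # Each listener gets its own queue; nothing published earlier is replayed.
        queue = []
        self._queues.append(queue)
        return queue

    def publish(self, event):
        for queue in self._queues:
            queue.append(event)


def ping(bus, connect_first):
    """Return the minion returns the caller sees for one simulated test.ping."""
    queue = bus.connect() if connect_first else None
    # The job is published and the (simulated) minion answers right away.
    bus.publish({"id": "minion1", "return": True})
    # A slow job-cache lookup would sit here; the return is already on the bus.
    if queue is None:
        queue = bus.connect()  # too late: the return has already gone by
    return list(queue)


late = ping(ToyEventBus(), connect_first=False)
early = ping(ToyEventBus(), connect_first=True)
print("connect after publish: ", late)
print("connect before publish:", early)
```

In this sketch the late listener sees nothing, while the listener that connects first sees the return, which mirrors why the runner concluded the minion had not responded.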

Resolves #44820

NOTE: You can test this fix by adding a `time.sleep(5)` after the `client.run_job()` call. This introduces enough latency between `run_job` and `get_cli_event_returns` to trigger the behavior. With the connect_pub line commented out, the minion is reported as not responding, even though with debug logging turned on you can clearly see the return come in. With the fix in place, the sleep does not prevent the return event from being processed by `get_cli_event_returns`, and the return from `salt-run manage.status` is as expected.

terminalmage · 2017-12-13T08:51:52Z

@msteed the fix here (as well as the addition of the sleep to aid in confirmation) should be simple enough for you to try on the master you're using for testing purposes.

terminalmage · 2017-12-13T09:06:36Z

Actually I noticed that the run_job func has a listen arg that does the same thing I was doing manually in my initial commit. I just pushed a cleaner fix.
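As a rough illustration of what a `listen` argument on `run_job` buys you, the sketch below uses a stand-in client rather than Salt itself. `ToyClient`, `get_returns`, `responders`, and the job id are hypothetical; only the `run_job` name and the `listen` flag come from this thread, and the real method's other parameters are omitted.

```python
class ToyClient:
    """Hypothetical stand-in for a client; this is not Salt's LocalClient."""

    def __init__(self):
        self._connected = False
        self._seen = []

    def _connect(self):
        self._connected = True

    def _deliver(self, event):
        # Events are dropped unless the listener is already connected.
        if self._connected:
            self._seen.append(event)

    def run_job(self, tgt, fun, listen=False):
        if listen:
            self._connect()  # subscribe before the job goes out
        jid = "20171213000000000000"  # made-up job id
        # Simulate a minion that answers immediately after publication.
        self._deliver({"jid": jid, "id": tgt, "fun": fun, "return": True})
        return {"jid": jid, "minions": [tgt]}

    def get_returns(self, jid):
        self._connect()  # connects lazily if run_job did not
        return [event for event in self._seen if event["jid"] == jid]


def responders(listen):
    client = ToyClient()
    pub_data = client.run_job("minion1", "test.ping", listen=listen)
    return [event["id"] for event in client.get_returns(pub_data["jid"])]


without_listen = responders(listen=False)
with_listen = responders(listen=True)
print(without_listen, with_listen)
```

Having the flag on the call that publishes the job keeps the subscribe-then-publish ordering in one place, so callers do not have to remember a separate connect step.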

msteed · 2017-12-13T19:59:51Z

@terminalmage : Thanks, this works perfectly

terminalmage · 2017-12-13T20:00:45Z

👍

terminalmage added the ZRELEASED - 2017.7.3 and ZRELEASED - 2018.3.0 labels Dec 13, 2017

terminalmage mentioned this pull request Dec 13, 2017

Custom returner breaks manage runner #44820

Closed

terminalmage added 2 commits December 13, 2017 03:05

No need to manually do connect_pub, use listen=True in run_job

ef749ab

terminalmage force-pushed the issue44820 branch from f5add73 to ef749ab December 13, 2017 09:06

cachedout approved these changes Dec 13, 2017

cachedout merged commit dad2d72 into saltstack:2017.7 Dec 13, 2017

terminalmage mentioned this pull request Dec 13, 2017

Backport #44958 to 2016.11 branch #44972

Merged

rallytime pushed a commit that referenced this pull request Dec 14, 2017

Merge pull request #44972 from terminalmage/bp-44958

9a74062

Backport #44958 to 2016.11 branch

terminalmage deleted the issue44820 branch January 5, 2018 19:27
