Skip to content

Bugfix: manage.down crash#48447

Merged
dwoz merged 16 commits intosaltstack:developfrom
isbm:isbm-manage.down-crash
Mar 5, 2019
Merged

Bugfix: manage.down crash#48447
dwoz merged 16 commits intosaltstack:developfrom
isbm:isbm-manage.down-crash

Conversation

@isbm
Copy link
Copy Markdown
Contributor

@isbm isbm commented Jul 5, 2018

What does this PR do?

Bugfix.

Previous Behavior

  • Stop master
  • salt-run manage.<anything>
  • Result:
Exception occurred in runner manage.down: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/salt/client/mixins.py", line 387, in _low
    data['return'] = self.functions[fun](*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/salt/runners/manage.py", line 191, in down
    ret = status(output=False, tgt=tgt, tgt_type=tgt_type).get('down', [])
  File "/usr/lib/python2.7/site-packages/salt/runners/manage.py", line 111, in status
    ret['up'], ret['down'] = _ping(tgt, tgt_type, timeout, gather_job_timeout)
  File "/usr/lib/python2.7/site-packages/salt/runners/manage.py", line 43, in _ping
    pub_data = client.run_job(tgt, 'test.ping', (), tgt_type, '', timeout, '', listen=True)
  File "/usr/lib/python2.7/site-packages/salt/client/__init__.py", line 347, in run_job
    raise SaltClientError(general_exception)
SaltClientError: Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.

New Behavior

Result:

$ salt-run manage.up
General error occurred:

    Salt request timed out. The master is not responding. You may need
    to run your command with `--async` in order to bypass the
    congested event bus. With `--async`, the CLI tool will print the
    job id (jid) and exit immediately without listening for responses.
    You can then use `salt-run jobs.lookup_jid` to look up the results
    of the job in the job cache later.

$ echo $?
70

Tests written?

No

@isbm isbm requested a review from a team as a code owner July 5, 2018 11:48
@ghost ghost self-requested a review July 5, 2018 11:48
Copy link
Copy Markdown
Contributor

@rallytime rallytime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good except for one PY2/3 compatibility comment.

Comment thread salt/client/mixins.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str() should be six.text_type here for PY3

@rallytime rallytime requested a review from cachedout July 5, 2018 17:21
@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Jul 6, 2018

@rallytime here. You have it! 😉

@rallytime
Copy link
Copy Markdown
Contributor

re-run all

Comment thread salt/client/mixins.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're missing a {0} here, I think? We've got two in .format, but only one {} string.

Copy link
Copy Markdown
Contributor Author

@isbm isbm Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KaiSforza yes, sir! We did. Thanks for spotting this. Fixd.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank the linter for spotting it! Hopefully it'll be better next time!

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Jul 11, 2018

@rallytime should be OK now.

@cachedout
Copy link
Copy Markdown
Contributor

Honestly, I really don't like this much change in the runner, where we've had regression problems in the past, just to cover this single edge case. Can't we just clean up the way the error is displayed in the LocalClient?

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Jul 12, 2018

@cachedout

Honestly, I really don't like this much change in the runner

But there are no changes actually. All I did is this:

  • Runner's .run() was doing conceptually two things at once, so I just removed if/else into two functions, so it is clearer now.

  • "just clean up the way the error is displayed in the LocalClient", as you say.

  • Add a generic text function that formats crashes like that all over the place. Useful to format it in all scripts, not just here. I am still working on CLI tools like these, because we have the same thing doing in different homes instead of keeping these sort of functions in one utils home.

So I don't expect any regressions. Instead I am trying to bring scattered bits into their "homes". It is not finished though, because other PRs are still pending and blocking it.

@cachedout
Copy link
Copy Markdown
Contributor

@isbm Thanks. Makes sense. What PRs are blocking this?

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Jul 20, 2018

@cachedout sorry for the mis-wording this: not an actual PRs, but the further moves. I was actually fixing the runner issue, but text formatter function supposed to be intermediary that I will refactor it seamlessly in my further PRs that are related here. Sometimes it just cannot go in parallel...

Comment thread salt/client/mixins.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to log the exception here given that we're returning an error to the user? We've never really standardized on one behavior over the other.

@isbm isbm force-pushed the isbm-manage.down-crash branch from 067ae52 to c524316 Compare July 23, 2018 08:27
@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Jul 23, 2018

Do we really need to log the exception here given that we're returning an error to the user? We've never really standardized on one behavior over the other.

@cachedout I personally think we have to return errors to the logs in any case. Because when you want to troubleshoot a production system, you usually take logs. And have all that recorded gives you a better view what user is trying to do. Besides, the exception originally only assumes that the error is wrong parameters. In fact, it can also be that the function is broken and something else might happen.

But I've changed this as you wish.

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Jul 23, 2018

@cachedout BTW, speaking of standard here would be a very good time, since I would have a take on logging messages to clean them up: lots of misleading and cryptic messages are going to all logging levels we use. So if you would add here your thoughts, that would be a very good timing.

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Jul 23, 2018

@cachedout just now I found it. An example why I think errors should not be assumed, but logged in details, separately, even if this is CLI. On Python 3, psutil is different and so ps.top module is totally broken at the moment. And this is how it looks like at the moment:

$ salt-call --local ps.top 1

Passed invalid arguments: unorderable types: Process() < Process().

Usage:

    Return a list of top CPU consuming processes during the interval.
    num_processes = return the top N CPU consuming processes
    interval = the number of seconds to sample CPU usage over

    CLI Examples:

    .. code-block:: bash

        salt '*' ps.top

        salt '*' ps.top 5 10

The "Passed invalid arguments: unorderable types: Process() < Process()" doesn't look right, because (-l debug) is actually this:

Traceback (most recent call last):
  File "/home/bo/work/salt/salt/cli/caller.py", line 212, in call
    ret['return'] = func(*args, **kwargs)
  File "/home/bo/work/salt/salt/modules/ps.py", line 156, in top
    for idx, (diff, process) in enumerate(reversed(sorted(usage))):
TypeError: unorderable types: Process() < Process()

P.S. I am fixing this too, BTW.

@rallytime
Copy link
Copy Markdown
Contributor

@isbm This has one lint failure.

It is also causing a bunch of related tests to fail. Can you take a look?

@isbm isbm force-pushed the isbm-manage.down-crash branch 2 times, most recently from a41a7c0 to 7e66b81 Compare August 3, 2018 08:14
@cachedout
Copy link
Copy Markdown
Contributor

Apologies, @isbm but we have a merge conflict on this now.

@isbm isbm force-pushed the isbm-manage.down-crash branch from f032b65 to a1e7885 Compare August 21, 2018 12:20
@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Aug 21, 2018

@cachedout done!

@cachedout
Copy link
Copy Markdown
Contributor

@isbm Thanks! There are a couple of DRY suggestions that CodeClimate has here. Do you want to take a look at those? It's not required of course but they're just suggestions.

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Aug 23, 2018

@cachedout but this is not really my code. I would go further refactoring of course, and slicing more, but then you will again say I am touching holy runner. 😉 Maybe let it pass finally as it is for now, and then in next PRs I will get back to this (since we have work around runners too, so could be an opportunity to watering climate down indeed).

@cachedout
Copy link
Copy Markdown
Contributor

@isbm Looks like some tests are unhappy with this change.

@cachedout
Copy link
Copy Markdown
Contributor

@isbm Bump. :)

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Aug 28, 2018

@cachedout yep, moment.

@cachedout
Copy link
Copy Markdown
Contributor

@isbm Thanks. Just ping me when we're ready here. :)

@rallytime
Copy link
Copy Markdown
Contributor

Hi @isbm - is this something you're still working on?

@isbm isbm force-pushed the isbm-manage.down-crash branch from f54ff37 to bf7cc1e Compare October 16, 2018 12:25
@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Oct 16, 2018

@rallytime there was tests issue, I rebased to refresh and see what's still broken.

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Nov 5, 2018

@rallytime this is kind of stuck. What do we do and what are all those errors? 😉

@thatch45
Copy link
Copy Markdown
Contributor

@isbm, what is the status here?

@dwoz dwoz merged commit ae69e87 into saltstack:develop Mar 5, 2019
@austinpapp
Copy link
Copy Markdown
Contributor

@dwoz this needs to be reverted. It's breaking the runner. Every ret is coming back with 70 which according to https://github.com/saltstack/salt/blob/develop/salt/defaults/exitcodes.py#L36 is an internal server error.

I found this by using run_run_plus() when several integration tests under runners were failing. specifically integration/runners/test_jobs.py

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Mar 19, 2019

@austinpapp or change error code here. I am no longer sure why it end up with the EX_SOFTWARE there, as the issue is soon year old.

@austinpapp
Copy link
Copy Markdown
Contributor

@isbm happy to make a change. However, im not sure what the intention of the else is for. as it stands, as long as new exception is raise in the try block, the ret is always returned as { 'retcode': 70}.

@isbm
Copy link
Copy Markdown
Contributor Author

isbm commented Mar 19, 2019

@austinpapp I am not really sure, seems in this commit: fa12414 I am actually doing this. But as much as I remember, I only spitted already existing work. Either code was already there like this, or got changed EX_OK to EX_SOFTWARE for the reasons I do not remember anymore. This was rebased also couple of times, so I would not be surprised that something got mistakenly in.

Nevertheless, seems if the whole this block would go away, that should do it. Because it works only if Exception doesn't happen and this return thus won't be overridden.

Again, apologies for the inconvenience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants