Reactivated juju-crashdump. #73

Merged
merged 2 commits into from Jan 30, 2017

Conversation

Projects
None yet
2 participants
Collaborator

petevg commented Jan 30, 2017

We skip it if it doesn't exist, but run it if it does.

We also now skip packing the matrix.log and glitch_plan into a tarball,
as that can fail if we are logging while doing it, and it can be handled
on a higher level.

Tweaked tests so that they don't expect matrix tarballs (or crashdump
tarballs.)

@johnsca @kwmonroe @ktsakalozos Testing this is a little tricky. tox -r -e functional works, but won't actually run crashdump. To run crashdump, you need to add it to your PATH, and run matrix manually (or add it to the virtualenv, and run tox without -r). Here's a good matrix invocation:

export PATH=$PATH:/path/to/plugins && mkdir /tmp/foo && matrix -l DEBUG -s raw -d /tmp/foo

Reactivated juju-crashdump.
We skip it if it doesn't exist, but run it if it does.

We also now skip packing the matrix.log and glitch_plan into a tarball,
as that can fail if we are logging while doing it, and it can be handled
on a higher level.

Tweaked tests so that they don't expect matrix tarballs (or crashdump
tarballs.)
Collaborator

petevg commented Jan 30, 2017

There are a couple of flaws in this PR, which I have been discovering as I have been running additional double-check tests, and letting glitch sow chaos.

  1. If there are no machines left, crashdump will throw an Exception. I am tempted to just say that this is part of life with glitch ... we could make crashdump behave better when run against an empty model, though.

  2. If glitch creates a machine as its last action, crashdump can fail with an ssh error, because the machine doesn't exist yet. I'm not 100% certain about the best way to address this, but am poking at it. Here's a glitch plan that will replicate this second issue:

actions:
- action: reboot
  selectors:
  - {application: ubuntu, selector: units}
  - {selector: leader, value: true}
  - {selector: one}
- action: add_unit
  selectors:
  - {application: ubuntu, selector: applications}
- action: kill_juju_agent
  selectors:
  - {application: ubuntu, selector: units}
  - {selector: leader, value: true}
  - {selector: one}
- action: remove_unit
  selectors:
  - {application: ubuntu, selector: units}
  - {selector: leader, value: true}
  - {selector: one}
- action: add_unit
  selectors:
  - {application: ubuntu, selector: applications}

Minor style suggestion on logging, but 👍 on the review regardless.

matrix/utils.py
+ success = await execute_process(cmd, log)
+ log.info("Crashdump result: {}".format(success))
+ except FileNotFoundError as e:
+ log.error(
@johnsca

johnsca Jan 30, 2017

Owner

For the logging, what about something like this:

    try:
        success = await execute_process(cmd, log)
        if success:
            log.info("Crashdump COMPLETE")
        else:
            log.error("Crashdump FAILED")
    except FileNotFoundError as e:
        log.warn("Crashdump SKIPPED (not installed)")

It would also be nice if, should it fail, the stderr is automatically logged as well, rather than having to re-run with -lDEBUG, maybe an option to execute_process?

@petevg

petevg Jan 30, 2017

Collaborator

@johnsca I like it. Will make changes and resubmit.

@johnsca johnsca merged commit c34230a into master Jan 30, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment