Upload crash dump when crash utility fails #4984

Merged
merged 1 commit into from
May 11, 2018

Contributor

@jknphy jknphy commented May 7, 2018

Upload the crash dump when the crash utility fails. As requested in https://bugzilla.suse.com/show_bug.cgi?id=1090659, we need a way to provide the dump so that it can be analyzed.

my $crash_cmd = "echo exit | crash `ls -1t /var/crash/*/vmcore | head -n1` /boot/vmlinux-`uname -r`$suffix";
assert_script_run "$crash_cmd", 600;
validate_script_output "$crash_cmd", sub { m/PANIC/ }, 600;
if (script_run($crash_cmd)) {
Member

Don't we need to increase the timeout here too?

upload_logs '/tmp/crash_saved.tar' if is_sle('15+');
}
else {
assert_script_run "$crash_cmd", 600;
Member

I guess we can skip this run as we already executed it on L51

record_soft_failure 'boo#1090659 - crash: invalid kernel virtual address: xxxxxxxxx type: yyyyyyyy';
script_run 'ls -lah /boot/';
script_run 'tar -cvf /tmp/crash_saved.tar /var/crash/*';
upload_logs '/tmp/crash_saved.tar' if is_sle('15+');
Member

Why record the soft failure and save the tar file, but not upload it on anything other than SLE 15+?

}
else {
assert_script_run "$crash_cmd", 600;
validate_script_output "$crash_cmd", sub { m/PANIC/ }, 600;
Member

Was $crash_cmd already called twice in the original code? Would it work to call script_run "$crash_cmd |& tee /dev/$serialdev" and then check with wait_serial? I guess not in this order, as the serial output is consumed by script_run when it checks for the exit status. But script_run '…', 0 followed by an explicit wait_serial for both the output and the exit code would probably work, if you want to play around with that, e.g. something like:

script_run("$crash_cmd |& tee /dev/$serialdev; echo crash-status-\$? > /dev/$serialdev", 0);
if (!wait_serial('PANIC', 600)) {
    diag "crash could not find crash file";
    …
    return;
}
wait_serial('crash-status-0') or die "crash did not complete successfully";

@jknphy jknphy force-pushed the fix_toolchain_zypper branch 3 times, most recently from bef5157 to a3a8c33 Compare May 9, 2018 10:37
@jknphy jknphy changed the title Upload crash dump when crash utility fails [WIP] Upload crash dump when crash utility fails May 9, 2018
@jknphy jknphy force-pushed the fix_toolchain_zypper branch 2 times, most recently from 9717aa9 to 916256f Compare May 10, 2018 06:41
Contributor Author

jknphy commented May 10, 2018

The latest code is the only way I found to keep the timeout, evaluate both possible errors, and upload the file. Basically, this is what it does:

  • Run the script and immediately start reading the serial output.
  • Search for "PANIC: ". In my opinion the previous pattern was incomplete, so I extended it: some text needs to appear after "PANIC:" describing how the dump was triggered.
  • Search for the output of the command.
  • Search for both at the same time with one regex; I found this more reliable than searching sequentially.
  • Avoid waiting for the whole timeout by using if instead of unless.
  • Upload the file (reminder: this is the main goal of this task).
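
As a rough illustration of the single-regex idea in the list above, here is a minimal standalone sketch. It is not the code from this PR: the `crash-status-` marker is borrowed from the review suggestion earlier in the thread, and the helper name `crash_output_ok` and the sample text are hypothetical. The pattern requires non-blank text after "PANIC:" on the same line, followed later by a zero exit-status marker.

```perl
use strict;
use warnings;

# One pattern for both conditions: a PANIC line with text after the colon,
# followed later by the zero exit-status marker. The "crash-status-" marker
# is an assumption taken from the review suggestion, not from the merged code.
my $crash_ok_re = qr/PANIC:[ \t]+\S.*?crash-status-0\b/s;

# Returns 1 if the captured serial text satisfies both conditions, else 0.
# (Hypothetical helper, for illustration only.)
sub crash_output_ok {
    my ($serial_text) = @_;
    return 0 unless defined $serial_text;
    return $serial_text =~ $crash_ok_re ? 1 : 0;
}

# Made-up sample text, for illustration only; not real crash output.
my $sample = qq{PANIC: "sysrq triggered crash"\ncrash-status-0\n};
print crash_output_ok($sample) ? "ok\n" : "not ok\n";
```

In a real test module the same pattern could presumably be handed to wait_serial together with the timeout, so that a single wait covers both checks.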

VR: http://dhcp254.suse.cz/tests/1212#step/kdump_and_crash/52

@jknphy jknphy changed the title [WIP] Upload crash dump when crash utility fails Upload crash dump when crash utility fails May 10, 2018
return;
}
else {
record_soft_failure 'boo#1090659 - crash: invalid kernel virtual address';
Member

Hm, this looks a bit generic, because now this branch is hit whenever the wait_serial above did not find PANIC, saw a non-zero exit code, or timed out.
You could check the actual return value of wait_serial in detail.
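
To make that suggestion concrete, here is a small standalone sketch that tells the three failure cases apart. It assumes the captured serial text is passed in as a string, with undef standing for a timeout (as far as I know, wait_serial returns the matched text or undef when the timeout is hit). The function name `classify_crash_result`, the result labels, and the `crash-status-` marker are hypothetical and not part of this PR.

```perl
use strict;
use warnings;

# Classify the captured serial text so each failure mode can get its own
# message. Hypothetical helper; labels and marker are assumptions.
sub classify_crash_result {
    my ($captured) = @_;

    # Nothing captured at all: treat as a timeout.
    return 'timeout' unless defined $captured;

    # Text was captured, but the exit-status marker never showed up.
    my ($status) = $captured =~ /crash-status-(\d+)/;
    return 'no_status' unless defined $status;

    # crash itself reported a failure.
    return 'nonzero_exit' if $status != 0;

    # crash exited cleanly but printed no PANIC line with a reason.
    return 'no_panic' unless $captured =~ /PANIC:[ \t]+\S/;

    return 'ok';
}

# Made-up inputs, for illustration only.
for my $text (undef, "crash-status-1\n", "crash-status-0\n",
    qq{PANIC: "example"\ncrash-status-0\n})
{
    print classify_crash_result($text), "\n";
}
```

With a result like this, the soft failure for boo#1090659 could be recorded only for the case it actually describes, while a timeout or a missing PANIC line fails the test with a more specific message.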

@jknphy jknphy force-pushed the fix_toolchain_zypper branch 2 times, most recently from 0ed865f to 529f578 Compare May 11, 2018 13:34
Contributor Author

jknphy commented May 11, 2018

I realized that on ppc the check fails even though the command finishes, so we can do the upload in post_fail_hook in a cleaner way. We just need to remove the post_fail_hook once the bug is fixed. The code was becoming very cumbersome with the serial output, and this way we still check that crash succeeds and that its output contains PANIC. The post_fail_hook is generic for any kind of problem with crash, so it will be useful in the future.
VR: http://dhcp254.suse.cz/tests/1222#step/kdump_and_crash/65
VR forcing the failure to see how it looks (searching for XPANIC instead, for instance): http://dhcp254.suse.cz/tests/1223#step/kdump_and_crash/65
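
For readers without the diff at hand, the post_fail_hook idea can be sketched roughly as follows. This is not the merged code. The shell commands and the tar path are the ones from the earlier revision in this thread; the `script_run` and `upload_logs` definitions are local stand-ins so that the sketch runs outside openQA (a real test module would `use testapi;` instead); and uploading unconditionally rather than only on SLE 15+ is an assumption.

```perl
use strict;
use warnings;

# Stand-ins for the os-autoinst testapi so the sketch runs outside openQA.
# They only record what would have been executed.
my @calls;
sub script_run  { push @calls, "script_run: $_[0]";  return 0 }
sub upload_logs { push @calls, "upload_logs: $_[0]"; return 1 }

# Runs only when the test module fails: collect what is needed to analyze
# the crash dump and attach it to the job.
sub post_fail_hook {
    my ($self) = @_;
    script_run('ls -lah /boot/');
    script_run('tar -cvf /tmp/crash_saved.tar /var/crash/*');
    upload_logs('/tmp/crash_saved.tar');
}

post_fail_hook();
print "$_\n" for @calls;
```

The main run can then stay a plain pair of checks on $crash_cmd, and the dump is uploaded whichever of them fails.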

@okurz okurz merged commit 210cd47 into os-autoinst:master May 11, 2018
Member

okurz commented May 11, 2018

Nice, much cleaner :)

3 participants