Look for problems in the system #1713

Merged
merged 1 commit into from Aug 23, 2016

@mnowaksuse mnowaksuse commented Aug 15, 2016

POO#11930

This new sub problem_detection() detects problems in the installed
system and uploads the logs to the worker. Currently it looks into the systemd
journal, coredumpctl, systemctl et al. for core dumps, signs of Python
tracebacks, general problems in the journal, etc. (Other proposals
welcome.)
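
As a rough illustration of the kind of checks described above, here is a minimal sketch in plain shell. It is not the code from this PR (which is a Perl sub using the os-autoinst test API); the helper names `collect` and `scan_log` and the grep patterns are assumptions made purely for illustration.

```shell
# Hypothetical helpers; names and patterns are illustrative, not taken from the PR.

# Run a command, show its output and keep a copy in a log file.
# A failing command must not abort the collection, so always return 0.
collect() {
    logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
    return 0
}

# Print the lines of a saved log that hint at a problem:
# Python tracebacks or processes that dumped core.
scan_log() {
    grep -E 'Traceback \(most recent call last\)|dumped core' "$1"
}
```

A caller could then run, for example, `collect /tmp/journal.log journalctl --no-pager -b` followed by `scan_log /tmp/journal.log`, and collect the output of `coredumpctl list` and `systemctl --failed` in the same way.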

I implemented POO#11930 not as an enhancement to the textinfo.pm test but as
a function which might be called from different places (the installed system
and even the installer itself), and multiple times. At the moment I execute it from
console/consoletest_setup.pm and x11/shutdown.pm. The problem with the current
approach is that it adds 5-6 minutes to the test suite per execution.

Verification run: http://assam.suse.cz/tests/2678#downloads.

@mnowaksuse mnowaksuse added the WIP Work in progress label Aug 15, 2016
@okurz
Member

okurz commented Aug 15, 2016

Interesting approach. I like the idea in general. What I am not so sure about is that you are calling this in "consoletest_setup", which is part of a lot of scenarios and will further slow down execution in cases where everything is just fine. Also, as the complexity of your problem analysis increases, jobs can fail in there as well. I suggest separating more clearly what is "setup", what is "gather generic information", and what is "do analysis of a real problem".

For a start, maybe even in the current state you can move the log gathering out of "consoletest_setup" and make that module just what it says, a "setup", e.g. only setting up the serial port and consoles, making sure packagekit is stopped, etc.

The sub problem_detection by itself looks good, but can we start by calling it in post_fail_hook and discuss later where else it should be called, even in the case of passing test modules?
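
To make the suggestion above concrete: the idea is to pay for the slow analysis only when a step has actually failed. The real `post_fail_hook` is a Perl method in the test framework; the shell sketch below only mirrors the idea, and `run_step` is a hypothetical name invented for this example.

```shell
# Run a test step; invoke the (potentially slow) analysis hook only on failure.
# Usage: run_step HOOK COMMAND [ARGS...]
run_step() {
    hook=$1
    shift
    # Fast path: the step passed, so no analysis is needed.
    "$@" && return 0
    # Remember the failing status before the hook overwrites $?.
    status=$?
    "$hook"
    return "$status"
}
```

With this shape, a passing step costs nothing extra, and the step's own exit status is preserved even if the hook itself succeeds.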

@mnowaksuse
Author

http://assam.suse.cz/tests/2774#step/consoletest_setup/31

Removed some duplication from console/consoletest_setup and moved the analysis to export_logs(). problem_detection() is now called only from export_logs(), but it can also be called from tests where needed. The analysis in problem_detection() is now a bit faster, as its result is packed and uploaded only once.

Still:

  1. Calling save_and_upload_log() feels overcomplicated.
  2. My feeling is that once a core dump is found on the system, the reviewer should be notified, as it's probably something we want fixed anyway. But using assert_script_run() just ends the sub (as it should): http://assam.suse.cz/tests/2777#step/consoletest_setup/57.
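
The two ideas above, packing the analysis output once and flagging a core dump without aborting, could look roughly like this in plain shell. This is a sketch under assumptions, not the PR's implementation: `pack_logs` and `check_coredumps` are hypothetical names, the matched string is illustrative, and the actual upload step is left out.

```shell
# Pack every file in a log directory into a single archive,
# so that only one upload is needed afterwards.
pack_logs() {
    logdir=$1
    archive=$2
    tar -czf "$archive" -C "$logdir" .
}

# Report core dumps recorded in a previously saved log.
# Returns 1 when something was found so the caller can flag it for review,
# but leaves the decision to continue or stop to the caller.
check_coredumps() {
    if grep -q 'dumped core' "$1"; then
        echo "WARNING: core dump recorded in $1"
        return 1
    fi
    return 0
}
```

A caller that wants a notification rather than a hard stop could write `check_coredumps "$logdir/journal.log" || found=1` and carry on collecting the remaining logs before calling `pack_logs`.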

}
else {
    script_run("$cmd | tee $file");
}
Member

Perl beginner question: since $args should always exist as a variable, does it work to just write

script_run("$cmd | tee $file", $args->{timeout});

in any case without getting a warning?

Author

It actually works as use warnings is not present. Thanks for the hint.

@okurz
Member

okurz commented Aug 19, 2016

can't access your machine, is it down?

@mnowaksuse
Author

> can't access your machine, is it down?

It's back online now.

@mnowaksuse
Author

@@ -70,16 +70,14 @@ sub run() {
script_run "ps axf > /tmp/psaxf.log";
script_run "cat /proc/loadavg > /tmp/loadavg_consoletest_setup.txt";

# Just after the setup: let's see the network configuration
Member

We also need it in each successful run; please do not remove it or upload it only when the test failed.

@okurz
Member

okurz commented Aug 22, 2016

LGTM

wanna remove the WIP label?

@mnowaksuse mnowaksuse removed the WIP Work in progress label Aug 23, 2016
@nilxam
Member

nilxam commented Aug 23, 2016

LGTM

@nilxam nilxam merged commit d8042cc into os-autoinst:master Aug 23, 2016