Schedule test modules to debug poo#88273 #12791

alvarocarvajald · 2021-06-25T14:52:54Z

The ticket https://progress.opensuse.org/issues/88273 describes a frequent test failure on the QAM TestRepo 12-SP3 HA rolling update tests that requires more investigation.

This PR adds 2 modules that check the integrity of some assets, both on the worker and in the SUT, to verify whether changes to these files could explain the frequent failures.

The issue itself can be described as follows: 2 MM jobs are started from the same qcow2 image in the same job group, to eventually be configured as an HA cluster. Before any HA configuration is done, both SUTs boot from the qcow2, are registered to SCC with migration/register_system and are updated with update/zypper_up; then both SUTs are rebooted. Even though both tests should be identical at this point, one of them boots successfully while the other shows an Invalid Magic Number error in grub while booting and fails, which in turn fails the whole MM job: https://openqa.suse.de/tests/6323075#step/console_reboot/4

The issue is frequent enough in openqa.suse.de to be a concern (https://openqa.suse.de/tests/6323075#next_previous), but the job always passes when cloned into our development environment: http://mango.qa.suse.de/tests/4140#next_previous

Tests were moved from openqaworker10 to openqaworker9 as an initial attempt to fix/gather more details on the failure, but this has not changed the outcome in openqa.suse.de, which is why these changes are now being submitted.

Related ticket: https://jira.suse.com/browse/TEAM-4330 & https://progress.opensuse.org/issues/88273
Needles: N/A
Verification run: http://mango.qa.suse.de/tests/4143 & http://mango.qa.suse.de/tests/4144

This commit adds the info/show_hdd_info test module. The purpose of the module is to record in the test details and in autoinst-log information such as name, size and checksum digest for HDD_# test assets.
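For readers who want to reproduce this kind of report outside of openQA, here is a minimal Python sketch of the idea (not the module itself). The function names `describe_asset` and `describe_hdd_assets` are hypothetical, the settings are assumed to be a plain mapping of variable names to file paths, and MD5 is only an assumed default since the commit message does not name the digest used for HDD assets.

```python
import hashlib
import os


def describe_asset(path, algorithm="md5", chunk_size=1 << 20):
    """Return name, size in bytes and hex digest for one file.

    The algorithm default is an assumption; any name accepted by
    hashlib.new() can be passed instead.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large qcow2 images are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "name": os.path.basename(path),
        "size": os.path.getsize(path),
        "digest": digest.hexdigest(),
    }


def describe_hdd_assets(settings, algorithm="md5"):
    """Describe every file referenced by an HDD_<number> key in a settings mapping."""
    keys = sorted(k for k in settings if k.startswith("HDD_") and k[4:].isdigit())
    return {key: describe_asset(settings[key], algorithm) for key in keys}
```

Running this on the worker and on a known-good copy of the same image gives two records that can be compared field by field.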

This commit adds the console/check_boot_files test module, the purpose of which is to record in the test details MD5 checksums for files in /boot. It will focus exclusively on vmlinu*, initrd*, config*, sysctl* and symver*.
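The same check can be sketched in Python for use on any machine, again as an illustration under assumptions rather than the module's actual code. The file patterns come from the commit message; the names `boot_checksums` and `differing_files` are hypothetical, and the comparison helper is an addition meant to show how the records from two SUTs could be diffed.

```python
import glob
import hashlib
import os

# Patterns taken from the commit message.
BOOT_PATTERNS = ("vmlinu*", "initrd*", "config*", "sysctl*", "symver*")


def md5_of(path):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def boot_checksums(boot_dir="/boot", patterns=BOOT_PATTERNS):
    """Map file name to MD5 digest for every matching regular file in boot_dir."""
    results = {}
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(boot_dir, pattern))):
            # Skip directories; symlinks to regular files are followed.
            if os.path.isfile(path):
                results[os.path.basename(path)] = md5_of(path)
    return results


def differing_files(first, second):
    """Names whose checksum differs, or that exist on only one side."""
    names = first.keys() | second.keys()
    return sorted(n for n in names if first.get(n) != second.get(n))
```

If both SUTs really are identical after update/zypper_up, `differing_files` applied to their two results should return an empty list.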

ricardobranco777

LGTM. Thanks!

okurz

In your git commit message subject s/uptate/update/

But in general, do you actually need this change merged to master, or do you just want to have it temporarily and use http://open.qa/docs/#_triggering_tests_based_on_an_any_remote_git_refspec_or_open_github_pull_request?

alvarocarvajald · 2021-06-28T08:26:43Z

> In your git commit message subject s/uptate/update/

Fixed.

> But in general, do you actually need this change merged to master, or do you just want to have it temporarily and use http://open.qa/docs/#_triggering_tests_based_on_an_any_remote_git_refspec_or_open_github_pull_request?

You have a point: this may not need to be merged to master. Outside of this poo investigation I don't see the new modules being useful, and they are not actually testing anything.

I've added the WIP label so this is not accidentally merged while I go the openqa-clone-custom-git-refspec route. I will remove the label and comment here if I think a merge is necessary.

Two things though:

* Last time I tried `openqa-clone-custom-git-refspec` with MM jobs, the **CASEDIR** setting was only applied to the parent job, which is the support server in this case. If this is still the case, testing this way will not work and either a merge or further tweaks to `openqa-clone-custom-git-refspec` will be necessary, in which case I will propose we merge. Granted, it has been months since the last time I tried this.
* Ideally, tests with `openqa-clone-custom-git-refspec` should fail in the same step (Invalid Magic Number in grub) for their results to be relevant to the investigation. However, I fear they may pass and not give us relevant information.

alvarocarvajald · 2021-06-28T08:33:21Z

Testing with `openqa-clone-custom-git-refspec --clone-job-args="--host openqa.suse.de --clone-children"`:

CASEDIR is only set in the support server, so this will not do anything :(

okurz · 2021-06-28T08:34:53Z

> Two things though:
>
> * Last time I tried `openqa-clone-custom-git-refspec` with MM jobs, the **CASEDIR** setting was only applied to the parent job, which is the support server in this case. If this is still the case, testing this way will not work and either a merge or further tweaks to `openqa-clone-custom-git-refspec` will be necessary, in which case I will propose we merge. Granted, it has been months since the last time I tried this.

Test coverage for multi-machine features is still far inferior to that for single-machine operations, so there might still be problems. At least openqa-clone-job can handle multi-machine jobs, and openqa-clone-custom-git-refspec uses it, so it might work; I can't promise though. As an alternative you can also post jobs or isos with the parameter CASEDIR pointing to a git repo. My idea behind openqa-clone-custom-git-refspec was merely a simple wrapper to parse the git branch behind GitHub pull requests, but if you want to trigger with a simple git repo pointer you don't need more than the test variable CASEDIR, see the description in https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc

alvarocarvajald · 2021-06-28T09:29:10Z

Tried with:

openqa-clone-job --host openqa.suse.de --clone-children https://openqa.suse.de/tests/6342760 CASEDIR=https://github.com/alvarocarvajald/os-autoinst-distri-opensuse.git#poo#88273

But CASEDIR is only set on the parent job as well.

I expected command line settings from openqa-clone-job to get to all cloned tests. Will keep trying to figure out a way to do this.

okurz · 2021-06-28T10:20:22Z

> I expected command line settings from openqa-clone-job to get to all cloned tests. Will keep trying to figure out a way to do this.

Add --parental-inheritance

alvarocarvajald · 2021-06-28T10:27:50Z

> Add --parental-inheritance

That did the trick. Thanks!
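For anyone scripting this later, a small Python sketch of how the resulting command line could be assembled. It only combines the options already mentioned in this thread; the helper name `clone_command` is hypothetical, and the position of `--parental-inheritance` among the other options is an assumption.

```python
import shlex


def clone_command(job_url, casedir, host="openqa.suse.de"):
    """Build an openqa-clone-job argument list for a multi-machine job.

    Uses only the options discussed in this thread; their relative
    order is an assumption.
    """
    return [
        "openqa-clone-job",
        "--host", host,
        "--clone-children",
        # Suggested above so that settings such as CASEDIR are not
        # limited to the parent job.
        "--parental-inheritance",
        job_url,
        f"CASEDIR={casedir}",
    ]


if __name__ == "__main__":
    command = clone_command(
        "https://openqa.suse.de/tests/6342760",
        "https://github.com/alvarocarvajald/os-autoinst-distri-opensuse.git#poo#88273",
    )
    # shlex.join quotes arguments containing '#' so the line can be pasted into a shell.
    print(shlex.join(command))
```

Keeping the arguments as a list also allows passing them directly to `subprocess.run` without going through a shell.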

alvarocarvajald · 2021-06-29T09:54:36Z

https://openqa.suse.de/tests/6345324

https://openqa.suse.de/tests/6345325

https://openqa.suse.de/tests/6345326

We have some results there, so I don't think there's any need to merge this into master anymore.

Closing the PR.

alvarocarvajald requested a review from juadk June 25, 2021 14:53

alvarocarvajald added 2 commits June 25, 2021 16:58

Add test module that reports information of HDD assets

5274733

This commit adds the info/show_hdd_info test module. The purpose of the module is to record in the test details and in autoinst-log information such as name, size and checksum digest for HDD_# test assets.

Add module to display MD5 checksums for files in /boot

0d40d4a

This commit adds the console/check_boot_files test module, the purpose of which is to record in the test details MD5 checksums for files in /boot. It will focus exclusively on vmlinu*, initrd*, config*, sysctl* and symver*.

alvarocarvajald force-pushed the poo#88273 branch from 6784cd8 to c7428eb Compare June 25, 2021 14:59

ricardobranco777 approved these changes Jun 25, 2021

okurz requested changes Jun 25, 2021

alvarocarvajald added the WIP Work in progress label Jun 28, 2021

Add debug modules into QAM 12-SP3 rolling update tests

fcf0cd6

alvarocarvajald force-pushed the poo#88273 branch from c7428eb to fcf0cd6 Compare June 28, 2021 08:17

alvarocarvajald requested a review from okurz June 28, 2021 08:26

alvarocarvajald closed this Jun 29, 2021

alvarocarvajald deleted the poo#88273 branch March 4, 2024 16:37
