Schedule test modules to debug poo#88273 #12791

Closed
wants to merge 3 commits into from

alvarocarvajald
Contributor

The ticket https://progress.opensuse.org/issues/88273 describes a frequent test failure on the QAM TestRepo 12-SP3 HA rolling update tests that requires more investigation.

This PR adds two modules that check the integrity of some assets, both on the worker and in the SUT, to verify whether changes to these files could explain the frequent failures.

The issue itself can be described as follows: 2 MM jobs are started from the same qcow2 image in the same job group, to eventually be configured as an HA cluster. Before any HA configuration is done, both SUTs boot from the qcow2, are registered to SCC with migration/register_system and are updated with update/zypper_up. Then both SUTs are rebooted. Even though by this time both tests should be identical, one of them is seen booting successfully, while the other shows an Invalid Magic Number error in grub while booting and fails the test, which fails the whole MM job: https://openqa.suse.de/tests/6323075#step/console_reboot/4

The issue is frequent enough on openqa.suse.de to be a concern (https://openqa.suse.de/tests/6323075#next_previous), but cloning the job into our development environment shows the test always passing: http://mango.qa.suse.de/tests/4140#next_previous

Tests were moved from openqaworker10 to openqaworker9 as an initial attempt to fix/gather more details on the failure, but this has not changed the outcome in openqa.suse.de, which is why these changes are now being submitted.

This commit adds the info/show_hdd_info test module. The purpose of the
module is to record in the test details and in autoinst-log information
such as name, size and checksum digest for HDD_# test assets.

This commit adds the console/check_boot_files test module. Its purpose
is to record in the test details MD5 checksums for files in /boot. It
focuses exclusively on vmlinu*, initrd*, config*, sysctl* and symver*.
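The kind of information the two modules record can be sketched outside openQA as follows. This is only an illustration in Python, not the code from this PR (the actual modules are openQA test modules). The file patterns come from the commit message above; the helper names `file_digest`, `describe_asset`, `boot_file_checksums` and `differing_files` are hypothetical.

```python
import hashlib
from pathlib import Path

# Patterns named in the commit message; everything else here is illustrative.
BOOT_PATTERNS = ("vmlinu*", "initrd*", "config*", "sysctl*", "symver*")


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to cope with large images."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_asset(path, algorithm="md5"):
    """Collect name, size in bytes and checksum for one asset (hypothetical helper)."""
    path = Path(path)
    return {
        "name": path.name,
        "size": path.stat().st_size,
        "digest": file_digest(path, algorithm),
    }


def boot_file_checksums(boot_dir="/boot"):
    """Map each matching regular file in boot_dir to its MD5 checksum."""
    boot = Path(boot_dir)
    matches = {p for pattern in BOOT_PATTERNS for p in boot.glob(pattern) if p.is_file()}
    return {p.name: file_digest(p) for p in sorted(matches)}


def differing_files(first, second):
    """Return names whose checksum differs or that exist in only one mapping."""
    names = first.keys() | second.keys()
    return sorted(name for name in names if first.get(name) != second.get(name))
```

Running `boot_file_checksums()` on both cluster nodes after `update/zypper_up` and passing the two results to `differing_files` would list any /boot file that diverged between the node that boots and the node that fails.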
Contributor

@ricardobranco777 ricardobranco777 left a comment

LGTM. Thanks!

Member

@okurz okurz left a comment

In your git commit message subject s/uptate/update/

But in general, do you actually need this change merged to master, or do you just want to have it temporarily and use http://open.qa/docs/#_triggering_tests_based_on_an_any_remote_git_refspec_or_open_github_pull_request instead?

@alvarocarvajald alvarocarvajald added the WIP Work in progress label Jun 28, 2021
@alvarocarvajald
Contributor Author

alvarocarvajald commented Jun 28, 2021

In your git commit message subject s/uptate/update/

Fixed.

But in general, do you actually need this change merged to master, or do you just want to have it temporarily and use http://open.qa/docs/#_triggering_tests_based_on_an_any_remote_git_refspec_or_open_github_pull_request instead?

You have a point that this may not be required to be merged to master, as outside of this poo investigation, I don't see how useful the new modules would be and they are not actually testing anything.

I've added the WIP label so this is not accidentally merged while I go the openqa-clone-custom-git-refspec route, and will remove and comment if I think a merge is necessary.

Two things though:

  • Last time I tried openqa-clone-custom-git-refspec with MM jobs, the CASEDIR setting was only applied to the parent job, which is the support server in this case. If this is still the case, testing this way will not work and either a merge or further tweaks to openqa-clone-custom-git-refspec will be necessary, in which case I will propose we merge. Granted, it has been months since the last time I tried this.
  • Ideally, tests with openqa-clone-custom-git-refspec should fail in the same step (Invalid Magic Number in grub) for their results to be relevant to the investigation. However, I fear they may pass and not give us relevant information.

@alvarocarvajald
Contributor Author

Testing with openqa-clone-custom-git-refspec --clone-job-args="--host openqa.suse.de --clone-children":

CASEDIR is only set on the support server, so this will not do anything :(

@okurz
Member

okurz commented Jun 28, 2021

Two things though:

* Last time I tried `openqa-clone-custom-git-refspec` with MM jobs, the **CASEDIR** setting was only applied to the parent job, which is the support server in this case. If this is still the case, testing this way will not work and either a merge or further tweaks to `openqa-clone-custom-git-refspec` will be necessary, in which case I will propose we merge. Granted, it has been months since the last time I tried this.

Test coverage for multi-machine features is still far inferior to that for single-machine operations, so there might still be problems. At least openqa-clone-job can handle multi-machine jobs, and openqa-clone-custom-git-refspec uses it, so it might work; I can't promise though. As an alternative you can also post jobs or isos with the parameter CASEDIR pointing to a git repo. My idea behind openqa-clone-custom-git-refspec was merely a simple wrapper to parse the git branch behind a GitHub pull request, but if you want to trigger with a simple git repo pointer you don't need more than the test variable CASEDIR; see the description in https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc

@alvarocarvajald
Contributor Author

Tried with:

openqa-clone-job --host openqa.suse.de --clone-children https://openqa.suse.de/tests/6342760 CASEDIR=https://github.com/alvarocarvajald/os-autoinst-distri-opensuse.git#poo#88273

But CASEDIR is set only on the parent job here as well.

I expected command line settings from openqa-clone-job to get to all cloned tests. Will keep trying to figure out a way to do this.

@okurz
Member

okurz commented Jun 28, 2021

I expected command line settings from openqa-clone-job to get to all cloned tests. Will keep trying to figure out a way to do this.

Add --parental-inheritance

@alvarocarvajald
Contributor Author

Add --parental-inheritance

That did the trick. Thanks!
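For reference, the combination that worked can be sketched as a small Python helper that only assembles and prints the command line. The job URL and CASEDIR value are the ones from the attempt above; the helper name `build_clone_command` and the exact position of `--parental-inheritance` among the options are assumptions, since the final command was not posted in this thread.

```python
import shlex


def build_clone_command(job_url, casedir, host="openqa.suse.de"):
    """Assemble an openqa-clone-job call that passes CASEDIR to parent and children."""
    return [
        "openqa-clone-job",
        "--host", host,
        "--clone-children",
        # Without this flag, CASEDIR given on the command line only reached the parent job.
        "--parental-inheritance",
        job_url,
        f"CASEDIR={casedir}",
    ]


command = build_clone_command(
    "https://openqa.suse.de/tests/6342760",
    "https://github.com/alvarocarvajald/os-autoinst-distri-opensuse.git#poo#88273",
)
# Print a shell-quoted command line for copy and paste instead of running it.
print(shlex.join(command))
```

Building the argument list separately from running it keeps the sketch free of side effects; passing `command` to `subprocess.run` would be the obvious next step on a machine with openQA client tools installed.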

@alvarocarvajald
Contributor Author

We have some results there, so I don't think there's any need to merge this into master anymore.

Closing the PR.

@alvarocarvajald alvarocarvajald deleted the poo#88273 branch March 4, 2024 16:37
Labels
WIP Work in progress
3 participants