openqa-investigate: Provide support for multi-machine scenarios #170
Updated the branch from ef4eec2 to 6d73d5b.
Pushed a new commit. It is still WIP and I still need to adjust the tests, but feedback is appreciated. See the commit message for details.
Updated the branch from 904b536 to 37c0869.
It works in production on a simple case, see https://openqa.opensuse.org/tests/2456131#comments. It also correctly skips the investigation on the second run.
It also worked on a failed parallel child, see https://openqa.opensuse.org/tests/2456116#comments. That restarted the cluster correctly (no chained parents, just the cluster, as expected). It also created the comment on the first job in the cluster and edited it later, see https://openqa.opensuse.org/tests/2456115#comments. A second run was also skipped as expected, as was a run on the parent.
(I've manually cancelled all investigation jobs again.) I could still extend the unit tests to cover the whole flow, not just the individual functions.
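The comment-based synchronization can be sketched roughly as follows. This is a hypothetical illustration, not the code from openqa-investigate: it assumes the "first job in the cluster" is the one with the lowest job ID (which matches the example above, where job 2456115 got the comment for the failed child 2456116), and it uses a made-up marker string to recognize an investigation comment. Fetching comments from openQA is left out; the comment texts of the first job are read from stdin.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of syncing a parallel cluster investigation via a
# comment on the first job. Assumptions: the first job is the one with
# the lowest ID, and MARKER is a made-up string, not the real comment text.
MARKER='investigation-in-progress'

# Print the lowest of the given job IDs.
first_job_in_cluster() {
    printf '%s\n' "$@" | sort -n | head -n 1
}

# Succeed if any comment text read from stdin contains MARKER.
cluster_already_investigated() {
    grep -qF -- "$MARKER"
}

# Usage: maybe_investigate JOB CLUSTER_JOB_IDS... < first_job_comments
# Returns 1 (and explains why) if the cluster is already being investigated.
maybe_investigate() {
    local job=$1
    shift
    local first
    first=$(first_job_in_cluster "$@")
    if cluster_already_investigated; then
        echo "Skipping investigation of job $job: job cluster is already being investigated, see comment on job $first"
        return 1
    fi
    # A real implementation would now post MARKER on job $first and clone the cluster.
    echo "Investigating job $job, sync comment goes on job $first"
}
```

For example, `echo 'investigation-in-progress' | maybe_investigate 2456116 2456116 2456115` prints the skip message pointing at job 2456115 and returns 1, while a cluster whose first job has no marker comment proceeds with the investigation.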
It works in production on a simple case, see https://openqa.opensuse.org/tests/2456131#comments. It also skips it correctly on the 2nd run:
```
$ echo 'https://openqa.opensuse.org/tests/2456131' | env exclude_group_regex='foobar' ./openqa-investigate
{"id":285792}
$ echo 'https://openqa.opensuse.org/tests/2456131' | env exclude_group_regex='foobar' ./openqa-investigate
Skipping investigation of job 2456131: job cluster is already being investigated, see comment on job 2456131
```
It also worked on a failed parallel child, see https://openqa.opensuse.org/tests/2456116#comments. That restarted the cluster correctly (no chained parents, just the cluster as expected). It also created the comment on the first job in the cluster and edited it later, see https://openqa.opensuse.org/tests/2456115#comments.
Looking at https://openqa.opensuse.org/tests/2456116#comment-285795, the first job pair is:
- salt-minion:investigate:retry: https://openqa.opensuse.org/t2457333 https://openqa.opensuse.org/t2457334
Of those two, the second is correctly named "opensuse-Tumbleweed-DVD-aarch64-salt-minion:investigate:retry@aarch64" and is not in any job group, but the first still has the original name "opensuse-Tumbleweed-DVD-aarch64-Build20220706-salt-master@aarch64" and is in the original job group and build. That should be avoided because we don't want to pollute production builds.
I suppose I need to enable parental inheritance for that. However, then we have the same problem as with the previous restart approach: we cannot have job-specific settings. For …
I suppose we can also live with using the same … I also moved the previously created jobs out of the group/build.
When checking `openqa.ini` on OSD, I noticed that the hook scripts for incomplete jobs and timeouts only invoke openqa-label-known-issues. So, for consistency with when we currently trigger the investigation, I made the investigate script skip anything but failures.
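A minimal sketch of that result filter, assuming the job result is already known when the script is invoked. The function names and the skip message are hypothetical; `incomplete` and `timeout_exceeded` are left to openqa-label-known-issues as described above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: only failed jobs get investigation jobs. Incomplete
# and timed-out jobs are assumed to be handled by openqa-label-known-issues
# alone, as the hook scripts in openqa.ini do.
should_investigate() {
    case "$1" in
        failed) return 0 ;;
        *) return 1 ;;  # e.g. incomplete, timeout_exceeded, passed
    esac
}

# Usage: investigate_if_failed JOB_ID RESULT
investigate_if_failed() {
    local job=$1 result=$2
    if ! should_investigate "$result"; then
        echo "Skipping investigation of job $job: result is '$result', not 'failed'"
        return 1
    fi
    echo "Investigating job $job"
}
```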
* Instead of only considering parallel parents, just do the investigation for any job with parallel dependencies
* Avoid having to run the investigation script for all job results
* Sync via an openQA comment instead (to avoid running the same investigation twice; abort if a concurrent job already does the investigation of the cluster)
* Use `--max-depth 0` to clone all jobs in the parallel cluster, regardless whether we're starting from a parallel parent or child
    * Has no effect on other dependency types since we're
        * using `--skip-chained-deps` anyways
        * *not* using `--clone-children`
        * still excluding directly chained dependencies
* Write an investigation comment on the job we're actually investigating and on the first job in the cluster (for the synchronization)
* See https://progress.opensuse.org/issues/95783#note-58
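The cloning options from the commit message can be combined as in the following hypothetical dry-run helper. It only prints the `openqa-clone-job` command line instead of running it; the helper name and passing the job URL as the positional argument are assumptions for illustration, and any extra arguments are appended unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints (does not run) the clone command for
# a job in a parallel cluster, using the options named in the commit message.
# Usage: build_clone_cmd JOB_URL [EXTRA_ARGS...]
build_clone_cmd() {
    local job_url=$1
    shift
    # --skip-chained-deps: don't clone chained parents
    # --max-depth 0: clone all jobs in the parallel cluster
    # deliberately no --clone-children
    local -a cmd=(openqa-clone-job --skip-chained-deps --max-depth 0 "$job_url")
    cmd+=("$@")
    echo "${cmd[*]}"
}
```

The design point is what is *not* there: without `--clone-children` and with `--skip-chained-deps`, `--max-depth 0` only widens the clone to the parallel cluster.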
Can you please trigger a new run on o3 with openqa-investigate and investigation jobs?
I did, see the second comment on https://openqa.opensuse.org/tests/2456116#comment-286104. (To be able to trigger it, I temporarily edited the first comment.)
Latest commit: Sync investigation of parallel clusters via openQA comment