Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix test_assembler to support parallel runs and run in parallel (HMS-3697) #1641

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented Mar 6, 2024

The test_assembler.py hardcods some filesystem and partition UUIDs. This leads to hard to diagnose test failures when the test is run in parallel. The btrfs and xfs filesystem drivers will see the same uuid for multi created images and error sometimes with someting like:

Mar 06 10:22:54 top kernel: BTRFS error: device /dev/loop104 belongs to fsid aff010e9-df95-4f81-be6b-e22317251033, and the fs is already mounted, scanned by mount (123856)

Its a race that only happens when two images are checked at the same time.

This commit fixes the issue by just using a randomized UUID in the test_assemblers.py. It also re-enables running the test in parallel (which make it run a lot faster, from 34min to 14min).

The `test_assembler.py` hardcods some filesystem and partition
UUIDs. This leads to hard to diagnose test failures when the
test is run in parallel. The btrfs and xfs filesystem drivers
will see the same uuid for multi created images and error sometimes with
someting like:
```
Mar 06 10:22:54 top kernel: BTRFS error: device /dev/loop104 belongs to fsid aff010e9-df95-4f81-be6b-e22317251033, and the fs is already mounted, scanned by mount (123856)
```
Its a race that only happens when two images are checked at the
same time.

This commit fixes the issue by just using a randomized UUID in
the test_assemblers.py. It also re-enables running the test in
parallel (which make it run a lot faster, from 34min to 14min).
Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

@achilleas-k achilleas-k enabled auto-merge (rebase) March 6, 2024 11:27
@ochosi ochosi disabled auto-merge March 6, 2024 11:28
@ochosi ochosi merged commit 1278e5d into osbuild:main Mar 6, 2024
63 of 64 checks passed
mvo5 added a commit to mvo5/osbuild that referenced this pull request Mar 11, 2024
There were special cases in the workflow for only running specific
tests in parallel. However how that the test_assemblers parallel
runs are fixed [0] there is really no need for special cases anymore
and we can just run them all in parallel.

[0] osbuild#1641
supakeen pushed a commit that referenced this pull request Mar 11, 2024
There were special cases in the workflow for only running specific
tests in parallel. However how that the test_assemblers parallel
runs are fixed [0] there is really no need for special cases anymore
and we can just run them all in parallel.

[0] #1641
mvo5 added a commit to mvo5/manifest-db that referenced this pull request May 2, 2024
We get regular failures in the manifest-db run. With the extra
visibility from osbuild#123
it seems the errors we get (sometimes) are related to an xfs mount
```
mount/- (org.osbuild.xfs): mounting /dev/98302bc3-1aa3-4eab-96e9-9f1c6eb539ce/rootlv -> /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/
mount/- (org.osbuild.xfs): already unmounted: /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/
```
which looks similar the UUID clashes we observed in
osbuild/osbuild#1641

Upon inspecting the manifests it seems we have quite a few UUIDs in
our manifests. Those *might* explain the issues that we sometimes
see because the UUIDs is global to the host so running multiple tests
in parallel can lead to clashes when xfs/btrfs assume that the
mount is already done when in fact it was a different loopdevice
that just happend to clash with the UUID.

This commit creates a new UUID for each file where a duplicated
UUID is found (but keeps that UUID inside the same file stable).

We could make this smaller by limiting this to only xfs/btfs which
(AFAIK) are the only systems affected by UUID clashes but my
script is not smart enough for this.
mvo5 added a commit to mvo5/manifest-db that referenced this pull request May 2, 2024
We get regular failures in the manifest-db run. With the extra
visibility from osbuild#123
it seems the errors we get (sometimes) are related to an xfs mount
```
mount/- (org.osbuild.xfs): mounting /dev/98302bc3-1aa3-4eab-96e9-9f1c6eb539ce/rootlv -> /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/
mount/- (org.osbuild.xfs): already unmounted: /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/
```
which looks similar the UUID clashes we observed in
osbuild/osbuild#1641

Upon inspecting the manifests it seems we have quite a few UUIDs in
our manifests. Those *might* explain the issues that we sometimes
see because the UUIDs is global to the host so running multiple tests
in parallel can lead to clashes when xfs/btrfs assume that the
mount is already done when in fact it was a different loopdevice
that just happend to clash with the UUID.

This commit creates a new UUID for each file where a duplicated
UUID is found (but keeps that UUID inside the same file stable).

We could make this smaller by limiting this to only xfs/btfs which
(AFAIK) are the only systems affected by UUID clashes but my
script is not smart enough for this.
mvo5 added a commit to mvo5/manifest-db that referenced this pull request May 2, 2024
We get regular failures in the manifest-db run. With the extra
visibility from osbuild#123
it seems the errors we get (sometimes) are related to an xfs mount
```
mount/- (org.osbuild.xfs): mounting /dev/98302bc3-1aa3-4eab-96e9-9f1c6eb539ce/rootlv -> /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/
mount/- (org.osbuild.xfs): already unmounted: /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/
```
which looks similar the UUID clashes we observed in
osbuild/osbuild#1641

Upon inspecting the manifests it seems we have quite a few UUIDs in
our manifests. Those *might* explain the issues that we sometimes
see because the UUIDs is global to the host so running multiple tests
in parallel can lead to clashes when xfs/btrfs assume that the
mount is already done when in fact it was a different loopdevice
that just happend to clash with the UUID.

This commit creates a new UUID for each file where a duplicated
UUID is found (but keeps that UUID inside the same file stable).

We could make this smaller by limiting this to only xfs/btfs which
(AFAIK) are the only systems affected by UUID clashes but my
script is not smart enough for this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants