-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: fix test_assembler to support parallel runs and run in parallel (HMS-3697) #1641
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The `test_assembler.py` hardcods some filesystem and partition UUIDs. This leads to hard to diagnose test failures when the test is run in parallel. The btrfs and xfs filesystem drivers will see the same uuid for multi created images and error sometimes with someting like: ``` Mar 06 10:22:54 top kernel: BTRFS error: device /dev/loop104 belongs to fsid aff010e9-df95-4f81-be6b-e22317251033, and the fs is already mounted, scanned by mount (123856) ``` Its a race that only happens when two images are checked at the same time. This commit fixes the issue by just using a randomized UUID in the test_assemblers.py. It also re-enables running the test in parallel (which make it run a lot faster, from 34min to 14min).
supakeen
approved these changes
Mar 6, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<3
ochosi
approved these changes
Mar 6, 2024
achilleas-k
approved these changes
Mar 6, 2024
mvo5
added a commit
to mvo5/osbuild
that referenced
this pull request
Mar 11, 2024
There were special cases in the workflow for only running specific tests in parallel. However how that the test_assemblers parallel runs are fixed [0] there is really no need for special cases anymore and we can just run them all in parallel. [0] osbuild#1641
supakeen
pushed a commit
that referenced
this pull request
Mar 11, 2024
There were special cases in the workflow for only running specific tests in parallel. However how that the test_assemblers parallel runs are fixed [0] there is really no need for special cases anymore and we can just run them all in parallel. [0] #1641
mvo5
added a commit
to mvo5/manifest-db
that referenced
this pull request
May 2, 2024
We get regular failures in the manifest-db run. With the extra visibility from osbuild#123 it seems the errors we get (sometimes) are related to an xfs mount ``` mount/- (org.osbuild.xfs): mounting /dev/98302bc3-1aa3-4eab-96e9-9f1c6eb539ce/rootlv -> /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/ mount/- (org.osbuild.xfs): already unmounted: /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/ ``` which looks similar the UUID clashes we observed in osbuild/osbuild#1641 Upon inspecting the manifests it seems we have quite a few UUIDs in our manifests. Those *might* explain the issues that we sometimes see because the UUIDs is global to the host so running multiple tests in parallel can lead to clashes when xfs/btrfs assume that the mount is already done when in fact it was a different loopdevice that just happend to clash with the UUID. This commit creates a new UUID for each file where a duplicated UUID is found (but keeps that UUID inside the same file stable). We could make this smaller by limiting this to only xfs/btfs which (AFAIK) are the only systems affected by UUID clashes but my script is not smart enough for this.
mvo5
added a commit
to mvo5/manifest-db
that referenced
this pull request
May 2, 2024
We get regular failures in the manifest-db run. With the extra visibility from osbuild#123 it seems the errors we get (sometimes) are related to an xfs mount ``` mount/- (org.osbuild.xfs): mounting /dev/98302bc3-1aa3-4eab-96e9-9f1c6eb539ce/rootlv -> /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/ mount/- (org.osbuild.xfs): already unmounted: /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/ ``` which looks similar the UUID clashes we observed in osbuild/osbuild#1641 Upon inspecting the manifests it seems we have quite a few UUIDs in our manifests. Those *might* explain the issues that we sometimes see because the UUIDs is global to the host so running multiple tests in parallel can lead to clashes when xfs/btrfs assume that the mount is already done when in fact it was a different loopdevice that just happend to clash with the UUID. This commit creates a new UUID for each file where a duplicated UUID is found (but keeps that UUID inside the same file stable). We could make this smaller by limiting this to only xfs/btfs which (AFAIK) are the only systems affected by UUID clashes but my script is not smart enough for this.
mvo5
added a commit
to mvo5/manifest-db
that referenced
this pull request
May 2, 2024
We get regular failures in the manifest-db run. With the extra visibility from osbuild#123 it seems the errors we get (sometimes) are related to an xfs mount ``` mount/- (org.osbuild.xfs): mounting /dev/98302bc3-1aa3-4eab-96e9-9f1c6eb539ce/rootlv -> /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/ mount/- (org.osbuild.xfs): already unmounted: /var/lib/osbuild/store/tmp/buildroot-tmp-3y2wyhos/mounts/ ``` which looks similar the UUID clashes we observed in osbuild/osbuild#1641 Upon inspecting the manifests it seems we have quite a few UUIDs in our manifests. Those *might* explain the issues that we sometimes see because the UUIDs is global to the host so running multiple tests in parallel can lead to clashes when xfs/btrfs assume that the mount is already done when in fact it was a different loopdevice that just happend to clash with the UUID. This commit creates a new UUID for each file where a duplicated UUID is found (but keeps that UUID inside the same file stable). We could make this smaller by limiting this to only xfs/btfs which (AFAIK) are the only systems affected by UUID clashes but my script is not smart enough for this.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The
test_assembler.py
hardcods some filesystem and partition UUIDs. This leads to hard to diagnose test failures when the test is run in parallel. The btrfs and xfs filesystem drivers will see the same uuid for multi created images and error sometimes with someting like:Its a race that only happens when two images are checked at the same time.
This commit fixes the issue by just using a randomized UUID in the test_assemblers.py. It also re-enables running the test in parallel (which make it run a lot faster, from 34min to 14min).