
ci: Packit: Run cockpit storage tests in PRs #1161

Merged
martinpitt merged 1 commit into storaged-project:master from martinpitt:run-cockpit-storage on Sep 1, 2023

Conversation

martinpitt
Contributor

martinpitt commented Aug 4, 2023

See https://cockpit-project.org/blog/tmt-cross-project-testing.html


I tested this against my fork in martinpitt#2. That also demonstrates that this "just works" in forks without any magic setup, which makes it easy to try things out and collaborate.
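For context, the test job this PR adds to .packit.yaml looks roughly like the following. This is a minimal sketch, not the literal diff: the `jobs`/`tests`/`targets` keys follow the Packit schema, but the exact target aliases and plan selection are assumptions.

```yaml
# .packit.yaml -- sketch; exact targets and plan name are assumptions
jobs:
  - job: tests
    trigger: pull_request
    # run against the latest stable Fedora and rawhide, per the description
    targets:
      - fedora-latest-stable
      - fedora-rawhide
    # run only the cockpit revdeps plan; removing fedora-rawhide from
    # "targets" above is how you would later opt out of rawhide testing
    tmt_plan: cockpit
```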

When done seriously, this is an ongoing commitment. In recent years the Cockpit team has reported a couple dozen udisks bugs found by our tests; some of them were actual regressions like these:

With this setup, we have a good chance of preventing such regressions from landing.

This approach is fairly new, at least for us in the Fedora world, so let's treat it as an experiment. We do need to talk about commitments and expectations, though. From our side, I propose:

  • We don't expect you to become experts in our tests. When they fail, it would be nice if you could have a quick look at the log and see whether it's something obvious (for example, the screenshot shows an error message, or the journal shows a udisks crash). But in general, we expect that someone from our team will investigate and discuss it with you. The PR where that happened provides a nice place to collect notes.

  • For now, contacting us is manual, i.e. writing a comment like "hey @martinpitt @marusak @mvollmer please have a look at the cockpit failure here". I put that list into the test plan (see the sketch after this list); I'm happy to add it anywhere else where it's easier for you to find. In the future this can hopefully be automated; see RFE: Notify on failure + tag github user(s) along with custom failure message packit/packit-service#1911

  • We don't expect you to block PRs on these tests, especially not if the testing farm has infrastructure problems. Each Monday morning it tends to run into provisioning errors. Just ignore those -- we see them too in our projects, and will usually prod #testing-farm.

  • We would like you to at least give us a chance, on that workday, to look at a failed test (a "real" failure, not an infra one) before you land a PR, so that this all makes sense. If something is urgent and you quick-land, we at least still retain the benefit of a trace of which PR the tests started to fail in, but the damage is possibly already done by then. Note that this isn't a big new commitment from our side: we already spend many hours every week looking at regressions, and it takes a lot more effort to track them down weeks after they happened. So this will actually reduce both our and your time spent hunting down regressions, because the context (the PR) is fresh and small, as opposed to "whatever changed in Fedora in the last 4 weeks".

  • I set this up for Fedora latest and rawhide for the time being. If you think that rawhide has too much churn, feel free to drop it from .packit.yaml, or I can do that here right away if you want to start slow. We enabled rawhide in cockpit projects a while ago, and it does often fail for unrelated reasons. Improving this is pretty much exactly why I started this initiative, and knowing about problems is IMHO still better than ignoring rawhide entirely, but it can get annoying.
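The test plan with the contact list mentioned above would look something like this in tmt terms -- a hedged sketch, assuming a plan file that discovers the tests from the cockpit repository; the file path, test filter, and contact syntax are assumptions, only the contact names come from the list above.

```yaml
# plans/cockpit.fmf -- hypothetical path and content
summary: Run cockpit's storage tests against the udisks build from this PR
contact:
  # whom to ping about failures, as proposed above
  - martinpitt
  - marusak
  - mvollmer
discover:
  # cross-project testing: pull the tests from the cockpit repository
  how: fmf
  url: https://github.com/cockpit-project/cockpit
  test:
    - /test/verify/check-storage
execute:
  how: tmt
```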

Please let me know about any other questions you may have. I'm happy to discuss them here or in a gmeet for higher bandwidth.

Thanks!

@StorageGhoul

Can one of the admins verify this patch?

@martinpitt

This comment was marked as resolved.

@martinpitt
Contributor Author

So f38 is fine; rawhide failed because udisks seems to be uninstallable in the udisks-daily COPR. That may be part of the "too much churn in rawhide" issue from the description, or it may point at an actual bug.

@vojtechtrefny, @tbzatek: I'll leave it at that for now and let you do a round of review and commenting, in particular on the points in the description. Thanks!

@martinpitt
Contributor Author

@vojtechtrefny, @tbzatek: FYI, I'll be on PTO for two weeks from tomorrow on. Please direct questions around this to @marusak and @jelly. Thanks!

@martinpitt
Contributor Author

martinpitt commented Aug 24, 2023

Interesting: I get a consistent LVM failure on F39 and rawhide, while F38 succeeds. Cockpit has so far run the udisks-daily COPR only on Fedora 38 + updates-testing; that's why we didn't see this previously.

That might be exactly the kind of thing I'm looking for with these tests. I'll reproduce and investigate tomorrow; maybe we can prevent a bug from landing in the next release :-) In the meantime this is blocked, moving to draft.

This reproduces easily locally. I prepared a fedora-39 cockpit test VM, where the test works. Then I ran dnf -y copr enable @storage/udisks-daily; dnf -y --setopt=install_weak_deps=False update. That installs exactly the updates from the COPR (udisks and libblockdev) and no other updates from Fedora (my VM is current) -- and with that, the test fails.
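For reference, the reproduction steps as a copy-pasteable block (the commands are taken verbatim from the comment above):

```sh
# enable the daily COPR and update only to its packages (udisks, libblockdev),
# without weak dependencies and with no other Fedora updates on a current VM
dnf -y copr enable @storage/udisks-daily
dnf -y --setopt=install_weak_deps=False update
```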

martinpitt marked this pull request as draft on August 24, 2023
@martinpitt
Contributor Author

Indeed, that regression is already reported as issue #1170.

@martinpitt
Contributor Author

So this is actually exactly what I'd like to achieve -- with this in place, we would have noticed the regression right in PR #1158, which introduced it.

@vojtechtrefny, @tbzatek: So I propose to block this PR until #1170 is fixed, so that you don't start out with red statuses. But could we already discuss this approach in general? I have a bunch of questions in the description that we should agree on. Happy to schedule a gmeet as well!

@martinpitt
Contributor Author

Ah, I was confused -- in fact, #1170 does affect Fedora 38 as well, but @mvollmer had already added a "naughty pattern" (a known-issue failure suppression) for it. I have now applied that to F39/40 as well, so they should succeed now. This is once again unblocked.
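For readers unfamiliar with the term: a "naughty pattern" in cockpit's test infrastructure is a stored snippet of a known failure which the test machinery matches against a test's output, so that the failure is flagged as a known issue instead of breaking the run. A purely hypothetical illustration of the idea -- the file name and pattern below are invented, not the actual entry for #1170:

```
# naughty/fedora-39/1170-lvm-create (invented name; the pattern is
# wildcard-matched against the failing test's output)
Error creating LVM2 logical volume * failed
```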

But it nicely demonstrates that with this revdeps testing we can avoid introducing such regressions in the first place 😁

martinpitt marked this pull request as ready for review on August 25, 2023
@vojtechtrefny
Member

vojtechtrefny left a comment

Looks good to me, thank you.

@martinpitt
Contributor Author

@tbzatek I discussed this mostly with @vojtechtrefny on Slack yesterday (maybe you didn't see those messages? the bridge may not be working?). Please let me know if you have further questions or doubts about the expectations laid out in the description. Thanks!

@tbzatek
Member

tbzatek commented Aug 30, 2023

So this is actually exactly what I'd like to achieve -- with this in place, we would have noticed the regression right in PR #1158, which introduced it.

Exactly, I believe this regression would have been caught right in the iscsi fixes pull request. On the other hand, the failure in the lvm2 module was not covered by udisks' own tests either -- but then, tests will never cover every line of code, at least not at the time of their creation; one expects that some things will simply never fail...

There's another bug report that I believe would have been caught the same way: #1172

So running Cockpit tests is always a great asset.

To be fair, I rarely look at our Jenkins nightly runs, and I also tend to ignore the unstable tests in our PR runs. I do check rawhide, though: even though there's always some known issue there, it is also the first place where regressions introduced by other packages or the kernel show up.

Fixing tests in UDisks is often time consuming, and a failing test does not always mean a problem in real-world use. There are a number of random failures caused by timing or race conditions (often on the test side) or simply by a slow system that can't keep up with uevent processing. In the past couple of years we've invested quite a bit of time into fixing tests and various corner cases -- time that might have been invested elsewhere. It's always a tradeoff between quality and quantity. On an understaffed project like UDisks, however, there's nobody else who would take care of such bugfixing.

I would say: let's try this and see what our habit of checking Cockpit tests turns out to be.

  • We don't expect you to block PRs on these tests, especially not if the testing farm has infrastructure problems. Each Monday morning it tends to run into provisioning errors. Just ignore those -- we see them too in our projects, and will usually prod #testing-farm.

That's fine, I have absolutely no problem with ignoring unknown errors :-)

  • We would like you to at least give us a chance, on that workday, to look at a failed test (a "real" failure, not an infra one) before you land a PR, so that this all makes sense. If something is urgent and you quick-land, we at least still retain the benefit of a trace of which PR the tests started to fail in, but the damage is possibly already done by then. Note that this isn't a big new commitment from our side: we already spend many hours every week looking at regressions, and it takes a lot more effort to track them down weeks after they happened. So this will actually reduce both our and your time spent hunting down regressions, because the context (the PR) is fresh and small, as opposed to "whatever changed in Fedora in the last 4 weeks".

Urgent PRs are quite an exception in UDisks, happening perhaps only shortly before an upcoming release. Backports in Fedora and RHEL are always manual and don't even track upstream releases. Upstream PRs can stay open for a while.

  • I set this up for Fedora latest and rawhide for the time being. If you think that rawhide has too much churn, feel free to drop it from .packit.yaml, or I can do that here right away if you want to start slow. We enabled rawhide in cockpit projects a while ago, and it does often fail for unrelated reasons. Improving this is pretty much exactly why I started this initiative, and knowing about problems is IMHO still better than ignoring rawhide entirely, but it can get annoying.

Rawhide is important. It lets us spot kernel changes that will soon get into RHEL. Block layer, device drivers, filesystem drivers -- there have been various changes that looked innocent from the kernel's point of view but made a big difference in userspace, and for UDisks especially. The same goes for userspace package rebases, e.g. util-linux. Even c9s often gets ahead of rawhide, and it is a good candidate for the most bleeding-edge distro.

@tbzatek
Member

tbzatek commented Aug 30, 2023

Jenkins, ok to test.

@martinpitt
Contributor Author

The integration test results look rather sad, but I suppose these are unrelated? Can they be retried?

@tbzatek
Member

tbzatek commented Aug 31, 2023

The integration test results look rather sad, but I suppose these are unrelated? Can they be retried?

Welcome to our world. Actually, the number of failures this time is rather low.

These might be partially fixed (or silenced, or removed) once #1168 and #1162 are merged. However, tests do fail randomly, and as I said above, not every test failure indicates an issue in the real world. It's also incredibly time consuming to chase every single test failure like these.

@martinpitt
Contributor Author

So is anything still blocking this PR?

@tbzatek
Member

tbzatek commented Sep 1, 2023

Nope, let's go for it!

tbzatek merged commit c640ad7 into storaged-project:master on Sep 1, 2023
20 of 23 checks passed
martinpitt deleted the run-cockpit-storage branch on September 1, 2023
@martinpitt
Contributor Author

🚀 yay! Please holler if you see any trouble with this.
