Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation #5538

AdamWill · 2023-11-21T23:59:20Z

This PR does three things that overall aim to address the issues discussed in https://pagure.io/fedora-ci/general/issue/436 (it's rather complicated for test systems to trigger tests appropriately for updates, and Fedora CI doesn't do the best job of it right now):

It simplifies the UpdateReadyForTesting message (as a new V3), dropping most of the additional data that was included in the message initially at the request of the CI team in https://pagure.io/fedora-ci/general/issue/70 , but which the CI triggers aren't really using.
It no longer publishes UpdateReadyForTesting when an (RPM-based) update's status is set to testing. Instead:
It publishes UpdateReadyForTesting any time an (RPM-based) update is edited and the builds in it change.
It publishes UpdateReadyForTesting when an (RPM-based) update is created.

Taken together, these changes should allow test systems simply to run tests any time they see an UpdateReadyForTesting message, and get optimal scheduling behavior (tests run on update creation, edit with changed builds, and 'Re-Trigger Tests' request from the web UI). Currently Fedora CI relies solely on the UpdateReadyForTesting messages, but this means it doesn't run tests when an update is created or edited until up to 24 hours later (because UpdateReadyForTesting is only published when the update is pushed to testing). openQA and FCOS CI handle scheduling better, but have to listen to multiple messages and implement complex logic to do it; this will allow us to radically simplify their scheduling code.

bodhi-messages/bodhi/messages/schemas/update.py

AdamWill · 2023-11-22T01:23:02Z

sigh, why is this so hard. okay, i know what's going on, it's just, ugh. will poke some more tomorrow.

AdamWill · 2023-11-22T01:29:12Z

what's going on is, we just basically use the entire Update instance to 'be' the update property of the message (mediated through database model tomfoolery) and the builds attribute of an Update instance is a list of Build model instances, and Build model instances don't have a task_id attribute, so this just isn't working. i wasn't totally grokking that that's really really how this works. i will think about it again tomorrow.

AdamWill · 2023-11-22T07:15:37Z

on the whole, having thought it over a bit, I decided this approach is basically garbage. but I needed to go through it to work that out! gonna try a different approach tomorrow.

AdamWill · 2023-11-24T02:59:02Z

goddamnit, sending a message on update creation turns out to be weirdly difficult!

AdamWill · 2023-11-24T03:14:27Z

sigh, current attempt is going insanely slow as it's attempting to publish messages. working on a workaround for this locally.

AdamWill · 2023-11-25T01:21:04Z

This time?!

AdamWill · 2023-11-25T01:37:24Z

woohoo! Still gonna keep this marked as WIP for now as I need to look over it for anything that needs tidying up, make sure there's enough testing of the new behaviour, and write a changelog note.

AdamWill · 2023-11-25T01:47:58Z

Note: we shouldn't merge this, or at least shouldn't deploy it to production, without merging these PRs for the CI triggers:

@msrb

mattiaverga · 2023-11-29T09:29:33Z

I've just branched 8.0 and tagged this to be backported in it... let me know if the needed changes will land in time or if we should move this to 8.1

BTW can you add a note for the release notes to the PR?

AdamWill · 2023-11-29T16:44:02Z

yeah, sorry, I've just been distracted by a couple of different things the last couple days so didn't get to polishing this. I did mention above that writing a release note is one of the things that needs doing. edit: duh, I see the trigger PRs were merged! yay for that.

See https://pagure.io/fedora-ci/general/issue/436 . This isn't strictly necessary, but it simplifies things and makes them more consistent. We introduce UpdateReadyForTestingV3, which now inherits from UpdateMessage and is more similar again to all the other messages. It drops quite a lot of the extra data that V1 and V2 contained because, AFAICS, the Fedora CI triggers never actually used it. It retains only what they need - a stripped version of the artifact dict with stripped build dicts. Really the triggers only need the task ID and build ID from these build dicts. Signed-off-by: Adam Williamson <awilliam@redhat.com>

AdamWill · 2023-11-29T18:00:16Z

For the historical record here, I'll note #3428 was the original ticket that led to the initial design here (where we publish the message on status change to 'testing').

That ticket states "Updates are manually created in a "Pending" state, where they are signed before they are pushed to a "Testing" where they are tested by the CI system. So we can't trigger the CI system off new updates being created."

I tried to find any further background behind that statement, but couldn't. My best read is that it expects "the CI system" either needs updates to be in updates-testing, or to be signed, before they can be tested.

This is definitely not the case for openQA, and per discussion in https://pagure.io/fedora-ci/general/issue/436 and review of the Fedora CI code, I don't believe it's true for Fedora CI either. AFAICS, like openQA, Fedora CI finds the builds in the update and retrieves them from Koji for testing, which does not require that they are in updates-testing or signed. I believe FCOS CI is similar also.

See https://pagure.io/fedora-ci/general/issue/436 . This is the behaviour test systems want, based on real-world experience. We do not want test systems to wait for updates to reach the updates-testing repo before they test; we want them to test as soon as the update is created. All the test systems that exist pull the builds to be tested from Koji, they do not rely on them being present in updates-testing. Waiting for the update to reach updates-testing before tests run can mean a wait of up to 24 hours, which is obviously not ideal when people want test results fast. It also means we can't gate push to updates-testing on tests that trigger on this message. openQA currently listens out for other messages to use as a proxy for 'update was created' in order to mitigate this problem, but Fedora CI does not do so, and consequently its tests often trigger late. This change, along with the earlier change to publish this message when an update's builds are changed, will allow us to simplify openQA's logic considerably (it can just trigger any time it sees this message) and improve CI's scheduling considerably without any need to make changes to the trigger logic on the CI side. Signed-off-by: Adam Williamson <awilliam@redhat.com>

Test systems should re-test an update if it's edited and its builds change. Before b33609d we could rely on UpdateReadyForTesting being sent (after a delay) when this happened, because editing the builds causes the update to be sent back to pending state and then to testing state, and we were publishing the message any time an update hit testing state. But now we publish the message on update creation instead of when the update reaches testing status, that is no longer the case. So instead we should directly publish UpdateReadyForTesting when an update is edited and the builds change. Signed-off-by: Adam Williamson <awilliam@redhat.com>

Signed-off-by: Adam Williamson <awilliam@redhat.com>

AdamWill · 2023-11-29T18:36:33Z

OK, I think (hope?) this is ready now.

mattiaverga

LGTM!

jlebon · 2024-02-01T19:50:16Z

Do I understand correctly that this is removing the .artifacts.builds[].component field? Was hoping as part of the simplification of coreos/coreos-ci#53 that I'd be able to always rely on that (which previously wasn't in all the messages that we triggered on). Currently using a gnarly regex on the nvr field. I could keep doing that, but... seems like a desirable field to have for anything listening and triggering on these messages?

AdamWill · 2024-02-01T20:05:00Z

yes, it does do that; I wasn't aware you were using it.

You shouldn't need to use a gnarly regex to parse an NVR. RPM does not allow dashes in version or release fields - see the Version and Release paragraphs under Preamble - so you can always just split it from the right. In Python, this should always get the N:

nvr.rsplit("-", 2)[0]

is it difficult to do that in Jenkinsfile syntax?

jlebon · 2024-02-01T20:11:25Z

is it difficult to do that in Jenkinsfile syntax?

Right, exactly. The problem is working within the constraints of the message selector syntax. It's the difference between having the job trigger on every Bodhi message and then have imperative code doing this type of checking and exiting early vs having the messaging plugin do the filtering for us.

jlebon · 2024-02-01T20:17:40Z

yes, it does do that; I wasn't aware you were using it.

To clarify, we're not using it currently. I had plans to use it but it's definitely not a blocker.

Happy to peel this off into an RFE ticket for now.

AdamWill · 2024-02-01T20:19:45Z

Okay, I see.

The behind-the-scenes here is: Bodhi has a really nice, elegant system for producing the update dict in its messages. When constructing the message you literally just pass an instance of its Update class and it all gets serialized for you, mostly magically.

The artifact dict is, uh. Not that! It gets pulled together kind of awkwardly on the fly when publishing just this one specific message which is different from all the other dozen or so messages Bodhi publishes that relate to updates. That function has to exist, anything that publishes this message has to call it, it's just kinda ugly compared to the nice way it works for all the other messages.

In this PR I really actually wanted to throw the artifact dict away entirely, but sadly I decided it was a bit too difficult for now because the CI pipelines use some info from it which isn't easily available otherwise. Your situation is similar, I guess, so we could add it back, I just kinda hate to because it's going in the opposite direction of the one I still do ultimately wish we could go in (get rid of this special dict entirely and have UpdateReadyForTesting be just like all the other messages).

jlebon · 2024-02-05T16:06:24Z

Gotcha. I'm not particularly attached to the artifact dict myself. Could it be added to .update.builds[] instead? We already have .nvr there, so we have that info handy. :)

AdamWill · 2024-02-05T16:15:55Z

possibly, but the issue with that is because of the neat way the message dict for the update is created, adding things to it isn't necessarily straightforward (it can mean a database schema change). I'll try and find a minute to look into how that works in more detail and see what the implications of adding it there are.

AdamWill · 2024-02-05T22:46:43Z

For anyone playing along at home, this is now deployed to prod. jlebon and I have updated the FCOS and openQA schedulers respectively. Fedora CI doesn't need any scheduler changes but should start exhibiting improved behaviour (scheduling tests faster when updates are created or edited).

AdamWill requested a review from a team as a code owner November 21, 2023 23:59

github-advanced-security bot found potential problems Nov 22, 2023

View reviewed changes

bodhi-messages/bodhi/messages/schemas/update.py Fixed Show fixed Hide fixed

AdamWill force-pushed the consistent-test-scheduling-message branch 5 times, most recently from b5fed07 to 08fee8b Compare November 22, 2023 00:31

github-advanced-security bot found potential problems Nov 22, 2023

View reviewed changes

bodhi-messages/bodhi/messages/schemas/update.py Fixed Show fixed Hide fixed

AdamWill force-pushed the consistent-test-scheduling-message branch 2 times, most recently from 47554aa to 06e9113 Compare November 22, 2023 00:48

github-advanced-security bot found potential problems Nov 22, 2023

View reviewed changes

bodhi-messages/bodhi/messages/schemas/update.py Fixed Show fixed Hide fixed

AdamWill force-pushed the consistent-test-scheduling-message branch from 06e9113 to 6c5a0b0 Compare November 22, 2023 00:53

AdamWill force-pushed the consistent-test-scheduling-message branch 10 times, most recently from 6350be1 to defba00 Compare November 24, 2023 02:42

AdamWill force-pushed the consistent-test-scheduling-message branch 3 times, most recently from 78d6cd1 to 6cdf46b Compare November 24, 2023 05:02

AdamWill changed the title ~~WIP DO NOT MERGE add task ID to UpdateReadyForTesting~~ WIP: Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation Nov 25, 2023

mattiaverga added the 8.0-backports Mergify uses this to backport patches to 8.0 label Nov 29, 2023

AdamWill force-pushed the consistent-test-scheduling-message branch from becd8ae to db362b1 Compare November 29, 2023 17:27

AdamWill changed the title ~~WIP: Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation~~ Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation Nov 29, 2023

AdamWill added 3 commits November 29, 2023 10:04

Add a news entry for UpdateReadyForTesting changes

d6c257a

Signed-off-by: Adam Williamson <awilliam@redhat.com>

AdamWill force-pushed the consistent-test-scheduling-message branch from 85b9b53 to d6c257a Compare November 29, 2023 18:08

mattiaverga approved these changes Dec 2, 2023

View reviewed changes

Merge branch 'develop' into consistent-test-scheduling-message

4df3edf

mergify bot merged commit 681119c into fedora-infra:develop Dec 2, 2023
31 checks passed

mergify bot mentioned this pull request Dec 2, 2023

Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation (backport #5538) #5549

Merged

mattiaverga mentioned this pull request Dec 3, 2023

bodhi_update_push_testing gating decision context is never honored #5476

Open

AdamWill mentioned this pull request Dec 7, 2023

Simplify the Bodhi trigger (when Bodhi 8.0 hits production) coreos/coreos-ci#53

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation #5538

Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation #5538

AdamWill commented Nov 21, 2023 •

edited

AdamWill commented Nov 22, 2023

AdamWill commented Nov 22, 2023 •

edited

AdamWill commented Nov 22, 2023

AdamWill commented Nov 24, 2023

AdamWill commented Nov 24, 2023

AdamWill commented Nov 25, 2023

AdamWill commented Nov 25, 2023

AdamWill commented Nov 25, 2023 •

edited

mattiaverga commented Nov 29, 2023

AdamWill commented Nov 29, 2023 •

edited

AdamWill commented Nov 29, 2023

AdamWill commented Nov 29, 2023

mattiaverga left a comment

jlebon commented Feb 1, 2024

AdamWill commented Feb 1, 2024 •

edited

jlebon commented Feb 1, 2024

jlebon commented Feb 1, 2024

AdamWill commented Feb 1, 2024

jlebon commented Feb 5, 2024

AdamWill commented Feb 5, 2024

AdamWill commented Feb 5, 2024

Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation #5538

Simplify UpdateReadyForTesting, publish it on edits that change builds and on update creation #5538

Conversation

AdamWill commented Nov 21, 2023 • edited

AdamWill commented Nov 22, 2023

AdamWill commented Nov 22, 2023 • edited

AdamWill commented Nov 22, 2023

AdamWill commented Nov 24, 2023

AdamWill commented Nov 24, 2023

AdamWill commented Nov 25, 2023

AdamWill commented Nov 25, 2023

AdamWill commented Nov 25, 2023 • edited

mattiaverga commented Nov 29, 2023

AdamWill commented Nov 29, 2023 • edited

AdamWill commented Nov 29, 2023

AdamWill commented Nov 29, 2023

mattiaverga left a comment

Choose a reason for hiding this comment

jlebon commented Feb 1, 2024

AdamWill commented Feb 1, 2024 • edited

jlebon commented Feb 1, 2024

jlebon commented Feb 1, 2024

AdamWill commented Feb 1, 2024

jlebon commented Feb 5, 2024

AdamWill commented Feb 5, 2024

AdamWill commented Feb 5, 2024

AdamWill commented Nov 21, 2023 •

edited

AdamWill commented Nov 22, 2023 •

edited

AdamWill commented Nov 25, 2023 •

edited

AdamWill commented Nov 29, 2023 •

edited

AdamWill commented Feb 1, 2024 •

edited