add landoscript #1135

bhearsum · 2025-02-13T02:10:16Z

This (fairly hefty) pull request adds a new scriptworker: landoscript. This new scriptworker will be replacing treescript when writes to hg.mozilla.org for Gecko repositories cease to happen.

Because it ultimately fulfills the same use cases as treescript, I've chosen to keep the payload as similar as possible. There are some necessary changes (mostly around using lando repository names instead of references to hg.mozilla.org repositories). I've also removed support for some things that are not used (eg: ignore_closed_tree and dontbuild are either not present for most actions, or always have the same value).

Some notes on the tests included here:

Most of the success tests have payloads taken directly from real world treescript tasks (with some small tweaks, per the above note). I fished around and tried to find all variations of these tasks that exist (most notably for merge_day and version_bump tasks, which are often slightly different depending on which repository they are running from).
The approach I've taken to testing here is the same as what I've been attempting to do for signingscript in WIP: add tests that operate at the API level #1097: execute tests by running async_main, rather than deeply unit testing individual components. This does end up making the tests notably bigger, which can be intimidating, but I've found that once written they need little to no touching, and it has made refactoring as I go along significantly easier. There is probably an argument to be made that certain pure helper functions (like diff_contents) could do with some separate unit testing - but I've not gone to that length yet.
I'm not terribly happy with how the responses for the github client are set up. Because the GraphQL API uses a single endpoint for all requests, and I'm simply returning responses in a fixed order, the ordering of the aioresponses calls is important. In an ideal world we would be able to key responses off of parts of the requests, but I haven't had time to make this happen.
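As a rough illustration of what "keying responses off of parts of the requests" could look like, here is a minimal, library-independent sketch. The function name `pick_response` and the marker strings are hypothetical and not part of this PR; it assumes the request body is the usual GraphQL JSON object with a `query` field.

```python
import json


def pick_response(request_body: str, responses_by_marker: dict) -> dict:
    """Return the canned response whose marker string appears in the GraphQL query.

    Hypothetical helper: it selects a response by request content instead of
    relying on the order in which mocked responses were registered.
    """
    query = json.loads(request_body)["query"]
    for marker, response in responses_by_marker.items():
        if marker in query:
            return response
    raise KeyError(f"no canned response matches query: {query[:60]!r}")
```

A helper along these lines could be called from whatever per-request hook the mocking layer offers, so that test set-up no longer depends on call ordering.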

At the time of writing I have not yet been able to test it in a dev environment - I've merely been working off of the spec at https://docs.google.com/document/d/1F3xYp6v4YLLHsX7zFkYoygDyvIUPzujxeMW7W4qte0o/edit?tab=t.0#heading=h.c29tth48xps4. Seeing as this is completely unused so far, I'm hoping to get what's here merged, and then follow up with more targeted pull requests as I fix issues that I encounter when testing in the real world.

…key names

A few things going on here, mostly boring:
* Move the existing GithubClient to scriptworker_client
* Move the associated test fixtures to a new `pytest-scriptworker-client` package
* Necessary adjustments in treescript to let it be able to find and use the new packages

Although not ideal, this also includes a minor fix to allow slashes in GraphQL key names that I had already incorporated into the version of this code in mozilla-releng#1135. I can pull it out if necessary, but I think it's probably small and safe enough to just keep in here?

ahal

Looking good! Did a first pass

ahal · 2025-04-16T18:41:54Z

landoscript/src/landoscript/actions/version_bump.py

+
+def log_file_contents(contents):
+    for line in contents.splitlines():
+        log.info(line)


It's not clear to me why this is better than just logging contents directly. Having the logger preamble in front of each line just makes it harder to copy/paste imo.

Feel free to ignore this if this was just copy/pasted and you want to keep it as similar as possible.

I don't recall now...my only goal is to make sure newlines are rendered. That can probably be done without the splitting though!

Oh, I recall why we have this now: it's to ensure that we get one log line per line in the file, while ensuring each line is prefixed. (Arguably we could simply log with a leading newline to achieve a similar effect...)

Just for clarity: I plan to leave this as is unless you feel it's preferable to log with a preceding newline.
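To make the trade-off discussed above concrete, here is a small sketch comparing the two logging styles. `log_per_line` mirrors the helper in the diff; `log_with_leading_newline` and `render` are hypothetical names used only for this comparison, and the `INFO - ` format string is an assumed stand-in for the real logger preamble.

```python
import io
import logging


def log_per_line(log, contents):
    # One log record per line: every line carries the logger preamble.
    for line in contents.splitlines():
        log.info(line)


def log_with_leading_newline(log, contents):
    # One log record in total: only the first (empty) line carries the preamble.
    log.info("\n%s", contents)


def render(log_fn, contents):
    """Run log_fn against a throwaway logger and return what it wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("INFO - %(message)s"))
    log = logging.getLogger(f"sketch.{log_fn.__name__}")
    log.setLevel(logging.INFO)
    log.propagate = False
    log.addHandler(handler)
    try:
        log_fn(log, contents)
    finally:
        log.removeHandler(handler)
    return stream.getvalue()
```

With this set-up, the per-line variant prefixes every line, while the leading-newline variant leaves the file contents unprefixed and easier to copy and paste.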

ahal · 2025-04-16T18:51:58Z

landoscript/src/landoscript/lando.py

+
+
+async def poll_until_complete(session: ClientSession, poll_time: int, status_url: str):
+    while True:


So I guess the idea is that we poll until we hit the Taskcluster timeout?

I think we might be losing some of the idempotent-ness of the old approach here. It will certainly be possible for this task to fail due to timeout, but Lando could still land it afterwards. This means a failed task doesn't necessarily indicate a failed landing.

Do you know what would happen if we re-ran the task in such a case? For version bump, it might be worth ensuring that if the requested version is already what's committed, that we exit successfully. I'm not really sure what to do in the general case.

Another approach might be to timeout within the task, and abort the landing prior to failing. That way there's more of a guarantee that a failed landoscript task means the push didn't get landed. I don't think this needs to be fully solved for the MVP
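A minimal sketch of the "timeout within the task, and abort the landing prior to failing" idea follows. Everything here is an assumption for illustration: `get_status` and `cancel` are hypothetical async callables standing in for the real Lando requests, and the status strings are placeholders rather than Lando's actual values.

```python
import asyncio

# Placeholder terminal states; the real Lando status values may differ.
TERMINAL_STATES = {"LANDED", "FAILED"}


async def poll_with_deadline(get_status, cancel, poll_time, deadline):
    """Poll until a terminal state, or cancel the request and fail at the deadline."""
    loop = asyncio.get_running_loop()
    give_up_at = loop.time() + deadline
    while True:
        status = await get_status()
        if status in TERMINAL_STATES:
            return status
        # Stop before the next poll would overrun our own deadline.
        if loop.time() + poll_time > give_up_at:
            await cancel()
            raise TimeoutError("landing did not complete before the deadline")
        await asyncio.sleep(poll_time)
```

Note that this only narrows the window: the cancellation can still lose a race against the landing itself, so a failed task would not strictly guarantee that nothing landed.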

Yeah, polling until we timeout is what is happening here, and that's a really good point you've made.

And to be honest, I'm not actually sure what the best thing to do here is. I think in the short term, attempting to cancel the request at shutdown for any non-success makes sense. But this is also less than ideal if, eg: we get spot terminated and have to redo the work because of it. There's also no guarantees that we'll be able to cancel the request in time in the event of a spot termination.

Another obvious option (for the future) would be to take the notarization approach, and do the polling in a separate task. But I also wonder if we even need to bother to poll in some cases? For example, l10n bumps don't have anything downstream of them...maybe we could just spit out the status URL, and leave it at that for cases like that?

Spot termination is not a concern for scriptworkers. We might get killed if we exceed terminationGracePeriodSeconds though, in which case we should still aim to preserve idempotence. It looks like we only ever have a single lando request, which hopefully is handled atomically at the other end (TIL git push has a --atomic option, nice, I'm only 9 years late to notice)? (I was wondering how this worked for merge day. Currently we have 2 hg pushes, e.g. once to tag central and another to merge to beta. I guess that goes away because we know they're the same underlying git repo and so it doesn't matter where the tag goes?)

We could try and cancel the submission on shutdown (bitrisescript seems to do something similar, see scriptworker-scripts/bitrisescript/src/bitrisescript/script.py, line 47 in 9867329: `loop.add_signal_handler(signal.SIGTERM, handle_sigterm, future_group)`), and there should be a couple of minutes between SIGTERM and SIGKILL. I would be a bit worried about races between the attempt to cancel and the request landing (or maybe the cancel attempt itself times out, or ...).
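For illustration, cancel-on-SIGTERM could be sketched roughly as below. This is not the bitrisescript implementation; `run_with_sigterm_cleanup`, `work` and `cleanup` are hypothetical names, and `cleanup` stands in for whatever request would cancel the Lando submission. It relies on `loop.add_signal_handler`, which is only available on Unix event loops and must be used from the main thread.

```python
import asyncio
import signal


async def run_with_sigterm_cleanup(work, cleanup):
    """Run work(); on SIGTERM, cancel it, await cleanup(), then re-raise."""
    loop = asyncio.get_running_loop()
    task = asyncio.create_task(work())
    # On SIGTERM, request cancellation of the in-progress work.
    loop.add_signal_handler(signal.SIGTERM, task.cancel)
    try:
        return await task
    except asyncio.CancelledError:
        # Best effort: this may race with the request landing on the other end.
        await cleanup()
        raise
    finally:
        loop.remove_signal_handler(signal.SIGTERM)
```

The cleanup step has to finish inside the gap between SIGTERM and SIGKILL, so in practice it would likely need its own short timeout.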

So there's a few scenarios that could happen AFAICT:

* maybe the initial request didn't go through, we can rerun everything, all good
* maybe the initial request failed, or was canceled, we can safely retry.
* maybe the initial request succeeded before the rerun, we should detect that and not do anything
* maybe the initial request is still in flight. We could potentially find it and poll on it, or cancel it and retry? If we can't find it, we may have 2 requests in flight, hopefully only one can succeed?

In treescript we also had the case where one push went through but not the other, and the case where when the next run checked version numbers, the first push had gone through to hgssh but was not yet mirrored out to hgweb.
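The rerun scenarios listed above can be summarised as a small decision function. This is only a sketch of the reasoning in this comment: the enum names and the idea that landoscript can discover the state of a previous request are assumptions, not something this PR implements.

```python
from enum import Enum


class PreviousRequest(Enum):
    """Hypothetical states of a landing request left behind by an earlier run."""
    NOT_FOUND = "not_found"
    FAILED = "failed"
    CANCELLED = "cancelled"
    LANDED = "landed"
    IN_FLIGHT = "in_flight"


class Action(Enum):
    SUBMIT = "submit"
    NOTHING = "nothing"
    POLL_EXISTING = "poll_existing"


def decide_rerun_action(previous: PreviousRequest) -> Action:
    """Map the state of the previous request to what a rerun should do."""
    if previous is PreviousRequest.LANDED:
        # The change is already in the repository; exit successfully.
        return Action.NOTHING
    if previous is PreviousRequest.IN_FLIGHT:
        # Prefer polling the existing request over submitting a second one.
        return Action.POLL_EXISTING
    # Never submitted, failed, or cancelled: safe to submit again.
    return Action.SUBMIT
```

The in-flight case picks "find it and poll on it" from the two options mentioned; "cancel it and retry" would be an equally valid choice with different race characteristics.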

Thanks for the additional thoughts and strategies, @jcristau! Not needing to worry about spot termination certainly simplifies things, as we can make some safer assumptions.

I'm going to give this a bit more thought - but clearly we need some sort of special handling for the rerun scenario, if not separating polling altogether.

> In treescript we also had the case where one push went through but not the other, and the case where when the next run checked version numbers, the first push had gone through to hgssh but was not yet mirrored out to hgweb.

If we're talking about two pushes to the same repository, this case will go away AFAIK - all changes to a given repository (which in the new world means branch within the new github repo) will either happen or not.

> Currently we have 2 hg pushes, e.g. once to tag central and another to merge to beta. I guess that goes away because we know they're the same underlying git repo and so it doesn't matter where the tag goes?)

Yeah, that's right. Although that does bring to mind something a bit odd about the API: tag actions are applied to a repository+branch - not to a repository overall. @cgsheeh - is this something you've thought about? (I don't think we necessarily need to change anything here, I'm just noting it because it's a bit odd.)

The more I think about how to handle termination, the more it seems like moving polling out of the landoscript tasks might be the best thing to do. Something like:

* Return immediately from landoscript after a successful submission, attaching the status URL in a well-known artifact.
* Add a new task downstream of each landoscript task which does the polling. This doesn't need to be a scriptworker task AFAIK, because polling doesn't require credentials.

It would also avoid the need for any cancellation in landoscript, and thus any race conditions associated with that. If we trust that a landoscript task would never be rerun unless a human has inspected the state of things first, this would eliminate the need for special rerun handling in landoscript, I think. (Although perhaps it's too much a leap to assume it would never be errantly rerun, and we'd want at least some of the rerun safety checks?)

The main downside here is that if the polling task finds that the landoscript submission has failed, we'd need to rerun the upstream landoscript task, and then force a new polling task afterwards.

For posterity, I discovered after writing this that all Lando endpoints (including status polling) require authentication. So at least in the immediate term, moving polling outside of landoscript is not an option. (I guess we could poll in a separate landoscript task, but that probably doesn't provide any benefits...)

ahal · 2025-04-16T20:09:48Z

landoscript/src/landoscript/actions/android_l10n_sync.py

+    diff = ""
+    for l10n_file in l10n_files:
+        if l10n_file["dst_name"] not in orig_files:
+            log.warning(f"WEIRD: {l10n_file['dst_name']} not in dst_files, continuing anyways...")


Lol, I approve of this new log level.

ahal · 2025-04-16T20:12:58Z

landoscript/docker.d/init_worker.sh

+  fi
+}
+
+# TODO: real URLs


Comment just to help remember this.

ahal

Thanks, lgtm!

Most of the helpers here are copied out of treescript, with some tweaks and simplifications where it was possible (mostly to get rid of now-unnecessary logic). Some refactoring of other actions was done here as well, to make it possible to call them from the `merge_day` action. Most notably: the `version_bump` action has been updated to support multiple version bumps in one run, which allows us to do all the merge day version bumps in a single commit, as we do now with treescript. Additional test refactoring/movement was also done to make many of the helpers available to merge day tests.

The implementation of this is a fairly significant departure from the treescript one in two ways:
* It supports these actions without having a Gecko or `android-l10n` tree available. This necessitated fetching the remote file listings of these repositories, and using `moz.l10n` instead of `compare-locales`.
* I've fully separated out implementation of the actions. Although at a very high level they look similar, the details are different enough that IMO it's much easier and better to duplicate some of the code rather than add the indirection to avoid it. (I found it very difficult to read the treescript implementation because of this, and I didn't want to do the same here). A necessary part of doing this was some enhancements to `diff_contents` to support added and removed files properly.
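As an illustration of what supporting added and removed files in a diff helper can involve, here is a minimal sketch built on the standard library's `difflib`. It is not the `diff_contents` from this PR: the name `diff_contents_sketch`, its signature, and the convention of passing `None` for a missing side are assumptions. It also omits the `diff --git` and file-mode header lines that real git diffs carry.

```python
import difflib
from typing import Optional


def diff_contents_sketch(orig: Optional[str], modified: Optional[str], path: str) -> str:
    """Return a unified diff; None for orig/modified means the file is added/removed."""
    # A missing side is conventionally represented as /dev/null in the header.
    fromfile = f"a/{path}" if orig is not None else "/dev/null"
    tofile = f"b/{path}" if modified is not None else "/dev/null"
    orig_lines = (orig or "").splitlines(keepends=True)
    modified_lines = (modified or "").splitlines(keepends=True)
    return "".join(difflib.unified_diff(orig_lines, modified_lines, fromfile, tofile))
```

Treating "file absent" (`None`) separately from "file empty" (`""`) is the design choice that lets the header distinguish an added file from an edit to an empty one.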

This required a little bit of massaging for places where there's nested dataclasses, but it was otherwise straightforward. This change also uncovered some places where tests were including unnecessary data, which have been fixed. It also replaced some existing null checks (because the dataclass will throw if a required member is not present during construction).
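The point about dataclasses replacing null checks can be illustrated with a short sketch. The class and field names below (`TagInfo`, `Payload`, `lando_repo`, `tag_info`) are hypothetical and chosen only for the example; the "massaging" for nested dataclasses is shown as an explicit conversion step before constructing the outer object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagInfo:  # hypothetical nested structure
    revision: str
    tags: list


@dataclass
class Payload:  # hypothetical top-level payload
    lando_repo: str
    tag_info: Optional[TagInfo] = None


def payload_from_dict(raw: dict) -> Payload:
    """Build a Payload, converting the nested dict into its own dataclass first."""
    raw = dict(raw)  # copy so the caller's dict is left untouched
    tag_info = raw.pop("tag_info", None)
    if tag_info is not None:
        tag_info = TagInfo(**tag_info)
    # A missing required field (or an unknown key) raises TypeError here,
    # which is what makes separate null checks unnecessary.
    return Payload(tag_info=tag_info, **raw)
```

Because construction fails loudly on a missing required member, downstream code can assume the fields exist.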

bhearsum · 2025-04-17T23:59:38Z

To help make additional changes easier to review I'm going to merge this initial patch. There are still a few things to address from comments here (most notably - how to handle unfinished lando submissions) that will come in follow-up PRs next week. Thanks to everyone who reviewed this giant patch so far!

…ocessing any of them This is a follow-up to address something [@ahal called out in the initial review of landoscript](mozilla-releng#1135 (comment)).

bhearsum mentioned this pull request Feb 13, 2025

feat: add landoscript for gecko to nonprod mozilla-releng/k8s-autoscale#168

Merged

bhearsum force-pushed the lando branch from c605ec5 to b5ffaf4 Compare February 14, 2025 14:10

bhearsum force-pushed the lando branch 5 times, most recently from f65011c to 58a14c5 Compare March 6, 2025 21:49

bhearsum force-pushed the lando branch 2 times, most recently from 6cd34a0 to f880a85 Compare March 17, 2025 17:56

ahal reviewed Mar 17, 2025

bhearsum mentioned this pull request Mar 18, 2025

move GithubClient to scriptworker_client; add support for slashes in key names #1147

Merged

bhearsum force-pushed the lando branch 3 times, most recently from 4c7c77b to 78d629c Compare March 20, 2025 01:22

bhearsum force-pushed the lando branch 4 times, most recently from ce7cc7a to 2e0da34 Compare April 2, 2025 17:19

bhearsum force-pushed the lando branch 2 times, most recently from a73c334 to 8d4bc25 Compare April 3, 2025 20:48

bhearsum added a commit to bhearsum/scriptworker that referenced this pull request Apr 14, 2025

feat: add cot_restricted_scopes for landoscript

fb300a7

Goes with the landoscript work from mozilla-releng/scriptworker-scripts#1135

bhearsum mentioned this pull request Apr 14, 2025

feat: add cot_restricted_scopes for landoscript mozilla-releng/scriptworker#691

Merged

bhearsum force-pushed the lando branch 2 times, most recently from c93d3ae to 39fdb66 Compare April 15, 2025 16:51

ahal requested changes Apr 16, 2025

bhearsum force-pushed the lando branch from cdedc38 to bf09eec Compare April 17, 2025 14:30

bhearsum requested a review from ahal April 17, 2025 17:01

ahal approved these changes Apr 17, 2025

bhearsum added 13 commits April 17, 2025 19:57

feat(landoscript): initial landoscript code with support for version …

65aaf1b

…bump action This adds the rough structure for landoscript as well as implementing the `version_bump` action (necessary to make it practical to test the initial code).

feat(landoscript): implement support for tag action

4bcc599

refactor(landoscript): add tests that run multiple actions in one run

37c105a

Most notably, this moves common set-up into conftest, and tests that aren't testing action-specific logic (eg: lando submission) into test_script.

feat(landoscript): implement l10n_bump action

31c0e0b

The helper functions here are copied out of treescript (which will soon be EOL'ed). Also included here is some minor refactoring to avoid duplication of common `create-commit` logic.

refactor(landoscript): create run_test helper that simplifies most …

5ef149f

…of the happy path tests This new helper allows a huge simplification for most of the action-specific tests.

fix(landoscript): don't support dontbuild/ignore closed tree configur…

f662760

…ation outside of l10n bump As it turns out, these are static for all other types of actions.

feat(landoscript): add authorization header when submitting lando req…

a05cb53

…uests Also ensure that LANDO_API and LANDO_TOKEN are set up during startup.

fix(landoscript): address review comments around lando api requirements

daff1d9

feat(landoscript): pull repository url and branch from lando instead …

4f24f01

…of config

fix(landoscript): improve comment around some test code

5439054

bhearsum force-pushed the lando branch from bf09eec to 5439054 Compare April 17, 2025 23:57

bhearsum merged commit 118b709 into mozilla-releng:master Apr 18, 2025
30 checks passed

bhearsum added a commit to bhearsum/scriptworker-scripts that referenced this pull request Apr 18, 2025

fix(landoscript): use tomllib instead of tomli

239bfa0

As pointed out by ahal in mozilla-releng#1135, we're using a new enough Python that we can use the built in parser.

This was referenced Apr 18, 2025

fix(landoscript): check all android toml files at one time, before pr… #1165

Merged

fix(landoscript): use tomllib instead of tomli #1166

Merged

bhearsum added a commit to bhearsum/scriptworker-scripts that referenced this pull request Apr 21, 2025

fix(landoscript): use tomllib instead of tomli

3379930

As pointed out by ahal in mozilla-releng#1135, we're using a new enough Python that we can use the built in parser.

bhearsum added a commit that referenced this pull request Apr 21, 2025

fix(landoscript): use tomllib instead of tomli (#1166)

c1b23ab

As pointed out by ahal in #1135, we're using a new enough Python that we can use the built in parser.


