tools: migrate to Toolforge buildservice by lgelauff · Pull Request #564 · hatnote/montage

lgelauff · 2026-05-04T14:29:48Z

Summary

Migrates Montage from toolforge webservice python3.13 to the Toolforge buildservice (Cloud Native Buildpacks). Adds Procfile, .python-version, and a root-level package.json shim so the Python and Node buildpacks both activate — baking the Vue frontend into the container image and eliminating the separate node20 frontend-build job.

Background and design rationale: #563.

Also incorporates the Python 3.13 dependency fixes from #512 (cffi 1.17.1, setuptools<82 pin).

What changed

Procfile: defines the gunicorn web process; binds to $PORT (Toolforge injects PORT=8000)
.python-version: pins Python 3.13 for the buildpack; removed from .gitignore
package.json (repo root): shim to trigger Node buildpack; build script delegates to frontend/
montage/utils.py: get_env_name() checks MONTAGE_ENV env var first (LDAP unavailable in containers); load_env_config() reads secrets from individual MONTAGE_* env vars when MONTAGE_OAUTH_CONSUMER_TOKEN is set, falls back to YAML for local dev
app.py: removes urllib3.contrib.pyopenssl shim (deprecated since urllib3 2.0, unnecessary on Python 3.13)
requirements.in: adds gunicorn; removes pyopenssl; unpins cryptography
requirements.txt: cffi 1.16.0 → 1.17.1 (adds cp313 wheels, no source compilation needed); setuptools pinned <82 (setuptools 82 removed pkg_resources, needed by python-graph-core — tracked in Automate setuptools installation #421)
dockerfile: Python 3.9 → 3.13 to match deployment; adds gcc libffi-dev build deps
deployment.md: rewritten for the buildservice workflow

Secrets

All credentials are now set via toolforge envvars (Kubernetes Secrets) — no config YAML file needed at runtime. See deployment.md step 3 for the full list of env vars and the warning about shell history on the shared bastion.

Test plan

Note: PR #534 (tools/build-frontend-script) is kept open as a fallback until the buildservice is confirmed working on Toolforge.

Replace toolforge webservice python3.13 with the Toolforge buildservice (Cloud Native Buildpacks). Adds Procfile, .python-version, and a root-level package.json shim so both Python and Node buildpacks activate, baking the Vue frontend into the container image and eliminating the separate node20 frontend-build job. Config and env detection are updated in utils.py to work inside containers: - get_env_name() checks MONTAGE_ENV first (LDAP/getpass.getuser unavailable in containers per Toolforge docs) - load_env_config() loads secrets from individual MONTAGE_* environment variables (stored via toolforge envvars) when MONTAGE_OAUTH_CONSUMER_TOKEN is set; falls back to YAML for local dev Removes the urllib3/pyopenssl SSL shim (deprecated since urllib3 2.0, no longer needed on Python 3.13) and its dependency from requirements.in. Removes the cryptography version pin, which was a uwsgi+PyO3 workaround that does not apply to gunicorn. Adds gunicorn to requirements.in. Rewrites deployment.md for the buildservice workflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Matches the Python version used on Toolforge (.python-version). Required by gunicorn>=20 which dropped support for Python <3.10. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cffi 1.16.0 ships a cp39 manylinux wheel but not a cp313 one, so pip falls back to compiling from source on Python 3.13. The slim image has no C toolchain. gcc and libffi-dev are the standard build requirements for cffi. The underlying fix (cffi>=1.17.1, which has cp313 wheels) is tracked in #512. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cffi 1.17.1 adds cp313 wheels, eliminating source compilation on Python 3.13 (and the need for gcc in the CI image). setuptools is pinned <82 because setuptools 82 removed pkg_resources, which python-graph-core depends on via namespace packages. Remove once python-graph-core is replaced (#421). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The Heroku Node.js buildpack v5 requires a package-lock.json to run npm ci and npm run build. Without it, the buildpack installs Node.js but skips the install/build steps, leaving montage/static/ empty. The build script runs: cd frontend && npm ci && npm run toolforge:build which builds the Vite frontend and copies assets to montage/static/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The Toolforge buildservice startup probe checks port 8000. The earlier change to 5000 was incorrect and caused the startup probe to fail with "connection refused", putting pods into CrashLoopBackOff. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

NFS must be mounted so Montage can write logs to /data/project/<toolname>/. Also adds URLs to the multi-environment table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

frontend/.env is gitignored, so the buildpack never had it. Without it, VITE_API_ENDPOINT was undefined at build time, making every Axios call go to /undefined/v1/... and fail with 404. .env.production sets it to empty string so Vite produces a relative /v1/ base URL, correct for same-origin Toolforge deployments. Also adds build verification and debugging guidance to deployment.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

**Important for deployers**: Montage now uses OAuth 2.0. If you have an existing OAuth 1.0a consumer registered on Meta-Wiki, you will need to register a new OAuth 2.0 consumer at Special:OAuthConsumerRegistration. Changes: - Replace mwoauth Handshaker/RequestToken with direct OAuth 2.0 PKCE flow using requests against rest.php/oauth2/ endpoints - New env vars: MONTAGE_OAUTH_CLIENT_ID, MONTAGE_OAUTH_CLIENT_SECRET, MONTAGE_OAUTH_REDIRECT_URI (replaces MONTAGE_OAUTH_CONSUMER_TOKEN / MONTAGE_OAUTH_SECRET_TOKEN) - Remove mwoauth, oauthlib, requests-oauthlib, pyjwt from dependencies - debug mode in complete_login now reads debug_username/debug_userid from config instead of hardcoded values Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The juror round listing was filtering on campaign status only, so finalized and cancelled rounds inside active campaigns still appeared. Jurors clicking them got a confusing 400 "round not active" error. Now only active and paused rounds are shown in the active view. Suspected cause: e3069dd (PR #350, merged 2025-10-23) added a finalize_round endpoint that can finalize individual rounds without advancing, creating a state the juror listing never filtered for. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Automates: build → poll for completion → verify SHA + port → warn if running SHA already matches → restart → smoke-test /meta/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- git pull --ff-only at start so the script stays current - default ref changed to master (was tools/buildservice) - document one-time clone in script header Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Paused rounds were still appearing in the juror round list because the only_active filter allowed both ACTIVE_STATUS and PAUSED_STATUS. However confirm_active() only permits ACTIVE_STATUS, so clicking a paused round always produced a 400 "round not active" error. Fix restricts the juror index to strictly active rounds only. Suspected introduced by e3069dd (#350). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- _load_config_from_env now raises ValueError if MONTAGE_DEBUG=true in prod, beta, or devlabs; prevents a single envvar command from disabling auth on a live Toolforge tool - UserMiddleware auto-login path split: devtest keeps the test fixture user (6024474); debug mode uses userid=0 / '__montage_debug__' which has no match in any real DB, producing an immediate and obvious error if debug mode somehow reaches a production database - complete_login debug defaults changed to the same sentinel values so config.dev.yaml must explicitly set debug_userid / debug_username Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Unvalidated ?next= parameter allowed open redirects: any URL passed to /login?next= or /logout?next= was stored in a signed cookie and redirected to after auth, enabling phishing via crafted links. _safe_next() accepts only same-origin paths (starts with / but not //) and falls back to root_path for anything else. Applied at all three sites: login (cookie store), logout (direct redirect), and complete_login (cookie read, defence-in-depth). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

get_index() accepted an only_active parameter but hard-coded True when calling get_all_rounds_task_counts, so get_all_campaigns() (the archive endpoint at /juror/campaigns/all) was always returning an empty list instead of finalized rounds. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Previously scm_secure was only True when env_name == 'prod', so beta and devlabs deployments (both served over HTTPS on Toolforge) shipped session cookies without the Secure flag. A MONTAGE_ENV typo (e.g. 'production' instead of 'prod') would silently drop it on prod. - scm_secure is now True for all envs except dev and devtest - app startup raises on unknown env names so typos fail loudly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tests - LoggingMiddleware now redacts code/state/access_token/token from request.args before writing to api_log; these are auth material and should never appear on disk - complete_login no longer swallows exceptions silently; OAuth exchange failures are now logged via api_log so operators can distinguish 'user cancelled' from 'client secret is wrong' or 'MW is down' - Four new tests cover the PKCE flow using mocked MW endpoints: login redirect params, state mismatch, token exchange failure, and successful login with cookie verification Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- app.py: refuse to start if cookie_secret is the default placeholder or shorter than 32 chars - public_endpoints.py: use hmac.compare_digest for state comparison (constant-time, prevents timing oracle on the CSRF token) - utils.py: print effective superuser list at startup so misconfigured MONTAGE_SUPERUSERS is immediately visible in logs - deploy.sh: fail explicitly when git ls-remote cannot resolve the ref instead of silently using the literal branch name as the SHA - dev.md: update OAuth keys to oauth_client_id/secret/redirect_uri and note that debug_userid/debug_username must be set for a useful local dev session Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mahmoud

Big PR! First impressions: There are a few other things in the yaml that aren't secret per se? I wonder if that's too many env vars... and what's the deal with the huge .secrets.baseline ?

mahmoud · 2026-05-23T20:24:37Z

@@ -0,0 +1 @@
+VITE_API_ENDPOINT=


hmm does this need to be in here?

try removing it - I recall that it keeps crashing if you do. Needs to be declared? I came across this earlier in #516

mahmoud · 2026-05-23T20:28:01Z

-
-    cookie['return_to_url'] = request.args.get('next', root_path)
-    return redirect(redirect_url)
+def login(request, oauth_config, cookie, root_path):


curious why switching buildpacks would require changing this?

oauth is upgrading here from v1 to v2. This was because i had to register a new consumer after deleting the old credentials accidentally.
Perhaps not the most lean implementation but made life easier.
Of course, if there is some reason to go back to v1, happy to register that instead.

mahmoud · 2026-05-23T20:31:21Z

            users_rounds_query = users_rounds_query.where(campaigns_t.c.status == 'active')
            users_rounds_query = users_rounds_query.where(campaigns_t.c.is_archived == False)
+            users_rounds_query = users_rounds_query.where(
+                rounds_t.c.status == ACTIVE_STATUS


again, seems unrelated to buildpacks?

separate bug, that I think got in the way during testing. Was easier to fix than work around.

Can cherrypick it as I can see there might be a conversation whether to hide paused (or completed) rounds now i think more about it.

mahmoud · 2026-05-23T20:36:11Z

    elif env_name == 'devtest':
        return dict(DEVTEST_CONFIG)

+    if os.environ.get('MONTAGE_OAUTH_CLIENT_ID'):


seems like this is being used as a detector for if env vars are in use, might be worth commenting that or something?

OAuth 1.0a → 2.0 PKCE rewrite (413ea20)

mahmoud · 2026-05-23T20:37:47Z

+fi
+
+echo ""
+echo "==> Deploy complete: $TOOL_URL"


this is one of those scripts where if it works, it works, and if it doesn't we'll just fix it :P if there's a way to test it, it'll be pretty involved. I wonder if it makes sense to have a skill that runs it so that some automatic diagnosis and such can happen.

yeah, personal preference, I guess :) Especially the smoke test was helpful to me, as it is explicit about when it's really ready.

mahmoud · 2026-05-23T20:55:17Z

@@ -0,0 +1 @@
+3.13


nice. i assume it passes tox and your clicktesting?

Good pointer, upgraded tox, and confirmed this passes. See #573

mahmoud · 2026-05-23T20:56:31Z

@@ -0,0 +1,7 @@
+{


we want this at the root? not in the frontend dir?

yes, see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Building_container_images

mahmoud · 2026-05-23T20:57:14Z

+    # via python-graph-core (pkg_resources namespace; implicit dep on Python 3.13)
+    # Pinned <82: setuptools 82 removed pkg_resources entirely; python-graph-core
+    # depends on it via namespace packages. Remove this pin once python-graph-core
+    # no longer uses pkg_resources (tracked in hatnote/montage#421).


good to know!

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lgelauff · 2026-05-24T07:52:35Z

Big PR! First impressions: There are a few other things in the yaml that aren't secret per se? I wonder if that's too many env vars... and what's the deal with the huge .secrets.baseline ?

Thanks for the comments and questions. In the new setup, we don't use a persistent file system any longer. It now also includes environment-specific variables. Moved it to env var. If you want, we can make that distinction clearer and move them to the USER_ENV_MAP.

The secrets baseline was for a pre-commit hook that I installed just in case Claude goes overboard and wants to commit something with a secret. The baseline are all known false positives. I realize now that I look more carefully that this can easily be condensed by skipping the test data.

test_data/wlm2015_fr_12k.csv contains 1,798 Wikimedia img_sha1 hashes that detect-secrets flagged as high-entropy strings. Excluding the directory shrinks the baseline from 12,745 lines to 159. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gunicorn 25.x (added in this PR) requires Python >=3.10, so Python 3.9 is no longer supported. The minimum supported version is now Python 3.11 (local dev / CI); the deployment target is Python 3.13 (buildservice). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lgelauff · 2026-05-24T09:26:15Z

Python version note: This PR adds gunicorn 25.3.0, which requires Python >=3.10 — Python 3.9 is no longer supported as of this change. Minimum supported version is now Python 3.11 (local dev/CI); deployment target remains Python 3.13 (buildservice). Dropped py39 from the tox matrix in ec20524.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lgelauff · 2026-05-24T09:37:45Z

Merge dependency: deployment.md references tools/migrate_prod_db.sql (line 92), which is added by #505. This PR should not be merged before #505, otherwise the upgrade deployment instructions will reference a file that doesn't exist yet.

lgelauff and others added 13 commits May 4, 2026 15:47

ci: update test Docker image from Python 3.9 to 3.13

632498b

Matches the Python version used on Toolforge (.python-version). Required by gunicorn>=20 which dropped support for Python <3.10. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Create .envrc

c66df89

fix: bind gunicorn to port 5000 for Toolforge buildservice

23eb8ac

docs: add --mount all to webservice commands and improve multi-env table

2796dc7

NFS must be mounted so Montage can write logs to /data/project/<toolname>/. Also adds URLs to the multi-environment table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: remove unused router import in RoundEdit.vue

f210c91

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lgelauff force-pushed the tools/buildservice branch from 1ada08b to 6b68758 Compare May 21, 2026 18:41

lgelauff and others added 10 commits May 21, 2026 20:43

feat: add tools/deploy.sh for buildservice deployments

21515b4

Automates: build → poll for completion → verify SHA + port → warn if running SHA already matches → restart → smoke-test /meta/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: self-update and default to master in deploy.sh

49d2b79

- git pull --ff-only at start so the script stays current - default ref changed to master (was tools/buildservice) - document one-time clone in script header Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: update deployment.md with deploy.sh usage and one-time clone steps

573f557

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mahmoud reviewed May 23, 2026

View reviewed changes

lgelauff mentioned this pull request May 24, 2026

ci: upgrade tox to 4.x, target Python 3.13, document local test workflow #573

Open

4 tasks

docs: explain root package.json is a Toolforge buildservice trigger

825db35

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lgelauff and others added 2 commits May 24, 2026 09:58

fix: remove debug print from _load_config_from_env

d928026

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lgelauff mentioned this pull request May 24, 2026

fix: juror round listing shows wrong rounds #571

Open

4 tasks

		@@ -0,0 +1 @@
		VITE_API_ENDPOINT=

Conversation

lgelauff commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Secrets

Test plan

Uh oh!

mahmoud left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lgelauff May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lgelauff commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lgelauff commented May 24, 2026

Uh oh!

lgelauff commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lgelauff commented May 4, 2026 •

edited

Loading

lgelauff May 23, 2026 •

edited

Loading

lgelauff commented May 24, 2026 •

edited

Loading