tools: migrate to Toolforge buildservice#564
Conversation
Replace toolforge webservice python3.13 with the Toolforge buildservice (Cloud Native Buildpacks). Adds Procfile, .python-version, and a root-level package.json shim so both Python and Node buildpacks activate, baking the Vue frontend into the container image and eliminating the separate node20 frontend-build job. Config and env detection are updated in utils.py to work inside containers: - get_env_name() checks MONTAGE_ENV first (LDAP/getpass.getuser unavailable in containers per Toolforge docs) - load_env_config() loads secrets from individual MONTAGE_* environment variables (stored via toolforge envvars) when MONTAGE_OAUTH_CONSUMER_TOKEN is set; falls back to YAML for local dev Removes the urllib3/pyopenssl SSL shim (deprecated since urllib3 2.0, no longer needed on Python 3.13) and its dependency from requirements.in. Removes the cryptography version pin, which was a uwsgi+PyO3 workaround that does not apply to gunicorn. Adds gunicorn to requirements.in. Rewrites deployment.md for the buildservice workflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matches the Python version used on Toolforge (.python-version). Required by gunicorn>=20 which dropped support for Python <3.10. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cffi 1.16.0 ships a cp39 manylinux wheel but not a cp313 one, so pip falls back to compiling from source on Python 3.13. The slim image has no C toolchain. gcc and libffi-dev are the standard build requirements for cffi. The underlying fix (cffi>=1.17.1, which has cp313 wheels) is tracked in #512. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cffi 1.17.1 adds cp313 wheels, eliminating source compilation on Python 3.13 (and the need for gcc in the CI image). setuptools is pinned <82 because setuptools 82 removed pkg_resources, which python-graph-core depends on via namespace packages. Remove once python-graph-core is replaced (#421). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Heroku Node.js buildpack v5 requires a package-lock.json to run npm ci and npm run build. Without it, the buildpack installs Node.js but skips the install/build steps, leaving montage/static/ empty. The build script runs: cd frontend && npm ci && npm run toolforge:build which builds the Vite frontend and copies assets to montage/static/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Toolforge buildservice startup probe checks port 8000. The earlier change to 5000 was incorrect and caused the startup probe to fail with "connection refused", putting pods into CrashLoopBackOff. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NFS must be mounted so Montage can write logs to /data/project/<toolname>/. Also adds URLs to the multi-environment table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
frontend/.env is gitignored, so the buildpack never had it. Without it, VITE_API_ENDPOINT was undefined at build time, making every Axios call go to /undefined/v1/... and fail with 404. .env.production sets it to empty string so Vite produces a relative /v1/ base URL, correct for same-origin Toolforge deployments. Also adds build verification and debugging guidance to deployment.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**Important for deployers**: Montage now uses OAuth 2.0. If you have an existing OAuth 1.0a consumer registered on Meta-Wiki, you will need to register a new OAuth 2.0 consumer at Special:OAuthConsumerRegistration. Changes: - Replace mwoauth Handshaker/RequestToken with direct OAuth 2.0 PKCE flow using requests against rest.php/oauth2/ endpoints - New env vars: MONTAGE_OAUTH_CLIENT_ID, MONTAGE_OAUTH_CLIENT_SECRET, MONTAGE_OAUTH_REDIRECT_URI (replaces MONTAGE_OAUTH_CONSUMER_TOKEN / MONTAGE_OAUTH_SECRET_TOKEN) - Remove mwoauth, oauthlib, requests-oauthlib, pyjwt from dependencies - debug mode in complete_login now reads debug_username/debug_userid from config instead of hardcoded values Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The juror round listing was filtering on campaign status only, so finalized and cancelled rounds inside active campaigns still appeared. Jurors clicking them got a confusing 400 "round not active" error. Now only active and paused rounds are shown in the active view. Suspected cause: e3069dd (PR #350, merged 2025-10-23) added a finalize_round endpoint that can finalize individual rounds without advancing, creating a state the juror listing never filtered for. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1ada08b to
6b68758
Compare
Automates: build → poll for completion → verify SHA + port → warn if running SHA already matches → restart → smoke-test /meta/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- git pull --ff-only at start so the script stays current - default ref changed to master (was tools/buildservice) - document one-time clone in script header Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Paused rounds were still appearing in the juror round list because the only_active filter allowed both ACTIVE_STATUS and PAUSED_STATUS. However confirm_active() only permits ACTIVE_STATUS, so clicking a paused round always produced a 400 "round not active" error. Fix restricts the juror index to strictly active rounds only. Suspected introduced by e3069dd (#350). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _load_config_from_env now raises ValueError if MONTAGE_DEBUG=true in prod, beta, or devlabs; prevents a single envvar command from disabling auth on a live Toolforge tool - UserMiddleware auto-login path split: devtest keeps the test fixture user (6024474); debug mode uses userid=0 / '__montage_debug__' which has no match in any real DB, producing an immediate and obvious error if debug mode somehow reaches a production database - complete_login debug defaults changed to the same sentinel values so config.dev.yaml must explicitly set debug_userid / debug_username Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Unvalidated ?next= parameter allowed open redirects: any URL passed to /login?next= or /logout?next= was stored in a signed cookie and redirected to after auth, enabling phishing via crafted links. _safe_next() accepts only same-origin paths (starts with / but not //) and falls back to root_path for anything else. Applied at all three sites: login (cookie store), logout (direct redirect), and complete_login (cookie read, defence-in-depth). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
get_index() accepted an only_active parameter but hard-coded True when calling get_all_rounds_task_counts, so get_all_campaigns() (the archive endpoint at /juror/campaigns/all) was always returning an empty list instead of finalized rounds. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously scm_secure was only True when env_name == 'prod', so beta and devlabs deployments (both served over HTTPS on Toolforge) shipped session cookies without the Secure flag. A MONTAGE_ENV typo (e.g. 'production' instead of 'prod') would silently drop it on prod. - scm_secure is now True for all envs except dev and devtest - app startup raises on unknown env names so typos fail loudly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tests - LoggingMiddleware now redacts code/state/access_token/token from request.args before writing to api_log; these are auth material and should never appear on disk - complete_login no longer swallows exceptions silently; OAuth exchange failures are now logged via api_log so operators can distinguish 'user cancelled' from 'client secret is wrong' or 'MW is down' - Four new tests cover the PKCE flow using mocked MW endpoints: login redirect params, state mismatch, token exchange failure, and successful login with cookie verification Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- app.py: refuse to start if cookie_secret is the default placeholder or shorter than 32 chars - public_endpoints.py: use hmac.compare_digest for state comparison (constant-time, prevents timing oracle on the CSRF token) - utils.py: print effective superuser list at startup so misconfigured MONTAGE_SUPERUSERS is immediately visible in logs - deploy.sh: fail explicitly when git ls-remote cannot resolve the ref instead of silently using the literal branch name as the SHA - dev.md: update OAuth keys to oauth_client_id/secret/redirect_uri and note that debug_userid/debug_username must be set for a useful local dev session Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mahmoud
left a comment
There was a problem hiding this comment.
Big PR! First impressions: There are a few other things in the yaml that aren't secret per se? I wonder if that's too many env vars... and what's the deal with the huge .secrets.baseline ?
| @@ -0,0 +1 @@ | |||
| VITE_API_ENDPOINT= | |||
There was a problem hiding this comment.
hmm does this need to be in here?
There was a problem hiding this comment.
try removing it - I recall that it keeps crashing if you do. Needs to be declared? I came across this earlier in #516
|
|
||
| cookie['return_to_url'] = request.args.get('next', root_path) | ||
| return redirect(redirect_url) | ||
| def login(request, oauth_config, cookie, root_path): |
There was a problem hiding this comment.
curious why switching buildpacks would require changing this?
There was a problem hiding this comment.
oauth is upgrading here from v1 to v2. This was because i had to register a new consumer after deleting the old credentials accidentally.
Perhaps not the most lean implementation but made life easier.
Of course, if there is some reason to go back to v1, happy to register that instead.
| users_rounds_query = users_rounds_query.where(campaigns_t.c.status == 'active') | ||
| users_rounds_query = users_rounds_query.where(campaigns_t.c.is_archived == False) | ||
| users_rounds_query = users_rounds_query.where( | ||
| rounds_t.c.status == ACTIVE_STATUS |
There was a problem hiding this comment.
again, seems unrelated to buildpacks?
There was a problem hiding this comment.
separate bug, that I think got in the way during testing. Was easier to fix than work around.
Can cherrypick it as I can see there might be a conversation whether to hide paused (or completed) rounds now i think more about it.
| elif env_name == 'devtest': | ||
| return dict(DEVTEST_CONFIG) | ||
|
|
||
| if os.environ.get('MONTAGE_OAUTH_CLIENT_ID'): |
There was a problem hiding this comment.
seems like this is being used as a detector for if env vars are in use, might be worth commenting that or something?
There was a problem hiding this comment.
OAuth 1.0a → 2.0 PKCE rewrite (413ea20)
| fi | ||
|
|
||
| echo "" | ||
| echo "==> Deploy complete: $TOOL_URL" |
There was a problem hiding this comment.
this is one of those scripts where if it works, it works, and if it doesn't we'll just fix it :P if there's a way to test it, it'll be pretty involved. I wonder if it makes sense to have a skill that runs it so that some automatic diagnosis and such can happen.
There was a problem hiding this comment.
yeah, personal preference, I guess :) Especially the smoke test was helpful to me, as it is explicit about when it's really ready.
| @@ -0,0 +1 @@ | |||
| 3.13 | |||
There was a problem hiding this comment.
nice. i assume it passes tox and your clicktesting?
There was a problem hiding this comment.
Good pointer, upgraded tox, and confirmed this passes. See #573
| @@ -0,0 +1,7 @@ | |||
| { | |||
There was a problem hiding this comment.
we want this at the root? not in the frontend dir?
There was a problem hiding this comment.
| # via python-graph-core (pkg_resources namespace; implicit dep on Python 3.13) | ||
| # Pinned <82: setuptools 82 removed pkg_resources entirely; python-graph-core | ||
| # depends on it via namespace packages. Remove this pin once python-graph-core | ||
| # no longer uses pkg_resources (tracked in hatnote/montage#421). |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for the comments and questions. In the new setup, we don't use a persistent file system any longer. It now also includes environment-specific variables. Moved it to env var. If you want, we can make that distinction clearer and move them to the The secrets baseline was for a pre-commit hook that I installed just in case Claude goes overboard and wants to commit something with a secret. The baseline are all known false positives. I realize now that I look more carefully that this can easily be condensed by skipping the test data. |
test_data/wlm2015_fr_12k.csv contains 1,798 Wikimedia img_sha1 hashes that detect-secrets flagged as high-entropy strings. Excluding the directory shrinks the baseline from 12,745 lines to 159. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gunicorn 25.x (added in this PR) requires Python >=3.10, so Python 3.9 is no longer supported. The minimum supported version is now Python 3.11 (local dev / CI); the deployment target is Python 3.13 (buildservice). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Python version note: This PR adds gunicorn 25.3.0, which requires Python >=3.10 — Python 3.9 is no longer supported as of this change. Minimum supported version is now Python 3.11 (local dev/CI); deployment target remains Python 3.13 (buildservice). Dropped |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Migrates Montage from
toolforge webservice python3.13to the Toolforge buildservice (Cloud Native Buildpacks). AddsProcfile,.python-version, and a root-levelpackage.jsonshim so the Python and Node buildpacks both activate — baking the Vue frontend into the container image and eliminating the separate node20 frontend-build job.Background and design rationale: #563.
Also incorporates the Python 3.13 dependency fixes from #512 (cffi 1.17.1, setuptools<82 pin).
What changed
Procfile: defines the gunicorn web process; binds to$PORT(Toolforge injectsPORT=8000).python-version: pins Python 3.13 for the buildpack; removed from.gitignorepackage.json(repo root): shim to trigger Node buildpack;buildscript delegates tofrontend/montage/utils.py:get_env_name()checksMONTAGE_ENVenv var first (LDAP unavailable in containers);load_env_config()reads secrets from individualMONTAGE_*env vars whenMONTAGE_OAUTH_CONSUMER_TOKENis set, falls back to YAML for local devapp.py: removesurllib3.contrib.pyopensslshim (deprecated since urllib3 2.0, unnecessary on Python 3.13)requirements.in: addsgunicorn; removespyopenssl; unpinscryptographyrequirements.txt:cffi1.16.0 → 1.17.1 (adds cp313 wheels, no source compilation needed);setuptoolspinned <82 (setuptools 82 removedpkg_resources, needed bypython-graph-core— tracked in Automate setuptools installation #421)dockerfile: Python 3.9 → 3.13 to match deployment; addsgcc libffi-devbuild depsdeployment.md: rewritten for the buildservice workflowSecrets
All credentials are now set via
toolforge envvars(Kubernetes Secrets) — no config YAML file needed at runtime. Seedeployment.mdstep 3 for the full list of env vars and the warning about shell history on the shared bastion.Test plan
npm run buildfrom repo root (package.json shim) — assets land inmontage/static/cd frontend && npm run dev— Vite dev server startspytest montage/tests/— 6 passedload_env_config()reads fromMONTAGE_*env vars when setgunicorn app:appstarts and serves —/→ 200,/meta/→ route listpackage.jsonshim on Toolforge (needs working kube)MONTAGE_ENV+ secret envvars correctly configure the running service (needs working kube)$TOOL_DATA_DIRis NFS-mounted in the webservice pod; log paths resolve (needs working kube)Note: PR #534 (
tools/build-frontend-script) is kept open as a fallback until the buildservice is confirmed working on Toolforge.