SRE-733: Build backend images only when affected (turbo-driven matrix) #8784
Replace the static service list in the backend build/manifest/deploy
matrices with a turbo affected-package query, so PRs and main pushes
only build/publish the images whose source actually changed.
- `setup` job holds a single `CATALOG` (per service: package, service,
push targets, dockerfile, context, optional build_args, ecs[]) as the
source of truth, and emits three matrices — `backend-images`,
`backend-manifests`, `backend-deploys` — via jq. workflow_dispatch
rebuilds everything; PR/push/merge_group filter to turbo's affected
set. A MISSING guard hard-fails if turbo flags a build:docker package
absent from the catalog (prevents a silent no-publish).
- build/manifest/deploy consume the dynamic matrices and gate on
`fromJSON(...).…[0] != null`. amd64 PR builds are skipped at
matrix-construction time; the `passed` gate tolerates skipped jobs.
- New workspace stub packages `infra/compose/{kratos,hydra}` (each a
`build:docker` script + LICENSE.md) let turbo attribute Dockerfile /
context changes to a package; registered in root workspaces.
- `set -euo pipefail` on the setup step so a failing `turbo query | jq`
fails loud instead of yielding an empty matrix and a green no-op run.
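The matrix construction described in the list above can be modeled roughly as follows. This is a minimal Python sketch of the selection logic, not the workflow's jq program: the catalog entries, the `ARCHES` list (including `arm64`), the entry shapes, and the helper names `build_matrices` and `has_work` are all hypothetical. It applies only the two event rules stated above (dispatch rebuilds everything; PRs drop amd64).

```python
# Hypothetical catalog: service, package, and ECS task names are placeholders,
# not the real CATALOG entries from the workflow.
CATALOG = [
    {"service": "api", "package": "@apps/hash-api", "push": ["ecr", "ghcr"], "ecs": ["api-task"]},
    {"service": "kratos", "paths": ["infra/compose/kratos"], "push": ["ghcr"], "ecs": []},
]

# Assumption: images are built for these two architectures.
ARCHES = ["amd64", "arm64"]


def build_matrices(catalog, affected, event):
    """Return the three matrices as plain lists of matrix entries."""
    if event == "workflow_dispatch":
        # Manual re-trigger: rebuild the whole catalog.
        selected = list(catalog)
    else:
        # PR / push / merge_group: keep only the affected services.
        selected = [entry for entry in catalog if entry["service"] in affected]

    images = [
        {"service": entry["service"], "arch": arch}
        for entry in selected
        for arch in ARCHES
        # amd64 PR builds are dropped while the matrix is constructed.
        if not (event == "pull_request" and arch == "amd64")
    ]
    manifests = [{"service": entry["service"], "push": entry["push"]} for entry in selected]
    deploys = [
        {"service": entry["service"], "task": task}
        for entry in selected
        for task in entry["ecs"]
    ]
    return {
        "backend-images": images,
        "backend-manifests": manifests,
        "backend-deploys": deploys,
    }


def has_work(matrix):
    """Model of the `[0] != null` gate: a job runs only for a non-empty matrix."""
    return len(matrix) > 0


matrices = build_matrices(CATALOG, {"api"}, "pull_request")
print(matrices["backend-images"])  # [{'service': 'api', 'arch': 'arm64'}]
```

Keeping the catalog as the single input means the three matrices cannot drift apart: a service that is not selected contributes no entry to any of them, and an empty selection leaves every downstream job skipped.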
The standalone `notify-slack-deploy` job keyed on `failure()` of the `deploy` job only. When `build` fails, `deploy` is *skipped* (its needs aren't met) rather than failing, and a skipped job is not a failure, so no Slack alert fired. `setup` and `manifest` failures were never covered either.
Fold the notification into the `passed` gate instead. Its Check steps already span setup/sourcemaps/build/manifest/deploy, so a step-level `failure()` fires whenever any of them fails. Two mutually-exclusive Slack steps gate by event: main push/dispatch → @infra, merge_group → @devops.
Removes the now-redundant `notify-slack-deploy` job.
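The difference between the old and new notification behavior can be illustrated with a small model. This is a hedged Python sketch of the decision logic as this commit message describes it, not the workflow itself: the function names are hypothetical, treating every result other than `success` or `skipped` as a failure is an assumption, and so is requiring the `main` ref for both push and dispatch.

```python
from typing import Optional

# Jobs that the `passed` gate's Check steps span, per the commit message.
PIPELINE_JOBS = ["setup", "sourcemaps", "build", "manifest", "deploy"]


def old_notifies(results):
    """Old behavior as described: alert only when `deploy` itself failed."""
    return results.get("deploy") == "failure"


def gate_failed(results):
    """New behavior: any pipeline job that neither succeeded nor was skipped.

    Assumption: results other than "success" and "skipped" (for example
    "cancelled", or a job missing from the mapping) count as failures.
    """
    return any(results.get(job) not in ("success", "skipped") for job in PIPELINE_JOBS)


def slack_target(results, event, ref) -> Optional[str]:
    """Pick the Slack handle to ping, or None when no alert should fire."""
    if not gate_failed(results):
        return None
    if event == "merge_group":
        return "@devops"
    # Assumption: push and dispatch alerts both require the main branch.
    if event in ("push", "workflow_dispatch") and ref == "refs/heads/main":
        return "@infra"
    return None


# A build failure leaves manifest and deploy skipped, not failed.
results = {
    "setup": "success",
    "sourcemaps": "success",
    "build": "failure",
    "manifest": "skipped",
    "deploy": "skipped",
}
print(old_notifies(results))                              # False
print(slack_target(results, "push", "refs/heads/main"))   # @infra
```

Because the two event branches are checked in sequence and return immediately, at most one handle is ever selected, which mirrors the mutually-exclusive Slack steps.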
PR Summary
High Risk
Overview: On normal runs, Turbo Build no longer uses per-job
Reviewed by Cursor Bugbot for commit f48ff08.
🤖 Augment PR Summary
Summary: This PR updates the backend deploy workflow to only build/publish/deploy Docker images for services affected by the change set, using Turbo’s affected-package query.
The stub packages `@infra/compose-{kratos,hydra}` were added so turbo's
affected-package query would attribute Dockerfile/context changes to a
package. But forcing non-JS infra directories into the yarn workspace
collided with the monorepo tooling:
- `prune-repository` does `cp -R infra/ out/`, so the pruned workspace
contained the stub package.json files while `turbo prune` omitted them
from the pruned lockfile → `yarn install --immutable` wanted to add
them → YN0028 ("lockfile would be modified"), failing every Lint and
Package job.
- They also spawned spurious `Package (@infra/compose-*)` lint jobs.
Kratos and Hydra aren't JS packages — their image build context is their
own directory, so `git diff --name-only HEAD^ HEAD` of that dir is a
complete, correct change signal (no dependency closure to track). Detect
app images via turbo (dependency-aware) and infra images via git diff;
catalog entries carry either `package` or `paths`.
Also filter the workspace root `//` out of the turbo affected list so the
MISSING guard can't hard-fail on it (matches the sourcemaps query).
Removes the four stub files and the two workspace entries; regenerates
the lockfile without the `@infra/compose-*` resolutions.
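The two detection paths described above, plus the root filter and the MISSING guard, can be sketched as one selection function. This is an illustrative Python model under stated assumptions, not the workflow's jq/bash: the catalog entries and package names are hypothetical, `affected_services` is an invented helper, and `changed_files` stands in for the output of `git diff --name-only HEAD^ HEAD`.

```python
# Hypothetical catalog: each entry carries either `package` (turbo-detected)
# or `paths` (git-diff-detected). Names are placeholders.
CATALOG = [
    {"service": "api", "package": "@apps/hash-api"},
    {"service": "kratos", "paths": ["infra/compose/kratos"]},
    {"service": "hydra", "paths": ["infra/compose/hydra"]},
]


def affected_services(catalog, turbo_packages, changed_files):
    """Return catalog service names affected by the change, in catalog order.

    turbo_packages: package names turbo reports as affected build:docker packages.
    changed_files:  repo-relative paths, e.g. from `git diff --name-only HEAD^ HEAD`.
    """
    # Drop the workspace root so the guard below cannot trip on it.
    packages = {name for name in turbo_packages if name != "//"}

    # MISSING guard: a flagged package absent from the catalog is a hard failure,
    # otherwise its image would silently never be published.
    known = {entry["package"] for entry in catalog if "package" in entry}
    missing = sorted(packages - known)
    if missing:
        raise RuntimeError(f"build:docker packages missing from catalog: {missing}")

    def under(path, directory):
        # Match the directory itself or anything beneath it, not mere prefixes.
        directory = directory.rstrip("/")
        return path == directory or path.startswith(directory + "/")

    affected = []
    for entry in catalog:
        if "package" in entry:
            hit = entry["package"] in packages
        else:
            hit = any(under(f, d) for f in changed_files for d in entry["paths"])
        if hit:
            affected.append(entry["service"])
    return affected


print(affected_services(CATALOG, ["//", "@apps/hash-api"], []))       # ['api']
print(affected_services(CATALOG, [], ["infra/compose/kratos/Dockerfile"]))  # ['kratos']
```

The directory match deliberately compares whole path segments, so a change under a sibling directory with a similar name does not count as a change to the Kratos or Hydra build context.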
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage diff, `t/sre-728-publish-images-to-ghcr` vs. #8784:
| Metric | t/sre-728-publish-images-to-ghcr | #8784 | +/- |
|---|---|---|---|
| Coverage | 59.01% | 59.01% | |
| Files | 1342 | 1342 | |
| Lines | 129455 | 129455 | |
| Branches | 5849 | 5849 | |
| Hits | 76395 | 76396 | +1 |
| Misses | 52159 | 52158 | -1 |
| Partials | 901 | 901 | |
Benchmark results
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 | Flame Graph | |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba | Flame Graph | |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |
🌟 What is the purpose of this PR?
Make the backend image pipeline (introduced in #8781 / SRE-728) only build, publish, and deploy the services whose source actually changed, instead of rebuilding all of them on every event. The build / manifest / deploy matrices are now derived from a turbo `affectedPackages` query against a single in-workflow service catalog.
🔗 Related links
🚫 Blocked by
🔍 What does this change?
- `setup` job holds a single `CATALOG` as the source of truth. Per service: `service`, `push` targets, `dockerfile`, `context`, optional `build_args`, `ecs[]` (the ECS task definitions that service drives), and a detection key: `package` for app images (turbo workspace package) or `paths` for infra images. From it, `setup` emits three matrices via jq: `backend-images`, `backend-manifests`, `backend-deploys`.
- App images are detected via turbo (`build:docker` task filter, dependency-aware: a shared-lib change rebuilds its dependents); Kratos/Hydra via `git diff` of their context directory. Kratos and Hydra aren't JS packages, so turbo can't see them, but their image build context is their own directory, so a git diff of `paths` is a complete, correct change signal. The workspace-root `//` is filtered out of the turbo result.
- `workflow_dispatch` rebuilds everything (manual re-trigger); `pull_request` / `push` (main) / `merge_group` filter to the affected set. amd64 builds are dropped at matrix-construction time on PRs.
- A MISSING guard hard-fails if turbo flags a `build:docker` package that isn't in the catalog, so a newly-added image can never silently fail to publish.
- `build` / `manifest` / `deploy` consume the dynamic matrices and gate on `fromJSON(needs.setup.outputs.…).…[0] != null`; the `passed` gate tolerates skipped matrix jobs (a PR that touches no backend image leaves them all skipped and still passes).
- `set -euo pipefail` on the setup step: a failing `turbo query | jq` would otherwise be masked by jq's exit 0 on empty input, yielding an empty matrix and a green run that silently builds/deploys nothing.
Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
📜 Does this require a change to the docs?
🕸️ Does this require a change to the Turbo Graph?
`build:docker` packages), but no `turbo.json` change was needed: the stub packages inherit the root `build:docker` task config and turbo's default inputs.
`CATALOG` uses jq's unquoted-key object syntax (it's a jq program, not a JSON file) for readability; this is intentional and parses correctly.
🧹 Also in this PR
`main`. The old separate `notify-slack-deploy` job keyed on `failure()` of `deploy`, so a `build` failure (which skips `deploy` rather than failing it) produced no Slack alert, and `setup` / `manifest` failures were never covered at all. Folded the notification into the `passed` gate, whose Check steps span every pipeline job, so any failure on `main` now pings @infra (and merge-queue failures still ping @devops).
🐾 Next steps
- The `base: HEAD^` turbo base is correct for `push`/PR but under-reports in `merge_group` with multiple stacked PRs; worth revisiting if the queue regularly batches backend changes.
🛡 What tests cover this?
No automated tests — the workflow is the test surface. The jq matrix pipelines, the empty-set gates, the ECS-target mapping (vs. the pre-refactor static matrix), and turbo's stub-package detection were validated by executing jq/turbo/git against the real catalog across PR / push / dispatch / single-service / ECR-only / empty scenarios.
❓ How to test this?
- A PR that touches no backend image: `manifest` / `deploy` skip; `Deployments passed` is green.
- A PR touching only `apps/hash-api/**` should build only `api`; one touching only `infra/compose/kratos/**` should build only `kratos`.
- `gh workflow run deploy.yml --ref <branch>` rebuilds the full catalog and (on non-main) pushes the multi-arch GHCR tags end-to-end.
- On `main`: only changed services build → push ECR/GHCR → ECS redeploys their mapped task definitions.