docs + fix: working terraform example, IAM policy, and silent-failure docs#8
jcastiarena merged 11 commits into main on Apr 21, 2026
The terraform example in `specs/terraform/` had three issues that prevented it
from working end-to-end:
1. `attributes = jsonencode({ region = "us-east-1" })` was missing the
nested `provider` / `network` / `distribution` blocks that the scope
workflow requires. The nullplatform API accepts the incomplete config
at create time, but `start-initial` rolls back with
`"network layer is not configured for provider 'aws'"` on the first
deployment — surfacing the problem four or five steps after the
mistake.
2. The `type` field was set to `provider_specification_id` (a UUID), but
the API treats this as a slug. Apply failed with
`"no specification found for slug: <UUID>"`. Switched to
`provider_specification_slug`.
3. Both `scope_definition` modules were pinned to
`ref=feature/remove-org-nrn`, which no longer exists in
`nullplatform/tofu-modules` (it was merged into main). `tofu init`
failed outright. Pinned both references to `main`.
Added three variables (`aws_region`, `aws_state_bucket`,
`aws_hosted_public_zone_id`) so the operator can fill them via tfvars and
populated `terraform.tfvars.example` with placeholders.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
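As a rough sketch of the shape this commit describes (not the actual contents of `specs/terraform/`), the three variables and the completed `attributes` value might look like the following. The variable names and the three top-level keys come from the commit message; every key nested inside `provider`, `network`, and `distribution` is a hypothetical placeholder, and the authoritative field names are whatever `scope-configuration.json.tpl` defines.

```hcl
# Variables named in the commit message; descriptions are illustrative.
variable "aws_region" {
  type        = string
  description = "AWS region used by the scope's provider config"
}

variable "aws_state_bucket" {
  type        = string
  description = "S3 bucket that holds the OpenTofu state files"
}

variable "aws_hosted_public_zone_id" {
  type        = string
  description = "Route 53 public hosted zone ID"
}

locals {
  # Only the three top-level keys (provider, network, distribution) are
  # taken from the commit message. Every inner key below is a HYPOTHETICAL
  # placeholder; check scope-configuration.json.tpl for the real schema.
  provider_config_attributes = jsonencode({
    provider = {
      region       = var.aws_region       # hypothetical key
      state_bucket = var.aws_state_bucket # hypothetical key
    }
    network = {
      hosted_public_zone_id = var.aws_hosted_public_zone_id # hypothetical key
    }
    distribution = {
      type = "cloudfront" # hypothetical key and value
    }
  })
}
```

The point of the sketch is only that all three nested blocks are present at create time, since the API accepts an incomplete config and the omission surfaces later in `start-initial`.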
Adds `static-files/docs/agent-iam-policy-example.json`, a ready-to-attach IAM
policy for the nullplatform agent role (IRSA on EKS) covering the full scope
lifecycle on AWS:
- S3 (state bucket + per-scope asset bucket) lifecycle + bucket-policy
management + object Get/Put/Delete
- CloudFront distribution lifecycle and invalidations
- Route 53 record management (ChangeResourceRecordSets, GetHostedZone,
ListHostedZones, ListResourceRecordSets)
- Route 53 `GetChange` on `change/*`: easy to miss, the AWS provider needs it
for propagation polling. Without it, `start-initial` fails with
`AccessDenied` *after* successfully creating the record.
- ACM `DescribeCertificate`, `GetCertificate`, `ListCertificates`,
`ListTagsForCertificate`. `GetCertificate` is required in addition to
`DescribeCertificate`.
- STS `GetCallerIdentity`
Resources are set to `*` for simplicity; operators should scope them to
specific buckets/zones/distributions once the first deployment succeeds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s sections
The README was written for contributors extending the scope with new layer
implementations, but had nothing for operators trying to *use* the scope on
their own nullplatform account.
Adds a new top-level section "Registering and Using the Scope" right after the
architecture overview, with five subsections:
- **Pre-requisites (AWS)**: the five resources the scope expects to find but
does not create: state bucket, hosted zone, ACM certificate in us-east-1,
app-assets bucket with CloudFront OAC policy, and the agent IAM role. The
CloudFront OAC bucket policy is given inline.
- **Registration (Terraform)**: step-by-step for copying the example in
`specs/terraform/` into an infra repo and wiring up tfvars, plus a table of
required inputs.
- **Agent IAM permissions**: the full permissions matrix (links to the example
policy added in the previous commit), calling out the two actions that are
easy to miss: Route 53 `GetChange` and ACM `GetCertificate`.
- **State management**: explains that each scope has its own OpenTofu state
file in `aws_state_bucket`, with recommended bucket layouts for POC vs
production.
- **Gotchas**: three pitfalls documented: (1)
`nullplatform_provider_config.type` expects a slug and silently fails with a
UUID; (2) `scope_type.description` has a 60-character cap; (3)
`provider_config.attributes` is validated on first deploy, not on create.
Also adds the new subsections to the table of contents.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ge in IRSA label
The Agent Credentials (IRSA) label shown in the UI during scope
provider-config setup listed IAM actions per service, but was missing two that
the agent actually needs at deploy time:
- `route53:GetChange` on `arn:aws:route53:::change/*`: the AWS provider polls
this while waiting for DNS propagation. Without it, the deploy fails with
`AccessDenied` *after* successfully creating the record, which is a confusing
failure mode (the record shows up in the console but the deploy rolls back).
- `acm:GetCertificate`: needed in addition to `DescribeCertificate` for the
CloudFront distribution's certificate lookup. Without it the deploy fails
with a cryptic 400 on `GetCertificate` even though the cert exists and is
validated.
Added both to the label text and pointed the operator at the new
`static-files/docs/agent-iam-policy-example.json` file for a ready-to-use
policy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses review feedback on PR #8:
1. **Route 53 list actions on wrong Resource**: `route53:ListHostedZones` and
`route53:ListHostedZonesByName` do not support resource-level permissions
per AWS IAM docs. Scoping them to `arn:aws:route53:::hostedzone/*` silently
denies both actions, and the AWS Terraform provider calls ListHostedZones
during normal Route 53 data resolution, which would fail with AccessDenied.
Split the statement: GetHostedZone / ChangeResourceRecordSets /
ListResourceRecordSets stay on `hostedzone/*`; ListHostedZones /
ListHostedZonesByName move to their own statement with Resource `"*"`.
2. **Broken relative link in README**: the IAM example policy link was
`[docs/agent-iam-policy-example.json](../docs/agent-iam-policy-example.json)`,
which resolves to `<repo-root>/docs/`, a nonexistent path. Since the README
lives at `static-files/README.md` and the policy file at
`static-files/docs/...`, the correct relative path is
`docs/agent-iam-policy-example.json` (no `../` prefix).
3. **README IAM matrix updated** to reflect the split: added a Resource column
and broke Route 53 into three rows (record management on `hostedzone/*`,
zone listing on `*`, change polling on `change/*`).
4. **scope-configuration.json.tpl IRSA label** now mentions the
ListHostedZones/ListHostedZonesByName actions and the Resource `"*"`
requirement explicitly, so operators building a hand-crafted policy from the
UI don't miss them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
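The three-way Route 53 split described above can be sketched in HCL with the AWS provider's `aws_iam_policy_document` data source. The repository ships the policy as JSON, so this is only an equivalent illustration; the data source name and the `sid` values are made up for the example, while the actions and resource ARNs are the ones named in the commit message.

```hcl
# Illustrative only: data source name and sids are hypothetical.
data "aws_iam_policy_document" "agent_route53" {
  # Record management: these actions support resource-level permissions,
  # so they stay scoped to hosted zones.
  statement {
    sid    = "Route53RecordManagement"
    effect = "Allow"
    actions = [
      "route53:GetHostedZone",
      "route53:ChangeResourceRecordSets",
      "route53:ListResourceRecordSets",
    ]
    resources = ["arn:aws:route53:::hostedzone/*"]
  }

  # Zone listing: no resource-level permissions, so Resource must be "*".
  statement {
    sid    = "Route53ZoneListing"
    effect = "Allow"
    actions = [
      "route53:ListHostedZones",
      "route53:ListHostedZonesByName",
    ]
    resources = ["*"]
  }

  # Change polling: used while waiting for DNS propagation.
  statement {
    sid       = "Route53ChangePolling"
    effect    = "Allow"
    actions   = ["route53:GetChange"]
    resources = ["arn:aws:route53:::change/*"]
  }
}
```

Keeping the listing actions in their own statement means the `hostedzone/*` statement can later be narrowed to a specific zone ID without touching the one wildcard that cannot be avoided.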
… Azure/GCP
The scope itself is multi-cloud (AWS + Azure in the schema today, GCP
anticipated in the layer diagram), but the previous commits on this PR shipped
operator documentation and a reference Terraform wiring that were AWS-only
without saying so. That's misleading: someone arriving to install this scope on
Azure would read "Pre-requisites" and see only AWS items, "Agent IAM
permissions" and see only an AWS IAM policy, "Registration" and see an AWS-only
example under `specs/terraform/` (no cloud marker in the path).
This commit makes the AWS-specific nature of the operator guide explicit
without pretending to cover clouds we did not actually install against.
- **Move TF example into `specs/terraform/aws/`**: the three files stay
unchanged; only the path moves. Adds a new `specs/terraform/README.md`
explaining the layout and inviting PRs for a sibling `specs/terraform/azure/`
once someone validates the scope end-to-end on Azure.
- **Rename** `docs/agent-iam-policy-example.json` →
`docs/agent-iam-policy-aws-example.json`. The filename now carries the cloud
marker, mirroring how the `infrastructure/aws/iam/...` modules in the
`nullplatform/tofu-modules` repo encode the cloud in the path.
- **README preamble** added to the "Registering and Using the Scope" section
explicitly stating the operator guide is AWS-only and pointing at the schema
for the Azure fields the scope already supports.
- **Restructured Pre-requisites** from a flat `### Pre-requisites (AWS)`
heading into `### Pre-requisites` + `#### AWS` sub-heading. This leaves room
for a sibling `#### Azure` sub-heading to be added without renaming /
reshuffling once someone contributes the Azure guidance.
- **Agent IAM permissions** section now names the AWS scoping explicitly and
lists what the Azure equivalent would look like at a role-assignments level
(pointers, not a full guide; the scope UI label in
`scope-configuration.json.tpl` already covers Azure inline).
- **Updated all internal links** to the new paths (TF example references in
Gotchas, IAM policy link, `cp -r` command in the install walk-through).
- **TOC** updated to match the new sub-heading structure.
PR description will be updated separately to reflect that the scope-level fixes
(completing `attributes`, fixing `type` to a slug, correcting the module `ref`)
apply to all clouds, while the operator documentation targets AWS only until
someone contributes Azure/GCP.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds 3 IAM statements that were missing from the updated policy:
- `route53:GetChange` on `change/*`: the AWS provider polls this for DNS
propagation; without it, deploy fails with AccessDenied AFTER creating the
record (discovered in POC deploy #5)
- `route53:ListHostedZones` + `ListHostedZonesByName` on `*`: these two actions
don't support resource-level permissions; Resource must be `*` (the only
unavoidable wildcard in the policy)
- `acm:GetCertificate` on
`arn:aws:acm:us-east-1:YOUR_ACCOUNT_ID:certificate/*`: the provider calls both
GetCertificate and DescribeCertificate; scoped to us-east-1 (CloudFront
requirement) + account
Also adds a placeholder replacement table to the README IAM section:
YOUR_STATE_BUCKET, YOUR_ASSETS_BUCKET, YOUR_HOSTED_ZONE_ID, YOUR_ACCOUNT_ID
with sources and examples.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fedemaleh
reviewed
Apr 17, 2026
…rovider configs
Addresses review feedback on #8:
- Reword AWS pre-requisite 1 so it no longer reads as "one bucket per scope"
(the bucket is a single shared bucket with one state file per scope).
- Refactor the AWS Terraform example to accept `provider_configs` as a
`list(object(...))` and iterate it with `for_each`, so operators can register
one `nullplatform_provider_config` per environment (or region) in a single
apply. `aws_state_bucket` stays as a top-level variable because it is shared
across every entry; `aws_region` and `aws_hosted_public_zone_id` move into
each entry.
- Use the entry `nrn` as the `for_each` key so adding or removing an entry does
not reorder existing resources in state.
- README: update the minimum-inputs table and add a "Registering multiple
environments" subsection that shows the list shape and explains what varies
vs. what does not.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
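A minimal sketch of the `provider_configs` refactor, under several assumptions: the resource arguments shown (`nrn`, `type`, `attributes`), the provider source address, the `provider_specification_slug` variable, and the keys inside `attributes` are illustrative guesses rather than the contents of the example in the repository. What the commit message does establish is the list-of-objects input, the per-entry `aws_region` and `aws_hosted_public_zone_id`, the shared `aws_state_bucket`, and `nrn` as the `for_each` key.

```hcl
terraform {
  required_providers {
    # Source address is an assumption; use whatever the example pins.
    nullplatform = {
      source = "nullplatform/nullplatform"
    }
  }
}

# Shared across every entry, so it stays top-level.
variable "aws_state_bucket" {
  type = string
}

# Hypothetical variable holding the provider specification slug.
variable "provider_specification_slug" {
  type = string
}

# One entry per environment (or region).
variable "provider_configs" {
  type = list(object({
    nrn                       = string
    aws_region                = string
    aws_hosted_public_zone_id = string
  }))
}

locals {
  # Key by nrn so adding or removing an entry does not shift the
  # addresses of the other resources in state. Duplicate nrn values
  # make this expression fail, which surfaces the mistake at plan time.
  provider_configs_by_nrn = {
    for cfg in var.provider_configs : cfg.nrn => cfg
  }
}

resource "nullplatform_provider_config" "aws" {
  for_each = local.provider_configs_by_nrn

  nrn  = each.value.nrn
  type = var.provider_specification_slug

  # Inner keys are HYPOTHETICAL placeholders; see
  # scope-configuration.json.tpl for the real field names.
  attributes = jsonencode({
    provider = {
      region       = each.value.aws_region
      state_bucket = var.aws_state_bucket
    }
    network = {
      hosted_public_zone_id = each.value.aws_hosted_public_zone_id
    }
    distribution = {}
  })
}
```

Iterating a map keyed by `nrn` rather than using `count` over the list is what keeps existing resources stable: with `count`, removing the first entry would renumber every later one.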
agustincelentano
approved these changes
Apr 20, 2026
The scope_definition module fetches spec templates via data.http from
raw.githubusercontent.com. For public repos that endpoint serves content
anonymously, so the Basic-auth prefix `${github_token}@` was a no-op.
Removed: github_token variable, its two URL interpolations, the
tfvars.example entry, and the README references (prereq + table row +
HCL snippet).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The README's IAM matrix already lists `ListTagsForCertificate` under the ACM
actions, but the JSON policy example and the UI-label markdown in
`scope-configuration.json.tpl` both list only `ListCertificates`,
`DescribeCertificate`, and `GetCertificate`: drift between the docs.
Discovered during the first real-world install on the Banco Galicia POC. The
scope's `start-initial` workflow progressed cleanly past every prior gate
(bucket policy read, CloudFront module copy, OpenTofu init, plan), then died at
apply with:
    Error: listing tags for ACM Certificate (...): operation error ACM:
    ListTagsForCertificate, ...AccessDeniedException... is not authorized to
    perform: acm:ListTagsForCertificate
Cause: the AWS Terraform provider refreshes tags on every
`aws_acm_certificate` reference on every plan/apply, including via
`data "aws_acm_certificate"` lookups in the scope's modules, even when the
module doesn't declare tags itself. Without the action in the IAM role, the
tag-refresh call fails and the whole apply rolls back.
Adding `acm:ListTagsForCertificate` to the existing `ACMCertificateLookup`
statement (same `Resource: "*"` scope; ACM supports resource-level permissions
on `ListTagsForCertificate` but the other lookups in the same Sid are
intentionally wide).
Also updates the UI-label markdown in `scope-configuration.json.tpl` so the
text shown to operators in the nullplatform UI matches the JSON template +
README.
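For reference, the resulting ACM statement can be expressed in HCL with the AWS provider's `aws_iam_policy_document` data source. The repository keeps this policy as JSON, so the block below is an equivalent sketch rather than the file's contents; the data source name is hypothetical, while the Sid, the four actions, and the `*` resource are taken from the commit message.

```hcl
# Illustrative HCL equivalent of the ACMCertificateLookup statement.
# The data source name "agent_acm" is hypothetical.
data "aws_iam_policy_document" "agent_acm" {
  statement {
    sid    = "ACMCertificateLookup"
    effect = "Allow"
    actions = [
      "acm:ListCertificates",
      "acm:DescribeCertificate",
      "acm:GetCertificate",
      # Needed because the AWS provider refreshes certificate tags on
      # every plan/apply, even when the module declares no tags.
      "acm:ListTagsForCertificate",
    ]
    # Intentionally wide, matching the other lookups in this Sid.
    resources = ["*"]
  }
}
```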
Summary
During a real-world installation of this scope on AWS (Banco Galicia POC, end-to-end SPA deploy), the terraform example in `specs/terraform/` did not work out of the box and several failure modes had no docs. This PR makes the example work and documents the gaps.
All problems were found in order by trying to register the scope and roll a SPA through `start-initial`; the deploy succeeded on the sixth attempt, each retry surfacing the next gap.
Commits
- `fix(example)`: complete `provider_config.attributes` with nested `provider` / `network` / `distribution`, switch `type` from UUID to slug, pin module `ref` to `main` (the previous `feature/remove-org-nrn` ref no longer exists in `nullplatform/tofu-modules`).
- `docs`: add `static-files/docs/agent-iam-policy-example.json`, a ready-to-attach IAM policy for the agent role.
- `docs(readme)`: add a new top-level section "Registering and Using the Scope" (pre-requisites, registration walk-through, IAM matrix, state management, gotchas).
- `fix(scope-configuration)`: the IRSA label shown in the UI listed IAM actions per service but was missing `route53:GetChange` and `acm:GetCertificate`. Added both and pointed the operator at the new example policy.
Silent failures documented
Three cases where the scope accepts input that looks valid but fails later in the workflow:
- `nullplatform_provider_config.type` accepts anything but only resolves slugs at runtime; passing the UUID fails mid-apply with `no specification found for slug: <UUID>`.
- `scope_type.description` is validated against a 60-char cap by the backend but not by the schema; custom descriptions over that length fail at create with a 400.
- `provider_config.attributes` is not validated against `scope-configuration.json.tpl` at create time. An incomplete config (e.g., missing `network` layer) only surfaces when the first scope's `start-initial` rolls back with `"network layer is not configured for provider 'aws'"`.
Easy-to-miss IAM actions
Two IAM actions the agent needs that were neither in the module's default policy nor documented anywhere:
- `route53:GetChange` on `arn:aws:route53:::change/*`: the AWS provider polls this while waiting for DNS propagation. Without it, the deploy fails with `AccessDenied` after creating the record, a confusing failure mode.
- `acm:GetCertificate`: required in addition to `DescribeCertificate` for CloudFront certificate lookup. Without it the deploy fails with a 400 on an existing, validated cert.
Pre-requisites that the scope validates but does not create
Documented in the README as pre-requisites:
- ACM certificate in `us-east-1` covering the scope's domain
Test plan
- `tofu fmt -recursive` and `tofu init -backend=false` work against the updated `specs/terraform/`.
🤖 Generated with Claude Code