docs + fix: working terraform example, IAM policy, and silent-failure docs #8

Merged
jcastiarena merged 11 commits into main from fix/terraform-example-and-docs
Apr 21, 2026

Conversation

@jcastiarena
Contributor

Summary

During a real-world installation of this scope on AWS (Banco Galicia POC, end-to-end SPA deploy), the terraform example in specs/terraform/ did not work out of the box and several failure modes had no docs. This PR makes the example work and documents the gaps.

All problems were found in order by trying to register the scope and roll a SPA through start-initial; the deploy succeeded on the sixth attempt, each retry surfacing the next gap.

Commits

  1. fix(example) — complete provider_config.attributes with nested provider/network/distribution, switch type from UUID to slug, pin module ref to main (the previous feature/remove-org-nrn ref no longer exists in nullplatform/tofu-modules).
  2. docs — add static-files/docs/agent-iam-policy-example.json, a ready-to-attach IAM policy for the agent role.
  3. docs(readme) — add a new top-level section "Registering and Using the Scope" (pre-requisites, registration walk-through, IAM matrix, state management, gotchas).
  4. fix(scope-configuration) — the IRSA label shown in the UI listed IAM actions per service but was missing route53:GetChange and acm:GetCertificate. Added both and pointed the operator at the new example policy.

Silent failures documented

Three cases where the scope accepts input that looks valid but fails later in the workflow:

  • nullplatform_provider_config.type accepts anything but only resolves slugs at runtime — passing the UUID fails mid-apply with no specification found for slug: <UUID>.
  • scope_type.description is validated against a 60-char cap by the backend but not by the schema — custom descriptions over that length fail at create with a 400.
  • provider_config.attributes is not validated against scope-configuration.json.tpl at create time. An incomplete config (e.g., missing network layer) only surfaces when the first scope's start-initial rolls back with "network layer is not configured for provider 'aws'".
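
The first two cases could be caught at plan time with input validation instead of mid-apply. The following is a minimal sketch, not something this PR ships: `provider_specification_slug` is the name used in the commit message, `scope_type_description` is a hypothetical variable name, and the 60-character limit is the backend cap described above. It does not cover the third case (incomplete `attributes`).

```hcl
# Hypothetical guard rails. Variable names are illustrative and may not
# match the inputs of the example under specs/terraform/.

variable "provider_specification_slug" {
  type        = string
  description = "Slug (not UUID) of the provider specification."

  validation {
    # Reject anything shaped like a UUID: the API resolves this field as a slug.
    condition = !can(regex(
      "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$",
      var.provider_specification_slug
    ))
    error_message = "Pass the provider specification slug, not its UUID."
  }
}

variable "scope_type_description" {
  type        = string
  description = "Description sent as scope_type.description."

  validation {
    # The backend rejects longer descriptions with a 400 at create time.
    condition     = length(var.scope_type_description) <= 60
    error_message = "The scope type description must be 60 characters or fewer."
  }
}
```

With these in place, `tofu plan` would fail fast with the custom error message rather than surfacing the problem during apply.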

Easy-to-miss IAM actions

Two IAM actions the agent needs that were not in the module's default policy nor documented anywhere:

  • route53:GetChange on arn:aws:route53:::change/*: the AWS provider polls this while waiting for DNS propagation. Without it, the deploy fails with AccessDenied after creating the record, so the record shows up in the console while the deploy rolls back.
  • acm:GetCertificate: required in addition to DescribeCertificate for the CloudFront certificate lookup. Without it, the deploy fails with a 400 even though the certificate exists and is validated.
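
As an illustration, the two actions could be expressed with the AWS provider's `aws_iam_policy_document` data source. This is a sketch under assumptions: the data source name and the `sid` values are hypothetical, and the actual policy shipped in this PR is a JSON file, not HCL.

```hcl
# Sketch only: names and Sids are illustrative, not taken from the PR's JSON policy.
data "aws_iam_policy_document" "agent_easy_to_miss" {
  statement {
    sid     = "Route53ChangePolling"
    effect  = "Allow"
    actions = ["route53:GetChange"]
    # Change IDs are generated per request, so the wildcard applies to the change path.
    resources = ["arn:aws:route53:::change/*"]
  }

  statement {
    sid       = "ACMGetCertificate"
    effect    = "Allow"
    actions   = ["acm:GetCertificate"]
    resources = ["*"]
  }
}
```

The rendered JSON is available as `data.aws_iam_policy_document.agent_easy_to_miss.json` and can be attached to the agent role.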

Pre-requisites that the scope validates but does not create

Documented in the README as pre-requisites:

  • S3 state bucket
  • Route 53 hosted zone
  • ACM certificate in us-east-1 covering the scope's domain
  • App-assets S3 bucket with a CloudFront OAC bucket policy (policy JSON included inline in the README)
  • Agent IAM role (example policy added as a file)
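
For orientation, a CloudFront OAC bucket policy usually follows the shape AWS documents for origin access control: the CloudFront service principal gets `s3:GetObject`, conditioned on the distribution ARN. The sketch below assumes that standard shape; the policy given inline in the README may differ (for instance in how it scopes the condition), and both variable names are hypothetical.

```hcl
# Sketch of the standard OAC bucket policy shape. Variable names are hypothetical.
variable "assets_bucket_name" {
  type = string
}

variable "cloudfront_distribution_arn" {
  type = string
}

data "aws_iam_policy_document" "assets_oac" {
  statement {
    sid       = "AllowCloudFrontServicePrincipalReadOnly"
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::${var.assets_bucket_name}/*"]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }

    # Only requests signed on behalf of this distribution are allowed.
    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [var.cloudfront_distribution_arn]
    }
  }
}

resource "aws_s3_bucket_policy" "assets" {
  bucket = var.assets_bucket_name
  policy = data.aws_iam_policy_document.assets_oac.json
}
```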

Test plan

  • tofu fmt -recursive and tofu init -backend=false work against the updated specs/terraform/.
  • Apply the updated example against a clean nullplatform account and AWS account, verify the scope type registers and a test SPA deploys end-to-end.
  • Verify the new README sections render correctly on GitHub.
  • Verify the scope-configuration.json.tpl still parses (UI label renders the updated markdown).

🤖 Generated with Claude Code

jcastiarena and others added 4 commits April 15, 2026 17:55
The terraform example in `specs/terraform/` had three issues that prevented it
from working end-to-end:

1. `attributes = jsonencode({ region = "us-east-1" })` was missing the
   nested `provider` / `network` / `distribution` blocks that the scope
   workflow requires. The nullplatform API accepts the incomplete config
   at create time, but `start-initial` rolls back with
   `"network layer is not configured for provider 'aws'"` on the first
   deployment — surfacing the problem four or five steps after the
   mistake.

2. The `type` field was set to `provider_specification_id` (a UUID), but
   the API treats this as a slug. Apply failed with
   `"no specification found for slug: <UUID>"`. Switched to
   `provider_specification_slug`.

3. Both `scope_definition` modules were pinned to
   `ref=feature/remove-org-nrn`, which no longer exists in
   `nullplatform/tofu-modules` (it was merged into main). `tofu init`
   failed outright. Pinned both references to `main`.

Added three variables (`aws_region`, `aws_state_bucket`,
`aws_hosted_public_zone_id`) so the operator can fill them via tfvars and
populated `terraform.tfvars.example` with placeholders.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `static-files/docs/agent-iam-policy-example.json` — a ready-to-attach
IAM policy for the nullplatform agent role (IRSA on EKS) covering the full
scope lifecycle on AWS:

- S3 (state bucket + per-scope asset bucket) lifecycle + bucket-policy
  management + object Get/Put/Delete
- CloudFront distribution lifecycle and invalidations
- Route 53 record management (ChangeResourceRecordSets, GetHostedZone,
  ListHostedZones, ListResourceRecordSets)
- Route 53 `GetChange` on `change/*` — easy to miss, the AWS provider
  needs it for propagation polling. Without it, `start-initial` fails
  with `AccessDenied` *after* successfully creating the record.
- ACM `DescribeCertificate`, `GetCertificate`, `ListCertificates`,
  `ListTagsForCertificate` — `GetCertificate` is required in addition
  to `DescribeCertificate`.
- STS `GetCallerIdentity`

Resources are set to `*` for simplicity; operators should scope them to
specific buckets/zones/distributions once the first deployment succeeds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s sections

The README was written for contributors extending the scope with new layer
implementations, but had nothing for operators trying to *use* the scope
on their own nullplatform account. Adds a new top-level section
"Registering and Using the Scope" right after the architecture overview,
with five subsections:

- **Pre-requisites (AWS)** — the five resources the scope expects to find
  but does not create: state bucket, hosted zone, ACM certificate in
  us-east-1, app-assets bucket with CloudFront OAC policy, and the agent
  IAM role. The CloudFront OAC bucket policy is given inline.
- **Registration (Terraform)** — step-by-step for copying the example in
  `specs/terraform/` into an infra repo and wiring up tfvars, plus a
  table of required inputs.
- **Agent IAM permissions** — the full permissions matrix (links to the
  example policy added in the previous commit), calling out the two
  actions that are easy to miss: Route 53 `GetChange` and ACM
  `GetCertificate`.
- **State management** — explains that each scope has its own OpenTofu
  state file in `aws_state_bucket`, with recommended bucket layouts for
  POC vs production.
- **Gotchas** — three pitfalls documented: (1) `nullplatform_provider_config.type`
  expects a slug and silently fails with a UUID; (2) `scope_type.description`
  has a 60-character cap; (3) `provider_config.attributes` is validated
  on first deploy, not on create.

Also adds the new subsections to the table of contents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ge in IRSA label

The Agent Credentials (IRSA) label shown in the UI during scope provider-
config setup listed IAM actions per service, but was missing two that the
agent actually needs at deploy time:

- `route53:GetChange` on `arn:aws:route53:::change/*` — the AWS provider
  polls this while waiting for DNS propagation. Without it, the deploy
  fails with `AccessDenied` *after* successfully creating the record,
  which is a confusing failure mode (the record shows up in the console
  but the deploy rolls back).
- `acm:GetCertificate` — needed in addition to `DescribeCertificate`
  for the CloudFront distribution's certificate lookup. Without it the
  deploy fails with a cryptic 400 on `GetCertificate` even though the
  cert exists and is validated.

Added both to the label text and pointed the operator at the new
`static-files/docs/agent-iam-policy-example.json` file for a ready-to-use
policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jcastiarena and others added 4 commits April 15, 2026 18:08
Addresses review feedback on PR #8:

1. **Route 53 list actions on wrong Resource** — route53:ListHostedZones
   and route53:ListHostedZonesByName do not support resource-level
   permissions per AWS IAM docs. Scoping them to
   arn:aws:route53:::hostedzone/* silently denies both actions, and the
   AWS Terraform provider calls ListHostedZones during normal Route 53
   data resolution, which would fail with AccessDenied. Split the
   statement: GetHostedZone / ChangeResourceRecordSets /
   ListResourceRecordSets stay on hostedzone/*; ListHostedZones /
   ListHostedZonesByName move to their own statement with Resource "*".

2. **Broken relative link in README** — the IAM example policy link was
   `[docs/agent-iam-policy-example.json](../docs/agent-iam-policy-example.json)`,
   which resolves to `<repo-root>/docs/` — a nonexistent path. Since
   the README lives at static-files/README.md and the policy file at
   static-files/docs/..., the correct relative path is
   `docs/agent-iam-policy-example.json` (no `../` prefix).

3. **README IAM matrix updated** to reflect the split: added a Resource
   column and broke Route 53 into three rows (record management on
   hostedzone/*, zone listing on *, change polling on change/*).

4. **scope-configuration.json.tpl IRSA label** now mentions the
   ListHostedZones/ListHostedZonesByName actions and the Resource "*"
   requirement explicitly, so operators building a hand-crafted policy
   from the UI don't miss them.
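
The split described in item 1 can be sketched in HCL as follows. The statement names are hypothetical, and the PR's own policy is a JSON file, so this only illustrates which actions sit on which resource.

```hcl
# Sketch of the Route 53 statement split. Sids are illustrative.
data "aws_iam_policy_document" "agent_route53" {
  # These actions support resource-level permissions on hosted zones.
  statement {
    sid    = "Route53RecordManagement"
    effect = "Allow"
    actions = [
      "route53:GetHostedZone",
      "route53:ChangeResourceRecordSets",
      "route53:ListResourceRecordSets",
    ]
    resources = ["arn:aws:route53:::hostedzone/*"]
  }

  # These actions do not support resource-level permissions, so the
  # resource has to be "*" or the statement never matches.
  statement {
    sid    = "Route53ZoneListing"
    effect = "Allow"
    actions = [
      "route53:ListHostedZones",
      "route53:ListHostedZonesByName",
    ]
    resources = ["*"]
  }
}
```

The first statement can be narrowed to a single hosted zone ID once it is known; the second cannot be narrowed.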

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… Azure/GCP

The scope itself is multi-cloud (AWS + Azure in the schema today, GCP
anticipated in the layer diagram), but the previous commits on this PR
shipped operator documentation and a reference Terraform wiring that were
AWS-only without saying so. That's misleading: someone arriving to install
this scope on Azure would read "Pre-requisites" and see only AWS items,
"Agent IAM permissions" and see only an AWS IAM policy, "Registration"
and see an AWS-only example under `specs/terraform/` (no cloud marker in
the path).

This commit makes the AWS-specific nature of the operator guide explicit
without pretending to cover clouds we did not actually install against.

- **Move TF example into `specs/terraform/aws/`** — the three files stay
  unchanged; only the path moves. Adds a new `specs/terraform/README.md`
  explaining the layout and inviting PRs for a sibling `specs/terraform/azure/`
  once someone validates the scope end-to-end on Azure.
- **Rename** `docs/agent-iam-policy-example.json` →
  `docs/agent-iam-policy-aws-example.json`. The filename now carries the
  cloud marker, mirroring how the `infrastructure/aws/iam/...` modules in
  the `nullplatform/tofu-modules` repo encode the cloud in the path.
- **README preamble** added to the "Registering and Using the Scope"
  section explicitly stating the operator guide is AWS-only and pointing
  at the schema for the Azure fields the scope already supports.
- **Restructured Pre-requisites** from a flat `### Pre-requisites (AWS)`
  heading into `### Pre-requisites` + `#### AWS` sub-heading. This leaves
  room for a sibling `#### Azure` sub-heading to be added without
  renaming / reshuffling once someone contributes the Azure guidance.
- **Agent IAM permissions** section now names the AWS scoping explicitly
  and lists what the Azure equivalent would look like at a role-
  assignments level (pointers, not a full guide — the scope UI label in
  `scope-configuration.json.tpl` already covers Azure inline).
- **Updated all internal links** to the new paths (TF example references
  in Gotchas, IAM policy link, `cp -r` command in the install walk-through).
- **TOC** updated to match the new sub-heading structure.

PR description will be updated separately to reflect that the scope-level
fixes (completing `attributes`, fixing `type` to a slug, correcting the
module `ref`) apply to all clouds, while the operator documentation
targets AWS only until someone contributes Azure/GCP.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds 3 IAM statements that were missing from the updated policy:

- route53:GetChange on change/* — AWS provider polls this for DNS
  propagation; without it, deploy fails with AccessDenied AFTER creating
  the record (discovered in POC deploy #5)
- route53:ListHostedZones + ListHostedZonesByName on * — these two
  actions don't support resource-level permissions; Resource must be *
  (the only unavoidable wildcard in the policy)
- acm:GetCertificate on arn:aws:acm:us-east-1:YOUR_ACCOUNT_ID:certificate/*
  — provider calls both GetCertificate and DescribeCertificate; scoped
  to us-east-1 (CloudFront requirement) + account

Also adds a placeholder replacement table to the README IAM section:
YOUR_STATE_BUCKET, YOUR_ASSETS_BUCKET, YOUR_HOSTED_ZONE_ID, YOUR_ACCOUNT_ID
with sources and examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread static-files/README.md Outdated
Comment thread static-files/README.md Outdated
Comment thread static-files/README.md
…rovider configs

Addresses review feedback on #8:

- Reword AWS pre-requisite 1 so it no longer reads as "one bucket per scope"
  (the bucket is a single shared bucket with one state file per scope).
- Refactor the AWS Terraform example to accept `provider_configs` as a
  `list(object(...))` and iterate it with `for_each`, so operators can
  register one `nullplatform_provider_config` per environment (or region)
  in a single apply. `aws_state_bucket` stays as a top-level variable
  because it is shared across every entry; `aws_region` and
  `aws_hosted_public_zone_id` move into each entry.
- Use the entry `nrn` as the `for_each` key so adding or removing an entry
  does not reorder existing resources in state.
- README: update the minimum-inputs table and add a "Registering multiple
  environments" subsection that shows the list shape and explains what
  varies vs. what does not.
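
A minimal sketch of that list shape, under stated assumptions: the variable names come from this commit message, `provider_specification_slug` comes from an earlier commit, the resource label `static_files` is hypothetical, and the key names inside `attributes` are placeholders (the real keys are defined by `scope-configuration.json.tpl`). Provider setup and any other resource arguments are omitted.

```hcl
variable "aws_state_bucket" {
  type        = string
  description = "Single bucket shared by every entry; holds one state file per scope."
}

variable "provider_specification_slug" {
  type = string
}

variable "provider_configs" {
  description = "One entry per environment (or region)."
  type = list(object({
    nrn                       = string
    aws_region                = string
    aws_hosted_public_zone_id = string
  }))
}

resource "nullplatform_provider_config" "static_files" {
  # Key by nrn so adding or removing an entry does not shift the others in state.
  for_each = { for cfg in var.provider_configs : cfg.nrn => cfg }

  nrn  = each.value.nrn
  type = var.provider_specification_slug

  # Hypothetical key names: check scope-configuration.json.tpl for the real schema.
  attributes = jsonencode({
    provider = {
      region       = each.value.aws_region
      state_bucket = var.aws_state_bucket
    }
    network = {
      hosted_public_zone_id = each.value.aws_hosted_public_zone_id
    }
    distribution = {}
  })
}
```

Because the `for` expression builds a map, two entries with the same `nrn` make the plan fail with a duplicate-key error, which doubles as a sanity check on the input list.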

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jcastiarena and others added 2 commits April 20, 2026 15:02
The scope_definition module fetches spec templates via data.http from
raw.githubusercontent.com. For public repos that endpoint serves content
anonymously, so the Basic-auth prefix `${github_token}@` was a no-op.

Removed: github_token variable, its two URL interpolations, the
tfvars.example entry, and the README references (prereq + table row +
HCL snippet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The README's IAM matrix already lists `ListTagsForCertificate` under
the ACM actions, but the JSON policy example and the UI-label markdown
in `scope-configuration.json.tpl` both list only
`ListCertificates`, `DescribeCertificate`, and `GetCertificate` —
drift between the docs.

Discovered during the first real-world install on the Banco Galicia
POC. The scope's `start-initial` workflow progressed cleanly past
every prior gate (bucket policy read, CloudFront module copy,
OpenTofu init, plan), then died at apply with:

  Error: listing tags for ACM Certificate (...): operation error
  ACM: ListTagsForCertificate, ...AccessDeniedException... is not
  authorized to perform: acm:ListTagsForCertificate

Cause: the AWS Terraform provider refreshes tags on every
`aws_acm_certificate` reference on every plan/apply — including via
`data "aws_acm_certificate"` lookups in the scope's modules — even
when the module doesn't declare tags itself. Without the action in
the IAM role, the tag-refresh call fails and the whole apply rolls
back.

Adds `acm:ListTagsForCertificate` to the existing
`ACMCertificateLookup` statement. The statement keeps `Resource: "*"`:
ACM supports resource-level permissions on `ListTagsForCertificate`, but
the other lookups in the same Sid are intentionally wide.

Also updates the UI-label markdown in
`scope-configuration.json.tpl` so the text shown to operators in the
nullplatform UI matches the JSON template + README.
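
For reference, the resulting `ACMCertificateLookup` statement can be sketched in HCL like this. The Sid, actions, and `Resource: "*"` come from the commit message above; the data source name is hypothetical and the PR's actual policy is a JSON file.

```hcl
# Sketch of the ACM statement after this commit.
data "aws_iam_policy_document" "agent_acm" {
  statement {
    sid    = "ACMCertificateLookup"
    effect = "Allow"
    actions = [
      "acm:ListCertificates",
      "acm:DescribeCertificate",
      "acm:GetCertificate",
      # Needed because the AWS provider refreshes certificate tags on plan/apply.
      "acm:ListTagsForCertificate",
    ]
    resources = ["*"]
  }
}
```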
@jcastiarena jcastiarena merged commit b8425bc into main Apr 21, 2026
@jcastiarena jcastiarena deleted the fix/terraform-example-and-docs branch April 21, 2026 21:02
3 participants