Self-service private-link configuration #2944

Merged
jshearer merged 3 commits into master from jshearer/private_links_gql on Jun 16, 2026

@jshearer jshearer commented May 14, 2026

This PR exposes the data_planes.private_links configuration through the GraphQL API. Today private data-plane customers file a support request and we edit the column by hand.

dataPlanes query: private-networking fields

  • privateLinks returns the configured endpoints as a typed union (AWSPrivateLink | AzurePrivateLink | GCPPrivateServiceConnect).
  • awsLinkEndpoints / azureLinkEndpoints / gcpPscEndpoints expose the provisioning results the data-plane controller writes back, as opaque JSON.
  • All four fields are gated on the ViewDataPlanePrivateNetworking capability and fail closed to an empty list.
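
The fail-closed behavior described above can be sketched as follows. This is a minimal illustration, not the resolver code from this PR: `Caps`, its constants, and `private_links_for` are hypothetical names, and the capability set is modeled as a plain bitmask.

```rust
/// Hypothetical stand-in for a capability bit set (assumption: a plain bitmask).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Caps(pub u32);

impl Caps {
    pub const NONE: Caps = Caps(0);
    pub const VIEW_PRIVATE_NETWORKING: Caps = Caps(1 << 0);
    pub const MODIFY_PRIVATE_NETWORKING: Caps = Caps(1 << 1);

    /// True when every bit in `required` is present in `self`.
    pub fn contains(self, required: Caps) -> bool {
        self.0 & required.0 == required.0
    }
}

/// Resolver-shaped sketch: the gate sits inline, and a caller without the
/// View bit gets an empty list rather than an error.
pub fn private_links_for(caller: Caps, stored: &[String]) -> Vec<String> {
    if !caller.contains(Caps::VIEW_PRIVATE_NETWORKING) {
        return Vec::new();
    }
    stored.to_vec()
}
```

Note that holding only the Modify bit does not unlock the read in this sketch; the gate names the View bit explicitly.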

updateDataPlanePrivateLinks mutation

  • Takes [PrivateLinkInput!]!, where PrivateLinkInput is a oneof input mirroring the union, and replaces the entire private_links array on a private data plane; partial updates are intentionally not supported. The DPC converges to the new configuration on its next poll.
  • Requires ModifyDataPlanePrivateNetworking on the data-plane name, after a structural check that the name is under ops/dp/private/<tenant>/.
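
A minimal sketch of what the structural name check might look like. The function name `private_tenant` is hypothetical, and requiring a non-empty remainder after the tenant segment is an assumption of this sketch rather than something the PR states.

```rust
/// Prefix that private data-plane names are expected to sit under.
const PRIVATE_PREFIX: &str = "ops/dp/private/";

/// Return the tenant segment when `name` has the shape
/// `ops/dp/private/<tenant>/<rest>`, and `None` otherwise.
/// Assumption: both `<tenant>` and `<rest>` must be non-empty.
pub fn private_tenant(name: &str) -> Option<&str> {
    let after_prefix = name.strip_prefix(PRIVATE_PREFIX)?;
    let (tenant, rest) = after_prefix.split_once('/')?;
    if tenant.is_empty() || rest.is_empty() {
        return None;
    }
    Some(tenant)
}
```

The check is purely about shape. Whether a data plane with that name exists is decided later, when the update matches zero rows.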

Authorization model

  • Two new capability bits: ViewDataPlanePrivateNetworking and ModifyDataPlanePrivateNetworking.
  • The Viewer bundle (and therefore legacy read) carries the View bit: read on a data-plane prefix already conveys deploy-level trust, so viewing the plane's networking configuration comes with it.
  • A tenant admin gets Modify only when two grants both carry it. First, the ops/dp/private/<tenant>/ role_grant must carry the new ManageDataPlane bundle (create_data_plane installs it at provisioning time; a migration backfills existing private data planes). Second, the user's own bits at the tenant must include the Modify bit, which today means admin, since Admin folds ManageDataPlane in. Bits intersect along the delegation chain, so neither grant alone conveys Modify.
  • To grant ModifyDataPlanePrivateNetworking to a non-admin, create a user_grant on the tenant carrying {manage_data_plane, delegate} (Editor already carries Delegate, so editors need only manage_data_plane).
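
The intersection rule can be illustrated with a small sketch. It deliberately simplifies the model: bundles and the Delegate bit are left out, each hop of the chain is a plain bitmask, and the names `VIEW`, `MODIFY`, and `effective_bits` are hypothetical.

```rust
pub const VIEW: u32 = 1 << 0;
pub const MODIFY: u32 = 1 << 1;

/// Effective bits at the end of a chain of grants: the bitwise AND of every
/// hop. An empty chain conveys nothing, so the result fails closed to 0.
pub fn effective_bits(chain: &[u32]) -> u32 {
    chain.iter().copied().reduce(|acc, hop| acc & hop).unwrap_or(0)
}
```

With the user's bits at the tenant as the first hop and the role grant as the second, an admin-like user (`VIEW | MODIFY`) crossing a role grant that carries both bits keeps Modify. A viewer-like user (`VIEW`) crossing the same grant does not, and neither does an admin crossing a role grant that has not been backfilled and carries only `VIEW`.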

Shared models

  • The PrivateLink types moved from data-plane-controller::shared::stack to models, with feature-gated async-graphql derives so one set of structs backs the DB column, the GraphQL schema, and the DPC (which re-exports them).
  • Azure's dns_name / resource_type are now Option<String>. The previous shape was String with skip_serializing_if = "String::is_empty", so "" and absent were already indistinguishable on the wire: both serialized as a missing field. The new deserializer normalizes an incoming "" to None, and serialization stays field-absent, so the stored bytes and the stack config est-dry-dock reads are identical for every historical row shape (absent, "", or a value); a round-trip test in models::private_links pins this.
  • est-dry-dock cannot observe the difference either: its Pydantic models default these fields to "" and use truthiness checks. The visible effect is confined to GraphQL, where the fields are nullable and rows that stored "" surface as null.
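
The normalization can be sketched without serde as two small functions. This is an illustration of the rule only; the actual adapter in models is a serde adapter, and the names `normalize` and `to_wire` are hypothetical.

```rust
/// Read side: an absent field (`None`) and an empty string both become `None`.
pub fn normalize(raw: Option<&str>) -> Option<String> {
    match raw {
        None | Some("") => None,
        Some(value) => Some(value.to_string()),
    }
}

/// Write side: `None` and `Some("")` are both omitted (returned as `None`),
/// so the field stays absent on the wire.
pub fn to_wire(field: &Option<String>) -> Option<&str> {
    field.as_deref().filter(|value| !value.is_empty())
}
```

Under this sketch the three historical row shapes (absent, "", or a value) round-trip to the same wire output they produced before: absent, absent, and the value.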

Reviewer notes

Trade-offs discussed during review, recorded so they don't need to be re-litigated:

  • The four private-networking resolvers are deliberately not deduplicated further. The non-trivial shared piece, the capability check, is the authorized() helper in graphql/mod.rs; what remains per resolver is the gate call plus a clone-and-wrap body. A gated_endpoints helper bundling the gate and the JSON mapping was tried and dropped as trivial-body indirection under a vague name. Keeping the gate visible inside each resolver keeps the authorization audit local; the cost is that a change to the denial behavior must be edited in four places.
  • Azure Option<String> was chosen over two alternatives: keeping String (zero glue, but the "" sentinel becomes part of the public GraphQL contract as String!), and mapping "" to null at the GraphQL layer only (DPC untouched, but it requires an Azure-only split of input and output types while the model keeps the sentinel). Normalizing in models costs a small serde adapter plus a round-trip test, and makes the nullable contract hold everywhere the type is used.
  • ViewDataPlanePrivateNetworking is kept as its own bit even though read currently implies it via Viewer. The rejected alternative was to drop the bit and let DP-node visibility be the gate, which would have deleted the field gates entirely. Keeping the bit means each field gate names its capability explicitly, and the View / Modify split survives future changes to what read implies.

Deployment note

Migrations are applied manually; the sequencing matters:

  • Apply the enum migration before deploying: the new create_data_plane binds the manage_data_plane value, so provisioning a private data plane on the new code errors if the value is missing. (There are two migration files because a value added by alter type ... add value cannot be used in the same transaction that adds it.)
  • Run the backfill only after the rollout completes: binaries built before this PR decode bundles strictly in the snapshot loader, so any row carrying manage_data_plane breaks their role_grants fetch. For the same reason, do not provision a private data plane mid-rollout while old pods are live.
  • In the window between deploy and backfill, tenant admins of existing private data planes can view private links but not modify them. The backfill is idempotent and skips rows that already carry the bundle, so it is safe to run any time after the rollout, and planes provisioned on the new code are unaffected by it.
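
The reason the backfill must wait for the rollout can be sketched as a strict decoder. This is a hypothetical model of how a pre-PR binary might decode bundles, not the snapshot loader itself; `OldBundle`, `decode_strict`, and the string spellings other than `manage_data_plane` are assumptions.

```rust
/// Bundle values known to a binary built before this PR (illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OldBundle {
    Viewer,
    Editor,
    Admin,
}

/// Strict decode: a single unknown value fails the whole row, which in turn
/// would fail the fetch that loads it.
pub fn decode_strict(values: &[&str]) -> Result<Vec<OldBundle>, String> {
    values
        .iter()
        .map(|value| match *value {
            "viewer" => Ok(OldBundle::Viewer),
            "editor" => Ok(OldBundle::Editor),
            "admin" => Ok(OldBundle::Admin),
            other => Err(format!("unknown bundle value: {other}")),
        })
        .collect()
}
```

Any row carrying `manage_data_plane` makes this decoder return an error, which is why such rows should not exist while old pods are still serving.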

Fixes #2773

@jshearer jshearer force-pushed the jshearer/private_links_gql branch from d41b7a5 to e9736cb Compare May 27, 2026 21:14
@jshearer jshearer changed the base branch from master to greg/authz/1 May 27, 2026 21:14
@jshearer jshearer force-pushed the jshearer/private_links_gql branch from 4a99da9 to 6ab0520 Compare May 27, 2026 21:50
@jshearer jshearer changed the base branch from greg/authz/1 to master May 28, 2026 18:38
@jshearer jshearer force-pushed the jshearer/private_links_gql branch from 6ab0520 to 55164df Compare May 28, 2026 18:41
@jshearer jshearer force-pushed the jshearer/private_links_gql branch from 6ec128a to 5bf9a1f Compare June 10, 2026 21:25
@jshearer jshearer changed the title Privatelinks GraphQL Self-service private-link configuration Jun 10, 2026
@jshearer jshearer marked this pull request as ready for review June 10, 2026 22:22
@jshearer jshearer requested a review from GregorShear June 10, 2026 22:22
Comment thread crates/control-plane-api/src/server/public/graphql/data_planes.rs
Comment thread crates/control-plane-api/src/server/public/graphql/data_planes.rs Outdated
Comment thread crates/control-plane-api/src/server/public/graphql/data_planes.rs
Comment thread crates/control-plane-api/src/server/public/graphql/invite_links.rs Outdated
Comment thread crates/control-plane-api/src/server/public/graphql/mod.rs Outdated
Comment thread crates/models/src/private_links.rs
@jshearer jshearer force-pushed the jshearer/private_links_gql branch from 5bf9a1f to 6e4b40c Compare June 15, 2026 20:01
…async-graphql derives

The GraphQL API and the data-plane controller need to speak the same `PrivateLink` shape: DPC reads the `private_links json[]` column off `data_planes` and ships it into the est-dry-dock Pulumi stack, and the GraphQL surface needs both typed read and a typed input for the mutation that replaces the column. Defining the types once in `models` is the only way to share them without making DPC depend on the GraphQL crate.

* Move `PrivateLink`, `AWSPrivateLink`, `AzurePrivateLink`, `GCPPrivateServiceConnect` to `models::private_links`, re-export from `data-plane-controller::shared::stack` so existing DPC callers keep compiling unchanged.
* Add async-graphql derives under the existing `async-graphql` feature: `Union` + `OneofObject` on `PrivateLink`, `SimpleObject` + `InputObject` on each provider struct. Same Rust enum drives `union PrivateLink` on output and `input PrivateLinkInput @oneOf` on input, so the two GraphQL surfaces cannot drift.
* Azure `dns_name` and `resource_type` flip from `String`-with-empty-as-sentinel to `Option<String>` with a serde adapter that normalizes both an absent field and an empty string to `None` on read and omits both `None` and `Some("")` on write. Wire format byte-for-byte unchanged; the GraphQL surface exposes them as nullable instead of requiring clients to know the empty-string convention.
* AWS `service_region: Option<String>` carries the master-side change (commit b1d51ed) into the moved type, with `skip_serializing_if = "Option::is_none"` so the column stays absent (not `null`) when unset, preserving est-dry-dock's existing fallback to `region`.
@jshearer jshearer force-pushed the jshearer/private_links_gql branch from 6e4b40c to 9c34298 Compare June 15, 2026 20:38
jshearer added 2 commits June 15, 2026 19:20
…d `updateDataPlanePrivateLinks` mutation

Exposes the configured private-link endpoints on the `dataPlanes` query as a typed union against `models::PrivateLink`, alongside the three provisioning-result endpoint columns (`awsLinkEndpoints`, `azureLinkEndpoints`, `gcpPscEndpoints`) as opaque JSON arrays. Adds a mutation that replaces the entire `private_links` column on a private data plane; the data-plane controller picks up the new configuration on its next poll and converges via Pulumi.

* `dataPlanes` selects the raw `private_links` JSON once per page and parses each entry lazily in a `ComplexObject` resolver, so a malformed historical row produces a descriptive field-level error naming the data plane and the failing index rather than breaking the whole query selection.
* `updateDataPlanePrivateLinks(dataPlaneName, privateLinks)` accepts `[PrivateLinkInput!]!` directly against the same `models::PrivateLink` types via async-graphql's `@oneOf` input; supplying multiple branches in a single element is rejected by GraphQL validation before the handler runs.
* Name gating is a structural check: the name must sit under `ops/dp/private/<tenant>/...`. Anything more specific (cluster suffix shape, owning prefix shape) is the data plane's problem; an unknown name falls out as "not found" when the UPDATE matches zero rows. Per-provider validation is intentionally not duplicated here: the authoritative schema is the est-dry-dock Pydantic model on the consumer side, and a bad shape surfaces via DPC convergence.
* Interim authorization requires `read` on the private data-plane name, mirroring the existing deployment mutation shape. This will move to `manage_dataplane` once the orthogonal capability model lands.
* Returns `Boolean!` so the mutation surface is minimal; callers that need the post-write state re-query `dataPlanes`.

Adds two capability bits, `ViewDataPlanePrivateNetworking` and `ModifyDataPlanePrivateNetworking`, composed by a new `ManageDataPlane` bundle. The bundle is granted on the role-grant edge `<tenant>/ -> ops/dp/private/<tenant>/` at provisioning time, so tenant admins (whose `Admin` bundle composes `ManageDataPlane`) inherit both bits on their private data planes; non-admin tenant grantees intersect to their own bundle at the edge and never reach these bits.

* `private_links` resolver on `DataPlane` returns an empty list to callers that lack `ViewDataPlanePrivateNetworking` on the data plane name.
* `updateDataPlanePrivateLinks` mutation requires `ModifyDataPlanePrivateNetworking` (replacing the interim legacy `read` check).
* `evaluate_names_authorization` accepts any `Into<CapabilitySet> + Display + Copy`, letting call sites pass either legacy `models::Capability` or a single `models::authz::Capability` bit; legacy callers keep working unchanged via the `From<legacy>` impl.
* `models::authz::Capability` gets a `Display` impl that delegates to `Debug`, so error messages reference each bit by its Rust identifier.
* `create_data_plane.rs` stamps `bundles = '{manage_data_plane}'` on the new role grant. The accompanying two-step migration adds the enum value and backfills existing rows -- `alter type add value` cannot be used in the same transaction that subsequently references the new value.
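
A minimal sketch of the `Into<CapabilitySet> + Display + Copy` bound described above. The types here are simplified stand-ins with hypothetical bit assignments, and `check` is an illustrative function, not `evaluate_names_authorization`.

```rust
use std::fmt;

/// Stand-in capability set (assumption: a plain bitmask).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CapabilitySet(pub u32);

/// Stand-in for the legacy capability enum.
#[derive(Clone, Copy, Debug)]
pub enum LegacyCapability {
    Read,
}

/// Stand-in for the fine-grained capability bits.
#[derive(Clone, Copy, Debug)]
pub enum Capability {
    ViewDataPlanePrivateNetworking,
    ModifyDataPlanePrivateNetworking,
}

impl From<Capability> for CapabilitySet {
    fn from(cap: Capability) -> Self {
        match cap {
            Capability::ViewDataPlanePrivateNetworking => CapabilitySet(1 << 0),
            Capability::ModifyDataPlanePrivateNetworking => CapabilitySet(1 << 1),
        }
    }
}

impl From<LegacyCapability> for CapabilitySet {
    fn from(cap: LegacyCapability) -> Self {
        match cap {
            // Hypothetical mapping: legacy read carries the View bit.
            LegacyCapability::Read => CapabilitySet(1 << 0),
        }
    }
}

// Display delegates to Debug, so messages name the Rust identifier.
impl fmt::Display for Capability {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Debug::fmt(self, f)
    }
}

impl fmt::Display for LegacyCapability {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Debug::fmt(self, f)
    }
}

/// Accepts either capability type; `Copy` lets `required` be converted and
/// then still be formatted into the denial message.
pub fn check<C>(granted: CapabilitySet, required: C, name: &str) -> Result<(), String>
where
    C: Into<CapabilitySet> + fmt::Display + Copy,
{
    let needed: CapabilitySet = required.into();
    if granted.0 & needed.0 == needed.0 {
        Ok(())
    } else {
        Err(format!("missing {required} on {name}"))
    }
}
```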
@jshearer jshearer force-pushed the jshearer/private_links_gql branch from 9c34298 to 4095dbd Compare June 15, 2026 23:21
@jshearer jshearer requested a review from GregorShear June 15, 2026 23:37
GregorShear added a commit that referenced this pull request Jun 16, 2026
…lities

The service-account management surface was asymmetric: listing authorized on
the fine-grained `ManageServiceAccounts` capability while every mutation
required the full `Admin` bundle. That produced a class of principal (a
`TeamAdmin` without `Admin`) who could see service accounts but manage none of
them, and it meant the named capability the feature defines was never the
actual gate for any write.

Make the surface track the capabilities it defines:

- Anchor checks (create/add-grant/remove-grant/create-key/revoke-key, all on
  the account's `catalog_name`) now require `ManageServiceAccounts`, matching
  the listing query. A `TeamAdmin` can fully manage accounts and their keys
  without holding `Admin`.
- The per-grant prefix check now requires `CreateGrant` on each granted prefix
  rather than `Admin`. This is the anti-escalation guard — a caller still
  can't hand a service account reach they couldn't grant anyone — but it keys
  off the capability that authorizes granting, not the whole Admin bundle.
  Human-user grant creation still lives in PostgREST; when it migrates to
  GraphQL it should adopt this same `CreateGrant` gate.

To express fine-grained capabilities at these call sites, generalize
`evaluate_names_authorization` and `verify_authorization` to accept anything
that converts into a `CapabilitySet` (legacy `models::Capability`, a single
`models::authz::Capability` bit, or an explicit set). The BFS primitive
already operated on `CapabilitySet`, so this only lifts the wrapper signatures;
existing legacy-capability callers are unaffected. Add a `Display` impl for
`models::authz::Capability` so denial messages render the required capability.
These three changes mirror the same generalization in #2944 so the two
branches converge cleanly on whichever merges second.

Add `test_team_admin_manages_without_full_admin`, which seeds a caller holding
only the `team_admin` bundle (no `Admin`) and asserts both the positive path
(manage accounts, mint keys, grant prefixes within reach) and the
anti-escalation boundary (cannot grant a prefix they lack `CreateGrant` on).

Also correct the migration comments, which claimed API keys are never
exchanged for a JWT — the token-exchange endpoint mints a short-lived access
token from a key.
@jshearer jshearer merged commit e63dfe3 into master Jun 16, 2026
15 of 17 checks passed
GregorShear added a commit that referenced this pull request Jun 17, 2026
…lities

GregorShear added a commit that referenced this pull request Jun 17, 2026
…lities


Development

Successfully merging this pull request may close these issues.

Allow customers to create privatelinks
