feat(AGX1-272): dual-write agent_api_keys to spark-authz behind FGAC flag#248
Open
dm36 wants to merge 2 commits into
Open
feat(AGX1-272): dual-write agent_api_keys to spark-authz behind FGAC flag#248dm36 wants to merge 2 commits into
dm36 wants to merge 2 commits into
Conversation
…AGENT_API_KEYS_DUAL_WRITE flag Mirrors the AGX1-274 task dual-write pattern (PR #246) for agent_api_keys. - Adds creator_user_id / creator_service_account_id / spark_authz_zedtoken columns to agent_api_keys, with CHECK constraint and concurrent indexes. - On create, when FGAC_AGENT_API_KEYS_DUAL_WRITE is enabled for the caller's account, calls authorization_service.grant(AgentexResource.api_key(id)) BEFORE the Postgres write. Grant failure aborts the create. - On delete, best-effort revoke after the Postgres delete. Failures are logged but do not block the delete. - Adds AgentexResourceType.api_key and AgentexResource.api_key(...) factory. - Creates src/utils/feature_flags.py with both FGAC_TASKS_DUAL_WRITE and FGAC_AGENT_API_KEYS_DUAL_WRITE (file does not exist on main yet; if PR #246 lands first this becomes a rebase concern). Structural divergence from tasks: agent_api_keys have no service layer, so the dual-write logic lives in AgentAPIKeysUseCase rather than a separate service. This keeps the call site simple and avoids inventing a new layer. Route layer (read-side auth checks) is out of scope; that's PR B (AGX1-273). agentex-auth spark_mapping.py update is a sibling-repo concern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment on lines
+237
to
+246
| existing = await self.agent_api_key_repo.get_by_agent_id_and_name( | ||
| agent_id=agent_id, name=key_name, api_key_type=api_key_type | ||
| ) | ||
| await self.agent_api_key_repo.delete_by_agent_id_and_key_name( | ||
| agent_id=agent_id, key_name=key_name, api_key_type=api_key_type | ||
| ) | ||
| if existing is not None: | ||
| await self._deregister_api_key_from_spark_authz( | ||
| api_key_id=existing.id, account_id=account_id | ||
| ) |
There was a problem hiding this comment.
Unconditional pre-fetch adds a DB round-trip on every delete-by-name regardless of flag state
get_by_agent_id_and_name is called before the delete on every invocation, even when account_id is None or the FGAC_AGENT_API_KEYS_DUAL_WRITE flag is off for this account. _deregister_api_key_from_spark_authz will return immediately in those cases, making the existing fetch pure overhead. The same applies to delete_by_agent_name_and_key_name. Gate the pre-fetch behind a flag check to avoid the extra query on every delete-by-name in non-FGAC accounts.
Prompt To Fix With AI
This is a comment left during a code review.
Path: agentex/src/domain/use_cases/agent_api_keys_use_case.py
Line: 237-246
Comment:
**Unconditional pre-fetch adds a DB round-trip on every delete-by-name regardless of flag state**
`get_by_agent_id_and_name` is called before the delete on every invocation, even when `account_id` is `None` or the `FGAC_AGENT_API_KEYS_DUAL_WRITE` flag is off for this account. `_deregister_api_key_from_spark_authz` will return immediately in those cases, making the `existing` fetch pure overhead. The same applies to `delete_by_agent_name_and_key_name`. Gate the pre-fetch behind a flag check to avoid the extra query on every delete-by-name in non-FGAC accounts.
How can I resolve this? If you propose a fix, please make it concise.
4 tasks
Closes the parent_agent cascade gap surfaced on scaleapi/agentex#354. The api_key dual-write (AGX1-272, PR #248) currently calls grant() which writes the owner edge in SpiceDB but NOT the parent_agent edge. The agent_api_key schema requires `read = ... & parent_agent->read & ...`, so every downstream read/update fails closed without that edge. This PR adds register_resource/deregister_resource (Port + adapter + service) and swaps the api_keys use case from grant→register_resource with parent=AgentexResource.agent(agent_id). Now the owner edge and parent_agent edge are written atomically. Stack: - scaleapi/scaleapi#144657 — sgp-authz 0.7.0 (parent_resource kwarg). - scaleapi/agentex#355 — agentex-auth Port + adapter + HTTP routes. - #248 — original AGX1-272 dual-write (this stacks on it). - THIS PR — extends #248 to use the parent-aware path. Changes: - Port: abstract register_resource(resource, parent=None) and deregister_resource(resource). - Adapter proxy: POST /v1/authz/register and /v1/authz/deregister. - Service: mirror existing grant/revoke pattern (principal_context override, _bypass support, parent in log line for cascade debugging). - Use case: swap grant→register_resource passing parent=agent; swap revoke→deregister_resource. except Exception wrappers preserved (fail-closed on register, best-effort on deregister). - Tests: rename mocks to register_resource/deregister_resource; assert the parent edge is passed correctly. Test plan: - pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py → 8 / 8 pass. - New test ``test_create_api_key_calls_grant_when_flag_on`` asserts parent.type == AgentexResourceType.agent and parent.selector == agent.id. Other resource types' grant→register_resource swap is out of scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Mirrors the AGX1-274 task dual-write pattern (#246) for
agent_api_keys, with theparent_agentcascade properly wired end-to-end. When the per-accountFGAC_AGENT_API_KEYS_DUAL_WRITEflag is on, everyagent_api_keycreate callsregister_resource(api_key, parent=agent)(writing both the owner edge AND theparent_agentedge atomically in SpiceDB) BEFORE the Postgres write; every delete best-effort callsderegister_resourceafter.PR A of two — PR B (AGX1-273, #252) migrates the api_key routes to authoritative auth checks against Spark.
Consolidation note
This PR was originally split into two: the dual-write (#248 itself, using
grant()) and a follow-up parent-edge fix (formerly #253, usingregister_resource(parent=...)). Reviewer surfaced thatgrant()alone doesn't write theparent_agentedge — and without it, the SpiceDB schema'sread = ... & parent_agent->read & ...cascade fails closed for every reader, even the owner.Folded #253's commit into this branch via cherry-pick. PR #253 has been closed. The use case now calls
register_resource(parent=AgentexResource.agent(agent_id))instead ofgrant(), which writes both edges in one round-trip.Cross-repo stack
parent_resourcekwarg onregister_resource+ cross-tenant validationapi_keymapping; pin to 0.7.0register_resource(parent=agent)Stack must merge in order: #1 → #2 → this.
Changes
Dual-write infrastructure (commit
7e4d50e)agent_api_keys: addcreator_user_id,creator_service_account_id,spark_authz_zedtoken. CHECK constraint +CREATE INDEX CONCURRENTLYper Asher's task migration.AgentAPIKeyORMandAgentAPIKeyEntity.AgentexResourceType.api_key,AgentexResource.api_key(...),AgentexResourceOptionalSelector.api_key(...).src/utils/feature_flags.py(NEW):FeatureFlagProviderenv-var allowlist; flagsFGAC_TASKS,FGAC_TASKS_DUAL_WRITE,FGAC_AGENT_API_KEYS_DUAL_WRITE. Matches Asher's feat(AGX1-274): record task creator identity and FGAC migration safety #246 shape. If feat(AGX1-274): record task creator identity and FGAC migration safety #246 lands first this becomes a rebase concern.AgentAPIKeysUseCase: acceptsauthorization_service+feature_flags; threadsaccount_idthroughcreate/delete*; best-effort revoke-after-delete that logs but doesn't block;creator_user_id/creator_service_account_idread fromprincipal_context.create_api_key,delete_agent_api_key,delete_agent_api_key_by_nameresolveaccount_idfromauthorization_service.principal_contextand pass it down. No read-side auth checks (PR B scope).tests/integration/services/test_agent_api_key_service_dual_write.pymirrorstest_task_service_dual_write.pywith 8 cases.Parent-edge work (commit
9668f1a, folded in from #253)src/adapters/authorization/port.py— abstractregister_resource(resource, parent=None)andderegister_resource(resource)onAuthorizationGateway.src/adapters/authorization/adapter_agentex_authz_proxy.py— POST/v1/authz/registerand/v1/authz/deregisterto agentex-auth (these endpoints are added in #354).src/domain/services/authorization_service.py—register_resource/deregister_resourceservice methods mirror the grant/revoke pattern (principal_contextoverride,_bypasssupport, parent identity in log line for cascade debugging).AgentAPIKeysUseCase:grant → register_resourcein_register_api_key_in_spark_authz, passingparent=AgentexResource.agent(agent_id). The parent edge is load-bearing.revoke → deregister_resourcein_deregister_api_key_from_spark_authz. Atomically removes the resource + all of its relationships.except Exceptionwrappers: fail-closed on register, best-effort on deregister.grant/revoke→register_resource/deregister_resource); add explicit assertion thatparent=AgentexResource.agent(agent.id)is passed correctly on every create (the contract that prevents silently dropping the parent edge in future changes).Structural divergence from #246
agent_api_keyshave no service layer, so the dual-write logic lives inAgentAPIKeysUseCaserather than a parallelAgentAPIKeyService. Keeps the call sites simple — open to refactoring to a service layer if reviewers want strict layering parity with tasks.Test plan
pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py). New assertion intest_create_api_key_calls_grant_when_flag_onpins the parent edge contract:AUTH_PROVIDER=sparkis exercised in dev.Out of scope / follow-ups
Nonefromregister_resource; column stays NULL for now (same as tasks). A follow-up will surface the token once the adapter exposes it.task,build,deployment,schedule). Each owns its own follow-up; Asher's task PR feat(AGX1-274): record task creator identity and FGAC migration safety #246 has the same parent_agent gap and is the highest-priority follow-up.Linked: AGX1-272.
Closes #253 (consolidated here).
🤖 Generated with Claude Code
Greptile Summary
This PR introduces dual-write of
agent_api_keysto Spark AuthZ behind a per-accountFGAC_AGENT_API_KEYS_DUAL_WRITEfeature flag, mirroring the task dual-write pattern from PR #246. The dual-write logic lives inAgentAPIKeysUseCase(no dedicated service layer), with grant-before-write on create and best-effort deregister-after-delete.creator_user_id,creator_service_account_id, andspark_authz_zedtokencolumns toagent_api_keys, creates CONCURRENTLY indexes in anautocommit_block, and enforces a CHECK constraint ensuring at most one creator identity is set.FeatureFlagProvider(newsrc/utils/feature_flags.py) resolves per-account flags via env-var allowlists; the env key is derived from the flag name at call time so no caching layer is needed.AgentAPIKeysUseCasecallsregister_resource(owner + parent edge atomically) before the Postgres write and best-effortderegister_resourceafter the delete; eight new integration tests cover the main flag-on/off and failure paths.Confidence Score: 5/5
Safe to merge — all new code paths are guarded by a per-account feature flag, all new columns are nullable with no schema breakage, and the dual-write contract is correctly implemented and well-tested.
The grant-before-write and best-effort-deregister patterns are implemented consistently and match the documented design. The migration uses CONCURRENTLY indexes with IF NOT EXISTS and a correct autocommit_block, safe to run against a live database. The feature flag short-circuits everything when disabled, keeping existing behaviour unchanged for all non-FGAC accounts. No logic defects were found beyond what has already been discussed in earlier review threads.
No files require special attention.
Important Files Changed
Sequence Diagram
sequenceDiagram participant R as Route Handler participant UC as AgentAPIKeysUseCase participant FF as FeatureFlagProvider participant AS as AuthorizationService participant DB as AgentAPIKeyRepo Note over R,DB: CREATE flow R->>UC: create(name, agent_id, api_key, account_id) UC->>FF: is_enabled(FGAC_AGENT_API_KEYS_DUAL_WRITE, account_id) alt flag ON and creator resolvable UC->>AS: "register_resource(api_key_id, parent=agent_id)" AS-->>UC: None else flag OFF or no creator UC-->>UC: skip Spark registration end UC->>DB: create(AgentAPIKeyEntity) DB-->>UC: persisted entity UC-->>R: AgentAPIKeyEntity Note over R,DB: DELETE-by-ID flow R->>UC: delete(id, account_id) UC->>DB: delete(id) DB-->>UC: ok UC->>FF: is_enabled(FGAC_AGENT_API_KEYS_DUAL_WRITE, account_id) alt flag ON UC->>AS: deregister_resource(api_key_id) AS-->>UC: ok or logged failure end UC-->>R: void Note over R,DB: DELETE-by-name flow R->>UC: delete_by_agent_id_and_key_name(agent_id, key_name, account_id) UC->>DB: get_by_agent_id_and_name(agent_id, key_name) DB-->>UC: existing or None UC->>DB: delete_by_agent_id_and_key_name(agent_id, key_name) DB-->>UC: ok alt existing found AND flag ON UC->>AS: deregister_resource(existing.id) AS-->>UC: ok or logged failure end UC-->>R: voidComments Outside Diff (1)
agentex/tests/integration/services/test_agent_api_key_service_dual_write.py, line 703-727 (link)The suite covers
grantfailure preventing the DB write, but not the reverse:grantsucceeding followed byrepo.createraising (e.g., a duplicate key or transient DB error). In that case a Spark tuple exists for anapi_key_idthat never lands in Postgres. Adding a test for this case would make the documented known-limitation explicit and guard against accidental regression if compensating logic is added later.Prompt To Fix With AI
Reviews (2): Last reviewed commit: "feat: register_resource with parent edge..." | Re-trigger Greptile