Add Vault JWT local CRE coverage and topology docs #22048

Merged
prashantkumar1982 merged 28 commits into develop from codex/vault-jwt-e2e
Apr 22, 2026

Contributor

@prashantkumar1982 prashantkumar1982 commented Apr 16, 2026

What Changed

  • added Vault JWT auth runtime wiring across the Vault gateway handler, Vault DON handler, OCR2 delegate, and CRE job/deployment config paths
  • aligned JWT auth with the existing client request-digest flow; workflow_owner is now required in authorization_details
  • updated Vault OCR plugin CRUD responses to canonicalize the returned owner to org_id when VaultOrgIdAsSecretOwnerEnabled is on
  • added a JWT-enabled local CRE Vault topology on top of workflow-gateway-capabilities-don, documented the flags it turns on, and used the default workflow-gateway-capabilities-don topology for the flags-off path
  • added auth0 and [CRE.Linking] config to the relevant Vault topologies so tests rely on flag behavior instead of config presence
  • expanded Bucket B Vault smoke coverage to cover allowlist auth with JWT disabled, JWT auth with JWT enabled, JWT rejection when JWT auth is disabled, and mixed allowlist/JWT CRUD flows
  • refactored the Vault smoke helpers to share CRUD/auth plumbing, moved pure helper logic out of the main test file, and cleaned up scenario/subtest names to match actual behavior
  • added a local Vault JWT test harness with a fake JWKS issuer, mock linking service, Docker host-rewrite support for static topologies, and direct use of shared Vault label-encryption helpers
  • added repo-local Local CRE agent guidance under docs/local-cre/agent-skills/local-cre-e2e/ and linked it from docs/local-cre/index.md
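
For readers unfamiliar with the "fake JWKS issuer" idea in the harness bullet above, a minimal sketch follows. It is not the harness added in this PR: `newFakeJWKSServer`, `fetchJWKS`, the `/.well-known/jwks.json` path, and the `test-key` key ID are assumptions chosen for illustration, and only the Go standard library is used.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math/big"
	"net/http"
	"net/http/httptest"
)

// jwk holds the subset of JWK fields needed to publish an RSA public key.
type jwk struct {
	Kty string `json:"kty"`
	Kid string `json:"kid"`
	Use string `json:"use"`
	Alg string `json:"alg"`
	N   string `json:"n"`
	E   string `json:"e"`
}

// jwks is the JSON document served by the issuer.
type jwks struct {
	Keys []jwk `json:"keys"`
}

// newFakeJWKSServer (hypothetical helper) generates one RSA key and serves its
// public half as a JWKS document. The caller must Close the returned server.
func newFakeJWKSServer() (*httptest.Server, *rsa.PrivateKey, error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, nil, err
	}
	doc := jwks{Keys: []jwk{{
		Kty: "RSA",
		Kid: "test-key", // assumed key ID
		Use: "sig",
		Alg: "RS256",
		N:   base64.RawURLEncoding.EncodeToString(key.PublicKey.N.Bytes()),
		E:   base64.RawURLEncoding.EncodeToString(big.NewInt(int64(key.PublicKey.E)).Bytes()),
	}}}
	mux := http.NewServeMux()
	mux.HandleFunc("/.well-known/jwks.json", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(doc)
	})
	return httptest.NewServer(mux), key, nil
}

// fetchJWKS downloads and decodes the JWKS document from the given base URL.
func fetchJWKS(baseURL string) (jwks, error) {
	var out jwks
	resp, err := http.Get(baseURL + "/.well-known/jwks.json")
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

func main() {
	srv, _, err := newFakeJWKSServer()
	if err != nil {
		panic(err)
	}
	defer srv.Close()
	set, err := fetchJWKS(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println("keys served:", len(set.Keys))
}
```

The returned private key would be used by a test to sign tokens, while the component under test only ever sees the public key through the JWKS URL.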

Contributor

github-actions Bot commented Apr 16, 2026

CORA - Pending Reviewers

All codeowners have approved! ✅

Legend: ✅ Approved | ❌ Changes Requested | 💬 Commented | 🚫 Dismissed | ⏳ Pending | ❓ Unknown

For more details, see the full review summary.

@github-actions
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

Contributor

github-actions Bot commented Apr 16, 2026

✅ No conflicts with other open PRs targeting develop

trunk-io Bot commented Apr 16, 2026

Failed test: Test_CCIPProgrammableTokenTransfer_EVM2Sui_BurnMintTokenPool

Comment thread .github/workflows/cre-system-tests.yaml Outdated
Comment thread system-tests/lib/cre/don/jobs/jobs.go Outdated

return retry.RetryableError(err)
})
if retryErr != nil {
Contributor

curious, why is that needed? I have never run into job approval errors. Can you share the ones you encountered? Also, should we retry or is it possible that these errors reveal an actual problem/bug, which should be fixed either in the node or in JD?

Contributor Author

Yes, this was the AI trying something.
Removed it now.

// NewTestJWTIssuer creates a fake issuer with one generated RSA key and starts serving JWKS immediately.
func NewTestJWTIssuer() (*TestJWTIssuer, error) {
	return NewTestJWTIssuerOnAddr("0.0.0.0:0")
}
Contributor

I think for now we can leave this as-is, but in the future we might want to Dockerise it. Especially if one day these tests will run in k8s clusters (dev, stage, etc) as envisioned by Dev Journeys.

Contributor Author

Yes, happy to do it.
I assume you mean LinkingService too?
Also want to understand how exactly?
Do you mean a new container image, and then support it in local cre lifecycle?

Contributor

@Tofel Tofel Apr 21, 2026

Exactly, new container image supported in the local CRE lifecycle similar to how Chip Router is supported:

Comment thread system-tests/lib/cre/vault/linking_service.go
@prashantkumar1982 prashantkumar1982 force-pushed the codex/vault-jwt-e2e branch 2 times, most recently from 5d4b8ed to 7c9b32a Compare April 17, 2026 18:41
Comment thread .github/workflows/sigscanner.yml Fixed
@prashantkumar1982 prashantkumar1982 force-pushed the codex/vault-jwt-e2e branch 2 times, most recently from 70d73f2 to 7d1f3a9 Compare April 17, 2026 21:51
Comment thread docs/local-cre/agent-skills/local-cre-e2e/SKILL.md Outdated
Contributor

@Tofel Tofel left a comment

Above all, you need to make the Vault system test faster. Currently the JWT-based one takes 2x as long as the next slowest test, i.e. ~10 minutes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a038867c4a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if expectedWorkflowOwner == "" && orgID == "" {
	expectedWorkflowOwner = req.Id.Owner
}
err := EnsureRightLabelOnSecret(publicKey, req.EncryptedValue, expectedWorkflowOwner, orgID)

P2: Gate org-label validation behind OrgID owner setting

validateWriteRequest now always allows ciphertext labels that match orgID, but the Vault plugin still drops orgID when VaultOrgIdAsSecretOwnerEnabled is off (see core/services/ocr2/plugins/vault/plugin.go around orgID = "" before EnsureRightLabelOnSecret). In that configuration, gateway-side validation accepts org-labeled secrets and forwards them, then node/plugin validation rejects the same request later, causing avoidable runtime failures for JWT-enabled deployments that haven’t enabled org-as-owner. Gateway validation should be aligned with the same gate (or enforce the gate when org labels are accepted).
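
One way to read this suggestion is that both layers would derive the set of acceptable label owners from the same flag. The sketch below only illustrates that idea and is not the repository's code: `allowedLabelOwners`, `labelAccepted`, and the sample owner values are hypothetical, and only the flag name `VaultOrgIdAsSecretOwnerEnabled` comes from this PR.

```go
package main

import "fmt"

// allowedLabelOwners (hypothetical) returns the owners a ciphertext label may
// carry. The org ID is accepted only when orgIDAsOwnerEnabled is true, which
// stands in for VaultOrgIdAsSecretOwnerEnabled.
func allowedLabelOwners(workflowOwner, orgID string, orgIDAsOwnerEnabled bool) []string {
	var owners []string
	if workflowOwner != "" {
		owners = append(owners, workflowOwner)
	}
	if orgIDAsOwnerEnabled && orgID != "" {
		owners = append(owners, orgID)
	}
	return owners
}

// labelAccepted reports whether label matches one of the allowed owners.
func labelAccepted(label string, owners []string) bool {
	for _, o := range owners {
		if o == label {
			return true
		}
	}
	return false
}

func main() {
	flagOff := allowedLabelOwners("0xabc", "org-1", false)
	flagOn := allowedLabelOwners("0xabc", "org-1", true)
	fmt.Println("org label accepted, flag off:", labelAccepted("org-1", flagOff))
	fmt.Println("org label accepted, flag on: ", labelAccepted("org-1", flagOn))
}
```

If the gateway and the plugin shared a helper of this shape, an org-labeled secret would be rejected by whichever layer sees it first when the flag is off, rather than passing one layer and failing at the next.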

Contributor Author

validateWriteRequest allows orgID or workflowOwner.
We have many tests that use this scenario and are passing.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b656838295

Comment on lines +160 to +162
if isVaultJWTAuthEnabledTopology(topology) {
	t.Run(jwtSubtestName, func(t *testing.T) {
		ExecuteVaultMixedAuthTest(t, fixture.TestEnv, fixture.Issuer, fixture.LinkingService)

P2: Avoid concurrent Vault auth subtests on shared fixture

ExecuteVaultAllowListBasedTests contains an inner t.Parallel() subtest, so this t.Run can return before the allowlist flow has finished; starting the JWT subtest immediately afterwards reuses the same fixture.TestEnv and fixture.LinkingService while the allowlist flow may still be running. When parallelEnabled is true, this can make allowlist and JWT CRUD operations race against the same Vault state and produce flaky, order-dependent failures.

Contributor Author

Racing against the same Vault state should be OK.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e991a9559f

Comment on lines +55 to +59
runtime.Logger().Info(fmt.Sprintf("Vault secret workflow phase completed: %s", phase.Name),
	"phaseName", phase.Name,
	"checkCount", len(phase.Checks),
)
return fmt.Sprintf("Validated phase %s", phase.Name), nil

P2: Track completed phase before returning from onTrigger

onTrigger returns as soon as the first phase evaluates successfully, so multi-phase workflows can get stuck repeatedly reporting an earlier phase if its checks remain true after state advances. In that case the later phase log ("Vault secret workflow phase completed: <phase>") is never emitted, and callers like waitForVaultWorkflowPhase will time out waiting for later lifecycle phases. Persisting/advancing the current phase (or evaluating only the next expected phase) avoids this starvation.
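
A minimal sketch of the "evaluate only the next expected phase" option follows. The `phase` and `phaseTracker` types are hypothetical stand-ins, not the workflow's actual types, and the sketch assumes state can be kept in memory between triggers, which may not hold for the real workflow runtime.

```go
package main

import "fmt"

// phase is a hypothetical stand-in for a workflow lifecycle phase.
type phase struct {
	Name  string
	Check func() bool // reports whether the phase's conditions currently hold
}

// phaseTracker remembers how many phases have already been reported.
type phaseTracker struct {
	phases []phase
	next   int // index of the next phase that has not been reported yet
}

// onTrigger evaluates only the next expected phase, so an earlier phase whose
// checks remain true cannot be reported again and starve later phases.
func (p *phaseTracker) onTrigger() (string, bool) {
	if p.next >= len(p.phases) {
		return "", false // every phase has already been reported
	}
	ph := p.phases[p.next]
	if !ph.Check() {
		return "", false // not ready yet; retry on the next trigger
	}
	p.next++
	return ph.Name, true
}

func main() {
	secretUpdated := false
	tracker := &phaseTracker{phases: []phase{
		{Name: "created", Check: func() bool { return true }}, // stays true after it first passes
		{Name: "updated", Check: func() bool { return secretUpdated }},
	}}
	for i := 0; i < 3; i++ {
		if i == 2 {
			secretUpdated = true
		}
		name, ok := tracker.onTrigger()
		fmt.Printf("trigger %d: phase=%q reported=%v\n", i, name, ok)
	}
}
```

In this sketch the second trigger reports nothing instead of reporting "created" again, so a caller waiting for the "updated" log line is not starved by the earlier phase.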

@prashantkumar1982
Contributor Author

above all you need make the vault system test faster, currently the JWT-based one takes 2x more time than the next slowest test, i.e. ~10 minutes

Dramatically optimized now. Should be super fast on most recent runs now.

@prashantkumar1982 prashantkumar1982 requested a review from Tofel April 21, 2026 21:39
Comment thread core/scripts/cre/environment/configs/workflow-gateway-don-grpc-source.toml Outdated
@prashantkumar1982 prashantkumar1982 requested a review from Tofel April 22, 2026 15:27
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e29f274014

 	handlers = append(handlers, newDefaultWebAPICapabilitiesHandler())
 case GatewayHandlerTypeVault:
-	handlers = append(handlers, newDefaultVaultHandler(g.RequestTimeoutSec))
+	handlers = append(handlers, newDefaultVaultHandler(g.RequestTimeoutSec, svcCfg.Auth0))

P2: Validate Auth0 fields before emitting vault gateway specs

svcCfg.Auth0 is forwarded into the vault handler config without any schema validation, so a service config that sets auth0 but omits issuerURL or audience will still produce a job spec and only fail later at runtime when core/services/gateway/handlers/vault.NewHandler calls NewJWTBasedAuth (which hard-fails on missing issuer/audience). This turns a deterministic config error into a delayed startup failure for gateway jobs; the gateway job builder should reject incomplete Auth0 config up front.
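
Up-front validation of this kind might look like the sketch below. `auth0Config` and `validateAuth0` are hypothetical names; the fields mirror the `issuerURL` and `audience` settings mentioned above, and the real config type and its rules may differ. The absolute-URL check is an extra assumption beyond what the comment asks for.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// auth0Config is a hypothetical stand-in for the Auth0 section of a service config.
type auth0Config struct {
	IssuerURL string
	Audience  string
}

// validateAuth0 rejects incomplete Auth0 settings before a job spec is built.
// A nil config is treated as "JWT auth not configured" and is allowed.
func validateAuth0(cfg *auth0Config) error {
	if cfg == nil {
		return nil
	}
	var errs []error
	if strings.TrimSpace(cfg.IssuerURL) == "" {
		errs = append(errs, errors.New("auth0.issuerURL is required"))
	} else if u, err := url.Parse(cfg.IssuerURL); err != nil || u.Scheme == "" || u.Host == "" {
		errs = append(errs, fmt.Errorf("auth0.issuerURL %q is not an absolute URL", cfg.IssuerURL))
	}
	if strings.TrimSpace(cfg.Audience) == "" {
		errs = append(errs, errors.New("auth0.audience is required"))
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	err := validateAuth0(&auth0Config{IssuerURL: "https://issuer.example"})
	fmt.Println("validation result:", err)
}
```

Calling a check like this from the job builder would surface a missing issuer or audience when the spec is generated, before the gateway job ever starts.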

@prashantkumar1982 prashantkumar1982 added this pull request to the merge queue Apr 22, 2026
Merged via the queue into develop with commit 444ae28 Apr 22, 2026
213 of 215 checks passed
@prashantkumar1982 prashantkumar1982 deleted the codex/vault-jwt-e2e branch April 22, 2026 18:06
4 participants