Add Vault JWT local CRE coverage and topology docs#22048
prashantkumar1982 merged 28 commits into develop
        return retry.RetryableError(err)
    })
    if retryErr != nil {
Curious, why is this needed? I have never run into job approval errors. Can you share the ones you encountered? Also, should we retry at all? These errors might reveal an actual bug that should be fixed in either the node or JD.
Yes, this was AI trying something.
Removed it now.
    // NewTestJWTIssuer creates a fake issuer with one generated RSA key and starts serving JWKS immediately.
    func NewTestJWTIssuer() (*TestJWTIssuer, error) {
        return NewTestJWTIssuerOnAddr("0.0.0.0:0")
    }
I think for now we can leave this as-is, but in the future we might want to Dockerise it. Especially if one day these tests will run in k8s clusters (dev, stage, etc) as envisioned by Dev Journeys.
Yes, happy to do it.
I assume you mean LinkingService too?
I'd also like to understand how exactly.
Do you mean a new container image that is then supported in the local CRE lifecycle?
Exactly: a new container image supported in the local CRE lifecycle, similar to how Chip Router is supported:
- component defined in https://github.com/smartcontractkit/chainlink-testing-framework/tree/main/framework/components
- explicitly declared in the env TOML: https://github.com/smartcontractkit/chainlink/blob/develop/core/scripts/cre/environment/configs/workflow-don-solana.toml#L2-L3
- started in environment.go
… into codex/vault-jwt-e2e
Tofel
left a comment
Above all, you need to make the Vault system test faster. Currently, the JWT-based one takes twice as long as the next slowest test, i.e. ~10 minutes.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a038867c4a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if expectedWorkflowOwner == "" && orgID == "" {
        expectedWorkflowOwner = req.Id.Owner
    }
    err := EnsureRightLabelOnSecret(publicKey, req.EncryptedValue, expectedWorkflowOwner, orgID)
Gate org-label validation behind OrgID owner setting
validateWriteRequest now always allows ciphertext labels that match orgID, but the Vault plugin still drops orgID when VaultOrgIdAsSecretOwnerEnabled is off (see core/services/ocr2/plugins/vault/plugin.go around orgID = "" before EnsureRightLabelOnSecret). In that configuration, gateway-side validation accepts org-labeled secrets and forwards them, then node/plugin validation rejects the same request later, causing avoidable runtime failures for JWT-enabled deployments that haven’t enabled org-as-owner. Gateway validation should be aligned with the same gate (or enforce the gate when org labels are accepted).
validateWriteRequest allows either orgID or workflowOwner.
We have many tests that use this scenario, and they pass.
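To make the suggestion in this thread concrete, here is a minimal sketch of what aligning the gateway-side check with the plugin-side gate could look like. It is not code from the PR: `expectedLabels` and its `orgAsOwnerEnabled` parameter are hypothetical names, and the only behavior taken from the thread is the fallback to the request owner and the plugin's `orgID = ""` reset when the flag is off.

```go
package main

import "fmt"

// expectedLabels (hypothetical) returns the workflow owner and org ID that a
// ciphertext label is allowed to match.
func expectedLabels(requestOwner, workflowOwner, orgID string, orgAsOwnerEnabled bool) (string, string) {
	if !orgAsOwnerEnabled {
		// Assumption: mirror the plugin, which drops orgID when the
		// org-as-owner flag is off.
		orgID = ""
	}
	if workflowOwner == "" && orgID == "" {
		// Same fallback as in the diff above: use the owner from the request ID.
		workflowOwner = requestOwner
	}
	return workflowOwner, orgID
}

func main() {
	owner, org := expectedLabels("0xabc", "", "org-1", false)
	fmt.Printf("owner=%q org=%q\n", owner, org)
}
```

With the flag off, an org-labeled secret would be rejected by both layers instead of passing the gateway and failing later in the plugin. Whether that stricter behavior is desirable is exactly what the thread is debating.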
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b656838295
    if isVaultJWTAuthEnabledTopology(topology) {
        t.Run(jwtSubtestName, func(t *testing.T) {
            ExecuteVaultMixedAuthTest(t, fixture.TestEnv, fixture.Issuer, fixture.LinkingService)
Avoid concurrent Vault auth subtests on shared fixture
ExecuteVaultAllowListBasedTests contains an inner t.Parallel() subtest, so this t.Run can return before the allowlist flow has finished; starting the JWT subtest immediately afterwards reuses the same fixture.TestEnv and fixture.LinkingService while the allowlist flow may still be running. When parallelEnabled is true, this can make allowlist and JWT CRUD operations race against the same Vault state and produce flaky, order-dependent failures.
A race against the same Vault state should be OK.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e991a9559f
    runtime.Logger().Info(fmt.Sprintf("Vault secret workflow phase completed: %s", phase.Name),
        "phaseName", phase.Name,
        "checkCount", len(phase.Checks),
    )
    return fmt.Sprintf("Validated phase %s", phase.Name), nil
Track completed phase before returning from onTrigger
onTrigger returns as soon as the first phase evaluates successfully, so multi-phase workflows can get stuck repeatedly reporting an earlier phase if its checks remain true after state advances. In that case the later phase log ("Vault secret workflow phase completed: <phase>") is never emitted, and callers like waitForVaultWorkflowPhase will time out waiting for later lifecycle phases. Persisting/advancing the current phase (or evaluating only the next expected phase) avoids this starvation.
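The "evaluate only the next expected phase" option mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the workflow's actual code: `Phase`, `PhaseTracker`, and `Advance` are hypothetical names, and the sketch keeps its cursor in memory, whereas a real workflow would need some way to persist it between trigger invocations.

```go
package main

import "fmt"

// Phase is a hypothetical model of one lifecycle phase: a name plus a check
// that reports whether the phase's conditions currently hold.
type Phase struct {
	Name  string
	Check func() bool
}

// PhaseTracker remembers how many phases have already been reported.
type PhaseTracker struct {
	phases []Phase
	next   int // index of the next phase that has not been reported yet
}

// Advance evaluates only the next expected phase. When its check passes, the
// cursor moves forward, so an earlier phase whose checks stay true can never
// be reported again and starve the later ones.
func (p *PhaseTracker) Advance() (string, bool) {
	if p.next >= len(p.phases) {
		return "", false
	}
	ph := p.phases[p.next]
	if !ph.Check() {
		return "", false
	}
	p.next++
	return ph.Name, true
}

func main() {
	updated := false
	tracker := &PhaseTracker{phases: []Phase{
		{Name: "create", Check: func() bool { return true }}, // remains true after completion
		{Name: "update", Check: func() bool { return updated }},
	}}

	for i := 0; i < 3; i++ {
		if i == 2 {
			updated = true // state advances before the third trigger
		}
		name, ok := tracker.Advance()
		fmt.Printf("trigger %d: phase=%q completed=%v\n", i+1, name, ok)
	}
}
```

The design choice is to treat phases as an ordered sequence with a cursor rather than a set to scan from the top on every trigger, which is what allows the later "phase completed" log lines to be emitted.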
Dramatically optimized now. It should be much faster on the most recent runs.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e29f274014
      handlers = append(handlers, newDefaultWebAPICapabilitiesHandler())
      case GatewayHandlerTypeVault:
    - handlers = append(handlers, newDefaultVaultHandler(g.RequestTimeoutSec))
    + handlers = append(handlers, newDefaultVaultHandler(g.RequestTimeoutSec, svcCfg.Auth0))
Validate Auth0 fields before emitting vault gateway specs
svcCfg.Auth0 is forwarded into the vault handler config without any schema validation, so a service config that sets auth0 but omits issuerURL or audience will still produce a job spec and only fail later at runtime when core/services/gateway/handlers/vault.NewHandler calls NewJWTBasedAuth (which hard-fails on missing issuer/audience). This turns a deterministic config error into a delayed startup failure for gateway jobs; the gateway job builder should reject incomplete Auth0 config up front.
What Changed
- […] workflow_owner in authorization_details
- […] org_id when VaultOrgIdAsSecretOwnerEnabled is on
- […] workflow-gateway-capabilities-don, documented the flags it turns on, and use the default workflow-gateway-capabilities-don topology for the flags-off path
- […] auth0 and [CRE.Linking] config to the relevant Vault topologies so tests rely on flag behavior instead of config presence
- […] docs/local-cre/agent-skills/local-cre-e2e/ and linked it from docs/local-cre/index.md