feat: RAG service API key file management#302
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds host-side RAG API key file management and wiring into RAG instance resource generation, refactors MCP per-node role and PostgREST authenticator generation, introduces a container KeysPath mount option, and adds unit tests for key handling, filename validation, and resource lifecycle. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/orchestrator/swarm/rag_service_keys_resource.go`:
- Around line 121-127: extractRAGAPIKeys currently uses pipeline names directly
to build filenames (e.g., keys[p.Name+"_embedding.key"]) and writeKeyFiles
writes them to disk, allowing path traversal via crafted pipeline names; fix by
sanitizing/validating pipeline names before using them in filenames (e.g.,
reject names containing path separators or non-alphanumeric chars, or replace
unsafe chars with underscores) and build file paths with filepath.Join then
verify the cleaned path is inside KeysDir (ensure filepath.Clean(joinedPath) has
KeysDir as a prefix) before writing; apply the same hardening wherever p.Name is
used to form filenames in extractRAGAPIKeys and writeKeyFiles.
- Around line 76-80: The Refresh loop in Refresh is only checking
os.IsNotExist(err) and ignores other stat errors; update the loop over r.Keys
(using r.KeysDir and filepath.Join) so that after calling os.Stat you: if err ==
nil continue; if os.IsNotExist(err) return resource.ErrNotFound; otherwise
return or wrap the encountered err (preserving context such as the key name) so
permission/I/O errors are surfaced instead of treated as present. Ensure you
reference the os.Stat call and return the real error for non-NotExist cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d7d38e6a-f745-4663-993e-e43256518cb2
📒 Files selected for processing (6)
server/internal/orchestrator/swarm/orchestrator.goserver/internal/orchestrator/swarm/rag_service_keys_resource.goserver/internal/orchestrator/swarm/rag_service_keys_resource_test.goserver/internal/orchestrator/swarm/rag_service_user_role_test.goserver/internal/orchestrator/swarm/resources.goserver/internal/orchestrator/swarm/service_spec.go
server/internal/orchestrator/swarm/rag_service_keys_resource.go
Outdated
Show resolved
Hide resolved
759feca to
497d7b2
Compare
c384b7d to
901e64e
Compare
Up to standards ✅🟢 Issues
|
| Category | Results |
|---|---|
| Complexity | 1 medium |
🟢 Metrics 61 complexity · 14 duplication
Metric Results Complexity 61 Duplication 14
TIP This summary will be updated as you push new changes. Give us feedback
7a20ebb to
da808b3
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/orchestrator/swarm/orchestrator.go`:
- Around line 685-703: The code creates a host-side keys directory
(RAGServiceKeysResource) but never appends a
ServiceInstanceSpecResource/ServiceInstanceResource pair so the keys never get
mounted; modify the generator to mirror the MCP/PostgREST flow by creating and
appending the same service spec/resource pair (use the same pattern that
produces ServiceInstanceSpecResource and ServiceInstanceResource) and set the
service's mount to consume the KeysPath from RAGServiceKeysResource (refer to
dataDirID, dataDir, extractRAGAPIKeys and KeysPath) before calling
buildServiceInstanceResources so the returned orchestratorResources include the
service that mounts the keys.
In `@server/internal/orchestrator/swarm/rag_service_keys_resource.go`:
- Around line 103-123: Create/Update currently only set permissions on newly
created entries, leaving existing keys/ directory or files with insecure modes;
ensure directory and files are explicitly chmod'd on every Create and Update. In
Create and Update call os.MkdirAll(keysDir, 0o700) (or ensure existence) and
then os.Chmod(keysDir, 0o700) to enforce directory perms before any writes;
after writing each key file in writeKeyFiles (or immediately after os.WriteFile)
call os.Chmod(filePath, 0o600) to enforce file perms; in Update ensure you
create/chmod the directory prior to calling removeStaleKeyFiles/writeKeyFiles so
permissions are always hardened. Include these changes in the
RAGServiceKeysResource methods Create, Update and in
writeKeyFiles/removeStaleKeyFiles as appropriate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f2473c91-6b36-4833-bcf1-a5fadd99d2ee
📒 Files selected for processing (7)
server/internal/orchestrator/swarm/orchestrator.goserver/internal/orchestrator/swarm/rag_service_keys_resource.goserver/internal/orchestrator/swarm/rag_service_keys_resource_test.goserver/internal/orchestrator/swarm/rag_service_user_role.goserver/internal/orchestrator/swarm/rag_service_user_role_test.goserver/internal/orchestrator/swarm/resources.goserver/internal/orchestrator/swarm/service_spec.go
✅ Files skipped from review due to trivial changes (1)
- server/internal/orchestrator/swarm/rag_service_keys_resource_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- server/internal/orchestrator/swarm/resources.go
- server/internal/orchestrator/swarm/service_spec.go
- server/internal/orchestrator/swarm/rag_service_user_role_test.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
server/internal/orchestrator/swarm/orchestrator.go (1)
686-704:⚠️ Potential issue | 🔴 CriticalRAG still never emits the service resources that would mount these keys.
This path appends only RO-role and filesystem resources, then returns;
buildServiceInstanceResourcesjust serializes that slice. UnlikegenerateMCPInstanceResources, there is noServiceInstanceSpecResource/ServiceInstanceResource, so nothing can passKeysPathinto the container or start the RAG service at all.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/orchestrator/swarm/orchestrator.go` around lines 686 - 704, The RAG path builds only filesystem and keys resources (RAGServiceKeysResource) then returns via buildServiceInstanceResources, so no ServiceInstanceSpecResource/ServiceInstanceResource is emitted and KeysPath never reaches the container; fix by mirroring the pattern used in generateMCPInstanceResources: create and append a ServiceInstanceSpecResource with KeysPath set to the keys subdir and then a ServiceInstanceResource (or ServiceInstanceSpecResource wrapper) that references the same ServiceInstanceID so the orchestrator will mount the keys and start the RAG service; ensure you append these resources to orchestratorResources before calling buildServiceInstanceResources.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@server/internal/orchestrator/swarm/orchestrator.go`:
- Around line 686-704: The RAG path builds only filesystem and keys resources
(RAGServiceKeysResource) then returns via buildServiceInstanceResources, so no
ServiceInstanceSpecResource/ServiceInstanceResource is emitted and KeysPath
never reaches the container; fix by mirroring the pattern used in
generateMCPInstanceResources: create and append a ServiceInstanceSpecResource
with KeysPath set to the keys subdir and then a ServiceInstanceResource (or
ServiceInstanceSpecResource wrapper) that references the same ServiceInstanceID
so the orchestrator will mount the keys and start the RAG service; ensure you
append these resources to orchestratorResources before calling
buildServiceInstanceResources.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 82bd302d-2d12-4869-b757-f61e6e25480b
📒 Files selected for processing (2)
server/internal/orchestrator/swarm/orchestrator.goserver/internal/orchestrator/swarm/rag_service_user_role_test.go
✅ Files skipped from review due to trivial changes (1)
- server/internal/orchestrator/swarm/rag_service_user_role_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
server/internal/orchestrator/swarm/rag_service_keys_resource.go (1)
103-123:⚠️ Potential issue | 🟠 MajorPermission hardening in Create/Update is still incomplete.
Create/Updateshould enforce directory and file modes on every reconciliation andUpdateshould ensurekeysDirexists before stale cleanup/write. This concern was raised earlier and is still unresolved in the current patch.🔒 Proposed hardening
func (r *RAGServiceKeysResource) Create(ctx context.Context, rc *resource.Context) error { keysDir, err := r.keysDir(rc) if err != nil { return err } if err := os.MkdirAll(keysDir, 0o700); err != nil { return fmt.Errorf("failed to create keys directory: %w", err) } + if err := os.Chmod(keysDir, 0o700); err != nil { + return fmt.Errorf("failed to set keys directory permissions: %w", err) + } return r.writeKeyFiles(keysDir) } func (r *RAGServiceKeysResource) Update(ctx context.Context, rc *resource.Context) error { keysDir, err := r.keysDir(rc) if err != nil { return err } + if err := os.MkdirAll(keysDir, 0o700); err != nil { + return fmt.Errorf("failed to create keys directory: %w", err) + } + if err := os.Chmod(keysDir, 0o700); err != nil { + return fmt.Errorf("failed to set keys directory permissions: %w", err) + } if err := r.removeStaleKeyFiles(keysDir); err != nil { return err } return r.writeKeyFiles(keysDir) } @@ path := filepath.Join(keysDir, name) if err := os.WriteFile(path, []byte(key), 0o600); err != nil { return fmt.Errorf("failed to write key file %q: %w", name, err) } + if err := os.Chmod(path, 0o600); err != nil { + return fmt.Errorf("failed to set key file %q permissions: %w", name, err) + } }Based on learnings: "implement idempotent provisioning semantics: re-running ProvisionServices on already-provisioned service instances should be safe and non-destructive."
Also applies to: 140-149
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/orchestrator/swarm/rag_service_keys_resource.go` around lines 103 - 123, Create/Update currently don't enforce idempotent permission hardening: ensure keysDir exists (create with 0o700) on every reconciliation and enforce file modes after writing; in RAGServiceKeysResource.Create and RAGServiceKeysResource.Update call mkdir (os.MkdirAll) with 0o700 before any cleanup/write, have Update create keysDir before removeStaleKeyFiles, and after r.writeKeyFiles ensure each key file written by writeKeyFiles is chmod'ed to a strict mode (e.g., 0o600) or have writeKeyFiles return file paths so caller can set modes; also ensure directory permissions are re-applied (os.Chmod(keysDir, 0o700)) so repeats are idempotent and non-destructive.
🧹 Nitpick comments (1)
server/internal/orchestrator/swarm/rag_service_keys_resource_test.go (1)
297-315: Add a directory-permission assertion in Create test.At Line 297, the test verifies key file modes but not the
keys/directory mode. Add aStat(keysDir)assertion for0700so permission regressions are caught.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/orchestrator/swarm/rag_service_keys_resource_test.go` around lines 297 - 315, Add a directory permission assertion for the keysDir in the test: after constructing keysDir (as in the existing test in rag_service_keys_resource_test.go) call os.Stat(keysDir) and assert its Mode().Perm() equals 0o700 (0700) (use t.Errorf on mismatch), analogous to the existing per-file Stat checks for each key name in r.Keys so directory permission regressions are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/orchestrator/swarm/rag_service_keys_resource.go`:
- Around line 91-97: The loop in Refresh that iterates over r.Keys uses
filepath.Join(keysDir, name) without validating the key filename, allowing
directory traversal; before calling os.Stat or joining the path validate each
key name by calling validateKeyFilename(name) (the same validation used
elsewhere), and if validation fails return resource.ErrNotFound (or an
appropriate error) so malformed names are rejected and you never stat outside
keysDir; update the loop in the Refresh method to perform this validation for
each name.
---
Duplicate comments:
In `@server/internal/orchestrator/swarm/rag_service_keys_resource.go`:
- Around line 103-123: Create/Update currently don't enforce idempotent
permission hardening: ensure keysDir exists (create with 0o700) on every
reconciliation and enforce file modes after writing; in
RAGServiceKeysResource.Create and RAGServiceKeysResource.Update call mkdir
(os.MkdirAll) with 0o700 before any cleanup/write, have Update create keysDir
before removeStaleKeyFiles, and after r.writeKeyFiles ensure each key file
written by writeKeyFiles is chmod'ed to a strict mode (e.g., 0o600) or have
writeKeyFiles return file paths so caller can set modes; also ensure directory
permissions are re-applied (os.Chmod(keysDir, 0o700)) so repeats are idempotent
and non-destructive.
---
Nitpick comments:
In `@server/internal/orchestrator/swarm/rag_service_keys_resource_test.go`:
- Around line 297-315: Add a directory permission assertion for the keysDir in
the test: after constructing keysDir (as in the existing test in
rag_service_keys_resource_test.go) call os.Stat(keysDir) and assert its
Mode().Perm() equals 0o700 (0700) (use t.Errorf on mismatch), analogous to the
existing per-file Stat checks for each key name in r.Keys so directory
permission regressions are caught.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 09319c5f-063b-4cd9-bf98-f87d1bfcb5fd
📒 Files selected for processing (2)
server/internal/orchestrator/swarm/rag_service_keys_resource.goserver/internal/orchestrator/swarm/rag_service_keys_resource_test.go
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
server/internal/orchestrator/swarm/rag_service_keys_resource_test.go (1)
323-355: Add a regression test for non-destructiveUpdatefailure path.Current tests validate successful rotation/removal, but not the case where
Updatefails due to an invalid desired filename. A targeted test here would lock in the expectation that existing keys are not removed on validation failure.🧪 Suggested test case
+func TestRAGServiceKeysResource_Update_InvalidFilename_DoesNotDeleteExistingKeys(t *testing.T) { + parentID := "inst1-data" + rc, parentPath := ragKeysRCWithTempDir(t, parentID) + + r := &RAGServiceKeysResource{ + ServiceInstanceID: "inst1", + HostID: "host-1", + ParentID: parentID, + Keys: map[string]string{"default_rag.key": "sk-v1"}, + } + if err := r.Create(context.Background(), rc); err != nil { + t.Fatalf("Create() error = %v", err) + } + + r.Keys = map[string]string{"sub/dir.key": "sk-bad"} // invalid filename + if err := r.Update(context.Background(), rc); err == nil { + t.Fatal("Update() = nil, want error") + } + + keysDir := filepath.Join(parentPath, "keys") + if _, err := os.Stat(filepath.Join(keysDir, "default_rag.key")); err != nil { + t.Fatalf("default_rag.key should remain after failed Update, got err = %v", err) + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/orchestrator/swarm/rag_service_keys_resource_test.go` around lines 323 - 355, Add a regression test that ensures Update is non-destructive when validation fails: use ragKeysRCWithTempDir to create an initial RAGServiceKeysResource (call Create) with a valid key file (e.g., "old_rag.key" -> "sk-old"), then set r.Keys to an invalid desired filename (e.g., "../bad" or another name that will trigger validation in RAGServiceKeysResource.Update) and call Update expecting an error; assert the error is returned and that the original file still exists with unchanged content (check filepath.Join(parentPath,"keys","old_rag.key") exists and contains "sk-old") and that no new invalid file was created. This locks in the guarantee that Update does not remove/overwrite existing keys on validation failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/orchestrator/swarm/rag_service_keys_resource.go`:
- Around line 193-196: The validateKeyFilename function currently allows "." and
".." which can create unsafe path resolution; update validateKeyFilename to
explicitly reject the dot-segments by returning an error when name == "." or
name == ".." (in addition to the existing filepath.Clean/IsAbs/ContainsAny
checks) so Refresh and any write paths cannot use these special segments.
- Around line 120-134: In Update, validate all desired key filenames before
performing any destructive work: after obtaining keysDir (in
RAGServiceKeysResource.Update) and before calling removeStaleKeyFiles, run a
validation step that checks each desired filename produced/used by writeKeyFiles
(e.g., validate keys' names/paths) and return an error on invalid names; only if
validation passes proceed to call removeStaleKeyFiles and then writeKeyFiles.
Apply the same ordering/validation change to the analogous block around the
other Update usage (the code at the 152-156 region) so deletions never happen
before filenames are validated.
---
Nitpick comments:
In `@server/internal/orchestrator/swarm/rag_service_keys_resource_test.go`:
- Around line 323-355: Add a regression test that ensures Update is
non-destructive when validation fails: use ragKeysRCWithTempDir to create an
initial RAGServiceKeysResource (call Create) with a valid key file (e.g.,
"old_rag.key" -> "sk-old"), then set r.Keys to an invalid desired filename
(e.g., "../bad" or another name that will trigger validation in
RAGServiceKeysResource.Update) and call Update expecting an error; assert the
error is returned and that the original file still exists with unchanged content
(check filepath.Join(parentPath,"keys","old_rag.key") exists and contains
"sk-old") and that no new invalid file was created. This locks in the guarantee
that Update does not remove/overwrite existing keys on validation failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 131486d4-8056-4359-9576-78b7da08721a
📒 Files selected for processing (2)
server/internal/orchestrator/swarm/rag_service_keys_resource.goserver/internal/orchestrator/swarm/rag_service_keys_resource_test.go
server/internal/orchestrator/swarm/rag_service_keys_resource.go
Outdated
Show resolved
Hide resolved
jason-lynch
left a comment
There was a problem hiding this comment.
Awesome! Thank you for addressing the other comments about the key file lifecycle. Most of my questions are pretty minor, but I do want to be really careful about the database user stuff.
server/internal/orchestrator/swarm/rag_service_keys_resource.go
Outdated
Show resolved
Hide resolved
server/internal/orchestrator/swarm/rag_service_keys_resource.go
Outdated
Show resolved
Hide resolved
server/internal/orchestrator/swarm/rag_service_keys_resource.go
Outdated
Show resolved
Hide resolved
server/internal/orchestrator/swarm/rag_service_keys_resource.go
Outdated
Show resolved
Hide resolved
| if len(spec.DatabaseNodes) > 1 { | ||
| for _, nodeInst := range spec.DatabaseNodes[1:] { | ||
| perNodeRWID := ServiceUserRolePerNodeIdentifier(spec.ServiceSpec.ServiceID, ServiceUserRoleRW, nodeInst.NodeName) | ||
| for _, nodeInst := range spec.DatabaseNodes { |
There was a problem hiding this comment.
Did you mean to include these changes? If so, could you please request a review from Ryan as well? The way we manage these users is pretty tricky, and I want to make sure we don't accidentally break another service.
There was a problem hiding this comment.
Yes @jason-lynch , this change is intentional. The old DatabaseNodes[1:] assumed the canonical node is always first, but for RAG the canonical is the co-located node (not necessarily index 0) — so we now skip by name instead. This produces identical behavior for MCP/PostgREST since their canonical is always nodeInstances[0]. Requesting @rshoemaker review now.
server/internal/orchestrator/swarm/rag_service_keys_resource.go
Outdated
Show resolved
Hide resolved
| return nil | ||
| } | ||
|
|
||
| // validateKeyFilename rejects filenames that could escape the keys directory via path traversal. |
There was a problem hiding this comment.
Do we validate these in the API layer as well? It might be worth being very restrictive about allowed characters in these key names.
That could definitely be a separate PR if not, I'm mostly curious what the current state is.
There was a problem hiding this comment.
Thanks @jason-lynch — makes sense. I’ve created a separate ticket: https://pgedge.atlassian.net/browse/PLAT-536
However, there are currently no pipeline name restrictions in the RAG server.
Summary
This PR adds
RAGServiceKeysResource, a resource that stores provider API keys (OpenAI, Anthropic, etc.) as files on the host filesystem and mounts them into the RAG container in read-only mode. The keys are not exposed via Docker Swarm configs or API responses.Changes
RAGServiceKeysResource(swarm.rag_service_keys) — manages the{DataDir}/services/{instanceID}/keys/directory: Create writes key files at mode 0600 in a 0700 directory, Update rewrites them for key rotation, Delete removes the directory via os.RemoveAllextractRAGAPIKeyshelper that extracts API keys from a parsedRAGServiceConfiginto a{pipeline_name}_embedding.key / {pipeline_name}_rag.keyfilename mapServiceContainerSpecOptionswithKeysPath; when non-empty a read-only bind mount to/app/keysis added to the container specRAGServiceKeysResourceintogenerateRAGInstanceResourcesafterRAGServiceUserRole; RAG config is now parsed at resource-generation timeRegisterResourceTypesTesting
api_keysin responseChecklist
Notes for Reviewers
PLAT-490