Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds HashiCorp Vault KV v2 as a SecretsManager, wires Vault into DatabaseService for credential storage and rotation, exposes POST /databases/:id/rotate-credentials (API, handler, CLI, SDK), persists Changes
Sequence DiagramsequenceDiagram
actor Client
participant API as API Handler
participant Service as DatabaseService
participant Vault as Vault (SecretsManager)
participant DB as Database Container
participant Pooler as Pooler Sidecar
Client->>API: POST /databases/:id/rotate-credentials (Idempotency-Key)
API->>Service: RotateCredentials(ctx, dbID, idempotencyKey)
Service->>Service: Generate new password
Service->>Vault: StoreSecret(vault_path, {"password": new})
Vault-->>Service: 200 OK
Service->>DB: ALTER USER ... WITH PASSWORD '<new>'
DB-->>Service: OK
Service->>Pooler: Restart (if present)
Pooler-->>Service: OK
Service->>Service: Emit audit/event logs
Service-->>API: success
API-->>Client: HTTP 200 {message: "database credentials rotated successfully"}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/guides/rds.md (1)
57-71:⚠️ Potential issue | 🟡 MinorDuplicate section numbering.
Both "High Availability & Replication" (Line 57) and "Delete a Database" (Line 71) are numbered as section 5. The "Delete a Database" section should be renumbered to 6.
📝 Proposed fix
-### 5. Delete a Database +### 6. Delete a Database🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/guides/rds.md` around lines 57 - 71, The "Delete a Database" heading is misnumbered as section 5; update the markdown heading for that section title (currently "### 5. Delete a Database") to "### 6. Delete a Database" so numbering follows "### 5. High Availability & Replication"; locate the "Delete a Database" header in docs/guides/rds.md (heading text "Delete a Database") and change its numeric prefix from 5 to 6.internal/core/services/database.go (1)
108-125:⚠️ Potential issue | 🟠 MajorClean up Vault state on early provisioning failures.
After line 110 succeeds, any error before
performProvisioningRollbackruns—such asresolveVpcNetworkorCreateVolume—returns immediately and leaves an orphaned secret behind. The same ordering is repeated in the replica and restore flows below.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/database.go` around lines 108 - 125, After storing the password via s.secrets.StoreSecret (vaultPath from s.getVaultPath and db.CredentialPath) ensure any early return paths before performProvisioningRollback remove the orphaned secret: on errors returned by s.resolveVpcNetwork and s.volumeSvc.CreateVolume (and the analogous points in the replica and restore flows) call the secrets removal API (e.g. s.secrets.DeleteSecret(ctx, vaultPath) or the project’s equivalent) before returning the wrapped error so the Vault state is cleaned up; make sure to perform this cleanup in every early-exit branch that occurs after successful StoreSecret and before performProvisioningRollback is invoked.
🧹 Nitpick comments (6)
docs/api-reference.md (1)
597-600: Consider adding response example for consistency.Other endpoints in this document include response examples. For completeness, consider adding the expected response:
📝 Suggested documentation enhancement
### POST /databases/:id/rotate-credentials Regenerate the database password and update it in both the database engine and Vault. - **Security**: Ensures credentials are never stored in plain text in the primary metadata store. - **Workflow**: Automated update of database users and sidecar (pooler) reloads. + +**Response:** +```json +{ + "message": "database credentials rotated successfully" +} +```🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/api-reference.md` around lines 597 - 600, Add a response example for the POST /databases/:id/rotate-credentials endpoint to match other endpoints: include a small JSON success body (e.g., { "message": "database credentials rotated successfully" }) and label it as a response example under the endpoint documentation so readers see the expected API response format for rotate-credentials.internal/adapters/vault/adapter_test.go (1)
15-69: Prefer a table-driven layout for these adapter cases.
StoreSecret,GetSecret,DeleteSecret, andPingare all the same test shape with different setup/assertions. Folding them into a small table-driven suite will make it easier to add the missing failure-path cases for the Vault adapter. As per coding guidelines, "Use table-driven tests in test files".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapters/vault/adapter_test.go` around lines 15 - 69, The TestVaultAdapter test should be converted to a table-driven subtest loop: define a slice of test cases (with fields like name, method to call (StoreSecret/GetSecret/DeleteSecret/Ping), input path, payload where needed, expected status or expected value, and whether an error is expected) and iterate with for _, tc := range cases { t.Run(tc.name, func(t *testing.T) { ... }) }, invoking the appropriate adapter methods (adapter.StoreSecret, adapter.GetSecret, adapter.DeleteSecret, adapter.Ping) based on tc.name or a function field and performing the corresponding assertions (assert.NoError/assert.Error, and for GetSecret assert returned data contains expected inner "data" map and password). Keep the existing mock server setup and re-use ctx and adapter variables, and ensure the GetSecret case checks nested "data" as before; structure will make it easy to add failure-path cases later.internal/handlers/database_handler_test.go (1)
557-574: Add failure-path coverage and make this table-driven.This new endpoint test only checks the happy path. For a credential-rotation route, please also cover invalid UUID and service-error responses, and fold the cases into a table-driven test like the rest of the handler surface should be moving toward. As per coding guidelines, "Use table-driven tests in test files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/handlers/database_handler_test.go` around lines 557 - 574, Convert TestDatabaseHandlerRotateCredentials into a table-driven test that iterates cases for success, invalid UUID, and service error; for each case build the request path, set up the mock service expectation on svc.RotateCredentials (for success return nil, for service error return a specific error, and for invalid UUID do not expect a call), call r.ServeHTTP and assert the expected status code and response body substring (e.g., 200 + "database credentials rotated successfully" for success, 400 for invalid UUID, 500 + error message for service error). Keep the test using existing helpers setupDatabaseHandlerTest, r.POST(..., handler.RotateCredentials) and reference the test name TestDatabaseHandlerRotateCredentials, the handler method RotateCredentials, and the mock expectation svc.On("RotateCredentials", ...) when creating cases. Ensure t.Parallel() remains and defer svc.AssertExpectations(t) still runs.docs/swagger/docs.go (1)
1382-1418: Use a concrete response schema for credential rotation.This documents the success body as an anonymous
map[string]string, which makes the contract vague for SDK generation and leaves it unclear what fields are safe to rely on for a security-sensitive operation. Please point the swagger annotation at a named response type, or the standard response envelope, so clients get a stable schema.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/swagger/docs.go` around lines 1382 - 1418, The OpenAPI entry for the POST "/databases/{id}/rotate-credentials" operation uses an anonymous map[string]string response which is too vague; change the responses.200.schema to reference a concrete named response model (e.g. "$ref": "#/definitions/RotateCredentialsResponse" or the existing standard envelope type) instead of the anonymous "type: object/additionalProperties" block, and add or reuse the corresponding definition in the same docs/swagger/docs.go (or definitions section) so SDKs get a stable schema for the rotated credentials payload.internal/api/setup/dependencies.go (1)
264-268: Vault connectivity is not verified at startup.The
NewVaultAdapterconstructor only creates the client but doesn't verify connectivity (seeinternal/adapters/vault/adapter.go:19-34). If Vault is unreachable or misconfigured (e.g., empty token from config), the error will only surface during the first secret operation, potentially causing database creation failures at runtime.Consider calling
vaultSvc.Ping(ctx)after initialization to fail fast if Vault is unavailable:♻️ Proposed enhancement
vaultSvc, err := vault.NewVaultAdapter(c.Config.VaultAddress, c.Config.VaultToken, c.Logger) if err != nil { return nil, nil, fmt.Errorf("failed to init vault adapter: %w", err) } + + // Verify Vault connectivity at startup + ctx := context.Background() + if err := vaultSvc.Ping(ctx); err != nil { + c.Logger.Warn("vault health check failed - credential operations may fail", "error", err) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/setup/dependencies.go` around lines 264 - 268, NewVaultAdapter currently only constructs the Vault client but doesn't verify connectivity, so network/token issues surface later; after creating vaultSvc in the initialization flow (where NewVaultAdapter is called) call vaultSvc.Ping(ctx) and return an error if Ping fails (use the same wrapped fmt.Errorf pattern) so the process fails fast on Vault connectivity/token problems; reference NewVaultAdapter and the Ping method on the returned vaultSvc to locate where to add the check.internal/platform/config.go (1)
82-84: Consider validating Vault configuration in production.
VaultTokendefaults to an empty string with no validation. When Vault is required for database credential management, an empty token will cause runtime failures during secret operations rather than failing fast at startup. Consider logging a warning or returning an error inInitServicesif Vault is enabled but the token is missing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/platform/config.go` around lines 82 - 84, The Vault configuration allows VaultToken to be empty (set via getEnv) which can cause runtime failures; update InitServices to validate Vault settings (VaultAddress, VaultToken, VaultMountPath) when Vault-based secret management is enabled: check that VaultToken is non-empty and VaultAddress/MountPath are set, and if the token is missing either log a clear warning and fail fast or return an error from InitServices so startup aborts rather than failing later during secret ops; reference the VaultToken/VaultAddress/VaultMountPath fields and the getEnv usage to locate the config and add the validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/FEATURES.md`:
- Line 127: The docs overstate credential rotation reliability because the
rotation flow in internal/core/services/database.go updates Vault before
executing the container ALTER USER and has no compensating write if the exec
fails; fix by making the workflow atomic or documenting the failure mode: either
(A) change the flow in the rotation routine (e.g.,
RotateCredentials/rotateDatabaseCredentials) to run the ALTER USER inside the DB
container first and only persist the new secret to Vault on success, or (B) keep
the current order but implement a compensating rollback in the same routine that
writes the old secret back to Vault if the container exec returns an error;
alternatively, if you choose not to change behavior, update FEATURE.md to
explicitly call out the non-atomic failure mode and the possibility of Vault/db
drift.
- Line 127: Update the "Credential Rotation" feature blurb in FEATURES.md to
narrow the claim: replace "Sidecars are automatically reloaded" with a precise
statement that only the PgBouncer/pooler sidecar is restarted (and is recreated
rather than hot-reloaded) when present, and that exporter sidecars are not
reloaded as part of the credential rotation flow; keep the rest of the sentence
about regenerating passwords, updating Vault, and executing ALTER USER intact.
In `@docs/guides/rds.md`:
- Around line 49-55: The docs reference a non-existent CLI command; add a new
subcommand "rotate-credentials" to the DB commands in cmd/cloud/db.go (e.g.,
dbCmd) with usage "rotate-credentials <id>" that calls the existing API endpoint
POST /databases/{id}/rotate-credentials via the project’s API client (use the
existing client method for rotating credentials or add one that wraps the HTTP
POST), handle errors and print success output; alternatively, if you prefer not
to implement the CLI, update docs/guides/rds.md to remove the CLI invocation and
instead document calling the API endpoint POST
/databases/{id}/rotate-credentials (show example curl) and fix the numbering by
renumbering "Delete a Database" to section 6.
In `@docs/swagger/docs.go`:
- Around line 7061-7063: The public Database schema currently exposes
"credential_path" — remove this property from the external API schema and ensure
the internal secret path remains only on the persistence/domain model; update
the swagger/docs.go Database schema to drop "credential_path" and instead map
domain.Database to a public DTO (e.g., PublicDatabase or DatabaseDTO) that omits
credential_path, and adjust any serialization/mapping code that converts
domain.Database to the API model so the secret path is not returned to clients.
In `@docs/swagger/swagger.json`:
- Around line 7053-7055: The public domain.Database schema currently includes
the credential_path property which leaks internal Vault locators; remove
credential_path from the public schema (the domain.Database definition) and
instead store it only server-side or in a privileged/internal schema (e.g.,
domain.DatabaseInternal or AdminDatabase) used by admin endpoints and internal
services; update any model/serialization code that references credential_path to
read it from the new internal/config location and ensure public API docs and
generated clients no longer include credential_path.
- Around line 1374-1408: The rotate-credentials POST currently can cause
double-rotation on retries; add support for an Idempotency-Key header and
propagate it through the request handling pipeline to prevent duplicate
rotations: update the OpenAPI operation for "/databases/{id}/rotate-credentials"
to include a new header parameter "Idempotency-Key" (in: header, type: string,
required: false), then wire that header into the upstream handler/service that
implements the rotation logic (the code path handling rotateCredentials /
rotate-credentials) so the service can detect and dedupe requests before
regenerating the password and reloading sidecars; ensure the header is
checked/stored by the rotation service to make the operation idempotent.
In `@docs/swagger/swagger.yaml`:
- Around line 2963-2969: The OpenAPI response for RotateCredentials currently
describes a bare map but the handler returns an httputil.Response wrapper via
httputil.Success(...); update the 200 response schema for RotateCredentials to
match httputil.Response (include top-level properties like code (integer/string
as used), message (string) and data), and make data the object that contains the
original map[string]string shape (e.g., data: type: object,
additionalProperties: { type: string }); ensure the schema names reference the
handler/function RotateCredentials and the wrapper type
httputil.Response/httputil.Success so generated clients will read the message
from data.
In `@go.mod`:
- Around line 133-142: The dependency github.com/hashicorp/vault/api is
currently marked as indirect but is directly imported by your code (the import
of "github.com/hashicorp/vault/api" in the Vault adapter), so run go mod tidy to
update go.mod and move github.com/hashicorp/vault/api v1.22.0 from an indirect
to a direct dependency; ensure the import in the Vault adapter remains unchanged
and re-run your build/tests to verify the module file was updated correctly.
In `@internal/adapters/vault/adapter_test.go`:
- Around line 17-37: The mock HTTP handler inside the httptest.NewServer closure
currently discards errors from json.NewEncoder(w).Encode(...) (in the handler
for "/v1/secret/data/test" and "/v1/sys/health"); change both occurrences to
capture the error and fail the test (e.g., if err :=
json.NewEncoder(w).Encode(...); err != nil { t.Fatalf("json encode failed: %v",
err) }) by closing over the test *testing.T, so encoding failures aren’t
ignored. While here, consider refactoring the handler into a table-driven test
(parameterizing path, method, response body/status) instead of hardcoding each
case to simplify adding scenarios and assertions.
In `@internal/adapters/vault/adapter.go`:
- Around line 36-66: The context passed into StoreSecret, GetSecret,
DeleteSecret and Ping is not used; update the Vault calls to their context-aware
variants so cancellations/timeouts propagate: in StoreSecret replace
a.client.Logical().Write with a.client.Logical().WriteWithContext(ctx, ...), in
GetSecret replace Read with ReadWithContext(ctx, ...), in DeleteSecret replace
Delete with DeleteWithContext(ctx, ...), and in Ping replace
a.client.Sys().Health() with a.client.Sys().HealthWithContext(ctx, ...); keep
existing error wrapping and return behavior unchanged.
- Around line 36-63: GetSecret currently returns the KV v2 envelope and none of
the Vault calls use the provided context; update StoreSecret, GetSecret, and
DeleteSecret to call a.client.Logical().WriteWithContext(ctx,...),
ReadWithContext(ctx,...), and DeleteWithContext(ctx,...) respectively, and
implement KV v2 semantics: when writing wrap the payload as
map[string]interface{}{"data": data} in StoreSecret, and when reading unwrap
secret.Data["data"] (assert/convert to map[string]interface{}) and return that
inner map (error if missing), while DeleteSecret should simply call
DeleteWithContext with ctx and return errors unchanged.
In `@internal/api/setup/router.go`:
- Line 366: The route registration uses a non-existent constant
domain.PermissionLBRead which breaks the build; update the permission constant
reference in the lbGroup.GET call to the correct identifier
domain.PermissionLbRead (i.e. change PermissionLBRead -> PermissionLbRead) so
httputil.Permission(svcs.RBAC, domain.PermissionLbRead) is used; search for
other usages of PermissionLBRead and replace them with PermissionLbRead to
ensure the code compiles.
In `@internal/core/services/database_vault_test.go`:
- Around line 129-156: The test ends without verifying mock expectations, so add
explicit assertions after the call to svc.CreateDatabase to ensure all mocked
interactions occurred: call AssertExpectations(t) on mockVolumeSvc, mockSecrets,
mockCompute, mockRepo, mockEventSvc, and mockAuditSvc (or use a single
mock.AssertExpectations(t) equivalent) to fail the test if any expected mock
call (e.g., CreateVolume, StoreSecret, LaunchInstanceWithOptions, GetInstanceIP,
Create, RecordEvent, Log) was not invoked.
- Around line 98-105: The test RotateCredentials_VaultFailure sets expectations
on mockRepo.GetByID and mockSecrets.StoreSecret but doesn't verify them; after
calling svc.RotateCredentials and asserting the error, call the mock expectation
checks (e.g., mockRepo.AssertExpectations(t) and
mockSecrets.AssertExpectations(t) or AssertCalled/AssertCalled with GetByID and
StoreSecret) to ensure those mocked methods were actually invoked as expected
for the failure path.
In `@internal/core/services/database.go`:
- Around line 619-627: The pooler restart block currently swallows errors (uses
`_ = ...`) which can leave metadata inconsistent; update the logic around
db.PoolerContainerID so you check and handle errors from
s.compute.DeleteInstance, s.compute.GetInstanceIP, s.resolveVpcNetwork and
especially s.provisionPoolerSidecar instead of ignoring them: call
DeleteInstance and return or log on error, capture errors from GetInstanceIP and
resolveVpcNetwork and propagate if they fail, and only call s.repo.Update(ctx,
db) when provisionPoolerSidecar succeeds; if provisionPoolerSidecar returns an
error, surface that error (return it or wrap and return) so callers know the
pooler restart failed and metadata isn’t left inconsistent.
- Around line 580-604: Currently the code stores the new password to Vault
before executing the DB ALTER and even reads back that just-stored value for
MySQL auth, which can leave Vault and the DB out of sync; change the flow so you
first read the existing credential (use s.secrets.GetSecret on db.CredentialPath
or s.getVaultPath(db.ID) to obtain currentPassword), then run the container
Exec/ALTER USER command using that currentPassword (handle both
domain.EnginePostgres and domain.EngineMySQL branches), and only after the Exec
returns success call s.secrets.StoreSecret to persist newPassword; additionally,
if StoreSecret fails after a successful DB change, attempt to roll Vault back by
restoring the old secret (or return a clear error if rollback fails) so the
secret and DB remain consistent.
- Around line 592-593: The Postgres branch (case domain.EnginePostgres)
currently hard-codes the psql user ("psql -U postgres") which can fail when the
container's superuser is configured as db.Username; update the constructed
command (cmd) to use the provisioned admin user variable (db.Username) instead
of the literal "postgres" so the ALTER USER command runs as the configured admin
(keep using db.Username and newPassword in the fmt.Sprintf invocation). Ensure
you update the string in the cmd assignment that builds the psql invocation.
In `@internal/platform/config.go`:
- Line 72: The environment variable name passed to getEnv for the
ObjectStorageNodes config is mistyped as "OBJECT_STORAGE_Nodes" causing
inconsistent casing; update the call that sets ObjectStorageNodes to use the
correct SCREAMING_SNAKE_CASE name "OBJECT_STORAGE_NODES" so it matches other env
vars and the expected key, i.e., change the getEnv argument associated with the
ObjectStorageNodes field to "OBJECT_STORAGE_NODES".
---
Outside diff comments:
In `@docs/guides/rds.md`:
- Around line 57-71: The "Delete a Database" heading is misnumbered as section
5; update the markdown heading for that section title (currently "### 5. Delete
a Database") to "### 6. Delete a Database" so numbering follows "### 5. High
Availability & Replication"; locate the "Delete a Database" header in
docs/guides/rds.md (heading text "Delete a Database") and change its numeric
prefix from 5 to 6.
In `@internal/core/services/database.go`:
- Around line 108-125: After storing the password via s.secrets.StoreSecret
(vaultPath from s.getVaultPath and db.CredentialPath) ensure any early return
paths before performProvisioningRollback remove the orphaned secret: on errors
returned by s.resolveVpcNetwork and s.volumeSvc.CreateVolume (and the analogous
points in the replica and restore flows) call the secrets removal API (e.g.
s.secrets.DeleteSecret(ctx, vaultPath) or the project’s equivalent) before
returning the wrapped error so the Vault state is cleaned up; make sure to
perform this cleanup in every early-exit branch that occurs after successful
StoreSecret and before performProvisioningRollback is invoked.
---
Nitpick comments:
In `@docs/api-reference.md`:
- Around line 597-600: Add a response example for the POST
/databases/:id/rotate-credentials endpoint to match other endpoints: include a
small JSON success body (e.g., { "message": "database credentials rotated
successfully" }) and label it as a response example under the endpoint
documentation so readers see the expected API response format for
rotate-credentials.
In `@docs/swagger/docs.go`:
- Around line 1382-1418: The OpenAPI entry for the POST
"/databases/{id}/rotate-credentials" operation uses an anonymous
map[string]string response which is too vague; change the responses.200.schema
to reference a concrete named response model (e.g. "$ref":
"#/definitions/RotateCredentialsResponse" or the existing standard envelope
type) instead of the anonymous "type: object/additionalProperties" block, and
add or reuse the corresponding definition in the same docs/swagger/docs.go (or
definitions section) so SDKs get a stable schema for the rotated credentials
payload.
In `@internal/adapters/vault/adapter_test.go`:
- Around line 15-69: The TestVaultAdapter test should be converted to a
table-driven subtest loop: define a slice of test cases (with fields like name,
method to call (StoreSecret/GetSecret/DeleteSecret/Ping), input path, payload
where needed, expected status or expected value, and whether an error is
expected) and iterate with for _, tc := range cases { t.Run(tc.name, func(t
*testing.T) { ... }) }, invoking the appropriate adapter methods
(adapter.StoreSecret, adapter.GetSecret, adapter.DeleteSecret, adapter.Ping)
based on tc.name or a function field and performing the corresponding assertions
(assert.NoError/assert.Error, and for GetSecret assert returned data contains
expected inner "data" map and password). Keep the existing mock server setup and
re-use ctx and adapter variables, and ensure the GetSecret case checks nested
"data" as before; structure will make it easy to add failure-path cases later.
In `@internal/api/setup/dependencies.go`:
- Around line 264-268: NewVaultAdapter currently only constructs the Vault
client but doesn't verify connectivity, so network/token issues surface later;
after creating vaultSvc in the initialization flow (where NewVaultAdapter is
called) call vaultSvc.Ping(ctx) and return an error if Ping fails (use the same
wrapped fmt.Errorf pattern) so the process fails fast on Vault
connectivity/token problems; reference NewVaultAdapter and the Ping method on
the returned vaultSvc to locate where to add the check.
In `@internal/handlers/database_handler_test.go`:
- Around line 557-574: Convert TestDatabaseHandlerRotateCredentials into a
table-driven test that iterates cases for success, invalid UUID, and service
error; for each case build the request path, set up the mock service expectation
on svc.RotateCredentials (for success return nil, for service error return a
specific error, and for invalid UUID do not expect a call), call r.ServeHTTP and
assert the expected status code and response body substring (e.g., 200 +
"database credentials rotated successfully" for success, 400 for invalid UUID,
500 + error message for service error). Keep the test using existing helpers
setupDatabaseHandlerTest, r.POST(..., handler.RotateCredentials) and reference
the test name TestDatabaseHandlerRotateCredentials, the handler method
RotateCredentials, and the mock expectation svc.On("RotateCredentials", ...)
when creating cases. Ensure t.Parallel() remains and defer
svc.AssertExpectations(t) still runs.
In `@internal/platform/config.go`:
- Around line 82-84: The Vault configuration allows VaultToken to be empty (set
via getEnv) which can cause runtime failures; update InitServices to validate
Vault settings (VaultAddress, VaultToken, VaultMountPath) when Vault-based
secret management is enabled: check that VaultToken is non-empty and
VaultAddress/MountPath are set, and if the token is missing either log a clear
warning and fail fast or return an error from InitServices so startup aborts
rather than failing later during secret ops; reference the
VaultToken/VaultAddress/VaultMountPath fields and the getEnv usage to locate the
config and add the validation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ba25f82c-6996-4262-bc14-389e77c1e629
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (25)
README.mddocs/FEATURES.mddocs/api-reference.mddocs/database.mddocs/guides/rds.mddocs/swagger/docs.godocs/swagger/swagger.jsondocs/swagger/swagger.yamlgo.modinternal/adapters/vault/adapter.gointernal/adapters/vault/adapter_test.gointernal/api/setup/dependencies.gointernal/api/setup/router.gointernal/core/domain/database.gointernal/core/ports/database.gointernal/core/ports/secrets_manager.gointernal/core/services/database.gointernal/core/services/database_vault_test.gointernal/handlers/database_handler.gointernal/handlers/database_handler_test.gointernal/platform/config.gointernal/repositories/postgres/database_repo.gointernal/repositories/postgres/database_repo_unit_test.gointernal/repositories/postgres/migrations/099_add_database_credential_path.up.sqltests/database_e2e_test.go
| "/databases/{id}/rotate-credentials": { | ||
| "post": { | ||
| "security": [ | ||
| { | ||
| "APIKeyAuth": [] | ||
| } | ||
| ], | ||
| "description": "Regenerates the database password, updates it in the database and secrets manager", | ||
| "produces": [ | ||
| "application/json" | ||
| ], | ||
| "tags": [ | ||
| "databases" | ||
| ], | ||
| "summary": "Rotate database credentials", | ||
| "parameters": [ | ||
| { | ||
| "type": "string", | ||
| "description": "Database ID", | ||
| "name": "id", | ||
| "in": "path", | ||
| "required": true | ||
| } | ||
| ], | ||
| "responses": { | ||
| "200": { | ||
| "description": "OK", | ||
| "schema": { | ||
| "type": "object", | ||
| "additionalProperties": { | ||
| "type": "string" | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Add idempotency to credential rotation.
This POST rotates the password and reloads sidecars, so a retry after a timeout can rotate again and immediately invalidate the just-issued credential. Expose an Idempotency-Key on this endpoint and wire it through the upstream handler/service before regenerating Swagger.
Suggested contract change
"parameters": [
{
"type": "string",
"description": "Database ID",
"name": "id",
"in": "path",
"required": true
+ },
+ {
+ "type": "string",
+ "description": "Idempotency key",
+ "name": "Idempotency-Key",
+ "in": "header"
}
],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "/databases/{id}/rotate-credentials": { | |
| "post": { | |
| "security": [ | |
| { | |
| "APIKeyAuth": [] | |
| } | |
| ], | |
| "description": "Regenerates the database password, updates it in the database and secrets manager", | |
| "produces": [ | |
| "application/json" | |
| ], | |
| "tags": [ | |
| "databases" | |
| ], | |
| "summary": "Rotate database credentials", | |
| "parameters": [ | |
| { | |
| "type": "string", | |
| "description": "Database ID", | |
| "name": "id", | |
| "in": "path", | |
| "required": true | |
| } | |
| ], | |
| "responses": { | |
| "200": { | |
| "description": "OK", | |
| "schema": { | |
| "type": "object", | |
| "additionalProperties": { | |
| "type": "string" | |
| } | |
| } | |
| } | |
| } | |
| "/databases/{id}/rotate-credentials": { | |
| "post": { | |
| "security": [ | |
| { | |
| "APIKeyAuth": [] | |
| } | |
| ], | |
| "description": "Regenerates the database password, updates it in the database and secrets manager", | |
| "produces": [ | |
| "application/json" | |
| ], | |
| "tags": [ | |
| "databases" | |
| ], | |
| "summary": "Rotate database credentials", | |
| "parameters": [ | |
| { | |
| "type": "string", | |
| "description": "Database ID", | |
| "name": "id", | |
| "in": "path", | |
| "required": true | |
| }, | |
| { | |
| "type": "string", | |
| "description": "Idempotency key", | |
| "name": "Idempotency-Key", | |
| "in": "header" | |
| } | |
| ], | |
| "responses": { | |
| "200": { | |
| "description": "OK", | |
| "schema": { | |
| "type": "object", | |
| "additionalProperties": { | |
| "type": "string" | |
| } | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/swagger/swagger.json` around lines 1374 - 1408, The rotate-credentials
POST currently can cause double-rotation on retries; add support for an
Idempotency-Key header and propagate it through the request handling pipeline to
prevent duplicate rotations: update the OpenAPI operation for
"/databases/{id}/rotate-credentials" to include a new header parameter
"Idempotency-Key" (in: header, type: string, required: false), then wire that
header into the upstream handler/service that implements the rotation logic (the
code path handling rotateCredentials / rotate-credentials) so the service can
detect and dedupe requests before regenerating the password and reloading
sidecars; ensure the header is checked/stored by the rotation service to make
the operation idempotent.
There was a problem hiding this comment.
Pull request overview
This PR integrates HashiCorp Vault for secure managed database credential management, replaces plain-text password storage with Vault KV references, and adds a new POST /databases/:id/rotate-credentials endpoint for zero-downtime credential rotation.
Changes:
- New
VaultAdapterimplementing theSecretsManagerinterface;DatabaseServiceupdated to store/retrieve/delete credentials from Vault during create, replica, modify, restore, and delete operations - New
RotateCredentialsservice method and corresponding HTTP handler/route that generates a new password, updates Vault, runsALTER USERin the container, and restarts PgBouncer sidecars - Schema migration adding
credential_pathcolumn todatabasestable; repository queries updated accordingly; tests extended for new field
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
internal/adapters/vault/adapter.go |
New Vault adapter implementing SecretsManager |
internal/adapters/vault/adapter_test.go |
Tests for Vault adapter using mock HTTP server |
internal/core/ports/secrets_manager.go |
New SecretsManager interface |
internal/core/ports/database.go |
Adds RotateCredentials to DatabaseService interface |
internal/core/domain/database.go |
Adds CredentialPath field to Database struct |
internal/core/services/database.go |
Core Vault integration and rotation logic |
internal/core/services/database_vault_test.go |
Tests for new Vault-integrated service methods |
internal/handlers/database_handler.go |
New RotateCredentials handler |
internal/handlers/database_handler_test.go |
Handler test for credential rotation |
internal/api/setup/router.go |
Registers rotate-credentials route; incidental PermissionLBRead change |
internal/api/setup/dependencies.go |
Wires VaultAdapter into DatabaseService |
internal/platform/config.go |
Adds Vault config fields; accidentally changes OBJECT_STORAGE_NODES key |
internal/repositories/postgres/database_repo.go |
SQL queries updated with credential_path column |
internal/repositories/postgres/database_repo_unit_test.go |
Repo tests updated for new field |
internal/repositories/postgres/migrations/099_add_database_credential_path.up.sql |
Migration adding credential_path column |
tests/database_e2e_test.go |
E2E test for credential rotation |
go.mod / go.sum |
Adds hashicorp/vault/api dependency |
docs/* |
Documentation updates for Vault integration and credential rotation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| VaultAddress: getEnv("VAULT_ADDR", "http://localhost:8200"), | ||
| VaultToken: getEnv("VAULT_TOKEN", ""), |
There was a problem hiding this comment.
The VaultToken is stored as a plain string in the Config struct and loaded directly from the VAULT_TOKEN environment variable. The default is an empty string, meaning if VAULT_TOKEN is not set, the service will initialize a Vault client with no token and silently proceed. All subsequent Vault operations will fail at runtime. There is no startup validation that a non-empty token is configured (e.g., by returning an error from NewConfig or NewVaultAdapter when the token is empty). At minimum, a warning should be logged if the token is empty.
| // Store password in Vault | ||
| vaultPath := s.getVaultPath(db.ID) | ||
| if err := s.secrets.StoreSecret(ctx, vaultPath, map[string]interface{}{"password": password}); err != nil { | ||
| return nil, errors.Wrap(errors.Internal, "failed to store database credentials in vault", err) | ||
| } | ||
| db.CredentialPath = vaultPath |
There was a problem hiding this comment.
When LaunchInstanceWithOptions fails after the Vault secret has been stored, the code calls cleanupVolumeQuietly but does not clean up the Vault secret stored just before. Only performProvisioningRollback (which handles failure cases where db.CredentialPath is set) cleans up Vault, but on the LaunchInstanceWithOptions failure path, rollback is bypassed. This leaves an orphaned Vault secret if container launch fails after Vault storage succeeds. The Vault secret should be deleted in this error path too.
| lbGroup.POST("", httputil.Permission(svcs.RBAC, domain.PermissionLbCreate), handlers.LB.Create) | ||
| lbGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionLbRead), handlers.LB.List) | ||
| lbGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionLbRead), handlers.LB.Get) | ||
| lbGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionLBRead), handlers.LB.Get) |
There was a problem hiding this comment.
The domain.PermissionLBRead constant referenced here does not exist. The correct constant defined in internal/core/domain/rbac.go is PermissionLbRead (lowercase "b"). This will cause a compilation error, breaking the entire application.
| lbGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionLBRead), handlers.LB.Get) | |
| lbGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionLbRead), handlers.LB.Get) |
| ObjectStorageMode: getEnv("OBJECT_STORAGE_MODE", "local"), | ||
|
|
||
| ObjectStorageNodes: getEnv("OBJECT_STORAGE_NODES", ""), | ||
| ObjectStorageNodes: getEnv("OBJECT_STORAGE_Nodes", ""), |
There was a problem hiding this comment.
The environment variable key has been changed from OBJECT_STORAGE_NODES (uppercase) to OBJECT_STORAGE_Nodes (mixed case). Linux environment variables are case-sensitive, so any deployment that sets OBJECT_STORAGE_NODES will find the value empty, breaking the object storage node configuration. This was presumably an accidental change introduced alongside the Vault config additions.
| ObjectStorageNodes: getEnv("OBJECT_STORAGE_Nodes", ""), | |
| ObjectStorageNodes: getEnv("OBJECT_STORAGE_NODES", ""), |
| case domain.EnginePostgres: | ||
| cmd = []string{"psql", "-U", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", db.Username, newPassword)} | ||
| case domain.EngineMySQL: | ||
| // For MySQL rotation, we need the root password to run the ALTER USER command. | ||
| // Since we don't have it easily available, we'll try using the current user's password if it's stored. | ||
| currentPassword := db.Password | ||
| if db.CredentialPath != "" { | ||
| secret, _ := s.secrets.GetSecret(ctx, db.CredentialPath) | ||
| if p, ok := secret["password"].(string); ok { | ||
| currentPassword = p | ||
| } | ||
| } | ||
| cmd = []string{"mysql", "-u", "root", "-p" + currentPassword, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", db.Username, newPassword)} |
There was a problem hiding this comment.
The password generated by util.GenerateRandomPassword can contain single-quote characters ('), as the character set includes !@#$%^&*()-_=+ (see pkg/util/password.go:10). Single quotes are used as delimiters in both the Postgres and MySQL ALTER USER SQL strings:
- Postgres:
ALTER USER %s WITH PASSWORD '%s' - MySQL:
ALTER USER '%s'@'%%' IDENTIFIED BY '%s'
A generated password containing a single quote would break the SQL syntax, causing the rotation command to fail or, in the worst case, allowing SQL injection within the container exec context. Parameterized queries or proper SQL escaping should be used instead. For the MySQL case, note that mysql -e does not support parameterized queries, so the password should be escaped or the MySQL client should be invoked differently (e.g., via stdin or a config file).
| case domain.EngineMySQL: | ||
| // For MySQL rotation, we need the root password to run the ALTER USER command. | ||
| // Since we don't have it easily available, we'll try using the current user's password if it's stored. | ||
| currentPassword := db.Password | ||
| if db.CredentialPath != "" { | ||
| secret, _ := s.secrets.GetSecret(ctx, db.CredentialPath) | ||
| if p, ok := secret["password"].(string); ok { | ||
| currentPassword = p | ||
| } | ||
| } | ||
| cmd = []string{"mysql", "-u", "root", "-p" + currentPassword, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", db.Username, newPassword)} |
There was a problem hiding this comment.
In the MySQL branch of RotateCredentials, the code tries to retrieve the current (old) password from Vault at line 599 to authenticate to MySQL. However, at this point the new password has already been saved to Vault at line 585. Therefore, GetSecret will return the newly stored password, not the old one. The MySQL ALTER USER command will then be executed using the new password as authentication credentials, but the database still has the old password, causing the authentication to fail and leaving the system in an inconsistent state (Vault has the new password but the database still has the old one).
The old password should be fetched from Vault before storing the new one, or the legacy db.Password field should be used as the fallback for the auth credential (recognizing the limitations noted in the code comment). Additionally, the error from GetSecret is silently ignored with _, _ which can lead to a nil pointer panic when dereferencing secret["password"] if secret is nil.
| // Store password in Vault | ||
| vaultPath := s.getVaultPath(db.ID) | ||
| if err := s.secrets.StoreSecret(ctx, vaultPath, map[string]interface{}{"password": password}); err != nil { | ||
| return nil, errors.Wrap(errors.Internal, "failed to store database credentials in vault", err) | ||
| } | ||
| db.CredentialPath = vaultPath |
There was a problem hiding this comment.
The existing TestDatabaseServiceUnitExtended in internal/core/services/database_unit_test.go creates a DatabaseService without providing a Secrets mock (line 67-77). Since CreateDatabase, CreateReplica, RestoreDatabase, DeleteDatabase, GetConnectionString, and ModifyDatabase now unconditionally call methods on s.secrets, this will cause a nil pointer panic at runtime when those tests run. The Secrets dependency must be added to the service construction in this test file to keep the existing tests passing.
| PoolingEnabled bool `json:"pooling_enabled"` | ||
| PoolingPort int `json:"pooling_port,omitempty"` | ||
| PoolerContainerID string `json:"-"` | ||
| CredentialPath string `json:"credential_path,omitempty"` |
There was a problem hiding this comment.
The CredentialPath field is serialized to the JSON API response (json:"credential_path,omitempty"). This exposes internal Vault path structure (e.g., secret/data/thecloud/rds/<uuid>/credentials) to API consumers, which leaks infrastructure details. Similar sensitive internal fields like ExporterContainerID, PoolerContainerID, and Password are tagged with json:"-" to prevent serialization. The CredentialPath field should use the same json:"-" tag.
| CredentialPath string `json:"credential_path,omitempty"` | |
| CredentialPath string `json:"-"` |
| func (a *VaultAdapter) StoreSecret(ctx context.Context, path string, data map[string]interface{}) error { | ||
| _, err := a.client.Logical().Write(path, data) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to write secret to vault at %s: %w", path, err) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| func (a *VaultAdapter) GetSecret(ctx context.Context, path string) (map[string]interface{}, error) { | ||
| secret, err := a.client.Logical().Read(path) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to read secret from vault at %s: %w", path, err) | ||
| } | ||
|
|
||
| if secret == nil || secret.Data == nil { | ||
| return nil, fmt.Errorf("secret not found at path: %s", path) | ||
| } | ||
|
|
||
| return secret.Data, nil | ||
| } | ||
|
|
||
| func (a *VaultAdapter) DeleteSecret(ctx context.Context, path string) error { | ||
| _, err := a.client.Logical().Delete(path) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to delete secret from vault at %s: %w", path, err) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| func (a *VaultAdapter) Ping(ctx context.Context) error { | ||
| _, err := a.client.Sys().Health() | ||
| if err != nil { | ||
| return fmt.Errorf("vault health check failed: %w", err) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
All methods in the VaultAdapter (StoreSecret, GetSecret, DeleteSecret, Ping) accept a context.Context parameter but do not pass it to the underlying Vault API client calls. The hashicorp/vault/api package provides context-aware variants for all operations (e.g., a.client.Logical().WriteWithContext(ctx, path, data), ReadWithContext, DeleteWithContext, a.client.Sys().HealthWithContext(ctx)). Without context propagation, request cancellations, deadlines, and timeouts set by callers will be silently ignored, leading to goroutine leaks or unresponsive requests.
| if err := s.secrets.StoreSecret(ctx, vaultPath, map[string]interface{}{"password": newPassword}); err != nil { | ||
| return errors.Wrap(errors.Internal, "failed to store new credentials in vault", err) | ||
| } | ||
|
|
||
| // 2. Execute ALTER USER in container | ||
| var cmd []string | ||
| switch db.Engine { | ||
| case domain.EnginePostgres: | ||
| cmd = []string{"psql", "-U", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", db.Username, newPassword)} | ||
| case domain.EngineMySQL: | ||
| // For MySQL rotation, we need the root password to run the ALTER USER command. | ||
| // Since we don't have it easily available, we'll try using the current user's password if it's stored. | ||
| currentPassword := db.Password | ||
| if db.CredentialPath != "" { | ||
| secret, _ := s.secrets.GetSecret(ctx, db.CredentialPath) | ||
| if p, ok := secret["password"].(string); ok { | ||
| currentPassword = p | ||
| } | ||
| } | ||
| cmd = []string{"mysql", "-u", "root", "-p" + currentPassword, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", db.Username, newPassword)} | ||
| default: | ||
| return errors.New(errors.Internal, "unsupported engine for credential rotation") | ||
| } | ||
|
|
||
| if _, err := s.compute.Exec(ctx, db.ContainerID, cmd); err != nil { | ||
| return errors.Wrap(errors.Internal, "failed to execute password rotation in container", err) | ||
| } |
There was a problem hiding this comment.
The RotateCredentials function does not rollback the Vault secret if the Exec command fails (line 609). The operation order is: (1) store new password in Vault, (2) execute ALTER USER in container, (3) update DB record. If step 2 fails, Vault now holds the new password but the database engine still has the old one. This creates an inconsistency: subsequent reads of the credentials from Vault will return the new password, but it will be rejected by the database. A rollback to the old password in Vault should be performed if the Exec call fails, or the old password should be fetched and restored.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 26 out of 27 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (3)
internal/api/setup/router.go:366
- The
PermissionLBReadconstant does not exist ininternal/core/domain/rbac.go. The correct constant name isPermissionLbRead(lowercase 'b'). This will cause a compilation failure. The original code usedPermissionLbRead(the diff confirms this is a change fromPermissionLbReadtoPermissionLBRead), so this is a regression introduced by this PR.
lbGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionLbRead), handlers.LB.Get)
internal/core/services/database.go:140
- The Vault secret is stored at line 110 before the VPC network resolution and volume creation. If
resolveVpcNetwork(line 117-119) orCreateVolume(line 122-124) fails, the function returns early without cleaning up the already-written Vault secret, leaving an orphaned secret. Similarly, ifLaunchInstanceWithOptionsfails (line 137-140),cleanupVolumeQuietlyis called but the Vault secret is not deleted.
The performProvisioningRollback does handle Vault cleanup, but it's not called in these early failure paths. The Vault secret creation should either happen after the volume/container are ready, or the early return paths should also call secrets.DeleteSecret(ctx, vaultPath) to clean up.
// Store password in Vault
vaultPath := s.getVaultPath(db.ID)
if err := s.secrets.StoreSecret(ctx, vaultPath, map[string]interface{}{"password": password}); err != nil {
return nil, errors.Wrap(errors.Internal, "failed to store database credentials in vault", err)
}
db.CredentialPath = vaultPath
imageName, env, defaultPort := s.getEngineConfig(dbEngine, req.Version, username, password, req.Name, db.Role, "")
networkID, err := s.resolveVpcNetwork(ctx, req.VpcID)
if err != nil {
return nil, err
}
vol, err := s.volumeSvc.CreateVolume(ctx, fmt.Sprintf("db-vol-%s", db.ID.String()[:8]), req.AllocatedStorage)
if err != nil {
return nil, errors.Wrap(errors.Internal, "failed to create persistent volume", err)
}
containerID, allocatedPorts, err := s.compute.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
Name: fmt.Sprintf("cloud-db-%s-%s", req.Name, db.ID.String()[:8]),
ImageName: imageName,
Ports: []string{"0:" + defaultPort},
NetworkID: networkID,
VolumeBinds: []string{fmt.Sprintf("%s:%s", s.getBackendVolName(vol), s.getMountPath(dbEngine))},
Env: env,
Cmd: s.buildEngineCmd(dbEngine, req.Parameters),
})
if err != nil {
s.cleanupVolumeQuietly(ctx, vol.ID.String())
return nil, errors.Wrap(errors.Internal, "failed to launch database container", err)
}
internal/core/services/database.go:544
- Same orphaned Vault secret issue as in
CreateDatabase. The Vault secret is stored at line 516 before the snapshot restore and container launch. IfRestoreSnapshotfails (line 521-523) orLaunchInstanceWithOptionsfails (line 542-544), the function returns without cleaning up the already-written Vault secret. The failure path on line 543 callscleanupVolumeQuietlybut does not callsecrets.DeleteSecret, so the orphaned Vault secret is never removed.
// Store password in Vault
vaultPath := s.getVaultPath(db.ID)
if err := s.secrets.StoreSecret(ctx, vaultPath, map[string]interface{}{"password": password}); err != nil {
return nil, errors.Wrap(errors.Internal, "failed to store restored database credentials in vault", err)
}
db.CredentialPath = vaultPath
vol, err := s.snapshotSvc.RestoreSnapshot(ctx, req.SnapshotID, fmt.Sprintf("db-vol-%s", db.ID.String()[:8]))
if err != nil {
return nil, err
}
// Re-verify volume size matches DB record (snapshot restore might resize)
if vol.SizeGB > db.AllocatedStorage {
db.AllocatedStorage = vol.SizeGB
}
networkID, _ := s.resolveVpcNetwork(ctx, req.VpcID)
imageName, env, defaultPort := s.getEngineConfig(dbEngine, req.Version, username, password, req.NewName, domain.RolePrimary, "")
containerID, allocatedPorts, err := s.compute.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
Name: fmt.Sprintf("cloud-db-%s-%s", req.NewName, db.ID.String()[:8]),
ImageName: imageName,
Ports: []string{"0:" + defaultPort},
NetworkID: networkID,
VolumeBinds: []string{fmt.Sprintf("%s:%s", s.getBackendVolName(vol), s.getMountPath(dbEngine))},
Env: env,
})
if err != nil {
s.cleanupVolumeQuietly(ctx, vol.ID.String())
return nil, err
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (a *Adapter) StoreSecret(_ context.Context, path string, data map[string]interface{}) error { | ||
| _, err := a.client.Logical().Write(path, data) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to store secret in vault: %w", err) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| // GetSecret retrieves a secret from the specified path. | ||
| func (a *Adapter) GetSecret(_ context.Context, path string) (map[string]interface{}, error) { | ||
| secret, err := a.client.Logical().Read(path) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to read secret from vault: %w", err) | ||
| } | ||
| if secret == nil || secret.Data == nil { | ||
| return nil, fmt.Errorf("secret not found at path: %s", path) | ||
| } | ||
| return secret.Data, nil |
There was a problem hiding this comment.
The StoreSecret method writes data directly to the Vault KV v2 path without wrapping it in the required {"data": {...}} envelope. When using the Logical().Write() API with a KV v2 path (e.g., secret/data/thecloud/rds/...), the payload must be {"data": {"password": "..."}}. Without this wrapping, the password won't be stored correctly.
Additionally, GetSecret returns secret.Data from a KV v2 read, which will be structured as {"data": {"password": "..."}, "metadata": {...}}. All callers in database.go access the password as secret["password"].(string) (e.g., lines 199, 307, 446, 600), but the correct access for KV v2 is secret["data"].(map[string]interface{})["password"].(string).
The adapter test (adapter_test.go) confirms the KV v2 nesting by accessing data["data"].(map[string]interface{})["password"], which means the callers in database.go are inconsistent with the actual Vault response structure.
To fix: either (1) use the Vault SDK's KV v2 helper (client.KVv2(mount).Put/Get) which handles wrapping/unwrapping automatically, or (2) manually wrap the data in StoreSecret and unwrap in GetSecret.
| secret, _ := s.secrets.GetSecret(ctx, db.CredentialPath) | ||
| if p, ok := secret["password"].(string); ok { | ||
| currentPassword = p |
There was a problem hiding this comment.
The GetSecret call on line 599 ignores both the error and the nil-dereference risk. If GetSecret returns an error, secret will be nil, and secret["password"] on line 600 would panic with a nil map dereference. The error should be checked before accessing secret. Consider using the same pattern as line 196-201 which properly handles the nil/error case.
| secret, _ := s.secrets.GetSecret(ctx, db.CredentialPath) | |
| if p, ok := secret["password"].(string); ok { | |
| currentPassword = p | |
| secret, err := s.secrets.GetSecret(ctx, db.CredentialPath) | |
| if err != nil { | |
| slog.WarnContext(ctx, "failed to retrieve current password from vault; falling back to stored password", | |
| "error", err, | |
| "credentialPath", db.CredentialPath, | |
| ) | |
| } else if secret != nil { | |
| if p, ok := secret["password"].(string); ok && p != "" { | |
| currentPassword = p | |
| } |
| github.com/hashicorp/go-secure-stdlib/strutil v0.1.2 // indirect | ||
| github.com/hashicorp/go-sockaddr v1.0.7 // indirect | ||
| github.com/hashicorp/hcl v1.0.1-vault-7 // indirect | ||
| github.com/hashicorp/vault/api v1.22.0 // indirect |
There was a problem hiding this comment.
The github.com/hashicorp/vault/api v1.22.0 dependency is listed as // indirect in go.mod, but it is directly imported and used in internal/adapters/vault/adapter.go. Direct dependencies should not be marked as // indirect. Running go mod tidy would move it to the direct dependencies section.
| "net/http" | ||
| "net/http/httptest" | ||
| "testing" | ||
|
|
||
| "log/slog" |
There was a problem hiding this comment.
The "log/slog" import is separated from the other standard library imports ("context", "encoding/json", "net/http", etc.) and placed in the same block as the third-party imports. In idiomatic Go, all standard library imports should be in the same group. Running goimports would fix this automatically.
| "net/http" | |
| "net/http/httptest" | |
| "testing" | |
| "log/slog" | |
| "log/slog" | |
| "net/http" | |
| "net/http/httptest" | |
| "testing" |
| cloud db rotate-credentials <id> | ||
| ``` | ||
|
|
||
| ### 5. High Availability & Replication |
There was a problem hiding this comment.
The env var name MYSQL_MASTER_HOST was changed to PRIMARY_HOST in getEngineConfig (database.go line 839), but the documentation in docs/guides/rds.md still refers to MYSQL_MASTER_HOST. Note that the documentation update at line 81 (saying "MYSQL_MASTER_HOST for MySQL") is now out of date and should be updated to say PRIMARY_HOST for both engines.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 32 out of 33 changed files in this pull request and generated 10 comments.
Comments suppressed due to low confidence (2)
internal/repositories/postgres/database_repo.go:72
ListReplicasselectstenant_idbut doesn’t constrain by it. With the repository moving to tenant scoping elsewhere, this method can return replicas across tenants if aprimary_idis known. Filter bytenant_idfrom context (or make this an explicitly internal/admin method) to keep access boundaries consistent.
func (r *DatabaseRepository) ListReplicas(ctx context.Context, primaryID uuid.UUID) ([]*domain.Database, error) {
query := `
SELECT id, user_id, tenant_id, name, engine, version, status, role, primary_id, vpc_id, COALESCE(container_id, ''), port, username, password, created_at, updated_at, allocated_storage, parameters, metrics_enabled, COALESCE(metrics_port, 0), COALESCE(exporter_container_id, ''), pooling_enabled, COALESCE(pooling_port, 0), COALESCE(pooler_container_id, ''), COALESCE(credential_path, '')
FROM databases
WHERE primary_id = $1
ORDER BY created_at DESC
`
internal/core/services/database.go:443
PromoteToPrimarynow enforces RBAC usingUserIDFromContext(ctx). Internal/background workflows (e.g., the database failover worker) typically run without an end-user context, so this is likely to start failing with “failed to get user” / forbidden. Consider using a dedicated system principal in context for workers, or providing an internal promotion path that bypasses end-user RBAC while still being restricted to trusted callers.
func (s *DatabaseService) PromoteToPrimary(ctx context.Context, id uuid.UUID) error {
userID := appcontext.UserIDFromContext(ctx)
if err := s.rbacSvc.Authorize(ctx, userID, domain.PermissionDBUpdate, id.String()); err != nil {
return err
}
db, err := s.repo.GetByID(ctx, id)
if err != nil {
return err
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| query := ` | ||
| INSERT INTO databases (id, user_id, name, engine, version, status, role, primary_id, vpc_id, container_id, port, username, password, created_at, updated_at, allocated_storage, parameters, metrics_enabled, metrics_port, exporter_container_id, pooling_enabled, pooling_port, pooler_container_id) | ||
| VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23) | ||
| INSERT INTO databases (id, user_id, tenant_id, name, engine, version, status, role, primary_id, vpc_id, container_id, port, username, password, created_at, updated_at, allocated_storage, parameters, metrics_enabled, metrics_port, exporter_container_id, pooling_enabled, pooling_port, pooler_container_id, credential_path) | ||
| VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25) | ||
| ` | ||
| _, err := r.db.Exec(ctx, query, | ||
| db.ID, db.UserID, db.Name, db.Engine, db.Version, db.Status, db.Role, db.PrimaryID, db.VpcID, db.ContainerID, db.Port, db.Username, db.Password, db.CreatedAt, db.UpdatedAt, db.AllocatedStorage, db.Parameters, db.MetricsEnabled, db.MetricsPort, db.ExporterContainerID, db.PoolingEnabled, db.PoolingPort, db.PoolerContainerID, | ||
| db.ID, db.UserID, db.TenantID, db.Name, db.Engine, db.Version, db.Status, db.Role, db.PrimaryID, db.VpcID, db.ContainerID, db.Port, db.Username, db.Password, db.CreatedAt, db.UpdatedAt, db.AllocatedStorage, db.Parameters, db.MetricsEnabled, db.MetricsPort, db.ExporterContainerID, db.PoolingEnabled, db.PoolingPort, db.PoolerContainerID, db.CredentialPath, | ||
| ) |
There was a problem hiding this comment.
tenant_id is now required in INSERT/queries, but the current migrations for databases don’t add a tenant_id column (the original create-table migration only has user_id). This will cause runtime SQL errors on create/read/update/delete. Add a migration to introduce tenant_id (with appropriate FK/index/backfill) or keep repository filtering on user_id until the schema is updated.
| func (r *DatabaseRepository) List(ctx context.Context) ([]*domain.Database, error) { | ||
| userID := appcontext.UserIDFromContext(ctx) | ||
| tenantID := appcontext.TenantIDFromContext(ctx) | ||
| query := ` | ||
| SELECT id, user_id, name, engine, version, status, role, primary_id, vpc_id, COALESCE(container_id, ''), port, username, password, created_at, updated_at, allocated_storage, parameters, metrics_enabled, COALESCE(metrics_port, 0), COALESCE(exporter_container_id, ''), pooling_enabled, COALESCE(pooling_port, 0), COALESCE(pooler_container_id, '') | ||
| SELECT id, user_id, tenant_id, name, engine, version, status, role, primary_id, vpc_id, COALESCE(container_id, ''), port, username, password, created_at, updated_at, allocated_storage, parameters, metrics_enabled, COALESCE(metrics_port, 0), COALESCE(exporter_container_id, ''), pooling_enabled, COALESCE(pooling_port, 0), COALESCE(pooler_container_id, ''), COALESCE(credential_path, '') | ||
| FROM databases | ||
| WHERE user_id = $1 | ||
| WHERE tenant_id = $1 | ||
| ORDER BY created_at DESC | ||
| ` | ||
| rows, err := r.db.Query(ctx, query, userID) | ||
| rows, err := r.db.Query(ctx, query, tenantID) |
There was a problem hiding this comment.
List() is now tenant-scoped via TenantIDFromContext(ctx). Background components like DatabaseFailoverWorker call repo.List(ctx) with a plain context that does not include a tenant id, which will effectively return no databases and disable failover checks. Consider adding a non-tenant-scoped/internal listing method for workers, or ensure worker contexts carry the correct tenant (and adjust RBAC/service calls accordingly).
| ALTER TABLE databases ADD COLUMN credential_path VARCHAR(255); | ||
|
|
||
| -- +goose Down | ||
| ALTER TABLE databases DROP COLUMN credential_path; |
There was a problem hiding this comment.
This migration adds credential_path but does not add the tenant_id column that the updated DatabaseRepository now reads/writes. Without a tenant_id migration (plus backfill/index), database CRUD will fail at runtime. Consider adding the tenant_id migration here (or as a separate migration in the same PR) before switching repository queries to tenant scoping.
| ALTER TABLE databases ADD COLUMN credential_path VARCHAR(255); | |
| -- +goose Down | |
| ALTER TABLE databases DROP COLUMN credential_path; | |
| ALTER TABLE databases | |
| ADD COLUMN credential_path VARCHAR(255), | |
| ADD COLUMN tenant_id VARCHAR(255); | |
| CREATE INDEX idx_databases_tenant_id ON databases (tenant_id); | |
| -- +goose Down | |
| DROP INDEX idx_databases_tenant_id; | |
| ALTER TABLE databases | |
| DROP COLUMN tenant_id, | |
| DROP COLUMN credential_path; |
| @@ -0,0 +1,5 @@ | |||
| -- +goose Up | |||
| ALTER TABLE databases ADD COLUMN credential_path VARCHAR(255); | |||
There was a problem hiding this comment.
credential_path is constrained to VARCHAR(255). Vault KV v2 paths can exceed 255 characters in practice (mount + nested paths + UUIDs + prefixes), and truncation would be disastrous. Prefer TEXT (as used for other path-like fields) or increase the limit and enforce a length check in code.
| ALTER TABLE databases ADD COLUMN credential_path VARCHAR(255); | |
| ALTER TABLE databases ADD COLUMN credential_path TEXT; |
| // NewDatabaseService constructs a DatabaseService with its dependencies. | ||
| func NewDatabaseService(params DatabaseServiceParams) *DatabaseService { | ||
| return &DatabaseService{ | ||
| repo: params.Repo, | ||
| compute: params.Compute, | ||
| vpcRepo: params.VpcRepo, | ||
| volumeSvc: params.VolumeSvc, | ||
| snapshotSvc: params.SnapshotSvc, | ||
| snapshotRepo: params.SnapshotRepo, | ||
| eventSvc: params.EventSvc, | ||
| auditSvc: params.AuditSvc, | ||
| logger: params.Logger, | ||
| repo: params.Repo, | ||
| rbacSvc: params.RBAC, | ||
| compute: params.Compute, | ||
| vpcRepo: params.VpcRepo, | ||
| volumeSvc: params.VolumeSvc, | ||
| snapshotSvc: params.SnapshotSvc, | ||
| snapshotRepo: params.SnapshotRepo, | ||
| eventSvc: params.EventSvc, | ||
| auditSvc: params.AuditSvc, | ||
| secrets: params.Secrets, | ||
| logger: params.Logger, | ||
| vaultMountPath: params.VaultMountPath, | ||
| rotationCache: make(map[string]bool), | ||
| } | ||
| } | ||
|
|
||
| func (s *DatabaseService) CreateDatabase(ctx context.Context, req ports.CreateDatabaseRequest) (*domain.Database, error) { | ||
| userID := appcontext.UserIDFromContext(ctx) | ||
| tenantID := appcontext.TenantIDFromContext(ctx) | ||
| if err := s.rbacSvc.Authorize(ctx, userID, domain.PermissionDBCreate, "*"); err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
DatabaseService now unconditionally calls s.rbacSvc.Authorize(...) in multiple methods, but NewDatabaseService does not guard against RBAC being nil. This can panic in benchmarks/integration setups that construct the service without RBAC. Consider requiring RBAC (fail fast during construction) or providing a no-op RBAC implementation for non-API contexts/tests.
| func (s *DatabaseService) RotateCredentials(ctx context.Context, id uuid.UUID, idempotencyKey string) error { | ||
| if idempotencyKey != "" { | ||
| s.rotationMu.Lock() | ||
| if s.rotationCache[idempotencyKey] { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| } | ||
| s.rotationMu.Unlock() | ||
| } | ||
| dbEngine := domain.DatabaseEngine(req.Engine) | ||
| password, _ := util.GenerateRandomPassword(16) | ||
| username := s.getDefaultUsername(dbEngine) | ||
| db := s.initialDatabaseRecord(userID, req.NewName, dbEngine, req.Version, username, password, req.VpcID) | ||
|
|
||
| // Preserving user requested storage if larger than snapshot | ||
| db.AllocatedStorage = req.AllocatedStorage | ||
| if snap.SizeGB > db.AllocatedStorage { | ||
| db.AllocatedStorage = snap.SizeGB | ||
|
|
||
| db, err := s.repo.GetByID(ctx, id) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
RotateCredentials is missing an RBAC authorization check, unlike other service methods (Create/Modify/Get/Delete/etc.). Even though the HTTP route has RBAC middleware, this method can be called from non-HTTP contexts and would bypass authorization. Add Authorize(ctx, userID, PermissionDBUpdate, id.String()) (or similar) at the start of this method.
| // Simple in-memory idempotency cache for rotation | ||
| rotationCache map[string]bool | ||
| rotationMu sync.Mutex | ||
| } |
There was a problem hiding this comment.
The in-memory idempotency cache is (1) unbounded (no TTL/eviction), (2) keyed only by Idempotency-Key (so the same key across different DBs will incorrectly short-circuit rotation), and (3) not concurrency-safe for duplicate requests (two concurrent requests with the same key can both pass the initial check and rotate twice). Consider using a TTL cache keyed by {dbID}:{idempotencyKey} and a singleflight/lock-per-key approach, or persist idempotency in the database.
| // 2. Update in Vault ONLY after DB success | ||
| vaultPath := db.CredentialPath | ||
| if vaultPath == "" { | ||
| vaultPath = s.getVaultPath(db.ID) | ||
| } | ||
| if err := s.secrets.StoreSecret(ctx, vaultPath, map[string]interface{}{"password": newPassword}); err != nil { | ||
| // Note: At this point DB password is changed but Vault update failed. | ||
| // The system is out of sync. Fallback password in DB record remains old. | ||
| return errors.Wrap(errors.Internal, "database password updated but failed to store in vault", err) | ||
| } | ||
|
|
||
| // 3. Update DB record if needed (metadata or path) | ||
| db.CredentialPath = vaultPath | ||
| if err := s.repo.Update(ctx, db); err != nil { | ||
| return err | ||
| } | ||
|
|
There was a problem hiding this comment.
After a successful rotation you update Vault and the DB record’s CredentialPath, but you never update db.Password to newPassword. This breaks the documented “fallback compatibility”: if Vault is unavailable later, GetConnectionString will fall back to the stale password and clients won’t be able to connect. Either stop relying on db.Password as a fallback after rotation, or update it (and persist it) when rotation succeeds.
| // DeleteSecret removes a secret from the specified path. | ||
| func (a *Adapter) DeleteSecret(ctx context.Context, path string) error { | ||
| _, err := a.client.Logical().DeleteWithContext(ctx, path) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to delete secret from vault: %w", err) | ||
| } | ||
| return nil |
There was a problem hiding this comment.
For Vault KV v2, deleting a secret’s data vs. deleting its metadata are different endpoints. Logical().DeleteWithContext(ctx, path) against a /data/... path typically only deletes the latest version and leaves metadata/old versions behind. If the intent is to fully remove credentials when a database is deleted, implement KV v2 deletion semantics explicitly (e.g., use the metadata endpoint or a dedicated KV v2 helper) and document which behavior you want.
| **Response Example**: | ||
| ```json | ||
| { | ||
| "message": "database credentials rotated successfully", | ||
| "data": { | ||
| "message": "database credentials rotated successfully" | ||
| } | ||
| } | ||
| ``` |
There was a problem hiding this comment.
The response example shows a top-level message alongside data.message, but the API’s httputil.Success wrapper only emits data (and meta). The example should reflect the actual envelope (e.g., {"data": {"message": ...}, "meta": ...}) to avoid confusing SDK/CLI consumers.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 35 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if idempotencyKey != "" { | ||
| s.rotationMu.Lock() | ||
| if s.rotationCache[idempotencyKey] { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| } | ||
| s.rotationMu.Unlock() |
There was a problem hiding this comment.
Idempotency cache is keyed only by idempotencyKey, so reusing the same key across different databases will incorrectly short-circuit rotations. Also, the map has no eviction/TTL, so it can grow unbounded in a long-running process. Consider keying by {dbID}:{idempotencyKey} and persisting idempotency state (or at least adding TTL/size bounds).
| // 3. Update DB record if needed (metadata or path) | ||
| db.CredentialPath = vaultPath |
There was a problem hiding this comment.
After a successful rotation, the service updates Vault and credential_path but never updates db.Password to newPassword before calling repo.Update. Since GetConnectionString falls back to db.Password when Vault is unavailable, a Vault outage after rotation would return a connection string with a stale password. Update the stored fallback password (or explicitly clear it and adjust fallback behavior) when rotation succeeds.
| // 3. Update DB record if needed (metadata or path) | |
| db.CredentialPath = vaultPath | |
| // 3. Update DB record if needed (metadata, path, and fallback password) | |
| db.CredentialPath = vaultPath | |
| db.Password = newPassword |
|
|
||
| ### POST /databases/:id/rotate-credentials | ||
| Regenerate the database password and update it in both the database engine and Vault. | ||
| - **Security**: Ensures credentials are never stored in plain text in the primary metadata store. |
There was a problem hiding this comment.
This bullet claims credentials are “never stored in plain text in the primary metadata store,” but the service/repo still persist the password column for fallback/legacy compatibility (also described elsewhere in the docs). Please adjust this statement to avoid promising a property the implementation doesn’t provide.
| - **Security**: Ensures credentials are never stored in plain text in the primary metadata store. | |
| - **Security**: Rotates credentials and synchronizes the updated password with the database engine and Vault. |
| port INT, | ||
| username VARCHAR(255), | ||
| password TEXT, | ||
| credential_path TEXT, |
There was a problem hiding this comment.
Schema docs list credential_path TEXT, but the actual migration adds it as VARCHAR(255) (see 099_add_database_credential_path.up.sql). Please align the documentation with the real column type (or change the migration to TEXT if long Vault paths should be supported).
| credential_path TEXT, | |
| credential_path VARCHAR(255), |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 37 out of 38 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (r *DatabaseRepository) scanDatabase(row pgx.Row) (*domain.Database, error) { | ||
| var db domain.Database | ||
| var engine, status, role string | ||
| err := row.Scan( | ||
| &db.ID, &db.UserID, &db.Name, &engine, &db.Version, &status, &role, &db.PrimaryID, &db.VpcID, &db.ContainerID, &db.Port, &db.Username, &db.Password, &db.CreatedAt, &db.UpdatedAt, &db.AllocatedStorage, &db.Parameters, &db.MetricsEnabled, &db.MetricsPort, &db.ExporterContainerID, &db.PoolingEnabled, &db.PoolingPort, &db.PoolerContainerID, | ||
| &db.ID, &db.UserID, &db.TenantID, &db.Name, &engine, &db.Version, &status, &role, &db.PrimaryID, &db.VpcID, &db.ContainerID, &db.Port, &db.Username, &db.Password, &db.CreatedAt, &db.UpdatedAt, &db.AllocatedStorage, &db.Parameters, &db.MetricsEnabled, &db.MetricsPort, &db.ExporterContainerID, &db.PoolingEnabled, &db.PoolingPort, &db.PoolerContainerID, &db.CredentialPath, | ||
| ) |
There was a problem hiding this comment.
tenant_id is selected as a nullable column but scanned into db.TenantID (a non-nullable uuid.UUID). If any row has tenant_id = NULL (allowed by the migration), Scan will error and break listing/scanning. Consider making tenant_id NOT NULL at the DB level, or selecting COALESCE(tenant_id, '00000000-0000-0000-0000-000000000000'), or changing the domain field to a nullable type (e.g., *uuid.UUID/uuid.NullUUID).
| func (s *DatabaseService) RotateCredentials(ctx context.Context, id uuid.UUID, idempotencyKey string) error { | ||
| if idempotencyKey != "" { | ||
| s.rotationMu.Lock() | ||
| if s.rotationCache[idempotencyKey] { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| } | ||
| s.rotationMu.Unlock() | ||
| } |
There was a problem hiding this comment.
The in-memory idempotency cache for rotations is keyed only by Idempotency-Key, so the same key reused for a different database ID would incorrectly short-circuit rotation. Also, the map has no TTL/eviction and can grow unbounded over time. Consider scoping the cache key by database ID and adding expiry/size bounds (or persisting idempotency in the DB).
| // 2. Update in Vault ONLY after DB success | ||
| vaultPath := db.CredentialPath | ||
| if vaultPath == "" { | ||
| vaultPath = s.getVaultPath(db.ID) | ||
| } | ||
| if err := s.secrets.StoreSecret(ctx, vaultPath, map[string]interface{}{"password": newPassword}); err != nil { | ||
| // Note: At this point DB password is changed but Vault update failed. | ||
| // The system is out of sync. Fallback password in DB record remains old. | ||
| return errors.Wrap(errors.Internal, "database password updated but failed to store in vault", err) | ||
| } | ||
|
|
||
| // 3. Update DB record if needed (metadata or path) | ||
| db.CredentialPath = vaultPath | ||
| if err := s.repo.Update(ctx, db); err != nil { | ||
| return err | ||
| } | ||
|
|
||
| // 4. If pooler is enabled, restart it to pick up new credentials | ||
| if db.PoolerContainerID != "" { | ||
| if err := s.compute.DeleteInstance(ctx, db.PoolerContainerID); err != nil { | ||
| s.logger.Warn("failed to delete old pooler during rotation", "pooler_id", db.PoolerContainerID, "error", err) | ||
| } | ||
| db.PoolerContainerID = "" | ||
| dbIP, err := s.compute.GetInstanceIP(ctx, db.ContainerID) | ||
| if err != nil { | ||
| return s.performProvisioningRollback(ctx, db, vol.ID.String(), errors.Wrap(errors.Internal, "failed to get restored database IP", err)) | ||
| return errors.Wrap(errors.Internal, "failed to get database IP for pooler restart", err) | ||
| } | ||
| if err := s.provisionSidecars(ctx, db, dbEngine, dbIP, username, password, networkID); err != nil { | ||
| return s.performProvisioningRollback(ctx, db, vol.ID.String(), err) | ||
| networkID, err := s.resolveVpcNetwork(ctx, db.VpcID) | ||
| if err != nil { | ||
| return errors.Wrap(errors.Internal, "failed to resolve network for pooler restart", err) | ||
| } | ||
| if err := s.provisionPoolerSidecar(ctx, db, db.Engine, dbIP, db.Username, newPassword, networkID); err != nil { | ||
| return errors.Wrap(errors.Internal, "failed to provision new pooler sidecar during rotation", err) | ||
| } | ||
| if err := s.repo.Update(ctx, db); err != nil { | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
After a successful rotation, the DB metadata record keeps db.Password unchanged (only CredentialPath is updated). This means your fallback path (GetConnectionString uses db.Password when Vault read fails) will return an invalid password post-rotation, and other components (e.g., metrics exporter) may continue using stale credentials. Consider updating the stored password alongside Vault (if fallback is required), and restarting/reprovisioning any credential-dependent sidecars (exporter as well as pooler) to avoid breaking metrics/ops after rotation.
| -- +goose Up | ||
| -- Add tenant_id to remaining resource tables | ||
|
|
||
| -- 1. databases | ||
| ALTER TABLE databases ADD COLUMN IF NOT EXISTS tenant_id UUID REFERENCES tenants(id) ON DELETE SET NULL; | ||
| CREATE INDEX IF NOT EXISTS idx_databases_tenant ON databases(tenant_id); | ||
|
|
||
| -- 2. usage_records | ||
| ALTER TABLE usage_records ADD COLUMN IF NOT EXISTS tenant_id UUID REFERENCES tenants(id) ON DELETE SET NULL; | ||
| CREATE INDEX IF NOT EXISTS idx_usage_records_tenant ON usage_records(tenant_id); | ||
|
|
||
| -- 3. api_keys | ||
| ALTER TABLE api_keys ADD COLUMN IF NOT EXISTS tenant_id UUID REFERENCES tenants(id) ON DELETE SET NULL; | ||
| CREATE INDEX IF NOT EXISTS idx_api_keys_tenant ON api_keys(tenant_id); | ||
|
|
||
| -- Backfill existing data using default_tenant_id from users table | ||
| UPDATE databases d SET tenant_id = u.default_tenant_id FROM users u WHERE d.user_id = u.id AND d.tenant_id IS NULL; | ||
| UPDATE usage_records ur SET tenant_id = u.default_tenant_id FROM users u WHERE ur.user_id = u.id AND ur.tenant_id IS NULL; | ||
| UPDATE api_keys ak SET tenant_id = u.default_tenant_id FROM users u WHERE ak.user_id = u.id AND ak.tenant_id IS NULL; |
There was a problem hiding this comment.
This PR is described as Vault integration + credential rotation, but this migration also introduces tenant_id to databases, usage_records, and api_keys (with backfills and new indexes). If this is intentional scope expansion, it should be called out in the PR description/release notes; otherwise consider moving it to a separate PR to reduce risk and review surface area.
| @@ -76,6 +79,9 @@ func NewConfig() (*Config, error) { | |||
| FirecrackerKernel: getEnv("FIRECRACKER_KERNEL", "/var/lib/thecloud/vmlinux"), | |||
| FirecrackerRootfs: getEnv("FIRECRACKER_ROOTFS", "/var/lib/thecloud/rootfs.ext4"), | |||
| FirecrackerMockMode: getEnv("FIRECRACKER_MOCK_MODE", "false") == "true", | |||
| VaultAddress: getEnv("VAULT_ADDR", "http://localhost:8200"), | |||
| VaultToken: getEnv("VAULT_TOKEN", ""), | |||
| VaultMountPath: getEnv("VAULT_MOUNT_PATH", "secret/data/thecloud/rds"), | |||
| }, nil | |||
There was a problem hiding this comment.
VAULT_MOUNT_PATH is named like a mount (e.g., secret) but the default includes the KV v2 /data/ segment (secret/data/...), and DatabaseService.getVaultPath concatenates it directly. This is easy to misconfigure (setting the mount to secret would generate non-KV-v2 paths and break reads/writes). Consider renaming this setting to something like VAULT_KV_V2_PATH_PREFIX or having the adapter build the correct KV v2 API path from a true mount + logical secret name.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| db := s.initialDatabaseRecord(userID, name, primary.Engine, primary.Version, primary.Username, password, primary.VpcID) | ||
| db.TenantID = tenantID | ||
| db.Role = domain.RoleReplica | ||
| db.PrimaryID = &primaryID | ||
| db.AllocatedStorage = primary.AllocatedStorage | ||
| db.MetricsEnabled = primary.MetricsEnabled | ||
| db.PoolingEnabled = primary.PoolingEnabled | ||
|
|
||
| func (s *DatabaseService) validateCreationRequest(req ports.CreateDatabaseRequest, engine domain.DatabaseEngine) error { | ||
| if !s.isValidEngine(engine) { | ||
| return errors.New(errors.InvalidInput, "unsupported database engine") | ||
| } | ||
| if req.AllocatedStorage < 10 { | ||
| return errors.New(errors.InvalidInput, "allocated storage must be at least 10GB") | ||
| } | ||
| if req.PoolingEnabled && engine != domain.EnginePostgres { | ||
| return errors.New(errors.InvalidInput, "connection pooling is currently only supported for PostgreSQL") | ||
| } | ||
| return nil | ||
| return s.provisionDatabase(ctx, db, password, primary.Parameters, primaryIP, "DATABASE_REPLICA_CREATE") |
There was a problem hiding this comment.
CreateReplica provisions/launches using primary.Parameters, but never assigns them to db.Parameters before persisting. This can result in parameters being inserted as NULL (bypassing the DB default) and later failing to scan into map[string]string on reads, and also means replicas won't persist the inherited config. Set db.Parameters = primary.Parameters (or an empty map) before repo.Create.
| db := s.initialDatabaseRecord(userID, req.NewName, dbEngine, req.Version, username, password, req.VpcID) | ||
| db.TenantID = tenantID | ||
| db.Role = domain.RoleReplica | ||
| db.PrimaryID = &primaryID | ||
| db.AllocatedStorage = primary.AllocatedStorage | ||
| db.MetricsEnabled = primary.MetricsEnabled | ||
| db.PoolingEnabled = primary.PoolingEnabled | ||
|
|
||
| imageName, env, defaultPort := s.getEngineConfig(primary.Engine, primary.Version, primary.Username, primary.Password, name, db.Role, primaryIP) | ||
| db.AllocatedStorage = req.AllocatedStorage | ||
| if snap.SizeGB > db.AllocatedStorage { | ||
| db.AllocatedStorage = snap.SizeGB | ||
| } | ||
| db.MetricsEnabled = req.MetricsEnabled | ||
| db.PoolingEnabled = req.PoolingEnabled | ||
|
|
There was a problem hiding this comment.
RestoreDatabase creates a DB record but does not set db.Role (primary) or db.Parameters before provisioning/persisting. This can store an empty role and NULL parameters, breaking subsequent reads and making restored instances inconsistent with regular creates. Explicitly set db.Role = domain.RolePrimary and assign db.Parameters = req.Parameters (or {}) before calling finalizeProvisioning.
| func (s *DatabaseService) RotateCredentials(ctx context.Context, id uuid.UUID, idempotencyKey string) error { | ||
| if idempotencyKey != "" { | ||
| s.rotationMu.Lock() | ||
| if s.rotationCache[idempotencyKey] { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| } | ||
| s.rotationMu.Unlock() | ||
| } | ||
|
|
||
| snap, err := s.snapshotSvc.GetSnapshot(ctx, req.SnapshotID) | ||
| db, err := s.repo.GetByID(ctx, id) | ||
| if err != nil { | ||
| return nil, err | ||
| return err | ||
| } |
There was a problem hiding this comment.
RotateCredentials bypasses RBAC authorization checks that other service methods enforce (Create/Modify/Delete/etc.). Even though the HTTP route has middleware, this service method can be called from other code paths (workers/tests) without protection. Add an Authorize call for domain.PermissionDBUpdate on the database ID, consistent with other DatabaseService methods.
| if idempotencyKey != "" { | ||
| s.rotationMu.Lock() | ||
| if s.rotationCache[idempotencyKey] { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| } | ||
| s.rotationMu.Unlock() | ||
| } |
There was a problem hiding this comment.
The idempotency handling is not safe under concurrent requests: the code checks rotationCache[idempotencyKey] and unlocks before any “in-progress” marker is set, so two requests with the same key can both rotate credentials concurrently. Consider using singleflight.Group, or store a state (in-progress/done) under the lock and keep it until completion.
| if idempotencyKey != "" { | ||
| s.rotationMu.Lock() | ||
| s.rotationCache[idempotencyKey] = true | ||
| s.rotationMu.Unlock() |
There was a problem hiding this comment.
rotationCache grows without bounds and has no TTL/eviction, which can lead to unbounded memory growth if clients send unique Idempotency-Key values. Use a bounded cache (LRU) and/or TTL-based eviction, or persist idempotency state in the datastore with an expiration.
| // 3. Update DB record if needed (metadata or path) | ||
| db.CredentialPath = vaultPath |
There was a problem hiding this comment.
After a successful rotation, the database metadata password field is never updated (only Vault is updated and CredentialPath is saved). Since GetConnectionString falls back to db.Password when Vault is unavailable, this makes the fallback incorrect after rotation. Either update the stored fallback password on success, or remove/disable fallback usage once Vault-backed credentials are enabled.
| // 3. Update DB record if needed (metadata or path) | |
| db.CredentialPath = vaultPath | |
| // 3. Update DB record so metadata stays consistent with the rotated credential. | |
| db.CredentialPath = vaultPath | |
| db.Password = newPassword |
| port INT, | ||
| username VARCHAR(255), | ||
| password TEXT, | ||
| credential_path TEXT, |
There was a problem hiding this comment.
The documented table schema uses credential_path TEXT, but the migration adds credential_path VARCHAR(255). Please align the documentation with the actual schema (or adjust the migration) to avoid confusion for operators and contributors.
| credential_path TEXT, | |
| credential_path VARCHAR(255), |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (s *DatabaseService) RotateCredentials(ctx context.Context, id uuid.UUID, idempotencyKey string) error { | ||
| if idempotencyKey != "" { | ||
| s.rotationMu.Lock() | ||
| if s.rotationCache[idempotencyKey] { | ||
| s.rotationMu.Unlock() | ||
| return nil // Already rotated | ||
| } |
There was a problem hiding this comment.
RotateCredentials bypasses RBAC authorization checks in the service layer (unlike the other DatabaseService methods). Even though the HTTP route is wrapped with permission middleware, this method can be called from internal code/worker paths and should enforce tenant/user authorization consistently.
| vaultMountPath string | ||
| // Simple in-memory idempotency cache for rotation | ||
| rotationCache map[string]bool | ||
| rotationMu sync.Mutex | ||
| } |
There was a problem hiding this comment.
The in-memory idempotency cache (rotationCache) grows unbounded and has no TTL/eviction. In a long-running API process, unique Idempotency-Key values can accumulate indefinitely and increase memory usage.
| var cmd []string | ||
| switch db.Engine { | ||
| case domain.EnginePostgres: | ||
| cmd = []string{"psql", "-h", "127.0.0.1", "-U", db.Username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", db.Username, newPassword)} |
There was a problem hiding this comment.
The Postgres rotation command uses psql -h 127.0.0.1 -U <user> without providing a password (e.g., via PGPASSWORD) and forces TCP auth. With the official Postgres image defaults, this typically fails with “no password supplied”. Consider connecting via the local socket (omit -h) or passing credentials explicitly for non-interactive exec.
| cmd = []string{"psql", "-h", "127.0.0.1", "-U", db.Username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", db.Username, newPassword)} | |
| cmd = []string{"psql", "-U", db.Username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", db.Username, newPassword)} |
| if err := s.resolveDatabasePort(ctx, db, allocatedPorts, defaultPort); err != nil { | ||
| return s.performProvisioningRollback(ctx, db, vol.ID.String(), errors.Wrap(errors.Internal, "failed to resolve restored database port", err)) | ||
| // 3. Update DB record if needed (metadata or path) | ||
| db.CredentialPath = vaultPath |
There was a problem hiding this comment.
After a successful rotation, the metadata fallback password (db.Password) is not updated before calling repo.Update. If Vault later becomes unavailable, GetConnectionString() will fall back to the stale password and return an invalid connection string. If the password field is intentionally kept for fallback/legacy compatibility, it should be updated to the new password on success.
| db.CredentialPath = vaultPath | |
| db.CredentialPath = vaultPath | |
| db.Password = newPassword |
| env := []string{"MYSQL_ROOT_PASSWORD=" + password, "MYSQL_DATABASE=" + name} | ||
| if role == domain.RoleReplica { | ||
| env = append(env, "MYSQL_MASTER_HOST="+primaryIP) | ||
| env = append(env, "PRIMARY_HOST="+primaryIP) |
There was a problem hiding this comment.
MySQL replica containers are configured with PRIMARY_HOST but the docs mention MYSQL_MASTER_HOST for MySQL replication. If replication setup relies on the documented variable, this will break replica provisioning; consider setting the expected MySQL env var (or both) and keeping docs/code consistent.
| env = append(env, "PRIMARY_HOST="+primaryIP) | |
| env = append(env, | |
| "PRIMARY_HOST="+primaryIP, | |
| "MYSQL_MASTER_HOST="+primaryIP, | |
| ) |
| func (m *NoOpSecretsManager) GetSecret(ctx context.Context, path string) (map[string]interface{}, error) { | ||
| m.mu.RLock() | ||
| defer m.mu.RUnlock() | ||
| data, ok := m.secrets[path] | ||
| if !ok { | ||
| return nil, fmt.Errorf("secret not found at path: %s", path) | ||
| } | ||
| return data, nil | ||
| } |
There was a problem hiding this comment.
NoOpSecretsManager.GetSecret returns the internal stored map by reference. Callers can mutate it outside the lock, which can cause data races and corrupt the in-memory store. Consider returning a shallow copy (and optionally copying nested values) to keep the store’s internal state immutable from the caller’s perspective.
Implementation of Managed Database Credential Security with HashiCorp Vault
This PR introduces enterprise-grade secret management for our Managed Database (RDS) service by integrating HashiCorp Vault. It also adds an automated credential rotation feature.
Key Changes
1. HashiCorp Vault Integration
VaultAdapterimplementing theSecretsManagerinterface.DatabaseServiceto orchestrate credential storage during creation, retrieval, and restoration.2. Credential Rotation (Day 2 Ops)
POST /databases/:id/rotate-credentialsendpoint.ALTER USERinside the database container (PostgreSQL/MySQL).3. Schema & Repository Updates
credential_pathin thedatabasestable.DatabaseRepositoryto handle the new field while maintaining fallback compatibility.4. Comprehensive Testing
Verification
go test ./internal/...)go test -v tests/database_e2e_test.go)docs/database.mdanddocs/guides/rds.md.Commit History
The changes have been broken down into 17 logical commits for easier review.
Summary by CodeRabbit
New Features
Documentation
Database