Skip to content

fix(database): add rollback on vault failure in RotateCredentials#161

Open
poyrazK wants to merge 8 commits intomainfrom
fix/credential-rotation-rollback
Open

fix(database): add rollback on vault failure in RotateCredentials#161
poyrazK wants to merge 8 commits intomainfrom
fix/credential-rotation-rollback

Conversation

@poyrazK
Copy link
Copy Markdown
Owner

@poyrazK poyrazK commented Apr 18, 2026

Summary

  • Add automatic rollback when vault store fails after DB password change
  • Previously the system would be left in an inconsistent state (DB has new password, vault has old)
  • Added buildPasswordChangeCmd helper for generating password change/rollback commands
  • Updated test to verify rollback behavior on vault failure

Test plan

  • go build ./...
  • go test ./internal/core/services/... -run RotateCredentials

Summary by CodeRabbit

Release Notes

New Features

  • Credential rotation now includes automatic database password rollback if Vault write fails, maintaining consistency between the database and secrets manager
  • Database promotion to primary now executes engine-specific operations: PostgreSQL creates a promotion trigger file; MySQL stops replication

Documentation

  • Updated credential rotation workflow documentation to reflect automatic rollback behavior

poyrazK added 6 commits April 18, 2026 13:52
1. rotationCache memory leak: Add TTL-based eviction (24h TTL)
   to prevent unbounded map growth on credential rotation

2. PromoteToPrimary: Execute actual pg_promote() / MySQL replica
   reset commands in container instead of only updating metadata

3. NoopDatabaseService: Add missing interface methods
   (CreateReplica, PromoteToPrimary, GetDatabase, ListDatabases,
   RotateCredentials, StopDatabase, StartDatabase)

4. Update PromoteToPrimary unit test with mock Exec call
- MySQL: fetch current password from Vault instead of using
  stale db.Password (can diverge after RotateCredentials)
- PostgreSQL: use pg_ctl promote instead of psql to avoid
  authentication issues with TCP connections
pg_ctl promote requires specifying the data directory and may have
PATH issues in the Docker container. Using 'postgres --promote'
sends a fast promotion signal directly to the running server
without needing to locate pg_ctl or the data directory.

This fixes the E2E test TestDatabaseReplicationE2E/PromoteReplica
which was returning HTTP 500.
pg_ctl and postgres --promote can fail due to PATH issues or signal
routing in Docker containers. PostgreSQL monitors a trigger file
(promote signal file) at PGDATA/promote - when this file appears,
the server exits recovery and promotes.

Using 'touch /var/lib/postgresql/data/promote' is the most reliable
promotion method in containerized environments.
The TTL check only affected read behavior - expired entries remained
in the map forever, causing unbounded memory growth. Now we delete
the entry when we find it expired during lookup.
When vault store fails after the DB password has already been changed,
the system now automatically rolls back the DB password to the original
value to maintain consistency. Previously this left the system in an
inconsistent state where vault had old password but DB expected new one.

Added buildPasswordChangeCmd helper for generating rollback commands.
Updated test to verify rollback behavior.
Copilot AI review requested due to automatic review settings April 18, 2026 21:14
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 18, 2026

📝 Walkthrough

Walkthrough

The changes refactor credential rotation in the database service to include automatic password rollback if Vault storage fails after database updates, enhance replica promotion with engine-specific container commands, and replace rotation idempotency caching with TTL-based timestamps. Documentation is updated accordingly, and no-op adapter stubs are added for new methods.

Changes

Cohort / File(s) Summary
Documentation
docs/database.md
Updated credential rotation workflow to specify automatic database password rollback if Vault storage fails.
Core Service Implementation
internal/core/services/database.go
Replaced rotation idempotency cache with TTL-based timestamps; enhanced PromoteToPrimary with engine-specific in-container commands (PostgreSQL trigger file, MySQL replica reset); added rollback mechanism for failed Vault storage that reverses database password changes; added buildPasswordChangeCmd helper for engine-specific commands.
Service Tests
internal/core/services/database_unit_test.go, internal/core/services/database_vault_test.go
Updated PromoteToPrimary test setup with engine and container fields; enhanced Vault failure test to validate rollback Exec call and modified error assertions.
No-op Adapter
internal/repositories/noop/adapters.go
Added stub implementations of seven new NoopDatabaseService methods: CreateReplica, PromoteToPrimary, GetDatabase, ListDatabases, RotateCredentials, StopDatabase, StartDatabase.

Sequence Diagram

sequenceDiagram
    actor Client
    participant Service as Database Service
    participant Container as DB Container
    participant Vault
    
    Client->>Service: RotateCredentials(id, idempotencyKey)
    Service->>Container: Exec(ALTER USER password)
    Container-->>Service: Success
    Service->>Vault: StoreSecret(newPassword)
    Vault-->>Service: Error
    
    alt Vault Storage Failed
        Service->>Container: Exec(ALTER USER originalPassword)
        Container-->>Service: Success
        Service-->>Client: Error (Vault failed, DB rolled back)
    else Rollback Failed
        Service-->>Client: Error (Vault failed, rollback also failed)
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

Poem

🐰 A password rolls back with grace,
When Vault won't keep its place,
The database and vault now align,
With TTLs ticking just fine! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately and specifically describes the main change: adding rollback functionality on Vault failure in the RotateCredentials operation, which is the core focus of the changes across documentation, implementation, and tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/credential-rotation-rollback
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch fix/credential-rotation-rollback

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent inconsistent credential state during DatabaseService.RotateCredentials by rolling back the in-database password change when writing the new password to Vault fails.

Changes:

  • Add rollback logic in RotateCredentials when Vault StoreSecret fails after the DB password update.
  • Add a buildPasswordChangeCmd helper and introduce TTL timestamps in the in-memory idempotency cache.
  • Update service tests to validate rollback behavior on Vault failure and adjust unit tests for the updated PromoteToPrimary behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
internal/repositories/noop/adapters.go Extends the noop database service adapter to satisfy the updated ports.DatabaseService surface.
internal/core/services/database_vault_test.go Updates RotateCredentials tests to assert the new rollback-on-Vault-failure behavior.
internal/core/services/database_unit_test.go Updates PromoteToPrimary unit test setup to account for newly introduced in-container promotion execution.
internal/core/services/database.go Implements rollback on Vault failure for RotateCredentials; changes idempotency cache storage to timestamps with TTL; adds engine promotion execution and password change helper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/core/services/database.go Outdated
Comment on lines +818 to +819
"credential rotation failed and rollback also failed - manual intervention required",
err)
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Vault store failure path, if the rollback Exec fails you currently wrap/return the Vault error (err) as the cause, which loses the rollback failure details. Wrap rollbackErr as the cause (and include the Vault error in the message/context) so operators can see why the rollback failed.

Suggested change
"credential rotation failed and rollback also failed - manual intervention required",
err)
fmt.Sprintf("credential rotation failed and rollback also failed - manual intervention required (vault store error: %v)", err),
rollbackErr)

Copilot uses AI. Check for mistakes.
Comment on lines 752 to 760
s.rotationMu.Lock()
if s.rotationCache[idempotencyKey] {
s.rotationMu.Unlock()
return nil // Already rotated
if cachedAt, ok := s.rotationCache[idempotencyKey]; ok {
if time.Since(cachedAt) < s.rotationCacheTTL {
s.rotationMu.Unlock()
return nil // Already rotated
}
// Expired entry - remove it to prevent unbounded map growth
delete(s.rotationCache, idempotencyKey)
}
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TTL cache eviction only deletes the specific key being checked; entries for other idempotency keys are never reaped, so the map can still grow without bound. Either periodically sweep expired entries (e.g., on each call with a bounded scan or via a background ticker), or change the comment/approach to avoid implying TTL-based eviction prevents unbounded growth.

Copilot uses AI. Check for mistakes.
Comment on lines +512 to +514
// Create promotion trigger file - PostgreSQL monitors this file
// and exits recovery mode when it appears
cmd = []string{"touch", "/var/lib/postgresql/data/promote"}
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL promotion via touch /var/lib/postgresql/data/promote will only work if the replica is configured with promote_trigger_file pointing at that path; I can’t find any code that sets this, so promotion will likely be a no-op. Consider using pg_ctl promote -D /var/lib/postgresql/data (or another standard promotion mechanism) instead of relying on an unspecified trigger file.

Suggested change
// Create promotion trigger file - PostgreSQL monitors this file
// and exits recovery mode when it appears
cmd = []string{"touch", "/var/lib/postgresql/data/promote"}
// Promote the PostgreSQL standby using the standard server command
// instead of relying on an externally configured trigger file.
cmd = []string{"pg_ctl", "promote", "-D", "/var/lib/postgresql/data"}

Copilot uses AI. Check for mistakes.
}
}
}
cmd = []string{"mysql", "-u", "root", "-p" + password, "-e", "STOP REPLICA; RESET REPLICA ALL;"}
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MySQL promotion command uses STOP REPLICA; RESET REPLICA ALL;, but database version isn’t validated anywhere (e.g., users could provision 5.7), where these statements may not exist. To avoid promotion failures, either validate/limit supported MySQL versions or use the more widely compatible STOP SLAVE; RESET SLAVE ALL; (or branch by version).

Suggested change
cmd = []string{"mysql", "-u", "root", "-p" + password, "-e", "STOP REPLICA; RESET REPLICA ALL;"}
cmd = []string{"mysql", "-u", "root", "-p" + password, "-e", "STOP SLAVE; RESET SLAVE ALL;"}

Copilot uses AI. Check for mistakes.
Comment on lines +508 to +533
// Execute database engine promotion command in container
var cmd []string
switch db.Engine {
case domain.EnginePostgres:
// Create promotion trigger file - PostgreSQL monitors this file
// and exits recovery mode when it appears
cmd = []string{"touch", "/var/lib/postgresql/data/promote"}
case domain.EngineMySQL:
// Fetch current password from Vault (db.Password may be stale after rotation)
password := db.Password
if db.CredentialPath != "" {
secret, err := s.secrets.GetSecret(ctx, db.CredentialPath)
if err == nil && secret != nil {
if p, ok := secret["password"].(string); ok {
password = p
}
}
}
cmd = []string{"mysql", "-u", "root", "-p" + password, "-e", "STOP REPLICA; RESET REPLICA ALL;"}
default:
return errors.New(errors.Internal, "unsupported engine for promotion")
}

if _, err := s.compute.Exec(ctx, db.ContainerID, cmd); err != nil {
return errors.Wrap(errors.Internal, "failed to promote database engine", err)
}
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description/title focuses on RotateCredentials rollback, but this hunk also changes PromoteToPrimary to execute in-container promotion commands (and adds new operational behavior). Please either update the PR description to reflect this additional behavior change or split it into a separate PR so it can be reviewed/released independently.

Copilot uses AI. Check for mistakes.
poyrazK added 2 commits April 19, 2026 00:32
…ehavior

Previously the docs noted that vault store failure could leave DB and vault
out of sync. Now documents the automatic rollback mechanism that restores
the original password if vault store fails.
- Wrap rollbackErr as cause in error (was wrapping vault err instead)
- Clarify comment: TTL eviction only deletes key being checked,
  not a full sweep - prevents implying guaranteed bounded growth
Copilot AI review requested due to automatic review settings April 19, 2026 16:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +533 to +535
if _, err := s.compute.Exec(ctx, db.ContainerID, cmd); err != nil {
return errors.Wrap(errors.Internal, "failed to promote database engine", err)
}
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PromoteToPrimary now executes a command in the database container, but it doesn’t validate db.ContainerID is present before calling s.compute.Exec. If ContainerID is empty, this will fail in a less controlled way; other methods (e.g., StartDatabase) explicitly check for a missing container ID and return a clear internal error.

Copilot uses AI. Check for mistakes.
Comment on lines +102 to +110
t.Run("RotateCredentials_VaultFailure_WithRollback", func(t *testing.T) {
mockRepo.On("GetByID", mock.Anything, dbID).Return(db, nil).Once()
mockSecrets.On("GetSecret", mock.Anything, db.CredentialPath).Return(map[string]interface{}{"password": "old-pass"}, nil).Once()
// First Exec: ALTER USER with new password (succeeds)
mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything).Return("ALTER ROLE", nil).Once()
// Vault store fails
mockSecrets.On("StoreSecret", mock.Anything, db.CredentialPath, mock.Anything).Return(fmt.Errorf("vault error")).Once()
// Second Exec: rollback to original password (succeeds)
mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything).Return("ALTER ROLE", nil).Once()
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new rollback behavior is only exercised for PostgreSQL. Since the rollback path for MySQL is different (it needs to authenticate with the post-rotation password to revert), adding a MySQL-focused test case here would help prevent regressions and would have caught incorrect rollback command construction.

Copilot uses AI. Check for mistakes.
Comment on lines +815 to +823
// Vault store failed but DB already has new password - rollback to original
rollbackCmd := s.buildPasswordChangeCmd(db.Engine, db.Username, currentPassword)
if _, rollbackErr := s.compute.Exec(ctx, db.ContainerID, rollbackCmd); rollbackErr != nil {
// Rollback also failed - system is in critical state requiring manual intervention
// Wrap rollbackErr as cause so operators can see why rollback failed
return errors.Wrap(errors.Internal,
fmt.Sprintf("credential rotation failed and rollback also failed - manual intervention required (vault store error: %v)", err),
rollbackErr)
}
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollback for MySQL credential rotation is built using currentPassword as the mysql -p... auth value. If the rotated user is root (which is the default username for MySQL in this service), the password has already been changed to newPassword by the first Exec, so authenticating with currentPassword will fail and rollback will never succeed. Consider building the rollback command with newPassword for authentication and currentPassword as the target password (i.e., separate auth vs desired password), or use a MySQL-safe password reset flow that can authenticate after the change.

Copilot uses AI. Check for mistakes.
Comment on lines +1048 to +1053
func (s *DatabaseService) buildPasswordChangeCmd(engine domain.DatabaseEngine, username, password string) []string {
switch engine {
case domain.EnginePostgres:
return []string{"psql", "-h", "127.0.0.1", "-U", username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", username, password)}
case domain.EngineMySQL:
return []string{"mysql", "-u", "root", "-p" + password, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", username, password)}
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildPasswordChangeCmd takes a single password argument, but for MySQL it is used both as the authentication password (-p...) and as the password being set in ALTER USER ... IDENTIFIED BY .... For rollback you generally need to authenticate with the current password (after the attempted rotation) while setting the desired password (the pre-rotation value). Updating this helper to accept distinct authPassword and newPassword (or returning an error for unsupported engines) would prevent incorrect commands and make call sites clearer.

Suggested change
func (s *DatabaseService) buildPasswordChangeCmd(engine domain.DatabaseEngine, username, password string) []string {
switch engine {
case domain.EnginePostgres:
return []string{"psql", "-h", "127.0.0.1", "-U", username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", username, password)}
case domain.EngineMySQL:
return []string{"mysql", "-u", "root", "-p" + password, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", username, password)}
func (s *DatabaseService) buildPasswordChangeCmd(engine domain.DatabaseEngine, username, password string, newPassword ...string) []string {
targetPassword := password
if len(newPassword) > 0 {
targetPassword = newPassword[0]
}
switch engine {
case domain.EnginePostgres:
return []string{"psql", "-h", "127.0.0.1", "-U", username, "-d", "postgres", "-c", fmt.Sprintf("ALTER USER %s WITH PASSWORD '%s';", username, targetPassword)}
case domain.EngineMySQL:
return []string{"mysql", "-u", "root", "-p" + password, "-e", fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY '%s';", username, targetPassword)}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/core/services/database.go (1)

788-796: ⚠️ Potential issue | 🟠 Major

Fix rollback authentication and SQL injection in password rotation.

Credential rotation has two bugs: (1) The rollback call (line 816) passes currentPassword for authentication, but the database was already changed to newPassword by the forward flow—rollback will fail with authentication error. Rollback must authenticate with newPassword and set the password back to currentPassword. (2) SQL literals and identifiers are interpolated directly into commands without escaping, creating SQL injection vulnerabilities for both PostgreSQL and MySQL (lines 791, 793, 1051, 1053).

Refactor buildPasswordChangeCmd to accept separate authPassword and targetPassword parameters. Update the forward call at lines 788–796 and the rollback call at line 816 to use the correct credentials. Add escaping helpers (sqlStringLiteral and postgresIdentifier) to properly escape SQL values and identifiers before interpolation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database.go` around lines 788 - 796, Refactor
buildPasswordChangeCmd to accept authPassword and targetPassword (instead of one
password) and update its callers so the forward rotation call uses
authPassword=currentPassword and targetPassword=newPassword while the rollback
call uses authPassword=newPassword and targetPassword=currentPassword; also
introduce escaping helpers sqlStringLiteral (for SQL string literals) and
postgresIdentifier (for PG identifiers) and use them when composing the psql and
mysql command strings so usernames and passwords are properly escaped (e.g.,
replace direct fmt.Sprintf interpolation in the psql/psql and mysql command
construction in buildPasswordChangeCmd and any rollback invocation), ensuring
MySQL quoting/escaping is handled for the -p flag and SQL literal, and return an
error for unsupported engines.
🧹 Nitpick comments (3)
internal/core/services/database_unit_test.go (1)

147-147: Assert the promotion command, not just the Exec call.

This test would still pass if PromoteToPrimary executed the wrong container command. Match the expected command shape after the service command is finalized.

🧪 Example tighter expectation
-		mockCompute.On("Exec", mock.Anything, "test-container", mock.Anything).Return("", nil).Once()
+		mockCompute.On("Exec", mock.Anything, "test-container", mock.MatchedBy(func(cmd []string) bool {
+			return len(cmd) >= 2 && cmd[0] == "pg_ctl" && cmd[1] == "promote"
+		})).Return("", nil).Once()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database_unit_test.go` at line 147, The test currently
only asserts that mockCompute.Exec was called, but not that PromoteToPrimary
sent the correct promotion command; update the test to assert the exact command
string/arguments passed into mockCompute.Exec when PromoteToPrimary is invoked
(match the finalized service command shape), e.g., inspect the second parameter
or command argument in the Exec call expectation on mockCompute and replace
mock.Anything for the command with a tight matcher/assertion that equals the
expected promotion command for "test-container" so the test fails if the wrong
command is constructed.
internal/core/services/database.go (1)

101-101: Extract the rotation cache TTL into a named constant.

This keeps the default discoverable and avoids hard-coded operational policy in the constructor.

♻️ Proposed constant extraction
 const (
 	// Default ports for database engines
 	DefaultPostgresPort = "5432"
 	DefaultMySQLPort    = "3306"
+
+	// Rotation cache defaults
+	DefaultRotationCacheTTL = 24 * time.Hour
 
 	// Connection Pooling (PgBouncer) defaults
-		rotationCacheTTL: 24 * time.Hour,
+		rotationCacheTTL: DefaultRotationCacheTTL,

As per coding guidelines, Do not use magic numbers - use named constants instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database.go` at line 101, Extract the hard-coded 24 *
time.Hour into a named package-level constant (e.g., defaultRotationCacheTTL)
and replace the literal in the struct/constructor assignment for
rotationCacheTTL with that constant; update the declaration near the top of
internal/core/services/database.go and use defaultRotationCacheTTL wherever
rotationCacheTTL is initialized (referencing the rotationCacheTTL field in the
constructor/struct initialization) so the default TTL is discoverable and no
magic number remains.
internal/core/services/database_vault_test.go (1)

105-110: Verify the rollback command restores the original password.

Both Exec expectations currently accept any command, so this test would pass even if rollback reused the new password or sent an unrelated command.

🧪 Proposed matcher tightening
 import (
 	"context"
 	"fmt"
 	"log/slog"
+	"strings"
 	"testing"
 		// First Exec: ALTER USER with new password (succeeds)
-		mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything).Return("ALTER ROLE", nil).Once()
+		mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.MatchedBy(func(cmd []string) bool {
+			return !strings.Contains(strings.Join(cmd, " "), "old-pass")
+		})).Return("ALTER ROLE", nil).Once()
 		// Vault store fails
 		mockSecrets.On("StoreSecret", mock.Anything, db.CredentialPath, mock.Anything).Return(fmt.Errorf("vault error")).Once()
 		// Second Exec: rollback to original password (succeeds)
-		mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything).Return("ALTER ROLE", nil).Once()
+		mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.MatchedBy(func(cmd []string) bool {
+			return strings.Contains(strings.Join(cmd, " "), "old-pass")
+		})).Return("ALTER ROLE", nil).Once()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database_vault_test.go` around lines 105 - 110, The
test's mock expectations for mockCompute.Exec are too permissive and may accept
any command, so tighten the second Exec expectation to verify the rollback
command restores the original password: change the mockCompute.On("Exec",
mock.Anything, db.ContainerID, mock.Anything) for the rollback to use a matcher
that inspects the command string and asserts it contains the original password
(or the expected ALTER ROLE ... WITH PASSWORD 'original') rather than the new
one; keep the first Exec expectation for the initial ALTER and the
mockSecrets.On("StoreSecret", mock.Anything, db.CredentialPath, mock.Anything)
failure as-is, but ensure the rollback Exec expectation specifically checks the
command argument content to validate restoration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/database.md`:
- Line 410: Update the documentation sentence about automatic rollback to
clarify that while the service will attempt to roll back the DB password if
writing to Vault fails, the rollback itself can fail and RotateCredentials may
surface a manual-intervention error (e.g., when its Exec fails); replace the
absolute guarantee with a statement that rollback is attempted but may require
manual intervention if RotateCredentials/Exec reports an error, and reference
RotateCredentials and its Exec call so readers know where the potential failure
originates.

In `@internal/core/services/database.go`:
- Around line 753-763: The current idempotency handling stores the key only
after a successful rotation, allowing concurrent callers to both miss the cache
and perform duplicate rotations; modify the code around the idempotencyKey
handling (use of s.rotationMu, s.rotationCache, s.rotationCacheTTL) to insert an
"in-flight" entry under the lock before starting rotation: store a struct or
sentinel that includes a wait channel or condition variable and a state
(in-flight), unlock and perform the rotation, then under the lock set the final
cachedAt on success and close/signal the waiting channel so duplicate callers
return immediately; on failure remove the in-flight entry and close/signal with
an error so callers can retry. Ensure duplicate callers detect the in-flight
marker, wait on it while holding no locks, then observe the final result (cached
timestamp or absence) to decide whether to proceed or return.
- Around line 510-517: The PostgreSQL promotion branch using cmd =
[]string{"touch", "/var/lib/postgresql/data/promote"} is unreliable; update the
db promotion logic (the switch handling db.Engine / case domain.EnginePostgres)
to execute "pg_ctl promote" instead of touching a file so the standby is
explicitly promoted; ensure the command is run as the postgres user or with the
same environment used by the containerized postgres process and that the exec
path references pg_ctl available in the postgres image.

---

Outside diff comments:
In `@internal/core/services/database.go`:
- Around line 788-796: Refactor buildPasswordChangeCmd to accept authPassword
and targetPassword (instead of one password) and update its callers so the
forward rotation call uses authPassword=currentPassword and
targetPassword=newPassword while the rollback call uses authPassword=newPassword
and targetPassword=currentPassword; also introduce escaping helpers
sqlStringLiteral (for SQL string literals) and postgresIdentifier (for PG
identifiers) and use them when composing the psql and mysql command strings so
usernames and passwords are properly escaped (e.g., replace direct fmt.Sprintf
interpolation in the psql/psql and mysql command construction in
buildPasswordChangeCmd and any rollback invocation), ensuring MySQL
quoting/escaping is handled for the -p flag and SQL literal, and return an error
for unsupported engines.

---

Nitpick comments:
In `@internal/core/services/database_unit_test.go`:
- Line 147: The test currently only asserts that mockCompute.Exec was called,
but not that PromoteToPrimary sent the correct promotion command; update the
test to assert the exact command string/arguments passed into mockCompute.Exec
when PromoteToPrimary is invoked (match the finalized service command shape),
e.g., inspect the second parameter or command argument in the Exec call
expectation on mockCompute and replace mock.Anything for the command with a
tight matcher/assertion that equals the expected promotion command for
"test-container" so the test fails if the wrong command is constructed.

In `@internal/core/services/database_vault_test.go`:
- Around line 105-110: The test's mock expectations for mockCompute.Exec are too
permissive and may accept any command, so tighten the second Exec expectation to
verify the rollback command restores the original password: change the
mockCompute.On("Exec", mock.Anything, db.ContainerID, mock.Anything) for the
rollback to use a matcher that inspects the command string and asserts it
contains the original password (or the expected ALTER ROLE ... WITH PASSWORD
'original') rather than the new one; keep the first Exec expectation for the
initial ALTER and the mockSecrets.On("StoreSecret", mock.Anything,
db.CredentialPath, mock.Anything) failure as-is, but ensure the rollback Exec
expectation specifically checks the command argument content to validate
restoration.

In `@internal/core/services/database.go`:
- Line 101: Extract the hard-coded 24 * time.Hour into a named package-level
constant (e.g., defaultRotationCacheTTL) and replace the literal in the
struct/constructor assignment for rotationCacheTTL with that constant; update
the declaration near the top of internal/core/services/database.go and use
defaultRotationCacheTTL wherever rotationCacheTTL is initialized (referencing
the rotationCacheTTL field in the constructor/struct initialization) so the
default TTL is discoverable and no magic number remains.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d062a70-40bb-4fda-ad42-d09f2afc1309

📥 Commits

Reviewing files that changed from the base of the PR and between 7f6f817 and a7816b3.

📒 Files selected for processing (5)
  • docs/database.md
  • internal/core/services/database.go
  • internal/core/services/database_unit_test.go
  • internal/core/services/database_vault_test.go
  • internal/repositories/noop/adapters.go

Comment thread docs/database.md
1. **Generate Password**: A new 16-character secure password is generated.
2. **Engine Update**: The `ALTER USER` command is executed inside the database container first to apply the new password.
3. **Update Vault**: The new secret is written to Vault. (Note: A failure here may leave the database and Vault out of sync).
3. **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system automatically rolls back the database password to the original value to maintain consistency.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid guaranteeing rollback when rollback itself can fail.

The service attempts rollback, but RotateCredentials can return a manual-intervention error when the rollback Exec fails. The docs should reflect that nuance.

📝 Proposed wording
-    3.  **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system automatically rolls back the database password to the original value to maintain consistency.
+    3.  **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system attempts to roll back the database password to the original value to maintain consistency. If rollback also fails, the API returns an error indicating manual intervention is required.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
3. **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system automatically rolls back the database password to the original value to maintain consistency.
3. **Update Vault**: The new secret is written to Vault. If Vault store fails after the DB password has been changed, the system attempts to roll back the database password to the original value to maintain consistency. If rollback also fails, the API returns an error indicating manual intervention is required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/database.md` at line 410, Update the documentation sentence about
automatic rollback to clarify that while the service will attempt to roll back
the DB password if writing to Vault fails, the rollback itself can fail and
RotateCredentials may surface a manual-intervention error (e.g., when its Exec
fails); replace the absolute guarantee with a statement that rollback is
attempted but may require manual intervention if RotateCredentials/Exec reports
an error, and reference RotateCredentials and its Exec call so readers know
where the potential failure originates.

Comment on lines +510 to +517
// Execute database engine promotion command in container
var cmd []string
switch db.Engine {
case domain.EnginePostgres:
// Create promotion trigger file - PostgreSQL monitors this file
// and exits recovery mode when it appears
cmd = []string{"touch", "/var/lib/postgresql/data/promote"}
case domain.EngineMySQL:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect repository configuration for PostgreSQL promotion support.
# Expectation: If no promote_trigger_file configuration exists, promotion should use pg_ctl promote or pg_promote().

rg -n -C3 'promote_trigger_file|pg_ctl\s+promote|pg_promote|/var/lib/postgresql/data/promote|PRIMARY_HOST|postgres:.*alpine'

Repository: poyrazK/thecloud

Length of output: 6674


🏁 Script executed:

sed -n '530,540p' internal/core/services/database.go

Repository: poyrazK/thecloud

Length of output: 400


🏁 Script executed:

sed -n '510,535p' internal/core/services/database.go

Repository: poyrazK/thecloud

Length of output: 1042


Replace the PostgreSQL promotion method with pg_ctl promote.

The touch /var/lib/postgresql/data/promote command will not promote the replica without explicit promote_trigger_file configuration in PostgreSQL. No such configuration exists in this repository, and the official postgres:16-alpine image does not enable it by default. This can silently fail to promote the replica.

Recommended fix
 	case domain.EnginePostgres:
-		// Create promotion trigger file - PostgreSQL monitors this file
-		// and exits recovery mode when it appears
-		cmd = []string{"touch", "/var/lib/postgresql/data/promote"}
+		cmd = []string{"pg_ctl", "promote", "-D", "/var/lib/postgresql/data", "-w"}

Note: The MySQL promotion method (lines 517-524) using STOP REPLICA; RESET REPLICA ALL; is correct and requires no changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database.go` around lines 510 - 517, The PostgreSQL
promotion branch using cmd = []string{"touch",
"/var/lib/postgresql/data/promote"} is unreliable; update the db promotion logic
(the switch handling db.Engine / case domain.EnginePostgres) to execute "pg_ctl
promote" instead of touching a file so the standby is explicitly promoted;
ensure the command is run as the postgres user or with the same environment used
by the containerized postgres process and that the exec path references pg_ctl
available in the postgres image.

Comment on lines 753 to 763
if idempotencyKey != "" {
s.rotationMu.Lock()
if s.rotationCache[idempotencyKey] {
s.rotationMu.Unlock()
return nil // Already rotated
if cachedAt, ok := s.rotationCache[idempotencyKey]; ok {
if time.Since(cachedAt) < s.rotationCacheTTL {
s.rotationMu.Unlock()
return nil // Already rotated
}
// Expired entry - remove it to prevent unbounded map growth
delete(s.rotationCache, idempotencyKey)
}
s.rotationMu.Unlock()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reserve idempotency keys while rotation is in flight.

The key is only cached after a successful rotation, so concurrent requests with the same idempotency key can both miss the cache and rotate the password twice.

Consider storing an in-flight entry under the lock and making duplicate callers wait for the first result, or using an equivalent single-flight pattern. On failure, remove the in-flight entry so the caller can retry.

Also applies to: 855-858

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/services/database.go` around lines 753 - 763, The current
idempotency handling stores the key only after a successful rotation, allowing
concurrent callers to both miss the cache and perform duplicate rotations;
modify the code around the idempotencyKey handling (use of s.rotationMu,
s.rotationCache, s.rotationCacheTTL) to insert an "in-flight" entry under the
lock before starting rotation: store a struct or sentinel that includes a wait
channel or condition variable and a state (in-flight), unlock and perform the
rotation, then under the lock set the final cachedAt on success and close/signal
the waiting channel so duplicate callers return immediately; on failure remove
the in-flight entry and close/signal with an error so callers can retry. Ensure
duplicate callers detect the in-flight marker, wait on it while holding no
locks, then observe the final result (cached timestamp or absence) to decide
whether to proceed or return.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants