
feat(rds): implement persistent volume integration for managed databases#76

Merged
poyrazK merged 9 commits into main from feature/rds-persistence-v2 on Feb 28, 2026

Conversation

@poyrazK poyrazK (Owner) commented Feb 28, 2026

This PR implements data persistence for the Managed Database Service by integrating it with the VolumeService.

Changes:

  • internal/core/services/database.go: Added logic to provision a 10GB persistent block volume for every new database (primary and replica). The volumes are mounted to /var/lib/postgresql/data (Postgres) or /var/lib/mysql (MySQL).
  • internal/api/setup/dependencies.go: Injected VolumeService into the DatabaseService constructor.
  • tests/database_persistence_e2e_test.go: Added a new E2E test suite to verify that volumes are correctly provisioned during DB creation and cleaned up during DB deletion.
  • Unit/Integration Tests: Updated existing tests to handle the new dependency and verify the persistent volume lifecycle.
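The provisioning flow described above can be sketched roughly as follows. This is an illustrative sketch only: the `VolumeService` interface, the fake implementation, and the helper names are stand-ins, not the project's actual types.

```go
package main

import "fmt"

// VolumeService is an illustrative stand-in for the real project's interface.
type VolumeService interface {
	CreateVolume(name string, sizeGB int) (string, error)
}

// fakeVolumes simulates a volume backend for demonstration purposes.
type fakeVolumes struct{}

func (fakeVolumes) CreateVolume(name string, sizeGB int) (string, error) {
	return fmt.Sprintf("%s (%dGB)", name, sizeGB), nil
}

// mountPathFor returns the engine-specific data directory described in the PR.
func mountPathFor(engine string) string {
	if engine == "mysql" {
		return "/var/lib/mysql"
	}
	return "/var/lib/postgresql/data"
}

// provision creates a 10GB volume for the database and builds the bind spec
// mounting it at the engine's data directory.
func provision(vs VolumeService, dbID, engine string) (string, error) {
	volName := fmt.Sprintf("db-vol-%s", dbID[:8])
	vol, err := vs.CreateVolume(volName, 10) // 10GB per database
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s:%s", vol, mountPathFor(engine)), nil
}

func main() {
	bind, _ := provision(fakeVolumes{}, "0123456789abcdef", "postgres")
	fmt.Println(bind) // db-vol-01234567 (10GB):/var/lib/postgresql/data
}
```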

Summary by CodeRabbit

  • New Features
    • Databases and read replicas now get automatically provisioned persistent block volumes (10GB) and have them mounted for durable storage.
  • Bug Fixes
    • Volumes created during provisioning are cleaned up on failures and removed when databases/replicas are deleted to avoid orphaned storage.
  • Tests
    • Added unit and end-to-end tests covering volume provisioning, attachment, and cleanup.
  • Documentation
    • Docs and design ADR updated to describe persistence behavior, mounts, and lifecycle.

Copilot AI review requested due to automatic review settings February 28, 2026 13:43
@coderabbitai coderabbitai bot commented Feb 28, 2026

📝 Walkthrough

Injects a VolumeService into the DatabaseService, provisions and binds persistent volumes for databases and replicas, cleans up volumes on failures and deletions, and extends unit and end-to-end tests to cover the volume lifecycle.

Changes

Dependency Injection Setup (internal/api/setup/dependencies.go):
Add VolumeSvc to DatabaseServiceParams and pass volumeSvc when constructing the database service.

Core Database Service Implementation (internal/core/services/database.go):
DatabaseService gains a volumeSvc dependency; creates persistent volumes for primary and replica databases, includes volume binds when launching containers, and deletes associated volumes on database deletion or on failed creation flows.

Unit Tests (internal/core/services/database_test.go, internal/core/services/database_unit_test.go):
Wire real/noop and mock VolumeService instances into service tests; add Create/List/Delete volume calls and adjust expectations to verify the volume lifecycle in unit tests.

End-to-end Tests (tests/database_persistence_e2e_test.go):
Add an E2E test that creates databases (Postgres/MySQL), verifies volume provisioning by prefix, and ensures volumes are removed after database deletion, with polling.

Docs & ADR (docs/FEATURES.md, docs/database.md, docs/adr/ADR-015-database-data-persistence.md):
Document persistent volume behavior: automatic 10GB volume provisioning, mount paths per engine, replica handling, automated cleanup, and design rationale/limitations.
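The dependency-injection change can be sketched as below. The struct fields and constructor signature here are hypothetical simplifications of the project's wiring, not the actual code.

```go
package main

import "fmt"

// VolumeService is a hypothetical stand-in for the injected dependency.
type VolumeService struct{ name string }

// DatabaseServiceParams carries constructor dependencies; VolumeSvc is the
// newly added field described in the change summary.
type DatabaseServiceParams struct {
	VolumeSvc *VolumeService
}

// DatabaseService holds the injected volume service for later use.
type DatabaseService struct {
	volumeSvc *VolumeService
}

func NewDatabaseService(p DatabaseServiceParams) *DatabaseService {
	return &DatabaseService{volumeSvc: p.VolumeSvc}
}

func main() {
	volumeSvc := &VolumeService{name: "volumes"}
	dbSvc := NewDatabaseService(DatabaseServiceParams{VolumeSvc: volumeSvc})
	fmt.Println(dbSvc.volumeSvc.name)
}
```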

Sequence Diagram(s)

sequenceDiagram
    Client->>API: POST /databases
    API->>DatabaseService: CreateDatabase(req)
    DatabaseService->>VolumeSvc: CreateVolume(db-id, 10GB, params)
    VolumeSvc->>VolumeRepo: PersistVolume(...)
    VolumeRepo-->>VolumeSvc: volume
    VolumeSvc-->>DatabaseService: volume
    DatabaseService->>DatabaseRepo: PersistDatabase(with volume)
    DatabaseRepo-->>DatabaseService: database
    DatabaseService-->>API: 201 Created

    Client->>API: DELETE /databases/{id}
    API->>DatabaseService: DeleteDatabase(id)
    DatabaseService->>VolumeSvc: ListVolumes(filter db-id)
    VolumeSvc->>VolumeRepo: QueryVolumes(...)
    VolumeRepo-->>VolumeSvc: volumes
    VolumeSvc-->>DatabaseService: volumes
    DatabaseService->>VolumeSvc: DeleteVolume(volume-id) [*for each*]
    VolumeSvc->>VolumeRepo: RemoveVolume(...)
    VolumeRepo-->>VolumeSvc: removed
    DatabaseService->>DatabaseRepo: DeleteDatabase(id)
    DatabaseRepo-->>DatabaseService: deleted
    DatabaseService-->>API: 200 OK

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I dug a little tunnel near the code,
Planted volumes so data can grow.
I bound and cleaned with nimble paws,
Hopped when tests cheered out applause. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning: Docstring coverage is 28.57%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
Title Check ✅ Passed: The title clearly and specifically summarizes the main change: implementing persistent volume integration for the managed database service.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
internal/core/services/database.go (1)

379-395: Volume lookup by prefix scan is acceptable but consider tracking volume ID.

The current approach iterates all volumes to find the matching one by name prefix. While this works, it could become inefficient at scale. Consider storing the volume ID in the database record for direct lookup in a future iteration.

Additionally, the break on line 392 means only the first matching volume is deleted. If there's any chance of multiple volumes matching (unlikely with current naming), this could leave orphaned volumes.

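The direct-lookup cleanup suggested in this nitpick can be sketched as follows. The `Database` field, the map-backed store, and the function names are illustrative, assuming a `volume_id` column is added to the database record.

```go
package main

import "fmt"

// Database is an illustrative record with the suggested stored volume ID.
type Database struct {
	ID       string
	VolumeID string // persisted at creation time for direct lookup
}

// volumeStore simulates the volume service: volume ID -> volume name.
type volumeStore map[string]string

func (s volumeStore) DeleteVolume(id string) error {
	if _, ok := s[id]; !ok {
		return fmt.Errorf("volume %s not found", id)
	}
	delete(s, id)
	return nil
}

// deleteDatabase removes the volume by its stored ID, with no prefix scan.
func deleteDatabase(db Database, vols volumeStore) error {
	return vols.DeleteVolume(db.VolumeID)
}

func main() {
	vols := volumeStore{"vol-1": "db-vol-01234567"}
	db := Database{ID: "db-1", VolumeID: "vol-1"}
	if err := deleteDatabase(db, vols); err != nil {
		panic(err)
	}
	fmt.Println(len(vols)) // volume removed
}
```

Storing the ID makes deletion O(1) and immune to prefix collisions, at the cost of a schema change.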
tests/database_persistence_e2e_test.go (1)

79-80: Fixed sleep may cause flaky tests in CI.

A 1-second sleep might be insufficient under load or excessive in fast environments. Consider polling with a timeout instead for more reliable async cleanup verification.

Example polling approach
// Poll for volume cleanup with timeout
deadline := time.Now().Add(5 * time.Second)
found := true
for time.Now().Before(deadline) {
    respVols := getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
    // ... decode respVols and set found based on whether the volume is still listed ...
    if !found {
        break
    }
    time.Sleep(200 * time.Millisecond)
}
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/core/services/database.go`:
- Around line 167-172: Replace the magic literal 10 when creating the replica
volume with the shared volume-size constant used elsewhere (the same one used
for the primary DB volume) instead of a hardcoded number; update the call to
s.volumeSvc.CreateVolume(ctx, volName, <sharedVolumeSizeConstant>) so volName
and the CreateVolume invocation remain, but use the existing constant (e.g.,
DefaultDBVolumeSizeGB or the project’s primary volume-size constant) for the
replica size.
- Around line 80-85: Replace the magic number 10 used when creating the DB
persistent volume with a package-level named constant; add a constant (e.g.
DefaultDatabaseVolumeGB = 10) at the top of the package and update the call to
s.volumeSvc.CreateVolume(ctx, volName, 10) to use DefaultDatabaseVolumeGB
instead, referencing the existing symbols volName, s.volumeSvc.CreateVolume and
db.ID to locate the change.

In `@tests/database_persistence_e2e_test.go`:
- Line 100: The assertion uses fmt.Sprintf unnecessarily; update the call to
assert.False to pass the format string and argument directly instead of wrapping
them in fmt.Sprintf. Locate the assert.False invocation (the one checking found
with the message "Volume with prefix %s should have been deleted") and change it
to assert.False(t, found, "Volume with prefix %s should have been deleted",
expectedPrefix) so the test framework handles formatting.
- Line 70: Replace the call that builds the assertion message with fmt.Sprintf
by passing the format string and argument directly to assert.True; specifically,
update the assert.True(t, found, fmt.Sprintf("Expected volume with prefix %s not
found", expectedPrefix)) call to assert.True(t, found, "Expected volume with
prefix %s not found", expectedPrefix) so the message is passed as the
assertion's format string (affecting the assert.True usage with found and
expectedPrefix).


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37ad4c0 and 820da73.

📒 Files selected for processing (5)
  • internal/api/setup/dependencies.go
  • internal/core/services/database.go
  • internal/core/services/database_test.go
  • internal/core/services/database_unit_test.go
  • tests/database_persistence_e2e_test.go

Comment on lines +80 to +85
	// Create persistent volume for the database
	volName := fmt.Sprintf("db-vol-%s", db.ID.String()[:8])
	vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
	if err != nil {
		return nil, errors.Wrap(errors.Internal, "failed to create persistent volume", err)
	}

⚠️ Potential issue | 🟡 Minor

Use a named constant for the default volume size.

The literal 10 for the volume size in GB should be a named constant for clarity and maintainability.

As per coding guidelines: "Do not use magic numbers - use named constants instead."

Proposed fix

Add a constant at package level:

const (
    // DefaultDatabaseVolumeGB is the default size in GB for database persistent volumes.
    DefaultDatabaseVolumeGB = 10
)

Then update the calls:

-	vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
+	vol, err := s.volumeSvc.CreateVolume(ctx, volName, DefaultDatabaseVolumeGB)

Comment on lines +167 to +172
	// Create persistent volume for the replica
	volName := fmt.Sprintf("db-replica-vol-%s", db.ID.String()[:8])
	vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
	if err != nil {
		return nil, errors.Wrap(errors.Internal, "failed to create persistent volume for replica", err)
	}

⚠️ Potential issue | 🟡 Minor

Same magic number issue for replica volume size.

Apply the same constant here for consistency.

Proposed fix
-	vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
+	vol, err := s.volumeSvc.CreateVolume(ctx, volName, DefaultDatabaseVolumeGB)

Copilot AI left a comment

Pull request overview

Adds persistent storage for managed databases by provisioning a VolumeService-backed volume per database/replica and mounting it into the DB runtime, plus updates wiring and tests to validate provisioning and cleanup.

Changes:

  • Provision a 10GB volume during DB/replica creation and pass it via VolumeBinds to the compute backend.
  • Inject VolumeService into DatabaseService initialization.
  • Add/adjust E2E + unit/integration tests around volume lifecycle.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
internal/core/services/database.go Creates per-DB volumes, mounts them on container start, and attempts volume cleanup on delete.
internal/api/setup/dependencies.go Wires VolumeService into DatabaseService constructor params.
internal/core/services/database_unit_test.go Updates unit tests to mock volume provisioning/deletion behavior.
internal/core/services/database_test.go Updates integration test setup to provide a real VolumeService dependency.
tests/database_persistence_e2e_test.go Adds E2E coverage asserting volume creation and cleanup around DB lifecycle.


Comment on lines 375 to +379

	s.logger.Warn("failed to remove database container", "container_id", db.ContainerID, "error", err)
	}
}

-	// 2. Delete from repo
+	// 2. Clean up volume
Copilot AI Feb 28, 2026

If compute.DeleteInstance fails, the method still proceeds to volume deletion. That can delete the underlying storage while the DB container is potentially still running, risking data loss/corruption. Consider returning an error (or at least skipping volume deletion) when container removal fails, and/or ensure the instance is stopped/removed before decommissioning the volume.

	}

	backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
	if vol.BackendPath != "" {
Copilot AI Feb 28, 2026

vol.BackendPath is being used as the Docker bind source. With the current LVM storage backend, BackendPath is a block device path like /dev/<vg>/<lv> (not a directory), and Docker bind-mounting that onto /var/lib/postgresql/data will fail. Consider using the backend name (e.g. thecloud-vol-<id> / a named volume) when compute.Type()=="docker", or ensure the storage backend returns a mounted directory path suitable for Docker binds.

Suggested change
-	if vol.BackendPath != "" {
+	if s.compute.Type() != "docker" && vol.BackendPath != "" {

	}

	backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
	if vol.BackendPath != "" {
Copilot AI Feb 28, 2026

Same issue as primary creation: using vol.BackendPath as the bind source will break Docker when the storage backend returns a block device path (e.g. LVM /dev/<vg>/<lv>). Prefer a Docker-compatible source (named volume or mounted directory) or branch on compute.Type()/storage backend type.

Suggested change
-	if vol.BackendPath != "" {
+	if vol.BackendPath != "" && !strings.HasPrefix(vol.BackendPath, "/dev/") {

Comment on lines +380 to +395
	// We try to find the volume by the expected name
	vols, err := s.volumeSvc.ListVolumes(ctx)
	if err == nil {
		expectedPrefix := fmt.Sprintf("db-vol-%s", db.ID.String()[:8])
		if db.Role == domain.RoleReplica {
			expectedPrefix = fmt.Sprintf("db-replica-vol-%s", db.ID.String()[:8])
		}
		for _, v := range vols {
			if strings.HasPrefix(v.Name, expectedPrefix) {
				if err := s.volumeSvc.DeleteVolume(ctx, v.ID.String()); err != nil {
					s.logger.Warn("failed to delete database volume", "volume_id", v.ID, "error", err)
				}
				break
			}
		}
	}
Copilot AI Feb 28, 2026

Volume cleanup currently does a full ListVolumes() and deletes the first entry with a matching prefix. This is O(n) per delete and can delete the wrong volume if any other volume happens to share the prefix. Since the expected volume name is deterministic, delete by exact name via volumeSvc.DeleteVolume(ctx, expectedName) (or GetVolume by name) and log/handle the ListVolumes error path explicitly.

Suggested change
-	// We try to find the volume by the expected name
-	vols, err := s.volumeSvc.ListVolumes(ctx)
-	if err == nil {
-		expectedPrefix := fmt.Sprintf("db-vol-%s", db.ID.String()[:8])
-		if db.Role == domain.RoleReplica {
-			expectedPrefix = fmt.Sprintf("db-replica-vol-%s", db.ID.String()[:8])
-		}
-		for _, v := range vols {
-			if strings.HasPrefix(v.Name, expectedPrefix) {
-				if err := s.volumeSvc.DeleteVolume(ctx, v.ID.String()); err != nil {
-					s.logger.Warn("failed to delete database volume", "volume_id", v.ID, "error", err)
-				}
-				break
-			}
-		}
-	}
+	// Delete the expected volume directly by deterministic name instead of scanning all volumes
+	expectedName := fmt.Sprintf("db-vol-%s", db.ID.String())
+	if db.Role == domain.RoleReplica {
+		expectedName = fmt.Sprintf("db-replica-vol-%s", db.ID.String())
+	}
+	if err := s.volumeSvc.DeleteVolume(ctx, expectedName); err != nil {
+		s.logger.Warn("failed to delete database volume", "volume_name", expectedName, "error", err)
+	}

Comment on lines +79 to +100
	// Wait a bit for async cleanup if any
	time.Sleep(1 * time.Second)

	// Verify Volume is gone
	respVols := getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
	defer func() { _ = respVols.Body.Close() }()
	require.Equal(t, http.StatusOK, respVols.StatusCode)

	var volsRes struct {
		Data []domain.Volume `json:"data"`
	}
	require.NoError(t, json.NewDecoder(respVols.Body).Decode(&volsRes))

	found := false
	expectedPrefix := fmt.Sprintf("db-vol-%s", dbID[:8])
	for _, v := range volsRes.Data {
		if strings.HasPrefix(v.Name, expectedPrefix) {
			found = true
			break
		}
	}
	assert.False(t, found, fmt.Sprintf("Volume with prefix %s should have been deleted", expectedPrefix))
Copilot AI Feb 28, 2026

The fixed time.Sleep(1 * time.Second) makes this E2E test both slower and potentially flaky if volume cleanup takes longer than expected. Prefer polling /volumes until the expected volume disappears (with a reasonable timeout), similar to other E2E tests that retry for eventual consistency.

Suggested change
-	// Wait a bit for async cleanup if any
-	time.Sleep(1 * time.Second)
-	// Verify Volume is gone
-	respVols := getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
-	defer func() { _ = respVols.Body.Close() }()
-	require.Equal(t, http.StatusOK, respVols.StatusCode)
-	var volsRes struct {
-		Data []domain.Volume `json:"data"`
-	}
-	require.NoError(t, json.NewDecoder(respVols.Body).Decode(&volsRes))
-	found := false
-	expectedPrefix := fmt.Sprintf("db-vol-%s", dbID[:8])
-	for _, v := range volsRes.Data {
-		if strings.HasPrefix(v.Name, expectedPrefix) {
-			found = true
-			break
-		}
-	}
-	assert.False(t, found, fmt.Sprintf("Volume with prefix %s should have been deleted", expectedPrefix))
+	// Verify Volume is eventually gone (async cleanup)
+	expectedPrefix := fmt.Sprintf("db-vol-%s", dbID[:8])
+	deadline := time.Now().Add(30 * time.Second)
+	for {
+		respVols := getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
+		found := false
+		func() {
+			defer func() { _ = respVols.Body.Close() }()
+			require.Equal(t, http.StatusOK, respVols.StatusCode)
+			var volsRes struct {
+				Data []domain.Volume `json:"data"`
+			}
+			require.NoError(t, json.NewDecoder(respVols.Body).Decode(&volsRes))
+			for _, v := range volsRes.Data {
+				if strings.HasPrefix(v.Name, expectedPrefix) {
+					found = true
+					break
+				}
+			}
+		}()
+		if !found {
+			// Volume has been cleaned up
+			break
+		}
+		if time.Now().After(deadline) {
+			t.Fatalf("Volume with prefix %s should have been deleted within timeout", expectedPrefix)
+		}
+		time.Sleep(500 * time.Millisecond)
+	}

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/database_persistence_e2e_test.go`:
- Line 31: Replace the magic number modulus in the dbName generation with a
named constant (e.g., const dbSuffixMod = 1000 or better const dbPrefixLen = 8
if switching to hex/UUID) and ensure any slicing of dbID (dbID[:8]) is guarded
by computing min(len(dbID), dbPrefixLen) or by using a helper GetPrefix(dbID
string, n int) to safely return the first n characters; update all occurrences
(dbName creation and the dbID[:8] uses) to use the constant/helper so you avoid
magic numbers and prevent panics if ID format changes.
- Around line 79-85: Replace the fixed time.Sleep with bounded polling that
repeatedly calls getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
(the same logic around respVols and respVols.Body.Close()) until the expected
cleanup condition is observed or a timeout elapses; implement this either using
require.Eventually or a manual loop with time.Ticker and time.After (e.g., poll
every 100–200ms up to a 5s timeout), and then assert the final HTTP status/body
indicates the Volume is gone. Apply the same change to the second occurrence
handling the later cleanup check.
- Line 41: The defer statements currently discarding Close() errors (e.g., defer
func() { _ = resp.Body.Close() }()) should be replaced with deferred closures
that check the error and report it to the test harness instead of ignoring it;
locate the places where the http response variable resp is closed (the deferred
Close calls around resp) and change each to something that captures err :=
resp.Body.Close() and calls t.Fatalf/t.Fatalf-like assertion (or
require.NoError(t, err)) so test failures are surfaced rather than swallowed.
- Around line 34-39: The test CreateDatabase_ProvisionsVolume should be
converted to a table-driven subtest: replace the hard-coded payload using
"engine":"postgres" with a slice of test cases (e.g., cases with engine
"postgres"/version "16" and engine "mysql"/appropriate version) and iterate over
them, running t.Run for each case (use a descriptive name like
fmt.Sprintf("CreateDatabase_ProvisionsVolume/%s", tc.engine)); inside the loop
build the payload map from tc.engine and tc.version, execute the existing
assertions for provisioning and volume lifecycle, and ensure any cleanup runs
per-case so both Postgres and MySQL paths are validated.
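The table-driven restructuring suggested above can be sketched as follows. The case struct, engine/version pairs, and payload shape are illustrative; in the real test each iteration would run under t.Run with the project's request helpers.

```go
package main

import "fmt"

// engineCase is an illustrative test-case row for the table-driven subtest.
type engineCase struct {
	engine  string
	version string
}

// buildPayload constructs the per-case database creation payload.
func buildPayload(tc engineCase) map[string]string {
	return map[string]string{"engine": tc.engine, "version": tc.version}
}

func main() {
	cases := []engineCase{
		{engine: "postgres", version: "16"},
		{engine: "mysql", version: "8"},
	}
	for _, tc := range cases {
		// In the real test: t.Run(fmt.Sprintf("Engine/%s", tc.engine), func(t *testing.T) { ... })
		payload := buildPayload(tc)
		fmt.Printf("Engine/%s -> engine=%s version=%s\n",
			tc.engine, payload["engine"], payload["version"])
	}
}
```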

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 820da73 and 09084b6.

📒 Files selected for processing (1)
  • tests/database_persistence_e2e_test.go

Copilot AI review requested due to automatic review settings February 28, 2026 14:05
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.



Comment on lines +60 to +63
	t.Run(fmt.Sprintf("Engine/%s", tc.engine), func(t *testing.T) {
		var dbID string
		dbName := fmt.Sprintf("e2e-persistent-%s-%d", tc.engine, time.Now().UnixNano()%dbSuffixMod)

Copilot AI Feb 28, 2026

This E2E test only deletes the database in a later subtest; if the create subtest fails early (e.g., require aborts before dbID is set), the database/volume can be left behind and interfere with subsequent runs. Consider registering a t.Cleanup (once dbID is known) to best-effort delete the DB/volume even when assertions fail.
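The best-effort cleanup this comment suggests can be sketched like so. In the real test the registration would be t.Cleanup once dbID is known; here a deferred call stands in for it, and all names are illustrative.

```go
package main

import "fmt"

// runSubtest registers a deletion that runs even when later assertions
// abort the subtest early. Real test: t.Cleanup(func() { deleteDB(dbID) }).
func runSubtest(deleteDB func(id string)) {
	dbID := "db-123"
	defer deleteDB(dbID) // cleanup runs regardless of test outcome
	// ... subtest body: create DB, assert on volumes; may return early ...
}

func main() {
	var deleted []string
	runSubtest(func(id string) { deleted = append(deleted, id) })
	fmt.Println(deleted)
}
```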

Comment on lines +87 to +97
	backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
	if vol.BackendPath != "" {
		backendVolName = vol.BackendPath
	}

	mountPath := "/var/lib/postgresql/data"
	if dbEngine == domain.EngineMySQL {
		mountPath = "/var/lib/mysql"
	}
	volumeBinds := []string{fmt.Sprintf("%s:%s", backendVolName, mountPath)}

Copilot AI Feb 28, 2026

backendVolName is used as the source for Docker HostConfig.Binds, but vol.BackendPath can be a block-device path (e.g. LVM returns /dev/<vg>/<lv>), which Docker bind-mount cannot mount onto a data directory like /var/lib/postgresql/data. Additionally, the current noop storage backend returns a constant path, which would cause multiple DBs to share the same underlying mount. Consider aligning the storage backend contract with the Docker compute backend (e.g., return a unique host directory or Docker volume name for Docker, and only return /dev/... paths for hypervisors; or introduce a dedicated Docker storage backend / device-mount support).

Comment on lines +80 to 107
	// Create persistent volume for the database
	volName := fmt.Sprintf("db-vol-%s", db.ID.String()[:8])
	vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
	if err != nil {
		return nil, errors.Wrap(errors.Internal, "failed to create persistent volume", err)
	}

	backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
	if vol.BackendPath != "" {
		backendVolName = vol.BackendPath
	}

	mountPath := "/var/lib/postgresql/data"
	if dbEngine == domain.EngineMySQL {
		mountPath = "/var/lib/mysql"
	}
	volumeBinds := []string{fmt.Sprintf("%s:%s", backendVolName, mountPath)}

	dockerName := fmt.Sprintf("cloud-db-%s-%s", name, db.ID.String()[:8])
	containerID, allocatedPorts, err := s.compute.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
		Name:      dockerName,
		ImageName: imageName,
		Ports:     []string{"0:" + defaultPort},
		NetworkID: networkID,
-		VolumeBinds: nil,
+		VolumeBinds: volumeBinds,
		Env: env,
		Cmd: nil,
	})
Copilot AI Feb 28, 2026

A persistent volume is created and mounted, but the volume record is never marked as IN-USE / associated with the running database container. As a result, users can delete the volume via the Volume API while the DB is running (since DeleteVolume only blocks IN-USE), risking data loss. After a successful launch, consider updating the volume metadata (status + attachment info) and clearing it on DB delete, likely via a VolumeService method rather than leaving it AVAILABLE.
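The attachment bookkeeping this comment proposes can be sketched minimally. The status strings, struct fields, and function names are illustrative, not the project's actual volume API.

```go
package main

import "fmt"

// Volume is an illustrative record with a lifecycle status.
type Volume struct {
	ID     string
	Status string // "available" or "in-use"
}

// attach marks the volume IN-USE after a successful container launch so the
// Volume API refuses to delete it while the database is running.
func attach(v *Volume, containerID string) {
	v.Status = "in-use"
}

// deleteVolume mirrors the described DeleteVolume guard: only IN-USE blocks.
func deleteVolume(v *Volume) error {
	if v.Status == "in-use" {
		return fmt.Errorf("volume %s is attached; detach first", v.ID)
	}
	return nil
}

func main() {
	v := &Volume{ID: "vol-1", Status: "available"}
	attach(v, "ctr-1")
	fmt.Println(deleteVolume(v) != nil) // true: delete blocked while attached
}
```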

Comment on lines 108 to 111
if err != nil {
_ = s.volumeSvc.DeleteVolume(ctx, vol.ID.String())
s.logger.Error("failed to launch database container", "error", err)
return nil, errors.Wrap(errors.Internal, "failed to launch database container", err)

Copilot AI Feb 28, 2026


On container launch failure, the rollback deletes the volume but ignores any error. If deletion fails, the DB create request returns an error while leaving an orphaned volume behind with no logging, which will be hard to diagnose operationally. Consider logging a warning (or returning a combined error) when volume cleanup fails in rollback paths.

Comment on lines +174 to +183
backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
if vol.BackendPath != "" {
backendVolName = vol.BackendPath
}

mountPath := "/var/lib/postgresql/data"
if db.Engine == domain.EngineMySQL {
mountPath = "/var/lib/mysql"
}
volumeBinds := []string{fmt.Sprintf("%s:%s", backendVolName, mountPath)}

Copilot AI Feb 28, 2026


Replica volume binding has the same backend-path mismatch risk as primary: vol.BackendPath may be a block device path (LVM) which Docker cannot bind-mount to a directory, and with noop storage it may not be unique. This can prevent replica containers from starting or cause replicas to share storage unexpectedly. Consider making the bind source a Docker-compatible mountpoint/volume name for the Docker backend (or adding device-mount support).

Suggested change
-backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
-if vol.BackendPath != "" {
-    backendVolName = vol.BackendPath
-}
-mountPath := "/var/lib/postgresql/data"
-if db.Engine == domain.EngineMySQL {
-    mountPath = "/var/lib/mysql"
-}
-volumeBinds := []string{fmt.Sprintf("%s:%s", backendVolName, mountPath)}
+// Use a Docker-compatible named volume instead of the raw backend path to
+// avoid binding block devices or non-unique backend paths.
+dockerVolumeName := "thecloud-vol-" + vol.ID.String()
+mountPath := "/var/lib/postgresql/data"
+if db.Engine == domain.EngineMySQL {
+    mountPath = "/var/lib/mysql"
+}
+volumeBinds := []string{fmt.Sprintf("%s:%s", dockerVolumeName, mountPath)}

Comment on lines +43 to +46
volRepo := postgres.NewVolumeRepository(db)
storage := noop.NewNoopStorageBackend()
volumeSvc := services.NewVolumeService(volRepo, storage, eventSvc, auditSvc, slog.Default())


Copilot AI Feb 28, 2026


NoopStorageBackend.CreateVolume returns a constant backend path ("vol-1"), so wiring DatabaseService tests to NewNoopStorageBackend() means every DB volume bind will point at the same underlying Docker volume name. This can cause cross-test contamination and make DB lifecycle tests flaky/incorrect. Consider using a storage backend stub that returns a unique backend name per call (e.g., based on the requested name), or adjusting the noop backend to return the provided backend name.

for _, tc := range testCases {
t.Run(fmt.Sprintf("Engine/%s", tc.engine), func(t *testing.T) {
var dbID string
dbName := fmt.Sprintf("e2e-persistent-%s-%d", tc.engine, time.Now().UnixNano()%dbSuffixMod)

Copilot AI Feb 28, 2026


dbName uses time.Now().UnixNano()%dbSuffixMod with dbSuffixMod=1000, which increases the chance of collisions across repeated/parallel CI runs (only 1000 possible suffixes). Consider using the full UnixNano() value or a UUID-based suffix so test resources are reliably unique.

Copilot AI review requested due to automatic review settings February 28, 2026 14:30

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/database_persistence_e2e_test.go (1)

40-141: Consider adding replica volume lifecycle coverage.

The test validates primary database volume provisioning and cleanup effectively. However, the PR objectives mention that replicas also utilize persistent volumes. Consider extending the test to verify that:

  1. Creating a replica provisions a volume with the db-replica-vol- prefix
  2. Deleting a replica cleans up its associated volume

This would ensure complete E2E coverage of the persistence feature for both primary and replica databases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/database_persistence_e2e_test.go` around lines 40 - 141, Test lacks
replica volume lifecycle checks; extend TestDatabasePersistenceE2E to create a
replica for the created database, assert a volume exists with the
"db-replica-vol-" prefix, then delete that replica and poll until the replica
volume is removed. Use the same helpers used for primaries (postRequest,
getRequest, deleteRequest) to call the replica creation and deletion endpoints,
derive expectedPrefix with fmt.Sprintf("db-replica-vol-%s",
safePrefix(replicaID, dbPrefixLen)), and reuse the existing volume-list
assertions and require.Eventually polling logic to verify provisioning and
cleanup. Ensure you capture and store the replica ID (similar to dbID) from the
replica creation response and defer closeBody on all responses.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09084b6 and 714a2f7.

📒 Files selected for processing (4)
  • docs/FEATURES.md
  • docs/adr/ADR-015-database-data-persistence.md
  • docs/database.md
  • tests/database_persistence_e2e_test.go
✅ Files skipped from review due to trivial changes (1)
  • docs/adr/ADR-015-database-data-persistence.md


Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.



Comment on lines 43 to 53
// NewDatabaseService constructs a DatabaseService with its dependencies.
func NewDatabaseService(params DatabaseServiceParams) *DatabaseService {
	return &DatabaseService{
-		repo:     params.Repo,
-		compute:  params.Compute,
-		vpcRepo:  params.VpcRepo,
-		eventSvc: params.EventSvc,
-		auditSvc: params.AuditSvc,
-		logger:   params.Logger,
+		repo:      params.Repo,
+		compute:   params.Compute,
+		vpcRepo:   params.VpcRepo,
+		volumeSvc: params.VolumeSvc,
+		eventSvc:  params.EventSvc,
+		auditSvc:  params.AuditSvc,
+		logger:    params.Logger,
	}

Copilot AI Feb 28, 2026


NewDatabaseService accepts VolumeSvc but doesn’t validate it. If a caller constructs the service without VolumeSvc (zero value nil) and then calls CreateDatabase/DeleteDatabase, this will panic at runtime. Consider failing fast in the constructor (return error or panic with a clear message) or defaulting to a no-op implementation to make the dependency requirement explicit.

require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
dbID = res.Data.ID.String()
assert.NotEmpty(t, dbID)


Copilot AI Feb 28, 2026


In this E2E, the database cleanup only happens in the dedicated DeleteDatabase_CleansUpVolume subtest. If CreateDatabase_ProvisionsVolume fails after the API creates the DB but before dbID is set/usable (e.g., decode error), the DB/volume can be left behind and affect later runs. Consider registering a t.Cleanup (or defer) right after successfully decoding/setting dbID to best-effort delete the DB even when intermediate assertions fail.

Suggested change
t.Cleanup(func() {
if dbID == "" {
return
}
respDel := deleteRequest(t, client, testutil.TestBaseURL+"/databases/"+dbID, token)
closeBody(t, respDel)
})

Comment on lines +88 to +90
if vol.BackendPath != "" {
backendVolName = vol.BackendPath
}

Copilot AI Feb 28, 2026


backendVolName prefers vol.BackendPath when present. With the current NoopStorageBackend, CreateVolume always returns the constant path "vol-1", so every database would bind-mount the same underlying volume, causing cross-database data corruption in dev/test. Consider using the backend volume name you generated (thecloud-vol-<idprefix>) as the bind source (or make BackendPath unique per volume / return the backendName) so each DB gets an isolated mount source.

Suggested change
-if vol.BackendPath != "" {
-	backendVolName = vol.BackendPath
-}



2 participants