feat(rds): implement persistent volume integration for managed databases#76
Conversation
📝 Walkthrough

Injects a VolumeService into the DatabaseService, provisions and binds persistent volumes for databases and replicas, cleans up volumes on failures and deletions, and extends unit and end-to-end tests to cover the volume lifecycle.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    Client->>API: POST /databases
    API->>DatabaseService: CreateDatabase(req)
    DatabaseService->>VolumeSvc: CreateVolume(db-id, 10GB, params)
    VolumeSvc->>VolumeRepo: PersistVolume(...)
    VolumeRepo-->>VolumeSvc: volume
    VolumeSvc-->>DatabaseService: volume
    DatabaseService->>DatabaseRepo: PersistDatabase(with volume)
    DatabaseRepo-->>DatabaseService: database
    DatabaseService-->>API: 201 Created
    Client->>API: DELETE /databases/{id}
    API->>DatabaseService: DeleteDatabase(id)
    DatabaseService->>VolumeSvc: ListVolumes(filter db-id)
    VolumeSvc->>VolumeRepo: QueryVolumes(...)
    VolumeRepo-->>VolumeSvc: volumes
    VolumeSvc-->>DatabaseService: volumes
    DatabaseService->>VolumeSvc: DeleteVolume(volume-id) [*for each*]
    VolumeSvc->>VolumeRepo: RemoveVolume(...)
    VolumeRepo-->>VolumeSvc: removed
    DatabaseService->>DatabaseRepo: DeleteDatabase(id)
    DatabaseRepo-->>DatabaseService: deleted
    DatabaseService-->>API: 200 OK
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🧹 Nitpick comments (2)
internal/core/services/database.go (1)
379-395: Volume lookup by prefix scan is acceptable, but consider tracking the volume ID.

The current approach iterates over all volumes to find the matching one by name prefix. While this works, it could become inefficient at scale. Consider storing the volume ID in the database record for direct lookup in a future iteration.

Additionally, the `break` on line 392 means only the first matching volume is deleted. If there's any chance of multiple volumes matching (unlikely with current naming), this could leave orphaned volumes.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/core/services/database.go` around lines 379 - 395, The current cleanup scans all volumes via s.volumeSvc.ListVolumes and deletes only the first match (strings.HasPrefix check) which is inefficient and may leave orphaned volumes; update the code to persist the volume ID on database creation (add a volume_id field to the DB record) so future deletes can call s.volumeSvc.DeleteVolume(ctx, volumeID) directly using that stored ID, and in the short term remove the premature break in the loop in the cleanup block (the loop that calls s.volumeSvc.DeleteVolume for matching v.ID) so all matching volumes are deleted if a stored volume ID isn't yet available; refer to s.volumeSvc.ListVolumes, s.volumeSvc.DeleteVolume, db.ID and db.Role to locate the relevant code paths to change.

tests/database_persistence_e2e_test.go (1)
79-80: Fixed sleep may cause flaky tests in CI.

A 1-second sleep might be insufficient under load or excessive in fast environments. Consider polling with a timeout instead for more reliable async cleanup verification.

Example polling approach

```go
// Poll for volume cleanup with timeout
deadline := time.Now().Add(5 * time.Second)
for time.Now().Before(deadline) {
	respVols := getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
	// ... check if volume is gone ...
	if !found {
		break
	}
	time.Sleep(200 * time.Millisecond)
}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/database_persistence_e2e_test.go` around lines 79 - 80, Replace the fixed time.Sleep(1 * time.Second) in tests/database_persistence_e2e_test.go with a polling loop that repeatedly calls getRequest(t, client, testutil.TestBaseURL+"/volumes", token) until the desired cleanup condition (volume not found) is met or a timeout elapses; implement a short interval between polls (e.g., ~200ms) and a reasonable overall deadline (e.g., ~5s) so the test waits only as long as needed and fails if cleanup doesn’t happen within the timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/core/services/database.go`:
- Around line 167-172: Replace the magic literal 10 when creating the replica
volume with the shared volume-size constant used elsewhere (the same one used
for the primary DB volume) instead of a hardcoded number; update the call to
s.volumeSvc.CreateVolume(ctx, volName, <sharedVolumeSizeConstant>) so volName
and the CreateVolume invocation remain, but use the existing constant (e.g.,
DefaultDBVolumeSizeGB or the project’s primary volume-size constant) for the
replica size.
- Around line 80-85: Replace the magic number 10 used when creating the DB
persistent volume with a package-level named constant; add a constant (e.g.
DefaultDatabaseVolumeGB = 10) at the top of the package and update the call to
s.volumeSvc.CreateVolume(ctx, volName, 10) to use DefaultDatabaseVolumeGB
instead, referencing the existing symbols volName, s.volumeSvc.CreateVolume and
db.ID to locate the change.
In `@tests/database_persistence_e2e_test.go`:
- Line 100: The assertion uses fmt.Sprintf unnecessarily; update the call to
assert.False to pass the format string and argument directly instead of wrapping
them in fmt.Sprintf. Locate the assert.False invocation (the one checking found
with the message "Volume with prefix %s should have been deleted") and change it
to assert.False(t, found, "Volume with prefix %s should have been deleted",
expectedPrefix) so the test framework handles formatting.
- Line 70: Replace the call that builds the assertion message with fmt.Sprintf
by passing the format string and argument directly to assert.True; specifically,
update the assert.True(t, found, fmt.Sprintf("Expected volume with prefix %s not
found", expectedPrefix)) call to assert.True(t, found, "Expected volume with
prefix %s not found", expectedPrefix) so the message is passed as the
assertion's format string (affecting the assert.True usage with found and
expectedPrefix).
---
Nitpick comments:
In `@internal/core/services/database.go`:
- Around line 379-395: The current cleanup scans all volumes via
s.volumeSvc.ListVolumes and deletes only the first match (strings.HasPrefix
check) which is inefficient and may leave orphaned volumes; update the code to
persist the volume ID on database creation (add a volume_id field to the DB
record) so future deletes can call s.volumeSvc.DeleteVolume(ctx, volumeID)
directly using that stored ID, and in the short term remove the premature break
in the loop in the cleanup block (the loop that calls s.volumeSvc.DeleteVolume
for matching v.ID) so all matching volumes are deleted if a stored volume ID
isn't yet available; refer to s.volumeSvc.ListVolumes, s.volumeSvc.DeleteVolume,
db.ID and db.Role to locate the relevant code paths to change.
In `@tests/database_persistence_e2e_test.go`:
- Around line 79-80: Replace the fixed time.Sleep(1 * time.Second) in
tests/database_persistence_e2e_test.go with a polling loop that repeatedly calls
getRequest(t, client, testutil.TestBaseURL+"/volumes", token) until the desired
cleanup condition (volume not found) is met or a timeout elapses; implement a
short interval between polls (e.g., ~200ms) and a reasonable overall deadline
(e.g., ~5s) so the test waits only as long as needed and fails if cleanup
doesn’t happen within the timeout.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- internal/api/setup/dependencies.go
- internal/core/services/database.go
- internal/core/services/database_test.go
- internal/core/services/database_unit_test.go
- tests/database_persistence_e2e_test.go
```go
// Create persistent volume for the database
volName := fmt.Sprintf("db-vol-%s", db.ID.String()[:8])
vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
if err != nil {
	return nil, errors.Wrap(errors.Internal, "failed to create persistent volume", err)
}
```
Use a named constant for the default volume size.
The literal 10 for the volume size in GB should be a named constant for clarity and maintainability.
As per coding guidelines: "Do not use magic numbers - use named constants instead."
Proposed fix

Add a constant at package level:

```go
const (
	// DefaultDatabaseVolumeGB is the default size in GB for database persistent volumes.
	DefaultDatabaseVolumeGB = 10
)
```

Then update the calls:

```diff
- vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
+ vol, err := s.volumeSvc.CreateVolume(ctx, volName, DefaultDatabaseVolumeGB)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/services/database.go` around lines 80 - 85, Replace the magic
number 10 used when creating the DB persistent volume with a package-level named
constant; add a constant (e.g. DefaultDatabaseVolumeGB = 10) at the top of the
package and update the call to s.volumeSvc.CreateVolume(ctx, volName, 10) to use
DefaultDatabaseVolumeGB instead, referencing the existing symbols volName,
s.volumeSvc.CreateVolume and db.ID to locate the change.
```go
// Create persistent volume for the replica
volName := fmt.Sprintf("db-replica-vol-%s", db.ID.String()[:8])
vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
if err != nil {
	return nil, errors.Wrap(errors.Internal, "failed to create persistent volume for replica", err)
}
```
Same magic number issue for replica volume size.
Apply the same constant here for consistency.
Proposed fix

```diff
- vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
+ vol, err := s.volumeSvc.CreateVolume(ctx, volName, DefaultDatabaseVolumeGB)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Create persistent volume for the replica
volName := fmt.Sprintf("db-replica-vol-%s", db.ID.String()[:8])
vol, err := s.volumeSvc.CreateVolume(ctx, volName, DefaultDatabaseVolumeGB)
if err != nil {
	return nil, errors.Wrap(errors.Internal, "failed to create persistent volume for replica", err)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/services/database.go` around lines 167 - 172, Replace the magic
literal 10 when creating the replica volume with the shared volume-size constant
used elsewhere (the same one used for the primary DB volume) instead of a
hardcoded number; update the call to s.volumeSvc.CreateVolume(ctx, volName,
<sharedVolumeSizeConstant>) so volName and the CreateVolume invocation remain,
but use the existing constant (e.g., DefaultDBVolumeSizeGB or the project’s
primary volume-size constant) for the replica size.
Pull request overview
Adds persistent storage for managed databases by provisioning a VolumeService-backed volume per database/replica and mounting it into the DB runtime, plus updates wiring and tests to validate provisioning and cleanup.
Changes:
- Provision a 10GB volume during DB/replica creation and pass it via `VolumeBinds` to the compute backend.
- Inject `VolumeService` into `DatabaseService` initialization.
- Add/adjust E2E + unit/integration tests around the volume lifecycle.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/core/services/database.go | Creates per-DB volumes, mounts them on container start, and attempts volume cleanup on delete. |
| internal/api/setup/dependencies.go | Wires VolumeService into DatabaseService constructor params. |
| internal/core/services/database_unit_test.go | Updates unit tests to mock volume provisioning/deletion behavior. |
| internal/core/services/database_test.go | Updates integration test setup to provide a real VolumeService dependency. |
| tests/database_persistence_e2e_test.go | Adds E2E coverage asserting volume creation and cleanup around DB lifecycle. |
```diff
 		s.logger.Warn("failed to remove database container", "container_id", db.ContainerID, "error", err)
 	}
 }

-	// 2. Delete from repo
+	// 2. Clean up volume
```
If compute.DeleteInstance fails, the method still proceeds to volume deletion. That can delete the underlying storage while the DB container is potentially still running, risking data loss/corruption. Consider returning an error (or at least skipping volume deletion) when container removal fails, and/or ensure the instance is stopped/removed before decommissioning the volume.
```go
}

backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
if vol.BackendPath != "" {
```
vol.BackendPath is being used as the Docker bind source. With the current LVM storage backend, BackendPath is a block device path like /dev/<vg>/<lv> (not a directory), and Docker bind-mounting that onto /var/lib/postgresql/data will fail. Consider using the backend name (e.g. thecloud-vol-<id> / a named volume) when compute.Type()=="docker", or ensure the storage backend returns a mounted directory path suitable for Docker binds.
```diff
- if vol.BackendPath != "" {
+ if s.compute.Type() != "docker" && vol.BackendPath != "" {
```
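One possible shape for this branch, sketched as a standalone helper — the `bindSource` name and signature are assumptions, not project API; only the `thecloud-vol-` naming comes from the diff:

```go
package main

import (
	"fmt"
	"strings"
)

// bindSource picks a Docker-compatible mount source: block-device backend
// paths (e.g. LVM /dev/<vg>/<lv>) cannot be bind-mounted as a data
// directory, so fall back to a named volume for the docker backend.
func bindSource(backendPath, namedVolume, computeType string) string {
	if computeType == "docker" && strings.HasPrefix(backendPath, "/dev/") {
		return namedVolume // let Docker manage a named volume instead
	}
	if backendPath != "" {
		return backendPath
	}
	return namedVolume
}

func main() {
	fmt.Println(bindSource("/dev/vg0/lv7", "thecloud-vol-ab12cd34", "docker"))   // thecloud-vol-ab12cd34
	fmt.Println(bindSource("/mnt/vols/ab12cd34", "thecloud-vol-ab12cd34", "docker")) // /mnt/vols/ab12cd34
}
```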
```go
}

backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
if vol.BackendPath != "" {
```
Same issue as primary creation: using vol.BackendPath as the bind source will break Docker when the storage backend returns a block device path (e.g. LVM /dev/<vg>/<lv>). Prefer a Docker-compatible source (named volume or mounted directory) or branch on compute.Type()/storage backend type.
```diff
- if vol.BackendPath != "" {
+ if vol.BackendPath != "" && !strings.HasPrefix(vol.BackendPath, "/dev/") {
```
```go
// We try to find the volume by the expected name
vols, err := s.volumeSvc.ListVolumes(ctx)
if err == nil {
	expectedPrefix := fmt.Sprintf("db-vol-%s", db.ID.String()[:8])
	if db.Role == domain.RoleReplica {
		expectedPrefix = fmt.Sprintf("db-replica-vol-%s", db.ID.String()[:8])
	}
	for _, v := range vols {
		if strings.HasPrefix(v.Name, expectedPrefix) {
			if err := s.volumeSvc.DeleteVolume(ctx, v.ID.String()); err != nil {
				s.logger.Warn("failed to delete database volume", "volume_id", v.ID, "error", err)
			}
			break
		}
	}
}
```
Volume cleanup currently does a full ListVolumes() and deletes the first entry with a matching prefix. This is O(n) per delete and can delete the wrong volume if any other volume happens to share the prefix. Since the expected volume name is deterministic, delete by exact name via volumeSvc.DeleteVolume(ctx, expectedName) (or GetVolume by name) and log/handle the ListVolumes error path explicitly.
```go
// Delete the expected volume directly by deterministic name instead of scanning all volumes
expectedName := fmt.Sprintf("db-vol-%s", db.ID.String())
if db.Role == domain.RoleReplica {
	expectedName = fmt.Sprintf("db-replica-vol-%s", db.ID.String())
}
if err := s.volumeSvc.DeleteVolume(ctx, expectedName); err != nil {
	s.logger.Warn("failed to delete database volume", "volume_name", expectedName, "error", err)
}
```
```go
// Wait a bit for async cleanup if any
time.Sleep(1 * time.Second)

// Verify Volume is gone
respVols := getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
defer func() { _ = respVols.Body.Close() }()
require.Equal(t, http.StatusOK, respVols.StatusCode)

var volsRes struct {
	Data []domain.Volume `json:"data"`
}
require.NoError(t, json.NewDecoder(respVols.Body).Decode(&volsRes))

found := false
expectedPrefix := fmt.Sprintf("db-vol-%s", dbID[:8])
for _, v := range volsRes.Data {
	if strings.HasPrefix(v.Name, expectedPrefix) {
		found = true
		break
	}
}
assert.False(t, found, fmt.Sprintf("Volume with prefix %s should have been deleted", expectedPrefix))
```
The fixed time.Sleep(1 * time.Second) makes this E2E test both slower and potentially flaky if volume cleanup takes longer than expected. Prefer polling /volumes until the expected volume disappears (with a reasonable timeout), similar to other E2E tests that retry for eventual consistency.
```go
// Verify Volume is eventually gone (async cleanup)
expectedPrefix := fmt.Sprintf("db-vol-%s", dbID[:8])
deadline := time.Now().Add(30 * time.Second)
for {
	respVols := getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
	gone := func() bool {
		defer func() { _ = respVols.Body.Close() }()
		require.Equal(t, http.StatusOK, respVols.StatusCode)
		var volsRes struct {
			Data []domain.Volume `json:"data"`
		}
		require.NoError(t, json.NewDecoder(respVols.Body).Decode(&volsRes))
		for _, v := range volsRes.Data {
			if strings.HasPrefix(v.Name, expectedPrefix) {
				return false
			}
		}
		return true
	}()
	if gone {
		// Volume has been cleaned up
		break
	}
	if time.Now().After(deadline) {
		t.Fatalf("Volume with prefix %s should have been deleted within timeout", expectedPrefix)
	}
	time.Sleep(500 * time.Millisecond)
}
```
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/database_persistence_e2e_test.go`:
- Line 31: Replace the magic number modulus in the dbName generation with a
named constant (e.g., const dbSuffixMod = 1000 or better const dbPrefixLen = 8
if switching to hex/UUID) and ensure any slicing of dbID (dbID[:8]) is guarded
by computing min(len(dbID), dbPrefixLen) or by using a helper GetPrefix(dbID
string, n int) to safely return the first n characters; update all occurrences
(dbName creation and the dbID[:8] uses) to use the constant/helper so you avoid
magic numbers and prevent panics if ID format changes.
- Around line 79-85: Replace the fixed time.Sleep with bounded polling that
repeatedly calls getRequest(t, client, testutil.TestBaseURL+"/volumes", token)
(the same logic around respVols and respVols.Body.Close()) until the expected
cleanup condition is observed or a timeout elapses; implement this either using
require.Eventually or a manual loop with time.Ticker and time.After (e.g., poll
every 100–200ms up to a 5s timeout), and then assert the final HTTP status/body
indicates the Volume is gone. Apply the same change to the second occurrence
handling the later cleanup check.
- Line 41: The defer statements currently discarding Close() errors (e.g., defer
func() { _ = resp.Body.Close() }()) should be replaced with deferred closures
that check the error and report it to the test harness instead of ignoring it;
locate the places where the http response variable resp is closed (the deferred
Close calls around resp) and change each to something that captures err :=
resp.Body.Close() and calls t.Fatalf/t.Fatalf-like assertion (or
require.NoError(t, err)) so test failures are surfaced rather than swallowed.
- Around line 34-39: The test CreateDatabase_ProvisionsVolume should be
converted to a table-driven subtest: replace the hard-coded payload using
"engine":"postgres" with a slice of test cases (e.g., cases with engine
"postgres"/version "16" and engine "mysql"/appropriate version) and iterate over
them, running t.Run for each case (use a descriptive name like
fmt.Sprintf("CreateDatabase_ProvisionsVolume/%s", tc.engine)); inside the loop
build the payload map from tc.engine and tc.version, execute the existing
assertions for provisioning and volume lifecycle, and ensure any cleanup runs
per-case so both Postgres and MySQL paths are validated.
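The prefix-guard idea from the first prompt above can be sketched as a small helper; the `safePrefix` and `dbPrefixLen` names follow the prompt's suggestions, everything else is illustrative:

```go
package main

import "fmt"

const dbPrefixLen = 8 // named constant replacing the magic 8 in dbID[:8]

// safePrefix returns at most n leading characters of s, avoiding the
// slice-bounds panic that dbID[:8] would hit if the ID format shrinks.
func safePrefix(s string, n int) string {
	if len(s) < n {
		return s
	}
	return s[:n]
}

func main() {
	fmt.Println(safePrefix("0b7e3f21-9c4d", dbPrefixLen)) // 0b7e3f21
	fmt.Println(safePrefix("abc", dbPrefixLen))           // abc
}
```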
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
```go
t.Run(fmt.Sprintf("Engine/%s", tc.engine), func(t *testing.T) {
	var dbID string
	dbName := fmt.Sprintf("e2e-persistent-%s-%d", tc.engine, time.Now().UnixNano()%dbSuffixMod)
```
This E2E test only deletes the database in a later subtest; if the create subtest fails early (e.g., require aborts before dbID is set), the database/volume can be left behind and interfere with subsequent runs. Consider registering a t.Cleanup (once dbID is known) to best-effort delete the DB/volume even when assertions fail.
```go
backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
if vol.BackendPath != "" {
	backendVolName = vol.BackendPath
}

mountPath := "/var/lib/postgresql/data"
if dbEngine == domain.EngineMySQL {
	mountPath = "/var/lib/mysql"
}
volumeBinds := []string{fmt.Sprintf("%s:%s", backendVolName, mountPath)}
```
backendVolName is used as the source for Docker HostConfig.Binds, but vol.BackendPath can be a block-device path (e.g. LVM returns /dev/<vg>/<lv>), which Docker bind-mount cannot mount onto a data directory like /var/lib/postgresql/data. Additionally, the current noop storage backend returns a constant path, which would cause multiple DBs to share the same underlying mount. Consider aligning the storage backend contract with the Docker compute backend (e.g., return a unique host directory or Docker volume name for Docker, and only return /dev/... paths for hypervisors; or introduce a dedicated Docker storage backend / device-mount support).
```diff
 // Create persistent volume for the database
 volName := fmt.Sprintf("db-vol-%s", db.ID.String()[:8])
 vol, err := s.volumeSvc.CreateVolume(ctx, volName, 10) // 10GB default
 if err != nil {
 	return nil, errors.Wrap(errors.Internal, "failed to create persistent volume", err)
 }

 backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
 if vol.BackendPath != "" {
 	backendVolName = vol.BackendPath
 }

 mountPath := "/var/lib/postgresql/data"
 if dbEngine == domain.EngineMySQL {
 	mountPath = "/var/lib/mysql"
 }
 volumeBinds := []string{fmt.Sprintf("%s:%s", backendVolName, mountPath)}

 dockerName := fmt.Sprintf("cloud-db-%s-%s", name, db.ID.String()[:8])
 containerID, allocatedPorts, err := s.compute.LaunchInstanceWithOptions(ctx, ports.CreateInstanceOptions{
 	Name:        dockerName,
 	ImageName:   imageName,
 	Ports:       []string{"0:" + defaultPort},
 	NetworkID:   networkID,
-	VolumeBinds: nil,
+	VolumeBinds: volumeBinds,
 	Env:         env,
 	Cmd:         nil,
 })
```
A persistent volume is created and mounted, but the volume record is never marked as IN-USE / associated with the running database container. As a result, users can delete the volume via the Volume API while the DB is running (since DeleteVolume only blocks IN-USE), risking data loss. After a successful launch, consider updating the volume metadata (status + attachment info) and clearing it on DB delete, likely via a VolumeService method rather than leaving it AVAILABLE.
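A minimal sketch of the proposed status/attachment update, assuming hypothetical `Status`/`AttachedTo` fields and status strings — the project's real volume model may differ:

```go
package main

import (
	"errors"
	"fmt"
)

// Volume models just the fields this comment is about.
type Volume struct {
	Status     string
	AttachedTo string
}

// attach marks the volume IN-USE so DeleteVolume (which blocks IN-USE
// volumes) protects a database's live storage.
func attach(v *Volume, containerID string) error {
	if v.Status != "AVAILABLE" {
		return errors.New("volume not available")
	}
	v.Status = "IN-USE"
	v.AttachedTo = containerID
	return nil
}

// detach clears the attachment on DB delete, making the volume deletable.
func detach(v *Volume) {
	v.Status = "AVAILABLE"
	v.AttachedTo = ""
}

func main() {
	v := &Volume{Status: "AVAILABLE"}
	_ = attach(v, "container-123")
	fmt.Println(v.Status) // IN-USE
	detach(v)
	fmt.Println(v.Status) // AVAILABLE
}
```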
```go
if err != nil {
	_ = s.volumeSvc.DeleteVolume(ctx, vol.ID.String())
	s.logger.Error("failed to launch database container", "error", err)
	return nil, errors.Wrap(errors.Internal, "failed to launch database container", err)
```
On container launch failure, the rollback deletes the volume but ignores any error. If deletion fails, the DB create request returns an error while leaving an orphaned volume behind with no logging, which will be hard to diagnose operationally. Consider logging a warning (or returning a combined error) when volume cleanup fails in rollback paths.
```go
backendVolName := "thecloud-vol-" + vol.ID.String()[:8]
if vol.BackendPath != "" {
	backendVolName = vol.BackendPath
}

mountPath := "/var/lib/postgresql/data"
if db.Engine == domain.EngineMySQL {
	mountPath = "/var/lib/mysql"
}
volumeBinds := []string{fmt.Sprintf("%s:%s", backendVolName, mountPath)}
```
Replica volume binding has the same backend-path mismatch risk as primary: vol.BackendPath may be a block device path (LVM) which Docker cannot bind-mount to a directory, and with noop storage it may not be unique. This can prevent replica containers from starting or cause replicas to share storage unexpectedly. Consider making the bind source a Docker-compatible mountpoint/volume name for the Docker backend (or adding device-mount support).
```go
// Use a Docker-compatible named volume instead of the raw backend path to
// avoid binding block devices or non-unique backend paths.
dockerVolumeName := "thecloud-vol-" + vol.ID.String()
mountPath := "/var/lib/postgresql/data"
if db.Engine == domain.EngineMySQL {
	mountPath = "/var/lib/mysql"
}
volumeBinds := []string{fmt.Sprintf("%s:%s", dockerVolumeName, mountPath)}
```
```go
volRepo := postgres.NewVolumeRepository(db)
storage := noop.NewNoopStorageBackend()
volumeSvc := services.NewVolumeService(volRepo, storage, eventSvc, auditSvc, slog.Default())
```
NoopStorageBackend.CreateVolume returns a constant backend path ("vol-1"), so wiring DatabaseService tests to NewNoopStorageBackend() means every DB volume bind will point at the same underlying Docker volume name. This can cause cross-test contamination and make DB lifecycle tests flaky/incorrect. Consider using a storage backend stub that returns a unique backend name per call (e.g., based on the requested name), or adjusting the noop backend to return the provided backend name.
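A per-name stub backend could be as small as this sketch; the method shape and path scheme are assumptions, not the project's actual storage interface:

```go
package main

import "fmt"

// stubStorage is a test storage backend whose CreateVolume returns a
// unique backend path derived from the requested name, avoiding the
// shared "vol-1" path of the noop backend.
type stubStorage struct{}

func (stubStorage) CreateVolume(name string, sizeGB int) (string, error) {
	return "/tmp/stub-vols/" + name, nil // unique per requested name
}

func main() {
	s := stubStorage{}
	a, _ := s.CreateVolume("db-vol-ab12cd34", 10)
	b, _ := s.CreateVolume("db-vol-ef56ab78", 10)
	fmt.Println(a != b) // true: no cross-test contamination
}
```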
```go
for _, tc := range testCases {
	t.Run(fmt.Sprintf("Engine/%s", tc.engine), func(t *testing.T) {
		var dbID string
		dbName := fmt.Sprintf("e2e-persistent-%s-%d", tc.engine, time.Now().UnixNano()%dbSuffixMod)
```
dbName uses time.Now().UnixNano()%dbSuffixMod with dbSuffixMod=1000, which increases the chance of collisions across repeated/parallel CI runs (only 1000 possible suffixes). Consider using the full UnixNano() value or a UUID-based suffix so test resources are reliably unique.
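A collision-resistant suffix could be generated like this sketch; the helper name and byte length are illustrative:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueSuffix returns a random hex string, giving far more entropy than
// time.Now().UnixNano()%1000 (only 1000 possible values).
func uniqueSuffix(nBytes int) string {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		panic(err) // crypto/rand should not fail on supported platforms
	}
	return hex.EncodeToString(b)
}

func main() {
	dbName := fmt.Sprintf("e2e-persistent-%s-%s", "postgres", uniqueSuffix(4))
	fmt.Println(len(dbName) > 0, len(uniqueSuffix(4))) // true 8
}
```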
🧹 Nitpick comments (1)
tests/database_persistence_e2e_test.go (1)
40-141: Consider adding replica volume lifecycle coverage.

The test validates primary database volume provisioning and cleanup effectively. However, the PR objectives mention that replicas also utilize persistent volumes. Consider extending the test to verify that:

- Creating a replica provisions a volume with the `db-replica-vol-` prefix
- Deleting a replica cleans up its associated volume
This would ensure complete E2E coverage of the persistence feature for both primary and replica databases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/database_persistence_e2e_test.go` around lines 40 - 141, Test lacks replica volume lifecycle checks; extend TestDatabasePersistenceE2E to create a replica for the created database, assert a volume exists with the "db-replica-vol-" prefix, then delete that replica and poll until the replica volume is removed. Use the same helpers used for primaries (postRequest, getRequest, deleteRequest) to call the replica creation and deletion endpoints, derive expectedPrefix with fmt.Sprintf("db-replica-vol-%s", safePrefix(replicaID, dbPrefixLen)), and reuse the existing volume-list assertions and require.Eventually polling logic to verify provisioning and cleanup. Ensure you capture and store the replica ID (similar to dbID) from the replica creation response and defer closeBody on all responses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/database_persistence_e2e_test.go`:
- Around line 40-141: Test lacks replica volume lifecycle checks; extend
TestDatabasePersistenceE2E to create a replica for the created database, assert
a volume exists with the "db-replica-vol-" prefix, then delete that replica and
poll until the replica volume is removed. Use the same helpers used for
primaries (postRequest, getRequest, deleteRequest) to call the replica creation
and deletion endpoints, derive expectedPrefix with
fmt.Sprintf("db-replica-vol-%s", safePrefix(replicaID, dbPrefixLen)), and reuse
the existing volume-list assertions and require.Eventually polling logic to
verify provisioning and cleanup. Ensure you capture and store the replica ID
(similar to dbID) from the replica creation response and defer closeBody on all
responses.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- docs/FEATURES.md
- docs/adr/ADR-015-database-data-persistence.md
- docs/database.md
- tests/database_persistence_e2e_test.go
✅ Files skipped from review due to trivial changes (1)
- docs/adr/ADR-015-database-data-persistence.md
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
```diff
 // NewDatabaseService constructs a DatabaseService with its dependencies.
 func NewDatabaseService(params DatabaseServiceParams) *DatabaseService {
 	return &DatabaseService{
-		repo:     params.Repo,
-		compute:  params.Compute,
-		vpcRepo:  params.VpcRepo,
-		eventSvc: params.EventSvc,
-		auditSvc: params.AuditSvc,
-		logger:   params.Logger,
+		repo:      params.Repo,
+		compute:   params.Compute,
+		vpcRepo:   params.VpcRepo,
+		volumeSvc: params.VolumeSvc,
+		eventSvc:  params.EventSvc,
+		auditSvc:  params.AuditSvc,
+		logger:    params.Logger,
 	}
```
NewDatabaseService accepts VolumeSvc but doesn’t validate it. If a caller constructs the service without VolumeSvc (zero value nil) and then calls CreateDatabase/DeleteDatabase, this will panic at runtime. Consider failing fast in the constructor (return error or panic with a clear message) or defaulting to a no-op implementation to make the dependency requirement explicit.
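The fail-fast constructor idea, sketched with trimmed-down hypothetical types — the real `DatabaseServiceParams` has more fields, and the project may prefer panicking over returning an error:

```go
package main

import (
	"errors"
	"fmt"
)

type VolumeService struct{}

type DatabaseServiceParams struct {
	VolumeSvc *VolumeService
	// ... other dependencies elided ...
}

type DatabaseService struct {
	volumeSvc *VolumeService
}

// NewDatabaseService rejects a nil VolumeSvc at construction time, turning
// a latent runtime panic into an explicit, early error.
func NewDatabaseService(p DatabaseServiceParams) (*DatabaseService, error) {
	if p.VolumeSvc == nil {
		return nil, errors.New("database service: VolumeSvc is required")
	}
	return &DatabaseService{volumeSvc: p.VolumeSvc}, nil
}

func main() {
	_, err := NewDatabaseService(DatabaseServiceParams{})
	fmt.Println(err != nil) // true: nil dependency rejected at construction
}
```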
```go
require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
dbID = res.Data.ID.String()
assert.NotEmpty(t, dbID)
```
|
In this E2E, the database cleanup only happens in the dedicated DeleteDatabase_CleansUpVolume subtest. If CreateDatabase_ProvisionsVolume fails after the API creates the DB but before dbID is set/usable (e.g., decode error), the DB/volume can be left behind and affect later runs. Consider registering a t.Cleanup (or defer) right after successfully decoding/setting dbID to best-effort delete the DB even when intermediate assertions fail.
```go
t.Cleanup(func() {
	if dbID == "" {
		return
	}
	respDel := deleteRequest(t, client, testutil.TestBaseURL+"/databases/"+dbID, token)
	closeBody(t, respDel)
})
```
```go
if vol.BackendPath != "" {
	backendVolName = vol.BackendPath
}
```
backendVolName prefers vol.BackendPath when present. With the current NoopStorageBackend, CreateVolume always returns the constant path "vol-1", so every database would bind-mount the same underlying volume, causing cross-database data corruption in dev/test. Consider using the backend volume name you generated (thecloud-vol-<idprefix>) as the bind source (or make BackendPath unique per volume / return the backendName) so each DB gets an isolated mount source.
```diff
-if vol.BackendPath != "" {
-	backendVolName = vol.BackendPath
-}
```
This PR implements data persistence for the Managed Database Service by integrating it with the VolumeService.
Changes:
Summary by CodeRabbit