Conversation
⚠️ Warning: Rate limit exceeded
⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source, and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

📝 Walkthrough

Adds a CloudLogs subsystem: domain types and ports, a PostgreSQL-backed repository and migrations, a CloudLogs service (ingest/search/retention) with trace propagation, HTTP handlers and SDK/CLI commands, a background LogWorker for retention, InstanceService ingestion wiring, DI/router changes, and tests (unit, repo, worker, handler, e2e), plus API docs and Swagger updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Handler as LogHandler
    participant Service as CloudLogsService
    participant Repo as LogRepository
    participant DB as PostgreSQL
    Client->>Handler: GET /logs?resource_type=instance
    Handler->>Handler: parse params (time, pagination)
    Handler->>Service: SearchLogs(LogQuery)
    Service->>Repo: List(ctx, query)
    Repo->>DB: COUNT / SELECT ... WHERE ... ORDER BY timestamp DESC LIMIT/OFFSET
    DB-->>Repo: rows + total
    Repo-->>Service: []*LogEntry, total
    Service-->>Handler: entries, total
    Handler-->>Client: 200 {data: {entries, total, limit, offset}}
```
```mermaid
sequenceDiagram
    participant Instance as InstanceService
    participant Container as ContainerRuntime
    participant LogSvc as CloudLogsService
    participant Repo as LogRepository
    participant DB as PostgreSQL
    Instance->>Container: fetch stdout/stderr logs
    Container-->>Instance: raw log lines
    Instance->>LogSvc: IngestLogs(entries)
    LogSvc->>LogSvc: extract/propagate TraceID from ctx
    LogSvc->>Repo: Create(ctx, entries)
    Repo->>DB: INSERT INTO log_entries (...)
    DB-->>Repo: success
    Repo-->>LogSvc: nil
    LogSvc-->>Instance: nil
    Instance->>Container: remove container
```
```mermaid
sequenceDiagram
    participant Worker as LogWorker
    participant Service as CloudLogsService
    participant Repo as LogRepository
    participant DB as PostgreSQL
    participant Timer as Ticker
    Worker->>Service: RunRetentionPolicy(30 days)
    Service->>Repo: DeleteByAge(ctx, 30)
    Repo->>DB: DELETE FROM log_entries WHERE timestamp < NOW() - INTERVAL '30 days'
    DB-->>Repo: rows deleted
    Repo-->>Service: nil
    Service-->>Worker: nil
    Worker->>Timer: wait 24h
    Timer-->>Worker: tick -> RunRetentionPolicy(...)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed

❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 14
🤖 Fix all issues with AI agents
In `@cmd/cloud/logs.go`:
- Around line 46-49: The json.MarshalIndent call in the outputJSON branch
currently discards its error (data, _ := json.MarshalIndent(res, "", " "));
change this to capture and check the error, and on failure write an error
message to stderr (including the error detail) and exit non‑zero or return an
error so the CLI doesn't silently continue; do the same fix for the identical
pattern in logsShowCmd so both json.MarshalIndent usages validate errors and
report them (refer to the outputJSON branch, the json.MarshalIndent call, and
logsShowCmd).
In `@docs/api-reference.md`:
- Around line 516-517: Update the API reference route from "GET /logs/:id" to
"GET /logs/:resource_id" and change any accompanying parameter names and
examples to match (e.g., path templates, curl examples, request/response
samples, and the parameters table). Ensure the documented path parameter name
matches the implementation plan and handler semantics so all references to ":id"
in this endpoint are replaced with ":resource_id" for consistency.
In `@docs/cli-reference.md`:
- Line 122: Fix the typo in the CLI reference table: change the mismatched
quoted entry `--volume"` to the correct backticked flag `--volume` in the table
row (the cell currently showing `--volume"` | `-V` | No | - | Volume attachment
(vol-name:/path) ), ensuring the backticks correctly wrap the flag name.
In `@docs/plans/cloudlogs-implementation.md`:
- Around line 37-40: Update Phase 6 checklist to reflect the tests added in the
PR: mark "Unit tests for `LogService`" as done (checked), and if unit tests for
the handler and worker were also added, add and check items like "Unit tests for
`LogHandler`" and "Unit tests for `LogWorker`"; leave "Integration tests for
`LogRepository`" and the smoke test unchecked unless those were actually
implemented, in which case check them too so the plan matches reality.
In `@internal/api/setup/router.go`:
- Line 68: The Handlers struct's Log field is never initialized and no /logs
routes are registered; add initialization of handlers.Log using
NewLogHandler(svcs.Log) inside InitHandlers where other handlers are created,
and then register a top-level log group in SetupRouter (e.g., r.Group("/logs"))
that applies the same middleware (httputil.Auth(svcs.Identity, svcs.Tenant),
httputil.RequireTenant()) and wires handlers.Log.Search to GET "" and
handlers.Log.GetByResource to GET "/:id" so the LogHandler is both non-nil and
reachable at runtime.
In `@internal/core/services/cloudlogs_test.go`:
- Line 66: The test currently discards errors from svc.IngestLogs by assigning
to the blank identifier; update both calls to capture the returned error (from
svc.IngestLogs(ctx, []*domain.LogEntry{...})) and assert success instead of
ignoring it—e.g., replace `_ = svc.IngestLogs(...)` with `err :=
svc.IngestLogs(...)` followed by a test assertion like `require.NoError(t, err)`
(or t.Fatalf on error) to fail fast when setup fails; do this for both
occurrences of svc.IngestLogs in the test.
In `@internal/core/services/cloudlogs.go`:
- Around line 52-54: In CloudLogsService.RunRetentionPolicy validate the days
parameter and return a service-layer InvalidInput error instead of calling the
repo when days is non-positive: check if days <= 0, and if so return
internal/errors.New(internal/errors.InvalidInput, "invalid retention days; must
be > 0"); otherwise proceed to log and call s.repo.DeleteByAge(ctx, days).
Ensure you reference RunRetentionPolicy and s.repo.DeleteByAge in the change.
In `@internal/handlers/log_handler.go`:
- Around line 66-69: The code silently ignores strconv.Atoi errors for
pagination (limit, offset) which masks bad input; update the handler to check
the errors returned by strconv.Atoi when parsing c.DefaultQuery("limit", "100")
and c.DefaultQuery("offset", "0") and return a 400 Bad Request with a clear
error message if either value is non-numeric instead of assigning to the blank
identifier. Specifically, after parsing into limit and offset, if err != nil for
either parse, use the request context (the same c) to abort/respond with status
400 and a JSON or text error explaining which parameter is invalid; otherwise
assign the validated limit and offset to query.Limit and query.Offset as before.
- Around line 52-63: The handler currently swallows time.Parse errors when
reading c.Query("start_time") and c.Query("end_time") and silently ignores
invalid filters; change the logic to validate parsing and return a 400 Bad
Request with a clear error message when time.Parse(time.RFC3339, ...) fails.
Specifically, inside the same handler that reads c.Query use the returned err
from time.Parse and if non-nil call the request context's error response (e.g.,
abort/return 400 JSON) rather than setting nothing; continue to set
query.StartTime and query.EndTime only on successful parse. Ensure the response
includes which parameter ("start_time" or "end_time") was invalid and the parse
error text so callers know why the request was rejected.
In `@internal/repositories/postgres/log_repo.go`:
- Around line 114-124: After the rows.Next() loop in the function that builds
entries (using domain.LogEntry), add a check for rows.Err() immediately before
returning: if err := rows.Err(); err != nil { return nil, total,
errors.Wrap(errors.Internal, "error iterating log rows", err) } so iteration
errors (network/scan failures) are propagated instead of returning partial
entries; place this check after the loop and before the final return of entries,
total, nil and use the same errors.Wrap pattern used elsewhere.
In `@internal/workers/log_worker.go`:
- Line 35: The initial call to w.logSvc.RunRetentionPolicy(ctx, w.retentionDays)
currently discards errors; update the startup call in log_worker.go to capture
the returned error and handle it the same way as the tick handler does (check
the error and call the logger to record it), e.g. assign the result to err and
log a descriptive message if err != nil so that failures from
w.logSvc.RunRetentionPolicy and the ctx/w.retentionDays invocation are not
silently ignored.
In `@pkg/sdk/logs.go`:
- Around line 43-44: Update the SDK methods to accept context.Context as their
first parameter: change func (c *Client) SearchLogs(query LogQuery) and func (c
*Client) GetLogsByResource(...) to func (c *Client) SearchLogs(ctx
context.Context, query LogQuery) and similarly for GetLogsByResource; then
update and propagate the context into the underlying HTTP helper by changing the
c.get signature in client.go to accept ctx (e.g., c.get(ctx, ...)) and pass ctx
through every call site so the HTTP request uses the provided context for
cancellation, timeouts, and deadline propagation.
In `@tests/logs_e2e_test.go`:
- Around line 46-48: The test suite relies on a shared variable instanceID set
in the LaunchInstance subtest; add a guard to fail fast when instanceID is empty
by calling require.NotEmpty(t, instanceID, "depends on LaunchInstance") at the
start of each dependent subtest (TerminateInstance, VerifyHistoricalLogs) so
they don't call incorrect endpoints when LaunchInstance fails or is skipped;
ensure this check references the shared instanceID and is placed before any HTTP
calls that use instanceID.
- Line 10: Remove the unused import "github.com/google/uuid" from the test file
to fix the compilation error; locate the import block that includes the uuid
package (the string "github.com/google/uuid") and delete that entry (or replace
with a used alternative) so the package is not imported without being
referenced.
🧹 Nitpick comments (14)
pkg/sdk/logs.go (1)

83-87: `resourceID` is not URL-path-encoded. If `resourceID` ever contains characters that are meaningful in URL paths (e.g., `/`, `?`), the constructed path will be malformed. Consider using `url.PathEscape(resourceID)`.

Proposed fix:

```diff
- path := fmt.Sprintf("/logs/%s", resourceID)
+ path := fmt.Sprintf("/logs/%s", url.PathEscape(resourceID))
```

docs/cli-reference.md (1)

855-885: CloudLogs CLI documentation looks good. One minor formatting nit from static analysis: add a blank line before the flags table on line 868 to satisfy markdownlint (MD058).

Proposed fix:

```diff
  **Flags**:
+
  | Flag | Default | Description |
```

internal/repositories/postgres/migrations/089_create_log_entries_table.up.sql (2)

13-14: Consider adding a composite index on `(tenant_id, timestamp)` for the primary query pattern. Most log queries are tenant-scoped with time-range filters. The current `idx_logs_timestamp` on `(timestamp)` alone won't be used efficiently when `tenant_id` is also in the WHERE clause. A composite index would cover the dominant access pattern (tenant + time range + ORDER BY timestamp DESC).

```diff
  CREATE INDEX idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
- CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
+ CREATE INDEX idx_logs_timestamp ON log_entries(tenant_id, timestamp DESC);
```

6-6: Consider adding a CHECK constraint on `level` to enforce valid values. The plan specifies `INFO`, `WARN`, `ERROR` as valid levels. Without a constraint, any string can be inserted, risking inconsistent data.

Proposed change:

```diff
- level TEXT NOT NULL,
+ level TEXT NOT NULL CHECK (level IN ('INFO', 'WARN', 'ERROR')),
```
internal/core/services/cloudlogs_test.go (1)

17-24: Service test uses a real repository instead of a mock. This test directly instantiates `postgres.NewLogRepository(db)` and passes it to the service. The coding guidelines for `internal/core/services/**/*_test.go` require mocking repositories in service tests. The separate `cloudlogs_unit_test.go` likely handles the mocked path; consider either moving these integration tests to a dedicated integration test package (e.g., `tests/`) or adding a build tag to separate them from unit tests.

As per coding guidelines, "Mock repositories in service tests".

internal/workers/log_worker_test.go (1)

24-27: `SearchLogs` mock will panic if called with a nil first return value. Unlike the handler test's mock (which guards with a nil check on `args.Get(0)`), this mock directly type-asserts without checking. While `SearchLogs` isn't exercised in these worker tests, a future test addition could hit this.

Defensive fix:

```diff
 func (m *mockLogService) SearchLogs(ctx context.Context, query domain.LogQuery) ([]*domain.LogEntry, int, error) {
 	args := m.Called(ctx, query)
-	return args.Get(0).([]*domain.LogEntry), args.Int(1), args.Error(2)
+	if args.Get(0) == nil {
+		return nil, args.Int(1), args.Error(2)
+	}
+	return args.Get(0).([]*domain.LogEntry), args.Int(1), args.Error(2)
 }
```
internal/core/services/instance.go (1)

641-667: Log ingestion ordering and best-effort pattern look correct. Fetching logs before `removeInstanceContainer` is the right call. The silent-on-failure approach for ingestion is appropriate for a termination path.

Two minor observations:

1. Redundant lookup: Line 643 calls `s.GetInstanceLogs(ctx, idOrName)`, which internally calls `s.GetInstance(ctx, idOrName)` again (line 597). You already have `inst`; passing `inst.ContainerID` directly to `s.compute.GetInstanceLogs` would avoid the extra DB round-trip.
2. All entries stamped `INFO`/`time.Now()`: Every line is marked `Level: "INFO"` and `Timestamp: time.Now()`. This means log severity is lost and timestamps reflect ingestion rather than production time. Acceptable as a v1 simplification, but worth a TODO for future parsing.

♻️ Suggested refactor for point 1:

```diff
-	logs, err := s.GetInstanceLogs(ctx, idOrName)
-	if err == nil && logs != "" {
+	stream, logErr := s.compute.GetInstanceLogs(ctx, inst.ContainerID)
+	if logErr == nil {
+		defer stream.Close()
+		raw, _ := io.ReadAll(stream)
+		logs := string(raw)
+		if logs != "" {
```
internal/core/services/cloudlogs.go (1)

48-50: Consider defaulting `Limit` when zero/unset. `SearchLogs` passes the query straight to the repository. If `query.Limit` is 0 (the zero value), depending on the repository implementation this could either return nothing or return an unbounded result set. Adding a sensible default here would be defensive:

```diff
 func (s *CloudLogsService) SearchLogs(ctx context.Context, query domain.LogQuery) ([]*domain.LogEntry, int, error) {
+	if query.Limit <= 0 {
+		query.Limit = 100
+	}
 	return s.repo.List(ctx, query)
 }
```

internal/core/ports/log.go (1)

9-19: `LogRepository` is missing `GetByID`, `Update`, and `Delete` per coding guidelines. The guideline states port repository interfaces must include CRUD methods. `LogRepository` only has `Create`, `List`, and `DeleteByAge`. While individual log entry mutation isn't typical for a logging system, a `GetByID` method would at minimum support the `GET /logs/:id` endpoint referenced in the PRD and API docs.

Consider adding at least:

```go
GetByID(ctx context.Context, id uuid.UUID) (*domain.LogEntry, error)
```

As per coding guidelines, "Port repository interfaces must include CRUD methods (Create, GetByID, Update, Delete) with context as first parameter."
internal/repositories/postgres/log_repo.go (3)

27-43: No upper bound on batch size; large batches may exceed PostgreSQL's parameter limit. Each entry contributes 8 parameters. PostgreSQL supports a maximum of 65,535 parameters per query, so batches exceeding ~8,191 entries will fail. Consider chunking the inserts.

Sketch: chunk the batch:

```diff
+const maxBatchSize = 5000 // stay well within the 65535 param limit (5000 * 8 = 40000)
+
 func (r *LogRepository) Create(ctx context.Context, entries []*domain.LogEntry) error {
 	if len(entries) == 0 {
 		return nil
 	}
+	for start := 0; start < len(entries); start += maxBatchSize {
+		end := start + maxBatchSize
+		if end > len(entries) {
+			end = len(entries)
+		}
+		if err := r.createBatch(ctx, entries[start:end]); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+func (r *LogRepository) createBatch(ctx context.Context, entries []*domain.LogEntry) error {
 	query := `INSERT INTO log_entries ...`
 	// ... existing batch logic ...
+}
```

68-71: ILIKE wildcards in user input are not escaped. Characters `%` and `_` in `query.Search` are treated as ILIKE wildcards rather than literals. If a user searches for a literal `%` or `_`, they'll get broader results than expected. Consider escaping these characters.

Sketch:

```diff
 if query.Search != "" {
 	filters += fmt.Sprintf(" AND message ILIKE $%d", placeholderIdx)
-	args = append(args, "%"+query.Search+"%")
+	escaped := strings.NewReplacer(`%`, `\%`, `_`, `\_`).Replace(query.Search)
+	args = append(args, "%"+escaped+"%")
 	placeholderIdx++
 }
```

95-98: Magic number `100` for default limit, duplicated with the handler. The default limit of 100 is hard-coded here and in the handler. Extract a shared named constant to keep them in sync.

As per coding guidelines, "Do not use magic numbers - use named constants instead."
internal/core/services/cloudlogs_unit_test.go (2)

19-83: Shared mock across subtests: `AssertExpectations` at the parent level can mask which subtest failed. `mockRepo.AssertExpectations(t)` on line 82 uses the parent test's `t`, so if an expectation from a specific subtest isn't met, the failure won't be attributed to that subtest. Consider moving the assertion into each subtest or creating a fresh mock per subtest.

85-112: Tests cover the happy paths well; consider adding error-case subtests. `SearchLogs` and `RunRetentionPolicy` each have a single success scenario. Adding cases where the mock repository returns an error would improve coverage of the service's error propagation.
Pull request overview
This PR introduces the CloudLogs Service, a comprehensive persistent logging solution for platform resources (instances and functions). The implementation follows hexagonal architecture with proper separation of concerns across domain, ports, services, repositories, handlers, and workers.
Changes:
- Adds CloudLogs service for persistent storage and querying of resource logs in PostgreSQL
- Implements automatic log ingestion during instance termination
- Adds background worker for automated 30-day retention policy
- Provides SDK and CLI support for log search and retrieval
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/core/domain/log.go | Defines LogEntry and LogQuery domain models for log storage |
| internal/core/ports/log.go | Defines LogRepository and LogService interfaces |
| internal/repositories/postgres/log_repo.go | Implements PostgreSQL-backed log storage with batch insert and filtering |
| internal/repositories/postgres/log_repo_test.go | Unit tests for repository layer |
| internal/repositories/postgres/migrations/089_create_log_entries_table.up.sql | Creates log_entries table with indexes |
| internal/repositories/postgres/migrations/089_create_log_entries_table.down.sql | Rollback migration |
| internal/core/services/cloudlogs.go | Implements CloudLogsService with trace ID extraction |
| internal/core/services/cloudlogs_test.go | Integration tests for CloudLogsService |
| internal/core/services/cloudlogs_unit_test.go | Unit tests for CloudLogsService |
| internal/core/services/instance.go | Integrates log ingestion into instance termination flow |
| internal/handlers/log_handler.go | HTTP handlers for log search and retrieval endpoints |
| internal/handlers/log_handler_test.go | Handler tests |
| internal/workers/log_worker.go | Background worker for retention policy |
| internal/workers/log_worker_test.go | Worker tests |
| internal/api/setup/dependencies.go | Wires CloudLogs dependencies |
| internal/api/setup/router.go | Adds Log handler to router (routes not registered) |
| internal/core/services/setup_test.go | Adds log_entries to test cleanup |
| internal/core/services/shared_test.go | Adds mock implementations |
| pkg/sdk/logs.go | SDK client methods for log operations |
| cmd/cloud/logs.go | CLI commands for log search and display |
| cmd/cloud/main.go | Registers cloudLogsCmd |
| cmd/api/main.go | Starts LogWorker |
| tests/logs_e2e_test.go | End-to-end test for log lifecycle |
| docs/prds/cloudlogs.md | Product requirements document |
| docs/plans/cloudlogs-implementation.md | Implementation plan |
| docs/cli-reference.md | CLI documentation with minor typo fix |
| docs/api-reference.md | API documentation for log endpoints |
| docs/FEATURES.md | Updated observability feature description |
```go
query := `INSERT INTO log_entries (id, tenant_id, resource_id, resource_type, level, message, timestamp, trace_id) VALUES `
values := []interface{}{}
placeholders := []string{}

for i, entry := range entries {
	offset := i * 8
	placeholders = append(placeholders, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d, $%d, $%d)",
		offset+1, offset+2, offset+3, offset+4, offset+5, offset+6, offset+7, offset+8))
	values = append(values, entry.ID, entry.TenantID, entry.ResourceID, entry.ResourceType, entry.Level, entry.Message, entry.Timestamp, entry.TraceID)
}

query += strings.Join(placeholders, ",")
_, err := r.db.Exec(ctx, query, values...)
if err != nil {
	return errors.Wrap(errors.Internal, "failed to batch create log entries", err)
}
```
The batch insert constructs a single large SQL statement with all entries. For very large batches (e.g., thousands of log lines from a long-running instance), this could lead to issues: 1) SQL statement size limits, 2) Memory pressure from building a large query string, 3) Single transaction holding locks for extended time. Consider implementing batching with a maximum batch size (e.g., 1000 entries per statement) to avoid these issues while still benefiting from batch performance.
Suggested change:

```go
const maxBatchSize = 1000
for start := 0; start < len(entries); start += maxBatchSize {
	end := start + maxBatchSize
	if end > len(entries) {
		end = len(entries)
	}
	query := `INSERT INTO log_entries (id, tenant_id, resource_id, resource_type, level, message, timestamp, trace_id) VALUES `
	values := []interface{}{}
	placeholders := []string{}
	for i, entry := range entries[start:end] {
		offset := i * 8
		placeholders = append(placeholders, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d, $%d, $%d)",
			offset+1, offset+2, offset+3, offset+4, offset+5, offset+6, offset+7, offset+8))
		values = append(values, entry.ID, entry.TenantID, entry.ResourceID, entry.ResourceType, entry.Level, entry.Message, entry.Timestamp, entry.TraceID)
	}
	query += strings.Join(placeholders, ",")
	_, err := r.db.Exec(ctx, query, values...)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to batch create log entries", err)
	}
}
```
```go
func (r *LogRepository) DeleteByAge(ctx context.Context, days int) error {
	query := `DELETE FROM log_entries WHERE timestamp < NOW() - INTERVAL '1 day' * $1`
	_, err := r.db.Exec(ctx, query, days)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to delete old logs", err)
	}
	return nil
}
```
The retention policy runs a DELETE without LIMIT, which could lock the table and impact performance if there are many old logs to delete (e.g., after initial deployment or after retention was disabled for a while). Consider implementing batch deletion with LIMIT and looping, or using PARTITION BY for time-based partitioning if log volume is expected to be high. Also, consider logging the number of deleted rows to help operators monitor retention policy effectiveness.
```go
ResourceType: "instance",
Level:        "INFO",
Message:      line,
Timestamp:    time.Now(),
```
All log entries are timestamped with time.Now() at ingestion time rather than preserving the actual time when the log was generated. This means historical log timestamps will be inaccurate and will represent when the instance was terminated, not when the log event occurred. Consider parsing timestamps from log lines if available, or accepting the limitation and documenting it.
```go
if st := c.Query("start_time"); st != "" {
	t, err := time.Parse(time.RFC3339, st)
	if err == nil {
		query.StartTime = &t
	}
}
if et := c.Query("end_time"); et != "" {
	t, err := time.Parse(time.RFC3339, et)
	if err == nil {
		query.EndTime = &t
	}
}
```
Silent error handling for time parsing. When start_time or end_time parameters fail to parse, the errors are silently ignored and the filter is skipped. This could lead to confusion where users provide invalid timestamps and get unexpected results without any error message. Consider returning a 400 Bad Request error with a descriptive message when time parsing fails.
| `--vpc` | `-v` | No | - | VPC ID or name |
| `--subnet` | `-s` | No | - | Subnet ID or name |
| `--volume` | `-V` | No | - | Volume attachment (vol-name:/path) |
| `--volume" | `-V` | No | - | Volume attachment (vol-name:/path) |
Mismatched quote in flag name. The flag is defined as --volume" with a closing quote but no opening quote.
Suggested change:

```diff
- | `--volume" | `-V` | No | - | Volume attachment (vol-name:/path) |
+ | `--volume` | `-V` | No | - | Volume attachment (vol-name:/path) |
```
```sql
CREATE INDEX idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
```
The CREATE INDEX statements lack the IF NOT EXISTS clause. Based on repository conventions, indexes should be made idempotent using CREATE INDEX IF NOT EXISTS to avoid warnings on repeated migrations.
Suggested change:

```diff
- CREATE INDEX idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
- CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
+ CREATE INDEX IF NOT EXISTS idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
+ CREATE INDEX IF NOT EXISTS idx_logs_timestamp ON log_entries(timestamp);
```
```go
	}
	entries = append(entries, &entry)
}

return entries, total, nil
```
Missing rows.Err() check after rows.Next() iteration. According to repository conventions, after iterating through rows with rows.Next(), you should always check rows.Err() to catch any iteration errors that may have occurred.
```go
TenantID:     inst.TenantID,
ResourceID:   inst.ID.String(),
ResourceType: "instance",
Level:        "INFO",
```
All logs are assigned Level "INFO" regardless of their actual content. This means error logs, warnings, and debug information will all be stored with the same severity level, making filtering by level ineffective. Consider parsing log lines to extract the actual log level if available, or using heuristics (e.g., checking for keywords like "error", "warn", "fatal") to assign appropriate levels.
```go
GlobalLB  *httphandlers.GlobalLBHandler
SSHKey    *httphandlers.SSHKeyHandler
ElasticIP *httphandlers.ElasticIPHandler
Log       *httphandlers.LogHandler
```
The LogHandler is initialized in the Handlers struct but the routes are never registered in the router. The handlers.Log field is populated in the diff for internal/api/setup/router.go (line 68), but there's no corresponding route registration for the /logs endpoints in SetupRouter. This means the CloudLogs API endpoints will not be accessible. You need to add route registrations similar to how other handlers are registered, e.g., in registerAdminRoutes or a new registerObservabilityRoutes function.
```markdown
## Phase 6: Verification
- [ ] Unit tests for `LogService`.
- [ ] Integration tests for `LogRepository`.
- [ ] Smoke test: Launch instance -> generate logs -> terminate -> verify logs still exist in CloudLogs.
```
The implementation plan at line 38-40 states that Phase 6 verification tasks are incomplete (Unit tests for LogService, Integration tests for LogRepository, Smoke test). However, these tests actually exist in the PR: internal/core/services/cloudlogs_unit_test.go, internal/core/services/cloudlogs_test.go, internal/repositories/postgres/log_repo_test.go, and tests/logs_e2e_test.go. The checkboxes should be marked as completed to accurately reflect the implementation status.
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.
```go
	}
	entries = append(entries, &entry)
}
```
After iterating rows.Next(), the code should check rows.Err() before returning to catch any iteration errors (this is done in other Postgres repos).
Suggested change:

```go
if err := rows.Err(); err != nil {
	return nil, 0, errors.Wrap(errors.Internal, "failed during log rows iteration", err)
}
```
```go
w.logger.Info("log worker started")

// Initial run
_ = w.logSvc.RunRetentionPolicy(ctx, w.retentionDays)
```
The initial retention run discards the returned error. If the first cleanup fails (DB outage, etc.), it will be silent and only later ticks will log. Consider logging the error (and potentially using the worker logger) for the initial run too.
Suggested change:

```diff
- _ = w.logSvc.RunRetentionPolicy(ctx, w.retentionDays)
+ if err := w.logSvc.RunRetentionPolicy(ctx, w.retentionDays); err != nil {
+ 	w.logger.Error("failed to run initial log retention policy", "error", err)
+ }
```
```go
logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.Search)
logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.GetByResource)
```
/logs and /logs/:id are guarded with PermissionInstanceRead, but the endpoint can return logs for other resource types (e.g. functions) via resource_type/resource_id. This can allow users with instance-read permission to access function logs without PermissionFunctionRead. Consider introducing a dedicated logs:read permission, or enforcing permission based on resource_type (and/or adding a separate route/middleware for function logs).
```diff
-logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.Search)
-logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.GetByResource)
+logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionFullAccess), handlers.Log.Search)
+logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionFullAccess), handlers.Log.GetByResource)
```
```go
// Ingest logs before termination if LogService is available
if s.logSvc != nil && inst.ContainerID != "" {
	logs, err := s.GetInstanceLogs(ctx, idOrName)
```
During termination, logs are fetched via GetInstanceLogs(ctx, idOrName) which re-queries the instance even though inst is already loaded. This adds an extra DB hit and can lead to inconsistencies if idOrName was a name that changed. Consider fetching logs directly from inst.ContainerID (or refactor GetInstanceLogs to accept an *domain.Instance).
```diff
-logs, err := s.GetInstanceLogs(ctx, idOrName)
+// Use the stable instance ID instead of idOrName to avoid inconsistencies if the name has changed.
+logs, err := s.GetInstanceLogs(ctx, inst.ID.String())
```
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@internal/api/setup/router.go`:
- Around line 586-593: The routes registered in registerLogRoutes currently gate
all log access with domain.PermissionInstanceRead, blocking users who only have
function-level access; update the authorization to allow log-specific
permissions by replacing or augmenting the middleware on logGroup.GET and
logGroup.GET("/:id") so they use a log-specific permission (e.g.,
domain.PermissionLogRead) or perform per-resource-type checks against svcs.RBAC
(e.g., if resource_type== "function" require domain.PermissionFunctionRead,
otherwise require domain.PermissionInstanceRead) before calling
handlers.Log.Search and handlers.Log.GetByResource; ensure you reference
registerLogRoutes, handlers.Log.Search, handlers.Log.GetByResource, svcs.RBAC
and domain.PermissionInstanceRead when making the change so the routes authorize
correctly for both instance and function log access.
In `@tests/logs_e2e_test.go`:
- Around line 67-75: Replace the non-fatal assertion with a fatal one: change
the call from assert.Equal(t, http.StatusOK, resp.StatusCode) to
require.Equal(t, http.StatusOK, resp.StatusCode) so the test stops immediately
on a bad status and avoids attempting to json.Decode(resp.Body) (the variables
involved are resp and the subsequent json.NewDecoder(...).Decode(&res)).
🧹 Nitpick comments (2)
internal/api/setup/router.go (1)
185-185: Placement nit: `registerLogRoutes` is called between `registerAdminRoutes` and the gateway proxy, but defined between `registerAdminRoutes` and `registerTenantRoutes`. This is fine functionally, but the definition order (`registerLogRoutes` at Line 586, then `registerTenantRoutes` at Line 595) doesn't match the call order in `SetupRouter` (admin → log → tenant is not the actual call order — tenant is registered earlier, at Line 183). Minor inconsistency in code organization — no action needed unless you want to keep strict call-order ↔ definition-order alignment.

tests/logs_e2e_test.go (1)
22-22: Extract magic numbers into named constants. Several numeric literals are used directly: `60 * time.Second` (Line 22), `5 * time.Second` (Line 50), `2 * time.Second` (Line 62). Consider defining these as named constants at the top of the file for clarity and ease of tuning. As per coding guidelines, "Do not use magic numbers - use named constants instead".
Example constants
```go
const (
	httpClientTimeout = 60 * time.Second
	instanceBootWait  = 5 * time.Second
	logIngestionWait  = 2 * time.Second
)
```

Also applies to: 50-50, 62-62
```go
func registerLogRoutes(r *gin.Engine, handlers *Handlers, svcs *Services) {
	logGroup := r.Group("/logs")
	logGroup.Use(httputil.Auth(svcs.Identity, svcs.Tenant), httputil.RequireTenant(), httputil.TenantMember(svcs.Tenant))
	{
		logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.Search)
		logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.GetByResource)
	}
}
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check if a log-specific permission already exists in the RBAC domain
rg -n 'PermissionLog' --type=go
# Also check what function-related permissions exist
rg -n 'PermissionFunction' --type=go internal/core/domain/
```

Repository: poyrazK/thecloud
🏁 Script executed:
```bash
#!/bin/bash
# Find the log handler implementation
fd -e go handlers | xargs grep -l "func.*Search\|func.*GetByResource" | head -5
# Also check the domain definitions for any log resource types
rg -n 'type.*Log\|ResourceType' --type=go internal/core/domain/ | grep -i log
# Check if there's any documentation about log authorization
fd -e md . | xargs grep -l "log\|permission" | head -5
```
🏁 Script executed:
```bash
#!/bin/bash
# Find files containing Log handler
rg -n 'handlers\.Log|type.*Log.*struct' --type=go internal/api/ | head -20
# Find the handlers definition
fd -e go -path '*/handlers/*' | head -10
# Check the Handlers struct to understand Log field
rg -n 'type Handlers struct' --type=go -A 30
```
🏁 Script executed:
```bash
#!/bin/bash
# Find LogHandler definition
fd -e go | xargs grep -l 'type LogHandler' | head -5
# Find httphandlers package
fd -type d -name httphandlers
# Check what's in the httphandlers directory
ls -la internal/api/handlers/
```
🏁 Script executed:
```bash
#!/bin/bash
# Read the LogHandler file to understand the implementation
wc -l internal/handlers/log_handler.go
# Show the file with line numbers
cat -n internal/handlers/log_handler.go
```
🏁 Script executed:
```bash
#!/bin/bash
# Find LogService definition
fd -e go | xargs grep -l 'type LogService\|SearchLogs' | grep -E '(service|ports)' | head -10
# Check the ports package for LogService interface
fd -e go -path '*/ports/*' | xargs grep -l LogService
```
🏁 Script executed:
```bash
#!/bin/bash
# Read the LogService interface
cat -n internal/core/ports/log.go
# Also get size of cloudlogs.go
wc -l internal/core/services/cloudlogs.go
```
🏁 Script executed:
```bash
#!/bin/bash
# Read the cloudlogs.go service implementation
cat -n internal/core/services/cloudlogs.go
```
Authorization gates function logs behind instance permission, preventing function-only users from accessing their logs.
The /logs endpoints currently require PermissionInstanceRead for all requests. However, the LogHandler is designed to support multiple resource types (it accepts a resource_type query parameter), and the CloudLogsService provides no internal filtering by permission. This means a user with PermissionFunctionRead but not PermissionInstanceRead cannot access logs for their functions—even though the handler and service infrastructure support it.
Per the PR objectives covering both instances and functions, introduce PermissionLogRead (or similar) to authorize log access independently of the instance domain. Alternatively, implement per-resource-type permission checks (e.g., check PermissionFunctionRead when resource_type=function).
🤖 Prompt for AI Agents
In `@internal/api/setup/router.go` around lines 586 - 593, The routes registered
in registerLogRoutes currently gate all log access with
domain.PermissionInstanceRead, blocking users who only have function-level
access; update the authorization to allow log-specific permissions by replacing
or augmenting the middleware on logGroup.GET and logGroup.GET("/:id") so they
use a log-specific permission (e.g., domain.PermissionLogRead) or perform
per-resource-type checks against svcs.RBAC (e.g., if resource_type== "function"
require domain.PermissionFunctionRead, otherwise require
domain.PermissionInstanceRead) before calling handlers.Log.Search and
handlers.Log.GetByResource; ensure you reference registerLogRoutes,
handlers.Log.Search, handlers.Log.GetByResource, svcs.RBAC and
domain.PermissionInstanceRead when making the change so the routes authorize
correctly for both instance and function log access.
```sql
CREATE TABLE IF NOT EXISTS log_entries (
    id UUID PRIMARY KEY,
```
log_entries migration is missing the -- +goose Up header comment that other migrations in this repo include. Even if the current migrator doesn’t parse goose directives, adding it keeps the migration set consistent and easier to manage with standard tooling.
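For reference, a goose-style header for this migration would look like the following. The `-- +goose Up`/`-- +goose Down` directives are standard goose syntax; the column list and the `Down` statement here are illustrative, not copied from the PR:

```sql
-- +goose Up
CREATE TABLE IF NOT EXISTS log_entries (
    id UUID PRIMARY KEY
    -- remaining columns elided
);

-- +goose Down
DROP TABLE IF EXISTS log_entries;
```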
```go
		return nil, 0, errors.Wrap(errors.Internal, "failed to scan log entry", err)
	}
	entries = append(entries, &entry)
}
```
After iterating with rows.Next(), this repository should check rows.Err() before returning. Without it, iteration errors (e.g., network issues mid-stream) can be silently ignored and an incomplete result set returned.
```go
	}
}
if err := rows.Err(); err != nil {
	return nil, 0, errors.Wrap(errors.Internal, "failed to iterate log rows", err)
}
```
```go
func registerLogRoutes(r *gin.Engine, handlers *Handlers, svcs *Services) {
	logGroup := r.Group("/logs")
	logGroup.Use(httputil.Auth(svcs.Identity, svcs.Tenant), httputil.RequireTenant(), httputil.TenantMember(svcs.Tenant))
	{
		logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.Search)
		logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.GetByResource)
	}
```
These routes gate CloudLogs access behind PermissionInstanceRead only. The PR description/docs state CloudLogs supports both instances and functions, and the domain already defines PermissionFunctionRead; as-is, function log access would either be incorrectly denied or incorrectly authorized. Consider introducing a dedicated logs:read permission, or enforcing PermissionFunctionRead when resource_type=function (and PermissionInstanceRead for instances), or splitting endpoints by resource type.
```go
httputil.Success(c, http.StatusOK, gin.H{
	"entries": entries,
	"total":   total,
```
GetByResource returns only entries and total, but the SDK’s LogSearchResponse type also includes limit/offset and the /logs endpoint returns those fields. Either include limit (and offset if applicable) in this response as well, or use a different response shape/type to avoid clients seeing misleading zero-values.
| "total": total, | |
| "total": total, | |
| "limit": limit, | |
| "offset": 0, |
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@docs/api-reference.md`:
- Around line 519-524: Add a blank line immediately before the "Parameters:"
heading/table so the Markdown has a full empty line separating the preceding
paragraph from the table (fixes MD058); specifically insert one empty line above
the line that starts "**Parameters:**" in the snippet containing the table of
`resource_id` and `limit`.
In `@docs/cli-reference.md`:
- Around line 867-876: Add a blank line between the "**Flags**:" heading and the
following table (the pipe-delimited table that lists `--resource-id`,
`--resource-type`, etc.) so the table is separated by an empty line and
satisfies the MD058 markdown lint rule; update the section where the
"**Flags**:" header and the flags table are adjacent to insert that single
newline.
In `@internal/core/services/cloudlogs.go`:
- Around line 3-10: The build is failing because the internal errors package is
not imported while cloudlogs.go uses errors.New and errors.InvalidInput; add the
missing import for the internal errors package (e.g. the same module path used
elsewhere like github.com/poyrazk/thecloud/internal/errors) to the import block
in cloudlogs.go so references to errors.New and errors.InvalidInput compile;
keep the import grouped with the other project imports and run `go build` to
verify.
In `@internal/handlers/log_handler.go`:
- Around line 3-13: The build is failing because the package named "errors" is
used (errors.New, errors.InvalidInput) but not imported in
internal/handlers/log_handler.go; open the import block at the top of that file
and add the missing import for the package that declares errors (the package
whose identifier is used as errors) so the references to errors.New and
errors.InvalidInput resolve; ensure the import uses the package name "errors"
(or an alias) so existing calls in functions like the log handling functions
continue to compile.
In `@internal/repositories/postgres/log_repo.go`:
- Around line 95-98: Replace the magic number 100 used as the fallback for
query.Limit with a named constant (e.g., defaultQueryLimit or DefaultLimit)
declared at the top of the package; update the code in the function that sets
limit (the limit := query.Limit block) to use that constant instead of 100 and
ensure the constant has a clear name and brief comment so callers/readers
understand it represents the default result limit for the Postgres repository.
- Around line 22-44: LogRepository.Create currently builds one huge INSERT using
8 parameters per entry which can exceed PostgreSQL's 65,535 parameter limit;
modify LogRepository.Create to chunk the entries into safe-sized batches
(maxEntriesPerBatch = floor(65535 / 8) == 8191) and execute the INSERT per
chunk. Implement a loop that slices entries into chunks of at most
maxEntriesPerBatch, constructs placeholders and values for each chunk (same
logic currently used with fmt.Sprintf and values append), calls r.db.Exec(ctx,
query, values...) for each chunk, and returns the first non-nil wrapped error
(using errors.Wrap(errors.Internal, ...)) if any execution fails; keep existing
behavior for empty input and preserve the same query/placeholder format and
error messages.
In `@internal/workers/log_worker.go`:
- Around line 21-28: The NewLogWorker constructor currently hardcodes magic
numbers for interval and retentionDays; update it to either (1) extract
package-level named constants (e.g., DefaultLogWorkerInterval,
DefaultRetentionDays) and use those instead of 24 * time.Hour and 30, or (2)
preferably add parameters to NewLogWorker (e.g., interval time.Duration,
retentionDays int) and set LogWorker.interval and LogWorker.retentionDays from
those arguments while falling back to the named defaults when zero-valued;
reference the NewLogWorker function and the LogWorker.interval and
LogWorker.retentionDays fields to locate where to apply the change for
testability and configurability.
In `@pkg/sdk/logs.go`:
- Around line 3-8: The file is missing the "context" import causing undefined:
context errors in SearchLogs and GetLogsByResource; add the context package to
the import block (alongside fmt, net/url, strconv, time) so the function
signatures that use context.Context (SearchLogs and GetLogsByResource) compile
correctly.
🧹 Nitpick comments (4)
internal/core/services/cloudlogs_unit_test.go (4)
19-83: Shared mock across subtests risks expectation leakage and cascading failures.
`mockRepo` is created once (line 20) and shared across all subtests with `AssertExpectations` called at the parent level (line 82). If a subtest fails before consuming its `.Once()` expectation, subsequent subtests and the final assertion can produce misleading failures. Prefer creating a fresh mock per subtest.

Additionally, per coding guidelines, consider using table-driven tests where feasible (e.g., for the simpler cases like "success with entries", "empty entries", and "preserve existing trace id").
♻️ Suggested pattern: fresh mock per subtest
```diff
 func TestCloudLogsService_IngestLogs_Unit(t *testing.T) {
-	mockRepo := new(MockLogRepository)
-	svc := services.NewCloudLogsService(mockRepo, slog.Default())
-	ctx := context.Background()
-
 	t.Run("success with entries", func(t *testing.T) {
+		mockRepo := new(MockLogRepository)
+		svc := services.NewCloudLogsService(mockRepo, slog.Default())
+		ctx := context.Background()
 		entries := []*domain.LogEntry{
 			{ID: uuid.New(), Message: "test 1"},
 		}
 		mockRepo.On("Create", mock.Anything, entries).Return(nil).Once()
 		err := svc.IngestLogs(ctx, entries)
 		assert.NoError(t, err)
+		mockRepo.AssertExpectations(t)
 	})

 	// ... repeat for each subtest ...
-
-	mockRepo.AssertExpectations(t)
 }
```

As per coding guidelines, "Use table-driven tests in test files".
85-100: Consider adding error and edge-case scenarios as table-driven tests. This only covers the happy path; adding cases for repository errors and empty results in a table-driven format would improve coverage and align with guidelines.
As per coding guidelines, "Use table-driven tests in test files".
102-112: Magic number `30` — use a named constant. The value `30` appears on lines 107 and 109 without explanation. Define a constant (e.g., `defaultRetentionDays`) to clarify intent and comply with guidelines.

♻️ Suggested fix
```diff
+const defaultRetentionDays = 30
+
 func TestCloudLogsService_RunRetentionPolicy_Unit(t *testing.T) {
 	mockRepo := new(MockLogRepository)
 	svc := services.NewCloudLogsService(mockRepo, slog.Default())
 	ctx := context.Background()

-	mockRepo.On("DeleteByAge", mock.Anything, 30).Return(nil)
+	mockRepo.On("DeleteByAge", mock.Anything, defaultRetentionDays).Return(nil)

-	err := svc.RunRetentionPolicy(ctx, 30)
+	err := svc.RunRetentionPolicy(ctx, defaultRetentionDays)
 	assert.NoError(t, err)
 	mockRepo.AssertExpectations(t)
 }
```

As per coding guidelines, "Do not use magic numbers - use named constants instead".
48-50: Error from `resource.New` is silently discarded. Line 49 ignores the error return. Even in tests, this can mask setup failures leading to confusing test results. Use `require.NoError` to fail fast if resource creation fails unexpectedly.

♻️ Suggested fix

```diff
-res, _ := resource.New(ctx, resource.WithAttributes(semconv.ServiceNameKey.String("test")))
+res, err := resource.New(ctx, resource.WithAttributes(semconv.ServiceNameKey.String("test")))
+require.NoError(t, err)
```

As per coding guidelines, "Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`".
**Parameters:**
| Name | In | Type | Required | Description |
|------|----|------|----------|-------------|
| `resource_id` | path | string | Yes | The UUID of the resource (instance or function) |
| `limit` | query | integer | No | Max logs to return |
Add a blank line before the parameters table.
Missing blank line before the table on line 520 triggers MD058.
Proposed fix
```diff
 **Parameters:**
+
 | Name | In | Type | Required | Description |
```

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 520-520: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🤖 Prompt for AI Agents
In `@docs/api-reference.md` around lines 519 - 524, Add a blank line immediately
before the "Parameters:" heading/table so the Markdown has a full empty line
separating the preceding paragraph from the table (fixes MD058); specifically
insert one empty line above the line that starts "**Parameters:**" in the
snippet containing the table of `resource_id` and `limit`.
**Flags**:
| Flag | Default | Description |
|------|---------|-------------|
| `--resource-id` | - | Filter by resource UUID |
| `--resource-type` | - | Type (`instance`, `function`) |
| `--level` | - | Severity (`INFO`, `WARN`, `ERROR`) |
| `--query` | - | Keyword search in message |
| `--limit` | `100` | Max logs to show |
| `--offset` | `0` | Pagination offset |
Add a blank line before the flags table to fix markdown lint warning.
The table on line 868 immediately follows the **Flags**: line without a blank line separator, which triggers MD058 (tables should be surrounded by blank lines).
Proposed fix
```diff
 **Flags**:
+
 | Flag | Default | Description |
```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 868-868: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🤖 Prompt for AI Agents
In `@docs/cli-reference.md` around lines 867 - 876, Add a blank line between the
"**Flags**:" heading and the following table (the pipe-delimited table that
lists `--resource-id`, `--resource-type`, etc.) so the table is separated by an
empty line and satisfies the MD058 markdown lint rule; update the section where
the "**Flags**:" header and the flags table are adjacent to insert that single
newline.
```go
func (r *LogRepository) Create(ctx context.Context, entries []*domain.LogEntry) error {
	if len(entries) == 0 {
		return nil
	}

	query := `INSERT INTO log_entries (id, tenant_id, resource_id, resource_type, level, message, timestamp, trace_id) VALUES `
	values := []interface{}{}
	placeholders := []string{}

	for i, entry := range entries {
		offset := i * 8
		placeholders = append(placeholders, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d, $%d, $%d)",
			offset+1, offset+2, offset+3, offset+4, offset+5, offset+6, offset+7, offset+8))
		values = append(values, entry.ID, entry.TenantID, entry.ResourceID, entry.ResourceType, entry.Level, entry.Message, entry.Timestamp, entry.TraceID)
	}

	query += strings.Join(placeholders, ",")
	_, err := r.db.Exec(ctx, query, values...)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to batch create log entries", err)
	}
	return nil
}
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# First, check if there's any chunking or size limiting upstream before Create is called
echo "=== Searching for Create() calls with chunking/limiting context ==="
rg -n 'Create\(' --type=go -C 5 -g '!*_test.go' | head -100
```
🏁 Script executed:
```bash
#!/bin/bash
# Check the LogRepository interface definition
echo "=== Looking for LogRepository interface and Port definitions ==="
rg -n 'type LogRepository' --type=go -A 10
rg -n 'interface.*Log' --type=go -A 5
```
🏁 Script executed:
```bash
#!/bin/bash
# Verify PostgreSQL parameter limit fact
echo "=== Checking PostgreSQL documentation references in code ==="
rg -n '65535|parameter.*limit|16384' --type=go
```
🏁 Script executed:
```bash
#!/bin/bash
# Check the full CloudLogsService.Log() implementation
echo "=== CloudLogsService.Log() implementation ==="
rg -n 'func.*Log' internal/core/services/cloudlogs.go -A 15 | head -50
```
🏁 Script executed:
```bash
#!/bin/bash
# Check where entries come from - look at callers of CloudLogsService.Log
echo "=== Who calls CloudLogsService.Log() ==="
rg -n 'auditSvc\.Log\|logs\.Log\|logService\.Log' --type=go -B 2 -A 2 | head -80
```
🌐 Web query:
PostgreSQL maximum number of parameters per query limit
💡 Result:
PostgreSQL’s hard limit for query parameters (bind parameters like $1, $2, … in prepared/extended-protocol statements) is 65,535. This limit exists because the frontend/backend protocol encodes the parameter count as a 16‑bit unsigned integer. [1], [2]
(Separate limit: function arguments default to 100, which is different from query parameters.) [1]
Add batch size validation to prevent PostgreSQL parameter overflow.
PostgreSQL enforces a hard limit of 65,535 parameters per query. With 8 parameters per log entry, Create() will fail at runtime for batches exceeding ~8,191 entries. Neither CloudLogsService.IngestLogs() nor LogRepository.Create() currently enforces any batch size limit.
Implement chunking in LogRepository.Create() to split large batches, or document this constraint in the interface if batch sizes are guaranteed to be small upstream.
🤖 Prompt for AI Agents
In `@internal/repositories/postgres/log_repo.go` around lines 22 - 44,
LogRepository.Create currently builds one huge INSERT using 8 parameters per
entry which can exceed PostgreSQL's 65,535 parameter limit; modify
LogRepository.Create to chunk the entries into safe-sized batches
(maxEntriesPerBatch = floor(65535 / 8) == 8191) and execute the INSERT per
chunk. Implement a loop that slices entries into chunks of at most
maxEntriesPerBatch, constructs placeholders and values for each chunk (same
logic currently used with fmt.Sprintf and values append), calls r.db.Exec(ctx,
query, values...) for each chunk, and returns the first non-nil wrapped error
(using errors.Wrap(errors.Internal, ...)) if any execution fails; keep existing
behavior for empty input and preserve the same query/placeholder format and
error messages.
```go
limit := query.Limit
if limit <= 0 {
	limit = 100
}
```
🛠️ Refactor suggestion | 🟠 Major
Magic number 100 for default limit — use a named constant.
Per coding guidelines, magic numbers should be replaced with named constants.
Proposed fix
```diff
+const DefaultLogLimit = 100
+
 ...
 limit := query.Limit
 if limit <= 0 {
-	limit = 100
+	limit = DefaultLogLimit
 }
```
🤖 Prompt for AI Agents
In `@internal/repositories/postgres/log_repo.go` around lines 95 - 98, Replace the
magic number 100 used as the fallback for query.Limit with a named constant
(e.g., defaultQueryLimit or DefaultLimit) declared at the top of the package;
update the code in the function that sets limit (the limit := query.Limit block)
to use that constant instead of 100 and ensure the constant has a clear name and
brief comment so callers/readers understand it represents the default result
limit for the Postgres repository.
```go
func NewLogWorker(logSvc ports.LogService, logger *slog.Logger) *LogWorker {
	return &LogWorker{
		logSvc:        logSvc,
		logger:        logger,
		interval:      24 * time.Hour, // Run retention once a day
		retentionDays: 30,             // Default 30 days retention
	}
}
```
🛠️ Refactor suggestion | 🟠 Major
Magic numbers for interval and retentionDays — use named constants or accept as constructor parameters.
Lines 25-26 hardcode 24 * time.Hour and 30 without named constants, and these values aren't configurable. At minimum, extract them as package-level constants. Ideally, accept them as constructor parameters for testability and operational flexibility.
Proposed fix (named constants + configurable constructor)
```diff
+const (
+	DefaultRetentionInterval = 24 * time.Hour
+	DefaultRetentionDays     = 30
+)
+
 func NewLogWorker(logSvc ports.LogService, logger *slog.Logger) *LogWorker {
 	return &LogWorker{
 		logSvc: logSvc,
 		logger: logger,
-		interval: 24 * time.Hour,
-		retentionDays: 30,
+		interval: DefaultRetentionInterval,
+		retentionDays: DefaultRetentionDays,
 	}
 }
```

As per coding guidelines, "Do not use magic numbers - use named constants instead".
🤖 Prompt for AI Agents
In `@internal/workers/log_worker.go` around lines 21 - 28, The NewLogWorker
constructor currently hardcodes magic numbers for interval and retentionDays;
update it to either (1) extract package-level named constants (e.g.,
DefaultLogWorkerInterval, DefaultRetentionDays) and use those instead of 24 *
time.Hour and 30, or (2) preferably add parameters to NewLogWorker (e.g.,
interval time.Duration, retentionDays int) and set LogWorker.interval and
LogWorker.retentionDays from those arguments while falling back to the named
defaults when zero-valued; reference the NewLogWorker function and the
LogWorker.interval and LogWorker.retentionDays fields to locate where to apply
the change for testability and configurability.
Pull request overview
Copilot reviewed 32 out of 32 changed files in this pull request and generated 7 comments.
```go
instanceName := fmt.Sprintf("e2e-logs-%d", time.Now().UnixNano()%1000)

// 1. Launch Instance
t.Run("LaunchInstance", func(t *testing.T) {
	payload := map[string]interface{}{
		"name":  instanceName,
		"image": "alpine",
		"cmd":   []string{"sh", "-c", "echo 'hello-cloudlogs-e2e'"},
	}
	resp := postRequest(t, client, testutil.TestBaseURL+testutil.TestRouteInstances, token, payload)
	defer func() { _ = resp.Body.Close() }()

	require.Equal(t, http.StatusAccepted, resp.StatusCode)

	var res struct {
		Data struct {
			ID string `json:"id"`
		} `json:"data"`
	}
	require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
	instanceID = res.Data.ID
})

// 2. Wait for it to finish/run (best effort)
time.Sleep(5 * time.Second)
```
This E2E test relies on fixed time.Sleep delays and uses UnixNano()%1000 for the instance name, which can make runs flaky and potentially collide across parallel executions. Prefer a unique identifier (e.g., UUID) and polling with a timeout for the expected state/log availability instead of fixed sleeps.
- [x] Implement a `CloudLogs` service following the hexagonal architecture.
- [x] Support log ingestion from `InstanceService` and `FunctionService`.
- [x] Provide a persistent storage backend (PostgreSQL initially, with a port for Loki).
- [x] Implement API endpoints for searching and filtering logs by `ResourceID`, `TenantID`, and `TimeRange`.
- [x] Add log rotation/retention policy (e.g., auto-delete logs older than X days).
PR description/docs claim log ingestion is supported from FunctionService, but in the current code changes ingestion is only wired from InstanceService.TerminateInstance (no IngestLogs call from FunctionService was found). Either update the implementation to ingest function logs too, or adjust the PR description/docs to match the current scope.
```sql
);

CREATE INDEX idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
```
The query path in LogRepository.List always filters by tenant_id, but the time index here is only on timestamp. For multi-tenant workloads, consider adding a composite index like (tenant_id, timestamp DESC) (and potentially keep/remove the single-column timestamp index based on query plans) to avoid scanning across all tenants for time-range queries.
Suggested change:

```sql
CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
CREATE INDEX idx_logs_tenant_timestamp_desc ON log_entries(tenant_id, timestamp DESC);
```
```go
limit, _ := strconv.Atoi(c.DefaultQuery("limit", "100"))
query := domain.LogQuery{
	TenantID:   tenantID,
	ResourceID: id,
	Limit:      limit,
```
limit parsing errors are ignored (limit, _ := strconv.Atoi(...)), so invalid values quietly become 0 and change behavior (repo will default to 100). This should return an InvalidInput error (and also reject negative/zero limits) to match the stricter validation used in Search.
```go
// Ingest logs before termination if LogService is available
if s.logSvc != nil && inst.ContainerID != "" {
	logs, err := s.GetInstanceLogs(ctx, idOrName)
	if err == nil && logs != "" {
		lines := strings.Split(logs, "\n")
		entries := make([]*domain.LogEntry, 0, len(lines))
		for _, line := range lines {
			if strings.TrimSpace(line) == "" {
				continue
			}
			entries = append(entries, &domain.LogEntry{
				ID:           uuid.New(),
				TenantID:     inst.TenantID,
				ResourceID:   inst.ID.String(),
				ResourceType: "instance",
				Level:        "INFO",
				Message:      line,
				Timestamp:    time.Now(),
			})
		}
		if len(entries) > 0 {
			if ingestErr := s.logSvc.IngestLogs(ctx, entries); ingestErr != nil {
				s.logger.Warn("failed to ingest logs during termination", "instance_id", inst.ID, "error", ingestErr)
			}
		}
	}
}
```
This ingests logs by splitting the full container output into one DB row per line and then inserting them as a single huge INSERT ... VALUES (...) statement. For chatty containers this can create very large allocations and can exceed PostgreSQL’s parameter limit (8 params/row) or max query size, potentially breaking instance termination. Consider chunking inserts (e.g., batches of N lines) and/or using pgx CopyFrom/SendBatch, and consider capping how many lines/bytes are persisted per termination.
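The capping and batching can be sketched as a small helper. The name `capAndChunk` and the specific limits (1000 lines, batches of 500) are illustrative choices, not values from the PR; with 8 parameters per row, a batch of 500 uses 4000 placeholders, comfortably under PostgreSQL's 65535-parameter ceiling:

```go
package main

import "fmt"

// capAndChunk bounds how many log lines are persisted per termination and
// splits the survivors into fixed-size batches so each INSERT stays small.
// When over the cap, it keeps the most recent lines (the tail).
func capAndChunk(lines []string, maxLines, batchSize int) [][]string {
	if len(lines) > maxLines {
		lines = lines[len(lines)-maxLines:]
	}
	var batches [][]string
	for start := 0; start < len(lines); start += batchSize {
		end := start + batchSize
		if end > len(lines) {
			end = len(lines)
		}
		batches = append(batches, lines[start:end])
	}
	return batches
}

func main() {
	lines := make([]string, 1200)
	batches := capAndChunk(lines, 1000, 500)
	fmt.Println(len(batches))                     // 2
	fmt.Println(len(batches[0]), len(batches[1])) // 500 500
}
```

Each batch would then go through one repository `Create` call (or, with pgx, a `CopyFrom`/`SendBatch`), so a chatty container can no longer produce a single oversized statement that fails the whole termination.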
```go
	WillReturnResult(pgxmock.NewResult("INSERT", 1))

err = repo.Create(context.Background(), entries)
assert.NoError(t, err)
```
This test sets an ExpectExec on the mock but never calls mock.ExpectationsWereMet(). Without that assertion, the test can pass even if Create never executed the expected query. Consider adding assert.NoError(t, mock.ExpectationsWereMet()) at the end of the successful batch case (similar to the other tests in this file).
Suggested change:

```go
assert.NoError(t, err)
assert.NoError(t, mock.ExpectationsWereMet())
```
```go
// Pagination
limitStr := c.DefaultQuery("limit", "100")
limit, err := strconv.Atoi(limitStr)
if err != nil {
	httputil.Error(c, errors.New(errors.InvalidInput, "invalid limit; must be a number"))
	return
}

offsetStr := c.DefaultQuery("offset", "0")
offset, err := strconv.Atoi(offsetStr)
if err != nil {
	httputil.Error(c, errors.New(errors.InvalidInput, "invalid offset; must be a number"))
	return
}

query.Limit = limit
query.Offset = offset
```
limit and offset accept negative (or zero) values. When limit <= 0, the repository silently defaults to 100, but the handler returns the user-supplied value in the response (query.Limit), which can misrepresent what was actually applied. Validate limit (>=1, maybe with a max) and offset (>=0) and normalize query.Limit/query.Offset to the effective values before calling the service and building the response.
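The normalization step might look like this sketch (the helper name and the 1000 cap are assumptions; the default of 100 matches the repository fallback mentioned above):

```go
package main

import "fmt"

// normalizePagination clamps limit and offset to the values the repository
// will actually apply, so the handler's response echoes the effective
// pagination rather than the raw user input.
func normalizePagination(limit, offset int) (int, int) {
	const (
		defaultLimit = 100  // repository default when limit <= 0
		maxLimit     = 1000 // assumed upper bound
	)
	if limit < 1 {
		limit = defaultLimit
	} else if limit > maxLimit {
		limit = maxLimit
	}
	if offset < 0 {
		offset = 0
	}
	return limit, offset
}

func main() {
	fmt.Println(normalizePagination(-5, -1))  // clamped to defaults
	fmt.Println(normalizePagination(5000, 20)) // limit capped
}
```

Calling this before building `domain.LogQuery` keeps `query.Limit`/`query.Offset` in the response truthful; alternatively, out-of-range values could be rejected outright with an `InvalidInput` error, matching the stricter option in the comment above.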
This PR introduces the CloudLogs Service, providing persistent logging for platform resources (instances and functions).