
feat: Implement CloudLogs Service#60

Merged
poyrazK merged 26 commits into main from feature/cloudlogs
Feb 15, 2026

Conversation


@poyrazK poyrazK commented Feb 14, 2026

This PR introduces the CloudLogs Service, providing persistent logging for platform resources (instances and functions).

Key Features:

  • Persistent Storage: Logs are automatically ingested into PostgreSQL upon instance termination.
  • Advanced Search: Filter logs by Resource ID, Type, Level, Time Range, and Keywords.
  • Trace Correlation: Integrated with OpenTelemetry/Jaeger via TraceID extraction.
  • Background Retention: Automated daily cleanup of logs older than 30 days.
  • SDK & CLI: Full support for programmatic access and command-line log viewing.

Components:

  • internal/core/services
  • internal/repositories/postgres
  • internal/handlers
  • internal/workers

Coverage:

  • Unit Tests: 98%+ coverage for service, handler, and worker layers.
  • E2E Tests: Validated full lifecycle from launch to log retrieval.

Summary by CodeRabbit

  • New Features

    • CloudLogs: persistent, searchable logs with filtering, traceID correlation, and automatic capture of instance logs on termination
    • API: GET /logs and GET /logs/{id}
    • CLI: logs search and logs show
    • Background retention worker enforcing periodic cleanup (default 30 days)
  • Documentation

    • Added product, API, CLI, implementation and OpenAPI/Swagger docs for CloudLogs
  • Tests

    • Added unit, integration, and e2e tests for ingestion, search, retention, handlers, repo, and worker

Copilot AI review requested due to automatic review settings February 14, 2026 18:12

coderabbitai Bot commented Feb 14, 2026

Warning

Rate limit exceeded

@poyrazK has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Adds a CloudLogs subsystem: domain types and ports, PostgreSQL-backed repository and migrations, CloudLogs service (ingest/search/retention) with trace propagation, HTTP handlers and SDK/CLI commands, a background LogWorker for retention, InstanceService ingestion wiring, DI/router changes, and tests (unit, repo, worker, handler, e2e), plus API docs and Swagger updates.

Changes

Cohort / File(s) Summary
Domain & Ports
internal/core/domain/log.go, internal/core/ports/log.go
New LogEntry and LogQuery domain types and LogRepository/LogService interfaces defining persistence and business contracts.
Core Service & Tests
internal/core/services/cloudlogs.go, internal/core/services/cloudlogs_test.go, internal/core/services/cloudlogs_unit_test.go, internal/core/services/shared_test.go
Adds CloudLogsService implementing ingest/search/retention with trace extraction/propagation; integration and unit tests plus test mocks.
Repository, Migrations & Repo Tests
internal/repositories/postgres/log_repo.go, internal/repositories/postgres/log_repo_test.go, internal/repositories/postgres/migrations/089_create_log_entries_table.*.sql
Postgres LogRepository with batch INSERT, filtered LIST (count + pagination), and DeleteByAge; SQL migration to create log_entries; pgxmock-backed repo tests.
HTTP Handlers & Tests
internal/handlers/log_handler.go, internal/handlers/log_handler_test.go, internal/api/setup/router.go
New LogHandler with Search and GetByResource endpoints (RFC3339 time parsing, pagination); router wiring and handler tests added.
Workers & Wiring
internal/workers/log_worker.go, internal/workers/log_worker_test.go, internal/api/setup/dependencies.go, cmd/api/main.go
New LogWorker (default 24h interval, 30d retention); DI updated to include Log repo/service/worker and main now starts the Log worker.
InstanceService Integration
internal/core/services/instance.go
Instance termination optionally fetches container logs and calls LogService.IngestLogs (non-blocking on ingestion failures).
CLI & SDK
cmd/cloud/logs.go, cmd/cloud/main.go, pkg/sdk/logs.go
Adds CLI commands logs search and logs show; SDK types and client methods SearchLogs and GetLogsByResource.
Client Context Change
pkg/sdk/client.go
Public client HTTP methods now accept context.Context and propagate it to Resty requests.
Docs & API Spec
docs/*, docs/swagger/*, docs/plans/cloudlogs-implementation.md, docs/prds/cloudlogs.md, docs/cli-reference.md
Adds CloudLogs docs, PRD, implementation plan, CLI docs and API reference for /logs and /logs/{id}; docs/cli-reference.md contains a duplicated CloudLogs section/formatting issue.
E2E & Test infra
tests/logs_e2e_test.go, internal/core/services/setup_test.go
E2E test for instance lifecycle and log retrieval; test DB cleanup now clears log_entries.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler as LogHandler
    participant Service as CloudLogsService
    participant Repo as LogRepository
    participant DB as PostgreSQL

    Client->>Handler: GET /logs?resource_type=instance
    Handler->>Handler: parse params (time, pagination)
    Handler->>Service: SearchLogs(LogQuery)
    Service->>Repo: List(ctx, query)
    Repo->>DB: COUNT / SELECT ... WHERE ... ORDER BY timestamp DESC LIMIT/OFFSET
    DB-->>Repo: rows + total
    Repo-->>Service: []*LogEntry, total
    Service-->>Handler: entries, total
    Handler-->>Client: 200 {data: {entries, total, limit, offset}}
sequenceDiagram
    participant Instance as InstanceService
    participant Container as ContainerRuntime
    participant LogSvc as CloudLogsService
    participant Repo as LogRepository
    participant DB as PostgreSQL

    Instance->>Container: fetch stdout/stderr logs
    Container-->>Instance: raw log lines
    Instance->>LogSvc: IngestLogs(entries)
    LogSvc->>LogSvc: extract/propagate TraceID from ctx
    LogSvc->>Repo: Create(ctx, entries)
    Repo->>DB: INSERT INTO log_entries (...)
    DB-->>Repo: success
    Repo-->>LogSvc: nil
    LogSvc-->>Instance: nil
    Instance->>Container: remove container
sequenceDiagram
    participant Worker as LogWorker
    participant Service as CloudLogsService
    participant Repo as LogRepository
    participant DB as PostgreSQL
    participant Timer as Ticker

    Worker->>Service: RunRetentionPolicy(30 days)
    Service->>Repo: DeleteByAge(ctx, 30)
    Repo->>DB: DELETE FROM log_entries WHERE timestamp < NOW() - INTERVAL '30 days'
    DB-->>Repo: rows deleted
    Repo-->>Service: nil
    Service-->>Worker: nil
    Worker->>Timer: wait 24h
    Timer-->>Worker: tick -> RunRetentionPolicy(...)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I munched through traces and stitched each line,
I hopped through timestamps, made every search shine,
A worker keeps watch through night and day,
Old logs swept kindly far away,
CloudLogs hums — rabbit-approved, hooray! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 26.47%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Merge Conflict Detection: ⚠️ Warning. Merge conflicts detected (14 files):

⚔️ cmd/api/main.go (content)
⚔️ cmd/cloud/main.go (content)
⚔️ docs/FEATURES.md (content)
⚔️ docs/api-reference.md (content)
⚔️ docs/cli-reference.md (content)
⚔️ docs/swagger/docs.go (content)
⚔️ docs/swagger/swagger.json (content)
⚔️ docs/swagger/swagger.yaml (content)
⚔️ internal/api/setup/dependencies.go (content)
⚔️ internal/api/setup/router.go (content)
⚔️ internal/core/services/instance.go (content)
⚔️ internal/core/services/setup_test.go (content)
⚔️ internal/core/services/shared_test.go (content)
⚔️ pkg/sdk/client.go (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title 'feat: Implement CloudLogs Service' accurately and specifically summarizes the main objective of the changeset: introducing a CloudLogs service with persistent logging, search, trace correlation, and retention features.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 14

🤖 Fix all issues with AI agents
In `@cmd/cloud/logs.go`:
- Around line 46-49: The json.MarshalIndent call in the outputJSON branch
currently discards its error (data, _ := json.MarshalIndent(res, "", "  "));
change this to capture and check the error, and on failure write an error
message to stderr (including the error detail) and exit non‑zero or return an
error so the CLI doesn't silently continue; do the same fix for the identical
pattern in logsShowCmd so both json.MarshalIndent usages validate errors and
report them (refer to the outputJSON branch, the json.MarshalIndent call, and
logsShowCmd).
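
The requested fix can be sketched as below. `printJSON` is a hypothetical helper name for illustration; the actual output branch in cmd/cloud/logs.go may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// printJSON renders v as indented JSON and surfaces marshal failures
// instead of silently printing nothing. Hypothetical helper; the real
// CLI branch lives in cmd/cloud/logs.go.
func printJSON(v interface{}) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return fmt.Errorf("failed to render JSON output: %w", err)
	}
	fmt.Println(string(data))
	return nil
}

func main() {
	if err := printJSON(map[string]int{"total": 2}); err != nil {
		// Report the error to stderr and exit non-zero rather than
		// continuing silently.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```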

In `@docs/api-reference.md`:
- Around line 516-517: Update the API reference route from "GET /logs/:id" to
"GET /logs/:resource_id" and change any accompanying parameter names and
examples to match (e.g., path templates, curl examples, request/response
samples, and the parameters table). Ensure the documented path parameter name
matches the implementation plan and handler semantics so all references to ":id"
in this endpoint are replaced with ":resource_id" for consistency.

In `@docs/cli-reference.md`:
- Line 122: Fix the typo in the CLI reference table: change the mismatched
quoted entry `--volume"` to the correct backticked flag `--volume` in the table
row (the cell currently showing `--volume"` | `-V` | No | - | Volume attachment
(vol-name:/path) ), ensuring the backticks correctly wrap the flag name.

In `@docs/plans/cloudlogs-implementation.md`:
- Around line 37-40: Update Phase 6 checklist to reflect the tests added in the
PR: mark "Unit tests for `LogService`" as done (checked), and if unit tests for
the handler and worker were also added, add and check items like "Unit tests for
`LogHandler`" and "Unit tests for `LogWorker`"; leave "Integration tests for
`LogRepository`" and the smoke test unchecked unless those were actually
implemented, in which case check them too so the plan matches reality.

In `@internal/api/setup/router.go`:
- Line 68: The Handlers struct's Log field is never initialized and no /logs
routes are registered; add initialization of handlers.Log using
NewLogHandler(svcs.Log) inside InitHandlers where other handlers are created,
and then register a top-level log group in SetupRouter (e.g., r.Group("/logs"))
that applies the same middleware (httputil.Auth(svcs.Identity, svcs.Tenant),
httputil.RequireTenant()) and wires handlers.Log.Search to GET "" and
handlers.Log.GetByResource to GET "/:id" so the LogHandler is both non-nil and
reachable at runtime.

In `@internal/core/services/cloudlogs_test.go`:
- Line 66: The test currently discards errors from svc.IngestLogs by assigning
to the blank identifier; update both calls to capture the returned error (from
svc.IngestLogs(ctx, []*domain.LogEntry{...})) and assert success instead of
ignoring it—e.g., replace `_ = svc.IngestLogs(...)` with `err :=
svc.IngestLogs(...)` followed by a test assertion like `require.NoError(t, err)`
(or t.Fatalf on error) to fail fast when setup fails; do this for both
occurrences of svc.IngestLogs in the test.

In `@internal/core/services/cloudlogs.go`:
- Around line 52-54: In CloudLogsService.RunRetentionPolicy validate the days
parameter and return a service-layer InvalidInput error instead of calling the
repo when days is non-positive: check if days <= 0, and if so return
internal/errors.New(internal/errors.InvalidInput, "invalid retention days; must
be > 0"); otherwise proceed to log and call s.repo.DeleteByAge(ctx, days).
Ensure you reference RunRetentionPolicy and s.repo.DeleteByAge in the change.

In `@internal/handlers/log_handler.go`:
- Around line 66-69: The code silently ignores strconv.Atoi errors for
pagination (limit, offset) which masks bad input; update the handler to check
the errors returned by strconv.Atoi when parsing c.DefaultQuery("limit", "100")
and c.DefaultQuery("offset", "0") and return a 400 Bad Request with a clear
error message if either value is non-numeric instead of assigning to the blank
identifier. Specifically, after parsing into limit and offset, if err != nil for
either parse, use the request context (the same c) to abort/respond with status
400 and a JSON or text error explaining which parameter is invalid; otherwise
assign the validated limit and offset to query.Limit and query.Offset as before.
- Around line 52-63: The handler currently swallows time.Parse errors when
reading c.Query("start_time") and c.Query("end_time") and silently ignores
invalid filters; change the logic to validate parsing and return a 400 Bad
Request with a clear error message when time.Parse(time.RFC3339, ...) fails.
Specifically, inside the same handler that reads c.Query use the returned err
from time.Parse and if non-nil call the request context's error response (e.g.,
abort/return 400 JSON) rather than setting nothing; continue to set
query.StartTime and query.EndTime only on successful parse. Ensure the response
includes which parameter ("start_time" or "end_time") was invalid and the parse
error text so callers know why the request was rejected.
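
Both validations can be factored into small helpers like the following sketch. The names `parseTimeParam` and `parseIntParam` are hypothetical; in the real Gin handler their errors would feed a 400 response (e.g. via c.JSON) rather than being returned to a caller.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseTimeParam parses an optional RFC3339 query value, naming the
// invalid parameter instead of silently dropping the filter.
func parseTimeParam(name, value string) (*time.Time, error) {
	if value == "" {
		return nil, nil // absent filter is fine
	}
	t, err := time.Parse(time.RFC3339, value)
	if err != nil {
		return nil, fmt.Errorf("invalid %s: %v", name, err)
	}
	return &t, nil
}

// parseIntParam parses a pagination value, rejecting non-numeric input
// rather than falling back to a zero value.
func parseIntParam(name, value string) (int, error) {
	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid %s: must be an integer", name)
	}
	return n, nil
}

func main() {
	if _, err := parseTimeParam("start_time", "not-a-time"); err != nil {
		fmt.Println("would respond 400:", err)
	}
}
```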

In `@internal/repositories/postgres/log_repo.go`:
- Around line 114-124: After the rows.Next() loop in the function that builds
entries (using domain.LogEntry), add a check for rows.Err() immediately before
returning: if err := rows.Err(); err != nil { return nil, total,
errors.Wrap(errors.Internal, "error iterating log rows", err) } so iteration
errors (network/scan failures) are propagated instead of returning partial
entries; place this check after the loop and before the final return of entries,
total, nil and use the same errors.Wrap pattern used elsewhere.

In `@internal/workers/log_worker.go`:
- Line 35: The initial call to w.logSvc.RunRetentionPolicy(ctx, w.retentionDays)
currently discards errors; update the startup call in log_worker.go to capture
the returned error and handle it the same way as the tick handler does (check
the error and call the logger to record it), e.g. assign the result to err and
log a descriptive message if err != nil so that failures from
w.logSvc.RunRetentionPolicy and the ctx/w.retentionDays invocation are not
silently ignored.

In `@pkg/sdk/logs.go`:
- Around line 43-44: Update the SDK methods to accept context.Context as their
first parameter: change func (c *Client) SearchLogs(query LogQuery) and func (c
*Client) GetLogsByResource(...) to func (c *Client) SearchLogs(ctx
context.Context, query LogQuery) and similarly for GetLogsByResource; then
update and propagate the context into the underlying HTTP helper by changing the
c.get signature in client.go to accept ctx (e.g., c.get(ctx, ...)) and pass ctx
through every call site so the HTTP request uses the provided context for
cancellation, timeouts, and deadline propagation.

In `@tests/logs_e2e_test.go`:
- Around line 46-48: The test suite relies on a shared variable instanceID set
in the LaunchInstance subtest; add a guard to fail fast when instanceID is empty
by calling require.NotEmpty(t, instanceID, "depends on LaunchInstance") at the
start of each dependent subtest (TerminateInstance, VerifyHistoricalLogs) so
they don't call incorrect endpoints when LaunchInstance fails or is skipped;
ensure this check references the shared instanceID and is placed before any HTTP
calls that use instanceID.
- Line 10: Remove the unused import "github.com/google/uuid" from the test file
to fix the compilation error; locate the import block that includes the uuid
package (the string "github.com/google/uuid") and delete that entry (or replace
with a used alternative) so the package is not imported without being
referenced.
🧹 Nitpick comments (14)
pkg/sdk/logs.go (1)

83-87: resourceID is not URL-path-encoded.

If resourceID ever contains characters that are meaningful in URL paths (e.g., /, ?), the constructed path will be malformed. Consider using url.PathEscape(resourceID).

Proposed fix
-	path := fmt.Sprintf("/logs/%s", resourceID)
+	path := fmt.Sprintf("/logs/%s", url.PathEscape(resourceID))
docs/cli-reference.md (1)

855-885: CloudLogs CLI documentation looks good. One minor formatting nit from static analysis: add a blank line before the flags table on line 868 to satisfy markdownlint (MD058).

Proposed fix
 **Flags**:
+
 | Flag | Default | Description |
internal/repositories/postgres/migrations/089_create_log_entries_table.up.sql (2)

13-14: Consider adding a composite index on (tenant_id, timestamp) for the primary query pattern.

Most log queries are tenant-scoped with time-range filters. The current idx_logs_timestamp on (timestamp) alone won't be used efficiently when tenant_id is also in the WHERE clause. A composite index would cover the dominant access pattern (tenant + time range + ORDER BY timestamp DESC).

 CREATE INDEX idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
-CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
+CREATE INDEX idx_logs_timestamp ON log_entries(tenant_id, timestamp DESC);

6-6: Consider adding a CHECK constraint on level to enforce valid values.

The plan specifies INFO, WARN, ERROR as valid levels. Without a constraint, any string can be inserted, risking inconsistent data.

Proposed change
-    level TEXT NOT NULL,
+    level TEXT NOT NULL CHECK (level IN ('INFO', 'WARN', 'ERROR')),
internal/core/services/cloudlogs_test.go (1)

17-24: Service test uses a real repository instead of a mock.

This test directly instantiates postgres.NewLogRepository(db) and passes it to the service. The coding guidelines for internal/core/services/**/*_test.go require mocking repositories in service tests. The separate cloudlogs_unit_test.go likely handles the mocked path — consider either moving these integration tests to a dedicated integration test package (e.g., tests/) or adding a build tag to separate them from unit tests.

As per coding guidelines, "Mock repositories in service tests".

internal/workers/log_worker_test.go (1)

24-27: SearchLogs mock will panic if called with a nil first return value.

Unlike the handler test's mock (which guards with a nil check on args.Get(0)), this mock directly type-asserts without checking. While SearchLogs isn't exercised in these worker tests, a future test addition could hit this.

Defensive fix
 func (m *mockLogService) SearchLogs(ctx context.Context, query domain.LogQuery) ([]*domain.LogEntry, int, error) {
 	args := m.Called(ctx, query)
-	return args.Get(0).([]*domain.LogEntry), args.Int(1), args.Error(2)
+	if args.Get(0) == nil {
+		return nil, args.Int(1), args.Error(2)
+	}
+	return args.Get(0).([]*domain.LogEntry), args.Int(1), args.Error(2)
 }
internal/core/services/instance.go (1)

641-667: Log ingestion ordering and best-effort pattern look correct.

Fetching logs before removeInstanceContainer is the right call. The silent-on-failure approach for ingestion is appropriate for a termination path.

Two minor observations:

  1. Redundant lookup: Line 643 calls s.GetInstanceLogs(ctx, idOrName), which internally calls s.GetInstance(ctx, idOrName) again (line 597). You already have inst — passing inst.ContainerID directly to s.compute.GetInstanceLogs would avoid the extra DB round-trip.

  2. All entries stamped INFO / time.Now(): Every line is marked Level: "INFO" and Timestamp: time.Now(). This means log severity is lost and timestamps reflect ingestion rather than production time. Acceptable as a v1 simplification, but worth a TODO for future parsing.

♻️ Suggested refactor for point 1
-		logs, err := s.GetInstanceLogs(ctx, idOrName)
-		if err == nil && logs != "" {
+		stream, logErr := s.compute.GetInstanceLogs(ctx, inst.ContainerID)
+		if logErr == nil {
+			defer stream.Close()
+			raw, _ := io.ReadAll(stream)
+			logs := string(raw)
+			if logs != "" {
internal/core/services/cloudlogs.go (1)

48-50: Consider defaulting Limit when zero/unset.

SearchLogs passes the query straight to the repository. If query.Limit is 0 (the zero-value), depending on the repository implementation this could either return nothing or return an unbounded result set. Adding a sensible default here would be defensive:

 func (s *CloudLogsService) SearchLogs(ctx context.Context, query domain.LogQuery) ([]*domain.LogEntry, int, error) {
+	if query.Limit <= 0 {
+		query.Limit = 100
+	}
 	return s.repo.List(ctx, query)
 }
internal/core/ports/log.go (1)

9-19: LogRepository is missing GetByID, Update, and Delete per coding guidelines.

The guideline states port repository interfaces must include CRUD methods. LogRepository only has Create, List, and DeleteByAge. While individual log entry mutation isn't typical for a logging system, a GetByID method would at minimum support the GET /logs/:id endpoint referenced in the PRD and API docs.

Consider adding at least:

GetByID(ctx context.Context, id uuid.UUID) (*domain.LogEntry, error)

As per coding guidelines, Port repository interfaces must include CRUD methods (Create, GetByID, Update, Delete) with context as first parameter.

internal/repositories/postgres/log_repo.go (3)

27-43: No upper bound on batch size — may exceed PostgreSQL's parameter limit.

Each entry contributes 8 parameters. PostgreSQL supports a maximum of 65,535 parameters per query, so batches exceeding ~8,191 entries will fail. Consider chunking the inserts.

Sketch: chunk the batch
+const maxBatchSize = 5000 // stay well within the 65535 param limit (5000 * 8 = 40000)
+
 func (r *LogRepository) Create(ctx context.Context, entries []*domain.LogEntry) error {
 	if len(entries) == 0 {
 		return nil
 	}
+	for start := 0; start < len(entries); start += maxBatchSize {
+		end := start + maxBatchSize
+		if end > len(entries) {
+			end = len(entries)
+		}
+		if err := r.createBatch(ctx, entries[start:end]); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+func (r *LogRepository) createBatch(ctx context.Context, entries []*domain.LogEntry) error {
 	query := `INSERT INTO log_entries ...`
 	// ... existing batch logic ...
+}

68-71: ILIKE wildcards in user input are not escaped.

Characters % and _ in query.Search are treated as ILIKE wildcards rather than literals. If a user searches for a literal % or _, they'll get broader results than expected. Consider escaping these characters.

Sketch
 	if query.Search != "" {
-		filters += fmt.Sprintf(" AND message ILIKE $%d", placeholderIdx)
-		args = append(args, "%"+query.Search+"%")
+		filters += fmt.Sprintf(" AND message ILIKE $%d", placeholderIdx)
+		escaped := strings.NewReplacer(`%`, `\%`, `_`, `\_`).Replace(query.Search)
+		args = append(args, "%"+escaped+"%")
 		placeholderIdx++
 	}

95-98: Magic number 100 for default limit — duplicated with handler.

The default limit of 100 is hard-coded here and in the handler. Extract a shared named constant to keep them in sync.

As per coding guidelines, "Do not use magic numbers - use named constants instead."

internal/core/services/cloudlogs_unit_test.go (2)

19-83: Shared mock across subtests — AssertExpectations at parent level can mask which subtest failed.

mockRepo.AssertExpectations(t) on Line 82 uses the parent test's t, so if an expectation from a specific subtest isn't met, the failure won't be attributed to that subtest. Consider moving the assertion into each subtest or creating a fresh mock per subtest.


85-112: Tests cover the happy paths well; consider adding error-case subtests.

SearchLogs and RunRetentionPolicy each have a single success scenario. Adding cases where the mock repository returns an error would improve coverage of the service's error propagation.


Copilot AI left a comment


Pull request overview

This PR introduces the CloudLogs Service, a comprehensive persistent logging solution for platform resources (instances and functions). The implementation follows hexagonal architecture with proper separation of concerns across domain, ports, services, repositories, handlers, and workers.

Changes:

  • Adds CloudLogs service for persistent storage and querying of resource logs in PostgreSQL
  • Implements automatic log ingestion during instance termination
  • Adds background worker for automated 30-day retention policy
  • Provides SDK and CLI support for log search and retrieval

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
internal/core/domain/log.go Defines LogEntry and LogQuery domain models for log storage
internal/core/ports/log.go Defines LogRepository and LogService interfaces
internal/repositories/postgres/log_repo.go Implements PostgreSQL-backed log storage with batch insert and filtering
internal/repositories/postgres/log_repo_test.go Unit tests for repository layer
internal/repositories/postgres/migrations/089_create_log_entries_table.up.sql Creates log_entries table with indexes
internal/repositories/postgres/migrations/089_create_log_entries_table.down.sql Rollback migration
internal/core/services/cloudlogs.go Implements CloudLogsService with trace ID extraction
internal/core/services/cloudlogs_test.go Integration tests for CloudLogsService
internal/core/services/cloudlogs_unit_test.go Unit tests for CloudLogsService
internal/core/services/instance.go Integrates log ingestion into instance termination flow
internal/handlers/log_handler.go HTTP handlers for log search and retrieval endpoints
internal/handlers/log_handler_test.go Handler tests
internal/workers/log_worker.go Background worker for retention policy
internal/workers/log_worker_test.go Worker tests
internal/api/setup/dependencies.go Wires CloudLogs dependencies
internal/api/setup/router.go Adds Log handler to router (routes not registered)
internal/core/services/setup_test.go Adds log_entries to test cleanup
internal/core/services/shared_test.go Adds mock implementations
pkg/sdk/logs.go SDK client methods for log operations
cmd/cloud/logs.go CLI commands for log search and display
cmd/cloud/main.go Registers cloudLogsCmd
cmd/api/main.go Starts LogWorker
tests/logs_e2e_test.go End-to-end test for log lifecycle
docs/prds/cloudlogs.md Product requirements document
docs/plans/cloudlogs-implementation.md Implementation plan
docs/cli-reference.md CLI documentation with minor typo fix
docs/api-reference.md API documentation for log endpoints
docs/FEATURES.md Updated observability feature description


Comment on lines +27 to +42
	query := `INSERT INTO log_entries (id, tenant_id, resource_id, resource_type, level, message, timestamp, trace_id) VALUES `
	values := []interface{}{}
	placeholders := []string{}

	for i, entry := range entries {
		offset := i * 8
		placeholders = append(placeholders, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d, $%d, $%d)",
			offset+1, offset+2, offset+3, offset+4, offset+5, offset+6, offset+7, offset+8))
		values = append(values, entry.ID, entry.TenantID, entry.ResourceID, entry.ResourceType, entry.Level, entry.Message, entry.Timestamp, entry.TraceID)
	}

	query += strings.Join(placeholders, ",")
	_, err := r.db.Exec(ctx, query, values...)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to batch create log entries", err)
	}

Copilot AI Feb 14, 2026


The batch insert constructs a single large SQL statement with all entries. For very large batches (e.g., thousands of log lines from a long-running instance), this could lead to issues: 1) SQL statement size limits, 2) Memory pressure from building a large query string, 3) Single transaction holding locks for extended time. Consider implementing batching with a maximum batch size (e.g., 1000 entries per statement) to avoid these issues while still benefiting from batch performance.

Suggested change (replace the single unbounded statement with batched inserts):

Current:

	query := `INSERT INTO log_entries (id, tenant_id, resource_id, resource_type, level, message, timestamp, trace_id) VALUES `
	values := []interface{}{}
	placeholders := []string{}
	for i, entry := range entries {
		offset := i * 8
		placeholders = append(placeholders, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d, $%d, $%d)",
			offset+1, offset+2, offset+3, offset+4, offset+5, offset+6, offset+7, offset+8))
		values = append(values, entry.ID, entry.TenantID, entry.ResourceID, entry.ResourceType, entry.Level, entry.Message, entry.Timestamp, entry.TraceID)
	}
	query += strings.Join(placeholders, ",")
	_, err := r.db.Exec(ctx, query, values...)
	if err != nil {
		return errors.Wrap(errors.Internal, "failed to batch create log entries", err)
	}

Proposed:

	const maxBatchSize = 1000
	for start := 0; start < len(entries); start += maxBatchSize {
		end := start + maxBatchSize
		if end > len(entries) {
			end = len(entries)
		}
		query := `INSERT INTO log_entries (id, tenant_id, resource_id, resource_type, level, message, timestamp, trace_id) VALUES `
		values := []interface{}{}
		placeholders := []string{}
		for i, entry := range entries[start:end] {
			offset := i * 8
			placeholders = append(placeholders, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d, $%d, $%d)",
				offset+1, offset+2, offset+3, offset+4, offset+5, offset+6, offset+7, offset+8))
			values = append(values, entry.ID, entry.TenantID, entry.ResourceID, entry.ResourceType, entry.Level, entry.Message, entry.Timestamp, entry.TraceID)
		}
		query += strings.Join(placeholders, ",")
		_, err := r.db.Exec(ctx, query, values...)
		if err != nil {
			return errors.Wrap(errors.Internal, "failed to batch create log entries", err)
		}
	}

Comment on lines +127 to +133
	func (r *LogRepository) DeleteByAge(ctx context.Context, days int) error {
		query := `DELETE FROM log_entries WHERE timestamp < NOW() - INTERVAL '1 day' * $1`
		_, err := r.db.Exec(ctx, query, days)
		if err != nil {
			return errors.Wrap(errors.Internal, "failed to delete old logs", err)
		}
		return nil

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retention policy runs a DELETE without LIMIT, which could lock the table and impact performance if there are many old logs to delete (e.g., after initial deployment or after retention was disabled for a while). Consider implementing batch deletion with LIMIT and looping, or using PARTITION BY for time-based partitioning if log volume is expected to be high. Also, consider logging the number of deleted rows to help operators monitor retention policy effectiveness.
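A batched variant of this can be sketched as follows. PostgreSQL's `DELETE` has no `LIMIT` clause, so the per-batch statement would typically use a `ctid` subquery (e.g. `DELETE FROM log_entries WHERE ctid IN (SELECT ctid FROM log_entries WHERE timestamp < NOW() - make_interval(days => $1) LIMIT $2)`). The Go loop below stubs out the database call and only demonstrates the looping and row-count accounting; the `deleteBatch` type and `deleteInBatches` helper are illustrative, not part of the PR.

```go
package main

import "fmt"

// deleteBatch stands in for a repository call that deletes at most
// `limit` old rows and reports how many it actually removed.
type deleteBatch func(limit int) (int64, error)

// deleteInBatches keeps deleting until a batch comes back smaller than
// the limit, so one huge DELETE never holds locks on the whole table.
// It also returns the total, which the worker could log for operators.
func deleteInBatches(del deleteBatch, batchSize int) (int64, error) {
	var total int64
	for {
		n, err := del(batchSize)
		if err != nil {
			return total, err
		}
		total += n
		if n < int64(batchSize) {
			return total, nil
		}
	}
}

func main() {
	remaining := int64(2500) // pretend 2500 expired rows exist
	del := func(limit int) (int64, error) {
		n := int64(limit)
		if remaining < n {
			n = remaining
		}
		remaining -= n
		return n, nil
	}
	total, err := deleteInBatches(del, 1000)
	fmt.Println(total, remaining, err)
}
```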

ResourceType: "instance",
Level: "INFO",
Message: line,
Timestamp: time.Now(),
Copilot AI Feb 14, 2026

All log entries are timestamped with time.Now() at ingestion time rather than preserving the actual time when the log was generated. This means historical log timestamps will be inaccurate and will represent when the instance was terminated, not when the log event occurred. Consider parsing timestamps from log lines if available, or accepting the limitation and documenting it.

Comment on lines +52 to +62
if st := c.Query("start_time"); st != "" {
	t, err := time.Parse(time.RFC3339, st)
	if err == nil {
		query.StartTime = &t
	}
}
if et := c.Query("end_time"); et != "" {
	t, err := time.Parse(time.RFC3339, et)
	if err == nil {
		query.EndTime = &t
	}
Copilot AI Feb 14, 2026

Silent error handling for time parsing. When start_time or end_time parameters fail to parse, the errors are silently ignored and the filter is skipped. This could lead to confusion where users provide invalid timestamps and get unexpected results without any error message. Consider returning a 400 Bad Request error with a descriptive message when time parsing fails.

Comment thread docs/cli-reference.md Outdated
| `--vpc` | `-v` | No | - | VPC ID or name |
| `--subnet` | `-s` | No | - | Subnet ID or name |
| `--volume` | `-V` | No | - | Volume attachment (vol-name:/path) |
| `--volume" | `-V` | No | - | Volume attachment (vol-name:/path) |
Copilot AI Feb 14, 2026

Mismatched quote in flag name. The flag is defined as --volume" with a closing quote but no opening quote.

Suggested change
| `--volume" | `-V` | No | - | Volume attachment (vol-name:/path) |
| `--volume` | `-V` | No | - | Volume attachment (vol-name:/path) |

Comment on lines +13 to +14
CREATE INDEX idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
Copilot AI Feb 14, 2026

The CREATE INDEX statements lack the IF NOT EXISTS clause. Based on repository conventions, indexes should be made idempotent using CREATE INDEX IF NOT EXISTS to avoid warnings on repeated migrations.

Suggested change
CREATE INDEX idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
CREATE INDEX IF NOT EXISTS idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
CREATE INDEX IF NOT EXISTS idx_logs_timestamp ON log_entries(timestamp);

Comment on lines +122 to +124
}

return entries, total, nil
Copilot AI Feb 14, 2026

Missing rows.Err() check after rows.Next() iteration. According to repository conventions, after iterating through rows with rows.Next(), you should always check rows.Err() to catch any iteration errors that may have occurred.

TenantID: inst.TenantID,
ResourceID: inst.ID.String(),
ResourceType: "instance",
Level: "INFO",
Copilot AI Feb 14, 2026

All logs are assigned Level "INFO" regardless of their actual content. This means error logs, warnings, and debug information will all be stored with the same severity level, making filtering by level ineffective. Consider parsing log lines to extract the actual log level if available, or using heuristics (e.g., checking for keywords like "error", "warn", "fatal") to assign appropriate levels.
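A keyword heuristic of the kind suggested could look like this sketch (`detectLevel` is hypothetical; structured formats such as JSON logs should be parsed properly instead of pattern-matched):

```go
package main

import (
	"fmt"
	"strings"
)

// detectLevel guesses a severity from the log line itself. It is only a
// heuristic: lines that match nothing keep the current INFO default.
func detectLevel(line string) string {
	upper := strings.ToUpper(line)
	switch {
	case strings.Contains(upper, "FATAL") || strings.Contains(upper, "PANIC"):
		return "FATAL"
	case strings.Contains(upper, "ERROR"):
		return "ERROR"
	case strings.Contains(upper, "WARN"):
		return "WARN"
	case strings.Contains(upper, "DEBUG"):
		return "DEBUG"
	default:
		return "INFO"
	}
}

func main() {
	fmt.Println(detectLevel("2026-02-14 error: connection refused"))
	fmt.Println(detectLevel("listening on :8080"))
}
```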

GlobalLB *httphandlers.GlobalLBHandler
SSHKey *httphandlers.SSHKeyHandler
ElasticIP *httphandlers.ElasticIPHandler
Log *httphandlers.LogHandler
Copilot AI Feb 14, 2026

The LogHandler is initialized in the Handlers struct but the routes are never registered in the router. The handlers.Log field is populated in the diff for internal/api/setup/router.go (line 68), but there's no corresponding route registration for the /logs endpoints in SetupRouter. This means the CloudLogs API endpoints will not be accessible. You need to add route registrations similar to how other handlers are registered, e.g., in registerAdminRoutes or a new registerObservabilityRoutes function.

Comment on lines +37 to +40
## Phase 6: Verification
- [ ] Unit tests for `LogService`.
- [ ] Integration tests for `LogRepository`.
- [ ] Smoke test: Launch instance -> generate logs -> terminate -> verify logs still exist in CloudLogs.
Copilot AI Feb 14, 2026

The implementation plan at line 38-40 states that Phase 6 verification tasks are incomplete (Unit tests for LogService, Integration tests for LogRepository, Smoke test). However, these tests actually exist in the PR: internal/core/services/cloudlogs_unit_test.go, internal/core/services/cloudlogs_test.go, internal/repositories/postgres/log_repo_test.go, and tests/logs_e2e_test.go. The checkboxes should be marked as completed to accurately reflect the implementation status.

Copilot AI review requested due to automatic review settings February 14, 2026 18:35
Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.



}
entries = append(entries, &entry)
}

Copilot AI Feb 14, 2026

After iterating rows.Next(), the code should check rows.Err() before returning to catch any iteration errors (this is done in other Postgres repos).

Suggested change
if err := rows.Err(); err != nil {
	return nil, 0, errors.Wrap(errors.Internal, "failed during log rows iteration", err)
}

Comment thread internal/workers/log_worker.go Outdated
w.logger.Info("log worker started")

// Initial run
_ = w.logSvc.RunRetentionPolicy(ctx, w.retentionDays)
Copilot AI Feb 14, 2026

The initial retention run discards the returned error. If the first cleanup fails (DB outage, etc.), it will be silent and only later ticks will log. Consider logging the error (and potentially using the worker logger) for the initial run too.

Suggested change
_ = w.logSvc.RunRetentionPolicy(ctx, w.retentionDays)
if err := w.logSvc.RunRetentionPolicy(ctx, w.retentionDays); err != nil {
	w.logger.Error("failed to run initial log retention policy", "error", err)
}

Comment on lines +590 to +591
logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.Search)
logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.GetByResource)
Copilot AI Feb 14, 2026

/logs and /logs/:id are guarded with PermissionInstanceRead, but the endpoint can return logs for other resource types (e.g. functions) via resource_type/resource_id. This can allow users with instance-read permission to access function logs without PermissionFunctionRead. Consider introducing a dedicated logs:read permission, or enforcing permission based on resource_type (and/or adding a separate route/middleware for function logs).

Suggested change
logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.Search)
logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.GetByResource)
logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionFullAccess), handlers.Log.Search)
logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionFullAccess), handlers.Log.GetByResource)


// Ingest logs before termination if LogService is available
if s.logSvc != nil && inst.ContainerID != "" {
logs, err := s.GetInstanceLogs(ctx, idOrName)
Copilot AI Feb 14, 2026

During termination, logs are fetched via GetInstanceLogs(ctx, idOrName), which re-queries the instance even though inst is already loaded. This adds an extra DB hit and can lead to inconsistencies if idOrName was a name that changed. Consider fetching logs directly from inst.ContainerID (or refactor GetInstanceLogs to accept a *domain.Instance).

Suggested change
logs, err := s.GetInstanceLogs(ctx, idOrName)
// Use the stable instance ID instead of idOrName to avoid inconsistencies if the name has changed.
logs, err := s.GetInstanceLogs(ctx, inst.ID.String())

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@internal/api/setup/router.go`:
- Around line 586-593: The routes registered in registerLogRoutes currently gate
all log access with domain.PermissionInstanceRead, blocking users who only have
function-level access; update the authorization to allow log-specific
permissions by replacing or augmenting the middleware on logGroup.GET and
logGroup.GET("/:id") so they use a log-specific permission (e.g.,
domain.PermissionLogRead) or perform per-resource-type checks against svcs.RBAC
(e.g., if resource_type== "function" require domain.PermissionFunctionRead,
otherwise require domain.PermissionInstanceRead) before calling
handlers.Log.Search and handlers.Log.GetByResource; ensure you reference
registerLogRoutes, handlers.Log.Search, handlers.Log.GetByResource, svcs.RBAC
and domain.PermissionInstanceRead when making the change so the routes authorize
correctly for both instance and function log access.

In `@tests/logs_e2e_test.go`:
- Around line 67-75: Replace the non-fatal assertion with a fatal one: change
the call from assert.Equal(t, http.StatusOK, resp.StatusCode) to
require.Equal(t, http.StatusOK, resp.StatusCode) so the test stops immediately
on a bad status and avoids attempting to json.Decode(resp.Body) (the variables
involved are resp and the subsequent json.NewDecoder(...).Decode(&res)).
🧹 Nitpick comments (2)
internal/api/setup/router.go (1)

185-185: Placement nit: registerLogRoutes is called between registerAdminRoutes and the gateway proxy, but defined between registerAdminRoutes and registerTenantRoutes.

This is fine functionally, but the definition order (registerLogRoutes at Line 586, then registerTenantRoutes at Line 595) doesn't match the call order in SetupRouter (admin → log → tenant is not what's called — it's admin → log, then tenant is called earlier at Line 183). Minor inconsistency in code organization — no action needed unless you want to keep strict call-order ↔ definition-order alignment.

tests/logs_e2e_test.go (1)

22-22: Extract magic numbers into named constants.

Several numeric literals are used directly: 60 * time.Second (Line 22), 5 * time.Second (Line 50), 2 * time.Second (Line 62). Consider defining these as named constants at the top of the file for clarity and ease of tuning.

As per coding guidelines, "Do not use magic numbers - use named constants instead".

Example constants
const (
	httpClientTimeout     = 60 * time.Second
	instanceBootWait      = 5 * time.Second
	logIngestionWait      = 2 * time.Second
)

Also applies to: 50-50, 62-62

Comment on lines +586 to +593
func registerLogRoutes(r *gin.Engine, handlers *Handlers, svcs *Services) {
	logGroup := r.Group("/logs")
	logGroup.Use(httputil.Auth(svcs.Identity, svcs.Tenant), httputil.RequireTenant(), httputil.TenantMember(svcs.Tenant))
	{
		logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.Search)
		logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.GetByResource)
	}
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if a log-specific permission already exists in the RBAC domain
rg -n 'PermissionLog' --type=go

# Also check what function-related permissions exist
rg -n 'PermissionFunction' --type=go internal/core/domain/

Repository: poyrazK/thecloud

Length of output: 413


🏁 Script executed:

#!/bin/bash
# Find the log handler implementation
fd -e go handlers | xargs grep -l "func.*Search\|func.*GetByResource" | head -5

# Also check the domain definitions for any log resource types
rg -n 'type.*Log\|ResourceType' --type=go internal/core/domain/ | grep -i log

# Check if there's any documentation about log authorization
fd -e md . | xargs grep -l "log\|permission" | head -5

Repository: poyrazK/thecloud

Length of output: 140


🏁 Script executed:

#!/bin/bash
# Find files containing Log handler
rg -n 'handlers\.Log|type.*Log.*struct' --type=go internal/api/ | head -20

# Find the handlers definition
fd -e go -path '*/handlers/*' | head -10

# Check the Handlers struct to understand Log field
rg -n 'type Handlers struct' --type=go -A 30

Repository: poyrazK/thecloud

Length of output: 2921


🏁 Script executed:

#!/bin/bash
# Find LogHandler definition
fd -e go | xargs grep -l 'type LogHandler' | head -5

# Find httphandlers package
fd -type d -name httphandlers

# Check what's in the httphandlers directory
ls -la internal/api/handlers/

Repository: poyrazK/thecloud

Length of output: 350


🏁 Script executed:

#!/bin/bash
# Read the LogHandler file to understand the implementation
wc -l internal/handlers/log_handler.go

# Show the file with line numbers
cat -n internal/handlers/log_handler.go

Repository: poyrazK/thecloud

Length of output: 3978


🏁 Script executed:

#!/bin/bash
# Find LogService definition
fd -e go | xargs grep -l 'type LogService\|SearchLogs' | grep -E '(service|ports)' | head -10

# Check the ports package for LogService interface
fd -e go -path '*/ports/*' | xargs grep -l LogService

Repository: poyrazK/thecloud

Length of output: 435


🏁 Script executed:

#!/bin/bash
# Read the LogService interface
cat -n internal/core/ports/log.go

# Also get size of cloudlogs.go
wc -l internal/core/services/cloudlogs.go

Repository: poyrazK/thecloud

Length of output: 1395


🏁 Script executed:

#!/bin/bash
# Read the cloudlogs.go service implementation
cat -n internal/core/services/cloudlogs.go

Repository: poyrazK/thecloud

Length of output: 1842


Authorization gates function logs behind instance permission, preventing function-only users from accessing their logs.

The /logs endpoints currently require PermissionInstanceRead for all requests. However, the LogHandler is designed to support multiple resource types (it accepts a resource_type query parameter), and the CloudLogsService provides no internal filtering by permission. This means a user with PermissionFunctionRead but not PermissionInstanceRead cannot access logs for their functions—even though the handler and service infrastructure support it.

Per the PR objectives covering both instances and functions, introduce PermissionLogRead (or similar) to authorize log access independently of the instance domain. Alternatively, implement per-resource-type permission checks (e.g., check PermissionFunctionRead when resource_type=function).
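One way to express the per-resource-type check is a small mapping from `resource_type` to the required permission, evaluated before the RBAC middleware. The function and permission strings below are illustrative; the real constants live in the project's `domain` package (e.g. domain.PermissionInstanceRead, domain.PermissionFunctionRead):

```go
package main

import "fmt"

// requiredPermission maps a logs query's resource_type to the permission
// that should gate it. An empty resource_type defaults to the stricter
// instance permission; unknown types are rejected so they cannot bypass
// the check.
func requiredPermission(resourceType string) (string, error) {
	switch resourceType {
	case "instance", "":
		return "instance:read", nil
	case "function":
		return "function:read", nil
	default:
		return "", fmt.Errorf("unknown resource_type %q", resourceType)
	}
}

func main() {
	p, _ := requiredPermission("function")
	fmt.Println(p)
}
```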

🤖 Prompt for AI Agents
In `@internal/api/setup/router.go` around lines 586 - 593, The routes registered
in registerLogRoutes currently gate all log access with
domain.PermissionInstanceRead, blocking users who only have function-level
access; update the authorization to allow log-specific permissions by replacing
or augmenting the middleware on logGroup.GET and logGroup.GET("/:id") so they
use a log-specific permission (e.g., domain.PermissionLogRead) or perform
per-resource-type checks against svcs.RBAC (e.g., if resource_type== "function"
require domain.PermissionFunctionRead, otherwise require
domain.PermissionInstanceRead) before calling handlers.Log.Search and
handlers.Log.GetByResource; ensure you reference registerLogRoutes,
handlers.Log.Search, handlers.Log.GetByResource, svcs.RBAC and
domain.PermissionInstanceRead when making the change so the routes authorize
correctly for both instance and function log access.

Comment thread tests/logs_e2e_test.go
Copilot AI review requested due to automatic review settings February 15, 2026 09:22
Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.



Comment on lines +1 to +2
CREATE TABLE IF NOT EXISTS log_entries (
id UUID PRIMARY KEY,
Copilot AI Feb 15, 2026

log_entries migration is missing the -- +goose Up header comment that other migrations in this repo include. Even if the current migrator doesn’t parse goose directives, adding it keeps the migration set consistent and easier to manage with standard tooling.
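For consistency with the other migrations, the header would look like the following sketch (the `Down` section is illustrative; the PR's migration may handle teardown differently):

```sql
-- +goose Up
CREATE TABLE IF NOT EXISTS log_entries (
    id UUID PRIMARY KEY
    -- ... remaining columns ...
);

-- +goose Down
DROP TABLE IF EXISTS log_entries;
```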

return nil, 0, errors.Wrap(errors.Internal, "failed to scan log entry", err)
}
entries = append(entries, &entry)
}
Copilot AI Feb 15, 2026

After iterating with rows.Next(), this repository should check rows.Err() before returning. Without it, iteration errors (e.g., network issues mid-stream) can be silently ignored and an incomplete result set returned.

Suggested change
}
if err := rows.Err(); err != nil {
	return nil, 0, errors.Wrap(errors.Internal, "failed to iterate log rows", err)
}

Comment on lines +586 to +592
func registerLogRoutes(r *gin.Engine, handlers *Handlers, svcs *Services) {
	logGroup := r.Group("/logs")
	logGroup.Use(httputil.Auth(svcs.Identity, svcs.Tenant), httputil.RequireTenant(), httputil.TenantMember(svcs.Tenant))
	{
		logGroup.GET("", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.Search)
		logGroup.GET("/:id", httputil.Permission(svcs.RBAC, domain.PermissionInstanceRead), handlers.Log.GetByResource)
	}
Copilot AI Feb 15, 2026

These routes gate CloudLogs access behind PermissionInstanceRead only. The PR description/docs state CloudLogs supports both instances and functions, and the domain already defines PermissionFunctionRead; as-is, function log access would either be incorrectly denied or incorrectly authorized. Consider introducing a dedicated logs:read permission, or enforcing PermissionFunctionRead when resource_type=function (and PermissionInstanceRead for instances), or splitting endpoints by resource type.


httputil.Success(c, http.StatusOK, gin.H{
"entries": entries,
"total": total,
Copilot AI Feb 15, 2026

GetByResource returns only entries and total, but the SDK’s LogSearchResponse type also includes limit/offset and the /logs endpoint returns those fields. Either include limit (and offset if applicable) in this response as well, or use a different response shape/type to avoid clients seeing misleading zero-values.

Suggested change
"total": total,
"limit": limit,
"offset": 0,

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@docs/api-reference.md`:
- Around line 519-524: Add a blank line immediately before the "Parameters:"
heading/table so the Markdown has a full empty line separating the preceding
paragraph from the table (fixes MD058); specifically insert one empty line above
the line that starts "**Parameters:**" in the snippet containing the table of
`resource_id` and `limit`.

In `@docs/cli-reference.md`:
- Around line 867-876: Add a blank line between the "**Flags**:" heading and the
following table (the pipe-delimited table that lists `--resource-id`,
`--resource-type`, etc.) so the table is separated by an empty line and
satisfies the MD058 markdown lint rule; update the section where the
"**Flags**:" header and the flags table are adjacent to insert that single
newline.

In `@internal/core/services/cloudlogs.go`:
- Around line 3-10: The build is failing because the internal errors package is
not imported while cloudlogs.go uses errors.New and errors.InvalidInput; add the
missing import for the internal errors package (e.g. the same module path used
elsewhere like github.com/poyrazk/thecloud/internal/errors) to the import block
in cloudlogs.go so references to errors.New and errors.InvalidInput compile;
keep the import grouped with the other project imports and run `go build` to
verify.

In `@internal/handlers/log_handler.go`:
- Around line 3-13: The build is failing because the package named "errors" is
used (errors.New, errors.InvalidInput) but not imported in
internal/handlers/log_handler.go; open the import block at the top of that file
and add the missing import for the package that declares errors (the package
whose identifier is used as errors) so the references to errors.New and
errors.InvalidInput resolve; ensure the import uses the package name "errors"
(or an alias) so existing calls in functions like the log handling functions
continue to compile.

In `@internal/repositories/postgres/log_repo.go`:
- Around line 95-98: Replace the magic number 100 used as the fallback for
query.Limit with a named constant (e.g., defaultQueryLimit or DefaultLimit)
declared at the top of the package; update the code in the function that sets
limit (the limit := query.Limit block) to use that constant instead of 100 and
ensure the constant has a clear name and brief comment so callers/readers
understand it represents the default result limit for the Postgres repository.
- Around line 22-44: LogRepository.Create currently builds one huge INSERT using
8 parameters per entry which can exceed PostgreSQL's 65,535 parameter limit;
modify LogRepository.Create to chunk the entries into safe-sized batches
(maxEntriesPerBatch = floor(65535 / 8) == 8191) and execute the INSERT per
chunk. Implement a loop that slices entries into chunks of at most
maxEntriesPerBatch, constructs placeholders and values for each chunk (same
logic currently used with fmt.Sprintf and values append), calls r.db.Exec(ctx,
query, values...) for each chunk, and returns the first non-nil wrapped error
(using errors.Wrap(errors.Internal, ...)) if any execution fails; keep existing
behavior for empty input and preserve the same query/placeholder format and
error messages.

In `@internal/workers/log_worker.go`:
- Around line 21-28: The NewLogWorker constructor currently hardcodes magic
numbers for interval and retentionDays; update it to either (1) extract
package-level named constants (e.g., DefaultLogWorkerInterval,
DefaultRetentionDays) and use those instead of 24 * time.Hour and 30, or (2)
preferably add parameters to NewLogWorker (e.g., interval time.Duration,
retentionDays int) and set LogWorker.interval and LogWorker.retentionDays from
those arguments while falling back to the named defaults when zero-valued;
reference the NewLogWorker function and the LogWorker.interval and
LogWorker.retentionDays fields to locate where to apply the change for
testability and configurability.

In `@pkg/sdk/logs.go`:
- Around line 3-8: The file is missing the "context" import causing undefined:
context errors in SearchLogs and GetLogsByResource; add the context package to
the import block (alongside fmt, net/url, strconv, time) so the function
signatures that use context.Context (SearchLogs and GetLogsByResource) compile
correctly.
🧹 Nitpick comments (4)
internal/core/services/cloudlogs_unit_test.go (4)

19-83: Shared mock across subtests risks expectation leakage and cascading failures.

mockRepo is created once (line 20) and shared across all subtests with AssertExpectations called at the parent level (line 82). If a subtest fails before consuming its .Once() expectation, subsequent subtests and the final assertion can produce misleading failures. Prefer creating a fresh mock per subtest.

Additionally, per coding guidelines, consider using table-driven tests where feasible (e.g., for the simpler cases like "success with entries", "empty entries", and "preserve existing trace id").

♻️ Suggested pattern: fresh mock per subtest
 func TestCloudLogsService_IngestLogs_Unit(t *testing.T) {
-	mockRepo := new(MockLogRepository)
-	svc := services.NewCloudLogsService(mockRepo, slog.Default())
-	ctx := context.Background()
-
 	t.Run("success with entries", func(t *testing.T) {
+		mockRepo := new(MockLogRepository)
+		svc := services.NewCloudLogsService(mockRepo, slog.Default())
+		ctx := context.Background()
 		entries := []*domain.LogEntry{
 			{ID: uuid.New(), Message: "test 1"},
 		}
 		mockRepo.On("Create", mock.Anything, entries).Return(nil).Once()
 		err := svc.IngestLogs(ctx, entries)
 		assert.NoError(t, err)
+		mockRepo.AssertExpectations(t)
 	})
 	// ... repeat for each subtest ...
-
-	mockRepo.AssertExpectations(t)
 }

As per coding guidelines, "Use table-driven tests in test files".


85-100: Consider adding error and edge-case scenarios as table-driven tests.

This only covers the happy path. Adding cases for repository errors and empty results in a table-driven format would improve coverage and align with guidelines.

As per coding guidelines, "Use table-driven tests in test files".


102-112: Magic number 30 — use a named constant.

The value 30 appears on lines 107 and 109 without explanation. Define a constant (e.g., defaultRetentionDays) to clarify intent and comply with guidelines.

♻️ Suggested fix
+const defaultRetentionDays = 30
+
 func TestCloudLogsService_RunRetentionPolicy_Unit(t *testing.T) {
 	mockRepo := new(MockLogRepository)
 	svc := services.NewCloudLogsService(mockRepo, slog.Default())
 	ctx := context.Background()
 
-	mockRepo.On("DeleteByAge", mock.Anything, 30).Return(nil)
+	mockRepo.On("DeleteByAge", mock.Anything, defaultRetentionDays).Return(nil)
 
-	err := svc.RunRetentionPolicy(ctx, 30)
+	err := svc.RunRetentionPolicy(ctx, defaultRetentionDays)
 	assert.NoError(t, err)
 	mockRepo.AssertExpectations(t)
 }

As per coding guidelines, "Do not use magic numbers - use named constants instead".


48-50: Error from resource.New is silently discarded.

Line 49 ignores the error return. Even in tests, this can mask setup failures leading to confusing test results. Use require.NoError to fail fast if resource creation fails unexpectedly.

♻️ Suggested fix
-		res, _ := resource.New(ctx, resource.WithAttributes(semconv.ServiceNameKey.String("test")))
+		res, err := resource.New(ctx, resource.WithAttributes(semconv.ServiceNameKey.String("test")))
+		require.NoError(t, err)

As per coding guidelines, "Do not use silent failures - avoid blank identifier assignment like _ = someFunc()".

Comment thread docs/api-reference.md
Comment on lines +519 to +524
**Parameters:**
| Name | In | Type | Required | Description |
|------|----|------|----------|-------------|
| `resource_id` | path | string | Yes | The UUID of the resource (instance or function) |
| `limit` | query | integer | No | Max logs to return |


⚠️ Potential issue | 🟡 Minor

Add a blank line before the parameters table.

Missing blank line before the table on line 520 triggers MD058.

Proposed fix
 **Parameters:**
+
 | Name | In | Type | Required | Description |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 520-520: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

🤖 Prompt for AI Agents
In `@docs/api-reference.md` around lines 519 - 524, Add a blank line immediately
before the "Parameters:" heading/table so the Markdown has a full empty line
separating the preceding paragraph from the table (fixes MD058); specifically
insert one empty line above the line that starts "**Parameters:**" in the
snippet containing the table of `resource_id` and `limit`.

Comment thread docs/cli-reference.md
Comment on lines +867 to +876
**Flags**:
| Flag | Default | Description |
|------|---------|-------------|
| `--resource-id` | - | Filter by resource UUID |
| `--resource-type` | - | Type (`instance`, `function`) |
| `--level` | - | Severity (`INFO`, `WARN`, `ERROR`) |
| `--query` | - | Keyword search in message |
| `--limit` | `100` | Max logs to show |
| `--offset` | `0` | Pagination offset |


⚠️ Potential issue | 🟡 Minor

Add a blank line before the flags table to fix markdown lint warning.

The table on line 868 immediately follows the **Flags**: line without a blank line separator, which triggers MD058 (tables should be surrounded by blank lines).

Proposed fix
 **Flags**:
+
 | Flag | Default | Description |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 868-868: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

🤖 Prompt for AI Agents
In `@docs/cli-reference.md` around lines 867 - 876, Add a blank line between the
"**Flags**:" heading and the following table (the pipe-delimited table that
lists `--resource-id`, `--resource-type`, etc.) so the table is separated by an
empty line and satisfies the MD058 markdown lint rule; update the section where
the "**Flags**:" header and the flags table are adjacent to insert that single
newline.

Comment thread internal/core/services/cloudlogs.go
Comment thread internal/handlers/log_handler.go
Comment on lines +22 to +44
func (r *LogRepository) Create(ctx context.Context, entries []*domain.LogEntry) error {
if len(entries) == 0 {
return nil
}

query := `INSERT INTO log_entries (id, tenant_id, resource_id, resource_type, level, message, timestamp, trace_id) VALUES `
values := []interface{}{}
placeholders := []string{}

for i, entry := range entries {
offset := i * 8
placeholders = append(placeholders, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d, $%d, $%d)",
offset+1, offset+2, offset+3, offset+4, offset+5, offset+6, offset+7, offset+8))
values = append(values, entry.ID, entry.TenantID, entry.ResourceID, entry.ResourceType, entry.Level, entry.Message, entry.Timestamp, entry.TraceID)
}

query += strings.Join(placeholders, ",")
_, err := r.db.Exec(ctx, query, values...)
if err != nil {
return errors.Wrap(errors.Internal, "failed to batch create log entries", err)
}
return nil
}
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check if there's any chunking or size limiting upstream before Create is called
echo "=== Searching for Create() calls with chunking/limiting context ==="
rg -n 'Create\(' --type=go -C 5 -g '!*_test.go' | head -100

Repository: poyrazK/thecloud

Length of output: 6200


🏁 Script executed:

#!/bin/bash
# Check the LogRepository interface definition
echo "=== Looking for LogRepository interface and Port definitions ==="
rg -n 'type LogRepository' --type=go -A 10
rg -n 'interface.*Log' --type=go -A 5

Repository: poyrazK/thecloud

Length of output: 2043


🏁 Script executed:

#!/bin/bash
# Verify PostgreSQL parameter limit fact
echo "=== Checking PostgreSQL documentation references in code ==="
rg -n '65535|parameter.*limit|16384' --type=go

Repository: poyrazK/thecloud

Length of output: 588


🏁 Script executed:

#!/bin/bash
# Check the full CloudLogsService.Log() implementation
echo "=== CloudLogsService.Log() implementation ==="
rg -n 'func.*Log' internal/core/services/cloudlogs.go -A 15 | head -50

Repository: poyrazK/thecloud

Length of output: 1222


🏁 Script executed:

#!/bin/bash
# Check where entries come from - look at callers of CloudLogsService.Log
echo "=== Who calls CloudLogsService.Log() ==="
rg -n 'auditSvc\.Log\|logs\.Log\|logService\.Log' --type=go -B 2 -A 2 | head -80

Repository: poyrazK/thecloud

Length of output: 100


🌐 Web query:

PostgreSQL maximum number of parameters per query limit

💡 Result:

PostgreSQL’s hard limit for query parameters (bind parameters like $1, $2, … in prepared/extended-protocol statements) is 65,535. This limit exists because the frontend/backend protocol encodes the parameter count as a 16‑bit unsigned integer. [1], [2]

(Separate limit: function arguments default to 100, which is different from query parameters.) [1]


Add batch size validation to prevent PostgreSQL parameter overflow.

PostgreSQL enforces a hard limit of 65,535 parameters per query. With 8 parameters per log entry, Create() will fail at runtime for batches exceeding ~8,191 entries. Neither CloudLogsService.IngestLogs() nor LogRepository.Create() currently enforces any batch size limit.

Implement chunking in LogRepository.Create() to split large batches, or document this constraint in the interface if batch sizes are guaranteed to be small upstream.

🤖 Prompt for AI Agents
In `@internal/repositories/postgres/log_repo.go` around lines 22 - 44,
LogRepository.Create currently builds one huge INSERT using 8 parameters per
entry which can exceed PostgreSQL's 65,535 parameter limit; modify
LogRepository.Create to chunk the entries into safe-sized batches
(maxEntriesPerBatch = floor(65535 / 8) == 8191) and execute the INSERT per
chunk. Implement a loop that slices entries into chunks of at most
maxEntriesPerBatch, constructs placeholders and values for each chunk (same
logic currently used with fmt.Sprintf and values append), calls r.db.Exec(ctx,
query, values...) for each chunk, and returns the first non-nil wrapped error
(using errors.Wrap(errors.Internal, ...)) if any execution fails; keep existing
behavior for empty input and preserve the same query/placeholder format and
error messages.
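A minimal sketch of the chunking math and slicing (`chunkEntries` and the constant names are illustrative, not from the PR; the real repository would build the placeholder string and call `r.db.Exec` once per chunk):

```go
package main

import "fmt"

const (
	maxPGParams     = 65535                        // PostgreSQL's hard limit on bind parameters per statement
	paramsPerEntry  = 8                            // columns inserted per log entry
	maxEntriesBatch = maxPGParams / paramsPerEntry // 8191 entries per INSERT
)

// chunkEntries splits a batch into slices of at most n elements.
// Generic over the element type so the sketch stays self-contained.
func chunkEntries[T any](entries []T, n int) [][]T {
	var chunks [][]T
	for len(entries) > 0 {
		end := n
		if len(entries) < end {
			end = len(entries)
		}
		chunks = append(chunks, entries[:end])
		entries = entries[end:]
	}
	return chunks
}

func main() {
	entries := make([]struct{}, 20000)
	chunks := chunkEntries(entries, maxEntriesBatch)
	// 20000 entries split as 8191 + 8191 + 3618
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[len(chunks)-1]))
}
```

Each chunk then stays under the 65,535-parameter ceiling regardless of batch size, and the empty-input fast path falls out naturally (zero chunks, zero Exec calls).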

Comment on lines +95 to +98
limit := query.Limit
if limit <= 0 {
limit = 100
}
🛠️ Refactor suggestion | 🟠 Major

Magic number 100 for default limit — use a named constant.

Per coding guidelines, magic numbers should be replaced with named constants.

Proposed fix
+const DefaultLogLimit = 100
+
 ...
 	limit := query.Limit
 	if limit <= 0 {
-		limit = 100
+		limit = DefaultLogLimit
 	}
🤖 Prompt for AI Agents
In `@internal/repositories/postgres/log_repo.go` around lines 95 - 98, Replace the
magic number 100 used as the fallback for query.Limit with a named constant
(e.g., defaultQueryLimit or DefaultLimit) declared at the top of the package;
update the code in the function that sets limit (the limit := query.Limit block)
to use that constant instead of 100 and ensure the constant has a clear name and
brief comment so callers/readers understand it represents the default result
limit for the Postgres repository.

Comment on lines +21 to +28
func NewLogWorker(logSvc ports.LogService, logger *slog.Logger) *LogWorker {
return &LogWorker{
logSvc: logSvc,
logger: logger,
interval: 24 * time.Hour, // Run retention once a day
retentionDays: 30, // Default 30 days retention
}
}
🛠️ Refactor suggestion | 🟠 Major

Magic numbers for interval and retentionDays — use named constants or accept as constructor parameters.

Lines 25-26 hardcode 24 * time.Hour and 30 without named constants, and these values aren't configurable. At minimum, extract them as package-level constants. Ideally, accept them as constructor parameters for testability and operational flexibility.

Proposed fix (named constants + configurable constructor)
+const (
+	DefaultRetentionInterval = 24 * time.Hour
+	DefaultRetentionDays     = 30
+)
+
 func NewLogWorker(logSvc ports.LogService, logger *slog.Logger) *LogWorker {
 	return &LogWorker{
 		logSvc:        logSvc,
 		logger:        logger,
-		interval:      24 * time.Hour,
-		retentionDays: 30,
+		interval:      DefaultRetentionInterval,
+		retentionDays: DefaultRetentionDays,
 	}
 }

As per coding guidelines, "Do not use magic numbers - use named constants instead".

🤖 Prompt for AI Agents
In `@internal/workers/log_worker.go` around lines 21 - 28, The NewLogWorker
constructor currently hardcodes magic numbers for interval and retentionDays;
update it to either (1) extract package-level named constants (e.g.,
DefaultLogWorkerInterval, DefaultRetentionDays) and use those instead of 24 *
time.Hour and 30, or (2) preferably add parameters to NewLogWorker (e.g.,
interval time.Duration, retentionDays int) and set LogWorker.interval and
LogWorker.retentionDays from those arguments while falling back to the named
defaults when zero-valued; reference the NewLogWorker function and the
LogWorker.interval and LogWorker.retentionDays fields to locate where to apply
the change for testability and configurability.

Comment thread pkg/sdk/logs.go
Copilot AI review requested due to automatic review settings February 15, 2026 14:59
Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 7 comments.



Comment thread tests/logs_e2e_test.go
Comment on lines +26 to +51
instanceName := fmt.Sprintf("e2e-logs-%d", time.Now().UnixNano()%1000)

// 1. Launch Instance
t.Run("LaunchInstance", func(t *testing.T) {
payload := map[string]interface{}{
"name": instanceName,
"image": "alpine",
"cmd": []string{"sh", "-c", "echo 'hello-cloudlogs-e2e'"},
}
resp := postRequest(t, client, testutil.TestBaseURL+testutil.TestRouteInstances, token, payload)
defer func() { _ = resp.Body.Close() }()

require.Equal(t, http.StatusAccepted, resp.StatusCode)

var res struct {
Data struct {
ID string `json:"id"`
} `json:"data"`
}
require.NoError(t, json.NewDecoder(resp.Body).Decode(&res))
instanceID = res.Data.ID
})

// 2. Wait for it to finish/run (best effort)
time.Sleep(5 * time.Second)

Copilot AI Feb 15, 2026

This E2E test relies on fixed time.Sleep delays and uses UnixNano()%1000 for the instance name, which can make runs flaky and potentially collide across parallel executions. Prefer a unique identifier (e.g., UUID) and polling with a timeout for the expected state/log availability instead of fixed sleeps.

Comment thread docs/prds/cloudlogs.md
Comment on lines +12 to +16
- [x] Implement a `CloudLogs` service following the hexagonal architecture.
- [x] Support log ingestion from `InstanceService` and `FunctionService`.
- [x] Provide a persistent storage backend (PostgreSQL initially, with a port for Loki).
- [x] Implement API endpoints for searching and filtering logs by `ResourceID`, `TenantID`, and `TimeRange`.
- [x] Add log rotation/retention policy (e.g., auto-delete logs older than X days).
Copilot AI Feb 15, 2026

PR description/docs claim log ingestion is supported from FunctionService, but in the current code changes ingestion is only wired from InstanceService.TerminateInstance (no IngestLogs call from FunctionService was found). Either update the implementation to ingest function logs too, or adjust the PR description/docs to match the current scope.

);

CREATE INDEX idx_logs_tenant_resource ON log_entries(tenant_id, resource_id);
CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
Copilot AI Feb 15, 2026

The query path in LogRepository.List always filters by tenant_id, but the time index here is only on timestamp. For multi-tenant workloads, consider adding a composite index like (tenant_id, timestamp DESC) (and potentially keep/remove the single-column timestamp index based on query plans) to avoid scanning across all tenants for time-range queries.

Suggested change

 CREATE INDEX idx_logs_timestamp ON log_entries(timestamp);
+CREATE INDEX idx_logs_tenant_timestamp_desc ON log_entries(tenant_id, timestamp DESC);

Comment on lines +116 to +120
limit, _ := strconv.Atoi(c.DefaultQuery("limit", "100"))
query := domain.LogQuery{
TenantID: tenantID,
ResourceID: id,
Limit: limit,
Copilot AI Feb 15, 2026

limit parsing errors are ignored (limit, _ := strconv.Atoi(...)), so invalid values quietly become 0 and change behavior (repo will default to 100). This should return an InvalidInput error (and also reject negative/zero limits) to match the stricter validation used in Search.
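A stricter parse along the suggested lines (`parseLimit`, `defaultLogLimit`, and `maxLogLimit` are illustrative names, not from the PR):

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultLogLimit = 100  // applied when the client omits the parameter
	maxLogLimit     = 1000 // illustrative cap on page size
)

// parseLimit rejects non-numeric and non-positive values instead of
// silently letting them become 0, and caps oversized requests.
func parseLimit(raw string) (int, error) {
	if raw == "" {
		return defaultLogLimit, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return 0, fmt.Errorf("invalid limit %q; must be a positive integer", raw)
	}
	if n > maxLogLimit {
		n = maxLogLimit
	}
	return n, nil
}

func main() {
	for _, raw := range []string{"", "50", "abc", "-3", "5000"} {
		n, err := parseLimit(raw)
		fmt.Printf("%q -> %d, %v\n", raw, n, err)
	}
}
```

The handler would map the returned error to `errors.InvalidInput`, matching the stricter validation already present in Search.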

Comment on lines +641 to +667
// Ingest logs before termination if LogService is available
if s.logSvc != nil && inst.ContainerID != "" {
logs, err := s.GetInstanceLogs(ctx, idOrName)
if err == nil && logs != "" {
lines := strings.Split(logs, "\n")
entries := make([]*domain.LogEntry, 0, len(lines))
for _, line := range lines {
if strings.TrimSpace(line) == "" {
continue
}
entries = append(entries, &domain.LogEntry{
ID: uuid.New(),
TenantID: inst.TenantID,
ResourceID: inst.ID.String(),
ResourceType: "instance",
Level: "INFO",
Message: line,
Timestamp: time.Now(),
})
}
if len(entries) > 0 {
if ingestErr := s.logSvc.IngestLogs(ctx, entries); ingestErr != nil {
s.logger.Warn("failed to ingest logs during termination", "instance_id", inst.ID, "error", ingestErr)
}
}
}
}
Copilot AI Feb 15, 2026

This ingests logs by splitting the full container output into one DB row per line and then inserting them as a single huge INSERT ... VALUES (...) statement. For chatty containers this can create very large allocations and can exceed PostgreSQL’s parameter limit (8 params/row) or max query size, potentially breaking instance termination. Consider chunking inserts (e.g., batches of N lines) and/or using pgx CopyFrom/SendBatch, and consider capping how many lines/bytes are persisted per termination.
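One way to cap what gets persisted, keeping only the most recent output (`capTail` and the limit values are illustrative; chunked inserts or pgx's `CopyFrom` would still apply downstream of this):

```go
package main

import "fmt"

// capTail keeps at most maxLines of the newest lines, then trims further so
// the kept lines total at most maxBytes — for termination logs, the tail of
// the container output is usually the most useful part.
func capTail(lines []string, maxLines, maxBytes int) []string {
	if len(lines) > maxLines {
		lines = lines[len(lines)-maxLines:]
	}
	total, start := 0, len(lines)
	for i := len(lines) - 1; i >= 0; i-- {
		total += len(lines[i])
		if total > maxBytes {
			break
		}
		start = i
	}
	return lines[start:]
}

func main() {
	out := []string{"boot", "step 1", "step 2", "done"}
	// keep the newest 3 lines, then byte-trim from the oldest kept line
	fmt.Println(capTail(out, 3, 11))
}
```

Bounding both line count and byte volume keeps a chatty container from ballooning the termination path, whatever insert strategy is used underneath.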

WillReturnResult(pgxmock.NewResult("INSERT", 1))

err = repo.Create(context.Background(), entries)
assert.NoError(t, err)
Copilot AI Feb 15, 2026

This test sets an ExpectExec on the mock but never calls mock.ExpectationsWereMet(). Without that assertion, the test can pass even if Create never executed the expected query. Consider adding assert.NoError(t, mock.ExpectationsWereMet()) at the end of the successful batch case (similar to the other tests in this file).

Suggested change

 assert.NoError(t, err)
+assert.NoError(t, mock.ExpectationsWereMet())

Comment on lines +70 to +86
// Pagination
limitStr := c.DefaultQuery("limit", "100")
limit, err := strconv.Atoi(limitStr)
if err != nil {
httputil.Error(c, errors.New(errors.InvalidInput, "invalid limit; must be a number"))
return
}

offsetStr := c.DefaultQuery("offset", "0")
offset, err := strconv.Atoi(offsetStr)
if err != nil {
httputil.Error(c, errors.New(errors.InvalidInput, "invalid offset; must be a number"))
return
}

query.Limit = limit
query.Offset = offset
Copilot AI Feb 15, 2026

limit and offset accept negative (or zero) values. When limit <= 0, the repository silently defaults to 100, but the handler returns the user-supplied value in the response (query.Limit), which can misrepresent what was actually applied. Validate limit (>=1, maybe with a max) and offset (>=0) and normalize query.Limit/query.Offset to the effective values before calling the service and building the response.

@poyrazK poyrazK merged commit a163641 into main Feb 15, 2026
43 checks passed
@poyrazK poyrazK deleted the feature/cloudlogs branch April 12, 2026 15:56