improvement(logs): object storage backed tracespans #4787

Merged: icecrasher321 merged 13 commits into staging from feat/trace-spans-s3-usage-log-cost on May 29, 2026

@icecrasher321 icecrasher321 commented May 29, 2026

Summary

Moves execution log trace spans into object storage to keep the workflow_execution_logs table lean.

Type of Change

  • Other: Performance

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@icecrasher321 icecrasher321 marked this pull request as ready for review May 29, 2026 07:00

cursor Bot commented May 29, 2026

PR Summary

High Risk
Changes touch billing reconciliation, usage_log writes, invoice rollover, and many read paths for execution cost/data; regressions could mis-bill, show wrong costs, or fail to load traces after externalization.

Overview
This PR externalizes heavy workflow execution data (trace spans and related payloads) to object storage on completion, keeping slim execution_data rows with a pointer. Read paths (logs APIs, export, v1 logs, copilot tools, notifications, data drains) call materializeExecutionData with bounded concurrency so large batches do not fan out unbounded storage reads; read-only materialization skips re-registering large-value references.
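
The bounded fan-out described above can be illustrated with a small worker-pool helper. This is a minimal sketch, not the PR's `mapWithConcurrency` from `apps/sim/lib/core/utils/concurrency.ts`: the signature and argument order are assumptions, and only the idea (a shared cursor drained by at most `limit` workers, results returned in input order) is taken from the summaries in this thread.

```typescript
// Hypothetical sketch of a bounded-concurrency map; the PR's real helper may differ.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let cursor = 0 // next unclaimed index, shared by all workers

  async function worker(): Promise<void> {
    while (cursor < items.length) {
      const index = cursor++ // claimed synchronously, before any await
      results[index] = await fn(items[index], index)
    }
  }

  // Never start more workers than there are items.
  const workerCount = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

A read path would then call something like `mapWithConcurrency(rows, 8, (row) => materialize(row))`, so a 1000-row export issues at most eight storage reads at a time; the limit of 8 is an arbitrary illustration.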

Billing and cost display shift to the usage_log ledger as the source of truth: workflow logs use costTotal / modelsUsed projections instead of the deprecated cost JSONB. Completion recordExecutionUsage reconciles ledger rows per execution (pause/resume-safe deltas, advisory lock, optional tx on recordUsage), records standalone tool charges, strips per-span costs from persisted traces, and refines cost_total to the ledger sum. Org/admin/billing APIs add per-member usage_log totals to currentPeriodCost; copilot totals fold in COPILOT_USAGE_SOURCES from the ledger; invoice period rollover includes copilot ledger in lastPeriodCopilotCost.
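
To make the pause/resume-safe delta idea concrete, here is a minimal sketch of the reconciliation arithmetic only. It assumes charges are keyed by a string (for example a model or tool name) and carried as integer micro-dollars; both are assumptions for illustration, and the locking and insert steps of the real `recordExecutionUsage` are not modeled.

```typescript
// Hypothetical sketch: compute what still needs to be written to the ledger.
interface Charge {
  key: string // assumed grouping key, e.g. a model or tool name
  amount: number // integer micro-dollars (assumption)
}

function computeLedgerDeltas(
  alreadyBilled: ReadonlyMap<string, number>,
  cumulative: readonly Charge[]
): Charge[] {
  const deltas: Charge[] = []
  for (const { key, amount } of cumulative) {
    // Only the part of the cumulative total not yet in the ledger is billed.
    const delta = amount - (alreadyBilled.get(key) ?? 0)
    if (delta > 0) deltas.push({ key, amount: delta })
  }
  return deltas
}
```

Running the function a second time with the ledger already caught up yields no deltas, which is the property that lets a resumed or retried completion avoid double-charging.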

The log details UI shows an itemized costLedger breakdown with apportionCredits so line items match the run total; the Trace tab uses the authoritative run cost instead of per-span costs. Admin/seat APIs drop lastActive-based activity and legacy lifetime usage counters. A v1 logs limit is softly capped at 1000.
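
The thread does not show how `apportionCredits` works. One common way to make rounded line items sum exactly to a fixed total is largest-remainder apportionment, sketched below under that assumption. The function name `apportionToTotal` is hypothetical, and weights and total are assumed to be non-negative integers (for example micro-dollars) so that all arithmetic stays exact.

```typescript
// Hypothetical sketch of largest-remainder apportionment; not the PR's apportionCredits.
function apportionToTotal(weights: readonly number[], total: number): number[] {
  if (weights.length === 0) return []
  const weightSum = weights.reduce((a, b) => a + b, 0)
  // If every weight is zero, fall back to an even split.
  const w = weightSum > 0 ? weights : weights.map(() => 1)
  const sum = weightSum > 0 ? weightSum : weights.length

  const scaled = w.map((x) => x * total)
  const shares = scaled.map((s) => Math.floor(s / sum))
  const remainders = scaled.map((s) => s % sum)

  // Hand out the units lost to flooring, largest remainder first (ties by index).
  let leftover = total - shares.reduce((a, b) => a + b, 0)
  const order = remainders
    .map((remainder, index) => ({ remainder, index }))
    .sort((a, b) => b.remainder - a.remainder || a.index - b.index)
  for (const { index } of order) {
    if (leftover <= 0) break
    shares[index] += 1
    leftover -= 1
  }
  return shares
}
```

For example, three equal line items sharing 100 units come out as 34, 33, 33, so the breakdown always adds up to the run total.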

Reviewed by Cursor Bugbot for commit 0014218.

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR moves heavy execution_data (traceSpans, finalOutput, workflowInput) out of the workflow_execution_logs table into object storage, keeping only a slim pointer + inline markers on the row. It also migrates cost tracking from a live-updated cost JSONB column to a cost_total decimal projection of the usage_log ledger, which now includes an idempotent, advisory-lock-protected reconciliation step that correctly handles pause/resume without double-charging.

  • Schema + migration: adds cost_total (decimal), models_used (text[]) columns with supporting indexes; backfills them via a looping batched SQL procedure; adds a separate one-shot TypeScript backfill script for externalizing existing trace spans.
  • Write path: finalizeLog calls stripSpanCosts → externalizeExecutionData → advisory-locked recordExecutionUsage, which reconciles to the usage_log ledger and writes the exact cost_total atomically.
  • Read path: every API route and data-drain calls materializeExecutionData, which resolves the pointer transparently and degrades gracefully to metadata-only after object expiry.
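
The pointer round-trip and its degradation behavior can be sketched as follows. This is an illustration under stated assumptions, not the contents of `trace-store.ts`: the synchronous in-memory `ObjectStore`, the key layout, and the `hasTraceSpans` marker are all hypothetical; only the `traceStoreRef` pointer name and the fallback behavior come from this thread.

```typescript
// Hypothetical sketch of externalize/materialize with an in-memory store.
interface ExecutionData {
  status?: string
  traceSpans?: unknown[]
  finalOutput?: unknown
  workflowInput?: unknown
  traceStoreRef?: string // pointer to the externalized payload
  hasTraceSpans?: boolean // assumed inline marker kept on the slim row
}

interface ObjectStore {
  put(key: string, body: string): void
  get(key: string): string | undefined
}

function createMemoryStore(): ObjectStore & { delete(key: string): void } {
  const objects = new Map<string, string>()
  return {
    put: (key: string, body: string) => { objects.set(key, body) },
    get: (key: string) => objects.get(key),
    delete: (key: string) => { objects.delete(key) },
  }
}

function externalize(executionId: string, data: ExecutionData, store: ObjectStore): ExecutionData {
  const { traceSpans, finalOutput, workflowInput, ...rest } = data
  const key = `executions/${executionId}.json` // assumed key layout
  try {
    store.put(key, JSON.stringify({ traceSpans, finalOutput, workflowInput }))
  } catch {
    return data // write failed: keep the full payload inline
  }
  return { ...rest, traceStoreRef: key, hasTraceSpans: (traceSpans?.length ?? 0) > 0 }
}

function materialize(data: ExecutionData, store: ObjectStore): ExecutionData {
  if (!data.traceStoreRef) return data // never externalized
  let body: string | undefined
  try {
    body = store.get(data.traceStoreRef)
  } catch {
    body = undefined
  }
  if (body === undefined) return data // object expired or unreadable: metadata only
  const heavy = JSON.parse(body) as Pick<ExecutionData, 'traceSpans' | 'finalOutput' | 'workflowInput'>
  return { ...data, ...heavy }
}
```

Callers never branch on where the payload lives: a row that was never externalized, a row with a live object, and a row whose object has expired all pass through the same `materialize` call.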

Confidence Score: 5/5

Safe to merge. The write path ordering is correct, the GREATEST guard prevents cost regression on resume, and all read-path routes properly materialize externalized data.

The externalization, advisory-locked ledger reconciliation, and GREATEST-guarded cost_total projection are internally consistent and correct. All API routes and data-drain paths that need full execution payload call materializeExecutionData. The migration batching procedure and CONCURRENTLY indexes follow standard PostgreSQL patterns.

apps/sim/lib/billing/core/usage-log.ts (inArray empty-array guard), apps/sim/scripts/backfill-trace-spans.ts (orphan-on-tx-failure for one-shot script)
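
The empty-array concern flagged for usage-log.ts can be illustrated without any query builder. The sketch below is hypothetical and does not use the project's actual `inArray` call; it only shows the guard idea. PostgreSQL rejects `col IN ()` as a syntax error, so an empty filter list is turned into a predicate that matches nothing.

```typescript
// Hypothetical sketch of an empty-list guard for an IN filter.
interface SqlFragment {
  text: string
  params: unknown[]
}

// `column` must be a trusted identifier; only the values are parameterized.
function inList(column: string, values: readonly unknown[], firstParam = 1): SqlFragment {
  if (values.length === 0) {
    // An empty set matches no rows; emitting FALSE avoids `IN ()`.
    return { text: 'FALSE', params: [] }
  }
  const placeholders = values.map((_, i) => `$${firstParam + i}`).join(', ')
  return { text: `${column} IN (${placeholders})`, params: [...values] }
}
```

Whether an empty source filter should mean "match nothing" or "no filter at all" is a design decision for the caller; the sketch picks the former.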

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/logs/execution/trace-store.ts | New module: implements externalize/materialize round-trip; fallback-on-error for both paths; inline marker preservation is correct. |
| apps/sim/lib/logs/execution/logger.ts | Rewrites cost recording to advisory-lock-protected usage_log reconciliation; stripSpanCosts ordering correct; GREATEST + conditional modelsUsed addresses prior resume regression. |
| apps/sim/lib/logs/fetch-log-detail.ts | Adds buildCostLedger and materializeExecutionData; Math.max for token aggregation is correct since metadata carries cumulative counts. |
| packages/db/migrations/0220_early_hellion.sql | Adds cost_total/models_used columns, CONCURRENTLY indexes, batching backfill procedure; COMMIT before ALTER TYPE ADD VALUE is correct. |
| apps/sim/scripts/backfill-trace-spans.ts | One-shot idempotent backfill; keyset cursor avoids infinite loops; object can be orphaned if DB transaction fails after externalization succeeds. |
| apps/sim/lib/billing/core/usage-log.ts | Adds deriveBillingContext, COPILOT_USAGE_SOURCES, source-filter overload; inArray with empty array would produce invalid SQL but no current callers pass one. |
| apps/sim/lib/logs/execution/logging-factory.ts | calculateCostSummary now returns charges for non-model billable spans; BASE_EXECUTION_CHARGE handling unchanged and correct. |
| apps/sim/lib/copilot/tools/server/workflow/get-execution-summary.ts | Uses unbounded Promise.all (up to 20 rows) for materialization instead of mapWithConcurrency pattern used elsewhere. |
| apps/sim/lib/core/utils/concurrency.ts | New bounded-concurrency mapWithConcurrency utility; correct cursor-based worker implementation with well-documented contract. |
| apps/sim/lib/data-drains/sources/workflow-logs.ts | Addresses prior review comment: uses mapWithConcurrency return value and index loop write-back instead of in-place mutations. |
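
The note that a keyset cursor avoids infinite loops in the backfill script can be sketched like this. The row shape, `fetchPage`, and `backfill` are hypothetical stand-ins (an in-memory array plays the role of the table); the point is only that the cursor advances past every row it has seen, including rows that fail, so a permanently failing row cannot be refetched forever.

```typescript
// Hypothetical sketch of an idempotent keyset-paginated backfill.
interface Row {
  id: string
  externalized: boolean
}

// Stands in for: SELECT ... WHERE id > $afterId ORDER BY id LIMIT $limit
function fetchPage(rows: readonly Row[], afterId: string | null, limit: number): Row[] {
  return [...rows]
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .filter((row) => afterId === null || row.id > afterId)
    .slice(0, limit)
}

function backfill(rows: Row[], batchSize: number, handle: (row: Row) => boolean): number {
  let cursor: string | null = null
  let processed = 0
  for (;;) {
    const page = fetchPage(rows, cursor, batchSize)
    if (page.length === 0) break
    for (const row of page) {
      if (row.externalized) continue // idempotent: skip rows already done
      if (handle(row)) {
        row.externalized = true
        processed++
      }
    }
    // Advance past the whole page, whether or not every row succeeded.
    cursor = page[page.length - 1].id
  }
  return processed
}
```

By contrast, a loop that repeatedly selects "rows not yet externalized, limit N" without a cursor would spin forever on a batch of rows that always fail.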

Sequence Diagram

sequenceDiagram
    participant Exec as Execution Engine
    participant Session as LoggingSession
    participant Logger as ExecutionLogger
    participant DB as PostgreSQL
    participant ObjStore as Object Storage
    participant UsageLog as usage_log ledger

    Exec->>Session: completedRun(traceSpans, finalOutput)
    Session->>Session: calculateCostSummary(traceSpans)
    Session->>Logger: completeExecutionWithFinalization(costSummary, traceSpans)

    Logger->>Logger: stripSpanCosts(traceSpans)
    Logger->>ObjStore: externalizeExecutionData(executionData)
    ObjStore-->>Logger: slim row (traceStoreRef + markers)

    Logger->>DB: "UPDATE executionData=slim, costTotal=GREATEST(...), modelsUsed"

    Logger->>DB: BEGIN advisory lock (pg_advisory_xact_lock)
    DB->>UsageLog: SELECT SUM already-billed for executionId
    UsageLog-->>DB: alreadyBilled map
    DB->>UsageLog: INSERT delta entries (onConflictDoNothing)
    DB->>DB: "UPDATE costTotal=exact ledger sum"
    Logger->>DB: COMMIT

    Exec->>Logger: GET /api/logs/execution/:id
    Logger->>DB: SELECT executionData (slim row)
    Logger->>ObjStore: materializeExecutionData
    ObjStore-->>Logger: full executionData
    Logger-->>Exec: WorkflowExecutionLog (with traceSpans)

Reviews (6): Last reviewed commit: "incorrect type cast"

icecrasher321 and others added 4 commits May 29, 2026 12:07
Drops the 0219_robust_shard SQL, its snapshot, and the journal entry so the
trace-spans/cost schema migration can be regenerated on top of the latest
staging migration chain (avoids a number collision with staging's migrations).

Co-authored-by: Cursor <cursoragent@cursor.com>

Per-member/per-user usage in the org-member routes now adds the usage_log
ledger to the currentPeriodCost baseline (which is no longer incremented),
via a shared getOrgMemberLedgerByUser helper to avoid repeating the
subscription→period→ledger lookup across the admin and member-facing routes.

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

@icecrasher321 icecrasher321 merged commit 4967305 into staging May 29, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the feat/trace-spans-s3-usage-log-cost branch May 30, 2026 03:00