Refactor: Simplify checkpoint path and Job-based TUI history#83

Merged
FL4TLiN3 merged 4 commits into epic/job-concept from feat/job-persistence on Dec 9, 2025

FL4TLiN3 commented Dec 9, 2025

Summary

  • Simplify checkpoint storage: jobs/{jobId}/checkpoints/{id}.json (removed timestamp from filename)
  • --resume-from now strictly requires --continue-job <jobId>
  • TUI history displays Jobs instead of Runs
  • Show jobId and checkpointId in TUI for easier CLI usage

Changes

Checkpoint Storage

  • Path simplified from jobs/{jobId}/runs/{runId}/checkpoint-{timestamp}-{step}-{id}.json to jobs/{jobId}/checkpoints/{id}.json
  • Direct path construction without directory scanning
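
A minimal sketch of what direct path construction could look like under the new layout. The helper name `buildCheckpointPath` and the `baseDir` parameter are illustrative assumptions; the actual signature of the runtime's `getCheckpointPath` is not shown in this PR description.

```typescript
// Hypothetical helper (not the runtime's real API): builds the checkpoint
// file path from the two IDs alone, so no directory scan is needed.
function buildCheckpointPath(baseDir: string, jobId: string, checkpointId: string): string {
  return `${baseDir}/jobs/${jobId}/checkpoints/${checkpointId}.json`;
}

// Example with made-up IDs:
console.log(buildCheckpointPath("perstack", "job-1", "cp-1"));
// prints "perstack/jobs/job-1/checkpoints/cp-1.json"
```

Because the filename no longer embeds a timestamp or step number, the path is fully determined by `jobId` and `checkpointId`.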

CLI

  • --resume-from <checkpointId> requires --continue-job <jobId>
  • Updated docs and E2E tests
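
The flag dependency can be sketched as a small validation step. The option shape, function name, and error wording below are assumptions for illustration; the real check lives in `resolveRunContext` and its exact message is not quoted in this PR.

```typescript
// Hypothetical parsed-option shape; the real CLI options may differ.
interface ResumeOptions {
  continueJob?: string;
  resumeFrom?: string;
}

// Returns an error message when --resume-from is given without
// --continue-job, otherwise null. The message text is illustrative only.
function validateResumeOptions(opts: ResumeOptions): string | null {
  if (opts.resumeFrom !== undefined && opts.continueJob === undefined) {
    return "--resume-from requires --continue-job";
  }
  return null;
}
```

With this rule a checkpoint ID is always interpreted inside a known job, which is what makes the scan-free path lookup possible.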

TUI

  • History shows Job list instead of Run list
  • Displays: expertKey - {totalSteps} steps ({jobId}) (startedAt)
  • Checkpoint view shows: Step {stepNumber} ({checkpointId})
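
The two display formats above could be produced by helpers along these lines. This is a sketch only: the `JobHistoryRow` shape and function names are assumed, and `startedAt` is treated as an already formatted string because the PR does not specify its format.

```typescript
// Hypothetical row shape for the TUI history list.
interface JobHistoryRow {
  expertKey: string;
  totalSteps: number;
  jobId: string;
  startedAt: string; // assumed to be pre-formatted for display
}

// Formats a history row as: expertKey - {totalSteps} steps ({jobId}) (startedAt)
function formatJobRow(job: JobHistoryRow): string {
  return `${job.expertKey} - ${job.totalSteps} steps (${job.jobId}) (${job.startedAt})`;
}

// Formats a checkpoint entry as: Step {stepNumber} ({checkpointId})
function formatCheckpointRow(stepNumber: number, checkpointId: string): string {
  return `Step ${stepNumber} (${checkpointId})`;
}
```

Showing both IDs inline lets a user copy them straight into `--continue-job` and `--resume-from`.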

Closes #81
Closes #82


Note

Switch TUI history to Jobs and simplify checkpoint storage to job-level files; require --continue-job whenever --resume-from is used; update runtime/store APIs, docs, and tests accordingly.

  • Runtime/Storage
    • Simplify checkpoints to perstack/jobs/<jobId>/checkpoints/<checkpointId>.json (remove timestamp/run folder for checkpoints).
    • Add job store (job.json per job) with helpers to create/retrieve/list and track status/usage; update run() to persist job lifecycle and usage on stop/complete.
    • Expose new runtime APIs: getCheckpointPath, getCheckpointsByJobId, getEventsByRun, getEventContents, getAllJobs, getAllRuns.
    • Update checkpoint/event retrieval and writing to use new paths and sync helpers; update executeStateMachine storeCheckpoint signature (drops timestamp).
  • Perstack (CLI integration)
    • resolveRunContext now requires --continue-job when using --resume-from; uses new getCheckpointById(jobId, checkpointId) and latest checkpoint by job.
    • run-manager refactored to use runtime-provided getters and new checkpoint/job APIs.
    • start command TUI wiring updated to load jobs, checkpoints, and events via new APIs.
  • TUI
    • Switch history UI from Runs to Jobs (JobHistoryItem); update components, state, actions, and types accordingly.
    • Checkpoint list shows Step {stepNumber} ({checkpointId}); history row shows {expertKey} - {totalSteps} steps ({jobId}).
  • Docs/Tests
    • Docs updated to state --resume-from requires --continue-job and to reflect job/checkpoint model.
    • E2E test asserts new error message for invalid --resume-from.
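
As a rough illustration of the checkpoint lookup described in the CLI integration notes, the sketch below selects a checkpoint within one job. The `CheckpointRef` shape and `pickCheckpoint` name are hypothetical, and treating the highest `stepNumber` as "latest" is an assumption; the PR does not say how the runtime orders checkpoints now that filenames carry no timestamp.

```typescript
// Hypothetical minimal checkpoint shape; the real type has more fields.
interface CheckpointRef {
  id: string;
  jobId: string;
  stepNumber: number;
}

// Picks the checkpoint to resume from within one job: the explicit
// checkpointId if given, otherwise the one with the highest stepNumber
// (an assumed definition of "latest").
function pickCheckpoint(
  checkpoints: CheckpointRef[],
  jobId: string,
  checkpointId?: string,
): CheckpointRef | undefined {
  const inJob = checkpoints.filter((c) => c.jobId === jobId);
  if (checkpointId !== undefined) {
    return inJob.find((c) => c.id === checkpointId);
  }
  return inJob.reduce<CheckpointRef | undefined>(
    (latest, c) => (latest === undefined || c.stepNumber > latest.stepNumber ? c : latest),
    undefined,
  );
}
```

Scoping the search to a single job means a checkpoint ID that belongs to a different job is reported as not found instead of being resumed by accident.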

Written by Cursor Bugbot for commit 7b8b47f. This will update automatically on new commits.

- Simplify checkpoint storage path to jobs/{jobId}/checkpoints/{id}.json
- Remove timestamp from checkpoint filename
- Update --resume-from to strictly require --continue-job
- Change TUI history from Run-based to Job-based display
- Show jobId and checkpointId in TUI for easier CLI usage

Closes #81
Closes #82

vercel bot commented Dec 9, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
perstack Ignored Ignored Preview Dec 9, 2025 8:48am

codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 11.22449% with 87 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/runtime/src/default-store.ts | 21.42% | 33 Missing ⚠️ |
| packages/runtime/src/job-store.ts | 6.45% | 29 Missing ⚠️ |
| packages/runtime/src/run-setting-store.ts | 0.00% | 24 Missing ⚠️ |
| packages/runtime/src/execute-state-machine.ts | 0.00% | 1 Missing ⚠️ |

})
job = {
  ...job,
  totalSteps: job.totalSteps + runResultCheckpoint.stepNumber,

Bug: Job totalSteps double-counts steps across iterations

The totalSteps calculation incorrectly adds runResultCheckpoint.stepNumber to job.totalSteps on each iteration of the while loop. Since stepNumber is cumulative within a job (preserved across delegations and incremented by createNextStepCheckpoint), this causes double-counting. For example, if a run completes at step 3, then delegates and completes at step 5, totalSteps becomes 0 + 3 + 5 = 8 instead of the correct 5. The calculation should either set totalSteps directly to runResultCheckpoint.stepNumber or compute the delta between initial and result step numbers.
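
The arithmetic in this comment can be reproduced with a small standalone sketch. The function names are hypothetical and the loop is a simplification of the real `run()` loop; only the numbers 3 and 5 come from the example above.

```typescript
// Cumulative step numbers reported at the end of each loop iteration,
// taken from the review example: first run ends at 3, second at 5.
const resultStepNumbers = [3, 5];

// Mirrors the reviewed code: adds the cumulative value on every iteration.
function totalStepsBySumming(stepNumbers: number[]): number {
  let total = 0;
  for (const stepNumber of stepNumbers) {
    total = total + stepNumber;
  }
  return total;
}

// One of the suggested fixes: assign the cumulative value directly.
function totalStepsByAssigning(stepNumbers: number[]): number {
  let total = 0;
  for (const stepNumber of stepNumbers) {
    total = stepNumber;
  }
  return total;
}

console.log(totalStepsBySumming(resultStepNumbers)); // prints 8 (double-counted)
console.log(totalStepsByAssigning(resultStepNumbers)); // prints 5
```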

job = {
  ...job,
  totalSteps: job.totalSteps + runResultCheckpoint.stepNumber,
  usage: sumUsage(job.usage, runResultCheckpoint.usage),

Bug: Job usage double-counts tokens across iterations

The usage calculation has the same double-counting issue as totalSteps. The checkpoint's usage is cumulative (accumulated via sumUsage in the state machine and preserved when createNextStepCheckpoint spreads the previous checkpoint). Adding runResultCheckpoint.usage to job.usage on each iteration causes token counts to be double-counted during delegation or continuation. This would result in inflated inputTokens, outputTokens, and totalTokens values in the job's usage tracking.
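
The same effect on token usage can be sketched as follows. The `Usage` shape includes only the three fields named in this comment, `addUsage` is a stand-in that assumes field-wise addition (the runtime's real `sumUsage` may differ), and the token counts are made-up values.

```typescript
// Only the fields named in the review comment; the real type may have more.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Stand-in for the runtime's sumUsage, assumed to add field by field.
function addUsage(a: Usage, b: Usage): Usage {
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
    totalTokens: a.totalTokens + b.totalTokens,
  };
}

const zero: Usage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };

// Cumulative usage carried by the result checkpoint of two iterations.
const afterFirstRun: Usage = { inputTokens: 100, outputTokens: 50, totalTokens: 150 };
const afterSecondRun: Usage = { inputTokens: 180, outputTokens: 90, totalTokens: 270 };

// Reviewed behaviour: summing cumulative values inflates the job usage
// to 280 input, 140 output, 420 total tokens.
const summed = addUsage(addUsage(zero, afterFirstRun), afterSecondRun);

// Direct assignment keeps the job usage equal to the latest cumulative value.
const assigned = afterSecondRun;

console.log(summed.totalTokens, assigned.totalTokens); // prints 420 270
```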

- Add getAllJobs() to job-store.ts
- Add getAllRuns() to run-setting-store.ts
- Add getCheckpointsByJobId(), getEventsByRun(), getEventContents() to default-store.ts
- Export new functions from runtime index
- Simplify run-manager.ts to delegate to runtime functions

stepNumber and usage in checkpoints are cumulative within a Job,
so assign them directly instead of summing to avoid double-counting.
FL4TLiN3 merged commit 7cb873b into epic/job-concept on Dec 9, 2025
9 checks passed
FL4TLiN3 deleted the feat/job-persistence branch on December 9, 2025 08:58