fix: recover orphaned jobs on startup with error message (#100)#102

Merged
myakove merged 4 commits into
mainfrom
fix/issue-100-recover-orphaned-jobs
Jun 5, 2026

Conversation

@myakove

@myakove myakove commented Jun 4, 2026

Contributor

Closes #100

Summary

Extend mark_stale_results_failed() to set proper error message and completed_at on orphaned pending/running jobs during server startup.

Changes

  • storage.py: New error column migration. Module-level ORPHAN_STATUSES and RESTART_ERROR_MSG constants. mark_stale_results_failed() returns (waiting_jobs, recovered_jobs) tuple, sets error + completed_at on all failed rows. Dashboard and detail views expose error field.
  • main.py: Per-job INFO log for recovered orphaned jobs.
  • Tests: Enhanced existing tests + error/completed_at assertions.

3 files, +82/-22 lines. All tests pass.

@myakove-bot

Collaborator

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: Disabled for this repository
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: All label categories are enabled (default configuration)

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /automerge - Enable automatic merging when all requirements are met (maintainers and approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest python-module-install - Test Python package installation
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. Status Checks: All required status checks must pass
  3. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)
  4. Verified: PR must be marked as verified

📊 Review Process

Approvers and Reviewers

Approvers:

  • myakove

Reviewers:

  • myakove

Available Labels

  • hold
  • verified
  • wip
  • lgtm
  • approve
  • automerge

AI Features

  • Conventional Title: Mode: fix (claude/claude-opus-4-6[1m])
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@qodo-code-review

qodo-code-review Bot commented Jun 4, 2026


Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (2) 🎨 UX issues (0)


Action required

1. ResultResponse missing error ✓ Resolved 📘 Rule violation ≡ Correctness
Description
StatusPage now accesses res.error, but the ResultResponse TypeScript interface does not define
an error field. With strict type-checking enabled, this will cause a TypeScript compile failure
(e.g., "Property 'error' does not exist on type 'ResultResponse'") and can break the frontend
portion of tox/builds.
Code

frontend/src/pages/StatusPage.tsx[R128-132]

+          setError(res.error || res.result?.error || 'Analysis failed')
          return 'terminal'
        } else if (res.status === 'aborted') {
          setTerminalErrorKind('aborted')
-          setError(res.result?.error ?? 'Analysis was aborted')
+          setError(res.error || res.result?.error || 'Analysis was aborted')
Evidence
PR Compliance ID 9 requires the full tox suite (including the frontend build/tests) to pass, and the
updated StatusPage code reads res.error while the only ResultResponse interface definition
lacks an error property. Under strict TypeScript settings, accessing an undeclared property is a
type error, so the frontend typecheck/build step typically run in tox will fail unless the interface
is updated or unsafe typing is introduced.

AGENTS.md: All Tests Must Pass via Tox (Backend + Frontend)
frontend/src/pages/StatusPage.tsx[123-134]
frontend/src/types/index.ts[306-329]
frontend/src/types/index.ts[304-329]
frontend/tsconfig.app.json[1-32]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`frontend/src/pages/StatusPage.tsx` uses `res.error`, but `ResultResponse` (defined in `frontend/src/types/index.ts` and imported from `@/types`) does not include an `error` property. With `strict: true`, this mismatch is a compile-time TypeScript error and will fail the frontend build.

## Issue Context
Backend responses now include a top-level `error`, and the Status page is correctly trying to display it (preferring `res.error`). The frontend type definition must be updated to match the API response shape; add an `error` field to `ResultResponse` (e.g., `error: string` if always present, or `error?: string` if optional) and ensure callers handle `undefined` if it’s optional.

## Fix Focus Areas
- frontend/src/pages/StatusPage.tsx[128-132]
- frontend/src/types/index.ts[306-329]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Restart error ignored ✓ Resolved 🐞 Bug ≡ Correctness
Description
mark_stale_results_failed() writes the restart failure explanation only into the new
results.error column (exposed by storage.get_result() as a top-level error), but the Status
page and frontend ResultResponse still read only res.result?.error from result_json, while
most normal failure paths never populate results.error. As a result, orphaned jobs can show a
generic "Analysis failed" instead of the restart reason, and API consumers relying on the new
top-level error may see it empty even when result_json contains an error message.
Code

src/rootcoz/storage.py[R2721-2726]

+        now = datetime.now(UTC).isoformat()
        cursor = await db.execute(
-            "UPDATE results SET status = 'failed' "
-            "WHERE status IN ('pending', 'running')"
+            f"UPDATE results SET status = 'failed', "
+            f"error = ?, completed_at = ? "
+            f"WHERE status IN ({placeholders})",
+            (RESTART_ERROR_MSG, now, *ORPHAN_STATUSES),
Evidence
The backend startup recovery path stores the restart message in results.error and
storage.get_result() surfaces that column as a new top-level error, but the StatusPage derives
its displayed error from res.result?.error and the frontend ResultResponse contract doesn’t
include a top-level error, so the restart message is ignored in the UI. Separately, existing
status/result writers (save_result/update_status) continue to update status, result_json, and
timestamps without writing the new results.error column, so get_result()’s new top-level error
will often be '' for typical failures unless get_result() falls back to parsed or writers are
updated to write-through.

src/rootcoz/storage.py[2689-2787]
src/rootcoz/storage.py[1072-1114]
src/rootcoz/main.py[3350-3373]
frontend/src/pages/StatusPage.tsx[118-133]
frontend/src/types/index.ts[306-329]
src/rootcoz/storage.py[823-918]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The backend now has a `results.error` column and `storage.get_result()` exposes it as a top-level `error`, but (a) the frontend StatusPage and `ResultResponse` type still only use `result_json` (`res.result?.error`) so restart-recovery failure explanations written to `results.error` aren’t shown, and (b) most non-restart failure paths still only write errors into `result_json`, leaving the new top-level `error` inconsistent/usually empty.

## Issue Context
- Startup recovery marks orphaned jobs failed and records the restart failure message in `results.error`, which is returned by `get_result()` as a top-level `error`.
- The StatusPage UI currently reads the error from `res.result?.error`, and the frontend `ResultResponse` type does not include the new top-level `error` field.
- `save_result()` / `update_status()` update status/result_json and timestamps but do not set `results.error`, and `get_result()` currently returns `row['error']` without falling back to `parsed.get('error')`.

## Fix Focus Areas
- frontend/src/pages/StatusPage.tsx[118-133]
- frontend/src/types/index.ts[306-329]
- src/rootcoz/storage.py[2689-2727]
- src/rootcoz/main.py[3350-3373]
- src/rootcoz/storage.py[823-918]
- src/rootcoz/storage.py[1072-1114]

## Suggested implementation direction
1. Extend the frontend `ResultResponse` type to include `error?: string`.
2. In StatusPage, prefer `res.error` when present, otherwise fall back to `res.result?.error`.
3. Make the backend top-level `error` consistent for all failures by choosing one (or both):
  - **Derive on read:** In `get_result()`, set top-level `error` to `row['error'] or (parsed.get('error') if isinstance(parsed, dict) else '')`.
  - **Write-through on update:** In `save_result()` / `update_status()`, when `result` contains an `error` and status becomes terminal (e.g., failed/aborted), also write `results.error = result['error']`.
4. (Optional hardening) If `row['error']` is non-empty and `parsed` is missing/has no `error`, inject it into the returned `result` dict as well to support older clients that only read `result.error`.
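
The "derive on read" option in step 3 can be sketched roughly as follows. This is a minimal illustration, not the project's actual `get_result()`: the helper name `resolve_error` and the plain-dict row shape are assumptions made for the example.

```python
import json
from typing import Any


def resolve_error(row: dict[str, Any]) -> str:
    """Prefer the results.error column; otherwise fall back to result_json['error'].

    `row` is a hypothetical dict holding the `error` and `result_json` columns.
    """
    column_error = row.get("error") or ""
    if column_error:
        return column_error

    raw = row.get("result_json")
    try:
        parsed = json.loads(raw) if raw else None
    except json.JSONDecodeError:
        # Unparseable payloads are treated as "no error recorded".
        parsed = None

    if isinstance(parsed, dict):
        return str(parsed.get("error") or "")
    return ""
```

With this shape, a restart-recovered row returns the column value, while an ordinary failure that only wrote to `result_json` still yields a non-empty top-level error.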

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Missing statuses in ORPHAN_STATUSES 📎 Requirement gap ≡ Correctness
Description
Startup recovery only treats pending/running as orphaned, so jobs in analyzing, peer_review,
or queued will not be transitioned to failed on restart and can remain stuck indefinitely.
Code

src/rootcoz/storage.py[R1922-1926]

+# Statuses whose background task is irrecoverably lost after a restart.
+# These are a subset of ACTIVE_STATUSES — "waiting" is excluded because
+# waiting jobs can be safely resumed.
+ORPHAN_STATUSES = ("pending", "running")
+RESTART_ERROR_MSG = "Analysis interrupted by server restart. Please re-submit."
Evidence
PR Compliance ID 1 requires startup recovery to fail jobs in statuses {analyzing, peer_review,
queued}. The new code defines ORPHAN_STATUSES as only (pending, running), and
mark_stale_results_failed() uses ORPHAN_STATUSES to select/update rows to failed, so the
required statuses are not covered.

Startup recovery marks analyzing/peer_review/queued jobs as failed
src/rootcoz/storage.py[1922-1926]
src/rootcoz/storage.py[2689-2726]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Startup recovery does not mark `analyzing`, `peer_review`, and `queued` jobs as `failed` because `ORPHAN_STATUSES` only includes `pending` and `running`.

## Issue Context
PR Compliance requires extending startup recovery to ensure jobs left in transient in-progress states (`analyzing`, `peer_review`, `queued`) are not stuck after a restart.

## Fix Focus Areas
- src/rootcoz/storage.py[1920-1926]
- src/rootcoz/storage.py[2689-2727]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. No tests for new statuses 📎 Requirement gap ☼ Reliability
Description
The updated startup-recovery tests cover pending and running but do not add coverage for the
required analyzing, peer_review, and queued scenarios, risking regressions and unverified
behavior.
Code

tests/test_storage.py[R718-748]

        """Pending jobs are marked failed on startup."""
        with patch.object(storage, "DB_PATH", setup_test_db):
            await storage.save_result("pending-1", "http://j/1", "pending")
-            waiting = await storage.mark_stale_results_failed()
+            waiting, recovered = await storage.mark_stale_results_failed()
            assert waiting == []
+            assert len(recovered) == 1
+            assert recovered[0] == {"job_id": "pending-1", "previous_status": "pending"}
            result = await storage.get_result("pending-1")
            assert result["status"] == "failed"
+            assert (
+                result["error"]
+                == "Analysis interrupted by server restart. Please re-submit."
+            )
+            assert result["completed_at"] is not None

    async def test_marks_running_as_failed(self, setup_test_db: Path) -> None:
        """Running jobs are marked failed on startup."""
        with patch.object(storage, "DB_PATH", setup_test_db):
            await storage.save_result("running-1", "http://j/2", "running")
-            waiting = await storage.mark_stale_results_failed()
+            waiting, recovered = await storage.mark_stale_results_failed()
            assert waiting == []
+            assert len(recovered) == 1
+            assert recovered[0] == {"job_id": "running-1", "previous_status": "running"}
            result = await storage.get_result("running-1")
            assert result["status"] == "failed"
+            assert (
+                result["error"]
+                == "Analysis interrupted by server restart. Please re-submit."
+            )
+            assert result["completed_at"] is not None
Evidence
PR Compliance ID 5 requires tests validating startup recovery for analyzing, peer_review, and
queued. The modified test suite only demonstrates recovery behavior for pending and running
(and waiting behaviors), with no scenarios added for the required additional statuses.

Test coverage for startup recovery scenarios and tox passes
tests/test_storage.py[717-748]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Test coverage for startup recovery does not include the required statuses `analyzing`, `peer_review`, and `queued`.

## Issue Context
PR Compliance requires tests that create jobs in these statuses and confirm startup recovery marks them `failed`, with `error` and `completed_at` asserted, while preserving existing `waiting` resume behavior.

## Fix Focus Areas
- tests/test_storage.py[714-873]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

5. Recovered jobs may be wrong 🐞 Bug ◔ Observability
Description
mark_stale_results_failed() populates recovered_jobs from a SELECT before performing the UPDATE,
so recovered_jobs can include rows that no longer match the UPDATE criteria when it runs.
lifespan() then logs each entry as “marked failed”, which can produce misleading recovery logs.
Code

src/rootcoz/storage.py[R2718-2725]

+            f"SELECT job_id, status FROM results WHERE status IN ({placeholders}) AND completed_at IS NULL",
+            ORPHAN_STATUSES,
+        )
+        orphaned_rows = await cursor.fetchall()
+        for row in orphaned_rows:
+            recovered_jobs.append(
+                {"job_id": row["job_id"], "previous_status": row["status"]}
+            )
Evidence
The function builds recovered_jobs from rows selected as orphaned, then separately updates rows to
failed. Because those are distinct statements, recovered_jobs is not guaranteed to correspond to
rows actually updated; nevertheless, lifespan() logs each recovered job as “marked failed”.

src/rootcoz/storage.py[2696-2734]
src/rootcoz/main.py[1234-1241]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`recovered_jobs` is currently derived from a pre-update SELECT, but the code later assumes those rows were successfully updated to `failed`. If any concurrent writer changes a row between the SELECT and UPDATE, the returned/logged `recovered_jobs` becomes inaccurate.

## Issue Context
This list is used by `src/rootcoz/main.py` to emit per-job INFO logs claiming the job was “marked failed”. The DB update itself is correctly constrained, but the returned list is not guaranteed to represent the rows actually updated.

## Fix Focus Areas
- Use an atomic transaction (e.g., `BEGIN IMMEDIATE`) around the SELECT+UPDATE so the selected rows cannot be modified by other writers before the UPDATE.
- Or, build `recovered_jobs` from rows proven updated (e.g., after UPDATE, select rows with `completed_at = <now>` and `error = <msg>` and intersect with the initially selected `job_id`→`previous_status` map).

- src/rootcoz/storage.py[2718-2733]
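
The first option (one write transaction around the SELECT and UPDATE) might look like the sketch below. It uses the synchronous stdlib `sqlite3` module rather than the async driver the PR code appears to use, and the function name `recover_orphans`, the `make_demo_db` helper, and the reduced table schema are assumptions for illustration only. The connection is assumed to be in autocommit mode (`isolation_level=None`) so that the explicit `BEGIN IMMEDIATE` is the only transaction in play.

```python
import sqlite3
from datetime import datetime, timezone

ORPHAN_STATUSES = ("pending", "running")
RESTART_ERROR_MSG = "Analysis interrupted by server restart. Please re-submit."


def recover_orphans(conn: sqlite3.Connection) -> list[dict[str, str]]:
    """Select and fail orphaned rows inside a single write transaction."""
    placeholders = ", ".join("?" for _ in ORPHAN_STATUSES)
    where = f"status IN ({placeholders}) AND completed_at IS NULL"
    now = datetime.now(timezone.utc).isoformat()

    # BEGIN IMMEDIATE takes the write lock up front, so no other writer can
    # change the selected rows between the SELECT and the UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            f"SELECT job_id, status FROM results WHERE {where} ORDER BY job_id",
            ORPHAN_STATUSES,
        ).fetchall()
        conn.execute(
            f"UPDATE results SET status = 'failed', error = ?, completed_at = ? "
            f"WHERE {where}",
            (RESTART_ERROR_MSG, now, *ORPHAN_STATUSES),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [{"job_id": job_id, "previous_status": status} for job_id, status in rows]


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory table with a hypothetical, reduced schema."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE results (job_id TEXT PRIMARY KEY, status TEXT, "
        "error TEXT DEFAULT '', completed_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO results (job_id, status) VALUES (?, ?)",
        [("a", "pending"), ("b", "waiting"), ("c", "running")],
    )
    return conn
```

Because the read and the write share one transaction, the returned list describes exactly the rows the UPDATE touched, which is what the per-job log lines rely on.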

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Orphan query ignores completed_at ✓ Resolved 📎 Requirement gap ≡ Correctness
Description
mark_stale_results_failed() selects/updates all pending/running rows without restricting to
completed_at IS NULL, which does not match the orphan definition in the compliance objective and
can incorrectly overwrite rows that already have a completion timestamp.
Code

src/rootcoz/storage.py[R2716-2733]

+        placeholders = ", ".join("?" for _ in ORPHAN_STATUSES)
+        cursor = await db.execute(
+            f"SELECT job_id, status FROM results WHERE status IN ({placeholders})",
+            ORPHAN_STATUSES,
+        )
+        orphaned_rows = await cursor.fetchall()
+        for row in orphaned_rows:
+            recovered_jobs.append(
+                {"job_id": row["job_id"], "previous_status": row["status"]}
+            )
+
+        # Mark orphaned jobs as failed with error message and completion timestamp
+        now = datetime.now(UTC).isoformat()
        cursor = await db.execute(
-            "UPDATE results SET status = 'failed' "
-            "WHERE status IN ('pending', 'running')"
+            f"UPDATE results SET status = 'failed', "
+            f"error = ?, completed_at = ? "
+            f"WHERE status IN ({placeholders})",
+            (RESTART_ERROR_MSG, now, *ORPHAN_STATUSES),
Evidence
The compliance objective defines orphaned jobs as pending/running with completed_at=None. The
PR code selects and updates by status only, with no completed_at IS NULL filter, so it may affect
rows outside the defined orphan set.

Fail orphaned pending/running jobs on startup with user-visible error message
src/rootcoz/storage.py[2716-2733]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Startup orphan recovery should only transition rows that are actually orphaned per compliance definition (`status in ('pending','running')` with `completed_at IS NULL`). The current SELECT/UPDATE targets all rows in those statuses regardless of `completed_at`, which can overwrite valid timestamps and error info.

## Issue Context
Compliance requires failing orphaned `pending`/`running` jobs specifically defined as having `completed_at=None`.

## Fix Focus Areas
- src/rootcoz/storage.py[2716-2733]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Comment thread src/rootcoz/storage.py
Comment thread tests/test_storage.py
Comment thread src/rootcoz/storage.py
@qodo-code-review


Review Summary by Qodo

Recover orphaned jobs on startup with error message and timestamp

🐞 Bug fix ✨ Enhancement

Walkthroughs

Description
• Add error column to results table with migration
• Mark orphaned pending/running jobs failed with error message and timestamp
• Return tuple of (waiting_jobs, recovered_jobs) from mark_stale_results_failed()
• Log per-job recovery details during startup for orphaned jobs
• Expose error field in dashboard and detail API responses
Diagram
flowchart LR
  A["Server Startup"] --> B["mark_stale_results_failed()"]
  B --> C["Collect Orphaned Jobs"]
  C --> D["Update with Error & Timestamp"]
  D --> E["Return Tuple"]
  E --> F["waiting_jobs"]
  E --> G["recovered_jobs"]
  G --> H["Per-job INFO Logging"]
  F --> I["Resume Background Tasks"]
  D --> J["Expose error in API"]

File Changes

1. src/rootcoz/storage.py ✨ Enhancement +48/-14

Add error column and recovery tracking to orphaned jobs

• Add error column migration to results table with default empty string
• Define module-level constants ORPHAN_STATUSES and RESTART_ERROR_MSG
• Refactor mark_stale_results_failed() to return tuple of (waiting_jobs, recovered_jobs)
• Collect orphaned job details before marking failed, set error message and completed_at timestamp
• Expose error field in get_result() and list_results_for_dashboard() responses

src/rootcoz/storage.py
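
An idempotent column migration of the kind described in the first bullet could be sketched like this. The helper name `ensure_error_column` and the use of the stdlib `sqlite3` module are assumptions; the real migration in `storage.py` may be structured differently.

```python
import sqlite3


def ensure_error_column(conn: sqlite3.Connection) -> bool:
    """Add results.error if it is missing; return True when the column was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(results)")}
    if "error" in columns:
        return False
    # Existing rows read back the default, so old jobs get an empty error string.
    conn.execute("ALTER TABLE results ADD COLUMN error TEXT DEFAULT ''")
    conn.commit()
    return True
```

Checking the column list first makes the migration safe to run on every startup, since a second call is a no-op.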


2. src/rootcoz/main.py ✨ Enhancement +7/-1

Log per-job recovery details during startup

• Unpack tuple return from mark_stale_results_failed() into waiting_jobs and recovered_jobs
• Add per-job INFO logging loop for each recovered orphaned job with job_id and previous_status

src/rootcoz/main.py
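
A per-job logging loop along these lines would match the description above. The logger name, the message wording, and the function name `log_recovered_jobs` are illustrative assumptions; only the `job_id` and `previous_status` keys come from the PR.

```python
import logging

logger = logging.getLogger("rootcoz.startup")  # hypothetical logger name


def log_recovered_jobs(recovered_jobs: list[dict[str, str]]) -> int:
    """Emit one INFO line per recovered orphaned job; return how many were logged."""
    for job in recovered_jobs:
        logger.info(
            "Recovered orphaned job %s (previous status: %s); marked failed",
            job["job_id"],
            job["previous_status"],
        )
    return len(recovered_jobs)
```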


3. tests/test_storage.py 🧪 Tests +29/-7

Update tests for tuple return and error assertions

• Update all test calls to mark_stale_results_failed() to unpack tuple return value
• Add assertions for recovered list containing job_id and previous_status
• Add assertions verifying error message and completed_at timestamp on failed jobs
• Verify recovered list is empty for waiting jobs without result_json

tests/test_storage.py


@myakove

myakove commented Jun 4, 2026

Contributor Author

@qodo-code-review[bot]

The following review comments were reviewed and a decision was made:

src/rootcoz/storage.py:1922 (qodo requirement gap) — Missing statuses in ORPHAN_STATUSES

Addressed: By design per updated issue #100 spec — analyzing, peer_review, and queued are not DB statuses. They are progress phases inside result_json or API-only response values. ORPHAN_STATUSES correctly covers only (pending, running). See issue #100 Design Note section.

tests/test_storage.py:718 (qodo requirement gap) — No tests for new statuses

Addressed: By design per updated issue #100 spec — analyzing, peer_review, and queued are not DB statuses. They are progress phases inside result_json or API-only response values. ORPHAN_STATUSES correctly covers only (pending, running). See issue #100 Design Note section.

src/rootcoz/storage.py:2721 (qodo bug) — Restart error ignored

Addressed: Already addressed — the error field is exposed from both get_result() and list_results_for_dashboard(). The frontend reads job.error in DashboardPage (failureHint = job.summary || job.error) and StatusPage (res.result?.error). Restart-killed jobs show the error message in both dashboard and detail views.

@qodo-code-review

qodo-code-review Bot commented Jun 4, 2026


Code review by qodo was updated up to the latest commit b5bd2b5

Comment thread frontend/src/pages/StatusPage.tsx
@qodo-code-review

qodo-code-review Bot commented Jun 4, 2026


Code review by qodo was updated up to the latest commit e9f9f4d

@myakove

myakove commented Jun 4, 2026

Contributor Author

@qodo-code-review[bot]

The following review comments were reviewed and a decision was made:

frontend/src/pages/StatusPage.tsx:128 (qodo rule violation) — ResultResponse missing error

Addressed: Fixed — added error?: string to ResultResponse interface in types/index.ts. See commit e9f9f4d.

myakove added 3 commits June 4, 2026 17:19
…on timestamp

- Add 'error' column to results table via migration
- Set error message and completed_at on orphaned pending/running jobs
  marked failed during startup recovery
- Return per-job recovery details (job_id + previous_status) from
  mark_stale_results_failed() for per-job INFO logging in lifespan
- Add ORPHAN_STATUSES and RESTART_ERROR_MSG module-level constants
  cross-referenced with ACTIVE_STATUSES
- Expose error field in get_result() and list_results_for_dashboard()
- Update existing tests for tuple return type; add error/completed_at
  assertions

Closes #100
@myakove myakove force-pushed the fix/issue-100-recover-orphaned-jobs branch from e9f9f4d to b6f2bdd June 4, 2026 14:19
@qodo-code-review

qodo-code-review Bot commented Jun 4, 2026


Code review by qodo was updated up to the latest commit b6f2bdd

Comment thread src/rootcoz/storage.py
@qodo-code-review

qodo-code-review Bot commented Jun 4, 2026


Code review by qodo was updated up to the latest commit 132bf4e

@myakove

myakove commented Jun 4, 2026

Contributor Author

@qodo-code-review[bot]

The following review comments were reviewed and a decision was made:

src/rootcoz/storage.py:2716 (qodo requirement gap) — Orphan query ignores completed_at

Addressed: Fixed — added AND completed_at IS NULL to both SELECT and UPDATE queries. Jobs with completed_at already set are skipped. See commit 132bf4e.

Comment thread src/rootcoz/storage.py
@myakove

myakove commented Jun 4, 2026

Contributor Author

@qodo-code-review[bot]

The following review comments were reviewed and a decision was made:

src/rootcoz/storage.py:2718 (qodo bug) — Recovered jobs may be wrong

Addressed: Already fixed in commit 132bf4e — both SELECT and UPDATE queries now include AND completed_at IS NULL, so recovered_jobs exactly matches the rows that get updated.

@myakove myakove merged commit d627221 into main Jun 5, 2026
4 of 6 checks passed
@myakove myakove deleted the fix/issue-100-recover-orphaned-jobs branch June 5, 2026 22:02
@myakove-bot

Collaborator

New container for ghcr.io/ghcr.io/myk-org/rootcoz:latest published

Development

Successfully merging this pull request may close these issues.

feat: Recover orphaned jobs on startup

2 participants