feat(python): Move scheduler files and tests from the prototype#540

Merged
markstory merged 3 commits into main from feat-client-scheduler on Jan 23, 2026

@markstory (Member)

Add the scheduler and its tests to the client libraries.

Refs STREAM-605

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@markstory markstory requested a review from a team as a code owner January 23, 2026 16:32

linear bot commented Jan 23, 2026

return checkin_config


class ScheduleRunner:
@markstory (Member, Author)

Like the worker, the client libraries will provide most of the component, but the application still needs to build an entrypoint that loads config, sets up the TaskbrokerApp that tasks are run from, and defines all the schedules.

Comment on lines +224 to +229
try:
    self._try_spawn(entry)
except Exception as e:
    # Trap errors from spawning/update state so that the heap stays consistent.
    capture_exception(e)
heapq.heappush(self._heap, (entry.remaining_seconds(), entry))

Bug: An exception in delay_task() prevents _last_run from updating, causing an infinite loop in ScheduleRunner.tick() as the same task is processed repeatedly.
Severity: HIGH

Suggested Fix

Move the heapq.heappush call that re-schedules the entry into the try block in ScheduleRunner.tick(). This ensures that the task is only re-queued for its next run if the current attempt to spawn it succeeds without raising an exception. Failed tasks will be logged but not immediately retried in a tight loop.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: clients/python/src/taskbroker_client/scheduler/runner.py#L224-L229

Potential issue: In `ScheduleRunner.tick()`, if `entry.delay_task()` raises an exception
(e.g., from a Kafka producer error), the exception is caught, but the task's `_last_run`
timestamp is not updated because `entry.set_last_run(now)` is skipped. The task is then
immediately pushed back onto the heap with a remaining time near zero. This causes the
`while` loop to process the same failed task repeatedly, leading to a tight infinite
loop that consumes 100% CPU, logs excessive errors, and prevents any other scheduled
tasks from running.
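
A minimal, self-contained sketch of the failure mode described above, and of the suggested fix, may help when verifying the report. Everything here is a hypothetical stand-in rather than the real `taskbroker_client` code: `Entry`, `spawn`, `push`, `tick_original`, and `tick_suggested` are invented for illustration, the `while` condition is an assumption about how `tick()` decides an entry is due, and `max_iterations` exists only so the demo terminates.

```python
import heapq
import itertools

_seq = itertools.count()  # tie-breaker so heap tuples never compare Entry objects


class Entry:
    """Hypothetical stand-in for a schedule entry (not the real client class)."""

    def __init__(self, interval, fail=False):
        self.interval = interval
        self.fail = fail
        self.last_run = None
        self.attempts = 0

    def remaining_seconds(self, now):
        if self.last_run is None:
            return 0
        return max(0, self.last_run + self.interval - now)

    def spawn(self, now):
        self.attempts += 1
        if self.fail:
            raise RuntimeError("simulated producer error")
        self.last_run = now  # only reached when spawning succeeds


def push(heap, entry, now):
    heapq.heappush(heap, (entry.remaining_seconds(now), next(_seq), entry))


def tick_original(heap, now, max_iterations=1000):
    """Re-queue after every attempt, mirroring the reviewed lines."""
    iterations = 0
    while heap and heap[0][0] <= 0 and iterations < max_iterations:
        iterations += 1
        entry = heapq.heappop(heap)[2]
        try:
            entry.spawn(now)
        except Exception:
            pass  # the reviewed code reports the error here
        push(heap, entry, now)  # a failed entry comes back with 0 seconds remaining
    return iterations


def tick_suggested(heap, now, max_iterations=1000):
    """Re-queue only when spawning succeeded, as the suggested fix proposes."""
    iterations = 0
    while heap and heap[0][0] <= 0 and iterations < max_iterations:
        iterations += 1
        entry = heapq.heappop(heap)[2]
        try:
            entry.spawn(now)
            push(heap, entry, now)
        except Exception:
            pass  # failed entry is not re-queued in this sketch
    return iterations


def demo(tick, now=100):
    """Run one tick over a failing and a healthy entry; return summary counts."""
    failing, healthy = Entry(60, fail=True), Entry(60)
    heap = []
    push(heap, failing, now)
    push(heap, healthy, now)
    iterations = tick(heap, now)
    return iterations, failing.attempts, healthy.attempts, len(heap)


if __name__ == "__main__":
    print("original :", demo(tick_original))
    print("suggested:", demo(tick_suggested))
```

In this sketch the original variant only stops because of the iteration cap, while the suggested variant returns after handling each entry once. Note that the suggested variant, as sketched, leaves the failed entry out of the heap entirely; whether the real runner re-adds such entries elsewhere is not shown in this conversation.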

@markstory markstory merged commit bad069b into main Jan 23, 2026
26 of 27 checks passed
@markstory markstory deleted the feat-client-scheduler branch January 23, 2026 22:01

2 participants