fix(taskworker): Improve Queue Size Metrics #612

Merged
george-sentry merged 1 commit into main from george/push-taskbroker/better-worker-queue-size-metrics
Apr 29, 2026

Conversation

Member

@george-sentry george-sentry commented Apr 29, 2026

Linear

Completes STREAM-910

Description

  • Right now, the task queue size is only emitted on calls to the PushTask RPC endpoint, so a worker that isn't receiving tasks never emits its queue size metrics, giving us no insight into idle workers
  • We aren't emitting metrics for the result queue size, which is also very important
  • Finally, we aren't tagging queue size metrics by pod name, so the most we can do right now is average across an entire processing pool, which is only marginally helpful

This PR fixes those things 💪

@george-sentry george-sentry requested a review from a team as a code owner April 29, 2026 17:09
Comment on lines +130 to 133
pod_name: str | None = None,
process_type: str = "spawn",
health_check_file_path: str | None = None,
health_check_sec_per_touch: float = DEFAULT_WORKER_HEALTH_CHECK_SEC_PER_TOUCH,

Bug: The TaskWorker class does not accept a pod_name parameter, causing queue size metrics to be incorrectly tagged with pod_name="unknown".
Severity: MEDIUM

Suggested Fix

Update the TaskWorker.__init__ method to accept a pod_name parameter and pass it to the TaskWorkerProcessingPool constructor, similar to how PushTaskWorker handles it.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: clients/python/src/taskbroker_client/worker/worker.py#L130-L133

Potential issue: The `TaskWorker` class, used for pull-mode workers, does not accept a
`pod_name` parameter in its `__init__` method. Consequently, it cannot pass the pod name
when it instantiates `TaskWorkerProcessingPool`. The `TaskWorkerProcessingPool` then
defaults the `pod_name` to `"unknown"`. This results in all queue size metrics for
pull-mode workers being tagged with `pod_name="unknown"`, making it difficult to monitor
queue sizes on a per-pod basis for this worker type.

Member Author

I don't care about pull workers.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 8d8f444.

child_tasks_queue_maxsize=child_tasks_queue_maxsize,
result_queue_maxsize=result_queue_maxsize,
processing_pool_name=processing_pool_name,
pod_name=pod_name,

TaskWorker missing pod_name parameter for metrics tagging

Medium Severity

The new pod_name parameter was added to PushTaskWorker and plumbed through to TaskWorkerProcessingPool, but the TaskWorker class (PULL mode) was not updated. Since TaskWorkerProcessingPool.result_thread now emits gauge metrics tagged with pod_name, all PULL-mode workers will always report pod_name="unknown", defeating the PR's goal of per-pod metric visibility for those workers.

Member Author

I don't care about pull workers.

linear-code Bot commented Apr 29, 2026

Member

@evanh evanh left a comment

Approved as is but would be better if pod name can be automatically added.

Member Author

Approved as is but would be better if pod name can be automatically added.

Agreed, will look into this in the future.

@george-sentry george-sentry merged commit c786fc6 into main Apr 29, 2026
23 checks passed
@george-sentry george-sentry deleted the george/push-taskbroker/better-worker-queue-size-metrics branch April 29, 2026 18:05