@okhat okhat commented Mar 6, 2025

  • Added new parameters timeout_seconds (default 120) and straggler_limit (default 3) to control when slow tasks are re-executed.
  • Modified multi-thread execution to record the actual start time of tasks inside the worker thread, ensuring accurate timeout detection even when tasks are queued.
  • Updated the executor loop to poll for completed futures and, once only a few tasks remain (bounded by straggler_limit) and any of them has run longer than timeout_seconds, resubmit each such task exactly once.
  • Ensured that once a task’s result is returned (from either the original or resubmitted copy), it is recorded and the overall execution terminates immediately without waiting for the slow original.
  • Maintained all existing behaviors including thread-local overrides, error handling, progress bar updates, and signal (SIGINT) interruption handling.
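
The polling-and-resubmit behavior described in the bullets above can be illustrated with a minimal sketch built on a plain ThreadPoolExecutor. This is not the merged code: the function name run_with_straggler_retry, the poll_interval parameter, and the token bookkeeping are hypothetical, and only timeout_seconds and straggler_limit are names taken from this PR. The sketch also leaves out the thread-local overrides, progress bar, and SIGINT handling.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_with_straggler_retry(fn, items, *, num_threads=4, timeout_seconds=120.0,
                             straggler_limit=3, poll_interval=0.05):
    """Apply fn to each item in parallel, resubmitting slow stragglers once.

    Hypothetical helper for illustration; not the API added by this PR.
    """
    items = list(items)
    results = {}         # item index -> first result that came back
    started = {}         # submission token -> start time, set inside the worker
    pending = {}         # future -> (item index, submission token)
    resubmitted = set()  # item indices that already got their one retry

    def worker(token, item):
        # Record the start inside the worker so time spent queued is not counted.
        started[token] = time.monotonic()
        return fn(item)

    def submit(index):
        token = object()
        pending[pool.submit(worker, token, items[index])] = (index, token)

    pool = ThreadPoolExecutor(max_workers=num_threads)
    try:
        for index in range(len(items)):
            submit(index)

        while len(results) < len(items):
            # Poll: wake up on the first completion or after poll_interval.
            done, _ = wait(list(pending), timeout=poll_interval,
                           return_when=FIRST_COMPLETED)
            for future in done:
                index, _token = pending.pop(future)
                if index not in results:
                    results[index] = future.result()  # re-raises worker errors

            unfinished = len(items) - len(results)
            if 0 < unfinished <= straggler_limit:
                now = time.monotonic()
                for index, token in list(pending.values()):
                    if index in results or index in resubmitted:
                        continue
                    # Only tasks that have actually started can time out.
                    if token in started and now - started[token] > timeout_seconds:
                        resubmitted.add(index)
                        submit(index)
    finally:
        # Do not block on slow originals; cancel_futures needs Python 3.9+.
        pool.shutdown(wait=False, cancel_futures=True)

    return [results[index] for index in range(len(items))]
```

In this sketch the first result to arrive for an item wins, whether it comes from the original submission or the resubmitted copy, and the function returns as soon as every item has a result. A slow original keeps running in its thread until it finishes on its own, since Python threads cannot be forcibly stopped.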

@okhat okhat merged commit a2fc6a1 into main Mar 6, 2025
4 checks passed