Conversation

@cemde
Collaborator

@cemde cemde commented Dec 6, 2025

Description

Purpose

This PR introduces a parallel task execution engine for maseval, enabling benchmarks to run tasks concurrently with configurable scheduling strategies and cooperative timeout handling.
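As a rough illustration of what worker-based execution looks like, here is a minimal sketch built on the standard-library `ThreadPoolExecutor`. The helper `run_tasks` and its `run_one` callback are hypothetical names chosen for this example; they are not maseval's `Benchmark.run()` and only show the general pattern of fanning tasks out to `num_workers` threads while keeping results in input order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tasks(tasks, run_one, num_workers=1):
    """Run every task with run_one, using up to num_workers threads.

    Hypothetical helper for illustration only. Results are returned in
    input order, regardless of the order in which tasks complete.
    """
    if num_workers <= 1:
        # Sequential fallback: no thread pool needed.
        return [run_one(task) for task in tasks]

    results = [None] * len(tasks)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Map each future back to the index of the task it belongs to.
        futures = {pool.submit(run_one, task): i for i, task in enumerate(tasks)}
        for future in as_completed(futures):
            # future.result() re-raises any exception from the worker.
            results[futures[future]] = future.result()
    return results


doubled = run_tasks([1, 2, 3], lambda x: x * 2, num_workers=2)
```

Threads suit evaluation workloads that spend most of their time waiting on I/O such as model API calls; CPU-bound tasks would gain little from this pattern because of the GIL.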

Goals

  • Parallel execution: Allow benchmarks to execute tasks in parallel via the num_workers parameter, improving throughput for large-scale evaluations
  • Thread-safe architecture: Introduce ComponentRegistry with thread-local storage to ensure components are isolated across worker threads
  • Flexible task scheduling: Provide pluggable task queue abstractions (SequentialQueue, PriorityQueue, AdaptiveTaskQueue) for different scheduling needs
  • Cooperative timeout handling: Add TaskContext for tasks to check timeout status and gracefully handle time limits
  • Task-level execution control: Introduce TaskProtocol with timeout_seconds, timeout_action, max_retries, priority, and tags fields
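The thread-isolation goal above can be sketched with `threading.local`. The class below, `ThreadLocalRegistry`, is a hypothetical stand-in written for this example and does not reproduce the actual `ComponentRegistry` interface; it only shows how thread-local storage keeps each worker's registered components separate.

```python
import threading


class ThreadLocalRegistry:
    """Hypothetical per-thread component registry (illustration only)."""

    def __init__(self):
        self._local = threading.local()

    def _components(self):
        # Each thread lazily gets its own dict on first access.
        if not hasattr(self._local, "components"):
            self._local.components = {}
        return self._local.components

    def register(self, name, component):
        self._components()[name] = component

    def get(self, name):
        return self._components().get(name)


registry = ThreadLocalRegistry()
registry.register("agent", "main-thread agent")

seen_in_worker = []


def worker():
    # The worker thread starts with an empty registry of its own.
    seen_in_worker.append(registry.get("agent"))
    registry.register("agent", "worker agent")
    seen_in_worker.append(registry.get("agent"))


thread = threading.Thread(target=worker)
thread.start()
thread.join()
```

After the worker finishes, the main thread still sees its own `"agent"` entry, because the worker's registration went into a separate per-thread dictionary.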

Key Changes

  • New num_workers parameter in Benchmark.run() using ThreadPoolExecutor
  • ComponentRegistry class for thread-safe component registration
  • TaskContext with a check_timeout() method and elapsed, remaining, and is_expired properties
  • TaskProtocol dataclass for task-level execution configuration
  • TimeoutAction enum (SKIP, RETRY, RAISE) for configurable timeout behavior
  • TaskQueue abstract base class with SequentialQueue, PriorityQueue, and AdaptiveTaskQueue implementations
  • TaskTimeoutError exception with partial trace preservation
  • New TASK_TIMEOUT status in TaskExecutionStatus enum
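To make the cooperative timeout idea concrete, here is a minimal sketch. The names `TaskContext`, `check_timeout`, `elapsed`, `remaining`, `is_expired`, and `TaskTimeoutError` come from the list above, but the constructor arguments, the `partial_trace` attribute, and the `run_steps` helper are assumptions made for this example and may differ from the merged code. "Cooperative" here means the task itself calls `check_timeout()` at safe points; nothing interrupts a running step.

```python
import time


class TaskTimeoutError(Exception):
    """Raised when a task runs past its time limit (sketch)."""

    def __init__(self, message, partial_trace=None):
        super().__init__(message)
        # Assumed attribute: whatever the task produced before expiring.
        self.partial_trace = list(partial_trace or [])


class TaskContext:
    """Tracks elapsed time against an optional limit (sketch)."""

    def __init__(self, timeout_seconds=None):
        self._start = time.monotonic()
        self._timeout = timeout_seconds

    @property
    def elapsed(self):
        return time.monotonic() - self._start

    @property
    def remaining(self):
        # None means "no limit configured".
        if self._timeout is None:
            return None
        return max(0.0, self._timeout - self.elapsed)

    @property
    def is_expired(self):
        return self._timeout is not None and self.elapsed >= self._timeout

    def check_timeout(self, partial_trace=None):
        if self.is_expired:
            raise TaskTimeoutError(
                f"task exceeded {self._timeout}s", partial_trace
            )


def run_steps(steps, context):
    """Hypothetical task body: check the deadline before each step."""
    trace = []
    for step in steps:
        context.check_timeout(partial_trace=trace)
        trace.append(step())
    return trace


completed = run_steps([lambda: "a", lambda: "b"], TaskContext(timeout_seconds=60))
```

A caller could catch `TaskTimeoutError` and then skip, retry, or re-raise, in the spirit of the SKIP, RETRY, and RAISE actions listed above.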

Checklist

Contribution

Documentation

  • Added/updated docstrings for new/modified functions as instructed in CONTRIBUTING.md
  • Updated relevant documentation in docs/ (if applicable)
  • Tagged GitHub issue with this PR (if applicable)

Changelog

  • Added entry to CHANGELOG.md under [Unreleased] section
    • Use Added section for new features
    • Use Changed section for modifications to existing functionality
    • Use Fixed section for bug fixes
    • Use Removed section for deprecated/removed features
  • OR this is a documentation-only change (no changelog needed)

Example:
- Support for multi-agent tracing (PR:#123)

Architecture (if applicable)

  • Core/Interface separation: Changes in maseval/core/ do NOT import from maseval/interface/
  • Dependencies: New core dependencies added sparingly; framework integrations go to optional dependencies

Additional Notes

@github-actions

github-actions bot commented Dec 6, 2025

Coverage report

File: lines missing coverage
  maseval
    __init__.py
  maseval/benchmark/macs
    data_loader.py: 249, 297
    macs.py
  maseval/benchmark/tau2
    data_loader.py
    evaluator.py: 537, 623
    tau2.py
  maseval/core
    benchmark.py: 971-991, 1026-1034, 1047-1048, 1063-1064, 1079-1080, 1104-1110, 1122-1130, 1271-1273, 1300-1305, 1435, 1455-1456, 1473
    context.py
    exceptions.py
    registry.py: 158, 169, 171, 174, 216, 227, 229, 232, 243
    simulator.py
    task.py: 412, 428, 440-443

This report was generated by python-coverage-comment-action

@cemde cemde force-pushed the new-running-engine branch from 892c1fe to 68851b4 on December 21, 2025 11:24
Collaborator Author

@cemde cemde left a comment

ok

@cemde cemde merged commit d242b95 into main Jan 18, 2026
20 checks passed