[Feature] Use separate log files for different workers #897
Conversation
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/executorlib/task_scheduler/interactive/blockallocation.py (1)
Lines 110-120: ⚠️ Potential issue | 🟠 Major: New workers added via the `max_workers` setter don't receive unique `worker_id` values.
When `max_workers` is increased, new threads are created without passing `worker_id`. This means dynamically added workers will all have `worker_id=None` (or the default 0), defeating the purpose of per-worker log file separation for these workers.
🔧 Suggested fix
```diff
 elif self._max_workers < max_workers:
     new_process_lst = [
         Thread(
             target=_execute_multiple_tasks,
-            kwargs=self._process_kwargs,
+            kwargs=self._process_kwargs
+            | {
+                "worker_id": self._max_workers + i,
+                "stop_function": lambda: _interrupt_bootup_dict[self._self_id],
+            },
         )
-        for _ in range(max_workers - self._max_workers)
+        for i in range(max_workers - self._max_workers)
     ]
```
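For context, here is a minimal, self-contained sketch of the pattern the suggested fix describes: each newly created thread gets its own `worker_id` by merging a per-worker override into the shared kwargs dict. The task function below is a stand-in, not the actual `_execute_multiple_tasks` from the repository.

```python
from threading import Thread

def _execute_multiple_tasks(worker_id: int = 0, **kwargs) -> None:
    # Stand-in for the real task loop; it only reports which id it received.
    print(f"worker {worker_id} started with {kwargs}")

# Growing from `current` to `target` workers: each new thread receives a
# unique, contiguous worker_id, so its log file name stays unique as well.
current, target = 2, 4
process_kwargs = {"cores": 1}
new_threads = [
    Thread(
        target=_execute_multiple_tasks,
        kwargs=process_kwargs | {"worker_id": current + i},  # dict merge needs Python 3.9+
    )
    for i in range(target - current)
]
for t in new_threads:
    t.start()
for t in new_threads:
    t.join()
```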
🤖 Fix all issues with AI agents
In `@src/executorlib/task_scheduler/interactive/blockallocation.py`:
- Line 247: _spawner is being called with worker_id that can be None (from
_execute_multiple_tasks) while spawner constructors expect int; update the call
site so an int is always passed: either coerce None to a default (e.g., 0)
inside _execute_multiple_tasks when invoking
the spawner (connections=spawner(cores=cores, worker_id=worker_id or 0, **kwargs)) or
change the max_workers setter to populate unique integer worker_id values into
_process_kwargs for each new worker before _execute_multiple_tasks is invoked;
touch the _execute_multiple_tasks function, the spawner(...) invocation, and the
max_workers setter/_process_kwargs population to ensure worker_id is never None.
🧹 Nitpick comments (2)
src/executorlib/standalone/interactive/spawner.py (2)
Lines 21-24: Docstring parameter order doesn't match signature.
The signature has `worker_id` before `openmpi_oversubscribe`, but the docstring lists them in reverse order. Consider reordering for consistency.
📝 Suggested docstring reorder
```diff
 Args:
     cwd (str): The current working directory.
     cores (int, optional): The number of cores to use. Defaults to 1.
+    worker_id (int): The worker ID. Defaults to 0.
     openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False.
-    worker_id (int): The worker ID. Defaults to 0.
```
Lines 82-87: Docstring parameter order doesn't match signature.
Similar to `BaseSpawner`, the docstring parameter order differs from the signature. The signature has `worker_id` at position 3, but the docstring lists it after `threads_per_core`.
📝 Suggested docstring reorder
```diff
 Args:
     cwd (str, optional): The current working directory. Defaults to None.
     cores (int, optional): The number of cores to use. Defaults to 1.
+    worker_id (int): The worker ID. Defaults to 0.
+    openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False.
     threads_per_core (int, optional): The number of threads per core. Defaults to 1.
-    worker_id (int): The worker ID. Defaults to 0.
-    openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False.
```
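To make the intended alignment concrete, here is a hypothetical spawner `__init__` whose docstring lists parameters in the same order as the signature. The class name and attribute handling are illustrative only and do not mirror the repository's actual classes.

```python
from typing import Optional

class ExampleSpawner:
    """Illustrative spawner; the real classes live in
    src/executorlib/standalone/interactive/spawner.py."""

    def __init__(
        self,
        cwd: Optional[str] = None,
        cores: int = 1,
        worker_id: int = 0,
        openmpi_oversubscribe: bool = False,
        threads_per_core: int = 1,
    ) -> None:
        """
        Args:
            cwd (str, optional): The current working directory. Defaults to None.
            cores (int, optional): The number of cores to use. Defaults to 1.
            worker_id (int): The worker ID. Defaults to 0.
            openmpi_oversubscribe (bool, optional): Whether to oversubscribe the cores. Defaults to False.
            threads_per_core (int, optional): The number of threads per core. Defaults to 1.
        """
        self._cwd = cwd
        self._cores = cores
        self._worker_id = worker_id
        self._openmpi_oversubscribe = openmpi_oversubscribe
        self._threads_per_core = threads_per_core
```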
```diff
             cores=cores,
         ),
-        connections=spawner(cores=cores, **kwargs),
+        connections=spawner(cores=cores, worker_id=worker_id, **kwargs),
```
Potential type mismatch: worker_id can be None but spawner expects int.
The worker_id parameter in _execute_multiple_tasks is typed as Optional[int] = None (Line 211), but all spawner constructors expect worker_id: int = 0. Python won't raise a runtime error, but the explicit worker_id=worker_id pass would send None to a parameter annotated as int.
This will manifest when using the max_workers setter (Lines 111-116), which creates new threads without passing worker_id in _process_kwargs.
🛡️ Suggested fix
Either handle None in the spawner call:
```diff
-        connections=spawner(cores=cores, worker_id=worker_id, **kwargs),
+        connections=spawner(cores=cores, worker_id=worker_id if worker_id is not None else 0, **kwargs),
```
Or update the max_workers setter to pass unique worker_id values for new workers.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        connections=spawner(cores=cores, worker_id=worker_id, **kwargs),
+        connections=spawner(cores=cores, worker_id=worker_id if worker_id is not None else 0, **kwargs),
```
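A small sketch of the coercion the suggestion proposes; `resolve_worker_id` is a hypothetical helper name, not part of the codebase.

```python
from typing import Optional

def resolve_worker_id(worker_id: Optional[int]) -> int:
    # Coerce a possibly-missing worker_id to a concrete int before handing it
    # to a spawner whose constructor expects `worker_id: int`.
    return worker_id if worker_id is not None else 0

assert resolve_worker_id(None) == 0  # dynamically added worker without an id
assert resolve_worker_id(0) == 0     # an explicit id of 0 is preserved
assert resolve_worker_id(3) == 3     # explicit ids pass through unchanged
```

The `is not None` form is used here because it keeps the intent explicit; `worker_id or 0` behaves identically for non-negative ids but reads as a truthiness check.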
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/executor/test_flux_job.py (1)
Lines 113-133: ⚠️ Potential issue | 🔴 Critical: Test file names are incorrect and will fail.
The implementation in `spawner_flux.py` line 127 creates log files with the format `flux_<worker_id>.out`/`.err` (with underscore: `flux_0.out`, `flux_0.err`), but the test at lines 116-117 and 136-137 checks for `flux0.out`/`flux0.err` (without underscore). The test will fail when executed because the expected file names do not match what the implementation actually creates.
Update the test to expect `flux_0.out` and `flux_0.err` instead of `flux0.out` and `flux0.err`.
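To illustrate the naming convention under test, here is a hypothetical helper that builds the underscored file names plus a matching assertion; this is a sketch of the expected behaviour, not the code in `spawner_flux.py` or the actual test.

```python
import tempfile
from pathlib import Path

def worker_log_paths(cwd: str, worker_id: int) -> tuple[Path, Path]:
    # One stdout/stderr pair per worker, e.g. flux_0.out / flux_0.err.
    base = Path(cwd)
    return base / f"flux_{worker_id}.out", base / f"flux_{worker_id}.err"

with tempfile.TemporaryDirectory() as tmp:
    out_file, err_file = worker_log_paths(tmp, 0)
    out_file.touch()
    err_file.touch()
    # The test should expect the underscored names, not flux0.out / flux0.err.
    assert (Path(tmp) / "flux_0.out").exists()
    assert (Path(tmp) / "flux_0.err").exists()
```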
Feb 02, 2026 - 19:07 UTC
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #897   +/-   ##
=======================================
  Coverage   93.46%   93.47%
=======================================
  Files          38       38
  Lines        1851     1853    +2
=======================================
+ Hits         1730     1732    +2
  Misses        121      121
```

☔ View full report in Codecov by Sentry.
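As a quick arithmetic check of the figures above (computed here, not taken from Codecov output): two added lines, both hit, nudge the ratio up by roughly 0.01 percentage points.

```python
# Hits / Lines before and after the PR, matching the reported percentages.
before = 1730 / 1851
after = 1732 / 1853
print(f"{before:.2%} -> {after:.2%}")  # 93.46% -> 93.47%
```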
Summary by CodeRabbit