
Conversation

@jan-janssen
Member

@jan-janssen jan-janssen commented Aug 31, 2025

Summary by CodeRabbit

  • Refactor
    • Task execution now passes task completion handles explicitly, unifying threaded and cached execution paths.
  • Bug Fixes
    • Improved cancellation handling, more reliable result/error propagation, and reduced race conditions and stale active-task tracking.
    • Fixed occasional hangs when returning cached results.
  • Documentation
    • Updated inline docs and type hints to reflect the revised execution flow.

@coderabbitai
Contributor

coderabbitai bot commented Aug 31, 2025

Walkthrough

Refactors interactive schedulers to extract the Future from task_dict and pass it explicitly as a new future_obj parameter to execute_task_dict and its helpers; updates callers (onetoone, blockallocation) to pop and propagate the Future and to track active tasks by Future identity. All result/exception propagation uses the provided Future.

Changes

  • Shared execution API — executorlib/task_scheduler/interactive/shared.py
    Added future_obj: Future parameter to execute_task_dict, _execute_task_with_cache, and _execute_task_without_cache; added task_done helper; replaced in-dict Future usage with the passed future_obj for cancellation, set_running_or_notify_cancel, set_result, and set_exception. Updated imports, type hints, and docstrings.
  • Onetoone scheduler — executorlib/task_scheduler/interactive/onetoone.py
    Pop future from task_dict early, store as local f; use f as key in active_task_dict; thread/task kwargs now include future_obj: f; _execute_task_in_thread signature extended with future_obj: Future; call to execute_task_dict passes future_obj=future_obj.
  • Block allocation scheduler — executorlib/task_scheduler/interactive/blockallocation.py
    Pop future from task_dict and pass it as future_obj to execute_task_dict (removing future from task_dict); caller sites adjusted to propagate the Future separately.
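The refactor summarized above can be sketched in a few lines. This is a hypothetical, simplified shape — the real execute_task_dict also takes interface, cache_directory, cache_key, and error_log_file — shown only to illustrate how the Future moves out of task_dict:

```python
from concurrent.futures import Future

def execute_task_dict(task_dict: dict, future_obj: Future) -> None:
    # Simplified sketch: the Future is now an explicit parameter rather than
    # an entry inside task_dict; the real signature has more arguments.
    if not future_obj.done() and future_obj.set_running_or_notify_cancel():
        try:
            future_obj.set_result(task_dict["fn"](*task_dict.get("args", ())))
        except Exception as exc:
            future_obj.set_exception(exc)

# Caller side: pop the Future before handing the dict onward.
task_dict = {"fn": sum, "args": ([1, 2, 3],), "future": Future()}
f = task_dict.pop("future")
execute_task_dict(task_dict=task_dict, future_obj=f)
print(f.result())  # 6
```

All result and exception propagation then goes through the externally owned Future, which is what lets the schedulers key active_task_dict by Future identity.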

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant P as Producer
  participant S as Scheduler (onetoone/blockallocation)
  participant T as Worker Thread
  participant X as execute_task_dict (shared)
  participant C as Cache Helpers
  participant F as Future

  P->>S: submit(task_dict {fn, future})
  S->>S: f = task_dict.pop("future")
  S->>S: active_task_dict[f] = task_dict
  S->>T: start _execute_task_in_thread(task_dict, future_obj=f)

  T->>X: execute_task_dict(task_dict, future_obj=f, interface, ...)
  alt cache enabled
    X->>C: _execute_task_with_cache(..., future_obj=f, ...)
    C->>F: set_running_or_notify_cancel()
    C-->>F: set_result(...) / set_exception(...)
  else no cache
    X->>C: _execute_task_without_cache(..., future_obj=f)
    C->>F: set_running_or_notify_cancel()
    C-->>F: set_result(...) / set_exception(...)
  end

  X-->>T: done
  T-->>S: task_done
  S->>S: del active_task_dict[f]

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

In burrows deep where callbacks play,
I pull the Future, hop away.
No longer nested in dicts I roam—
I carry promise to the worker's home.
Result or error, I gently set;
A rabbit's thump: the task is met. 🐇✨


@codecov

codecov bot commented Aug 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.75%. Comparing base (2842dd4) to head (8ac69af).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #800   +/-   ##
=======================================
  Coverage   97.75%   97.75%           
=======================================
  Files          32       32           
  Lines        1468     1468           
=======================================
  Hits         1435     1435           
  Misses         33       33           



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
executorlib/task_scheduler/interactive/blockallocation.py (1)

243-253: Ensure queue.task_done is always called

If execute_task_dict unexpectedly raises, the queue item would never be marked done, risking deadlock on join(). Wrap in try/finally.

-        f = task_dict.pop("future")
-        execute_task_dict(
-            task_dict=task_dict,
-            future_obj=f,
-            interface=interface,
-            cache_directory=cache_directory,
-            cache_key=cache_key,
-            error_log_file=error_log_file,
-        )
-        task_done(future_queue=future_queue)
+        f = task_dict.pop("future")
+        try:
+            execute_task_dict(
+                task_dict=task_dict,
+                future_obj=f,
+                interface=interface,
+                cache_directory=cache_directory,
+                cache_key=cache_key,
+                error_log_file=error_log_file,
+            )
+        finally:
+            task_done(future_queue=future_queue)
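A standalone demonstration of why the try/finally matters for queue.join(); the worker() here is a stand-in for the scheduler loop, not executorlib code:

```python
import queue
import threading

q: queue.Queue = queue.Queue()
q.put("task")

def worker() -> None:
    item = q.get()
    try:
        # Stand-in for execute_task_dict raising unexpectedly.
        raise RuntimeError(f"{item} failed")
    except RuntimeError:
        pass  # error handled (e.g. logged) elsewhere
    finally:
        q.task_done()  # always reached, so join() below cannot deadlock

t = threading.Thread(target=worker)
t.start()
q.join()   # returns promptly because task_done() ran in the finally block
t.join()
print("queue drained")
```

Without the finally, an exception between get() and task_done() leaves unfinished_tasks nonzero and join() blocks forever.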
🧹 Nitpick comments (2)
executorlib/task_scheduler/interactive/shared.py (1)

73-76: Unify Future state transition (minor)

The explicit not future_obj.done() guard is redundant; set_running_or_notify_cancel() already handles done/cancelled states. Consider simplifying for consistency with the cache path.

-    if not future_obj.done() and future_obj.set_running_or_notify_cancel():
+    if future_obj.set_running_or_notify_cancel():
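The stdlib behavior this simplification relies on: for a cancelled Future, set_running_or_notify_cancel() already returns False, so cancelled tasks are skipped without a separate done() check:

```python
from concurrent.futures import Future

f = Future()
assert f.cancel()  # a still-pending Future can be cancelled
# A cancelled Future reports False here, so the caller's
# "if ...:" guard alone is enough to skip execution.
started = f.set_running_or_notify_cancel()
print(started)  # False
```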
executorlib/task_scheduler/interactive/onetoone.py (1)

259-274: Potential process leak: interface not shut down after single-task execution

In _execute_task_in_thread, a new interface is created per call but never explicitly shut down on success. If the remote process persists, this leaks resources. Consider capturing the interface and shutting it down in a finally block. If the design intends a one-shot child that auto-exits, please ignore after confirming.

-    execute_task_dict(
-        task_dict=task_dict,
-        future_obj=future_obj,
-        interface=interface_bootup(
-            command_lst=get_interactive_execute_command(
-                cores=cores,
-            ),
-            connections=spawner(cores=cores, **kwargs),
-            hostname_localhost=hostname_localhost,
-            log_obj_size=log_obj_size,
-            worker_id=worker_id,
-        ),
-        cache_directory=cache_directory,
-        cache_key=cache_key,
-        error_log_file=error_log_file,
-    )
+    iface = interface_bootup(
+        command_lst=get_interactive_execute_command(cores=cores),
+        connections=spawner(cores=cores, **kwargs),
+        hostname_localhost=hostname_localhost,
+        log_obj_size=log_obj_size,
+        worker_id=worker_id,
+    )
+    try:
+        execute_task_dict(
+            task_dict=task_dict,
+            future_obj=future_obj,
+            interface=iface,
+            cache_directory=cache_directory,
+            cache_key=cache_key,
+            error_log_file=error_log_file,
+        )
+    finally:
+        # Safe even if already closed inside execute_task_dict on error
+        iface.shutdown(wait=True)

If helpful, I can add a short test/diagnostic to assert no lingering child processes after a simple submit.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2842dd4 and abe1d3c.

📒 Files selected for processing (3)
  • executorlib/task_scheduler/interactive/blockallocation.py (1 hunks)
  • executorlib/task_scheduler/interactive/onetoone.py (6 hunks)
  • executorlib/task_scheduler/interactive/shared.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
executorlib/task_scheduler/interactive/blockallocation.py (1)
executorlib/task_scheduler/interactive/shared.py (1)
  • execute_task_dict (12-47)
executorlib/task_scheduler/interactive/shared.py (3)
executorlib/task_scheduler/base.py (2)
  • future_queue (63-70)
  • shutdown (146-169)
executorlib/standalone/interactive/communication.py (3)
  • SocketInterface (14-153)
  • send_and_receive_dict (82-94)
  • shutdown (120-134)
executorlib/standalone/hdf.py (2)
  • dump (20-37)
  • get_output (71-87)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-22.04-arm, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: notebooks_integration
  • GitHub Check: unittest_slurm_mpich
  • GitHub Check: notebooks
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: unittest_old
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: unittest_win
🔇 Additional comments (3)
executorlib/task_scheduler/interactive/shared.py (2)

51-56: Docstring addition looks good

Thanks for documenting task_done; clear and accurate.


37-47: API threading of future_obj is correct

Passing future_obj through both code paths is consistent and maintains external ownership.

executorlib/task_scheduler/interactive/onetoone.py (1)

190-211: Future propagation and active task tracking look correct

Popping future, using it as the active_task_dict key, and threading via future_obj is coherent.

Comment on lines 122 to +124
 else:
     _, _, result = get_output(file_name=file_name)
-    future = task_dict["future"]
-    future.set_result(result)
+    future_obj.set_result(result)

⚠️ Potential issue

Cache-hit path ignores cancellation; may raise InvalidStateError

When the result is served from cache, the Future is completed unconditionally. If the Future was cancelled/prematurely completed, this can raise InvalidStateError and violate expected cancellation semantics. Mirror the non-cache path by honoring set_running_or_notify_cancel().

Apply:

-    else:
-        _, _, result = get_output(file_name=file_name)
-        future_obj.set_result(result)
+    else:
+        _, _, result = get_output(file_name=file_name)
+        if future_obj.set_running_or_notify_cancel():
+            future_obj.set_result(result)
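The failure mode flagged here is easy to reproduce with a bare Future: completing a cancelled Future raises InvalidStateError (Python 3.8+):

```python
from concurrent.futures import Future, InvalidStateError

f = Future()
f.cancel()  # e.g. the caller gave up while the task was still queued
try:
    f.set_result("cached value")  # unconditional completion, as in the bug
    raised = False
except InvalidStateError:
    raised = True
print(raised)  # True
```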


@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
executorlib/task_scheduler/interactive/shared.py (1)

121-124: Cache-hit bug: propagate cached errors via set_exception; cancellation concern from earlier review is now moot

The previous comment about honoring cancellation in the cache-hit branch is addressed by gating at the entrypoint with set_running_or_notify_cancel(). However, a separate issue remains: get_output() returns a success flag; when it’s False (cached error), you currently call set_result(result), which would incorrectly mark the Future successful with an exception object.

Apply:

-    else:
-        _, _, result = get_output(file_name=file_name)
-        future_obj.set_result(result)
+    else:
+        try:
+            _found, _success, _obj = get_output(file_name=file_name)
+            if _success:
+                future_obj.set_result(_obj)
+            else:
+                # Cached execution resulted in an error; propagate as exception
+                future_obj.set_exception(_obj if isinstance(_obj, BaseException) else RuntimeError(f"Cached error: {_obj!r}"))
+        except Exception as e:
+            # Reading cache failed; surface the failure instead of silently succeeding
+            future_obj.set_exception(e)
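The distinction the fix makes — set_exception versus set_result — determines what the consumer sees when calling result():

```python
from concurrent.futures import Future

err = RuntimeError("cached failure")

f_wrong = Future()
f_wrong.set_running_or_notify_cancel()
f_wrong.set_result(err)          # bug: exception object handed back as a value
assert f_wrong.result() is err   # caller sees "success"

f_right = Future()
f_right.set_running_or_notify_cancel()
f_right.set_exception(err)       # fix: result() re-raises for the caller
try:
    f_right.result()
except RuntimeError as e:
    print(e)  # cached failure
```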
🧹 Nitpick comments (2)
executorlib/task_scheduler/interactive/shared.py (2)

52-57: Docstring/typing nit for task_done

Tighten wording and the parameter type; current text mentions “task dictionaries”.

Apply:

-    """
-    Mark the current task as done in the current queue.
-
-    Args:
-        future_queue (queue): Queue of task dictionaries waiting for execution.
-    """
+    """
+    Mark the current task as done on the given queue.
+
+    Args:
+        future_queue (queue.Queue): Work queue whose current task has finished.
+    """

111-120: Use high-resolution clock for runtime measurement (minor)

Prefer time.perf_counter() for durations.

Apply:

-            time_start = time.time()
+            time_start = time.perf_counter()
@@
-            data_dict["runtime"] = time.time() - time_start
+            data_dict["runtime"] = time.perf_counter() - time_start
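Rationale in brief: time.time() is wall-clock and can jump (NTP adjustments), while time.perf_counter() is monotonic with the highest available resolution, making it the right tool for measuring a runtime:

```python
import time

start = time.perf_counter()
time.sleep(0.01)
elapsed = time.perf_counter() - start
# perf_counter is monotonic, so elapsed can never be negative —
# a guarantee time.time() does not give across clock adjustments.
print(elapsed > 0)  # True
```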

📥 Commits

Reviewing files that changed from the base of the PR and between abe1d3c and 8ac69af.

📒 Files selected for processing (1)
  • executorlib/task_scheduler/interactive/shared.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
executorlib/task_scheduler/interactive/shared.py (2)
executorlib/standalone/interactive/communication.py (3)
  • SocketInterface (14-153)
  • send_and_receive_dict (82-94)
  • shutdown (120-134)
executorlib/standalone/hdf.py (2)
  • dump (20-37)
  • get_output (71-87)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: notebooks_integration
  • GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: notebooks
  • GitHub Check: unittest_mpich (ubuntu-22.04-arm, 3.13)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: unittest_win
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
🔇 Additional comments (4)
executorlib/task_scheduler/interactive/shared.py (4)

34-48: Cancellation respected before dispatch — LGTM

The entrypoint correctly gates execution with done() and set_running_or_notify_cancel().


74-79: Non-cache path error handling — LGTM

Shutting down the interface on exception and completing the Future via set_exception is appropriate.


12-19: All execute_task_dict invocations updated to include future_obj; no stale or external helper calls remain.


35-36: Cache key excludes error_log_file
serialize_funct only serializes fn_args, fn_kwargs, resource_dict (and an optional cache_key) when computing the task key, so adding error_log_file to task_dict won’t fragment the cache.

@jan-janssen jan-janssen merged commit 016a2ba into main Aug 31, 2025
62 of 63 checks passed
@jan-janssen jan-janssen deleted the separate_future_and_taskdict branch August 31, 2025 17:52
