
Conversation

@jan-janssen
Member

@jan-janssen jan-janssen commented Jul 27, 2025

A different formulation would be block_allocation for the SlurmClusterExecutor and the FluxClusterExecutor.

Open Tasks:

  • Handle workers that are deleted from the queuing system, and check that new workers are submitted as needed.
  • Resubmit functions when the worker is killed during execution.
  • There might be an issue where the worker is never deleted when the job never starts running.
  • Improve test coverage to match the coverage of the overall project.
  • Replace the hash with the worker number, and check that the worker directory is generated based on the cache_directory rather than the working directory.

Summary by CodeRabbit

  • New Features
    • Added optional block-allocation mode powered by PySQA for Flux and Slurm executors, enabling faster submission of multiple tasks with shared resources; a usage sketch follows this summary.
    • Automatic scheduler selection when block allocation is enabled or disabled.
    • Enhanced command generation for Slurm and Flux backends with PMI mode support and validation of unsupported options.
  • Tests
    • Expanded coverage for block allocation workflows and PySQA-backed spawner lifecycle.
    • Added conditional test skipping when PySQA or Flux is unavailable.
    • Introduced initialization tests for Slurm executor modes and error-path validations.
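
A minimal usage sketch of the new block-allocation mode, as referenced above (a hypothetical example: the top-level import path and keyword defaults are assumptions, not confirmed by this PR):

from executorlib import SlurmClusterExecutor

def add(a, b):
    return a + b

with SlurmClusterExecutor(
    max_workers=2,
    block_allocation=True,  # route through the PySQA-backed block allocation
    cache_directory="executorlib_cache",
) as exe:
    future = exe.submit(add, 1, 2)
    print(future.result())  # -> 3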

# Conflicts:
#	executorlib/standalone/scheduler.py
#	executorlib/task_scheduler/file/queue_spawner.py
# Conflicts:
#	executorlib/standalone/command.py
#	executorlib/task_scheduler/file/shared.py
#	executorlib/task_scheduler/interactive/slurmspawner.py
#	tests/test_standalone_command.py
@coderabbitai
Contributor

coderabbitai bot commented Jul 27, 2025

Walkthrough

Adds a PySQA-backed block-allocation pathway. Flux and Slurm executors now conditionally create a block-allocation scheduler via create_pysqa_block_allocation_scheduler when block_allocation=True; otherwise they use the file-based executor. Introduces PysqaSpawner and a factory to build BlockAllocationTaskScheduler. Tests added for Flux block allocation, PysqaSpawner lifecycle/commands, and Slurm init gating.
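
In rough pseudocode, the constructor routing reads as follows (a sketch based on this walkthrough; the exact set of keyword arguments forwarded to each factory is an assumption):

if block_allocation:
    from executorlib.task_scheduler.interactive.spawner_pysqa import (
        create_pysqa_block_allocation_scheduler,
    )

    # Reserve workers up front via PySQA and reuse them across tasks.
    scheduler = create_pysqa_block_allocation_scheduler(
        backend="slurm",  # "flux" for the FluxClusterExecutor
        max_workers=max_workers,
        cache_directory=cache_directory,
        resource_dict=resource_dict,
        pysqa_config_directory=pysqa_config_directory,
    )
else:
    # One queue job per task: keep the existing file-based executor.
    scheduler = create_file_executor(
        backend="slurm",
        max_workers=max_workers,
        cache_directory=cache_directory,
        resource_dict=resource_dict,
        pysqa_config_directory=pysqa_config_directory,
    )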

Changes

Cohort / File(s) Summary
Executors: conditional block-allocation routing
executorlib/executor/flux.py, executorlib/executor/slurm.py
Add conditional constructor path: when block_allocation=True, import/use create_pysqa_block_allocation_scheduler (backend set accordingly); else retain create_file_executor. Flux path passes additional options (e.g., log_obj_size, pmi_mode).
PySQA spawner and factory
executorlib/task_scheduler/interactive/spawner_pysqa.py
Add PysqaSpawner (BaseSpawner subclass) with bootup/generate_command/shutdown/poll; supports Slurm and Flux command generation and job lifecycle via QueueAdapter. Add create_pysqa_block_allocation_scheduler factory returning BlockAllocationTaskScheduler configured with PysqaSpawner and validated worker/core settings.
Flux tests: block allocation and spawner
tests/test_fluxclusterexecutor.py
Add tests for FluxClusterExecutor with block allocation (results, cache checks) and PysqaSpawner lifecycle (sleep job, large-core failure handling). Conditional skips based on Flux availability.
Slurm tests: init gating
tests/test_slurmclusterexecutor.py
Add PySQA availability gating and SlurmClusterExecutor init tests for block_allocation True/False behavior.
Standalone interactive backend tests: PySQA command/error paths
tests/test_standalone_interactive_backend.py
Add conditional PySQA import, tests for PysqaSpawner command generation (Slurm/Flux) and error scenarios, plus create_pysqa_block_allocation_scheduler validation errors.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User code
  participant Exec as Flux/Slurm Executor
  participant FFactory as create_file_executor
  participant BFactory as create_pysqa_block_allocation_scheduler
  participant Sched as BlockAllocationTaskScheduler
  participant Sp as PysqaSpawner

  U->>Exec: __init__(block_allocation, resource_dict, ...)
  alt block_allocation == True
    Exec->>BFactory: build scheduler (backend="flux" or "slurm", config, resources)
    BFactory->>Sp: construct PysqaSpawner(config/resources)
    BFactory->>Sched: BlockAllocationTaskScheduler(spawner=Sp, max_workers)
    Exec-->>U: executor with block-allocation scheduler
  else
    Exec->>FFactory: create_file_executor(...)
    Exec-->>U: executor with file-based scheduler
  end
sequenceDiagram
  autonumber
  participant Sched as BlockAllocationTaskScheduler
  participant Sp as PysqaSpawner
  participant QA as QueueAdapter
  participant Q as Queue (Slurm/Flux)
  participant W as Worker process

  Sched->>Sp: bootup(command_lst, stop_function?)
  Sp->>QA: submit(cores, cwd, cmd)
  QA->>Q: enqueue job
  Sp->>Sp: _check_process_helper()
  Q-->>QA: status (PENDING/RUNNING/ERROR)
  QA-->>Sp: status
  alt status == RUNNING
    Sp-->>Sched: True (started)
  else status == ERROR
    Sp->>QA: terminate()
    Sp-->>Sched: False
  end

  loop polling
    Sched->>Sp: poll()
    Sp->>QA: status()
    QA-->>Sp: RUNNING/PENDING/DONE
    Sp-->>Sched: bool
  end

  Sched->>Sp: shutdown(wait=True)
  Sp->>QA: terminate()
  Sp-->>Sched: complete
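
For reference, the poll contract shown in the diagram reduces to a small status mapping. A minimal sketch, assuming PySQA's QueueAdapter.get_status_of_job, lowercase status strings, and a _queue_adapter attribute (all assumptions, not confirmed by this PR):

def poll(self) -> bool:
    # No job was ever submitted, or it was already shut down.
    if self._process is None:
        return False
    # A worker counts as alive while its queue job is queued or running.
    status = self._queue_adapter.get_status_of_job(process_id=self._process)
    return status in ("running", "pending")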

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Refactor interactive task scheduler #798 — Refactors BlockAllocationTaskScheduler execution; directly related to adding a PySQA-backed factory that constructs this scheduler.
  • Refactor Interface #400 — Adjusts executor/spawner interfaces; aligns with introducing PysqaSpawner and routing executors through spawner-based block allocation.
  • Move scheduler to standalone #758 — Centralizes PySQA utilities; complements the new PysqaSpawner and block-allocation factory relying on PySQA helpers.

Poem

A rabbit taps keys with delighted flair,
Blocks of cores reserved, jobs fill the air.
Slurm and Flux now dance in queue,
PySQA whispers, “I’ve got you.”
From cache to spawn, the tasks align—
Hop, submit, and watch them shine. 🐇✨


@jan-janssen jan-janssen marked this pull request as draft July 27, 2025 21:14
@codecov

codecov bot commented Jul 27, 2025

Codecov Report

❌ Patch coverage is 99.18699% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 98.12%. Comparing base (d93e777) to head (5f7b676).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
...torlib/task_scheduler/interactive/spawner_pysqa.py | 99.11% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #763      +/-   ##
==========================================
+ Coverage   97.84%   98.12%   +0.27%     
==========================================
  Files          32       33       +1     
  Lines        1530     1649     +119     
==========================================
+ Hits         1497     1618     +121     
+ Misses         33       31       -2     


@jan-janssen
Member Author

Test it on Binder

@jan-janssen jan-janssen marked this pull request as ready for review September 8, 2025 10:56
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (11)
executorlib/executor/slurm.py (1)

189-211: Non-block path: init_function will raise in create_file_executor; consider an early, clearer error.

Passing init_function to create_file_executor triggers a ValueError (“not available with the pysqa based backend”). A short pre-check here with a tailored message would be more direct for users.

                 super().__init__(
                     executor=create_file_executor(
                         max_workers=max_workers,
                         backend="slurm",
                         max_cores=max_cores,
                         cache_directory=cache_directory,
                         resource_dict=resource_dict,
                         pmi_mode=pmi_mode,
                         flux_executor=None,
                         flux_executor_nesting=False,
                         flux_log_files=False,
                         pysqa_config_directory=pysqa_config_directory,
                         hostname_localhost=hostname_localhost,
-                        block_allocation=block_allocation,
-                        init_function=init_function,
+                        block_allocation=block_allocation,
+                        init_function=init_function,  # will raise downstream
                         disable_dependencies=disable_dependencies,
                     )
                 )

Alternatively (preferred), add an explicit guard just before the call:

if init_function is not None:
    raise ValueError("init_function is unsupported with the pysqa file backend; use block_allocation=True.")
executorlib/executor/flux.py (1)

381-401: Non-block path mirrors existing behavior; minor polish only.

Looks fine. If you adopt the earlier pre-check for init_function in Slurm, consider mirroring it here for consistency.

tests/test_fluxclusterexecutor.py (4)

64-78: Avoid brittle assertions on cache directory size.

len(os.listdir("executorlib_cache")) is environment- and implementation-dependent. Assert minimum expectations or patterns instead.

Apply this diff:

-            self.assertEqual(len(os.listdir("executorlib_cache")), 2)
+            # block-allocation creates at least 2 artifacts (task + input)
+            self.assertGreaterEqual(len(os.listdir("executorlib_cache")), 2)

79-97: Same here: relax directory-size assertion for robustness.

Use a lower bound rather than an exact count.

-            self.assertEqual(len(os.listdir("executorlib_cache")), 4)
+            # two tasks -> expect at least four artifacts
+            self.assertGreaterEqual(len(os.listdir("executorlib_cache")), 4)

28-31: Speed up test without reducing signal.

A full second sleep slows CI. 100–200 ms is typically enough to exercise scheduling here.

-    sleep(1)
+    sleep(0.2)

171-190: Prefer public APIs over private members in tests.

The test reaches into _process and _check_process_helper, which is fragile. Assert via poll() and observable behavior instead.

-        process_id = interface_flux._process
-        interface_flux.shutdown(wait=True)
-        interface_flux._process = process_id
-        self.assertFalse(interface_flux.poll())
-        self.assertFalse(interface_flux._check_process_helper(command_lst=["sleep", "1"]))
+        interface_flux.shutdown(wait=True)
+        self.assertFalse(interface_flux.poll())
tests/test_standalone_interactive_backend.py (1)

138-140: Make the failure assertion backend-agnostic.

On some systems SLURM binaries exist but submission fails, raising subprocess.CalledProcessError instead of FileNotFoundError.

-        with self.assertRaises(FileNotFoundError):
+        import subprocess
+        with self.assertRaises((FileNotFoundError, subprocess.CalledProcessError)):
             interface_slurm.bootup(command_lst=["sleep", "1"])
executorlib/task_scheduler/interactive/spawner_pysqa.py (4)

129-138: Add actionable error messages for Flux constraints.

Bare ValueError() hinders debugging.

-            if self._num_nodes is not None:
-                raise ValueError()
-            if self._threads_per_core > 1:
-                raise ValueError()
-            if self._gpus_per_core > 0:
-                raise ValueError()
-            if self._exclusive:
-                raise ValueError()
-            if self._openmpi_oversubscribe:
-                raise ValueError()
+            if self._num_nodes is not None:
+                raise ValueError("Flux backend: num_nodes is not supported when cores>1")
+            if self._threads_per_core > 1:
+                raise ValueError("Flux backend: threads_per_core>1 is not supported")
+            if self._gpus_per_core > 0:
+                raise ValueError("Flux backend: gpus_per_core>0 is not supported")
+            if self._exclusive:
+                raise ValueError("Flux backend: exclusive allocation is not supported")
+            if self._openmpi_oversubscribe:
+                raise ValueError("Flux backend: oversubscribe is not supported")

140-142: Fix misleading error message for multi-core without backend.

Backend cannot be None when cores > 1.

-            raise ValueError(
-                f"backend should be None, slurm or flux, not {self._backend}"
-            )
+            raise ValueError(
+                f"When cores>1, backend must be 'slurm' or 'flux'; got {self._backend!r}"
+            )

147-160: Silence unused-arg warning for wait or implement waiting.

Currently unused; keep interface but declare intent.

     def shutdown(self, wait: bool = True):
         """
         Method to shutdown the subprocess interface.
@@
-        if self._process is not None:
+        # 'wait' is intentionally unused for queue-backed jobs
+        if self._process is not None:

Alternatively:

-    def shutdown(self, wait: bool = True):
+    def shutdown(self, wait: bool = True):  # noqa: ARG002

90-98: Consider a bootup timeout.

An endless loop can hang indefinitely without a stop_function. Expose a startup_timeout or retry/backoff.
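
A bounded variant could look like this sketch (startup_timeout is a proposed parameter, not an existing one; it assumes _check_process_helper returns True once the job reports RUNNING and treats everything else as not-yet-started, so a real implementation would still want to fail fast on ERROR):

import time

def bootup(self, command_lst, stop_function=None, startup_timeout=600.0):
    deadline = time.monotonic() + startup_timeout
    delay = 1.0
    while time.monotonic() < deadline:
        if stop_function is not None and stop_function():
            self.shutdown(wait=True)  # caller asked us to stop
            return False
        if self._check_process_helper(command_lst=command_lst):
            return True  # worker reported RUNNING
        time.sleep(delay)
        delay = min(delay * 2, 30.0)  # exponential backoff, capped at 30s
    self.shutdown(wait=True)  # drop the queued job before giving up
    raise TimeoutError(f"worker did not start within {startup_timeout}s")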

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d93e777 and 5f7b676.

📒 Files selected for processing (6)
  • executorlib/executor/flux.py (1 hunks)
  • executorlib/executor/slurm.py (1 hunks)
  • executorlib/task_scheduler/interactive/spawner_pysqa.py (1 hunks)
  • tests/test_fluxclusterexecutor.py (5 hunks)
  • tests/test_slurmclusterexecutor.py (2 hunks)
  • tests/test_standalone_interactive_backend.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_slurmclusterexecutor.py (1)
executorlib/executor/slurm.py (1)
  • SlurmClusterExecutor (22-229)
executorlib/task_scheduler/interactive/spawner_pysqa.py (4)
executorlib/standalone/inputcheck.py (1)
  • validate_number_of_cores (176-200)
executorlib/standalone/interactive/spawner.py (1)
  • BaseSpawner (9-64)
executorlib/standalone/scheduler.py (2)
  • pysqa_execute_command (32-65)
  • terminate_with_pysqa (8-29)
executorlib/task_scheduler/interactive/blockallocation.py (1)
  • BlockAllocationTaskScheduler (29-197)
tests/test_fluxclusterexecutor.py (1)
executorlib/task_scheduler/interactive/spawner_pysqa.py (2)
  • PysqaSpawner (16-208)
  • _check_process_helper (190-205)
executorlib/executor/slurm.py (2)
executorlib/task_scheduler/interactive/spawner_pysqa.py (1)
  • create_pysqa_block_allocation_scheduler (211-248)
executorlib/task_scheduler/file/task_scheduler.py (1)
  • create_file_executor (85-131)
tests/test_standalone_interactive_backend.py (1)
executorlib/task_scheduler/interactive/spawner_pysqa.py (3)
  • PysqaSpawner (16-208)
  • create_pysqa_block_allocation_scheduler (211-248)
  • _check_process_helper (190-205)
executorlib/executor/flux.py (6)
executorlib/task_scheduler/interactive/spawner_pysqa.py (1)
  • create_pysqa_block_allocation_scheduler (211-248)
executorlib/task_scheduler/interactive/blockallocation.py (2)
  • max_workers (95-96)
  • max_workers (99-121)
executorlib/task_scheduler/interactive/dependency.py (2)
  • max_workers (87-95)
  • max_workers (98-111)
executorlib/executor/base.py (2)
  • max_workers (26-27)
  • max_workers (30-31)
executorlib/task_scheduler/base.py (2)
  • max_workers (36-37)
  • max_workers (40-41)
executorlib/task_scheduler/file/task_scheduler.py (1)
  • create_file_executor (85-131)
🪛 Ruff (0.12.2)
executorlib/task_scheduler/interactive/spawner_pysqa.py

140-142: Avoid specifying long messages outside the exception class

(TRY003)


147-147: Unused method argument: wait

(ARG002)


178-178: Probable use of insecure hash functions in hashlib: md5

(S324)


198-200: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: notebooks_integration
🔇 Additional comments (2)
executorlib/executor/slurm.py (1)

169-187: Block-allocation via PySQA path looks correct; verify default cache_directory semantics.

create_pysqa_block_allocation_scheduler forces cache_directory to os.path.abspath(".") when None, which may diverge from the documented default "executorlib_cache". Please confirm intended behavior across Slurm/Flux so users don’t see different defaults depending on block_allocation.

executorlib/executor/flux.py (1)

361-379: Flux block-allocation via PySQA path is consistent with Slurm; re-check default cache_directory.

Same note as for Slurm: create_pysqa_block_allocation_scheduler defaults to "." when cache_directory is None. Ensure this matches documented defaults and user expectations.

Comment on lines +16 to +31
class PysqaSpawner(BaseSpawner):
def __init__(
self,
cwd: Optional[str] = None,
cores: int = 1,
threads_per_core: int = 1,
gpus_per_core: int = 0,
num_nodes: Optional[int] = None,
exclusive: bool = False,
openmpi_oversubscribe: bool = False,
slurm_cmd_args: Optional[list[str]] = None,
pmi_mode: Optional[str] = None,
config_directory: Optional[str] = None,
backend: Optional[str] = None,
**kwargs,
):

🛠️ Refactor suggestion

Plumb cache_directory and worker_id into the spawner; don’t forward them to PySQA.

Ensures the worker dir logic can use these values and avoids leaking non-PySQA keys to submit_job.

 class PysqaSpawner(BaseSpawner):
     def __init__(
         self,
         cwd: Optional[str] = None,
         cores: int = 1,
         threads_per_core: int = 1,
         gpus_per_core: int = 0,
         num_nodes: Optional[int] = None,
         exclusive: bool = False,
         openmpi_oversubscribe: bool = False,
         slurm_cmd_args: Optional[list[str]] = None,
         pmi_mode: Optional[str] = None,
         config_directory: Optional[str] = None,
         backend: Optional[str] = None,
+        cache_directory: Optional[str] = None,
         **kwargs,
     ):
@@
-        self._pysqa_submission_kwargs = kwargs
+        # Internal/non-PySQA fields
+        self._worker_id = kwargs.pop("worker_id", None)
+        # Accept via explicit arg or kwargs for BC
+        self._cache_directory = cache_directory or kwargs.pop("cache_directory", None)
+        # Remaining kwargs are forwarded to PySQA submit_job
+        self._pysqa_submission_kwargs = kwargs
🤖 Prompt for AI Agents
In executorlib/task_scheduler/interactive/spawner_pysqa.py around lines 16 to
31, the PysqaSpawner __init__ must accept and store cache_directory and
worker_id for use by the worker directory logic and must not forward these
non-PySQA keys to PySQA submit_job; add optional parameters cache_directory:
Optional[str] = None and worker_id: Optional[str] = None to the constructor
signature, assign them to self.cache_directory and self.worker_id, and ensure
any call site that forwards kwargs to PySQA strips these keys from the kwargs
(or builds a new dict without them) before calling submit_job so only valid
PySQA arguments are passed.

Comment on lines +175 to +188
def _start_process_helper(
self, command_lst: list[str], queue_adapter: QueueAdapter
) -> int:
hash = hashlib.md5(str(self).encode()).hexdigest()
if self._cwd is not None:
working_directory = os.path.join(self._cwd, hash)
else:
working_directory = os.path.abspath(hash)
return queue_adapter.submit_job(
command=" ".join(self.generate_command(command_lst=command_lst)),
working_directory=working_directory,
cores=int(self._cores * self._threads_per_core),
**self._pysqa_submission_kwargs,
)

🛠️ Refactor suggestion

Stop hashing into cwd; derive working_directory from cache_directory + worker_id (addresses open TODO).

  • Avoids insecure md5 and shadowing built-in hash.
  • Produces stable per-worker directories under the cache dir as requested in the PR checklist.
-    def _start_process_helper(
-        self, command_lst: list[str], queue_adapter: QueueAdapter
-    ) -> int:
-        hash = hashlib.md5(str(self).encode()).hexdigest()
-        if self._cwd is not None:
-            working_directory = os.path.join(self._cwd, hash)
-        else:
-            working_directory = os.path.abspath(hash)
+    def _start_process_helper(
+        self, command_lst: list[str], queue_adapter: QueueAdapter
+    ) -> int:
+        # Prefer cache_directory + worker_id; fall back to a short blake2 hash
+        base_dir = (
+            self._cache_directory
+            or (self._cwd if self._cwd is not None else os.path.abspath("."))
+        )
+        if getattr(self, "_worker_id", None) is not None:
+            worker_dir = f"worker_{self._worker_id}"
+        else:
+            # stable-ish fallback without using insecure md5
+            worker_dir = "worker_" + hashlib.blake2s(
+                repr(self).encode(), digest_size=8
+            ).hexdigest()
+        working_directory = os.path.join(base_dir, worker_dir)
+        os.makedirs(working_directory, exist_ok=True)
         return queue_adapter.submit_job(
             command=" ".join(self.generate_command(command_lst=command_lst)),
             working_directory=working_directory,
             cores=int(self._cores * self._threads_per_core),
             **self._pysqa_submission_kwargs,
         )

🤖 Prompt for AI Agents
In executorlib/task_scheduler/interactive/spawner_pysqa.py around lines 175-188,
replace the current MD5-based working_directory logic (which shadows the
built-in name `hash` and hashes self) with a stable per-worker path derived from
the configured cache directory and the worker id: build working_directory =
os.path.join(self._cache_directory, str(self._worker_id)) (or use the existing
cache dir attribute name if different), ensure the cache directory exists
(os.makedirs(..., exist_ok=True)), avoid using hashlib/md5 and do not shadow
built-ins, and keep the rest of the queue_adapter.submit_job call unchanged so
each worker uses a persistent, predictable subdirectory under the cache dir.

Comment on lines +23 to +29
try:
import pysqa

skip_pysqa_test = False
except ImportError:
skip_pysqa_test = True


⚠️ Potential issue

Good: explicit PySQA gating flag. Also gate earlier tests that rely on PySQA.

The TestCacheExecutorPysqa class (Lines 50-54) relies on the file backend, which needs PySQA. Add skip_pysqa_test to that skipIf to avoid ImportError when PySQA isn’t installed.

Proposed change outside this hunk (around Line 50):

@unittest.skipIf(
    skip_slurm_test or skip_mpi4py_test or skip_h5py_test or skip_pysqa_test,
    "h5py or mpi4py or SLURM or pysqa are not installed; tests skipped.",
)
🤖 Prompt for AI Agents
In tests/test_slurmclusterexecutor.py around lines 50 to 54, the
TestCacheExecutorPysqa class skip decorator needs to include the earlier defined
skip_pysqa_test flag so tests that depend on PySQA are skipped when it’s not
installed; update the @unittest.skipIf condition to add skip_pysqa_test to the
OR list and adjust the skip message to mention pysqa (e.g., "h5py or mpi4py or
SLURM or pysqa are not installed; tests skipped.").

Comment on lines +120 to +127
@unittest.skipIf(skip_pysqa_test, "pysqa is not installed, so the pysqa tests are skipped.")
class TestSlurmClusterInit(unittest.TestCase):
def test_slurm_cluster_block_allocation(self):
with self.assertRaises(ValueError):
SlurmClusterExecutor(block_allocation=True)

def test_slurm_cluster_file(self):
self.assertTrue(SlurmClusterExecutor(block_allocation=False))

💡 Verification agent

🧩 Analysis chain

Fix test: block_allocation no longer raises ValueError when PySQA is available.

The code now instantiates a PySQA-backed BlockAllocationTaskScheduler; expecting ValueError is incorrect and will fail when PySQA is installed. Assert successful construction instead. Also, make the “file” test assert type rather than truthiness.

 @unittest.skipIf(skip_pysqa_test, "pysqa is not installed, so the pysqa tests are skipped.")
 class TestSlurmClusterInit(unittest.TestCase):
     def test_slurm_cluster_block_allocation(self):
-        with self.assertRaises(ValueError):
-            SlurmClusterExecutor(block_allocation=True)
+        exe = SlurmClusterExecutor(block_allocation=True)
+        self.assertIsInstance(exe, SlurmClusterExecutor)

     def test_slurm_cluster_file(self):
-        self.assertTrue(SlurmClusterExecutor(block_allocation=False))
+        exe = SlurmClusterExecutor(block_allocation=False)
+        self.assertIsInstance(exe, SlurmClusterExecutor)

If you specifically want to test behavior when PySQA is missing, add a separate test guarded by skipUnless(skip_pysqa_test) and assert ImportError on construction (since the import happens before any ValueError).


🏁 Script executed:

#!/bin/bash
set -e
# Locate the SlurmClusterExecutor class definition
rg -nP --type=py 'class\s+SlurmClusterExecutor' -C5
# Inspect its __init__ method for block_allocation handling
rg -nP --type=py 'def\s+__init__\s*\(.*block_allocation' -C5
# Search for all occurrences of block_allocation within this class
rg -nP --type=py --context 5 'block_allocation' --no-heading

Length of output: 925


🏁 Script executed:

#!/bin/bash
set -e
# Show the beginning of the SlurmClusterExecutor implementation
sed -n '1,200p' executorlib/executor/slurm.py

# Find any __init__ definitions in slurm.py
rg -nP --type=py 'def\s+__init__' -n executorlib/executor/slurm.py -C3

# Search for block_allocation occurrences across the repository
rg -nP --type=py 'block_allocation' -n -C3

Length of output: 93539


Update tests in tests/test_slurmclusterexecutor.py to reflect PySQA-backed behavior.

  • In test_slurm_cluster_block_allocation, replace the ValueError assertion with:
    exe = SlurmClusterExecutor(block_allocation=True)
    self.assertIsInstance(exe, SlurmClusterExecutor)
  • In test_slurm_cluster_file, replace assertTrue(...) with:
    exe = SlurmClusterExecutor(block_allocation=False)
    self.assertIsInstance(exe, SlurmClusterExecutor)
  • Add a new test guarded by @unittest.skipUnless(skip_pysqa_test, ...) that verifies SlurmClusterExecutor(block_allocation=True) raises ImportError when PySQA is unavailable.
🤖 Prompt for AI Agents
tests/test_slurmclusterexecutor.py lines 120-127: update the tests to reflect
PySQA-backed behavior by replacing the ValueError expectation with creating the
executor and asserting its type (exe =
SlurmClusterExecutor(block_allocation=True); self.assertIsInstance(exe,
SlurmClusterExecutor)), replace the assertTrue call with the same pattern for
block_allocation=False, and add a new test function decorated with
@unittest.skipUnless(skip_pysqa_test, "pysqa is not installed, so the pysqa
tests are skipped.") that asserts SlurmClusterExecutor(block_allocation=True)
raises ImportError when PySQA is unavailable.

@jan-janssen jan-janssen merged commit 9bacadf into main Sep 8, 2025
114 of 119 checks passed
@jan-janssen jan-janssen deleted the worker branch September 8, 2025 11:40