
[forge] include some documentation pages #16

Merged
kpouget merged 3 commits into openshift-psap:main from kpouget:docs
Apr 7, 2026

Conversation

Contributor

@kpouget kpouget commented Apr 7, 2026

Summary by CodeRabbit

  • Documentation

    • Added comprehensive CI and CLI orchestration guides covering discovery, workflows, logging, PR integration, artifact layout, and troubleshooting.
  • New Features

    • Job submit-and-wait tool: generates manifests, submits FOURNOS jobs, polls completion, captures logs/status, and writes artifacts.
  • Improvements

    • CI entrypoints now require executability to be detected.
    • CI metadata consolidated under a dedicated metadata folder in artifacts.
    • Richer task/subprocess error reporting and structured CI failure summaries.
    • Default namespace configured for job submissions.


openshift-ci bot commented Apr 7, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mml-coder for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai bot commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds CI/CLI orchestration docs; requires CI scripts be user-executable; relocates CI metadata and variable overrides to 000__ci_metadata/; introduces a FOURNOS submit-and-wait pipeline that creates manifests, submits FOURNOS jobs via oc, polls for completion, collects logs/status, and always cleans up; improves DSL/task and subprocess error logging.

Changes

Cohort / File(s) Summary
Documentation
docs/ci/run_ci.md, docs/orchestration/ci.md, docs/orchestration/cli.md
Added comprehensive docs for CI/CLI entrypoints, orchestration patterns, environment/setup, logging, PR integration, artifacts layout, and troubleshooting.
CI script discovery
projects/core/ci_entrypoint/run_ci.py
find_ci_script() now only returns scripts that both exist and are user-executable (os.X_OK), changing which orchestration scripts are considered runnable.
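The executability filter described above can be sketched as follows. This is a minimal illustration only: the function name `find_ci_script` and the `orchestration/<operation>.py` layout come from this summary, and the real signature in `run_ci.py` may differ.

```python
import os
from pathlib import Path

def find_ci_script(project_dir: Path, operation: str):
    """Return the orchestration script only if it exists AND is user-executable."""
    script = project_dir / "orchestration" / f"{operation}.py"
    if script.exists() and os.access(script, os.X_OK):
        return script
    return None  # missing or non-executable scripts are not considered runnable
```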
CI metadata & overrides
projects/core/ci_entrypoint/prepare_ci.py, projects/core/library/config.py, projects/core/notifications/send.py
CI metadata dir changed to 000__ci_metadata/ (new constant) and default VARIABLE_OVERRIDES_FILENAME updated to 000__ci_metadata/variable_overrides.yaml; notification discovery updated to match.
DSL/task & subprocess logging
projects/core/dsl/runtime.py, projects/core/dsl/task.py, projects/core/dsl/shell.py
Added TaskExecutionError, improved task failure context and logging (task name/description/args), moved execution.log to top-level ARTIFACT_DIR, and enhanced subprocess CalledProcessError messages with exit code and trimmed stdout/stderr.
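The enhanced subprocess error reporting could look roughly like this. This is a hypothetical helper, not the project's actual `shell.py` code; only the exit-code-plus-trimmed-output idea comes from the summary above.

```python
from subprocess import CalledProcessError

def describe_failure(exc: CalledProcessError, max_chars: int = 500) -> str:
    """Format a CalledProcessError with exit code and trimmed stdout/stderr."""
    def trim(stream):
        text = (stream or "").strip()
        # keep only the tail, where the actual error message usually is
        return text[-max_chars:]
    parts = [f"Command {exc.cmd!r} failed with exit code {exc.returncode}"]
    if trim(exc.stdout):
        parts.append(f"stdout (trimmed): {trim(exc.stdout)}")
    if trim(exc.stderr):
        parts.append(f"stderr (trimmed): {trim(exc.stderr)}")
    return "\n".join(parts)
```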
Fournos submission wiring
projects/fournos_launcher/orchestration/submit.py, projects/fournos_launcher/orchestration/ci.py, projects/fournos_launcher/orchestration/cli.py
Renamed submit() → submit_job() and updated CLI/CI commands to invoke submit_job(); submission now delegates to the new submit-and-wait pipeline, preserving KUBECONFIG handling.
Fournos config
projects/fournos_launcher/orchestration/config.yaml
Added top-level fournos block with namespace: psap-automation.
New submit-and-wait pipeline
projects/fournos_launcher/toolbox/submit_and_wait/main.py
Added executable pipeline implementing tasks: validate inputs, generate job name, ensure oc, create manifest, oc apply submit, poll job phase with retries/backoff, retrieve logs, capture final status YAML, and always cleanup manifest; includes CLI main() and run(...) entry.
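The poll-with-retries/backoff step of the pipeline can be illustrated with a clock-injectable loop. This is a sketch only; the real pipeline uses the project's `@task`/`@retry` DSL and shells out to `oc`, neither of which is shown here, and the function name is illustrative.

```python
import time

def poll_until_terminal(get_phase, timeout_s=7200, interval_s=30, backoff=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll get_phase() until Completed/Failed, sleeping with optional backoff."""
    deadline = clock() + timeout_s
    delay = interval_s
    while clock() < deadline:
        phase = get_phase()  # e.g. parsed from `oc get fournosjob ... -o jsonpath=...`
        if phase in ("Completed", "Failed"):
            return phase
        sleep(delay)
        delay *= backoff
    raise TimeoutError("job did not reach a terminal phase before the deadline")
```

Injecting `clock` and `sleep` keeps the loop unit-testable without waiting real time.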

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant CLI as "CLI / run()"
  participant Pipeline as "submit_and_wait.run()"
  participant FS as "Artifacts (ARTIFACT_DIR)"
  participant OC as "Kubernetes (`oc`)"
  participant Logs as "Job Logs"

  CLI->>Pipeline: invoke run(cluster, project, args...)
  Pipeline->>FS: write job manifest (manifest_file)
  Pipeline->>OC: oc apply -f manifest_file
  OC-->>Pipeline: accepted
  loop poll
    Pipeline->>OC: oc get fournosjob ... (phase)
    OC-->>Pipeline: phase (Pending/Running/Completed/Failed)
  end
  Pipeline->>OC: oc logs -l "fournos.job=<name>"
  OC-->>Logs: job logs (or empty)
  Pipeline->>FS: save logs and final status YAML
  Pipeline->>FS: cleanup manifest file (always)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I hopped through docs by lantern light,
Scripts must be executable to take flight,
000__ci_metadata/ keeps overrides snug and dry,
Jobs submit, wait, log — then flutter by,
A carrot cheer for artifacts piled high! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The PR title describes adding documentation pages, but the changeset includes significant code modifications beyond documentation (changes to run_ci.py, config.py, cli.py, submit.py, runtime.py, shell.py, task.py, ci.py, etc.). | Revise the title to accurately reflect that this PR adds documentation AND makes implementation changes to CI/orchestration systems (e.g., '[forge] add CI/orchestration documentation and refactor metadata paths and job submission'). |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor Author

kpouget commented Apr 7, 2026

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false


topsail-bot bot commented Apr 7, 2026

🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 00 minutes 02 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
docs/orchestration/ci.md (1)

34-38: Command format inconsistent with other documentation.

The run_ci command here doesn't match the format in docs/ci/run_ci.md which uses ./run_ci.py. Consider aligning these examples:

 or from the CI launcher:
 
```diff
-run_ci llm_d ci pre-cleanup
-run_ci llm_d ci prepare
-run_ci llm_d ci test
+./run_ci.py llm_d ci pre-cleanup
+./run_ci.py llm_d ci prepare
+./run_ci.py llm_d ci test
```
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @docs/orchestration/ci.md around lines 34 - 38, Update the examples in
docs/orchestration/ci.md to match the other docs by replacing the bare run_ci
invocations with the script form ./run_ci.py; specifically change occurrences of
run_ci llm_d ci pre-cleanup, run_ci llm_d ci prepare, and run_ci llm_d ci test
to use ./run_ci.py so they align with docs/ci/run_ci.md and the project's
canonical command format.

projects/core/ci_entrypoint/run_ci.py (1)

`199-200`: **Good addition of executable check, but error message could be clearer.**

The executable permission check aligns with the filtering in `show_project_operations` (lines 275-276) and is consistent with the documentation. However, when a script exists but isn't executable, the error message at line 396 says "Expected: {project_dir}/orchestration/{operation}.py" without mentioning the executable requirement, which could confuse users.



💡 Suggested improvement for the error message

```diff
         click.echo(
             click.style(
                 f"❌ ERROR: No CI script found for project '{project}' operation '{operation}'.",
                 fg='red'
             ),
             err=True
         )
-        click.echo(f"🔍 Expected: {project_dir}/orchestration/{operation}.py")
+        click.echo(f"🔍 Expected: {project_dir}/orchestration/{operation}.py (must be executable)")
         sys.exit(1)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/core/ci_entrypoint/run_ci.py` around lines 199 - 200, The error
message raised when a script exists but isn't executable is unclear; update the
message emitted where you validate script_path (the branch that checks
script_path.exists() and os.access(script_path, os.X_OK)) to explicitly state
that the file must be executable and show the full expected path (e.g.,
"Expected executable: {project_dir}/orchestration/{operation}.py (file exists
but is not executable)"), so callers see both the path and the
executable-permission requirement; adjust the string emitted at the current
error location that currently reads "Expected:
{project_dir}/orchestration/{operation}.py" to this clearer, permission-aware
wording.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/ci/run_ci.md`:
- Around line 162-165: Update the documentation's project aliasing section to
match the actual mapping implemented in run_ci.py: change the described alias
from "forge → fournos" to reflect the code that maps "fournos" to
"fournos_launcher" (the if block checking project == "fournos" and setting
project = "fournos_launcher"), and also remove or correct any mention of "forge"
if it is not used; ensure the docs text and bullet list consistently state
"fournos → fournos_launcher" (and adjust related environment-specific/ artifact
management phrasing if it referenced the old alias).
- Line 15: Replace the redundant phrase "CLI interface" in the docs entry for
run_ci.py by changing the description for run_ci.py to use either "CLI" or the
expanded "Command Line Interface" (e.g., "Main entrypoint script offering a
unified CLI for CI operations" or "Main entrypoint script offering a unified
Command Line Interface for CI operations"); update the line referencing
run_ci.py accordingly.

In `@docs/orchestration/ci.md`:
- Line 9: The doc uses two different phase names—"cleanup" and "pre_cleanup"—so
pick one canonical phase name (e.g., pre_cleanup) and update every occurrence in
the CI docs and examples to match; specifically, rename the phase listed in the
Commands for CI phases section (the "cleanup" entry) to the chosen canonical
name, and update any headings, example commands, and referenced pipeline stage
names in the same document to use that exact identifier so all mentions (e.g.,
prepare, test, pre_cleanup) are consistent.

In `@projects/core/library/config.py`:
- Line 20: The constant VARIABLE_OVERRIDES_FILENAME was changed to
"_meta/variable_overrides.yaml", but code still hardcodes the old path (e.g.,
variable_overrides_file in projects/jump_ci/testing/test.py) and
projects/legacy/library/config.py defines the old constant, causing missed
overrides; update callers to use the canonical VARIABLE_OVERRIDES_FILENAME
constant instead of composing "variable_overrides.yaml" manually and update the
legacy module to export the same VARIABLE_OVERRIDES_FILENAME value (or import it
from projects/core/library/config.py) so all code reads ARTIFACT_DIR /
VARIABLE_OVERRIDES_FILENAME consistently.

---

Nitpick comments:
In `@docs/orchestration/ci.md`:
- Around line 34-38: Update the examples in docs/orchestration/ci.md to match
the other docs by replacing the bare run_ci invocations with the script form
./run_ci.py; specifically change occurrences of run_ci llm_d ci pre-cleanup,
run_ci llm_d ci prepare, and run_ci llm_d ci test to use ./run_ci.py so they
align with docs/ci/run_ci.md and the project's canonical command format.

In `@projects/core/ci_entrypoint/run_ci.py`:
- Around line 199-200: The error message raised when a script exists but isn't
executable is unclear; update the message emitted where you validate script_path
(the branch that checks script_path.exists() and os.access(script_path,
os.X_OK)) to explicitly state that the file must be executable and show the full
expected path (e.g., "Expected executable:
{project_dir}/orchestration/{operation}.py (file exists but is not
executable)"), so callers see both the path and the executable-permission
requirement; adjust the string emitted at the current error location that
currently reads "Expected: {project_dir}/orchestration/{operation}.py" to this
clearer, permission-aware wording.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 132470d2-94fe-40db-9134-a3d69ed12a7d

📥 Commits

Reviewing files that changed from the base of the PR and between c6d909e and ca5bc32.

📒 Files selected for processing (5)
  • docs/ci/run_ci.md
  • docs/orchestration/ci.md
  • docs/orchestration/cli.md
  • projects/core/ci_entrypoint/run_ci.py
  • projects/core/library/config.py

Contributor Author

kpouget commented Apr 7, 2026

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false


topsail-bot bot commented Apr 7, 2026

🔴 Test of 'fournos_launcher submit' failed after 00 hours 00 minutes 01 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false

Failure indicator:

/logs/artifacts/FAILURE | CI failed with exception:
Exception: BdbQuit: 

Full stack trace:
Traceback (most recent call last):
  File "/app/forge/projects/core/library/ci.py", line 50, in wrapper
    exit_code = command_func(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/forge/projects/fournos_launcher/orchestration/ci.py", line 37, in submit
    return submit_mod.submit_job()

[...]

Contributor Author

kpouget commented Apr 7, 2026

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false

Contributor Author

kpouget commented Apr 7, 2026

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false


topsail-bot bot commented Apr 7, 2026

🔴 Test of 'fournos_launcher submit' failed after 00 hours 00 minutes 02 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false

Failure indicator:

/logs/artifacts/000__submit_and_wait/FAILURE | CalledProcessError: Command 'oc apply -f /logs/artifacts/000__submit_and_wait/src/forge-llm_d-20260407-144340-manifest.yaml' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/app/forge/projects/core/dsl/runtime.py", line 103, in execute_tasks
    raise execution_error
  File "/app/forge/projects/core/dsl/runtime.py", line 67, in execute_tasks
    _execute_single_task(task_info, args)
  File "/app/forge/projects/core/dsl/runtime.py", line 138, in _execute_single_task
    task_status["ret"] = task_func(args)
                         ^^^^^^^^^^^^^^^
  File "/app/forge/projects/core/dsl/task.py", line 72, in wrapper

[...]

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py`:
- Line 51: Remove the blocking debugger hook by deleting the "import pdb;
pdb.set_trace()" statement in main.py so non-interactive submission paths no
longer stop; also remove any unused "import pdb" left behind and run the
submission flow (or unit tests) to confirm tasks proceed without manual
intervention.
- Around line 138-142: The code currently echoes the kubectl/oc command instead
of executing it; update the shell.run invocation in submit_and_wait.main to run
the real apply command (replace f"echo oc apply -f {args.manifest_file}" with
f"oc apply -f {args.manifest_file}") so the FournosJob is actually created, keep
checking result.success and keep the RuntimeError with result.stderr if it fails
(references: shell.run, result, args.manifest_file, RuntimeError).
- Around line 83-87: Normalize args.final_job_name so it is a Kubernetes-safe
DNS-1123 label: import re, convert args.project to lowercase, replace
underscores and invalid characters with hyphens (e.g. re.sub(r'[^a-z0-9-]+',
'-', project_lower)), collapse multiple hyphens, trim leading/trailing hyphens,
and truncate the combined "forge-{project}-{timestamp}" to <=253 chars
(preferably keeping timestamp intact); apply this transformation when assigning
args.final_job_name in the block that currently sets it from
args.project/timestamp (referencing args.final_job_name, args.project, and
timestamp) and ensure you add the import re at the top of the file.
- Around line 147-184: The wait_for_job_completion function currently relies on
a fixed `@retry` decorator (120×30s) and ignores args.timeout_minutes and
args.poll_interval_seconds; replace the retry-based flow with an explicit loop
using time.monotonic(): compute a deadline = time.monotonic() +
args.timeout_minutes * 60, then loop while time.monotonic() < deadline calling
the same shell.run status check and the existing status handling
(Completed/Failed/Running/Pending/other), sleeping args.poll_interval_seconds
between iterations; if the deadline is exceeded raise a TimeoutError with a
clear message about args.final_job_name; keep the failure-detail lookup
(jsonpath .status.message) and existing logging behavior.
- Around line 187-215: Add the `@always` decorator to both artifact-collector
tasks so they run even if prior tasks fail: annotate retrieve_job_logs and
capture_final_job_status with `@always` (above their existing `@task` decorators)
and ensure the always decorator is imported from the task framework (e.g., add
the import for always alongside task if missing) so logs and final-status YAML
are collected regardless of earlier exceptions.
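One way to avoid the first class of bug above (an echoed command silently standing in for the real one) is to make dry-run an explicit flag rather than hard-coding `echo` into the command string. This is a hypothetical helper built on `subprocess`, not the project's `shell.run` API.

```python
import subprocess

def apply_manifest(manifest_file: str, dry_run: bool = False) -> str:
    """Apply a manifest with oc; echoing only happens when dry_run is requested."""
    cmd = ["oc", "apply", "-f", str(manifest_file)]
    if dry_run:
        # never ships to production by accident: the caller must opt in
        return "DRY-RUN: " + " ".join(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"oc apply failed: {result.stderr}")
    return result.stdout
```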

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8efc7105-c27a-445f-8bc8-bb42d091ba7b

📥 Commits

Reviewing files that changed from the base of the PR and between ca5bc32 and fb0346b.

📒 Files selected for processing (10)
  • docs/ci/run_ci.md
  • docs/orchestration/ci.md
  • docs/orchestration/cli.md
  • projects/core/ci_entrypoint/run_ci.py
  • projects/core/library/config.py
  • projects/fournos_launcher/orchestration/ci.py
  • projects/fournos_launcher/orchestration/cli.py
  • projects/fournos_launcher/orchestration/config.yaml
  • projects/fournos_launcher/orchestration/submit.py
  • projects/fournos_launcher/toolbox/submit_and_wait/main.py
✅ Files skipped from review due to trivial changes (4)
  • projects/fournos_launcher/orchestration/config.yaml
  • docs/orchestration/cli.md
  • docs/orchestration/ci.md
  • docs/ci/run_ci.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • projects/core/ci_entrypoint/run_ci.py
  • projects/core/library/config.py

Comment on lines +83 to +87
```python
if args.job_name:
    args.final_job_name = args.job_name
else:
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    args.final_job_name = f"forge-{args.project}-{timestamp}"
```

⚠️ Potential issue | 🔴 Critical

Normalize final_job_name to a Kubernetes-safe slug.

args.project is interpolated verbatim here. Values like llm_d produce forge-llm_d-..., which will be rejected as metadata.name.

Suggested fix

```diff
     if args.job_name:
         args.final_job_name = args.job_name
     else:
         timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
-        args.final_job_name = f"forge-{args.project}-{timestamp}"
+        project_slug = re.sub(r"[^a-z0-9-]+", "-", args.project.lower()).strip("-") or "job"
+        args.final_job_name = f"forge-{project_slug}-{timestamp}"
```

```python
# add with the other imports
import re
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 if args.job_name:
     args.final_job_name = args.job_name
 else:
     timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
-    args.final_job_name = f"forge-{args.project}-{timestamp}"
+    project_slug = re.sub(r"[^a-z0-9-]+", "-", args.project.lower()).strip("-") or "job"
+    args.final_job_name = f"forge-{project_slug}-{timestamp}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 83 -
87, Normalize args.final_job_name so it is a Kubernetes-safe DNS-1123 label:
import re, convert args.project to lowercase, replace underscores and invalid
characters with hyphens (e.g. re.sub(r'[^a-z0-9-]+', '-', project_lower)),
collapse multiple hyphens, trim leading/trailing hyphens, and truncate the
combined "forge-{project}-{timestamp}" to <=253 chars (preferably keeping
timestamp intact); apply this transformation when assigning args.final_job_name
in the block that currently sets it from args.project/timestamp (referencing
args.final_job_name, args.project, and timestamp) and ensure you add the import
re at the top of the file.
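The normalization the reviewer asks for can be packaged as a small pure function, along the lines of the suggested diff. The function name is illustrative; the real fix would live inline where `final_job_name` is assigned.

```python
import re

def k8s_job_name(project: str, timestamp: str, prefix: str = "forge") -> str:
    """Build a DNS-1123-safe job name: lowercase, hyphens only, length-capped."""
    slug = re.sub(r"[^a-z0-9-]+", "-", project.lower()).strip("-") or "job"
    slug = re.sub(r"-{2,}", "-", slug)  # collapse runs of hyphens
    return f"{prefix}-{slug}-{timestamp}"[:253]
```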

Comment on lines +147 to +184
```python
@task
@retry(attempts=120, delay=30, backoff=1.0)
def wait_for_job_completion(args):
    """Wait for FOURNOS job to complete"""

    # Check job status
    status_result = shell.run(
        f'oc get fournosjob {args.final_job_name} -n {args.namespace} -o jsonpath="{{.status.phase}}"',
        check=False,
        log_stdout=False
    )

    if not status_result.success:
        # Failed to get status, will retry
        print(f"Failed to get job status, retrying...")
        raise RuntimeError("Failed to get job status")

    status = status_result.stdout.strip()

    if status == "Completed":
        return f"Job {args.final_job_name} completed successfully"
    elif status == "Failed":
        # Get failure details
        failure_result = shell.run(
            f'oc get fournosjob {args.final_job_name} -n {args.namespace} -o jsonpath="{{.status.message}}"',
            check=False,
            log_stdout=False
        )
        failure_msg = failure_result.stdout.strip() if failure_result.success else "Unknown failure"
        raise RuntimeError(f"Job {args.final_job_name} failed: {failure_msg}")
    elif status in ["Running", "Pending"]:
        print(f"Job {args.final_job_name} status: {status}")
        # Job still running, will retry
        raise RuntimeError("Job still running")
    else:
        print(f"Job {args.final_job_name} status: {status}")
        # Unknown status, will retry
        raise RuntimeError(f"Unknown job status: {status}")
```

⚠️ Potential issue | 🟠 Major

Honor timeout_minutes and poll_interval_seconds in the wait loop.

The retry budget is fixed at 120 × 30s, so this stops after about an hour even though run() documents a 120-minute default. poll_interval_seconds is also unused.

Suggested direction

```diff
-@task
-@retry(attempts=120, delay=30, backoff=1.0)
+@task
 def wait_for_job_completion(args):
+    deadline = time.monotonic() + args.timeout_minutes * 60
+    while time.monotonic() < deadline:
+        ...
+        time.sleep(args.poll_interval_seconds)
+
+    raise TimeoutError(f"Timed out waiting for job {args.final_job_name}")
```
🧰 Tools
🪛 Ruff (0.15.9)

[error] 161-161: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 147 -
184, The wait_for_job_completion function currently relies on a fixed `@retry`
decorator (120×30s) and ignores args.timeout_minutes and
args.poll_interval_seconds; replace the retry-based flow with an explicit loop
using time.monotonic(): compute a deadline = time.monotonic() +
args.timeout_minutes * 60, then loop while time.monotonic() < deadline calling
the same shell.run status check and the existing status handling
(Completed/Failed/Running/Pending/other), sleeping args.poll_interval_seconds
between iterations; if the deadline is exceeded raise a TimeoutError with a
clear message about args.final_job_name; keep the failure-detail lookup
(jsonpath .status.message) and existing logging behavior.

Comment on lines +187 to +215
```python
@task
def retrieve_job_logs(args):
    """Retrieve and save job logs"""

    # Get job logs
    logs_result = shell.run(
        f'oc logs -l "fournos.job={args.final_job_name}" -n {args.namespace} --all-containers=true',
        check=False,
        stdout_dest=args.artifact_dir / f"{args.final_job_name}-logs.txt"
    )

    if logs_result.success:
        return f"Job logs saved to {args.final_job_name}-logs.txt"
    else:
        return "No logs available or failed to retrieve logs"


@task
def capture_final_job_status(args):
    """Capture final job status and details"""

    # Get full job details
    shell.run(
        f'oc get fournosjob {args.final_job_name} -n {args.namespace} -o yaml',
        stdout_dest=args.artifact_dir / f"{args.final_job_name}-final-status.yaml",
        check=False
    )

    return f"Final job status captured to {args.final_job_name}-final-status.yaml"
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
# First, let's check if the files exist and understand the structure
git ls-files | grep -E "(runtime\.py|main\.py)" | head -20
```

Repository: openshift-psap/forge
Length of output: 200

🏁 Script executed:

```shell
# Check the runtime.py file to understand task execution flow
cat projects/core/dsl/runtime.py | head -150
```

Repository: openshift-psap/forge
Length of output: 5137

🏁 Script executed:

```shell
# Check the main.py file to see the current state of the code
cat projects/fournos_launcher/toolbox/submit_and_wait/main.py | sed -n '180,220p'
```

Repository: openshift-psap/forge
Length of output: 1215

🏁 Script executed:

```shell
# Look for `@always` decorator definition
rg "@always" --type py -B 2 -A 5
```

Repository: openshift-psap/forge
Length of output: 682

🏁 Script executed:

```shell
# Check the full task ordering in main.py to understand execution sequence
rg "@task" projects/fournos_launcher/toolbox/submit_and_wait/main.py -B 1 -A 3
```

Repository: openshift-psap/forge
Length of output: 840


Add @always decorator to artifact collector tasks.

The task execution model stops normal tasks after the first exception. If wait_for_job_completion fails, retrieve_job_logs and capture_final_job_status never run, leaving no logs or final status for debugging. Both tasks should execute regardless of prior failures to ensure diagnostic data is captured.

Suggested fix

```diff
+@always
 @task
 def retrieve_job_logs(args):
     """Retrieve and save job logs"""
+@always
 @task
 def capture_final_job_status(args):
     """Capture final job status and details"""
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
+@always
 @task
 def retrieve_job_logs(args):
     """Retrieve and save job logs"""

     # Get job logs
     logs_result = shell.run(
         f'oc logs -l "fournos.job={args.final_job_name}" -n {args.namespace} --all-containers=true',
         check=False,
         stdout_dest=args.artifact_dir / f"{args.final_job_name}-logs.txt"
     )

     if logs_result.success:
         return f"Job logs saved to {args.final_job_name}-logs.txt"
     else:
         return "No logs available or failed to retrieve logs"


+@always
 @task
 def capture_final_job_status(args):
     """Capture final job status and details"""

     # Get full job details
     shell.run(
         f'oc get fournosjob {args.final_job_name} -n {args.namespace} -o yaml',
         stdout_dest=args.artifact_dir / f"{args.final_job_name}-final-status.yaml",
         check=False
     )

     return f"Final job status captured to {args.final_job_name}-final-status.yaml"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 187 -
215, Add the `@always` decorator to both artifact-collector tasks so they run even
if prior tasks fail: annotate retrieve_job_logs and capture_final_job_status
with `@always` (above their existing `@task` decorators) and ensure the always
decorator is imported from the task framework (e.g., add the import for always
alongside task if missing) so logs and final-status YAML are collected
regardless of earlier exceptions.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
projects/fournos_launcher/toolbox/submit_and_wait/main.py (3)

160-160: Remove extraneous f prefix from string literal.

This f-string has no placeholders.

✨ Suggested fix
-        print(f"Failed to get job status, retrying...")
+        print("Failed to get job status, retrying...")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` at line 160, The
print call in main.py uses an unnecessary f-string without placeholders; locate
the print in the retry loop inside the submit_and_wait / get job status flow
(the print("Failed to get job status, retrying...") line) and remove the f
prefix so it becomes a plain string literal (replace print(f"...") with
print("...")) to avoid the redundant f-string.

245-247: Remove extraneous f prefixes from string literals.

These f-strings have no placeholders.

✨ Suggested fix
-        print(f"✅ FOURNOS job completed successfully")
+        print("✅ FOURNOS job completed successfully")
     except KeyboardInterrupt:
-        print(f"\n🚫 Operation interrupted by user")
+        print("\n🚫 Operation interrupted by user")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 245 -
247, Two print statements in submit_and_wait/main.py use unnecessary f-strings
with no placeholders; change print(f"✅ FOURNOS job completed successfully") and
print(f"\n🚫 Operation interrupted by user") to plain string literals (remove
the leading 'f') in the try/except block so they become print("✅ FOURNOS job
completed successfully") and print("\n🚫 Operation interrupted by user"); update
the prints where they appear in the submit_and_wait main flow (the
completion/KeyboardInterrupt handlers).

18-27: Use explicit | None type hints instead of implicit Optional.

PEP 484 prohibits implicit Optional. The args and variables_overrides parameters have default values of None but their type hints don't reflect this.

✨ Suggested fix
 def run(
     cluster_name: str,
     project: str,
-    args: list = None,
-    variables_overrides: dict = None,
+    args: list | None = None,
+    variables_overrides: dict | None = None,
     job_name: str = "",
     namespace: str = "fournos-jobs",
     timeout_minutes: int = 120,
     poll_interval_seconds: int = 30
 ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 18 -
27, The function signature for run uses default None for args and
variables_overrides but their type hints are implicit; update the signature to
use explicit None union types—change args: list to args: list | None = None and
variables_overrides: dict to variables_overrides: dict | None = None
(referencing the run function parameters args and variables_overrides) so the
annotations comply with PEP 484/PEP 604; no other logic changes needed.
projects/fournos_launcher/orchestration/submit.py (1)

33-46: Consider handling exceptions from submit_and_wait.

If submit_and_wait() raises an exception, the function will never return 0 and the exception will propagate. This may be intentional (letting the caller handle it), but the unconditional return 0 could mislead readers into thinking success is guaranteed.

Also, the f-string on line 34 has no placeholders.

✨ Suggested fix for f-string
 def submit_job():
-    logger.warning(f"Hello Fournos")
+    logger.warning("Hello Fournos")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/orchestration/submit.py` around lines 33 - 46, The
submit_job function currently logs a placeholder f-string and unconditionally
returns 0 while calling submit_and_wait; update submit_job to (1) change
logger.warning(f"Hello Fournos") to a normal string logger.warning("Hello
Fournos") since there are no placeholders, and (2) wrap the submit_and_wait(...)
call in a try/except around submit_job so that exceptions from submit_and_wait
are caught, logged (include the exception object), and cause submit_job to
return a non-zero status (e.g., return 1) or re-raise after logging — reference
submit_job and submit_and_wait to find the call site and logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@projects/fournos_launcher/orchestration/submit.py`:
- Around line 33-46: The submit_job function currently logs a placeholder
f-string and unconditionally returns 0 while calling submit_and_wait; update
submit_job to (1) change logger.warning(f"Hello Fournos") to a normal string
logger.warning("Hello Fournos") since there are no placeholders, and (2) wrap
the submit_and_wait(...) call in a try/except around submit_job so that
exceptions from submit_and_wait are caught, logged (include the exception
object), and cause submit_job to return a non-zero status (e.g., return 1) or
re-raise after logging — reference submit_job and submit_and_wait to find the
call site and logging.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py`:
- Line 160: The print call in main.py uses an unnecessary f-string without
placeholders; locate the print in the retry loop inside the submit_and_wait /
get job status flow (the print("Failed to get job status, retrying...") line)
and remove the f prefix so it becomes a plain string literal (replace
print(f"...") with print("...")) to avoid the redundant f-string.
- Around line 245-247: Two print statements in submit_and_wait/main.py use
unnecessary f-strings with no placeholders; change print(f"✅ FOURNOS job
completed successfully") and print(f"\n🚫 Operation interrupted by user") to
plain string literals (remove the leading 'f') in the try/except block so they
become print("✅ FOURNOS job completed successfully") and print("\n🚫 Operation
interrupted by user"); update the prints where they appear in the
submit_and_wait main flow (the completion/KeyboardInterrupt handlers).
- Around line 18-27: The function signature for run uses default None for args
and variables_overrides but their type hints are implicit; update the signature
to use explicit None union types—change args: list to args: list | None = None
and variables_overrides: dict to variables_overrides: dict | None = None
(referencing the run function parameters args and variables_overrides) so the
annotations comply with PEP 484/PEP 604; no other logic changes needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1033a0c2-ba42-4f5b-aa57-2b3b75482065

📥 Commits

Reviewing files that changed from the base of the PR and between fb0346b and 9a63947.

📒 Files selected for processing (9)
  • docs/orchestration/ci.md
  • docs/orchestration/cli.md
  • projects/core/ci_entrypoint/run_ci.py
  • projects/core/library/config.py
  • projects/fournos_launcher/orchestration/ci.py
  • projects/fournos_launcher/orchestration/cli.py
  • projects/fournos_launcher/orchestration/config.yaml
  • projects/fournos_launcher/orchestration/submit.py
  • projects/fournos_launcher/toolbox/submit_and_wait/main.py
✅ Files skipped from review due to trivial changes (3)
  • projects/fournos_launcher/orchestration/config.yaml
  • docs/orchestration/cli.md
  • docs/orchestration/ci.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • projects/fournos_launcher/orchestration/ci.py
  • projects/fournos_launcher/orchestration/cli.py
  • projects/core/ci_entrypoint/run_ci.py
  • projects/core/library/config.py

@kpouget
Contributor Author

kpouget commented Apr 7, 2026

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false

@topsail-bot

topsail-bot bot commented Apr 7, 2026

🔴 Test of 'fournos_launcher submit' failed after 00 hours 00 minutes 02 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false

Failure indicator:

/logs/artifacts/001__submit_and_wait/FAILURE | CalledProcessError: Command 'oc apply -f /logs/artifacts/001__submit_and_wait/src/forge-llm_d-20260407-150732-manifest.yaml' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/app/forge/projects/core/dsl/runtime.py", line 103, in execute_tasks
    raise execution_error
  File "/app/forge/projects/core/dsl/runtime.py", line 67, in execute_tasks
    _execute_single_task(task_info, args)
  File "/app/forge/projects/core/dsl/runtime.py", line 139, in _execute_single_task
    task_status["ret"] = task_func(args)
                         ^^^^^^^^^^^^^^^
  File "/app/forge/projects/core/dsl/task.py", line 72, in wrapper

[...]


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
projects/fournos_launcher/toolbox/submit_and_wait/main.py (3)

186-214: ⚠️ Potential issue | 🟠 Major

Add @always decorator to artifact collector tasks.

If wait_for_job_completion fails, retrieve_job_logs and capture_final_job_status won't run, leaving no logs or final status for debugging. These tasks should run regardless of prior failures to capture diagnostic data.

Proposed fix
+@always
 @task
 def retrieve_job_logs(args):
     """Retrieve and save job logs"""
+@always
 @task
 def capture_final_job_status(args):
     """Capture final job status and details"""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 186 -
214, The artifact-collector tasks retrieve_job_logs and capture_final_job_status
must run even if earlier tasks fail; add the `@always` decorator (from the task
framework used) above both the retrieve_job_logs and capture_final_job_status
function definitions so they execute regardless of prior task failures, leaving
their bodies unchanged and preserving stdout_dest/check args.

146-148: ⚠️ Potential issue | 🟠 Major

The timeout_minutes and poll_interval_seconds parameters are ignored.

The @retry(attempts=120, delay=30, backoff=1.0) decorator uses hardcoded values, resulting in a fixed ~60-minute timeout (120 × 30s) regardless of the timeout_minutes parameter (documented default: 120 minutes).

Suggested direction: Use explicit deadline loop

Replace the retry decorator with an explicit polling loop that respects the configured parameters:

@task
def wait_for_job_completion(args):
    """Wait for FOURNOS job to complete"""
    import time
    
    deadline = time.monotonic() + args.timeout_minutes * 60
    
    while time.monotonic() < deadline:
        # ... existing status check logic ...
        
        if status == "Completed":
            return f"Job {args.final_job_name} completed successfully"
        elif status == "Failed":
            # ... existing failure handling ...
        
        time.sleep(args.poll_interval_seconds)
    
    raise TimeoutError(f"Timed out waiting for job {args.final_job_name} after {args.timeout_minutes} minutes")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 146 -
148, The wait_for_job_completion function currently uses a hardcoded
`@retry(attempts=120, delay=30, backoff=1.0)` decorator so args.timeout_minutes
and args.poll_interval_seconds are ignored; remove the retry decorator on
wait_for_job_completion and replace it with an explicit polling loop that
computes a deadline using time.monotonic() + args.timeout_minutes*60, calls the
existing job status check and existing success/failure handling inside the loop,
sleeps using args.poll_interval_seconds between polls, and when the deadline is
exceeded raise a TimeoutError with a message referencing args.final_job_name and
args.timeout_minutes.

82-86: ⚠️ Potential issue | 🟠 Major

Normalize final_job_name to a Kubernetes-safe DNS-1123 label.

The args.project value (e.g., llm_d) is interpolated verbatim into the job name. Values containing underscores produce names like forge-llm_d-... which violate DNS-1123 subdomain naming (lowercase alphanumeric and hyphens only).

Proposed fix
+import re
+
 @task
 def generate_job_name(args):
     """Generate job name if not provided"""

     if args.job_name:
         args.final_job_name = args.job_name
     else:
         timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
-        args.final_job_name = f"forge-{args.project}-{timestamp}"
+        # Normalize project name to DNS-1123 safe format
+        project_slug = re.sub(r"[^a-z0-9-]+", "-", args.project.lower()).strip("-") or "job"
+        args.final_job_name = f"forge-{project_slug}-{timestamp}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 82 -
86, The generated job name may include characters invalid for DNS-1123 labels
(e.g., underscores from args.project); update the logic that assigns
args.final_job_name (the block using args.job_name, args.project, and
datetime.now().strftime) to normalize the interpolated name: lowercase it,
replace any character not a-z0-9 or hyphen with a hyphen (e.g., convert
underscores to hyphens), collapse consecutive hyphens, strip leading/trailing
hyphens, and truncate to 63 characters (DNS-1123 label max) while ensuring it is
not empty (provide a safe default like "forge"). Implement this normalization
using a simple regex-based sanitizer before assigning args.final_job_name.
🧹 Nitpick comments (7)
projects/fournos_launcher/orchestration/submit.py (1)

34-34: Remove unnecessary f-string prefix.

This f-string has no placeholders. Remove the f prefix.

Proposed fix
-    logger.warning(f"Hello Fournos")
+    logger.warning("Hello Fournos")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/orchestration/submit.py` at line 34, The
logger.warning call in submit.py uses an unnecessary f-string
(logger.warning(f"Hello Fournos")) even though there are no placeholders; change
it to a plain string by removing the `f` prefix (use logger.warning("Hello
Fournos")) to avoid misleading use of f-strings and minor overhead.
projects/fournos_launcher/toolbox/submit_and_wait/main.py (3)

160-160: Remove unnecessary f-string prefix.

This f-string has no placeholders.

Proposed fix
-        print(f"Failed to get job status, retrying...")
+        print("Failed to get job status, retrying...")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` at line 160, The
print call uses an unnecessary f-string prefix: replace print(f"Failed to get
job status, retrying...") with a regular string literal print("Failed to get job
status, retrying...") in the submit/wait logic (the print statement that logs
retrying on failed job status retrieval) to remove the redundant f prefix.

245-248: Remove unnecessary f-string prefixes.

These f-strings on lines 245 and 247 have no placeholders.

Proposed fix
-        print(f"✅ FOURNOS job completed successfully")
+        print("✅ FOURNOS job completed successfully")
     except KeyboardInterrupt:
-        print(f"\n🚫 Operation interrupted by user")
+        print("\n🚫 Operation interrupted by user")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 245 -
248, The two print statements use unnecessary f-string prefixes even though they
contain no placeholders; update the prints in the try/except block (the calls
currently written as print(f"✅ FOURNOS job completed successfully") and
print(f"\n🚫 Operation interrupted by user")) to plain string prints (remove the
leading f) so they become print("✅ FOURNOS job completed successfully") and
print("\n🚫 Operation interrupted by user").

21-22: Use explicit Optional type hints per PEP 484.

The type hints use implicit Optional (a default of None not reflected in the annotation), which PEP 484 prohibits.

Proposed fix
+from typing import Optional
+
 def run(
     cluster_name: str,
     project: str,
-    args: list = None,
-    variables_overrides: dict = None,
+    args: Optional[list] = None,
+    variables_overrides: Optional[dict] = None,
     job_name: str = "",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py` around lines 21 -
22, Replace the implicit Optional hints for the parameters args and
variables_overrides with explicit PEP 484 Optional types: import Optional (and
List/Dict/Any as needed) from typing and change the annotations from "args: list
= None" and "variables_overrides: dict = None" to something like "args:
Optional[List[Any]] = None" and "variables_overrides: Optional[Dict[str, Any]] =
None" in the function signature (the parameters named args and
variables_overrides in main.py), ensuring the typing imports are added at the
top of the file.
projects/core/ci_entrypoint/prepare_ci.py (1)

396-411: Static analysis: Consider using absolute paths for curl.

The static analysis flags curl as a partial executable path (S607). While this is low risk in a CI environment, using the full path or shutil.which('curl') would be more secure.

Also, the subprocess calls don't capture output, so if the curl command fails, the warning message won't include details about why.

Suggested improvement
+import shutil
+
+curl_path = shutil.which('curl')
+if not curl_path:
+    logger.warning("curl not found, skipping PR download")
+else:
     try:
         # Download PR data
         result = subprocess.run(
-            ["curl", "-sSf", pr_url, "-o", str(artifact_path / CI_METADATA_DIRNAME / "pull_request.json")],
-            timeout=30
+            [curl_path, "-sSf", pr_url, "-o", str(artifact_path / CI_METADATA_DIRNAME / "pull_request.json")],
+            timeout=30,
+            capture_output=True,
+            text=True
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/core/ci_entrypoint/prepare_ci.py` around lines 396 - 411, Replace
the bare "curl" invocation in the subprocess.run calls with an absolute command
path resolved via shutil.which('curl') (and fail early with a clear logger.error
if None), and change the subprocess.run invocations that build the PR and
PR-comments files to pass capture_output=True and text=True so you can log
result.stderr/result.stdout when result.returncode != 0; update the two
locations that reference pr_url/pr_comments_url and artifact_path /
CI_METADATA_DIRNAME to use the resolved curl_path variable and include the
captured stderr in the logger.warning/error messages to surface why the download
failed.
projects/core/dsl/task.py (1)

79-83: Duplicate error logging with runtime.py may cause confusing output.

Both task.py (lines 80-81) and runtime.py (lines 146-147) now log similar error details when a task fails. Since the wrapper in task.py re-raises the exception and runtime.py catches it, the same failure will be logged twice with slightly different formats:

  • Here: ==> TASK FAILED: {task_name} - {e.__class__.__name__}: {e}
  • In runtime.py: ==> EXECUTION FAILED for {task_name}: {e.__class__.__name__}: {e}

Consider removing the error logging from one location to avoid duplicate messages in the execution log.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/core/dsl/task.py` around lines 79 - 83, The except block in the task
wrapper is duplicating error logs already emitted by runtime.py; in the except
handler that references task_name and func, remove (or demote to debug) the two
logger.error calls that emit "==> TASK FAILED: ..." and "==> Task description:
...", leaving the raise so runtime.py can perform the single centralized failure
log; locate the except Exception as e block in the task wrapper (the code using
task_name and func.__doc__) and delete or change those logger.error lines
accordingly.
projects/core/dsl/shell.py (1)

106-108: Overriding error.args is unconventional but achieves the goal.

The standard CalledProcessError.args is (returncode, cmd), but overriding it to (error_msg,) changes what str(error) returns. This works for logging purposes, but note that any code inspecting error.args[0] expecting an integer return code will get the string message instead.

Alternative: Subclass or use exception chaining

A cleaner approach would be to raise a new exception with the enhanced message while preserving the original:

-            error = subprocess.CalledProcessError(result.returncode, command, result.stdout, result.stderr)
-            error.args = (error_msg,)
-            raise error
+            error = subprocess.CalledProcessError(result.returncode, command, result.stdout, result.stderr)
+            raise RuntimeError(error_msg) from error

This preserves the original exception while providing the enhanced message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/core/dsl/shell.py` around lines 106 - 108, Don't override
subprocess.CalledProcessError.args; instead preserve the original
CalledProcessError and raise a new exception that carries the enhanced message
while retaining the original as the cause. Specifically, keep the original
CalledProcessError created from result.returncode, command, result.stdout,
result.stderr (the variable named error) and then raise a new exception (e.g.,
RuntimeError or a small custom exception) with error_msg using "raise ... from
error" so callers can still inspect error.returncode/args via the original error
variable; reference subprocess.CalledProcessError, the local names error,
result, command, and error_msg to locate where to change the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/core/ci_entrypoint/prepare_ci.py`:
- Around line 382-383: system_prechecks() writes the FORGE git version into
artifact_path / CI_METADATA_DIRNAME / "forge.git_version" but the
CI_METADATA_DIRNAME directory may not exist yet (parse_and_save_pr_arguments()
creates it later), causing FileNotFoundError; fix by ensuring the directory
exists before writing (e.g., call mkdir(parents=True, exist_ok=True) on
artifact_path / CI_METADATA_DIRNAME) inside system_prechecks() (or in a shared
helper invoked before the write), so the write in system_prechecks() succeeds
regardless of prepare() call order.

In `@projects/core/dsl/shell.py`:
- Around line 98-109: The CalledProcessError is being constructed with string
stdout/stderr while downstream code expects bytes (it calls .decode()), causing
AttributeError; to fix, preserve text=True in subprocess.run but when creating
the CalledProcessError in shell.py (the block that builds error_msg and does
subprocess.CalledProcessError(...)), pass bytes for stdout/stderr (e.g., encode
result.stdout/result.stderr) so CalledProcessError.stderr/stdout are bytes, or
alternatively change the subprocess.run calls to text=False so outputs are bytes
consistently; update the CalledProcessError construction in the error-handling
path in shell.py (the raise of subprocess.CalledProcessError) to match whichever
approach you choose.
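The first option (keep `text=True`, encode at the raise site) could look like this; `raise_called_process_error` is a hypothetical helper name, and `result` mimics a `subprocess.run(..., text=True)` result:

```python
import subprocess

def raise_called_process_error(result, command):
    # result.stdout/result.stderr are str because subprocess.run was
    # called with text=True; encode them to bytes so downstream code
    # that calls .decode() on the exception's streams keeps working.
    stdout = result.stdout.encode() if isinstance(result.stdout, str) else result.stdout
    stderr = result.stderr.encode() if isinstance(result.stderr, str) else result.stderr
    raise subprocess.CalledProcessError(result.returncode, command, stdout, stderr)
```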

---

Duplicate comments:
In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py`:
- Around line 186-214: The artifact-collector tasks retrieve_job_logs and
capture_final_job_status must run even if earlier tasks fail; add the `@always`
decorator (from the task framework used) above both the retrieve_job_logs and
capture_final_job_status function definitions so they execute regardless of
prior task failures, leaving their bodies unchanged and preserving
stdout_dest/check args.
- Around line 146-148: The wait_for_job_completion function currently uses a
hardcoded `@retry(attempts=120, delay=30, backoff=1.0)` decorator so
args.timeout_minutes and args.poll_interval_seconds are ignored; remove the
retry decorator on wait_for_job_completion and replace it with an explicit
polling loop that computes a deadline using time.monotonic() +
args.timeout_minutes*60, calls the existing job status check and existing
success/failure handling inside the loop, sleeps using
args.poll_interval_seconds between polls, and when the deadline is exceeded
raise a TimeoutError with a message referencing args.final_job_name and
args.timeout_minutes.
- Around line 82-86: The generated job name may include characters invalid for
DNS-1123 labels (e.g., underscores from args.project); update the logic that
assigns args.final_job_name (the block using args.job_name, args.project, and
datetime.now().strftime) to normalize the interpolated name: lowercase it,
replace any character not a-z0-9 or hyphen with a hyphen (e.g., convert
underscores to hyphens), collapse consecutive hyphens, strip leading/trailing
hyphens, and truncate to 63 characters (DNS-1123 label max) while ensuring it is
not empty (provide a safe default like "forge"). Implement this normalization
using a simple regex-based sanitizer before assigning args.final_job_name.
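The full normalization described there (lowercase, replace invalid characters, collapse hyphen runs, strip edges, truncate to 63, fall back to a default) can be sketched as a single helper; `dns1123_label` is a hypothetical name:

```python
import re

def dns1123_label(name: str, default: str = "forge") -> str:
    # Lowercase and replace any run of characters outside [a-z0-9-]
    # (underscores included) with a single hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", name.lower())
    # Collapse hyphen runs, strip edge hyphens, enforce the 63-char
    # DNS-1123 label limit, and never return an empty string.
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    return (slug or default)[:63].rstrip("-")
```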

---

Nitpick comments:
In `@projects/core/ci_entrypoint/prepare_ci.py`:
- Around line 396-411: Replace the bare "curl" invocation in the subprocess.run
calls with an absolute command path resolved via shutil.which('curl') (and fail
early with a clear logger.error if None), and change the subprocess.run
invocations that build the PR and PR-comments files to pass capture_output=True
and text=True so you can log result.stderr/result.stdout when result.returncode
!= 0; update the two locations that reference pr_url/pr_comments_url and
artifact_path / CI_METADATA_DIRNAME to use the resolved curl_path variable and
include the captured stderr in the logger.warning/error messages to surface why
the download failed.

In `@projects/core/dsl/shell.py`:
- Around line 106-108: Don't override subprocess.CalledProcessError.args;
instead preserve the original CalledProcessError and raise a new exception that
carries the enhanced message while retaining the original as the cause.
Specifically, keep the original CalledProcessError created from
result.returncode, command, result.stdout, result.stderr (the variable named
error) and then raise a new exception (e.g., RuntimeError or a small custom
exception) with error_msg using "raise ... from error" so callers can still
inspect error.returncode/args via the original error variable; reference
subprocess.CalledProcessError, the local names error, result, command, and
error_msg to locate where to change the code.

In `@projects/core/dsl/task.py`:
- Around line 79-83: The except block in the task wrapper is duplicating error
logs already emitted by runtime.py; in the except handler that references
task_name and func, remove (or demote to debug) the two logger.error calls that
emit "==> TASK FAILED: ..." and "==> Task description: ...", leaving the raise
so runtime.py can perform the single centralized failure log; locate the except
Exception as e block in the task wrapper (the code using task_name and
func.__doc__) and delete or change those logger.error lines accordingly.

In `@projects/fournos_launcher/orchestration/submit.py`:
- Line 34: The logger.warning call in submit.py uses an unnecessary f-string
(logger.warning(f"Hello Fournos")) even though there are no placeholders; change
it to a plain string by removing the `f` prefix (use logger.warning("Hello
Fournos")) to avoid misleading use of f-strings and minor overhead.

In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py`:
- Line 160: The print call uses an unnecessary f-string prefix: replace
print(f"Failed to get job status, retrying...") with a regular string literal
print("Failed to get job status, retrying...") in the submit/wait logic (the
print statement that logs retrying on failed job status retrieval) to remove the
redundant f prefix.
- Around line 245-248: The two print statements use unnecessary f-string
prefixes even though they contain no placeholders; update the prints in the
try/except block (the calls currently written as print(f"✅ FOURNOS job completed
successfully") and print(f"\n🚫 Operation interrupted by user")) to plain string
prints (remove the leading f) so they become print("✅ FOURNOS job completed
successfully") and print("\n🚫 Operation interrupted by user").
- Around line 21-22: Replace the implicit Optional hints for the parameters args
and variables_overrides with explicit PEP 484 Optional types: import Optional
(and List/Dict/Any as needed) from typing and change the annotations from "args:
list = None" and "variables_overrides: dict = None" to something like "args:
Optional[List[Any]] = None" and "variables_overrides: Optional[Dict[str, Any]] =
None" in the function signature (the parameters named args and
variables_overrides in main.py), ensuring the typing imports are added at the
top of the file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e282bfdd-6a78-4882-b4e4-75f099cd40ea

📥 Commits

Reviewing files that changed from the base of the PR and between 9a63947 and 872074c.

📒 Files selected for processing (15)
  • docs/ci/run_ci.md
  • docs/orchestration/ci.md
  • docs/orchestration/cli.md
  • projects/core/ci_entrypoint/prepare_ci.py
  • projects/core/ci_entrypoint/run_ci.py
  • projects/core/dsl/runtime.py
  • projects/core/dsl/shell.py
  • projects/core/dsl/task.py
  • projects/core/library/config.py
  • projects/core/notifications/send.py
  • projects/fournos_launcher/orchestration/ci.py
  • projects/fournos_launcher/orchestration/cli.py
  • projects/fournos_launcher/orchestration/config.yaml
  • projects/fournos_launcher/orchestration/submit.py
  • projects/fournos_launcher/toolbox/submit_and_wait/main.py
✅ Files skipped from review due to trivial changes (5)
  • projects/fournos_launcher/orchestration/config.yaml
  • projects/core/notifications/send.py
  • docs/orchestration/cli.md
  • docs/orchestration/ci.md
  • docs/ci/run_ci.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • projects/core/ci_entrypoint/run_ci.py
  • projects/fournos_launcher/orchestration/cli.py
  • projects/core/library/config.py
  • projects/fournos_launcher/orchestration/ci.py

Comment thread projects/core/ci_entrypoint/prepare_ci.py Outdated
Comment thread projects/core/dsl/shell.py Outdated
@kpouget
Contributor Author

kpouget commented Apr 7, 2026

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false

@topsail-bot

topsail-bot bot commented Apr 7, 2026

🔴 Test of 'fournos_launcher submit' failed after 00 hours 00 minutes 02 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos llm_d hello world
/cluster psap
/var something.is.wrong: false

Failure indicator:

/logs/artifacts/001__submit_and_wait/FAILURE | TaskExecutionError: Task 'submit_fournos_job' failed: CalledProcessError: Command 'oc apply -f /logs/artifacts/001__submit_and_wait/src/forge-llm_d-20260407-154640-manifest.yaml' returned non-zero exit status 1.
Task description: Submit the FOURNOS job
Traceback (most recent call last):
  File "/app/forge/projects/core/dsl/runtime.py", line 87, in execute_tasks
    _execute_single_task(task_info, args)
  File "/app/forge/projects/core/dsl/runtime.py", line 179, in _execute_single_task
    raise task_error from None
projects.core.dsl.runtime.TaskExecutionError: Task 'submit_fournos_job' failed: CalledProcessError: Command 'oc apply -f /logs/artifacts/001__submit_and_wait/src/forge-llm_d-20260407-154640-manifest.yaml' returned non-zero exit status 1.
Task description: Submit the FOURNOS job


@openshift-ci

openshift-ci bot commented Apr 7, 2026

@kpouget: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/fournos | 241e31c | link | true | /test fournos |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
projects/core/dsl/runtime.py (1)

26-26: Use explicit None union for optional parameter.

Per PEP 484, parameters with None default should use explicit union type annotation.

♻️ Proposed fix
-    def __init__(self, task_name: str, task_description: str, original_exception: Exception, task_args: dict = None):
+    def __init__(self, task_name: str, task_description: str, original_exception: Exception, task_args: dict | None = None):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/core/dsl/runtime.py` at line 26, The __init__ signature currently
sets task_args: dict = None which violates PEP 484 for optional defaults; update
the parameter annotation in the __init__ method to use an explicit None union
(e.g., task_args: Optional[dict] = None or task_args: dict | None = None) and
import typing.Optional if using Optional so the type clearly indicates the
parameter may be None; keep the parameter name and default value unchanged and
only change the type annotation in the __init__ definition.
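
A minimal sketch of the annotated signature, assuming the attribute assignments shown here (the signature is taken from the diff above; everything else is illustrative) — only the `Optional` annotation is the point:

```python
from typing import Optional


class TaskExecutionError(Exception):
    """Sketch of the exception with a PEP 484-compliant optional parameter."""

    def __init__(self, task_name: str, task_description: str,
                 original_exception: Exception,
                 task_args: Optional[dict] = None):  # explicit None union
        super().__init__(f"Task '{task_name}' failed: {original_exception}")
        self.task_name = task_name
        self.task_description = task_description
        self.original_exception = original_exception
        self.task_args = task_args or {}  # normalize the None default
```

On Python 3.10+, `task_args: dict | None = None` expresses the same union without the import.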
projects/core/library/ci.py (2)

12-15: Remove unused inspect import.

The inspect module is imported but not used anywhere in this file.

♻️ Proposed fix
 import traceback
 import logging
-import inspect
 import os
 import yaml
 from pathlib import Path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/core/library/ci.py` around lines 12 - 15, Remove the unused import
"inspect" from the top of the file: delete the "import inspect" statement so
only used modules (os, yaml, Path) remain; verify there are no other references
to the inspect symbol in this module and run linters/formatters to confirm no
unused-import warnings remain.

123-125: Move subprocess import to module level.

Imports inside functions add minor overhead and deviate from Python conventions. Move this to the top of the file with other imports.

♻️ Proposed fix

At top of file (around line 10):

import subprocess

Then remove line 124:

     # Only include original_error for exceptions we haven't structurally parsed
-    import subprocess
     if not isinstance(actual_exception, subprocess.CalledProcessError):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/core/library/ci.py` around lines 123 - 125, The subprocess import is
inside a function near the check "if not isinstance(actual_exception,
subprocess.CalledProcessError)"; move "import subprocess" up to the module-level
imports with the other imports, remove the local import statement inside the
function, and run a quick grep for any other local "subprocess" imports to avoid
duplicates or name conflicts so the isinstance check against
subprocess.CalledProcessError continues to work.
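
As a hedged illustration of why the module-level import matters for the `isinstance` check, the helper below is a simplified stand-in for the error-summary logic, not the actual `ci.py` function:

```python
import subprocess  # module-level, so the isinstance check below always works


def summarize_exception(exc: Exception) -> dict:
    """Summarize an exception; add the raw message only when it is not a
    CalledProcessError that has already been structurally parsed."""
    summary = {"type": type(exc).__name__}
    if isinstance(exc, subprocess.CalledProcessError):
        summary["command"] = exc.cmd
        summary["returncode"] = exc.returncode
    else:
        summary["original_error"] = str(exc)
    return summary
```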
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/core/dsl/runtime.py`:
- Around line 96-98: The except TaskExecutionError block currently re-raises
immediately and prevents the always-tasks block from running; change it to
capture and store the TaskExecutionError into the same exception variable used
for other errors (the one set in the other except path) instead of raising, let
the existing always-tasks block execute, and after that block re-raise the
stored TaskExecutionError so cleanup always runs; reference the
TaskExecutionError except handler and the always-tasks block in runtime.py when
applying this change.

In `@projects/core/library/ci.py`:
- Around line 200-206: The error summary written by _write_error_summary_to_file
is being clobbered later by postchecks, so change the write target to a
non-conflicting file (e.g., write the detailed per-operation summary to
"FAILURES_DETAILED" instead of "FAILURES") to preserve the structured summary;
update the function _write_error_summary_to_file to use the new filename and
ensure any consumers that need the consolidated view (postchecks) either
read/merge FAILURES_DETAILED or continue writing its own aggregated FAILURES
without overwriting the detailed file (alternatively, if you prefer to keep the
same name, modify postchecks to open FAILURES in append mode or merge existing
contents before writing).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 09d17a6d-f7a3-4ff2-a58e-bcedb16788de

📥 Commits

Reviewing files that changed from the base of the PR and between 872074c and 241e31c.

📒 Files selected for processing (4)
  • projects/core/dsl/runtime.py
  • projects/core/dsl/shell.py
  • projects/core/dsl/task.py
  • projects/core/library/ci.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • projects/core/dsl/task.py
  • projects/core/dsl/shell.py

Comment thread projects/core/dsl/runtime.py Outdated
Comment on lines +96 to +98
except TaskExecutionError:
# Re-raise the TaskExecutionError to preserve context
raise

⚠️ Potential issue | 🔴 Critical

Critical: TaskExecutionError bypasses always-tasks execution.

Re-raising TaskExecutionError at line 98 exits the function before the always-tasks block (lines 113-122) executes. This breaks the guaranteed cleanup pattern that other exceptions follow (lines 108-109 store the error so the always-tasks still run).

🐛 Proposed fix to preserve always-tasks execution
-        except TaskExecutionError:
-            # Re-raise the TaskExecutionError to preserve context
-            raise
+        except TaskExecutionError as e:
+            # Store the TaskExecutionError to preserve context after always tasks
+            execution_error = e
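
The store-then-reraise pattern the review asks for can be sketched as follows (a simplified stand-in for `execute_tasks`, not the actual runtime code):

```python
def execute_tasks(tasks, always_tasks):
    """Run tasks in order; guarantee always-tasks run even on failure."""
    execution_error = None
    for task in tasks:
        try:
            task()
        except Exception as e:  # stands in for TaskExecutionError
            execution_error = e  # store instead of raising immediately
            break
    for cleanup in always_tasks:
        cleanup()  # cleanup runs whether or not a task failed
    if execution_error is not None:
        raise execution_error  # re-raise after cleanup, context preserved
```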

Comment thread projects/core/library/ci.py Outdated
@kpouget
Contributor Author

kpouget commented Apr 7, 2026

merging the docs update

@kpouget kpouget merged commit 210aa00 into openshift-psap:main Apr 7, 2026
2 checks passed