[skeleton] orchestration: Implement a stub for resolve_fournos_config#45
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Fournos config resolver CLI and runtime that fetches a FournosJob from the cluster, injects vault secret refs, optionally resolves hardware, writes a resolved YAML artifact, and optionally reapplies the object; plus related launcher, template, vault validation, and CI wiring changes. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as "resolve-fournos-config CLI"
participant VaultSvc as "Vault list provider"
participant Hardware as "Hardware resolver (optional)"
participant OC as "oc (cluster)"
participant FS as "Filesystem (ARTIFACT_DIR)"
CLI->>VaultSvc: vault_list_func() -> list[vaults]
CLI->>OC: oc get fjob/<name> -o yaml
OC-->>CLI: fjob YAML -> dict
CLI->>CLI: inject spec.secretRefs from vaults
CLI->>Hardware: hardware_resolver_func(spec.hardware)?
alt hardware resolved
Hardware-->>CLI: resolved hardware dict
CLI->>CLI: set spec.hardware or null if empty
end
CLI->>FS: write fjob.resolved.yaml in ARTIFACT_DIR
alt dry_run == false
CLI->>OC: oc apply -f fjob.resolved.yaml
OC-->>CLI: apply result
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
42858d1 to
b5ae596
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
projects/legacy/library/run.py (1)
242-242: Preserve exception context instead of suppressing it.At line 242,
from Nonesuppresses the exception context from the caughtException as e. While the traceback is already printed, preserving the exception chain in the raisedSystemExitensures proper debugging if the process termination sequence does not complete immediately.Proposed fix
- raise SystemExit(1) from None + raise SystemExit(1) from e🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@projects/legacy/library/run.py` at line 242, The code currently suppresses the original exception context by using "raise SystemExit(1) from None" inside the except block that captures "Exception as e"; replace that with a plain "raise SystemExit(1)" (or "raise SystemExit(1) from e" if you prefer explicit chaining) so the original exception context is preserved for debugging while still exiting the process; update the raise near the caught "Exception as e" and keep existing traceback printing intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@projects/core/ci_entrypoint/fournos_resolve.py`:
- Around line 143-147: The code ignores the resolver's return value; change the
block using hardware_resolver_func so you capture updated_hardware and assign
fjob_obj["spec"]["hardware"] to the resolver's return when provided (falling
back to the original hardware_spec if the resolver returns None), i.e. call
hardware_resolver_func(hardware_spec), then set resolved = updated_hardware if
updated_hardware is not None else hardware_spec and assign that resolved value
back into fjob_obj["spec"]["hardware"]; keep the logger.info("Applied hardware
resolution configuration") as-is.
In `@projects/fournos_launcher/toolbox/submit_and_wait/main.py`:
- Around line 40-42: The new public run() kwargs (exclusive, gpu_count,
gpu_type) must be validated in validate_inputs(): ensure gpu_count and gpu_type
are either both absent/None or both present and valid (gpu_count is an integer >
0 and gpu_type is a non-empty string), reject falsey/partial combos (e.g. 0 or
"" or only one provided) by raising a clear ValueError so callers fail fast;
update validate_inputs() to perform these checks and mention
spec.hardware/job.yaml.j2 in the error message so it's obvious why both fields
are required.
In `@projects/skeleton/orchestration/test_skeleton.py`:
- Around line 36-45: The init signature in test_skeleton changed to
init(skip_vault_init=False) but orchestration.cli still calls
test_skeleton.init(strict_vault_validation=False), causing a TypeError; either
update the caller to use the new parameter (call
test_skeleton.init(skip_vault_init=not strict_vault_validation)) or add a small
compatibility shim in the test_skeleton module — e.g., change def
init(skip_vault_init=False, strict_vault_validation=None): and if
strict_vault_validation is not None set skip_vault_init = not
strict_vault_validation — then proceed with existing logic and preserve
behavior.
---
Nitpick comments:
In `@projects/legacy/library/run.py`:
- Line 242: The code currently suppresses the original exception context by
using "raise SystemExit(1) from None" inside the except block that captures
"Exception as e"; replace that with a plain "raise SystemExit(1)" (or "raise
SystemExit(1) from e" if you prefer explicit chaining) so the original exception
context is preserved for debugging while still exiting the process; update the
raise near the caught "Exception as e" and keep existing traceback printing
intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ad40e30f-b068-4683-bbda-dd8a2b79b482
📒 Files selected for processing (9)
projects/core/ci_entrypoint/fournos_resolve.pyprojects/fournos_launcher/orchestration/config.yamlprojects/fournos_launcher/orchestration/submit.pyprojects/fournos_launcher/toolbox/submit_and_wait/main.pyprojects/fournos_launcher/toolbox/submit_and_wait/templates/job.yaml.j2projects/legacy/library/run.pyprojects/skeleton/orchestration/ci.pyprojects/skeleton/orchestration/test_skeleton.pypyproject.toml
d5d3898 to
d8e6236
Compare
|
/test fournos skeleton |
|
🔴 Test of 'fournos_launcher submit' failed after 00 hours 05 minutes 06 seconds 🔴 • Link to the test results. • No reports index generated... Test configuration: |
|
/test fournos skeleton |
|
🔴 Test of 'fournos_launcher submit' failed after 00 hours 00 minutes 22 seconds 🔴 • Link to the test results. • No reports index generated... Test configuration: |
|
/test fournos skeleton |
|
🔴 Test of 'fournos_launcher submit' failed after 00 hours 01 minutes 16 seconds 🔴 • Link to the test results. • No reports index generated... Test configuration: |
|
/test fournos skeleton |
947dcd8 to
7c5e551
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
projects/core/ci_entrypoint/fournos_resolve.py (1)
143-148:⚠️ Potential issue | 🟠 MajorHonor
hardware_resolver_func()'s return value.Line 147 discards the callback result and always keeps
spec.hardwarebound to the original mapping. That only works for in-place mutation; any resolver that returns a replacement dict will silently lose its changes.Proposed fix
- fjob_obj["spec"]["hardware"] = hardware_spec = fjob_obj["spec"].get("hardware", {}) - hardware_resolver_func(hardware_spec) + hardware_spec = fjob_obj["spec"].get("hardware", {}) + updated_hardware = hardware_resolver_func(hardware_spec) + fjob_obj["spec"]["hardware"] = ( + updated_hardware if updated_hardware is not None else hardware_spec + ) logger.info("Applied hardware resolution configuration")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@projects/core/ci_entrypoint/fournos_resolve.py` around lines 143 - 148, The code currently calls hardware_resolver_func(hardware_spec) but ignores its return value; change the call in the hardware resolution block so you capture the resolver's return (e.g., result = hardware_resolver_func(hardware_spec)) and if result is not None (or is a dict), assign fjob_obj["spec"]["hardware"] = result, otherwise keep the original hardware_spec (to support both in-place mutation and replacement returns); ensure you still initialize hardware_spec via fjob_obj["spec"].get("hardware", {}) and keep the logger.info("Applied hardware resolution configuration") after the assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/orchestration/vaults.md`:
- Around line 132-149: The snippet calls get_vault_manager() before the vault
system is initialized, which causes an exception; initialize the vault first by
calling vault.init(...) with the project/config parameters required by your
codebase before calling get_vault_manager(), then you can optionally call
vault.disable_strict_validation() and proceed to use
vault_manager.validate_vault, vault_manager.validate_project_vaults, and
vault_manager.validate_all_vaults; ensure you call vault.init (the initializer)
prior to get_vault_manager to avoid the raise.
In `@projects/core/ci_entrypoint/run_ci.py`:
- Around line 438-445: The command line is wrongly built by inserting
`operation` before `args` and unconditionally indexing `args[0]`; in `run_ci.py`
adjust the `cmd` construction to use the already-selected script (`ci_script`)
plus the child argv built from `args` (not by prepending `operation`), and only
apply the "resolve_fournos_config" -> "resolve-fournos-config" alias when `args`
is non-empty (guard access to `args[0]`); ensure `cmd` becomes `[sys.executable,
str(ci_script)] + list(args)` (with the alias replacement performed on `args`
when present) so Click entrypoints receive the intended argv and zero-arg
invocations don't crash.
In `@projects/core/library/vault.py`:
- Around line 151-153: The current global _strict_validation_enabled is
irreversible and causes validate_vault() (and the other validate_* uses around
lines 377-412) to silently downgrade strict=True calls; change the design to
make strict state instance-scoped: add a boolean attribute (e.g.,
self._strict_validation_enabled) to VaultManager initialized True, update
disable_strict_validation() to be an instance method (and provide an enable or
reversible API/context-manager) that toggles that attribute, and modify
validate_vault() and the other validation functions to compute effective_strict
by AND-ing the local strict parameter with the VaultManager instance attribute
(not the module-global) so one caller cannot disable strictness for the whole
process.
In `@projects/fournos_launcher/orchestration/submit.py`:
- Around line 106-116: The GPU validation currently only checks for None and
allows falsy values like 0 or empty string; update the logic around gpu_count
and gpu_type to enforce types and non-empty values: ensure gpu_count is either
None or an int > 0 and gpu_type is either None or a non-empty str, and require
that both are provided together or both are None/omitted. Replace the current
presence check (gpu_config_present) with explicit checks using
isinstance(gpu_count, int) and gpu_count > 0, and isinstance(gpu_type, str) and
gpu_type.strip() != "", then raise a ValueError if one is valid and the other is
not (include the offending values in the error message). Ensure the validated
variables still match the expected function signature (gpu_count: int, gpu_type:
str) for downstream template usage.
---
Duplicate comments:
In `@projects/core/ci_entrypoint/fournos_resolve.py`:
- Around line 143-148: The code currently calls
hardware_resolver_func(hardware_spec) but ignores its return value; change the
call in the hardware resolution block so you capture the resolver's return
(e.g., result = hardware_resolver_func(hardware_spec)) and if result is not None
(or is a dict), assign fjob_obj["spec"]["hardware"] = result, otherwise keep the
original hardware_spec (to support both in-place mutation and replacement
returns); ensure you still initialize hardware_spec via
fjob_obj["spec"].get("hardware", {}) and keep the logger.info("Applied hardware
resolution configuration") after the assignment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2ba4ce29-244f-4e94-b0b4-f3b0cb372e27
📒 Files selected for processing (13)
docs/orchestration/vaults.mdprojects/core/ci_entrypoint/fournos_resolve.pyprojects/core/ci_entrypoint/run_ci.pyprojects/core/library/vault.pyprojects/fournos_launcher/orchestration/config.yamlprojects/fournos_launcher/orchestration/submit.pyprojects/fournos_launcher/toolbox/submit_and_wait/main.pyprojects/fournos_launcher/toolbox/submit_and_wait/templates/job.yaml.j2projects/legacy/library/run.pyprojects/llm_d_legacy/orchestration/ci.pyprojects/skeleton/orchestration/ci.pyprojects/skeleton/orchestration/test_skeleton.pypyproject.toml
✅ Files skipped from review due to trivial changes (3)
- pyproject.toml
- projects/fournos_launcher/orchestration/config.yaml
- projects/fournos_launcher/toolbox/submit_and_wait/main.py
🚧 Files skipped from review as they are similar to previous changes (4)
- projects/legacy/library/run.py
- projects/fournos_launcher/toolbox/submit_and_wait/templates/job.yaml.j2
- projects/skeleton/orchestration/ci.py
- projects/skeleton/orchestration/test_skeleton.py
36231f0 to
898f8c9
Compare
|
/test fournos skeleton |
|
/test fournos skeleton |
|
🔴 Test of 'fournos_launcher submit' failed after 00 hours 00 minutes 20 seconds 🔴 • Link to the test results. • No reports index generated... Test configuration: |
|
🔴 Test of 'fournos_launcher submit' failed after 00 hours 00 minutes 21 seconds 🔴 • Link to the test results. • No reports index generated... Test configuration: |
…d_save_pr_arguments
|
🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 38 seconds 🟢 • Link to the test results. • No reports index generated... • No test configuration ( |
|
/test fournos skeleton |
|
🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 38 seconds 🟢 • Link to the test results. • No reports index generated... • No test configuration ( |
|
🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 28 seconds 🟢 • Link to the test results. • No reports index generated... Test configuration: |
|
/test fournos skeleton |
|
🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 38 seconds 🟢 • Link to the test results. • No reports index generated... • No test configuration ( |
|
🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 26 seconds 🟢 • Link to the test results. • No reports index generated... Test configuration: |
f0b92d7 to
afb77e9
Compare
|
/test fournos skeleton |
|
🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 38 seconds 🟢 • Link to the test results. • No reports index generated... • No test configuration ( |
|
🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 28 seconds 🟢 • Link to the test results. • No reports index generated... Test configuration: |
|
/test fournos skeleton |
|
🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 38 seconds 🟢 • Link to the test results. • No reports index generated... • No test configuration ( |
|
🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 28 seconds 🟢 • Link to the test results. • No reports index generated... Test configuration: |
|
looks good, merging |
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores