Skip to content

Preinstall Soperator utility scripts to jail#1070

Merged
rdjjke merged 2 commits intodevfrom
jail-soperator-utils/0
Jun 26, 2025
Merged

Preinstall Soperator utility scripts to jail#1070
rdjjke merged 2 commits intodevfrom
jail-soperator-utils/0

Conversation

@rdjjke
Copy link
Collaborator

@rdjjke rdjjke commented Jun 26, 2025

Here’s a summary of the changes made in the PR:

1. Default Values for Slurm Configuration

  • The default value for the field taskPluginParam in SlurmConfig is changed from an empty string ("") to Autobind=Cores in:
    • Go struct tags (api/v1/slurmcluster_types.go)
    • The generated CRD YAMLs (config/crd/bases/slurm.nebius.ai_slurmclusters.yaml, helm/soperator-crds/templates/slurmcluster-crd.yaml, helm/soperator/crds/slurmcluster-crd.yaml)

2. Slurm Scheduler Configuration

  • Added CR_ONE_TASK_PER_CORE to the SelectTypeParameters in the Slurm config generation (internal/render/common/configmap.go) to improve core allocation granularity.

3. Jail Docker Image Enhancements

  • Two utility scripts are added to /opt/soperator_utils in the jail container:
    • soperator_instance_login.sh: Script to SSH/chroot into the systemd namespace of a worker node.
    • slurm_task_info.sh: Prints Slurm task resource bindings (node, rank, CPU, GPU, CUDA devices).

4. User Creation Script Update

  • createuser.sh now supports a --without-internal-ssh flag to skip generating an internal SSH key pair.

5. New Utility Scripts

  • images/jail/scripts/slurm_task_info.sh: Prints details about Slurm task resource bindings.
  • images/jail/scripts/soperator_instance_login.sh: Allows logging into the instance associated with a specific worker node or instance ID.

Summary:
The PR sets Autobind=Cores as the default for Slurm’s taskPluginParam, tweaks Slurm core scheduling, adds two admin/diagnostic scripts to the jail image, and enhances user creation with an option to skip internal SSH key generation.

@rdjjke rdjjke force-pushed the jail-soperator-utils/0 branch from 4286455 to 4d75a0a Compare June 26, 2025 10:20
@theyoprst theyoprst requested a review from Copilot June 26, 2025 10:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR preinstalls Soperator utility scripts into the jail container, enhances user creation with optional internal SSH key support, and updates the Slurm CRD defaults for taskPluginParam.

  • Adds two helper scripts (soperator_instance_login.sh, slurm_task_info.sh) and integrates them into the jail image
  • Extends createuser.sh with a --without-internal-ssh flag to control SSH key generation
  • Updates Helm charts, CRD YAMLs, and Go API types to default taskPluginParam to "Autobind=Cores"

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
internal/render/common/configmap.go Included CR_ONE_TASK_PER_CORE in SelectTypeParameters
images/jail/scripts/soperator_instance_login.sh New script for interactive instance login
images/jail/scripts/slurm_task_info.sh New script to display Slurm task resource bindings
images/jail/scripts/createuser.sh Added --without-internal-ssh option and conditional key gen
images/jail/jail.dockerfile Preinstalled Soperator scripts under /opt/soperator_utils
helm/soperator/crds/slurmcluster-crd.yaml Set default taskPluginParam: Autobind=Cores
helm/soperator-crds/templates/slurmcluster-crd.yaml Mirror default update for Helm chart
config/crd/bases/slurm.nebius.ai_slurmclusters.yaml Mirror default update for CRD base
api/v1/slurmcluster_types.go Updated Kubebuilder default annotation for TaskPluginParam
Comments suppressed due to low confidence (1)

images/jail/scripts/slurm_task_info.sh:9

  • Nested double quotes inside the command substitution will break the shell syntax. Use single quotes around the grep and sed patterns (e.g., grep 'Cpus_allowed_list:' | sed 's/Cpus_allowed_list:\t//') or escape the inner quotes.
    "$(cat /proc/self/status | grep "Cpus_allowed_list:" | sed "s/Cpus_allowed_list:\t//g")" \

@Uburro Uburro added the feature label Jun 26, 2025
@rdjjke rdjjke merged commit 20a7e01 into dev Jun 26, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants