Replies: 10 comments 6 replies
-
Have you looked into the GPU/block devices?
-
Simple CUDA code that detects GPUs and prints some information (it may behave differently from nvidia-smi, so it is useful to check whether a GPU is actually accessible for computing and not only visible to nvidia-smi).
(compile with …)
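The code itself didn't make it into this thread; a minimal sketch of such a detection program (my own wording, using only the standard CUDA runtime API, compiled with something like `nvcc gpu_detect.cu -o gpu_detect`) could look like this:

```cuda
// gpu_detect.cu -- minimal sketch: list the CUDA devices the runtime can actually use.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // A GPU hidden by the cgroup device controller typically fails here,
        // even if nvidia-smi run outside the cgroup still lists it.
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("%d CUDA device(s) usable\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            std::printf("  device %d: %s, %zu MiB, compute capability %d.%d\n",
                        i, prop.name, prop.totalGlobalMem >> 20, prop.major, prop.minor);
        }
    }
    return 0;
}
```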
-
POC for Nvidia GPU access control with systemd:
- Fake a job
- Create a transient slice
- Set up the properties of the slice without allocating the /dev/nvidia0 device
- Launch a shell inside the slice and test
- Now allow the /dev/nvidia0 device and test again
- Kill the job
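Not the exact commands of the POC (they are not quoted here), but a rough sketch of that sequence with systemd-run/systemctl; the slice and unit names are made up, and a real CUDA job usually also needs /dev/nvidiactl and /dev/nvidia-uvm:

```sh
# 1. Prepare the slice: closed device policy, /dev/nvidia0 not allowed yet
systemctl set-property --runtime oar-poc.slice DevicePolicy=closed DeviceAllow="/dev/null rwm"

# 2. Fake a job: launch a shell inside the slice and test;
#    nvidia-smi / the CUDA detector should not see the GPU from here
systemd-run --slice=oar-poc.slice --unit=oar-poc-job --scope bash

# 3. From another terminal, allow the /dev/nvidia0 device and test again from the shell
systemctl set-property --runtime oar-poc.slice DeviceAllow="/dev/nvidia0 rw"

# 4. Kill the job (stops the scope and everything running in it)
systemctl stop oar-poc-job.scope
```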
-
Working solution: https://github.com/oar-team/oar3/tree/systemd , resulting in 2 files, compatible with both OAR2 and OAR3:
currently in production on my "bigfoot" cluster (OAR 2.5.9), with 14 regular nodes and 1 experimental systemd node
-
Some refs:
-
Nvidia just gave me this link as a reference to try to fix the original issue that pushed me to use systemd and cgroup v2 (the original problem was that a GH200 host froze each time an nvidia-smi command was run from inside a cgroup v1 cpuset).
-
I missed the … Also, I need to see how to make the AllowedMemoryNodes property work nicely with the D-Bus API syntax...
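For what it's worth, my understanding of the D-Bus encoding is that AllowedCPUs and AllowedMemoryNodes are both byte-array bitmasks (type `ay`, bit n%8 of byte n/8 set for CPU/node n), so allowing only NUMA node 1 would be a single byte 0x02. With busctl (array syntax: element count, then the elements; slice name reused from the next comment) that should look roughly like:

```sh
busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
    org.freedesktop.systemd1.Manager SetUnitProperties 'sba(sv)' \
    oar-f_n-jobid.slice 1 1 AllowedMemoryNodes ay 1 2
```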
-
```console
root@dahu-21:~# busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager SetUnitProperties 'sba(sv)' oar-f_n-jobid.slice 1 2 AllowedCPUs $(/home/pneyron/scm/hwloc/utils/hwloc/hwloc-calc --cpuset-output-format systemd-dbus-api node:1) AllowedMemoryNodes $(/home/pneyron/scm/hwloc/utils/hwloc/hwloc-calc --nodeset-output-format systemd-dbus-api node:1)
root@dahu-21:~# cat /sys/fs/cgroup/oar.slice/oar-f_n.slice/oar-f_n-jobid.slice/cpuset.cpus.effective
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
root@dahu-21:~# cat /sys/fs/cgroup/oar.slice/oar-f_n.slice/oar-f_n-jobid.slice/cpuset.mems.effective
1
```
😎
-
In the job manager, init phase (preparing the slice):
We get this:
In oarsh_shell, launching the shell via systemd:
The shell and its processes do show up in a scope of the slice:
And we are indeed running on the specified CPUs:
And in the job_manager, at clean-up, we can do:
Done:
By delegating everything to systemd, it feels like we would end up with something really clean, and we no longer even need to care whether we are on cgroup v1 or v2!
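Not the actual code from the systemd branch (it isn't quoted in this thread), but a rough sketch of what the sequence above could look like, reusing the slice name from the busctl example and assuming a systemd recent enough to expose AllowedCPUs/AllowedMemoryNodes:

```sh
# Job manager, init phase: prepare the job slice (via the D-Bus call shown earlier,
# or equivalently with systemctl set-property)
systemctl set-property --runtime oar-f_n-jobid.slice AllowedCPUs=1,3,5,7 AllowedMemoryNodes=1

# oarsh_shell: launch the user's shell (as the job user) in a scope attached to the slice
systemd-run --slice=oar-f_n-jobid.slice --scope bash -l

# The shell and its processes appear in a scope under the job slice...
systemctl status oar-f_n-jobid.slice
# ...and run on the requested CPUs
cat /sys/fs/cgroup/oar.slice/oar-f_n.slice/oar-f_n-jobid.slice/cpuset.cpus.effective

# Job manager, clean phase: stopping the slice tears down every scope attached to it
systemctl stop oar-f_n-jobid.slice
```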