NVIDIA GPUs: add documentation on how to use (for users), and pointers to ebuilds (for devs) #1034

Closed
t-lo opened this issue May 22, 2023 · 9 comments

@t-lo
Member

t-lo commented May 22, 2023

No description provided.

@pothos
Member

pothos commented May 22, 2023

Sayan has a draft ready; it will be filed as a PR soon.

One possible improvement I noticed is that we could switch the nvidia.service unit to be Type=oneshot and RemainAfterExit=true and set Before=containerd.service to prevent race conditions with containers that want to use the GPU.

@sayanchowdhury
Member

@shsamkit

Is there any work going on related to changing the nvidia.service unit to the oneshot type? It would really help with sequencing the service containers.
At the moment, waitFor/required doesn't really work with nvidia.service because it doesn't wait for the service to actually complete.
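
For reference, a minimal sketch of the kind of ordering being discussed, written as a Butane config. The unit name `gpu-workload.service` and its command are hypothetical placeholders, not part of Flatcar. With `Type=simple`, systemd treats a unit as started as soon as its main process has been launched, so `After=nvidia.service` does not wait for the driver setup to finish. With `Type=oneshot`, dependents start only after that process has exited.

```yaml
variant: flatcar
version: 1.0.0
systemd:
  units:
    # Hypothetical dependent unit, named here for illustration only.
    - name: gpu-workload.service
      enabled: true
      contents: |
        [Unit]
        Description=Hypothetical workload that needs the NVIDIA driver
        # Requires= pulls nvidia.service in; After= orders this unit behind it.
        Requires=nvidia.service
        After=nvidia.service

        [Service]
        # Placeholder command; replace with the real workload.
        ExecStart=/usr/bin/true

        [Install]
        WantedBy=multi-user.target
```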

@pothos
Member

pothos commented Jul 24, 2023

I suggested it but didn't start the work, feel free to file a PR with the above suggestions.

@shsamkit

@pothos Looks like there may be no need to change the service type in the unit itself after all. I was able to override it with a drop-in unit.

If you want to try it out, this is the Butane config I used:

---
variant: flatcar
version: 1.0.0
storage:
  files:
    - path: /etc/systemd/system/nvidia.service.d/service-type-simple.conf
      contents:
        inline: |
          [Service]
          Type=oneshot

@pothos
Member

pothos commented Jul 25, 2023

I think you should also set RemainAfterExit=true, or it will run multiple times. I also think we should just do this by default, so if you want, you can file a PR that sets this and Before=containerd.service.
The file is here: https://github.com/flatcar/scripts/blob/main/sdk_container/src/third_party/coreos-overlay/x11-drivers/nvidia-drivers/files/units/nvidia.service
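
A minimal sketch of a drop-in that combines the three settings suggested in this thread, written as a Butane config that uses `systemd.units[].dropins` rather than a file under `storage.files`. The drop-in name `10-oneshot.conf` is an arbitrary choice for illustration, and this has not been verified against the shipped nvidia.service.

```yaml
variant: flatcar
version: 1.0.0
systemd:
  units:
    - name: nvidia.service
      dropins:
        # Drop-in file name is an assumption; any *.conf name works.
        - name: 10-oneshot.conf
          contents: |
            [Unit]
            # Order the driver setup ahead of the container runtime.
            Before=containerd.service

            [Service]
            # Dependents wait until the setup process has exited.
            Type=oneshot
            # Keep the unit active afterwards so it is not run again.
            RemainAfterExit=true
```

`RemainAfterExit=true` keeps the unit in the active state after its process exits, so later start requests from dependent units do not trigger another run.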

@shsamkit

I have opened an issue and am working on a PR for it.
About Before=containerd.service: from a quick look at the installation script, I assume the installation itself runs in containers, so I'm not sure that would work. I may be wrong, though.

@jepio
Member

jepio commented Jul 27, 2023

The installation uses systemd-nspawn, so it does not depend on containerd. The idea was that containers may want to use GPUs, which is why nvidia.service is ordered before containerd.service.

@sayanchowdhury
Member

We have added documentation for users, and with the migration to the base image the ebuild is simplified, so I'm closing the issue.
