add support for host devices #5607

Closed
proppy opened this Issue Mar 18, 2015 · 80 comments

@proppy
Contributor

proppy commented Mar 18, 2015

It would be nice if the container api payload had support for exposing host devices to the container (like docker run --device does).

The kubelet could pass it to go-dockerclient once that library adds support for it (fsouza/go-dockerclient#241), or create the container through the Docker remote API by passing an additional member in the /create HostConfig payload:

{
    "PathOnHost": "/dev/deviceName",
    "PathInContainer": "/dev/deviceName",
    "CgroupPermissions": "mrw"
}
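
For illustration, here is a minimal Python sketch of how such a payload could be assembled. It assumes the device entry goes into a `Devices` list under `HostConfig`, which is my understanding of the Docker remote API; the helper names and the `busybox` image are hypothetical and not part of the proposal.

```python
import json


def device_mapping(path_on_host, path_in_container=None, cgroup_permissions="mrw"):
    """Build one device entry in the shape shown above."""
    return {
        "PathOnHost": path_on_host,
        # Default to exposing the device at the same path inside the container.
        "PathInContainer": path_in_container or path_on_host,
        "CgroupPermissions": cgroup_permissions,
    }


def create_payload(image, devices):
    """Build a body for POST /containers/create with host devices attached.

    Assumption: device entries live in HostConfig["Devices"].
    """
    return {
        "Image": image,
        "HostConfig": {"Devices": [device_mapping(d) for d in devices]},
    }


# "busybox" is only a placeholder image name.
payload = create_payload("busybox", ["/dev/deviceName"])
print(json.dumps(payload, indent=2))
```

The point of the sketch is that the kubelet would only need to translate a list of device paths from the container spec into this list.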
@proppy

Contributor

proppy commented Mar 19, 2015

So it's now fixed on the go-dockerclient side fsouza/go-dockerclient@e4fcc92

@therc

Contributor

therc commented Feb 10, 2016

We need to expose GPUs to the containers. I can write the PR and (given my previous experience with a types.go change) rebase it over and over. What are the odds of it getting accepted? The first thought that comes to mind is how to secure it, as in @erictune's linked issue.

@therc

Contributor

therc commented Feb 10, 2016

I could place it behind a kubelet or apiserver flag, which is off by default.

@osterman


osterman commented Jun 24, 2016

I need this feature so we can run s3fs inside of k8s. Will have to use fleet for now :(

@praoreo


praoreo commented Sep 29, 2016

@proppy , @fsouza
Hi,

What is the syntax for specifying device information in a YAML/JSON file? I tried the following in a .json file, but got a "found invalid field device for v1.PodSpec" error. I am using Kubernetes version 1.3.6.

                                "device": {
                                        "PathOnHost": "/dev"
                                },
                                "nodeSelector": {
@maci0


maci0 commented Oct 17, 2016

I don't think it has been implemented yet. But it seems what @therc wants to do has.
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/types.go has some nvidia stuff.

It's still missing support for other devices.
It used to work in Docker with volume mounts. My guess is that when they introduced --device, they locked down the volume mounts using device cgroups:
https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt
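
As a rough illustration of what a device cgroup entry looks like, the sketch below formats the `<type> <major>:<minor> <access>` entry described in the kernel document linked above. The function names are hypothetical, and this only shows the entry format, not how Docker applies it.

```python
import os
import stat


def device_cgroup_rule(st_mode, st_rdev, access="rwm"):
    """Format a cgroup-v1 devices entry: '<type> <major>:<minor> <access>'."""
    if stat.S_ISCHR(st_mode):
        dev_type = "c"  # character device
    elif stat.S_ISBLK(st_mode):
        dev_type = "b"  # block device
    else:
        raise ValueError("not a device node")
    return f"{dev_type} {os.major(st_rdev)}:{os.minor(st_rdev)} {access}"


def rule_for_path(path, access="rwm"):
    """Stat a device node on the host and build its entry."""
    st = os.stat(path)
    return device_cgroup_rule(st.st_mode, st.st_rdev, access)
```

Calling `rule_for_path("/dev/snd/controlC0", "rw")` on a host with that node would give the entry a runtime needs to whitelist in order to allow access without full privileges; the path here is only an example.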

@farmdawgnation


farmdawgnation commented Nov 1, 2016

Hey what's the status here?

We need this as well. I haven't contributed to Kubernetes before, but it looks like a lightweight way to provide some (initial) support for this is to support a container annotation for it. That looks like it would get piped into the PodSandboxContext which I could in turn use to pass the requisite arguments into the host config for the Docker container creation.

@drekle


drekle commented Nov 2, 2016

I also believe that I need this. I will be trying to deploy a pod to a specific node which uses a napatech card or other networking devices.

@jbiel


jbiel commented Nov 3, 2016

FWIW the following was working for me to pass through a sound card device ~3 months ago. Privileged mode was the key and according to the docs it looks like it should still work.

      containers:
      - name: foo
        ...
        volumeMounts:
        - mountPath: /dev/snd
          name: dev-snd
        securityContext:
          privileged: true
      volumes:
      - name: dev-snd
        hostPath:
          path: /dev/snd
@farmdawgnation


farmdawgnation commented Nov 3, 2016

Yeah, that'll work with privileged mode. The rub is we run code in a multi-tenant environment so that's a non-starter for our security requirements. Mounting devices using --device is safer.

@thockin

Member

thockin commented Nov 3, 2016

The status of this is that we have not had a proposal for an API to capture this. The issue is that the API needs to be plausible across multiple runtimes.

@farmdawgnation


farmdawgnation commented Nov 7, 2016

@thockin Got it. I'm not super familiar with the Kubernetes proposal process, but I'm willing to suggest some things.

Does it need to be plausible across multiple runtimes or implementable across multiple runtimes? The latter would imply that if rkt doesn't support something, then we can't have any kind of support for it in Docker at all.

I know that there's already some pattern of using container annotations for things that are vendor specific. Is that an option here?

@maci0


maci0 commented Dec 9, 2016

+1

@tcf909


tcf909 commented Jan 22, 2017

+1

Currently hostDevice option requires full privileged security rather than cap_adds -- very uneasy about this vs segmented cap permissions.

@thockin

Member

thockin commented Jan 23, 2017

@gavrie


gavrie commented Jan 23, 2017

@thockin: It does sound interesting to use opaque integer resources for this. Is there a way to add metadata to such a resource? I couldn't find one in the documentation.

Your concern about growing a de facto API with annotations is understandable. On the other hand, it might be useful to provide a way to access devices and see how people use it in the real world before designing an API that then captures those real world requirements in a clean way.

Specifically, I'm interested in allocating node-local block devices to containers (or pods). If a node has a certain amount of local SSDs, I want to be able to use such an SSD directly from a pod. Metadata would include the capacity of the SSD, its device node, and maybe other fields such as device type.

The resource model design proposal mentions this, but it seems to be way down the road.

Would there be a simple way to allow usage of local devices on the short term and thereby gather real world requirements, without requiring the use of privileged containers?

@thockin

Member

thockin commented Jan 23, 2017

@ConnorDoyle for opaque resources

@msau42 for local-storage stuff

@thockin

Member

thockin commented Jan 23, 2017

@dchen1107 @yujuhong I am anxious about an annotation for this, but it certainly has come up and we don't have a "real" answer yet.

@msau42

Member

msau42 commented Jan 23, 2017

Local storage will be a long-term project. For the short term, the only ways now to utilize local SSDs are hostpath volumes or a distributed fs like glusterfs.

@maci0


maci0 commented Jan 24, 2017

I also want this so, I can mount /dev/kvm into an unprivileged container.
Currently Kubernetes has an alpha API to mount NVIDIA video cards into the container. Does this work across all runtimes as well? If not, why does --device need to be any different?

@yujuhong

Contributor

yujuhong commented Jan 24, 2017

@thockin, there are two APIs involved in this issue: the kubernetes api and the api between kubelet and the container runtime (a.k.a. CRI). For the former I think supporting opaque resources makes sense. As for the latter, the CRI already includes devices in its API in order to support the GPU devices in Alpha. The change was introduced in #35597.

@maci0


maci0 commented Jan 24, 2017

Exactly my point. Most of the code should be there, the kubernetes api just doesn't expose that functionality yet.

@thockin

Member

thockin commented Jan 24, 2017

@ConnorDoyle

Member

ConnorDoyle commented Jan 25, 2017

Coming into this late, just adding some notes about opaque resources. Opaque Integer Resources (OIRs) are alpha as of v1.5. The missing feature to support this use case is how to extend node-level isolation to an opaque resource. There are discussions happening this month about how to accomplish that. This is happening in sig-node and the resource management workgroup. At last mention, @derekwaynecarr is working on a proposal for isolation extensions. At the same time, @vishh @dchen1107 and @thockin have asked for a proposal to explore some kind of lifecycle hook to let operators execute extra steps during pod/container setup and teardown.

Agree with @thockin on not putting device names into the pod spec. Unless Kubernetes will include specializations for all sorts of devices, users would need some external way (an API) to do the matchmaking from resource type to concrete device name on the host.

@vishh

Member

vishh commented Jan 25, 2017

I feel https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-hostpath-qualifiers.md can be extended to have kubelet auto whitelist hostpath devices for the respective containers.

@RRAlex


RRAlex commented Feb 20, 2018

/remove-lifecycle rotten

For people wanting to run packer (with /dev/kvm) inside k8s, this is an essential issue.
--privileged is way overkill and a --device equivalent would be very useful, maybe via a hostDevice: option?

@msau42

Member

msau42 commented Feb 20, 2018

@jiayingz does the device plugin interface cover this use case?

@jiayingz

Member

jiayingz commented Feb 21, 2018

The device plugin API does allow a device plugin to pass the Kubelet the host devices to be created in the container runtime, but I am not sure what the extra requirement of this feature is.

@andrewsykim

Member

andrewsykim commented Feb 26, 2018

Related to my earlier issue regarding character devices: I think it was a bug, and the fix is in #60440.

k8s-merge-robot added a commit that referenced this issue Feb 27, 2018

Merge pull request #60440 from andrewsykim/andrewsykim/fix-char-device-mount-bug

Automatic merge from submit-queue (batch tested with PRs 60433, 59982, 59128, 60243, 60440). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubelet: fix bug where character device is not recognized

**What this PR does / why we need it**:
Fixes a bug where character devices are not recognized by the kubelet because we return `FileTypeBlockDev` instead of `FileTypeCharDev`.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Related issue: #5607

**Special notes for your reviewer**:
Kubelet event for bug: #5607 (comment)
```
Warning		FailedMount		MountVolume.SetUp failed for volume "dev-fuse" : hostPath type check failed: /dev/fuse is not a character device
```

Commit where bug was introduced: 57ead48 
**Release note**:
```release-note
Fixes a bug where character devices are not recognized by the kubelet
```
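
To make the nature of that bug concrete, here is a small Python sketch of the kind of type check involved. It is not the kubelet's Go code; the function name is hypothetical, and the returned strings simply mirror the `CharDevice` and `BlockDevice` hostPath type names.

```python
import os
import stat


def host_path_device_type(st_mode):
    """Return 'CharDevice', 'BlockDevice', or None for any other file type."""
    # Character and block devices must be checked separately; reporting a
    # character device as a block device makes a CharDevice check fail.
    if stat.S_ISCHR(st_mode):
        return "CharDevice"
    if stat.S_ISBLK(st_mode):
        return "BlockDevice"
    return None


def check_host_path(path, expected_type):
    """Raise if the node at path is not of the expected device type."""
    actual = host_path_device_type(os.stat(path).st_mode)
    if actual != expected_type:
        raise ValueError(f"{path} is not a {expected_type}")
```

With a check like this, a node such as /dev/fuse would be accepted as a character device instead of failing the hostPath type check quoted in the event above.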
@asigatchov


asigatchov commented Feb 28, 2018

I need this for /dev/dri/renderD128

@hiyijian


hiyijian commented Mar 1, 2018

I need this for /dev/infiniband

@hustcat

Contributor

hustcat commented Mar 2, 2018

+1 for this.

I've written an RDMA device plugin for Kubernetes.

We need to pass the device /dev/infiniband/rdma_cm to every container that uses an RDMA device in order to run RDMA applications in the container.

@resouer

@cmluciano

Member

cmluciano commented Mar 2, 2018

@hustcat this is awesome! Have you presented this on a Resource-Management WG call yet?

@hustcat

Contributor

hustcat commented Mar 2, 2018

@cmluciano I didn't do that. I'd like to present this on next Resource-Management WG call.

@resouer

Member

resouer commented Mar 3, 2018

@msau42 @jiayingz This issue is actually out of date. CRI has supported devices for a long time, and that's how device plugins work. But the user requirement here seems to be: specifying the devices I want in the container. This somewhat conflicts with device plugins (DP) right now, which rely on the Kubelet to maintain device info and call Allocate(devIDs) to do that.

I would also suggest closing this one and creating a new issue to track this, with the background of DP and CRI.

@hustcat How do you work around /dev/infiniband/rdma_cm right now? Please update it in the new issue.

@t3hmrman


t3hmrman commented May 22, 2018

Hi all, sorry to bump an old thread, but to my knowledge this doesn't actually fix the issue:

I'm trying to run QEMU (I found out that I'm basically walking in the footsteps of another developer formerly from CoreOS) and /dev/kvm use is looking to be a requirement for me (for both performance and some key QEMU features), but I don't want to run all privileged containers.

The new ticket seems to focus on making Device Plugins more flexible but when I read the device plugin docs, there is a huge issue -- devices can't be shared, and /dev/kvm is most certainly shared.

I would even settle for a way to run a privileged container but restrict the accessible device paths in the container. From what I can read, the closest thing would be a combination of a privileged container and allowedHostPaths, but I don't think it quite works that way.

@guanyuding


guanyuding commented Aug 21, 2018

+1 need for /dev/fuse

@micw


micw commented Nov 5, 2018

@micw


micw commented Nov 5, 2018

It should be possible to create a generic device plug-in which gets a device node as argument as well as a number of allowed instances.
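
A minimal sketch of the bookkeeping such a plug-in might do is below. It is plain Python, not the actual device plugin gRPC API; the class, method, and field names are all hypothetical. The idea is to advertise one virtual device ID per allowed instance while every ID resolves to the same host device node, which is one way to share a device like /dev/kvm.

```python
class SharedDevicePool:
    """Hypothetical bookkeeping for one host device shared by N containers."""

    def __init__(self, device_path, max_instances):
        self.device_path = device_path
        # Advertise one virtual ID per allowed instance, e.g. "kvm-0", "kvm-1".
        name = device_path.rsplit("/", 1)[-1]
        self.free_ids = [f"{name}-{i}" for i in range(max_instances)]
        self.allocated = {}

    def allocate(self, owner):
        """Hand out a free virtual ID; every ID maps to the same host path."""
        if not self.free_ids:
            raise RuntimeError("no free instances of " + self.device_path)
        dev_id = self.free_ids.pop(0)
        self.allocated[dev_id] = owner
        return {
            "id": dev_id,
            "host_path": self.device_path,
            "container_path": self.device_path,
            "permissions": "rw",
        }

    def release(self, dev_id):
        """Return a virtual ID to the pool when its container goes away."""
        self.allocated.pop(dev_id)
        self.free_ids.append(dev_id)


pool = SharedDevicePool("/dev/kvm", max_instances=2)
first = pool.allocate("pod-a")
second = pool.allocate("pod-b")
```

Both allocations point at the same node, so the instance count acts only as a scheduling limit rather than as exclusive ownership of the device.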

@t3hmrman


t3hmrman commented Nov 5, 2018

Hi @micw Thanks for the suggestion! That definitely looks like it would solve my problem (and others'), and it explains how kubevirt can get the functionality they provide.

Since posting here I've started using untrusted workload runtimes w/ containerd in combination with the kata-containers project to run pods in VMs, and in the future intend to use the runtime class proposal to solve this instead. These days kata-containers has a super easy to use installer as well, and I can only imagine that it will get better/easier as the runtime class proposal moves towards GA.

As far as running QEMU inside an actual pod, it seems like kubevirt or runtimeClass-annotation enabled controllers are the better way to go for now. That generic device plug-in does sound good though -- would likely solve all the other cases mentioned

@OJFord


OJFord commented Nov 27, 2018

@thockin, you wrote:

Interestingly, to use /dev/fuse you have to be running with privileges anyway (right?) so you can literally hostPath mount /dev/fuse today. Not a great answer, but it seems to work.

and @dixudx said similar, but this is not true in Docker - 'just' cap_add: - SYS_ADMIN is enough.

@dinathom


dinathom commented Dec 20, 2018

Is it true then that the only way in k8s to access a host block device is to use a privileged container? So volumeMode=Block would not mean anything (for reading from or writing to the device) unless this is running in a privileged container?

@kfox1111


kfox1111 commented Dec 20, 2018

Your application could read the block device directly too. Things like kvm would do that, or some databases. No privilege is needed in that case.

@dinathom


dinathom commented Dec 21, 2018

@kfox1111 that seems different from what I read here: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privileged
how would the application have the ability to read the device unless the container allows it via some additional capabilities?

The reason kvm works (I did not test this, but going by the documentation) is because of device plugins that whitelist /dev/kvm.

@kfox1111


kfox1111 commented Dec 21, 2018

I don't believe you need any special privilege in linux to read from a block device. only unix permissions to the block device. You do need special privilege to mount a block device. If the storage driver plumbs through the device and gives it the right permissions, I think it works.

I believe /dev/kvm is an entirely different thing as it isn't a blockdev.

@kfox1111


kfox1111 commented Dec 21, 2018

Hmm... no. There seems to be a capability restricted in Docker by default that normal users on the host don't have.
