add support for host devices #5607

Closed
Tracked by #12
proppy opened this issue Mar 18, 2015 · 90 comments
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@proppy
Contributor

proppy commented Mar 18, 2015

It would be nice if the container api payload had support for exposing host devices to the container (like docker run --device does).

The kubelet could pass it to go-dockerclient once they add support for it (fsouza/go-dockerclient#241), or create the container with the Docker remote API by passing an additional member in the /create HostConfig payload:

{
    "PathOnHost": "/dev/deviceName",
    "PathInContainer": "/dev/deviceName",
    "CgroupPermissions": "mrw"
}
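
As a rough illustration of the second option, the sketch below builds a container-create request body with a `Devices` list under `HostConfig`, which is where the Docker remote API accepts device mappings. The helper names, the image name, and the example device path are hypothetical; none of this is kubelet code.

```python
import json


def device_mapping(path_on_host, path_in_container=None, permissions="mrw"):
    """Build one device entry in the shape shown above.

    permissions is a subset of "mrw": m = mknod, r = read, w = write.
    """
    if not permissions or set(permissions) - set("mrw"):
        raise ValueError(f"invalid cgroup permissions: {permissions!r}")
    return {
        "PathOnHost": path_on_host,
        # Assumption: default to the same path inside the container.
        "PathInContainer": path_in_container or path_on_host,
        "CgroupPermissions": permissions,
    }


def create_payload(image, devices):
    """Body for POST /containers/create with devices under HostConfig."""
    return {"Image": image, "HostConfig": {"Devices": list(devices)}}


payload = create_payload("example/image", [device_mapping("/dev/deviceName")])
print(json.dumps(payload, indent=4))
```

Validating the permission string up front keeps a typo such as "rwx" from reaching the Docker daemon.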
@cjcullen added the priority/backlog and team/cluster labels Mar 18, 2015
@proppy
Contributor Author

proppy commented Mar 19, 2015

So it's now fixed on the go-dockerclient side: fsouza/go-dockerclient@e4fcc92

@therc
Member

therc commented Feb 10, 2016

We need to expose GPUs to the containers. I can write the PR and (given my previous experience with a types.go change) rebase it over and over. What are the odds of it getting accepted? The first thought that comes to mind is how to secure it, as in @erictune's linked issue.

@therc
Member

therc commented Feb 10, 2016

I could place it behind a kubelet or apiserver flag, which is off by default.

@bgrant0607 added the sig/node label May 17, 2016
@osterman

I need this feature so we can run s3fs inside of k8s. Will have to use fleet for now :(

@praoreo

praoreo commented Sep 29, 2016

@proppy , @fsouza
Hi,

What is the syntax for specifying device information in a YAML/JSON file? I tried the following in a .json file, but got a "found invalid field device for v1.PodSpec" error. I am using Kubernetes version 1.3.6.

                                "device": {
                                        "PathOnHost": "/dev"
                                },
                                "nodeSelector": {

@maci0

maci0 commented Oct 17, 2016

I don't think it has been implemented yet. But it seems what @therc wants to do has.
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/types.go has some nvidia stuff.

It's still missing support for other devices.
It used to work in Docker with volume mounts; my guess is that when they introduced --device they locked down the volume mounts using device cgroups:
https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt
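
For context, a device cgroup v1 whitelist entry, as described in the kernel document linked above, has the form `type major:minor access`, for example `c 1:3 mr`. The sketch below formats such an entry from a file mode and device number. The function names are made up for illustration, and the code only builds the string rather than writing to any cgroup file.

```python
import os
import stat


def devices_allow_entry(mode, rdev, access="rwm"):
    """Format a cgroup-v1 devices.allow entry: '<type> <major>:<minor> <access>'."""
    if stat.S_ISCHR(mode):
        dev_type = "c"
    elif stat.S_ISBLK(mode):
        dev_type = "b"
    else:
        raise ValueError("not a device node")
    if not access or set(access) - set("rwm"):
        raise ValueError(f"invalid access string: {access!r}")
    return f"{dev_type} {os.major(rdev)}:{os.minor(rdev)} {access}"


def entry_for_path(path, access="rwm"):
    """Stat a device node on the host and format its whitelist entry."""
    st = os.stat(path)
    return devices_allow_entry(st.st_mode, st.st_rdev, access)


# Character device 1:3 is /dev/null on Linux.
print(devices_allow_entry(stat.S_IFCHR | 0o666, os.makedev(1, 3), "mr"))
```

A plain bind mount of a device node does not add such an entry, which would be consistent with the guess above that volume mounts alone stopped being enough.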

@farmdawgnation

Hey what's the status here?

We need this as well. I haven't contributed to Kubernetes before, but it looks like a lightweight way to provide some (initial) support for this is to support a container annotation for it. That looks like it would get piped into the PodSandboxContext which I could in turn use to pass the requisite arguments into the host config for the Docker container creation.

@drekle

drekle commented Nov 2, 2016

I also believe that I need this. I will be trying to deploy a pod to a specific node which uses a napatech card or other networking devices.

@jbiel

jbiel commented Nov 3, 2016

FWIW the following was working for me to pass through a sound card device ~3 months ago. Privileged mode was the key and according to the docs it looks like it should still work.

      containers:
      - name: foo
        ...
        volumeMounts:
        - mountPath: /dev/snd
          name: dev-snd
        securityContext:
          privileged: true
      volumes:
      - name: dev-snd
        hostPath:
          path: /dev/snd

@farmdawgnation

Yeah, that'll work with privileged mode. The rub is we run code in a multi-tenant environment so that's a non-starter for our security requirements. Mounting devices using --device is safer.

@thockin
Member

thockin commented Nov 3, 2016

The status of this is that we have not had a proposal for an API to capture this. The issue is that the API needs to be plausible across multiple runtimes.

@farmdawgnation

@thockin Got it. I'm not super familiar with the Kubernetes proposal process, but I'm willing to suggest some things.

Does it need to be plausible across multiple runtimes or implementable across multiple runtimes? The latter would imply that if rkt doesn't support something, then we can't have any kind of support for it in Docker at all.

I know that there's already some pattern of using container annotations for things that are vendor specific. Is that an option here?

@maci0

maci0 commented Dec 9, 2016

+1

@tcf909

tcf909 commented Jan 22, 2017

+1

Currently hostDevice option requires full privileged security rather than cap_adds -- very uneasy about this vs segmented cap permissions.

@thockin
Member

thockin commented Jan 23, 2017 via email

@gavrie

gavrie commented Jan 23, 2017

@thockin: It does sound interesting to use opaque integer resources for this. Is there a way to add metadata to such a resource? I couldn't find one in the documentation.

Your concern about growing a de facto API with annotations is understandable. On the other hand, it might be useful to provide a way to access devices and see how people use it in the real world before designing an API that then captures those real world requirements in a clean way.

Specifically, I'm interested in allocating node-local block devices to containers (or pods). If a node has a certain amount of local SSDs, I want to be able to use such an SSD directly from a pod. Metadata would include the capacity of the SSD, its device node, and maybe other fields such as device type.

The resource model design proposal mentions this, but it seems to be way down the road.

Would there be a simple way to allow usage of local devices on the short term and thereby gather real world requirements, without requiring the use of privileged containers?

@thockin
Member

thockin commented Jan 23, 2017

@ConnorDoyle for opaque resources

@msau42 for local-storage stuff

@thockin
Member

thockin commented Jan 23, 2017

@dchen1107 @yujuhong I am anxious about an annotation for this, but it certainly has come up and we don't have a "real" answer yet.

@msau42
Member

msau42 commented Jan 23, 2017

Local storage will be a long-term project. For the short term, the only ways now to utilize local SSDs are hostpath volumes or a distributed fs like glusterfs.

@maci0

maci0 commented Jan 24, 2017

I also want this, so I can mount /dev/kvm into an unprivileged container.
Currently Kubernetes has an alpha API to mount NVIDIA video cards into the container. Does this work across all runtimes as well? If not, why does --device need to be any different?

@yujuhong
Contributor

@thockin, there are two APIs involved in this issue: the kubernetes api and the api between kubelet and the container runtime (a.k.a. CRI). For the former I think supporting opaque resources makes sense. As for the latter, the CRI already includes devices in its API in order to support the GPU devices in Alpha. The change was introduced in #35597.

@maci0

maci0 commented Jan 24, 2017

Exactly my point. Most of the code should be there, the kubernetes api just doesn't expose that functionality yet.

@thockin
Member

thockin commented Jan 24, 2017 via email

@ConnorDoyle
Contributor

Coming into this late, just adding some notes about opaque resources. Opaque Integer Resources (OIRs) are alpha as of v1.5. The missing feature to support this use case is how to extend node-level isolation to an opaque resource. There are discussions happening this month about how to accomplish that. This is happening in sig-node and the resource management workgroup. At last mention, @derekwaynecarr is working on a proposal for isolation extensions. At the same time, @vishh @dchen1107 and @thockin have asked for a proposal to explore some kind of lifecycle hook to let operators execute extra steps during pod/container setup and teardown.

Agree with @thockin on not putting device names into the pod spec. Unless Kubernetes will include specializations for all sorts of devices, users would need some external way (an API) to do the matchmaking from resource type to concrete device name on the host.

@vishh
Contributor

vishh commented Jan 25, 2017

I feel https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-hostpath-qualifiers.md can be extended to have kubelet auto whitelist hostpath devices for the respective containers.

@guanyuding

+1 need for /dev/fuse

@micw

micw commented Nov 5, 2018

@t3hmrman I found https://github.com/kubevirt/kubernetes-device-plugins/blob/master/docs/README.kvm.md which may solve your particular use case.

@micw

micw commented Nov 5, 2018

It should be possible to create a generic device plug-in which gets a device node as argument as well as a number of allowed instances.
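
A minimal sketch of that idea, assuming nothing about any real plugin: one host device node is advertised as a fixed number of virtual instances, and an allocation of any instance maps back to the same node. The class and field names are hypothetical, loosely modeled on the `host_path`, `container_path`, and `permissions` fields of a device plugin `DeviceSpec`; the gRPC registration with the kubelet is omitted entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceSpec:
    host_path: str
    container_path: str
    permissions: str  # subset of "rwm"


class GenericDevicePlugin:
    """Advertises `instances` virtual IDs that all map to one host device node."""

    def __init__(self, device_node, instances, permissions="rw"):
        if instances < 1:
            raise ValueError("instances must be at least 1")
        self.device_node = device_node
        self.permissions = permissions
        name = device_node.strip("/").replace("/", "-")
        self.device_ids = [f"{name}-{i}" for i in range(instances)]

    def list_devices(self):
        """IDs the plugin would report to the kubelet as schedulable units."""
        return list(self.device_ids)

    def allocate(self, requested_ids):
        """Map requested IDs to the device node to expose in the container."""
        unknown = [d for d in requested_ids if d not in self.device_ids]
        if unknown:
            raise KeyError(f"unknown device IDs: {unknown}")
        # Every virtual ID shares the same node, so one spec is enough.
        return [DeviceSpec(self.device_node, self.device_node, self.permissions)]


plugin = GenericDevicePlugin("/dev/kvm", instances=3)
print(plugin.list_devices())
print(plugin.allocate(["dev-kvm-1"]))
```

The instance count acts as the cap on how many containers per node can hold the device at once, since the scheduler treats each ID as one unit of the resource.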

@t3hmrman

t3hmrman commented Nov 5, 2018

Hi @micw Thanks for the suggestion! That definitely looks like it would solve my problem (and others'), and it explains how kubevirt can get the functionality they provide.

Since posting here I've started using untrusted workload runtimes w/ containerd in combination with the kata-containers project to run pods in VMs, and in the future intend to use the runtime class proposal to solve this instead. These days kata-containers has a super easy to use installer as well, and I can only imagine that it will get better/easier as the runtime class proposal moves towards GA.

As far as running QEMU inside an actual pod, it seems like kubevirt or runtimeClass-annotation enabled controllers are the better way to go for now. That generic device plug-in does sound good though -- would likely solve all the other cases mentioned

@OJFord

OJFord commented Nov 27, 2018

@thockin, you wrote:

Interestingly, to use /dev/fuse you have to be running with privileges anyway (right?) so you can literally hostPath mount /dev/fuse today. Not a great answer, but it seems to work.

and @dixudx said similar, but this is not true in Docker - 'just' cap_add: - SYS_ADMIN is enough.

@dinathom

Is it true then that the only way in k8s to access a host block device is to use a privileged container? So volumeMode=Block would not mean anything (for reading from or writing to the device) unless this is running in a privileged container?

@kfox1111

Your application could read the block device directly too. Things like KVM would do that, or some databases. No privilege is needed in that case.

@dinathom

@kfox1111 that seems different from what I read here: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privileged
How would the application have the ability to read the device unless the container allows it via some additional capabilities?

The reason kvm works (I did not test this, but going by the documentation) is that device plugins whitelist /dev/kvm.

@kfox1111

I don't believe you need any special privilege in Linux to read from a block device, only Unix permissions on the block device. You do need special privilege to mount a block device. If the storage driver plumbs through the device and gives it the right permissions, I think it works.

I believe /dev/kvm is an entirely different thing as it isn't a blockdev.

@kfox1111

Hmm... no. There seems to be a capability restricted in Docker by default that normal users on the host don't have.

@bluebeach

+1
I need to access /dev/mem in my unprivileged container.
Any help?

@bluebeach

I just found a plugin that supports adding the device /dev/mem without privileged mode!
https://github.com/honkiko/k8s-hostdev-plugin

@shufanhao
Contributor

I Just find a plugin that can support add device /dev/mem without privileged !!!
https://github.com/honkiko/k8s-hostdev-plugin

Actually, this solution also needs to run the DaemonSet with securityContext: privileged: true

@shufanhao
Contributor

+1
I also need access to /dev/mem in an unprivileged container and don't want to run any pod with securityContext: privileged: true

@kfox1111

What? According to the man page for /dev/mem (http://man7.org/linux/man-pages/man4/mem.4.html)

      "/dev/mem is a character device file that is an image of the main
       memory of the computer.  It may be used, for example, to examine (and
       even patch) the system."

If you can touch that file, you are privileged whether it's flagged or not... That shouldn't be handed over to unprivileged containers IMO.

@xt94c4t9ce

I'm surprised this bug's closed because the original problem doesn't seem to be fixed.

On Kubernetes 1.18.8 with Docker 19.03.12, I'm not able to use a mapped host block device in a container without running the container in privileged mode.

The original problem here was that Docker's --device functionality wasn't available in Kubernetes, and that problem remains.

Or, is there a solution to this that I've missed? Thank you.

@patrijua

I also find it surprising that there seems to be no way to use host-connected devices from containers without compromising security. We need to access the /dev/ttyUSB0 character device from a container and we do not want to run anything as privileged. So if there's a solution, please share. Thanks!

@YaShanBoy

It would be nice if the container api payload had support for exposing host devices to the container (like docker run --device does).

The kubelet could pass it go-dockerclient once they add support for it (fsouza/go-dockerclient#241), or create container with the docker remote api by passing an addition member in the /create HostConfig payload:

{
    "PathOnHost": "/dev/deviceName",
    "PathInContainer": "/dev/deviceName",
    "CgroupPermissions": "mrw"
}

How to integrate with k8s????

@pre

pre commented Jan 18, 2021

As of 01/2021 it doesn't seem to be possible to mount e.g. /dev/fuse without privileged: true

The relevant issue seems to be #7890

@pre

pre commented Jan 23, 2021

Mounting host devices without privileged: true is possible via the Kubelet device API using a Device Manager!

See details in #7890 (comment)
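
To illustrate what this looks like from the pod side, the sketch below builds a manifest in which the container asks for a device through `resources.limits` instead of `privileged: true`. The resource name `example.com/fuse`, the pod name, and the image are placeholders; the actual resource name depends on whichever device plugin is installed in the cluster.

```python
import json


def pod_with_device(name, image, resource_name, count=1):
    """Pod manifest requesting `count` units of a plugin-advertised resource."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested in whole units.
                    "resources": {"limits": {resource_name: str(count)}},
                }
            ]
        },
    }


# Placeholder names throughout; no securityContext is set on the container.
manifest = pod_with_device("fuse-demo", "example/image", "example.com/fuse")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML.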
