
Add support for devices with "service create" #1244

Open
flx42 opened this issue Jul 26, 2016 · 74 comments

@flx42

flx42 commented Jul 26, 2016

Initially reported: moby/moby#24865, but I realized it actually belongs here. Feel free to close the other one if you want. Content of the original issue copied below.

Related: #1030

Currently, it's not possible to add devices with docker service create; there is no equivalent of docker run --device=/dev/foo.

I'm an author of nvidia-docker with @3XX0, and we need to add device files (the GPUs) and volumes to the starting containers in order to enable GPU apps as services.
See the discussion here: moby/moby#23917 (comment) (summarized below).

We figured out how to add a volume provided by a volume plugin:

$ docker service create --mount type=volume,source=nvidia_driver_367.35,target=/usr/local/nvidia,volume-driver=nvidia-docker [...]

But there is no solution for devices; @cpuguy83 and @justincormack suggested using --mount type=bind, but it doesn't seem to work. It's probably like doing a mknod without the proper device cgroup whitelisting.

$ docker service create --mount type=bind,source=/dev/nvidiactl,target=/dev/nvidiactl ubuntu:14.04 sh -c 'echo foo > /dev/nvidiactl'
$ docker logs stupefied_kilby.1.2445ld28x6ooo0rjns26ezsfg
sh: 1: cannot create /dev/nvidiactl: Operation not permitted

It's probably equivalent to this:

$ docker run -ti ubuntu:14.04                      
root@76d4bb08b07c:/# mknod -m 666 /dev/nvidiactl c 195 255
root@76d4bb08b07c:/# echo foo > /dev/nvidiactl
bash: /dev/nvidiactl: Operation not permitted

Whereas the following works (invalid arg is normal, but no permission error):

$ docker run -ti --device /dev/nvidiactl ubuntu:14.04
root@ea53a1b96226:/# echo foo > /dev/nvidiactl
bash: echo: write error: Invalid argument
@stevvooe
Contributor

@flx42 For the container runtime, devices require special handling (a mknod syscall), so mounts won't work. We'll probably have to add some sort of support for this. (cc @crosbymichael)

Ideally, we'd like to be able to schedule over devices, as well.

@cpuguy83
Member

@stevvooe Already have device support in the runtime, just not exposed in swarm.

@flx42
Author

flx42 commented Jul 26, 2016

Ideally, we'd like to be able to schedule over devices, as well.

This question was raised here: moby/moby#24750
But the discussion was redirected here: moby/moby#23917, in order to have a single discussion thread.

@flx42
Author

flx42 commented Jul 28, 2016

@stevvooe I quickly hacked a solution, it's not too difficult:
flx42@a82b9fb
This is not a PR yet; would you be interested if I file one? Or are the swarmkit features frozen right now before 1.12?
The next step would be to also modify the engine API.

@flx42
Author

flx42 commented Jul 28, 2016

Forgot to mention that I can now run GPU containers by mimicking what nvidia-docker does:

./bin/swarmctl service create --device /dev/nvidia-uvm --device /dev/nvidiactl --device /dev/nvidia0 --bind /var/lib/nvidia-docker/volumes/nvidia_driver/367.35:/usr/local/nvidia --image nvidia/digits:4.0 --name digits

@stevvooe
Contributor

@flx42 I took a quick peek and the PR looks like a decent start. I am not sure about representing these as cluster-level resources for container startup. From an orchestration perspective, we have to match these up with announced resources at the node level, which might be okay. It might be better on ContainerSpec, but I'm not sure yet.

Go ahead and file as a [WIP] PR.

@flx42
Author

flx42 commented Jul 28, 2016

@stevvooe Yeah, that's the biggest discussion point for sure.

In engine-api, devices are resources:
https://github.com/docker/engine-api/blob/master/types/container/host_config.go#L249

But in swarmkit, resources are so far "fungible" objects like CPU shares and memory, with a base value and a limit. A device doesn't really fit that definition. For GPU apps we have devices that must be shared (/dev/nvidiactl) and devices that could be exclusively acquired (like /dev/nvidia0).

I decided to initially put devices into resources because there is already a function in swarmkit that creates an engine-api Resource object from a swarm Resource object:
https://github.com/docker/swarmkit/blob/master/agent/exec/container/container.go#L301-L324
This method would also need to access the container spec.

I will file a PR soon to continue the discussion.

@stevvooe
Contributor

@flx42 Great!

We really aren't planning on following the same resource model from HostConfig for SwarmKit. In this case, we are instructing the container to mount these devices, which is specific to a container runtime. Other runtimes may not have a container or devices. Thus, I would err on ContainerSpec.

Now, I would like to see scheduling of fungible GPUs, but that might be a wholly separate flow, keeping the initial support narrow. Such services would require manual constraint and device assignment, but you still achieve the goal.

Let's discuss this in the context of the PR.

@aluzzardi
Member

Thanks @flx42 - I think GPU is definitely something we want to support medium term.

/cc @mgoelzer

@flx42
Author

flx42 commented Aug 10, 2016

Thanks @aluzzardi, PR created, it's quite basic.

@mlhales

mlhales commented Dec 27, 2016

The --device option is really important for my use case too. I am trying to use swarm to manage 50 Raspberry Pis to do computer vision, but I need to be able to access /dev/video0 to capture images. Without this option, I'm stuck and have to manage them without swarm, which is painful.

@stevvooe
Contributor

stevvooe commented Jan 6, 2017

@mlhales We need someone who is willing to work out the issues with --device in a clustered environment and support that solution, rather than just a drive-by PR. If you or a colleague want to take this on, that would be great, but this isn't as simple as adding --device.

@StefanScherer

Using --device=/dev/gpiomem would be great on a RPi swarm to access GPIO on each node without privileged mode.

@nazar-pc

Using --device=/dev/fuse would be great for mounting FUSE, which isn't currently possible.

@StefanScherer

We found an easier way for the Blinkt! LED strip: using sysfs. Now we can run Blinkt! in docker swarm mode without privileges.

@mathiasimmer

@StefanScherer is it a proper alternative for using e.g. --device=/dev/mem to access GPIO on a RPi ? Would love to see an example if you would care to share :)

@StefanScherer

@mathiasimmer For the use case with the Blinkt! LED strip there are only eight RGB LEDs, so using sysfs is not time-critical for these few LEDs. If you want to drive hundreds of them you still need faster GPIO access to get a higher clock rate. But for Blinkt! we have forked the Node.js module and adjusted it in this branch: https://github.com/sealsystems/node-blinkt/tree/sysfs.
A sample application is available as well, showing how to use this forked module as a dependency in your own package.json.

@aluzzardi
Member

/cc @cyli

@stevvooe
Contributor

@aluzzardi I think we should resurrect the --device patch. I don't think there is anything in the pipeline that is sophisticated enough to handle proper, cluster-level dynamic resource allocation. Looking back at this issue, there isn't necessarily a model that will work well in all cases (mostly because no one here can seem to enumerate them).

We can always add logic in the scheduler to prevent device contention in the future.

@cyli
Contributor

cyli commented Feb 22, 2017

Attempt to add devices to the container spec and plugin spec here: #1964

I've no objection to the --device flag - cc @diogomonica ?

@diogomonica
Contributor

--device allows any service to escalate privileges. Why would we add this w/out profiles on services?

@cyli
Contributor

cyli commented Feb 23, 2017

@diogomonica I thought profiles mainly covered capabilities, etc?

@diogomonica
Contributor

@cyli Well, if we believe "devices" are easy enough to understand for easy user acceptance, then we might not need them. But we should look critically at adding anything to the command line that allows escalation of a container's privileges before we have a good way of informing the user of everything the service will need from a security perspective.

@brubbel

brubbel commented Mar 12, 2017

Also following this. Very interested in access to character devices (/dev/bus/usb/...) in a docker swarm.
To help some others until this is supported by docker, a workaround for swarm + usb:

  1. On the (Linux) host(s), create a udev rule which creates a symlink to your device (in my case an FTDI device), e.g. /etc/udev/rules.d/99-libftdi.rules:
    SUBSYSTEMS=="usb", ATTRS{idVendor}=="xxxx", ATTRS{idProduct}=="xxxx", GROUP="dialout", MODE="0666", SYMLINK+="my_ftdi", RUN+="/usr/bin/setupdockerusb.sh"
    Then reload the udev rules:
    sudo udevadm control --reload-rules
    Upon connection of the USB device, the udev manager will create a symlink /dev/my_ftdi -> /dev/bus/usb/xxx/xxx and execute /usr/bin/setupdockerusb.sh

  2. The /usr/bin/setupdockerusb.sh script (ref) sets the character device permissions on (the first) container with the given image name:

#!/bin/bash
# Resolve the symlink to the real USB device node.
USBDEV=`readlink -f /dev/my_ftdi`
# stat prints the minor (%T) and major (%t) device numbers in hex.
read minor major < <(stat -c '%T %t' $USBDEV)
if [[ -z $minor || -z $major ]]; then
    echo 'Device not found'
    exit 1
fi
dminor=$((0x${minor}))
dmajor=$((0x${major}))
# Find the first running container created from the given image.
CID=`docker ps --no-trunc -q --filter ancestor=my/imagename | head -1`
if [[ -z $CID ]]; then
    echo 'CID not found'
    exit 1
fi
echo 'Setting permissions'
# Whitelist the device in the container's device cgroup (cgroup v1).
echo "c $dmajor:$dminor rwm" > /sys/fs/cgroup/devices/docker/$CID/devices.allow

  3. Create the docker service with the following options:
    docker service create [...] --mount type=bind,source=/dev/bus/usb,target=/dev/bus/usb [...]

  4. Event listener (systemd service): waits for a container to be started and sets the permissions. Run with root permissions on the host.

#!/bin/bash
# Re-run the setup script every time any container starts.
docker events --filter 'event=start' | \
while read line; do
    /usr/bin/setupdockerusb.sh
done
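The hex conversion in setupdockerusb.sh can be wrapped in a small standalone helper for experimenting; dev_rule is a hypothetical name, not part of the workaround above:

```shell
# Hypothetical helper: print the device-cgroup rule ("c MAJOR:MINOR rwm")
# for a character device, using the same stat-based hex conversion as
# setupdockerusb.sh above.
dev_rule() {
    dev=$(readlink -f "$1")
    # stat prints the major (%t) and minor (%T) device numbers in hex
    set -- $(stat -c '%t %T' "$dev")
    printf 'c %d:%d rwm\n' "$((0x$1))" "$((0x$2))"
}

dev_rule /dev/null   # /dev/null is c 1:3 on Linux
```

The printed rule is exactly the string that step 2 writes into the container's devices.allow file.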

chrisns added a commit to chrisns/clustered_domoticz_zwave that referenced this issue Apr 14, 2017
@allfro

allfro commented Mar 30, 2021

We NEED device mapping for swarms. I'd hate to switch over to Kubernetes for something as trivial as mapping common devices such as /dev/tun across a cluster. We beg you Docker!

@cpuguy83
Member

Maybe stop begging someone else to write features you need?
That is why there is exactly one person working on this repo... in their spare time.

@allfro

allfro commented Mar 30, 2021

@cpuguy83 isn't swarmkit developed by Docker corp which is also commercially sold as part of Docker EE?

@cpuguy83
Member

@allfro No. Docker sold off the EE stuff to Mirantis... but even before then Swarmkit had very little support.

@mdegans

mdegans commented Oct 8, 2021

The docs say to use:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

But unsurprisingly this is broken in swarm.

Time to move to k8s.

@mdegans

mdegans commented Oct 8, 2021

Maybe stop begging someone else to write features you need? That is why there is exactly one person working on this repo... in their spare time.

Maybe they can maintain their stuff? How long has GPU support been broken in swarm?

@prologic

prologic commented Dec 1, 2021

I know this is a 6 year old issue, but is there actually an open PR for this that just needs a bit of attention? Maybe I could help finish the code required to support this? 🤔

@Stefan592

The solution does not work on Debian 11 Bullseye.
Is there a new workaround for this?

#1244 (comment)

@MohammedNoureldin

Hey, @allfro
Have you found a solution? I have the exact same use case as you (the tun device). Did you switch to another solution, or have you figured out a workaround?

@radeksh

radeksh commented Jun 9, 2022

How can I help to finish that feature?

@pjalusic

pjalusic commented Jul 8, 2022

I really like workaround from @BretFisher #1244 (comment) and here is how I adapted it for nodes that require a device:

  • connect that special container to the stack network to make it behave like part of the stack (shutting down, redeploying). Otherwise it will keep running forever and require manual removal
  • extract that service to a separate docker-compose.yml and save it on each node that requires a device. The run command will be less messy

Putting it all together, your services will have to change from this:

services:
  my-service-starter:
    image: docker
    command: 'docker run --name <name> --device /dev/bus/usb -e TOKEN=1234 -p 5000:5000 <image>'
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.labels.device_required == true

to this:

services:
  my-service-handler:
    image: docker
    command: 'docker-compose -f /docker-compose.yml up'
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /home/ubuntu/docker-compose.yml:/docker-compose.yml
    deploy:
      placement:
        constraints:
          - node.labels.device_required == true

networks:
  default:
    name: my_network
    driver: overlay
    attachable: true

(on manager)
and

services:
  <name>:
    image: <image>
    restart: always
    container_name: <name>
    devices:
      - /dev/bus/usb
    environment:
      - TOKEN=1234
    ports:
      - 5000:5000

networks:
  default:
    name: my_network
    external: true

(/home/ubuntu/docker-compose.yml on nodes that require a device)

@bighb69738

Hi @pjalusic ,

services:
  <name>:
    image: <image>
    restart: always
    container_name: <name>
    devices:
      - /dev/bus/usb
    environment:
      - TOKEN=1234
    ports:
      - 5000:5000

networks:
  default:
    name: my_network
    external: true

But on the worker node, I need to depend on another service from the manager node.
Could you give me an example docker-compose.yml for the worker node that adds a "depends_on" entry?

@allfro

allfro commented Nov 18, 2022

I developed a plugin in the end that allows me to map devices to containers: https://github.com/allfro/device-volume-driver. Hope it helps others. Unfortunately, it only works on systems that use cgroup v1 (alpine). I am looking for some help to add cgroup v2 support to the plugin. It works really well, and I've used it to containerize x11 desktops that require access to fuse and the vmware graphics devices.

cc: @MohammedNoureldin

@zikaeroh

After planning to redo my home server setup with swarm (so I can have multiple nodes), I discovered that this wasn't supported, and I needed it for VAAPI.

After looking through things, it seemed to me like this was a plumbing (and developer-hour) problem. Basing things on a previous PR series which added ulimit support to swarm, here is a chain of PRs which add devices in the most boring way; just plumbing it through the API as-is, no special management or API. Just what docker already supported outside of swarm.

I'm sure I've missed something, and I don't quite know how to get everything building together to test this (I typically run things from my package manager's installed docker), but maybe someone is willing to try the above out.

@vadd98

vadd98 commented Feb 2, 2023

Hi, I'm trying the workaround in #1244 (comment) and it indeed works, but when I remove the stack, the handler is successfully removed while the privileged container in docker-compose.yml continues running and has to be killed manually using docker kill.

Any idea on what could be the issue?

@coltonbh

The docs say to use:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

But unsurprisingly this is broken in swarm.

Time to move to k8s.

These docs are referencing the API for docker compose, not swarm (services or stacks). The correct (and functioning) API for a stack is:

services:
  my-gpu-service:
    ...
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: "NVIDIA-GPU"
                value: 2

This works if you've registered your GPUs in the /etc/docker/daemon.json file.
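For reference, the daemon-side registration that the generic_resources reservation matches against is a node-generic-resources entry in /etc/docker/daemon.json, roughly like the sketch below (the GPU UUIDs are placeholders; real values come from nvidia-smi, and dockerd must be restarted afterwards — see the linked write-up for the full setup):

```json
{
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-placeholder-uuid-0",
    "NVIDIA-GPU=GPU-placeholder-uuid-1"
  ]
}
```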

For anyone looking for device support for NVIDIA GPUs using Swarm I did a quick write up here summarizing two solutions. My write up was heavily inspired by the original gist I found on the subject here.

@zeppmg

zeppmg commented Mar 8, 2023

Hello,
Thanks for the tip. However, in my case it doesn't work. After investigating step by step, I've realized that I don't have a /sys/fs/cgroup/devices folder on any of my swarm nodes. Does anyone have an idea where this comes from?

sudo ls /sys/fs/cgroup
cgroup.controllers      cgroup.stat             cpuset.cpus.effective  dev-mqueue.mount  io.pressure       memory.stat                    sys-kernel-debug.mount
cgroup.max.depth        cgroup.subtree_control  cpuset.mems.effective  init.scope        io.stat           -.mount                        sys-kernel-tracing.mount
cgroup.max.descendants  cgroup.threads          cpu.stat               io.cost.model     memory.numa_stat  sys-fs-fuse-connections.mount  system.slice
cgroup.procs            cpu.pressure            dev-hugepages.mount    io.cost.qos       memory.pressure   sys-kernel-config.mount        user.slice
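That listing is what a cgroup v2 (unified) hierarchy looks like: the per-controller v1 directories, including devices, are gone. One quick way to check which version a host mounts:

```shell
# Print the filesystem type of /sys/fs/cgroup: "cgroup2fs" means the
# unified v2 hierarchy (no /sys/fs/cgroup/devices directory exists);
# "tmpfs" means the legacy cgroup v1 layout.
stat -fc %T /sys/fs/cgroup
```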

@reisholmes

Hello, Thanks for the tip. However, in my case it doesn't work. After investigating step by step, I've realized that I don't have a /sys/fs/cgroup/devices folder on any of my swarm nodes. Does anyone have an idea where this comes from?

sudo ls /sys/fs/cgroup
cgroup.controllers      cgroup.stat             cpuset.cpus.effective  dev-mqueue.mount  io.pressure       memory.stat                    sys-kernel-debug.mount
cgroup.max.depth        cgroup.subtree_control  cpuset.mems.effective  init.scope        io.stat           -.mount                        sys-kernel-tracing.mount
cgroup.max.descendants  cgroup.threads          cpu.stat               io.cost.model     memory.numa_stat  sys-fs-fuse-connections.mount  system.slice
cgroup.procs            cpu.pressure            dev-hugepages.mount    io.cost.qos       memory.pressure   sys-kernel-config.mount        user.slice

Also in this same situation. I was using this solution to pass through the iGPU driver to Plex on a docker swarm host for hardware transcoding: https://pastebin.com/XY7GP18T

I had some new hardware which required running the latest version of Ubuntu to recognise it, but that release uses cgroups v2. For the moment I reverted back to cgroups v1 via these instructions to get this working again: https://sleeplessbeastie.eu/2021/09/10/how-to-enable-control-group-v2/
Key bit (systemd.unified_cgroup_hierarchy=0 forces the legacy v1 hierarchy):
$ sudo sed -i -e 's/^GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
$ sudo update-grub
$ sudo reboot

I will experiment with moving to cgroups v2 and a combination of generic resource advertising the iGPU to the service as soon as I have time via these two hints as outlined by @coltonbh :

https://gist.github.com/coltonbh/374c415517dbeb4a6aa92f462b9eb287
https://docs.docker.com/compose/gpu-support/#enabling-gpu-access-to-service-containers

If anyone has any idea how to correctly advertise a quicksync driver to a cgroup v2 using dockerswarm it would be highly appreciated. Alternatively, I guess I could migrate to kubernetes ;)

@jvrobert

I'm getting strong "the perfect is the enemy of the good" vibes from this issue. Strongly in favor of just passing through the devices options and letting buyer beware.

@allfro

allfro commented Jul 3, 2023

I've written this hack and tried it with Plex, and it seems to work: https://github.com/allfro/device-mapping-manager. Essentially it runs a privileged container which listens for docker create events and inspects the mount points. If a mount is within the /dev folder, it walks the mount path for character and block devices and applies the necessary device rules to make the devices available. This doesn't work with fuse yet because the default apparmor profile blocks mounts (ugh!), but it does work with graphics cards and other devices that don't require operations blocked by Docker's apparmor profile. It is inspired by the previous comments.
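The core of that approach — walking a mounted path for device nodes and deriving whitelist rules — can be sketched roughly as follows. emit_device_rules is a hypothetical name, not the actual project's code; the real tool additionally handles the docker events plumbing and writes the rules into the container's cgroup:

```shell
# Rough sketch (an assumption, not device-mapping-manager itself): find
# character (c) and block (b) device nodes under a directory and print the
# cgroup-v1 rule that would whitelist each one.
emit_device_rules() {
    find "$1" \( -type c -o -type b \) 2>/dev/null | while read -r dev; do
        kind=c
        [ -b "$dev" ] && kind=b
        # stat prints the major (%t) and minor (%T) numbers in hex
        set -- $(stat -c '%t %T' "$dev")
        printf '%s %d:%d rwm\n' "$kind" "$((0x$1))" "$((0x$2))"
    done
}

emit_device_rules /dev
```

Each printed line has the same shape as the entries written to devices.allow in the earlier USB workaround.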

@allfro

allfro commented Jul 3, 2023

If anyone has any idea how to correctly advertise a quicksync driver to a cgroup v2 using dockerswarm it would be highly appreciated. Alternatively, I guess I could migrate to kubernetes ;)

@reisholmes check this out: https://github.com/allfro/device-mapping-manager
