Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing support for --cpu-rt-period and --cpu-rt-runtime #23430

Merged
merged 1 commit into from Oct 26, 2016

Conversation

@erikstmartin
Copy link
Contributor

commented Jun 10, 2016

- What I did
Added support for two new flags --cpu-rt-period and --cpu-rt-runtime combined with either the --privileged flag or --cap-add=sys_nice this allows processes within a container to prioritize themselves as real-time when CONFIG_RT_GROUP_SCHED is enabled in the kernel. See #22380. When this option is enabled each docker container's cgroup has to have cpu.rt_runtime_us set in order for the container to be able to prioritize its threads.

- How I did it

Support for changing the cgroup value already existed in runc, I updated all the necessary container configs to pass these values down to runc. I also implemented some sanity checks in the daemon.

- How to verify it
Ensure CONFIG_RT_GROUP_SCHED is enabled in the kernel (standard CentOS 7 install has this enabled). You can verify it's enabled by running
zcat /proc/config.gz | grep CONFIG_RT_GROUP_SCHED or by checking for the existence of /sys/fs/cgroup/cpu.rt_runtime_us

You will also need to ensure that the system and all parent cgroups have values set for the period and runtime as this limits what the children can be set to. If you're familiar with these values you can set them appropriately, sane defaults would be rt_period_us = 1000000 and rt_runtime_us = 950000, they only need to be set once on the system.

sysctl kernel.sched_rt_period_us
sysctl kernel.sched_rt_runtime_us

cat /sys/fs/cgroup/cpu.rt_period_us
cat /sys/fs/cgroup/cpu.rt_runtime_us

cat /sys/fs/cgroup/docker/cpu.rt_period_us
cat /sys/fs/cgroup/docker/cpu.rt_runtime_us

Update: You can now set the values for the parent cgroups with a new set of daemon flags. The system value will still need to be set using sysctl.
docker daemon --cpu-rt-runtime=950000

Now you have a system you can test on. To validate this works start a container with the new flags (we'll add a ulimit just to test that works as expected too)

docker run --it --cpu-rt-runtime=95000 --ulimit rtprio=99 --cap-add=sys_nice debian:jessie

If you receive an error like this, it's because your parent cgroups dont have rt_runtime_us set, or it's a lower value than you're trying to set for your container. Also ensure they are set for your container /sys/fs/cgroup/docker/<container id>/cpu.rt_runtime_us (after the container has been started). if you're testing this inside a container.

docker: Error response from daemon: oci runtime error: process_linux.go:286: setting cgroup config for ready process caused "failed to write 950000 to cpu.rt_runtime_us: write /sys/fs/cgroup/cpu,cpuacct/docker/5024d2730a83e86efd9042ea99972d582c66830f4a3aa21c747d1c25c824eb75/cpu.rt_runtime_us: invalid argument".

Once inside the container you can attempt to change the priority of your shell
chrt -r -p 99 1

Then validate by calling
chrt -p 1

root@c1d2f4b7d579:/# chrt -r -p 99 1
root@c1d2f4b7d579:/# chrt -p 1
pid 1's current scheduling policy: SCHED_RR
pid 1's current scheduling priority: 99

You can also test the following error conditions.

  • Passing either flag without having the sys_nice capability or being privileged results in error (no point setting your runtime if you dont have permission to change your priority)
  • Passing either flag on a host with the kernel option disabled results in error
  • Passing a runtime greater than the period results in an error

- Description for the changelog
Support for --cpu-rt-period and --cpu-rt-runtime flags, allowing containers to run real-time threads when CONFIG_RT_GROUP_SCHED is enabled in the kernel.

- A picture of a cute animal (not mandatory but encouraged)

Picture of Gopher

@erikstmartin erikstmartin force-pushed the erikstmartin:realtime-threads branch from bbbc519 to 8612a12 Jun 10, 2016

@justincormack

This comment has been minimized.

Copy link
Contributor

commented Jun 10, 2016

Thanks for working on this!

}

if hc.Resources.CPURtRuntime > 0 && !hasCap(hc, "SYS_NICE") {
return fmt.Errorf("invalid --cpu-rt-runtime: changing to rt priority requires sys_nice capability")

This comment has been minimized.

Copy link
@justincormack

justincormack Jun 10, 2016

Contributor

I don't think I would check this, there are potentially other ways in which we could get capabilities, so I would just check the return codes when you actually try to change stuff and try to give a helpful message if it fails.

This comment has been minimized.

Copy link
@erikstmartin

erikstmartin Jun 10, 2016

Author Contributor

Remove just the capabilities check? or all the param sanity checking?

Also you wont see any errors until you make the syscall to change your priority, which would likely just be a permissions error and not overly helpful.

This comment has been minimized.

Copy link
@justincormack

justincormack Jun 10, 2016

Contributor

The capabilities check, it is a bit ugly and encodes a lot of presumed information about how capabilities are assigned which could change in future, and we don't do it anywhere else. EPERM in the syscall is pretty clear and you can provide a more helpful message there.

This comment has been minimized.

Copy link
@erikstmartin

erikstmartin Jun 10, 2016

Author Contributor

I got overly ambition trying to save people from themselves. I'll get this removed.

@erikstmartin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 10, 2016

No problem, sorry for the crazy delay, been insanely caught up on other tasks.

Let me know if you have any questions/problems trying to test this. I admit it's a lot of steps to get setup to test if you don't already have this kernel param enabled, then the Docker inception thing, having to make sure the container you're testing inside of has rt_runtime. You almost need this feature to test this feature :)

@erikstmartin erikstmartin force-pushed the erikstmartin:realtime-threads branch from 8612a12 to 8b0b462 Jun 10, 2016

@erikstmartin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 10, 2016

I've removed the capabilities check and associated unit tests, PTAL.

@erikstmartin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2016

CI Failures are from the manual changes within vendor/ I was advised to do that for now here so that everything could be reviewed in one place.

Once docker/engine-api#262 is merged, this PR can be updated to pull in upstream and that failure will disappear.

@erikstmartin erikstmartin force-pushed the erikstmartin:realtime-threads branch from e936503 to 4aa30da Jun 13, 2016

@GordonTheTurtle GordonTheTurtle removed the dco/no label Jun 13, 2016

@thaJeztah

This comment has been minimized.

Copy link
Member

commented Jun 13, 2016

@erikstmartin yes, don't worry about the "failing-ci" label for now, it's just that we know it's failing (in this case due to the "vendor" CI). Should not be an issue while in design-review 👍

@erikstmartin

This comment has been minimized.

Copy link
Contributor Author

commented Jul 20, 2016

I've updated this PR to include a --cpu-rt-runtime flag to the daemon. The daemon needs to be able to set the cgroup value otherwise a sysadmin will need to set the value after the first container is started (after every reboot)

@erikstmartin erikstmartin force-pushed the erikstmartin:realtime-threads branch 2 times, most recently from 3a39af7 to fe2f46e Jul 20, 2016

@jalaziz

This comment has been minimized.

Copy link

commented Jul 21, 2016

I haven't had a chance to test this, but I believe the daemon flag won't actually work correctly with newer systemd versions. Until opencontainers/runc#931 is fixed, the cpu cgroup path ends up being /sys/fs/cgroup/cpu/init.scope/system.slice. In order for everything to work correctly, rt_runtime and rt_period have to be recursively set down the path.

@erikstmartin

This comment has been minimized.

Copy link
Contributor Author

commented Jul 22, 2016

@jalaziz I'll double check, but i'm pretty sure I was developing this against 230.

@jalaziz

This comment has been minimized.

Copy link

commented Jul 25, 2016

@erikstmartin For reference, we're running CoreOS with systemd v229. However, I believe the problem exists on other OSes with newer systemd versions if you use --exec-opt native.cgroupdriver=systemd (I'm sure you already knew this).

I'm not sure if the cgroupParent is correct for the CPU subsystem when you call initCgroupsPath. Even if it is, though, the testing I've done seems to indicate you'd have to recursively set the runtime and period settings (e.g. for init.scope, then for init.scope/system.slice).

@erikstmartin

This comment has been minimized.

Copy link
Contributor Author

commented Aug 8, 2016

@jalaziz You're completely correct. I dont think I tried it using the systemd cgroup driver on that machine.

@erikstmartin erikstmartin force-pushed the erikstmartin:realtime-threads branch 3 times, most recently from 269ee59 to 1b749b4 Aug 16, 2016

@erikstmartin

This comment has been minimized.

Copy link
Contributor Author

commented Aug 17, 2016

Latest push should handle climbing the cgroup tree and also account for init.scope in newer versions of systemd

@cpuguy83

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2016

LGTM

@cpuguy83

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2016

Moving to docs.

@thaJeztah
Copy link
Member

left a comment

docs LGTM

may need some changes in other parts of the docs after this is merged, so

ping @mstanleyjones

@vdemeester
Copy link
Member

left a comment

LGTM 🐸

@vdemeester vdemeester merged commit 80d6d2e into moby:master Oct 26, 2016

4 checks passed

docker/dco-signed All commits signed
Details
experimental Jenkins build Docker-PRs-experimental 25470 has succeeded
Details
janky Jenkins build Docker-PRs 34071 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 4942 has succeeded
Details
@cpuguy83

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2016

🎉
Thanks @erikstmartin!

thaJeztah added a commit to thaJeztah/docker that referenced this pull request Jan 16, 2017

Add version annotation to various flags added in 1.13
Pull request moby#27745 added support for the
client to talk to older versions of the daemon. Various flags were added to
docker 1.13 that are not compatible with older daemons.

This PR adds annotations to those flags, so that they are automatically hidden
if the daemon is older than docker 1.13 (API 1.25).

Not all new flags affect the API (some are client-side only). The following
PR's added new flags to docker 1.13 that affect the API;

- moby#23430 added `--cpu-rt-period`and `--cpu-rt-runtime`
- moby#27800 / moby#25317 added `--group` / `--group-add` / `--group-rm`
- moby#27702 added `--network` to `docker build`
- moby#25962 added `--attachable` to `docker network create`
- moby#27998 added `--compose-file` to `docker stack deploy`
- moby#22566 added `--stop-timeout` to `docker run` and `docker create`
- moby#26061 added `--init` to `docker run` and `docker create`
- moby#26941 added `--init-path` to `docker run` and `docker create`
- moby#27958 added `--cpus` on `docker run` / `docker create`
- moby#27567 added `--dns`, `--dns-opt`, and `--dns-search` to `docker service create`
- moby#27596 added `--force` to `docker service update`
- moby#27857 added `--hostname` to `docker service create`
- moby#28031 added `--hosts`, `--host-add` / `--host-rm` to `docker service create` and `docker service update`
- moby#28076 added `--tty` on `docker service create` / `docker service update`
- moby#26421 added `--update-max-failure-ratio`, `--update-monitor` and `--rollback` on `docker service update`
- moby#27369 added `--health-cmd`, `--health-interval`, `--health-retries`, `--health-timeout` and `--no-healthcheck` options to `docker service create` and `docker service update`

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@IT-huang

This comment has been minimized.

Copy link

commented Mar 1, 2017

@erikstmartin thank you for reading my question.
i meet the same error you said.i submit my question in #31411
you said 'If you receive an error like this, it's because your parent cgroups dont have rt_runtime_us set, or it's a lower value than you're trying to set for your container.'
however i can not change docker cgroup 's rt_runtime_us to more than 0.
can you tell me how you solve this error??
Thank you very much !!!Thank you !

@erikstmartin

This comment has been minimized.

Copy link
Contributor Author

commented Mar 14, 2017

@IT-huang This is going to be dependent on what the parent cgroup of docker is, is it user.slice, etc. You need to make sure that the parents also have rt_runtime allocated. This could also be at the system level of the host itself. Check sysctl kernel.sched_rt_runtime_us and sysctl kernel.sched_rt_period_us

erikstmartin added a commit to erikstmartin/docker that referenced this pull request Mar 14, 2017

Add mount point to cgroup root when initializing cgroup paths for
cpu.rt_runtime

PR moby#23430 introduced a couple more
flags including `--cpu-rt-runtime` to the docker daemon. It appears
recent changes or merge issues may have broken this. It currently does
not take the cgroup mount point into account when determining the cgroup
files to write values to. This breaks docker setting its own
`cpu.rt_runtime` for the daemon. This also means containers aren't able
to set theirs.

Also, the cgroups.FindCgroupMountpointAndRoot returns back a mount point
that includes the cgroup of the currently running container when docker
is run inside a docker container. this breaks the `--cpu-rt-runtime`
flag when running docker in docker. A fix has been placed here, but
potentially could be pulled up into libcontainer if this is a better
place for it.

Signed-off-by: Erik St. Martin <alakriti@gmail.com>

adhulipa added a commit to adhulipa/docker that referenced this pull request Apr 7, 2017

Add mount point to cgroup root when initializing cgroup paths for
cpu.rt_runtime

PR moby#23430 introduced a couple more
flags including `--cpu-rt-runtime` to the docker daemon. It appears
recent changes or merge issues may have broken this. It currently does
not take the cgroup mount point into account when determining the cgroup
files to write values to. This breaks docker setting its own
`cpu.rt_runtime` for the daemon. This also means containers aren't able
to set theirs.

Also, the cgroups.FindCgroupMountpointAndRoot returns back a mount point
that includes the cgroup of the currently running container when docker
is run inside a docker container. this breaks the `--cpu-rt-runtime`
flag when running docker in docker. A fix has been placed here, but
potentially could be pulled up into libcontainer if this is a better
place for it.

Signed-off-by: Erik St. Martin <alakriti@gmail.com>

andrewhsu pushed a commit to docker/docker-ce that referenced this pull request May 19, 2017

Add version annotation to various flags added in 1.13
Pull request moby/moby#27745 added support for the
client to talk to older versions of the daemon. Various flags were added to
docker 1.13 that are not compatible with older daemons.

This PR adds annotations to those flags, so that they are automatically hidden
if the daemon is older than docker 1.13 (API 1.25).

Not all new flags affect the API (some are client-side only). The following
PR's added new flags to docker 1.13 that affect the API;

- moby/moby#23430 added `--cpu-rt-period`and `--cpu-rt-runtime`
- moby/moby#27800 / moby/moby#25317 added `--group` / `--group-add` / `--group-rm`
- moby/moby#27702 added `--network` to `docker build`
- moby/moby#25962 added `--attachable` to `docker network create`
- moby/moby#27998 added `--compose-file` to `docker stack deploy`
- moby/moby#22566 added `--stop-timeout` to `docker run` and `docker create`
- moby/moby#26061 added `--init` to `docker run` and `docker create`
- moby/moby#26941 added `--init-path` to `docker run` and `docker create`
- moby/moby#27958 added `--cpus` on `docker run` / `docker create`
- moby/moby#27567 added `--dns`, `--dns-opt`, and `--dns-search` to `docker service create`
- moby/moby#27596 added `--force` to `docker service update`
- moby/moby#27857 added `--hostname` to `docker service create`
- moby/moby#28031 added `--hosts`, `--host-add` / `--host-rm` to `docker service create` and `docker service update`
- moby/moby#28076 added `--tty` on `docker service create` / `docker service update`
- moby/moby#26421 added `--update-max-failure-ratio`, `--update-monitor` and `--rollback` on `docker service update`
- moby/moby#27369 added `--health-cmd`, `--health-interval`, `--health-retries`, `--health-timeout` and `--no-healthcheck` options to `docker service create` and `docker service update`

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 5d2722f83db9e301c6dcbe1c562c2051a52905db
Component: cli

albers added a commit to albers/docker-cli that referenced this pull request Jul 3, 2017

Add bash completion for `dockerd --cpu-rt-(period|runtime)`
This adds bash completion for moby/moby#23430.

Signed-off-by: Harald Albers <github@albersweb.de>

andrewhsu pushed a commit to docker/docker-ce that referenced this pull request Jul 14, 2017

Add bash completion for `dockerd --cpu-rt-(period|runtime)`
This adds bash completion for moby/moby#23430.

Signed-off-by: Harald Albers <github@albersweb.de>
Upstream-commit: 74a5d1af86a3d0898dda79bb13282f1b039b015c
Component: cli

alshabib added a commit to alshabib/cli that referenced this pull request Aug 1, 2017

Add bash completion for `dockerd --cpu-rt-(period|runtime)`
This adds bash completion for moby/moby#23430.

Signed-off-by: Harald Albers <github@albersweb.de>

srust added a commit to srust/moby that referenced this pull request Nov 30, 2017

Add version annotation to various flags added in 1.13
Pull request moby#27745 added support for the
client to talk to older versions of the daemon. Various flags were added to
docker 1.13 that are not compatible with older daemons.

This PR adds annotations to those flags, so that they are automatically hidden
if the daemon is older than docker 1.13 (API 1.25).

Not all new flags affect the API (some are client-side only). The following
PR's added new flags to docker 1.13 that affect the API;

- moby#23430 added `--cpu-rt-period`and `--cpu-rt-runtime`
- moby#27800 / moby#25317 added `--group` / `--group-add` / `--group-rm`
- moby#27702 added `--network` to `docker build`
- moby#25962 added `--attachable` to `docker network create`
- moby#27998 added `--compose-file` to `docker stack deploy`
- moby#22566 added `--stop-timeout` to `docker run` and `docker create`
- moby#26061 added `--init` to `docker run` and `docker create`
- moby#26941 added `--init-path` to `docker run` and `docker create`
- moby#27958 added `--cpus` on `docker run` / `docker create`
- moby#27567 added `--dns`, `--dns-opt`, and `--dns-search` to `docker service create`
- moby#27596 added `--force` to `docker service update`
- moby#27857 added `--hostname` to `docker service create`
- moby#28031 added `--hosts`, `--host-add` / `--host-rm` to `docker service create` and `docker service update`
- moby#28076 added `--tty` on `docker service create` / `docker service update`
- moby#26421 added `--update-max-failure-ratio`, `--update-monitor` and `--rollback` on `docker service update`
- moby#27369 added `--health-cmd`, `--health-interval`, `--health-retries`, `--health-timeout` and `--no-healthcheck` options to `docker service create` and `docker service update`

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.