New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented memory and CPU limits for LCOW. #37296

Open
wants to merge 1 commit into
base: master
from

Conversation

@yusuf-gunaydin
Copy link

yusuf-gunaydin commented Jun 16, 2018

Signed-off-by: Yusuf Tarık Günaydın yusuf_tarik@hotmail.com

- What I did
Added implementation of --memory and --cpu switches for LCOW.

- How I did it
Filled the Resources object in spec.Windows for both Windows and Linux by separating it into a new method. Also separated the code that fills the CPU and memory part of the hcsshim configuration and called it for both Windows and Linux.

- How to verify it
docker run -it --memory=2G --cpus=8 alpine sh -c "nproc && cat /proc/meminfo"

- Description for the changelog

Added memory and CPU limits for LCOW.

- A picture of a cute animal (not mandatory but encouraged)

@mlaventure

This comment has been minimized.

Copy link
Contributor

mlaventure commented Jun 18, 2018

You need to run gofmt and goimport on the modified files.

Also, could you add unit tests?

@yusuf-gunaydin

This comment has been minimized.

Copy link

yusuf-gunaydin commented Jun 19, 2018

I use VSCode and it automatically applies those tools on save. I cannot see any formatting or importing issues. If there is, can yo point me to it?

For unit testing, I can add test to https://github.com/moby/moby/blob/master/integration-cli/docker_cli_run_test.go, however, the docs says it is deprecated. I couldn't find a way to launch LCOW test from the new integration test suite.

By the way #37294 is very similar to this one and it is getting merged without any tests.

@thaJeztah

This comment has been minimized.

Copy link
Member

thaJeztah commented Jun 25, 2018

Yeah, LCOW doesn't currently run in CI here (I think that's being worked on)

ping @johnstep @AntaresS ?

@yusuf-gunaydin looks like this needs a rebase (it shows there's a merge conflict); can you update the PR?

@GordonTheTurtle

This comment has been minimized.

Copy link

GordonTheTurtle commented Jun 26, 2018

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "lcow_limits" git@github.com:yusuf-gunaydin/moby.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354292296
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@yusuf-gunaydin yusuf-gunaydin force-pushed the yusuf-gunaydin:lcow_limits branch from 414577b to c98b26a Jun 26, 2018

@GordonTheTurtle GordonTheTurtle removed the dco/no label Jun 26, 2018

@yusuf-gunaydin

This comment has been minimized.

Copy link

yusuf-gunaydin commented Jun 27, 2018

@thaJeztah Resolved the conflict. However, I am not sure why Janky build fails. The failing test is TestSwarmClusterRotateUnlockKey and it doesn't seem to be related to this PR.

@lahma

This comment has been minimized.

Copy link

lahma commented Aug 17, 2018

May I enquire the status of this? I tested @yusuf-gunaydin 's branch and it indeed resolved the memory starvation we had by allowing more memory to be assigned via command line.

@thaJeztah

This comment has been minimized.

Copy link
Member

thaJeztah commented Aug 20, 2018

ping @johnstep @jhowardmsft PTAL

@yusuf-gunaydin looks like there's a merge-commit in this PR; could you rebase your branch, and squash the commits down to a single commit? Let me know if you need help doing so

@lahma

This comment has been minimized.

Copy link

lahma commented Aug 20, 2018

FYI, the memory limit actually works, but CPU doesn't. the configuration is valid and goes to daemon as expected, but Windows doesn't yet handle the parameter properly. So when --cpu 20 is used, container always only sees 1.

@yusuf-gunaydin yusuf-gunaydin force-pushed the yusuf-gunaydin:lcow_limits branch from 4e28575 to 74c57a3 Aug 22, 2018

@codecov

This comment has been minimized.

Copy link

codecov bot commented Aug 22, 2018

Codecov Report

❗️ No coverage uploaded for pull request base (master@c7444a4). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #37296   +/-   ##
=========================================
  Coverage          ?   36.62%           
=========================================
  Files             ?      608           
  Lines             ?    45173           
  Branches          ?        0           
=========================================
  Hits              ?    16545           
  Misses            ?    26341           
  Partials          ?     2287

@yusuf-gunaydin yusuf-gunaydin force-pushed the yusuf-gunaydin:lcow_limits branch from 74c57a3 to 062ac48 Aug 22, 2018

@yusuf-gunaydin

This comment has been minimized.

Copy link

yusuf-gunaydin commented Aug 22, 2018

@thaJeztah I have applied the rebase. I assume the failing builds are unrelated as other pull requests fail with the same errors.

@lahma I tested with this command and result is correct:

docker run --rm -it --cpus=8 alpine sh -c "nproc"
8

This PR changes the limits on the created Hyper-V machine. I don't think there is an issue with Windows not supporting multiple CPUs with Hyper-V.

@lahma

This comment has been minimized.

Copy link

lahma commented Aug 22, 2018

@yusuf-gunaydin sorry, I left out the fact that my use case is with Windows Server 2019 Preview (build 17733 if I remember right). So it has lacking support for the CPU parameter at least per my testing (/proc/cpuinfo). Bleeding edge with quirks...

@jhowardmsft

This comment has been minimized.

Copy link
Contributor

jhowardmsft commented Aug 22, 2018

@beweedon just in case. As mentioned offline

@richardgavel

This comment has been minimized.

Copy link

richardgavel commented Sep 23, 2018

If all the discussion with this PR is related to CPU and memory is fine, can a PR be generated which just allows LCOW to be used with more than the default (which makes SQL Server for Linux impossible to use with LCOW)? This has been an issue for months.

@jhowardmsft
Copy link
Contributor

jhowardmsft left a comment

I'm entirely in favour of these changes. In the current docker model where there's only one container per utility VM, this is the best that can really be done and mirrors what we do with Hyper-V WCOW containers. However, there was some pushback in previous PRs which did essentially exactly the same, IIRC from @thaJeztah. I can't recall what the pushback was. I believe there have been multiple attempts at this previously.

However, this SGTM. @johnstep PTAL

@@ -355,6 +320,48 @@ func (daemon *Daemon) createSpecLinuxFields(c *container.Container, s *specs.Spe
return nil
}

func setWindowsResourceLimits(c *container.Container, s *specs.Spec, isHyperV bool) {

This comment has been minimized.

@jhowardmsft

jhowardmsft Sep 24, 2018

Contributor

Nit - should this just be called SetResourcesInSpec to be more obvious what it's doing?

@@ -585,6 +564,36 @@ func (c *client) createLinux(id string, spec *specs.Spec, runtimeOptions interfa
return nil
}

func (c *client) setWindowsCPUMemoryLimits(spec *specs.Spec, configuration *hcsshim.ContainerConfig) {

This comment has been minimized.

@jhowardmsft

jhowardmsft Sep 24, 2018

Contributor

Nit - Might be more obvious if this was called something more like extractResourcesFromSpec

@jhowardmsft

This comment has been minimized.

Copy link
Contributor

jhowardmsft commented Sep 24, 2018

Oh one more comment - these are containers on Windows, and the right thing to be controlling these things on Windows as it relates to the Utility VM for both WCOW and LCOW, is these two fields in the CLI: ---cpu-count and --cpu-percent, as opposed to --cpus. This controls the Hyper-V scheduler.

From docker run --help

      --cpu-count int                  CPU count (Windows only)
      --cpu-percent int                CPU percent (Windows only)

@yusuf-gunaydin yusuf-gunaydin force-pushed the yusuf-gunaydin:lcow_limits branch from 7ec2d61 to 201746c Sep 26, 2018

@yusuf-gunaydin

This comment has been minimized.

Copy link

yusuf-gunaydin commented Sep 27, 2018

@jhowardmsft I have made the requested changes.

@jhowardmsft

This comment has been minimized.

Copy link
Contributor

jhowardmsft commented Sep 28, 2018

I don't think this is your change as it seems the same on master using Windows containers, but it's triggering against --cpus, as well as --cpu-count. I'm not sure when start came into being, but I wasn't the one who did most of the resource management implementation. Have verified it operates as expected for both memory and CPU, with the right document being passed to HCS. Example here for 5GB ram and 8 proc:

config={"SystemType":"container","Name":"aa690abd1c6b0894fc837d2a2e67212454ce305ff24d86f1008721cc7c956139","Owner":"docker","LayerFolderPath":"C:\\control\\lcow\\aa690abd1c6b0894fc837d2a2e67212454ce305ff24d86f1008721cc7c956139","Layers":[{"ID":"2e2ab5f1-6ea9-557b-af6c-c401a2a3db37","Path":"C:\\control\\lcow\\fb27f0fc7fbad278718cab180f3e0dcc238fc8509560b5db3e79804b0de1ec4a\\layer.vhd"}],"ProcessorCount":8,"MemoryMaximumInMB":5120,"HvPartition":true,"EndpointList":["F25FBCB5-9740-4B07-81E7-6B581DD1AACD"],"HvRuntime":{"ImagePath":"C:\\Program Files\\Linux Containers","LinuxInitrdFile":"initrd.img","LinuxKernelFile":"kernel"},"AllowUnqualifiedDNSQuery":true,"ContainerType":"linux","TerminateOnLastHandleClosed":true}

So LGTM. @johnstep & @thaJeztah PTAL.

PS E:\go\src\github.com\docker\docker> docker run -it --cpu-count=8 alpine sh -c "nproc"
8
PS E:\go\src\github.com\docker\docker> docker run -it --cpus=8 alpine sh -c "nproc"
8
PS E:\go\src\github.com\docker\docker> git branch
* lcow_limits
...snip....
PS E:\go\src\github.com\docker\docker> git log -1
commit 201746c83bca137ec338b71f7d0c650471dd2e3b (HEAD -> lcow_limits, yusuf/lcow_limits)
Author: Yusuf Tar<C4><B1>k G<C3><BC>nayd<C4><B1>n <yusuf_tarik@hotmail.com>
Date:   Sat Jun 16 10:01:35 2018 +0300

    Implemented memory and CPU limits for LCOW.

    Signed-off-by: Yusuf Tar<C4><B1>k G<C3><BC>nayd<C4><B1>n <yusuf_tarik@hotmail.com>
PS E:\go\src\github.com\docker\docker> docker version
...snip....
Server:
 Engine:
  Version:          0.0.0-dev
  API version:      1.39 (minimum version 1.24)
  Go version:       go1.11
  Git commit:       201746c83b
  Built:            09/28/2018 00:08:04
  OS/Arch:          windows/amd64
  Experimental:     true
PS E:\go\src\github.com\docker\docker>
@jazzdelightsme

This comment has been minimized.

Copy link

jazzdelightsme commented Nov 9, 2018

It looks like this change is ready to go. @thaJeztah can this please be merged?

@Iristyle

This comment has been minimized.

Copy link

Iristyle commented Nov 29, 2018

Also hoping that some form of this can be merged as we're running into problems running Linux containers on Windows due to OOM errors.

@MrLemur

This comment has been minimized.

Copy link

MrLemur commented Nov 30, 2018

As far as I can see after compiling moby from source, I am able to get memory limits working even in Docker compose. Think we are just waiting on this to be merged.

@thaJeztah thaJeztah added this to backlog in maintainers-session Nov 30, 2018

Iristyle added a commit to Iristyle/puppet-agent that referenced this pull request Dec 5, 2018

(maint) Disable Alpine agent builds in Azure
 - There are 2 current issues with building the Alpine based agent
   container in Azure:

   - LCOW doesn't support memory limits on containers and gcc currently
     gets an OOM (out of memory) error during compilation. This issue is
     documented in a few places, notably:

		 docker/for-win#1357
     moby/moby#37296

   - Multi-stage builds aren't fully supported in LCOW, though some
     portions are working. Of note, the following issue is still open:

		 Microsoft/opengcs#156

     And other issues have been resolved:

		 Microsoft/opengcs#168
     Microsoft/opengcs#169

     The LCOW "epic" should eventually be updated when support is complete:

     moby/moby#33850

     It's difficult at this time to tell if the current multi-stage
     support is adequate given the blocking memory problem. It may very
     well be enough.

 - For now, disable the Azure pipelines steps for building until the
   point at which they can be re-enabled.

Iristyle added a commit to Iristyle/puppet-agent that referenced this pull request Dec 5, 2018

(maint) Disable Alpine agent builds in Azure
 - There are 2 current issues with building the Alpine based agent
   container in Azure:

   - LCOW doesn't support memory limits on containers and gcc currently
     gets an OOM (out of memory) error during compilation. This issue is
     documented in a few places, notably:

		 docker/for-win#1357
     moby/moby#37296

   - Multi-stage builds aren't fully supported in LCOW, though some
     portions are working. Of note, the following issue is still open:

		 Microsoft/opengcs#156

     And other issues have been resolved:

		 Microsoft/opengcs#168
     Microsoft/opengcs#169

     The LCOW "epic" should eventually be updated when support is complete:

     moby/moby#33850

     It's difficult at this time to tell if the current multi-stage
     support is adequate given the blocking memory problem. It may very
     well be enough.

 - For now, disable the Azure pipelines steps for building until the
   point at which they can be re-enabled.

Iristyle added a commit to Iristyle/puppet-agent that referenced this pull request Dec 5, 2018

(maint) Disable Alpine agent builds in Azure
 - There are 2 current issues with building the Alpine based agent
   container in Azure:

   - LCOW doesn't support memory limits on containers and gcc currently
     gets an OOM (out of memory) error during compilation. This issue is
     documented in a few places, notably:

		 docker/for-win#1357
     moby/moby#37296

   - Multi-stage builds aren't fully supported in LCOW, though some
     portions are working. Of note, the following issue is still open:

		 Microsoft/opengcs#156

     And other issues have been resolved:

		 Microsoft/opengcs#168
     Microsoft/opengcs#169

     The LCOW "epic" should eventually be updated when support is complete:

     moby/moby#33850

     It's difficult at this time to tell if the current multi-stage
     support is adequate given the blocking memory problem. It may very
     well be enough.

 - For now, disable the Azure pipelines steps for building until the
   point at which they can be re-enabled.
@jhowardmsft

This comment has been minimized.

Copy link
Contributor

jhowardmsft commented Jan 7, 2019

@yusuf-gunaydin Would you be able to rebase this? Let's see if we can have another run at trying to get this in as there's so many people indicating that they have workloads that can't run due to being memory constrained with no means to configure it beyond the default.

@akiller

This comment has been minimized.

Copy link

akiller commented Jan 7, 2019

We're also running into this trying to serve a Tensorflow model container, it needs ~5GB of memory.

I'm new to Docker so apologies if this is a stupid/irrelevant question here but if this pull request gets accepted how long would it be expected to take before it gets incorporated into Docker Enterprise? We're running Server 2019 which I believe is what it ships with.

Implemented memory and CPU limits for LCOW.
Signed-off-by: Yusuf Tarık Günaydın <yusuf_tarik@hotmail.com>

@yusuf-gunaydin yusuf-gunaydin force-pushed the yusuf-gunaydin:lcow_limits branch from 201746c to 7744f44 Jan 12, 2019

@yusuf-gunaydin

This comment has been minimized.

Copy link

yusuf-gunaydin commented Jan 12, 2019

@jhowardmsft I have rebased again. Let's hope this would be the last one. 😇

@qacollective

This comment has been minimized.

Copy link

qacollective commented Jan 16, 2019

I've come here after a long voyage and would like to know how long after merge we should expect to see these changes in Docker for Windows. It's amazing how hard it has been to simply stand up an SQL Server back end container plus a ASP.NET IIS front end container that can communicate!! LCOW is currently being limited by the inability to have a container use more than 2Gb of RAM. Merge, go go go!

@jhowardmsft

This comment has been minimized.

Copy link
Contributor

jhowardmsft commented Jan 16, 2019

@johnstep @thaJeztah Please can you take a look? There are so many people waiting on this long-standing change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment