
untar cpu usage #31765

Open
furkanmustafa opened this issue Mar 12, 2017 · 10 comments

Comments

@furkanmustafa

Description

When pulling images, docker's untar process uses 100% CPU, and IMHO it should definitely be nice'd. Has this been considered, and is there any reason not to nice untar operations?

Additional Reasoning:

Not limited to this, but when using docker swarm mode, if a manager node is also scheduling containers on itself, it loses its leadership almost immediately whenever the CPU is not fast enough to satisfy both untar and the mysteriously CPU-hungry swarm mode (3 managers out of 5 hosts in total).
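For reference, one host-side workaround (outside of Docker itself) would be to renice those extraction processes from the outside. Below is a rough Go sketch, assuming the processes show up in /proc with a comm of docker-untar or docker-tar; the process names, polling interval, and nice value are illustrative only.

// renice-untar.go: host-side workaround sketch, not part of Docker.
// Scans /proc for processes whose comm is "docker-untar" or "docker-tar"
// and lowers their scheduling priority. Requires root (or CAP_SYS_NICE).
package main

import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
	"time"
)

func main() {
	for {
		procDirs, _ := filepath.Glob("/proc/[0-9]*")
		for _, dir := range procDirs {
			comm, err := os.ReadFile(filepath.Join(dir, "comm"))
			if err != nil {
				continue // the process may have exited already
			}
			name := strings.TrimSpace(string(comm))
			if name != "docker-untar" && name != "docker-tar" {
				continue
			}
			pid, err := strconv.Atoi(filepath.Base(dir))
			if err != nil {
				continue
			}
			// Nice value 19 = lowest priority; best effort, ignore errors.
			_ = syscall.Setpriority(syscall.PRIO_PROCESS, pid, 19)
		}
		time.Sleep(2 * time.Second)
	}
}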

Output of docker version:

Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 11:02:43 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 11:02:43 2017
 OS/Arch:      linux/amd64
 Experimental: false  

Not sure whether this qualifies as a bug or a feature request.

@aluzzardi
Member

Out of curiosity, what kind of tarball is it extracting on which kind of machines?

Also, what kind of storage media is the machine writing to (is this a regular disk, NFS, or ...)?

@furkanmustafa
Author

furkanmustafa commented Mar 17, 2017

what kind of tarball

A simple Rails Docker image; a bit large at around 1 GB, but not too much.

which kind of machines

Single-core VPSes with very limited CPU; the bottleneck was the CPU.

what kind of storage media

We don't know how the provider abstracts disks, but as far as the "machine" is concerned, they are just plain disks.

To make our situation simple to understand:

  • We have 5 machines in total in our docker swarm; 3 nodes are managers.
  • All nodes are going to run docker containers.
  • We run the command docker stack deploy .. on one of the managers.
  • All machines start to pull docker images as containers get scheduled on them; this includes the managers.
  • While pulling the docker images, the manager machines lose their leadership and cannot communicate with a leader properly; lots of "context deadline exceeded" errors happen. Because the docker untar process has the same priority as the swarm management tasks, and untar is pegging the CPU, the swarm managers are not able to communicate.
  • It is also confirmed that this is not a network issue: docker registry bandwidth was around 80 Mbps per host, and while that was going on I ran iperf3 tests between the machines and achieved over 2 Gbps on all of them, with ping latencies around 0~2 milliseconds.

As a double check, we ran docker pull on every host manually and waited for it to complete before actually running docker stack deploy .... This fixed the issue: we never hit the "context deadline exceeded" problem this way, the leader machine did not lose leadership, and none of the machines became "Unreachable" in the process.
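For reference, that manual pre-pull step could also be scripted against the Engine API instead of shelling out to docker pull. A minimal Go sketch, assuming the github.com/docker/docker client package and a placeholder image name; registry auth and progress reporting are omitted.

// prepull.go: sketch of pre-pulling images on a host before
// "docker stack deploy", so the extraction cost is paid up front.
// Image names are placeholders; registry auth is not handled.
package main

import (
	"context"
	"io"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	images := []string{"registry.example.com/rails-app:latest"} // placeholder
	for _, ref := range images {
		rc, err := cli.ImagePull(context.Background(), ref, types.ImagePullOptions{})
		if err != nil {
			log.Fatalf("pull %s: %v", ref, err)
		}
		// Draining the stream waits for the pull (and layer extraction) to finish.
		if _, err := io.Copy(io.Discard, rc); err != nil {
			log.Fatalf("pull %s: %v", ref, err)
		}
		rc.Close()
		log.Printf("pulled %s", ref)
	}
}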

Possible solutions or improvements for this, in my opinion, are:

  • Separate the docker swarm process from other docker processes, and increase its priority.
  • Tune raft, or give users ways to tune raft parameters, so that it works more robustly or is more forgiving of small delays.
  • Of course, increasing the 'nice' value of the untar process is needed regardless; that clearly should be a low-priority task, not only so that swarm can communicate properly (a rough sketch follows below).
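To illustrate that last point, here is a rough sketch of what nice'ing the extraction could look like; this is not the actual moby code. archive/tar stands in for Docker's layer extraction, the destination path /tmp/extract is a placeholder, and the nice value of 10 is arbitrary.

// Sketch only: a child process lowering its own CPU priority before
// extracting a tar stream; not the actual moby code.
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
	"path/filepath"
	"syscall"
)

func main() {
	// Raise our own nice value (lower priority); no privileges needed in this direction.
	if err := syscall.Setpriority(syscall.PRIO_PROCESS, 0, 10); err != nil {
		log.Printf("setpriority: %v (continuing at default priority)", err)
	}

	tr := tar.NewReader(os.Stdin) // tar stream on stdin, for illustration
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		target := filepath.Join("/tmp/extract", hdr.Name) // placeholder destination
		switch hdr.Typeflag {
		case tar.TypeDir:
			os.MkdirAll(target, os.FileMode(hdr.Mode))
		case tar.TypeReg:
			os.MkdirAll(filepath.Dir(target), 0o755)
			f, err := os.OpenFile(target, os.O_CREATE|os.O_WRONLY, os.FileMode(hdr.Mode))
			if err != nil {
				log.Fatal(err)
			}
			io.Copy(f, tr)
			f.Close()
		}
	}
}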

@furkanmustafa
Author

Looks like no one cared.

I have not tried the latest versions, but this is a serious issue with possibly easy solutions. Wouldn't it be better to fix this for 17.03, or at least for some version?

@thaJeztah
Member

ping @unclejack - perhaps something of interest to you?

@furkanmustafa
Author

Checking back after another year: no one cared.

Even though it's a very simple and good improvement.

@thaJeztah
Member

@furkanmustafa you're welcome to work on it if you want

@furkanmustafa
Author

I'd love to, as I use docker every day at work. But it would take a huge warm-up / study time to dive into the code.

I'll see if someone in our company can dedicate some time to this.

@thaJeztah
Member

Just note that "no one cared" doesn't sound nice 😅. It's usually not that people "don't care", but there is a finite amount of engineering time, so priorities have to be set; community participation can definitely help in that respect, though.

For this enhancement, it would also be worth checking whether containerd handles this better; the goal is to delegate image management (including pulling and extracting images) to containerd. This migration is not trivial, but it is actively being worked on, so for the long term it's best to focus on the image handling in containerd and see if improvements should be made there 👍

@furkanmustafa
Author

Just note that "no one cared" doesn't sound nice

Didn't mean to.

Thanks for the insight, that's helpful and makes sense.

@mishari

mishari commented Nov 28, 2018

Hi,

I seem to be facing this scenario, perhaps? Docker's docker-tar process is taking up all my CPU, pushing load levels up to 14 on a dual-core CPU.

Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov  7 00:48:46 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:16:44 2018
  OS/Arch:          linux/amd64
  Experimental:     false

[screenshot attachment: tmux session over ssh]
