
untar cpu usage #31765

Open
furkanmustafa opened this issue Mar 12, 2017 · 10 comments

Comments

@furkanmustafa

Description

When pulling images, docker's untar process uses 100% CPU, and IMHO it should definitely be nice'd. Has this been considered, and is there any reason not to nice untar operations?

Additional Reasoning:

Not limited to this, but when using docker swarm mode, if a manager node is also scheduling containers on itself, it loses its leadership almost immediately whenever the CPU is not fast enough to satisfy both untar and the mysteriously CPU-hungry swarm mode (3 managers out of 5 hosts in total).
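For reference, one host-side workaround (outside of Docker itself) would be to renice those extraction processes from the outside. Below is a rough Go sketch, assuming the processes show up in /proc with a comm of docker-untar or docker-tar; the process names, polling interval, and nice value are illustrative only.

// renice-untar.go: host-side workaround sketch, not part of Docker.
// Scans /proc for processes whose comm is "docker-untar" or "docker-tar"
// and lowers their scheduling priority. Requires root (or CAP_SYS_NICE).
package main

import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
	"time"
)

func main() {
	for {
		procDirs, _ := filepath.Glob("/proc/[0-9]*")
		for _, dir := range procDirs {
			comm, err := os.ReadFile(filepath.Join(dir, "comm"))
			if err != nil {
				continue // the process may have exited already
			}
			name := strings.TrimSpace(string(comm))
			if name != "docker-untar" && name != "docker-tar" {
				continue
			}
			pid, err := strconv.Atoi(filepath.Base(dir))
			if err != nil {
				continue
			}
			// Nice value 19 = lowest priority; best effort, ignore errors.
			_ = syscall.Setpriority(syscall.PRIO_PROCESS, pid, 19)
		}
		time.Sleep(2 * time.Second)
	}
}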

Output of docker version:

Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 11:02:43 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 11:02:43 2017
 OS/Arch:      linux/amd64
 Experimental: false  

Not sure whether this qualifies as a bug or a feature request.

@aluzzardi
Member

Out of curiosity, what kind of tarball is it extracting on which kind of machines?

Also, what kind of storage media is the machine writing to (is this a regular disk, NFS, or ...)?

@furkanmustafa
Author

furkanmustafa commented Mar 17, 2017

what kind of tarball

A simple Rails Docker image; a bit large at around 1 GB, but not too much.

which kind of machines

Single-core VPSes with very limited CPU; the bottleneck was the CPU.

what kind of storage media

We don't know how the provider abstracts disks, but as far as the "machine" is concerned, they are just plain disks.

To make our situation simple to understand:

  • We have 5 machines in total in our docker swarm; 3 nodes are managers.
  • All nodes are going to run docker containers.
  • We run the command docker stack deploy .. on one of the managers.
  • All machines start to pull docker images as containers get scheduled on them; this includes the managers.
  • While pulling the docker images, the manager machines lose their leadership and cannot communicate with a leader properly; lots of "context deadline exceeded" errors happen. Because the docker untar process has the same priority as the swarm management tasks, and untar is pegging the CPU, the swarm managers are not able to communicate.
  • It is also confirmed that this is not a network issue: docker registry bandwidth was around 80 Mbps per host, and while that was going on I ran iperf3 tests between the machines and achieved over 2 Gbps on all of them, with ping latencies around 0~2 milliseconds.

As a double check, we ran docker pull on every host manually and waited for it to complete before actually running docker stack deploy .... This fixed the issue: we never hit the "context deadline exceeded" problem this way, the leader machine did not lose leadership, and none of the machines became "Unreachable" in the process.
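For reference, that manual pre-pull step could also be scripted against the Engine API instead of shelling out to docker pull. A minimal Go sketch, assuming the github.com/docker/docker client package and a placeholder image name; registry auth and progress reporting are omitted.

// prepull.go: sketch of pre-pulling images on a host before
// "docker stack deploy", so the extraction cost is paid up front.
// Image names are placeholders; registry auth is not handled.
package main

import (
	"context"
	"io"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	images := []string{"registry.example.com/rails-app:latest"} // placeholder
	for _, ref := range images {
		rc, err := cli.ImagePull(context.Background(), ref, types.ImagePullOptions{})
		if err != nil {
			log.Fatalf("pull %s: %v", ref, err)
		}
		// Draining the stream waits for the pull (and layer extraction) to finish.
		if _, err := io.Copy(io.Discard, rc); err != nil {
			log.Fatalf("pull %s: %v", ref, err)
		}
		rc.Close()
		log.Printf("pulled %s", ref)
	}
}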

Possible solutions or improvements for this, in my opinion, are:

  • Separate the docker swarm process from other docker processes, and increase its priority.
  • Tune raft, or give users ways to tune raft parameters, so that it works more robustly or is more forgiving of small delays.
  • Of course, increasing the 'nice' value of the untar process is needed regardless; that clearly should be a low-priority task, not only so that swarm can communicate properly (a rough sketch follows below).
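To illustrate that last point, here is a rough sketch of what nice'ing the extraction could look like; this is not the actual moby code. archive/tar stands in for Docker's layer extraction, the destination path /tmp/extract is a placeholder, and the nice value of 10 is arbitrary.

// Sketch only: a child process lowering its own CPU priority before
// extracting a tar stream; not the actual moby code.
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
	"path/filepath"
	"syscall"
)

func main() {
	// Raise our own nice value (lower priority); no privileges needed in this direction.
	if err := syscall.Setpriority(syscall.PRIO_PROCESS, 0, 10); err != nil {
		log.Printf("setpriority: %v (continuing at default priority)", err)
	}

	tr := tar.NewReader(os.Stdin) // tar stream on stdin, for illustration
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		target := filepath.Join("/tmp/extract", hdr.Name) // placeholder destination
		switch hdr.Typeflag {
		case tar.TypeDir:
			os.MkdirAll(target, os.FileMode(hdr.Mode))
		case tar.TypeReg:
			os.MkdirAll(filepath.Dir(target), 0o755)
			f, err := os.OpenFile(target, os.O_CREATE|os.O_WRONLY, os.FileMode(hdr.Mode))
			if err != nil {
				log.Fatal(err)
			}
			io.Copy(f, tr)
			f.Close()
		}
	}
}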

@furkanmustafa
Author

Looks like no one cared.

I have not tried the latest versions, but this is a serious issue with possibly easy solutions. Wouldn't it be better to fix this for 17.03, or at least for some version?

@thaJeztah
Member

ping @unclejack - perhaps something of interest to you?

@furkanmustafa
Author

Checking back after another year: no one cared.

Even though it's a very simple and good improvement.

@thaJeztah
Member

@furkanmustafa you're welcome to work on it if you want

@furkanmustafa
Author

I'd love to, as I use docker every day at work. But it would take a huge warm-up / study time to dive into the code.

I'll see if someone in our company can dedicate some time to this.

@thaJeztah
Member

Just note that "no one cared" doesn't sound nice 😅. It's usually not that people "don't care", but there is a finite amount of engineering time, so priorities have to be set; community participation can definitely help in that respect, though.

For this enhancement, it would also be worth checking whether containerd handles this better; the goal is to delegate image management (including pulling and extracting images) to containerd. This migration is not trivial, but it is actively being worked on, so for the long term it's best to focus on the image handling in containerd and see if improvements should be made there 👍

@furkanmustafa
Author

Just note that "no one cared" doesn't sound nice

Didn't mean to.

Thanks for the insight, that's helpful and makes sense.

@mishari

mishari commented Nov 28, 2018

Hi,

I seem to be facing this scenario, perhaps? Docker's docker-tar process is taking up all my CPU, pushing load levels up to 14 on a dual-core CPU.

Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov  7 00:48:46 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:16:44 2018
  OS/Arch:          linux/amd64
  Experimental:     false

[screenshot attachment: tmux session over ssh]
