[systemd] Add value to LimitNOFILE due to performance problems #1735

Merged (1 commit) on May 6, 2020

Conversation

@stellirin (Contributor)

When k3s is installed on an OS with high default ulimits, performance
issues can be observed. This was discovered on CoreOS, where the default
value is 1073741816. Symptoms include very slow file operations; for example,
installing a Rook/Ceph cluster takes ~6 hours instead of ~10 minutes.

A Google search for 'container LimitNOFILE' shows that most major
projects already set this, including the (unused) containerd systemd unit
found in this repository at /vendor/github.com/containerd/containerd/containerd.service.

k3OS is not affected because the default there is already 1048576.

See description in coreos/fedora-coreos-tracker#329
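
For illustration only, a minimal sketch of what setting the limit looks like in a systemd unit; the surrounding directives are assumptions for context, not the literal contents of this PR's unit file:

[Service]
Type=notify
ExecStart=/usr/local/bin/k3s server
# Cap the open-file limit instead of inheriting the OS default
# (1073741816 on CoreOS), which triggers the slow file operations described above.
LimitNOFILE=1048576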

@brandond (Member) commented May 4, 2020

FWIW, I've found that setting a MemoryLimit also helps encourage golang to be a little more reasonable with its memory utilization. With it defaulting to unlimited, it seems to be much less aggressive about garbage collecting and will use a fairly significant chunk of system resources.
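
As a rough illustration of that suggestion (not part of this PR), a memory cap could be applied with a systemd drop-in. MemoryLimit= is the legacy cgroup v1 directive (MemoryMax= is its cgroup v2 successor), and the path and 2G value below are placeholders, not recommendations:

# /etc/systemd/system/k3s.service.d/memory.conf (hypothetical drop-in)
[Service]
# Legacy directive; on cgroup v2 hosts use MemoryMax= instead.
MemoryLimit=2G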

@erikwilson (Contributor)

Thanks for digging into this @stellirin! The bug seems gross. :( Is Python 2 the only culprit?

I think the fix is okay, but just want to ping @ibuildthecloud for lgtm / thoughts

@stellirin (Contributor, Author)

I only know of this specific case, but kubernetes-sigs/kind#760 mentions issues with NFS and MySQL. I guess it requires a specific set of conditions, but it seems common enough to have justified the mitigation on the container daemon side.

@dweomer (Contributor) left a review comment

LGTM

@ibuildthecloud (Contributor)

The change seems good, but the comment confuses me:

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.

That makes it sound like the comment is saying it should be set to 0. I'm fine with the value of 1048576, as that is the expected value anyhow. I didn't know it could go up to 1073741816.

@ShylajaDevadiga (Contributor) commented May 6, 2020

Verified on k3s v1.18.2-rc4+k3s1

  1. Set LimitNOFILE=100000 and verified the limit is reflected, using CoreDNS as an example.
ps aux | grep -i coredns
root      169713  2.0  0.7 146116 30496 ?        Ssl  22:25   0:00 /coredns -conf /etc/coredns/Corefile
grep file /proc/169713/limits
Max file size             unlimited            unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max open files            100000               100000               files
Max file locks            unlimited            unlimited            locks
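
For reference, a minimal sketch of how such a limit can be overridden and then confirmed on a systemd host; the drop-in path and the service name (k3s.service) are assumptions, not something prescribed by this PR:

# /etc/systemd/system/k3s.service.d/override.conf (hypothetical drop-in)
[Service]
LimitNOFILE=100000

# Reload unit files, restart the service, then read back the effective limit.
systemctl daemon-reload
systemctl restart k3s
systemctl show k3s --property=LimitNOFILE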
