integration: "error reading the kernel parameter" errors during CI #39518

Open
thaJeztah opened this issue Jul 13, 2019 · 15 comments
Labels
area/networking area/swarm area/testing kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.

Comments

@thaJeztah
Member

Noticed that these errors are showing up in our CI;

time="2019-07-13T13:01:11.226286097Z" level=error msg="error reading the kernel parameter net.ipv4.neigh.default.gc_thresh3" error="open /proc/sys/net/ipv4/neigh/default/gc_thresh3: no such file or directory"
time="2019-07-13T13:01:11.226336974Z" level=error msg="error reading the kernel parameter net.ipv4.neigh.default.gc_thresh1" error="open /proc/sys/net/ipv4/neigh/default/gc_thresh1: no such file or directory"
time="2019-07-13T13:01:11.226383933Z" level=error msg="error reading the kernel parameter net.ipv4.neigh.default.gc_thresh2" error="open /proc/sys/net/ipv4/neigh/default/gc_thresh2: no such file or directory"
time="2019-07-13T13:01:11.401028844Z" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"

Error is coming from libnetwork;

var ovConfig = map[string]*kernel.OSValue{
	"net.ipv4.neigh.default.gc_thresh1": {"8192", checkHigher},
	"net.ipv4.neigh.default.gc_thresh2": {"49152", checkHigher},
	"net.ipv4.neigh.default.gc_thresh3": {"65536", checkHigher},
}

// ApplyOSTweaks applies the configuration values passed as arguments
func ApplyOSTweaks(osConfig map[string]*OSValue) {
	for k, v := range osConfig {
		// read the existing property from disk
		oldv, err := readSystemProperty(k)
		if err != nil {
			logrus.WithError(err).Errorf("error reading the kernel parameter %s", k)
			continue
		}
		if propertyIsValid(oldv, v.Value, v.CheckFn) {
			// write new prop value to disk
			if err := writeSystemProperty(k, v.Value); err != nil {
				logrus.WithError(err).Errorf("error setting the kernel parameter %s = %s, (leaving as %s)", k, v.Value, oldv)
				continue
			}
			logrus.Debugf("updated kernel parameter %s = %s (was %s)", k, v.Value, oldv)
		}
	}
}
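
For readers unfamiliar with the snippet above: the write only happens when propertyIsValid agrees. The thread does not show checkHigher, so the following is only a guess at the idea its name suggests (raise a threshold, never lower it). shouldRaise is a hypothetical stand-in written for illustration, not the libnetwork function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// shouldRaise is a hypothetical stand-in for a "check higher" rule:
// it reports whether the wanted value is strictly greater than the
// current one. Values that do not parse as integers are never applied.
func shouldRaise(current, wanted string) bool {
	cur, errCur := strconv.Atoi(strings.TrimSpace(current))
	want, errWant := strconv.Atoi(strings.TrimSpace(wanted))
	if errCur != nil || errWant != nil {
		return false
	}
	return cur < want
}

func main() {
	// A low current threshold would be raised to the configured value.
	fmt.Println(shouldRaise("128", "8192"))
	// A threshold that is already higher would be left alone.
	fmt.Println(shouldRaise("65536", "8192"))
}
```

Under that assumption, a host that already has larger gc_thresh values would keep them.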

thaJeztah added the kind/bug, area/networking, area/testing, and area/swarm labels Jul 13, 2019
@thaJeztah
Member Author

ping @arkodg @euanh PTAL - are these settings "optional" (in which case they should be printed as an "info" or "warn" message), are the machines misconfigured, or is there a bug at hand, and we don't check for the right options?

@arkodg
Contributor

arkodg commented Jul 15, 2019

@thaJeztah I guess we could change these to Warn, but the bigger question is why these are failing.
What is the Linux version and distribution?

@thaJeztah
Member Author

@arkodg hm.. I just realised this error is only shown in the Swarm tests, and those run docker-in-docker, which may explain parts of it.

@cpuguy83
Member

I think these params are only in the initial/root network namespace.

@iceback

iceback commented Sep 20, 2019

I'm seeing this in conjunction with these runtime failures in a SAM environment:

  1. requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.35/images/lambci/lambda:python3.7/json

  2. docker.errors.APIError: 500 Server Error: Internal Server Error ("stat /var/lib/docker/overlay/757677847aa8ac8ebc01ba73ee8733e0d6dfeff7072c5f1dfaf28b122efd9431: no such file or directory")
    2019-09-19 13:33:59 127.0.0.1 - - [19/Sep/2019 13:33:59] "GET /pedigree HTTP/1.1" 502 -

I'm on Ubuntu 18.04: there is no '/proc/sys/net/ipv4/vs' directory, which is the one mentioned in the 'service docker status' output: 'error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"'
I've posted this issue on the SAM site.

@michaelkrog

michaelkrog commented Oct 23, 2019

I am seeing this too after patching the kernel on CentOS 7.

Oct 23 08:40:06 monitoring.codezoo.io dockerd[10366]: time="2019-10-23T08:40:06.058652873+02:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"

Docker fails to start....

EDIT
However, after a few restarts of the server and the Docker service, it works again. Weird.

@prologic
Contributor

Looks like I'm running into this as well in https://github.com/prologic/ulinux/issues/22

In my case the Docker daemon is running just fine and I can run local containers. What's not working is deploying services into an overlay network. The creation of an overlay network seemed to work okay with docker network create -d overlay lb.

I suspect I'm missing a Kernel config?

@thaJeztah
Member Author

@prologic 👋 you can try the check-config.sh script, which could help finding things that are possibly missing; https://github.com/moby/moby/blob/master/contrib/check-config.sh

@prologic
Contributor

> @prologic 👋 you can try the check-config.sh script, which could help finding things that are possibly missing; https://github.com/moby/moby/blob/master/contrib/check-config.sh

I already did, and have since fixed all kernel-related issues for uLinux. It can now run Docker and Docker Swarm clusters :)

@Jean-Daniel

> I am seeing this too after patching the kernel on CentOS 7.
>
> Oct 23 08:40:06 monitoring.codezoo.io dockerd[10366]: time="2019-10-23T08:40:06.058652873+02:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"

Just for the record, the net.ipv4.vs namespace is only present if the ip_vs kernel module is loaded.

modprobe ip_vs

It can also be listed in a file under /etc/modules-load.d/ to load it at boot time.

@lukeescude

Any updates on this? Just installed Docker 19.03.14 on a fresh install of CentOS 7, and the manager will not join... Complains about net.ipv4.vs.expire_nodest_conn then starts doing this nonsense:

Jan 8 04:45:32 dallas-manager1 dockerd: time="2021-01-08T04:45:32.175394496Z" level=warning msg="sending message MsgHeartbeatResp to an unrecognized member ID 5cff323ab22907e5"
Jan 8 04:45:32 dallas-manager1 dockerd: time="2021-01-08T04:45:32.175462233Z" level=warning msg="ignored message MsgHeartbeatResp to unknown peer 5cff323ab22907e5" error="failed to resolve peer: failed to find longest active peer"

Figured it might be related?

@whynotask

I'm experiencing a similar issue. check-config.sh doesn't seem to point out anything helpful to address.

@matfax

matfax commented Jan 1, 2023

Running modprobe ip_vs and adding ip_vs to /etc/modules-load.d/modules.conf seems to have resolved this issue for me.

@svictorino

> Running modprobe ip_vs and adding ip_vs to /etc/modules-load.d/modules.conf seems to have resolved this issue for me.

The number of sites I visited just to find the solution here... Thanks!

@cen1

cen1 commented May 26, 2023

Same issue on Oracle Linux 9
