Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent kube-proxy from panicing when sysfs is mounted as read-only. #28697

Merged
merged 1 commit into from
Jul 11, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
40 changes: 40 additions & 0 deletions cmd/kube-proxy/app/conntrack.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,13 @@ limitations under the License.
package app

import (
"errors"
"io/ioutil"
"strconv"

"github.com/golang/glog"

"k8s.io/kubernetes/pkg/util/mount"
"k8s.io/kubernetes/pkg/util/sysctl"
)

Expand All @@ -32,11 +34,25 @@ type Conntracker interface {

type realConntracker struct{}

var readOnlySysFSError = errors.New("ReadOnlySysFS")

func (realConntracker) SetMax(max int) error {
glog.Infof("Setting nf_conntrack_max to %d", max)
if err := sysctl.SetSysctl("net/netfilter/nf_conntrack_max", max); err != nil {
return err
}
// sysfs is expected to be mounted as 'rw'. However, it may be unexpectedly mounted as
// 'ro' by docker because of a known docker issue (https://github.com/docker/docker/issues/24000).
// Setting conntrack will fail when sysfs is readonly. When that happens, we don't set conntrack
// hashsize and return a special error readOnlySysFSError here. The caller should deal with
// readOnlySysFSError differently.
writable, err := isSysFSWritable()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler if we simply captured and analyzed the error we get when the ioutil.WriteFile fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried IsPermissions, and it doesn't seem to work for read only file system error.
And parsing string seems to be a little more brittle than this one.

if err != nil {
return err
}
if !writable {
return readOnlySysFSError
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we just flare an event and carry on? do e2es pass?
@thockin is the node really unusable if we can't set the table size?

Copy link
Member Author

@Random-Liu Random-Liu Jul 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is mentioned in #28697 (comment):

I'm not sure whether we want to mark the node as Unscheduable when this happened, which only needs few lines change. I can do that if we think we should.

We can mark the node as unschedulable. I'm just not sure whether we want to do that. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node will be OK. Kube-proxy will run, it will just be more likely to run out of conntrack entries. It might also affect some of the conntrack-based UDP cleanup in the case of rapid re-use.

If we think that firing an event is actually enough signal for us to a) keep track of how often this occurs in our tests and b) help debug when a user gets a conntrack error, then that is certainly a lighter-weight solution

// TODO: generify this and sysctl to a new sysfs.WriteInt()
glog.Infof("Setting conntrack hashsize to %d", max/4)
return ioutil.WriteFile("/sys/module/nf_conntrack/parameters/hashsize", []byte(strconv.Itoa(max/4)), 0640)
Expand All @@ -46,3 +62,27 @@ func (realConntracker) SetTCPEstablishedTimeout(seconds int) error {
glog.Infof("Setting nf_conntrack_tcp_timeout_established to %d", seconds)
return sysctl.SetSysctl("net/netfilter/nf_conntrack_tcp_timeout_established", seconds)
}

// isSysFSWritable checks /proc/mounts to see whether sysfs is 'rw' or not.
func isSysFSWritable() (bool, error) {
const permWritable = "rw"
const sysfsDevice = "sysfs"
m := mount.New()
mountPoints, err := m.List()
if err != nil {
glog.Errorf("failed to list mount points: %v", err)
return false, err
}
for _, mountPoint := range mountPoints {
if mountPoint.Device != sysfsDevice {
continue
}
// Check whether sysfs is 'rw'
if len(mountPoint.Opts) > 0 && mountPoint.Opts[0] == permWritable {
return true, nil
}
glog.Errorf("sysfs is not writable: %+v", mountPoint)
break
}
return false, nil
}
15 changes: 13 additions & 2 deletions cmd/kube-proxy/app/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -299,8 +299,19 @@ func (s *ProxyServer) Run() error {
// Tune conntrack, if requested
if s.Conntracker != nil {
if s.Config.ConntrackMax > 0 {
if err := s.Conntracker.SetMax(int(s.Config.ConntrackMax)); err != nil {
return err
err := s.Conntracker.SetMax(int(s.Config.ConntrackMax))
if err != nil {
if err != readOnlySysFSError {
return err
}
// readOnlySysFSError is caused by a known docker issue (https://github.com/docker/docker/issues/24000),
// the only remediation we know is to restart the docker daemon.
// Here we'll send an node event with specific reason and message, the
// administrator should decide whether and how to handle this issue,
// whether to drain the node and restart docker.
// TODO(random-liu): Remove this when the docker bug is fixed.
const message = "DOCKER RESTART NEEDED (docker issue #24000): /sys is read-only: can't raise conntrack limits, problems may arise later."
s.Recorder.Eventf(s.Config.NodeRef, api.EventTypeWarning, err.Error(), message)
}
}
if s.Config.ConntrackTCPEstablishedTimeout.Duration > 0 {
Expand Down