Set conntrack params in kube-proxy #19182
```diff
@@ -0,0 +1,48 @@
+/*
+Copyright 2015 The Kubernetes Authors All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package app
+
+import (
+	"io/ioutil"
+	"strconv"
+
+	"github.com/golang/glog"
+
+	"k8s.io/kubernetes/pkg/util/sysctl"
+)
+
+type Conntracker interface {
+	SetMax(max int) error
+	SetTCPEstablishedTimeout(seconds int) error
+}
+
+type realConntracker struct{}
+
+func (realConntracker) SetMax(max int) error {
+	glog.Infof("Setting nf_conntrack_max to %d", max)
+	if err := sysctl.SetSysctl("net/netfilter/nf_conntrack_max", max); err != nil {
+		return err
+	}
+	// TODO: generify this and sysctl to a new sysfs.WriteInt()
+	glog.Infof("Setting conntrack hashsize to %d", max/4)
+	return ioutil.WriteFile("/sys/module/nf_conntrack/parameters/hashsize", []byte(strconv.Itoa(max/4)), 0640)
+}
+
+func (realConntracker) SetTCPEstablishedTimeout(seconds int) error {
```
Should the type be Duration instead of int?
I chose seconds because any finer granularity is not respected. I could go either way, but given the very limited exposure of this, I think simpler is better.
Sounds good.
```diff
+	glog.Infof("Setting nf_conntrack_tcp_timeout_established to %d", seconds)
+	return sysctl.SetSysctl("net/netfilter/nf_conntrack_tcp_timeout_established", seconds)
+}
```
```diff
@@ -51,24 +51,26 @@ import (
 
 // ProxyServerConfig contains configures and runs a Kubernetes proxy server
 type ProxyServerConfig struct {
-	BindAddress        net.IP
-	HealthzPort        int
-	HealthzBindAddress net.IP
-	OOMScoreAdj        int
-	ResourceContainer  string
-	Master             string
-	Kubeconfig         string
-	PortRange          util.PortRange
-	HostnameOverride   string
-	ProxyMode          string
-	IptablesSyncPeriod time.Duration
-	ConfigSyncPeriod   time.Duration
-	NodeRef            *api.ObjectReference // Reference to this node.
-	MasqueradeAll      bool
-	CleanupAndExit     bool
-	KubeAPIQPS         float32
-	KubeAPIBurst       int
-	UDPIdleTimeout     time.Duration
+	BindAddress                    net.IP
+	HealthzPort                    int
+	HealthzBindAddress             net.IP
+	OOMScoreAdj                    int
+	ResourceContainer              string
+	Master                         string
+	Kubeconfig                     string
+	PortRange                      util.PortRange
+	HostnameOverride               string
+	ProxyMode                      string
+	IptablesSyncPeriod             time.Duration
+	ConfigSyncPeriod               time.Duration
+	NodeRef                        *api.ObjectReference // Reference to this node.
+	MasqueradeAll                  bool
+	CleanupAndExit                 bool
+	KubeAPIQPS                     float32
+	KubeAPIBurst                   int
+	UDPIdleTimeout                 time.Duration
+	ConntrackMax                   int
```
Should this be time.Duration?
As with the previous comment: it can't be any finer than seconds, so I thought simpler was better. If you think it's clearer I can change it, but it will involve getting the value back out with .Seconds() (float64) and then casting to int.
```diff
+	ConntrackTCPTimeoutEstablished int // seconds
 }
 
 type ProxyServer struct {
```
```diff
@@ -78,6 +80,7 @@ type ProxyServer struct {
 	Proxier      proxy.ProxyProvider
 	Broadcaster  record.EventBroadcaster
 	Recorder     record.EventRecorder
+	Conntracker  Conntracker // if nil, ignored
 }
 
 // AddFlags adds flags for a specific ProxyServer to the specified FlagSet
```
```diff
@@ -100,6 +103,8 @@ func (s *ProxyServerConfig) AddFlags(fs *pflag.FlagSet) {
 	fs.Float32Var(&s.KubeAPIQPS, "kube-api-qps", s.KubeAPIQPS, "QPS to use while talking with kubernetes apiserver")
 	fs.IntVar(&s.KubeAPIBurst, "kube-api-burst", s.KubeAPIBurst, "Burst to use while talking with kubernetes apiserver")
 	fs.DurationVar(&s.UDPIdleTimeout, "udp-timeout", s.UDPIdleTimeout, "How long an idle UDP connection will be kept open (e.g. '250ms', '2s'). Must be greater than 0. Only applicable for proxy-mode=userspace")
+	fs.IntVar(&s.ConntrackMax, "conntrack-max", s.ConntrackMax, "Maximum number of NAT connections to track (0 to leave as-is)")
+	fs.IntVar(&s.ConntrackTCPTimeoutEstablished, "conntrack-tcp-timeout-established", s.ConntrackTCPTimeoutEstablished, "Idle timeout for established TCP connections (0 to leave as-is)")
 }
 
 const (
```
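The two new flags follow the existing AddFlags pattern, where the current field value doubles as the flag default. A value of 0 means "leave as-is". The sketch below is a rough stand-in that uses the standard library's `flag` package instead of pflag, which is not reproduced here. Its `IntVar` has the same `(p *int, name string, value int, usage string)` shape. `conntrackFlags` and `parseConntrackFlags` are hypothetical, and the defaults are copied from NewProxyConfig in this PR.

```go
package main

import (
	"flag"
	"fmt"
)

// conntrackFlags is a hypothetical, trimmed-down stand-in for the two
// ProxyServerConfig fields this PR adds.
type conntrackFlags struct {
	Max                   int
	TCPTimeoutEstablished int // seconds
}

// parseConntrackFlags registers both flags with the current field values as
// defaults (the pattern AddFlags uses) and parses args. The defaults are the
// ones NewProxyConfig sets in this PR.
func parseConntrackFlags(args []string) (conntrackFlags, error) {
	c := conntrackFlags{Max: 256 * 1024, TCPTimeoutEstablished: 86400}
	fs := flag.NewFlagSet("kube-proxy-sketch", flag.ContinueOnError)
	fs.IntVar(&c.Max, "conntrack-max", c.Max, "Maximum number of NAT connections to track (0 to leave as-is)")
	fs.IntVar(&c.TCPTimeoutEstablished, "conntrack-tcp-timeout-established", c.TCPTimeoutEstablished, "Idle timeout for established TCP connections (0 to leave as-is)")
	err := fs.Parse(args)
	return c, err
}

func main() {
	defaults, _ := parseConntrackFlags(nil)
	fmt.Println(defaults.Max, defaults.TCPTimeoutEstablished)

	// Setting a flag to 0 is the opt-out: Run only applies values > 0.
	optedOut, _ := parseConntrackFlags([]string{"--conntrack-max=0"})
	fmt.Println(optedOut.Max, optedOut.TCPTimeoutEstablished)
}
```

This prints `262144 86400` for the defaults and `0 86400` after opting out of the table-size change.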
```diff
@@ -119,16 +124,18 @@ func checkKnownProxyMode(proxyMode string) bool {
 
 func NewProxyConfig() *ProxyServerConfig {
 	return &ProxyServerConfig{
-		BindAddress:        net.ParseIP("0.0.0.0"),
-		HealthzPort:        10249,
-		HealthzBindAddress: net.ParseIP("127.0.0.1"),
-		OOMScoreAdj:        qos.KubeProxyOOMScoreAdj,
-		ResourceContainer:  "/kube-proxy",
-		IptablesSyncPeriod: 30 * time.Second,
-		ConfigSyncPeriod:   15 * time.Minute,
-		KubeAPIQPS:         5.0,
-		KubeAPIBurst:       10,
-		UDPIdleTimeout:     250 * time.Millisecond,
+		BindAddress:                    net.ParseIP("0.0.0.0"),
+		HealthzPort:                    10249,
+		HealthzBindAddress:             net.ParseIP("127.0.0.1"),
+		OOMScoreAdj:                    qos.KubeProxyOOMScoreAdj,
+		ResourceContainer:              "/kube-proxy",
+		IptablesSyncPeriod:             30 * time.Second,
+		ConfigSyncPeriod:               15 * time.Minute,
+		KubeAPIQPS:                     5.0,
+		KubeAPIBurst:                   10,
+		UDPIdleTimeout:                 250 * time.Millisecond,
+		ConntrackMax:                   256 * 1024, // 4x default (64k)
+		ConntrackTCPTimeoutEstablished: 86400,      // 1 day (1/5 default)
 	}
 }
 
```
```diff
@@ -139,6 +146,7 @@ func NewProxyServer(
 	proxier proxy.ProxyProvider,
 	broadcaster record.EventBroadcaster,
 	recorder record.EventRecorder,
+	conntracker Conntracker,
 ) (*ProxyServer, error) {
 	return &ProxyServer{
 		Client:       client,
@@ -147,6 +155,7 @@ func NewProxyServer(
 		Proxier:      proxier,
 		Broadcaster:  broadcaster,
 		Recorder:     recorder,
+		Conntracker:  conntracker,
 	}, nil
 }
 
```
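Because the Conntracker is passed into NewProxyServer rather than created inside Run, a test can hand in a fake that never touches the kernel. This is a minimal sketch of such a fake. `fakeConntracker` is hypothetical, and the interface is copied from this PR so the example compiles on its own.

```go
package main

import "fmt"

// Conntracker is copied from this PR so the sketch compiles on its own.
type Conntracker interface {
	SetMax(max int) error
	SetTCPEstablishedTimeout(seconds int) error
}

// fakeConntracker is hypothetical: it records the values it is asked to apply
// instead of writing to /proc/sys or /sys.
type fakeConntracker struct {
	max        int
	tcpTimeout int
}

func (f *fakeConntracker) SetMax(max int) error {
	f.max = max
	return nil
}

func (f *fakeConntracker) SetTCPEstablishedTimeout(seconds int) error {
	f.tcpTimeout = seconds
	return nil
}

// Compile-time check that *fakeConntracker implements Conntracker.
var _ Conntracker = &fakeConntracker{}

func main() {
	fake := &fakeConntracker{}
	var ct Conntracker = fake // what a test would hand to NewProxyServer
	_ = ct.SetMax(1024)
	_ = ct.SetTCPEstablishedTimeout(3600)
	fmt.Printf("%+v\n", *fake)
}
```

The fake uses pointer receivers so it can record state. The PR's `realConntracker` has value receivers, so the zero-size value `realConntracker{}` passed in NewProxyServerDefault satisfies the interface directly.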
```diff
@@ -182,7 +191,7 @@ func NewProxyServerDefault(config *ProxyServerConfig) (*ProxyServer, error) {
 	dbus := utildbus.New()
 	iptInterface := utiliptables.New(execer, dbus, protocol)
 
-	// We ommit creation of pretty much everything if we run in cleanup mode
+	// We omit creation of pretty much everything if we run in cleanup mode
 	if config.CleanupAndExit {
 		return &ProxyServer{
 			Config:       config,
@@ -293,7 +302,10 @@ func NewProxyServerDefault(config *ProxyServerConfig) (*ProxyServer, error) {
 		UID:       types.UID(hostname),
 		Namespace: "",
 	}
-	return NewProxyServer(client, config, iptInterface, proxier, eventBroadcaster, recorder)
+
+	conntracker := realConntracker{}
+
+	return NewProxyServer(client, config, iptInterface, proxier, eventBroadcaster, recorder, conntracker)
 }
 
 // Run runs the specified ProxyServer. This should never exit (unless CleanupAndExit is set).
```
```diff
@@ -310,9 +322,6 @@ func (s *ProxyServer) Run(_ []string) error {
 
 	s.Broadcaster.StartRecordingToSink(s.Client.Events(""))
 
-	// Birth Cry after the birth is successful
-	s.birthCry()
-
 	// Start up Healthz service if requested
 	if s.Config.HealthzPort > 0 {
 		go util.Until(func() {
@@ -323,6 +332,23 @@ func (s *ProxyServer) Run(_ []string) error {
 		}, 5*time.Second, util.NeverStop)
 	}
 
+	// Tune conntrack, if requested
```
Not part of this change, but should s.birthCry(), line 326, be just before SyncLoop(), line 353?
Looks good overall, minor nit.
Done.
```diff
+	if s.Conntracker != nil {
+		if s.Config.ConntrackMax > 0 {
+			if err := s.Conntracker.SetMax(s.Config.ConntrackMax); err != nil {
+				return err
+			}
+		}
+		if s.Config.ConntrackTCPTimeoutEstablished > 0 {
+			if err := s.Conntracker.SetTCPEstablishedTimeout(s.Config.ConntrackTCPTimeoutEstablished); err != nil {
+				return err
+			}
+		}
+	}
+
+	// Birth Cry after the birth is successful
+	s.birthCry()
+
 	// Just loop forever for now...
 	s.Proxier.SyncLoop()
 	return nil
```
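The block added to Run can be read as a small standalone function. This sketch assumes it is pulled out into a hypothetical `tuneConntrack` helper (the PR keeps it inline). A nil Conntracker, or a value that is not positive, leaves the corresponding setting untouched, and the first error aborts. `recorder` is a hypothetical fake.

```go
package main

import "fmt"

// Conntracker is copied from this PR so the sketch compiles on its own.
type Conntracker interface {
	SetMax(max int) error
	SetTCPEstablishedTimeout(seconds int) error
}

// tuneConntrack is a hypothetical extraction of the "Tune conntrack, if
// requested" block in Run; the PR itself keeps this logic inline.
func tuneConntrack(ct Conntracker, max, tcpTimeoutSeconds int) error {
	if ct == nil {
		return nil // "if nil, ignored"
	}
	if max > 0 {
		if err := ct.SetMax(max); err != nil {
			return err
		}
	}
	if tcpTimeoutSeconds > 0 {
		if err := ct.SetTCPEstablishedTimeout(tcpTimeoutSeconds); err != nil {
			return err
		}
	}
	return nil
}

// recorder is a hypothetical fake that logs each call instead of touching the kernel.
type recorder struct{ calls []string }

func (r *recorder) SetMax(max int) error {
	r.calls = append(r.calls, fmt.Sprintf("SetMax(%d)", max))
	return nil
}

func (r *recorder) SetTCPEstablishedTimeout(seconds int) error {
	r.calls = append(r.calls, fmt.Sprintf("SetTCPEstablishedTimeout(%d)", seconds))
	return nil
}

func main() {
	r := &recorder{}
	// --conntrack-max=0 leaves nf_conntrack_max alone; only the timeout is applied.
	if err := tuneConntrack(r, 0, 86400); err != nil {
		panic(err)
	}
	fmt.Println(r.calls)
	// A nil Conntracker short-circuits without error.
	fmt.Println(tuneConntrack(nil, 262144, 86400))
}
```

This prints `[SetTCPEstablishedTimeout(86400)]` and then `<nil>`. The `ct == nil` check only catches a nil interface value. A nil pointer of a concrete type stored in the interface would pass the check and reach the method calls.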
This line (the hashsize write in SetMax) appears to be the culprit that fails our smoke testing when max is non-zero. xref mesosphere/kubernetes-mesos#724
For posterity: the nf_conntrack module doesn't seem to support setting the value of this hashsize parameter for network namespaces other than init_net; this is strictly incompatible with our mesos/docker-based testing environment.
Thanks for tracking it. What is the fix? What can we do?
On Wed, Jan 6, 2016 at 8:54 AM, James DeFelice notifications@github.com
wrote:
For k8s-mesos I've disabled these tuning parameters by default (read: zero by default). This fixes our CI environment immediately. Users can still tweak them if needed/wanted. Short of changing the way hashsize is implemented in the kernel module, I'm not sure how else to really "fix" this.
More links I collected for context:
- The k8s-mesos issue @jdef mentioned, which might be related to setting hashsize inside a nested network namespace: kube-proxy connection tracking adjustments are crashing smoke tests, mesosphere/kubernetes-mesos#724
- A more general discussion of whether sysctls are namespace-safe: Document per namespace sysctl and how to set them in pods #29572
- The initial Linux kernel change that restricts conntrack hash resizing to init_net: https://lwn.net/Articles/375395/