Set conntrack params in kube-proxy #19182

Merged
48 changes: 48 additions & 0 deletions cmd/kube-proxy/app/conntrack.go
@@ -0,0 +1,48 @@
/*
Copyright 2015 The Kubernetes Authors All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package app

import (
	"io/ioutil"
	"strconv"

	"github.com/golang/glog"

	"k8s.io/kubernetes/pkg/util/sysctl"
)

type Conntracker interface {
	SetMax(max int) error
	SetTCPEstablishedTimeout(seconds int) error
}

type realConntracker struct{}

func (realConntracker) SetMax(max int) error {
	glog.Infof("Setting nf_conntrack_max to %d", max)
	if err := sysctl.SetSysctl("net/netfilter/nf_conntrack_max", max); err != nil {
		return err
	}
	// TODO: generify this and sysctl to a new sysfs.WriteInt()
	glog.Infof("Setting conntrack hashsize to %d", max/4)
	return ioutil.WriteFile("/sys/module/nf_conntrack/parameters/hashsize", []byte(strconv.Itoa(max/4)), 0640)

Contributor

this line appears to be the culprit that fails our smoke testing when max is non-zero.
xref mesosphere/kubernetes-mesos#724

Contributor

for posterity: the nf_conntrack module doesn't seem to support setting the value of this hashsize parameter for network namespace other than init_net; this is strictly incompatible with our mesos/docker-based testing environment.

Member Author

Thanks for tracking it - what is the fix? What can we do?

On Wed, Jan 6, 2016 at 8:54 AM, James DeFelice notifications@github.com wrote:

> In cmd/kube-proxy/app/conntrack.go (#19182 comment):
>
> +type Conntracker interface {
> +	SetMax(max int) error
> +	SetTCPEstablishedTimeout(seconds int) error
> +}
> +
> +type realConntracker struct{}
> +
> +func (realConntracker) SetMax(max int) error {
> +	glog.Infof("Setting nf_conntrack_max to %d", max)
> +	if err := sysctl.SetSysctl("net/netfilter/nf_conntrack_max", max); err != nil {
> +		return err
> +	}
> +	// TODO: generify this and sysctl to a new sysfs.WriteInt()
> +	glog.Infof("Setting conntrack hashsize to %d", max/4)
> +	return ioutil.WriteFile("/sys/module/nf_conntrack/parameters/hashsize", []byte(strconv.Itoa(max/4)), 0640)
>
> for posterity: the nf_conntrack module doesn't seem to support setting the
> value of this hashsize parameter for network namespace other than init_net;
> this is strictly incompatible with our mesos/docker-based testing
> environment.

Contributor

for k8s-mesos I've disabled these tuning parameters by default (read: zero by default). this fixes our CI environment immediately. users can still tweak them if needed/wanted. short of changing the way hashsize is implemented in the kernel module i'm not sure how else to really "fix" this.
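
As an illustration of another direction (not something the thread settles on), the hashsize write could be treated as best-effort, so that a failure in a non-init network namespace is logged rather than returned. This is a minimal sketch under assumptions: `setHashsizeBestEffort`, `writeFunc`, `okWrite`, and `failWrite` are hypothetical names that do not appear in the PR, and the file write is injected so the sketch runs without root.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// hashsizePath is the module parameter that SetMax writes in the PR.
const hashsizePath = "/sys/module/nf_conntrack/parameters/hashsize"

// writeFunc has the same signature as os.WriteFile, so the real call can be swapped in.
type writeFunc func(name string, data []byte, perm os.FileMode) error

// setHashsizeBestEffort (hypothetical) tries to set hashsize to max/4 and
// reports whether the write succeeded instead of returning the error.
func setHashsizeBestEffort(write writeFunc, max int) bool {
	value := strconv.Itoa(max / 4)
	if err := write(hashsizePath, []byte(value), 0640); err != nil {
		fmt.Printf("warning: leaving conntrack hashsize as-is: %v\n", err)
		return false
	}
	return true
}

// okWrite and failWrite are stand-ins for the real write, used only for the demo.
func okWrite(string, []byte, os.FileMode) error { return nil }

func failWrite(string, []byte, os.FileMode) error {
	return errors.New("read-only file system")
}

func main() {
	fmt.Println("applied:", setHashsizeBestEffort(okWrite, 256*1024))
	fmt.Println("applied:", setHashsizeBestEffort(failWrite, 256*1024))
}
```

Passing `os.WriteFile` as the `write` argument would perform the real write. Whether ignoring the failure is acceptable is a policy question the comments above leave open.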

Contributor

More links I collected for the context:

}

func (realConntracker) SetTCPEstablishedTimeout(seconds int) error {

Contributor

should the type be Duration instead of int ?

Member Author

I chose seconds because any finer granularity is not respected. I could go either way, but given the very limited exposure of this, I think simpler is better.

Contributor

sounds good

	glog.Infof("Setting nf_conntrack_tcp_timeout_established to %d", seconds)
	return sysctl.SetSysctl("net/netfilter/nf_conntrack_tcp_timeout_established", seconds)
}
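
Both setters come down to writing a decimal integer into a kernel-exposed file. Sysctl names such as `net/netfilter/nf_conntrack_max` correspond to files under `/proc/sys`. The sketch below shows that write-and-read-back pattern against a temporary directory, so it can run unprivileged. `writeInt`, `readInt`, and `roundTrip` are hypothetical helpers, not the `sysctl` package's API, and the assumption that the real helper behaves this way is mine.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// writeInt (hypothetical) writes value as decimal text to base/name.
// With base "/proc/sys" this would target a real sysctl; the demo uses a temp dir.
func writeInt(base, name string, value int) error {
	return os.WriteFile(filepath.Join(base, name), []byte(strconv.Itoa(value)), 0640)
}

// readInt reads the value back, trimming any surrounding whitespace
// (files under /proc/sys typically end with a newline).
func readInt(base, name string) (int, error) {
	data, err := os.ReadFile(filepath.Join(base, name))
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(data)))
}

// roundTrip writes and re-reads a value under a throwaway directory.
func roundTrip(name string, value int) (int, error) {
	base, err := os.MkdirTemp("", "sysctl-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)
	if err := os.MkdirAll(filepath.Join(base, filepath.Dir(name)), 0755); err != nil {
		return 0, err
	}
	if err := writeInt(base, name, value); err != nil {
		return 0, err
	}
	return readInt(base, name)
}

func main() {
	got, err := roundTrip("net/netfilter/nf_conntrack_tcp_timeout_established", 86400)
	fmt.Println(got, err)
}
```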
92 changes: 59 additions & 33 deletions cmd/kube-proxy/app/server.go
@@ -51,24 +51,26 @@ import (

// ProxyServerConfig contains configures and runs a Kubernetes proxy server
type ProxyServerConfig struct {
BindAddress net.IP
HealthzPort int
HealthzBindAddress net.IP
OOMScoreAdj int
ResourceContainer string
Master string
Kubeconfig string
PortRange util.PortRange
HostnameOverride string
ProxyMode string
IptablesSyncPeriod time.Duration
ConfigSyncPeriod time.Duration
NodeRef *api.ObjectReference // Reference to this node.
MasqueradeAll bool
CleanupAndExit bool
KubeAPIQPS float32
KubeAPIBurst int
UDPIdleTimeout time.Duration
BindAddress net.IP
HealthzPort int
HealthzBindAddress net.IP
OOMScoreAdj int
ResourceContainer string
Master string
Kubeconfig string
PortRange util.PortRange
HostnameOverride string
ProxyMode string
IptablesSyncPeriod time.Duration
ConfigSyncPeriod time.Duration
NodeRef *api.ObjectReference // Reference to this node.
MasqueradeAll bool
CleanupAndExit bool
KubeAPIQPS float32
KubeAPIBurst int
UDPIdleTimeout time.Duration
ConntrackMax int

Contributor

should this be time.Duration

Member Author

As previous - it can't be any finer than seconds, so I thought simpler was better. If you think it's clearer I can change it, but it will involve getting the value back out by .Seconds() (float64) and then casting to int.
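
For reference, the conversion being weighed here can be sketched as follows. `secondsFromDuration`, `oneDay`, and `subSecond` are hypothetical names used only for illustration. Integer division by `time.Second` avoids the `float64` step that `.Seconds()` introduces, and both forms drop any sub-second remainder.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// oneDay mirrors the 86400-second default used in the PR.
	oneDay = 24 * time.Hour
	// subSecond shows that anything finer than a second is lost.
	subSecond = 999 * time.Millisecond
)

// secondsFromDuration (hypothetical) truncates a Duration to whole seconds.
func secondsFromDuration(d time.Duration) int {
	return int(d / time.Second)
}

func main() {
	fmt.Println(secondsFromDuration(oneDay)) // integer division
	fmt.Println(int(oneDay.Seconds()))       // the float64 route mentioned above
	fmt.Println(secondsFromDuration(subSecond))
}
```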

ConntrackTCPTimeoutEstablished int // seconds
}

type ProxyServer struct {
@@ -78,6 +80,7 @@ type ProxyServer struct {
Proxier proxy.ProxyProvider
Broadcaster record.EventBroadcaster
Recorder record.EventRecorder
Conntracker Conntracker // if nil, ignored
}

// AddFlags adds flags for a specific ProxyServer to the specified FlagSet
@@ -100,6 +103,8 @@ func (s *ProxyServerConfig) AddFlags(fs *pflag.FlagSet) {
fs.Float32Var(&s.KubeAPIQPS, "kube-api-qps", s.KubeAPIQPS, "QPS to use while talking with kubernetes apiserver")
fs.IntVar(&s.KubeAPIBurst, "kube-api-burst", s.KubeAPIBurst, "Burst to use while talking with kubernetes apiserver")
fs.DurationVar(&s.UDPIdleTimeout, "udp-timeout", s.UDPIdleTimeout, "How long an idle UDP connection will be kept open (e.g. '250ms', '2s'). Must be greater than 0. Only applicable for proxy-mode=userspace")
fs.IntVar(&s.ConntrackMax, "conntrack-max", s.ConntrackMax, "Maximum number of NAT connections to track (0 to leave as-is)")
fs.IntVar(&s.ConntrackTCPTimeoutEstablished, "conntrack-tcp-timeout-established", s.ConntrackTCPTimeoutEstablished, "Idle timeout for established TCP connections (0 to leave as-is)")
}
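
The "0 to leave as-is" convention of the two new flags can be exercised in isolation. The sketch below uses the standard library `flag` package as a stand-in for `pflag` (an assumption made to keep it self-contained). The `tuning` struct and `parseTuning` function are hypothetical and only mirror the two fields added in this PR.

```go
package main

import (
	"flag"
	"fmt"
)

// tuning (hypothetical) holds just the two conntrack settings from the PR.
type tuning struct {
	ConntrackMax                   int
	ConntrackTCPTimeoutEstablished int // seconds
}

// parseTuning starts from the PR's defaults and applies any flags in args.
func parseTuning(args []string) (tuning, error) {
	cfg := tuning{ConntrackMax: 256 * 1024, ConntrackTCPTimeoutEstablished: 86400}
	fs := flag.NewFlagSet("kube-proxy-sketch", flag.ContinueOnError)
	fs.IntVar(&cfg.ConntrackMax, "conntrack-max", cfg.ConntrackMax,
		"Maximum number of NAT connections to track (0 to leave as-is)")
	fs.IntVar(&cfg.ConntrackTCPTimeoutEstablished, "conntrack-tcp-timeout-established",
		cfg.ConntrackTCPTimeoutEstablished,
		"Idle timeout for established TCP connections (0 to leave as-is)")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseTuning([]string{"--conntrack-max=0"})
	fmt.Printf("%+v %v\n", cfg, err)
}
```

Passing `--conntrack-max=0` is the opt-out route the k8s-mesos comment describes: with the value at zero, the tuning step is skipped for that parameter.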

const (
@@ -119,16 +124,18 @@ func checkKnownProxyMode(proxyMode string) bool {

func NewProxyConfig() *ProxyServerConfig {
return &ProxyServerConfig{
BindAddress: net.ParseIP("0.0.0.0"),
HealthzPort: 10249,
HealthzBindAddress: net.ParseIP("127.0.0.1"),
OOMScoreAdj: qos.KubeProxyOOMScoreAdj,
ResourceContainer: "/kube-proxy",
IptablesSyncPeriod: 30 * time.Second,
ConfigSyncPeriod: 15 * time.Minute,
KubeAPIQPS: 5.0,
KubeAPIBurst: 10,
UDPIdleTimeout: 250 * time.Millisecond,
BindAddress: net.ParseIP("0.0.0.0"),
HealthzPort: 10249,
HealthzBindAddress: net.ParseIP("127.0.0.1"),
OOMScoreAdj: qos.KubeProxyOOMScoreAdj,
ResourceContainer: "/kube-proxy",
IptablesSyncPeriod: 30 * time.Second,
ConfigSyncPeriod: 15 * time.Minute,
KubeAPIQPS: 5.0,
KubeAPIBurst: 10,
UDPIdleTimeout: 250 * time.Millisecond,
ConntrackMax: 256 * 1024, // 4x default (64k)
ConntrackTCPTimeoutEstablished: 86400, // 1 day (1/5 default)
}
}
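
The arithmetic behind the two new defaults, worked through as a small sketch. `hashsizeFor` and `wholeDays` are hypothetical helpers. The division by 4 comes from `SetMax` in this PR, and the implied five-day kernel default follows from the "1/5 default" comment.

```go
package main

import "fmt"

const (
	defaultConntrackMax       = 256 * 1024 // PR default for --conntrack-max
	defaultTimeoutEstablished = 86400      // PR default, in seconds
)

// hashsizeFor mirrors the max/4 ratio that SetMax writes as the hashsize.
func hashsizeFor(max int) int {
	return max / 4
}

// wholeDays converts seconds to whole days.
func wholeDays(seconds int) int {
	return seconds / (24 * 60 * 60)
}

func main() {
	fmt.Println("conntrack max:", defaultConntrackMax)
	fmt.Println("hashsize:", hashsizeFor(defaultConntrackMax))
	fmt.Println("timeout in days:", wholeDays(defaultTimeoutEstablished))
	// "1/5 default" implies a kernel default five times larger.
	fmt.Println("implied kernel default in days:", wholeDays(defaultTimeoutEstablished*5))
}
```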

@@ -139,6 +146,7 @@ func NewProxyServer(
proxier proxy.ProxyProvider,
broadcaster record.EventBroadcaster,
recorder record.EventRecorder,
conntracker Conntracker,
) (*ProxyServer, error) {
return &ProxyServer{
Client: client,
@@ -147,6 +155,7 @@ func NewProxyServer(
Proxier: proxier,
Broadcaster: broadcaster,
Recorder: recorder,
Conntracker: conntracker,
}, nil
}

@@ -182,7 +191,7 @@ func NewProxyServerDefault(config *ProxyServerConfig) (*ProxyServer, error) {
dbus := utildbus.New()
iptInterface := utiliptables.New(execer, dbus, protocol)

// We ommit creation of pretty much everything if we run in cleanup mode
// We omit creation of pretty much everything if we run in cleanup mode
if config.CleanupAndExit {
return &ProxyServer{
Config: config,
@@ -293,7 +302,10 @@ func NewProxyServerDefault(config *ProxyServerConfig) (*ProxyServer, error) {
UID: types.UID(hostname),
Namespace: "",
}
return NewProxyServer(client, config, iptInterface, proxier, eventBroadcaster, recorder)

conntracker := realConntracker{}

return NewProxyServer(client, config, iptInterface, proxier, eventBroadcaster, recorder, conntracker)
}

// Run runs the specified ProxyServer. This should never exit (unless CleanupAndExit is set).
@@ -310,9 +322,6 @@ func (s *ProxyServer) Run(_ []string) error {

s.Broadcaster.StartRecordingToSink(s.Client.Events(""))

// Birth Cry after the birth is successful
s.birthCry()

// Start up Healthz service if requested
if s.Config.HealthzPort > 0 {
go util.Until(func() {
@@ -323,6 +332,23 @@ func (s *ProxyServer) Run(_ []string) error {
}, 5*time.Second, util.NeverStop)
}

// Tune conntrack, if requested

Contributor

not part of this change, but should s.birthCry(), line 326, be just before SyncLoop(), line 353 ?

Contributor

looks good overall.. minor nit..

Member Author

done

	if s.Conntracker != nil {
		if s.Config.ConntrackMax > 0 {
			if err := s.Conntracker.SetMax(s.Config.ConntrackMax); err != nil {
				return err
			}
		}
		if s.Config.ConntrackTCPTimeoutEstablished > 0 {
			if err := s.Conntracker.SetTCPEstablishedTimeout(s.Config.ConntrackTCPTimeoutEstablished); err != nil {
				return err
			}
		}
	}

	// Birth Cry after the birth is successful
	s.birthCry()

	// Just loop forever for now...
	s.Proxier.SyncLoop()
	return nil
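
Because the tuning step goes through the `Conntracker` interface and skips a nil value, it can be exercised with a fake. A minimal sketch, assuming a hypothetical `fakeConntracker` and a free-standing `tune` function that mirrors the guarded calls above (neither is part of the PR):

```go
package main

import "fmt"

// Conntracker is copied from the PR so the sketch is self-contained.
type Conntracker interface {
	SetMax(max int) error
	SetTCPEstablishedTimeout(seconds int) error
}

// fakeConntracker (hypothetical) records calls instead of touching the kernel.
type fakeConntracker struct {
	calls []string
}

func (f *fakeConntracker) SetMax(max int) error {
	f.calls = append(f.calls, fmt.Sprintf("SetMax(%d)", max))
	return nil
}

func (f *fakeConntracker) SetTCPEstablishedTimeout(seconds int) error {
	f.calls = append(f.calls, fmt.Sprintf("SetTCPEstablishedTimeout(%d)", seconds))
	return nil
}

// tune (hypothetical) mirrors the guarded calls in Run: a nil Conntracker
// is ignored, and a zero value leaves the corresponding setting as-is.
func tune(ct Conntracker, max, timeoutSeconds int) error {
	if ct == nil {
		return nil
	}
	if max > 0 {
		if err := ct.SetMax(max); err != nil {
			return err
		}
	}
	if timeoutSeconds > 0 {
		if err := ct.SetTCPEstablishedTimeout(timeoutSeconds); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fake := &fakeConntracker{}
	err := tune(fake, 256*1024, 0) // a timeout of 0 means "leave as-is"
	fmt.Println(fake.calls, err)
}
```

Passing a literal `nil`, as `hollow_proxy.go` does for `NewProxyServer`, skips tuning entirely.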
4 changes: 3 additions & 1 deletion docs/admin/kube-proxy.md
@@ -57,6 +57,8 @@ kube-proxy
--bind-address=0.0.0.0: The IP address for the proxy server to serve on (set to 0.0.0.0 for all interfaces)
--cleanup-iptables[=false]: If true cleanup iptables rules and exit.
--config-sync-period=15m0s: How often configuration from the apiserver is refreshed. Must be greater than 0.
--conntrack-max=262144: Maximum number of NAT connections to track (0 to leave as-is)
--conntrack-tcp-timeout-established=86400: Idle timeout for established TCP connections (0 to leave as-is)
--google-json-key="": The Google Cloud Platform Service Account JSON Key to use for authentication.
--healthz-bind-address=127.0.0.1: The IP address for the health check server to serve on, defaulting to 127.0.0.1 (set to 0.0.0.0 for all interfaces)
--healthz-port=10249: The port to bind the health check server. Use 0 to disable.
@@ -74,7 +76,7 @@ kube-proxy
--udp-timeout=250ms: How long an idle UDP connection will be kept open (e.g. '250ms', '2s'). Must be greater than 0. Only applicable for proxy-mode=userspace
```

###### Auto generated by spf13/cobra on 8-Dec-2015
###### Auto generated by spf13/cobra on 30-Dec-2015


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
2 changes: 2 additions & 0 deletions hack/verify-flags/known-flags.txt
@@ -51,6 +51,8 @@ concurrent-endpoint-syncs
concurrent-resource-quota-syncs
config-sync-period
configure-cbr0
conntrack-max
conntrack-tcp-timeout-established
container-port
container-runtime
contain-pod-resources
2 changes: 1 addition & 1 deletion pkg/kubemark/hollow_proxy.go
@@ -73,7 +73,7 @@ func NewHollowProxyOrDie(
endpointsConfig.Channel("api"),
)

hollowProxy, err := proxyapp.NewProxyServer(client, config, iptInterface, &FakeProxier{}, broadcaster, recorder)
hollowProxy, err := proxyapp.NewProxyServer(client, config, iptInterface, &FakeProxier{}, broadcaster, recorder, nil)
if err != nil {
glog.Fatalf("Error while creating ProxyServer: %v\n", err)
}