Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix kube-proxy panic because of nil sessionAffinityConfig #51500

Merged

Conversation

m1093782566
Copy link
Contributor

@m1093782566 m1093782566 commented Aug 29, 2017

What this PR does / why we need it:

fix kube-proxy panic because of nil sessionAffinityConfig

Which issue this PR fixes: closes #51499

Special notes for your reviewer:

I apology that this bug is introduced by #49850 :(

@thockin @smarterclayton @gnufied

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 29, 2017
@k8s-github-robot k8s-github-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Aug 29, 2017
@m1093782566
Copy link
Contributor Author

/assign @thockin

@@ -196,7 +196,15 @@ func newServiceInfo(svcPortName proxy.ServicePortName, port *api.ServicePort, se
}
var stickyMaxAgeSeconds int
if service.Spec.SessionAffinity == api.ServiceAffinityClientIP {
stickyMaxAgeSeconds = int(*service.Spec.SessionAffinityConfig.ClientIP.TimeoutSeconds)
// We changed the API definition that sessionAffinityConfig is required when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that set in defaulting? When we read it back from API it should get default values. Before we merge this, I need to understand why that isn't working.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that set in defaulting? When we read it back from API it should get default values.

When we set the default value in pkg/api/v1/defaults.go, we meet the UT issues.

--- FAIL: TestRoundTripTypes (0.31s)

roundtrip.go:326: Service: diff: 
		object.Spec.Selector:
		  a: map[string]string{}
		  b: map[string]string(nil)
		object.Spec.ExternalIPs:
		  a: []string{}
		  b: []string(nil)
		object.Spec.SessionAffinityConfig:
		  a: <nil>
		  b: &api.SessionAffinityConfig{ClientIP:(*api.ClientIPConfig)(0xc420643f48)}
		object.Spec.LoadBalancerSourceRanges:
		  a: []string{}
		  b: []string(nil)
		object.Status.LoadBalancer.Ingress:
		  a: []api.LoadBalancerIngress{}
		  b: []api.LoadBalancerIngress(nil)

It looks like a testcase that goes through defaulting, which causes the diff to fail. Unfortunately, we don't have a great general solution to this (yet) except to patch up the test case to not depend on defaulting. Set that value explicitly.

So, we set the value explicitly which means the timeout value is required when session affinity type is client ip. For example, we set it explicitly in pkg/master/controller.go.

I guess your may refer to

// ClientIPConfig represents the configurations of Client IP based session affinity.
type ClientIPConfig struct {
	// timeoutSeconds specifies the seconds of ClientIP type session sticky time.
	// The value must be >0 && <=86400(for 1 day) if ServiceAffinity == "ClientIP".
	// Default value is 10800(for 3 hours).
	// +optional
	TimeoutSeconds *int32
}

The comment message maybe not proper.

@thockin
Copy link
Member

thockin commented Aug 29, 2017 via email

@m1093782566
Copy link
Contributor Author

How does that manifest as a nil pointer on the consumer?

Because the service data is still stored in etcd - for example, service kubernetes. When we delete the legacy services whose session affinity type is Client IP, kube-proxy stops panic.

@m1093782566
Copy link
Contributor Author

See https://github.com/kubernetes/kubernetes/blob/master/pkg/master/controller.go#L245

KUBE-APISERVER currently doesn't support updating session affinity config for service kubernetes - it only updates service type and ports.

@thockin
Copy link
Member

thockin commented Aug 29, 2017 via email

@m1093782566
Copy link
Contributor Author

m1093782566 commented Aug 29, 2017

But the defaulting still is in place, right?

Do you refer to

func SetDefaults_Service(obj *v1.Service) {
	if obj.Spec.SessionAffinity == "" {
		obj.Spec.SessionAffinity = v1.ServiceAffinityNone
	}
	if obj.Spec.Type == "" {
		obj.Spec.Type = v1.ServiceTypeClusterIP
	}
	for i := range obj.Spec.Ports {
		sp := &obj.Spec.Ports[i]
		if sp.Protocol == "" {
			sp.Protocol = v1.ProtocolTCP
		}
		if sp.TargetPort == intstr.FromInt(0) || sp.TargetPort == intstr.FromString("") {
			sp.TargetPort = intstr.FromInt(int(sp.Port))
		}
	}
	// Defaults ExternalTrafficPolicy field for NodePort / LoadBalancer service
	// to Global for consistency.
	if (obj.Spec.Type == v1.ServiceTypeNodePort ||
		obj.Spec.Type == v1.ServiceTypeLoadBalancer) &&
		obj.Spec.ExternalTrafficPolicy == "" {
		obj.Spec.ExternalTrafficPolicy = v1.ServiceExternalTrafficPolicyTypeCluster
	}
}

?

I am afraid I didn't set default values here.

@m1093782566 m1093782566 force-pushed the fix-kube-proxy-panic branch 2 times, most recently from 6953697 to 820645c Compare August 29, 2017 11:38
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 29, 2017
@m1093782566
Copy link
Contributor Author

m1093782566 commented Aug 29, 2017

I think I have almost found out how to fix the defaulting issue - I need to hack pkg/api/fuzzer/fuzzer.go.

@thockin
Please see if the changes are fine. I apologize again for introducing the serious bug. Soooooorry...

@m1093782566
Copy link
Contributor Author

When we read it out of the API, I thought it should get defaulted again, as
it transitions from internal to versioned.

It's true.

@@ -439,6 +439,14 @@ var Funcs = func(codecs runtimeserializer.CodecFactory) []interface{} {
ss.Ports[i].TargetPort.StrVal = "x" + ss.Ports[i].TargetPort.StrVal // non-empty
}
}
if ss.SessionAffinity == api.ServiceAffinityClientIP && ss.SessionAffinityConfig == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably should fuzz affinity explicitly. There are many examples to follow, but basically declare a slice of valid values, pick a random number mod the length of that slice, set the value and THEN do this fixup. Basically every enumerated field should go through this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chance of the fuzzer choosing a valid string is ~0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Please see if the changes are fine.

PS. Since there is a restriction on affinity type and affinity config - config is required when type is ClientIP. So, I employ a temporary struct,

struct {
	affinityType   api.ServiceAffinity
	affinityConfig *api.SessionAffinityConfig
}

to describe the relationship between them.

@thockin
Copy link
Member

thockin commented Aug 29, 2017

This was in some form of the original PR, Iknow I looked at it. Github tools ... ugh. Yes, defaulting is necessary

@m1093782566
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce-bazel

@m1093782566
Copy link
Contributor Author

kindly ping @smarterclayton

Can you help reviewing this PR when you have a chance?

@@ -439,6 +439,28 @@ var Funcs = func(codecs runtimeserializer.CodecFactory) []interface{} {
ss.Ports[i].TargetPort.StrVal = "x" + ss.Ports[i].TargetPort.StrVal // non-empty
}
}

timeoutSeconds := api.DefaultClientIPServiceAffinitySeconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we really want to fuzz this, we should do:

possible := []string{ api.ServiceAffinityNone, api.ServiceAffinityClientIP }
ss.SessionAffinity = possible[random % len(possible)]
switch ss.SessionAffinity {
case api.ServiceAffinityClientIP:
  ss.SessionAffinityConfig = new(api.SessionAffinityConfig)
  ss.SessionAffinityConfig.ClientIP = new(api.ClientIPConfig)
  ss.SessionAffinityConfig.ClientIP.TimeoutSeconds = random() % api.SessionAffinityMax
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your good example :)

@@ -101,6 +101,14 @@ func SetDefaults_Service(obj *v1.Service) {
if obj.Spec.SessionAffinity == "" {
obj.Spec.SessionAffinity = v1.ServiceAffinityNone
}
if obj.Spec.SessionAffinity == v1.ServiceAffinityClientIP && obj.Spec.SessionAffinityConfig == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget || obj.Spec.SessionAffinityConfig.ClientIP == nil || obj.Spec.SessionAffinityConfig.ClientIP.TimeoutSeconds == 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. But, do you mean obj.Spec.SessionAffinityConfig.ClientIP.TimeoutSeconds == nil?

TimeoutSeconds is *int32 type and we treat *TimeoutSeconds == 0 as an invalid value, ref: https://github.com/kubernetes/kubernetes/blob/master/pkg/api/validation/validation.go#L1913

@@ -101,6 +101,14 @@ func SetDefaults_Service(obj *v1.Service) {
if obj.Spec.SessionAffinity == "" {
obj.Spec.SessionAffinity = v1.ServiceAffinityNone
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also check for ServiceAffinityNone and set SessionAffinityConfig == nil ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we will ignore SessionAffinityConfig when ServiceAffinityNone, but I am fine to check it :)

@thockin
Copy link
Member

thockin commented Aug 31, 2017

And update tests

@m1093782566
Copy link
Contributor Author

m1093782566 commented Sep 1, 2017

@thockin

I pushed an update. PTAL.

@m1093782566
Copy link
Contributor Author

/retest

@thockin
Copy link
Member

thockin commented Sep 1, 2017

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 1, 2017
@thockin
Copy link
Member

thockin commented Sep 1, 2017

/retest

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: m1093782566, thockin

Associated issue: 51499

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 1, 2017
@abgworrall abgworrall modified the milestone: v1.8 Sep 2, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 51819, 51706, 51761, 51818, 51500)

@k8s-github-robot k8s-github-robot merged commit a31bc44 into kubernetes:master Sep 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kube-proxy panic because of service sessionAffinityConfig is nil
7 participants