New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve LB session affinity tests #92427
Improve LB session affinity tests #92427
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these tests were marked [DisabledForLargeClusters]
because there were problems with NodePort services in very large clusters... and it doesn't seem like this patch addresses that problem? Why exactly is this needed for re-enabling the tests?
test/e2e/network/service.go
Outdated
if execPod == nil { | ||
timeout = e2eservice.GetServiceLoadBalancerPropagationTimeout(cs) | ||
interval = 2 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh... this code is a mess; if execPod != nil
then each iteration of the PollImmediate
makes AffinityConfirmCount
attempts to connect, but if execPod == nil
then each iteration makes a single attempt to connect. So... the change to interval
here is correct, but it would be better to rewrite this some more...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship - would splitting these into two somewhat separate functions address this concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first thought, but there's a lot of code that's shared too so that might result in too much duplication? I dunno. Maybe just make the non-execPod case inside the poll also do AffinityConfirmCount
tries?
We enabled these tests couple weeks ago and they were flaky back then (mostly timeouts). I think that may contribute to improving them (I would like to re-enable them again). So I actually think it may potentially help. |
7a437ac
to
4d5a205
Compare
test/e2e/network/service.go
Outdated
} | ||
|
||
interval, timeout, getHosts := pollingArgs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship - thanks a lot for the first pass of review; I refactored this test a bit and hopefully it's much cleaner now.
PTAL
/retest |
@danwinship - friendly ping |
test/e2e/network/service.go
Outdated
|
||
return func() (time.Duration, time.Duration, func() []string) { | ||
return interval, timeout, getHosts | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird to have these return a function that returns those arguments, rather than just returning the arguments themselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the function itself is slightly longer then, but I'm fine with changing too.
Done
test/e2e/network/service.go
Outdated
@@ -147,6 +192,7 @@ func checkAffinity(cs clientset.Interface, execPod *v1.Pod, serviceIP string, se | |||
return false | |||
} | |||
if !trackerFulfilled { | |||
serviceIPPort := net.JoinHostPort(serviceIP, strconv.Itoa(servicePort)) | |||
checkAffinityFailed(tracker, fmt.Sprintf("Connection to %s timed out or not enough responses.", serviceIPPort)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(you could just remove serviceIPPort
from the error message... it's obvious from context)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
4d5a205
to
3c34b56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship - thanks; PTAL
test/e2e/network/service.go
Outdated
|
||
return func() (time.Duration, time.Duration, func() []string) { | ||
return interval, timeout, getHosts | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the function itself is slightly longer then, but I'm fine with changing too.
Done
test/e2e/network/service.go
Outdated
@@ -147,6 +192,7 @@ func checkAffinity(cs clientset.Interface, execPod *v1.Pod, serviceIP string, se | |||
return false | |||
} | |||
if !trackerFulfilled { | |||
serviceIPPort := net.JoinHostPort(serviceIP, strconv.Itoa(servicePort)) | |||
checkAffinityFailed(tracker, fmt.Sprintf("Connection to %s timed out or not enough responses.", serviceIPPort)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/retest |
/lgtm |
In preparation of enabling those tests for LB session affinity (5k-node) clusters.
Ref: #56138