Switch to EndpointSlices #154

Merged: 1 commit merged into openshift:master on Jul 31, 2020

@frobware (Contributor) commented Jul 20, 2020

Handle endpoint slices so that we can deal with dual-stack pods.

Needs: openshift/cluster-ingress-operator#426 (now merged)

Needs: openshift/openshift-apiserver#125 (now merged)

openshift/cluster-ingress-operator#428 allows support for EndpointSlices to be turned on or off via an IngressController or the ingress config.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 20, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 20, 2020
@frobware (Contributor Author) commented Jul 20, 2020

Expecting this to fail without openshift/cluster-ingress-operator#426 and currently testing via cluster-bot.

/hold

@frobware (Contributor Author)

I think this is currently failing some of the HAProxy tests because the dependency, openshift/cluster-ingress-operator#426, is not yet in any CI release.

openshift/cluster-ingress-operator#426 was in this build/release: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.ci/release/4.6.0-0.ci-2020-07-22-003338, but the latest CI release is currently https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.ci/release/4.6.0-0.ci-2020-07-21-114552.

@frobware (Contributor Author)

E0722 13:01:35.668887       1 reflector.go:178] github.com/openshift/router/pkg/router/controller/factory/factory.go:131: Failed to list *v1beta1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:e2e-test-unprivileged-router-8nsx9:default" cannot list resource "endpointslices" in API group "discovery.k8s.io" in the namespace "e2e-test-unprivileged-router-8nsx9"

The root cause is missing RBAC permissions for endpointslices, although this may only affect newly created routers. Investigating.
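The error above says the router's service account cannot list endpointslices in the discovery.k8s.io API group. As a rough illustration of why adding that permission to the role resolves it, here is a simplified Go model of RBAC rule matching. The policyRule type and the matches/allows functions are hypothetical: they only mirror the apiGroups, resources and verbs fields of an RBAC PolicyRule and ignore resourceNames, aggregation and role bindings. This is not the Kubernetes authorizer and does not reproduce the actual RBAC change this PR depends on.

```go
package main

import "fmt"

// policyRule mirrors the apiGroups, resources and verbs fields of an RBAC
// PolicyRule. Hypothetical, simplified model for illustration only.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// matches reports whether want is in list, treating "*" as a wildcard.
func matches(list []string, want string) bool {
	for _, s := range list {
		if s == "*" || s == want {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in apiGroup.
// The core API group is the empty string "".
func allows(rules []policyRule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if matches(r.APIGroups, apiGroup) && matches(r.Resources, resource) && matches(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical role that can only list/watch core Endpoints.
	endpointsOnly := []policyRule{
		{APIGroups: []string{""}, Resources: []string{"endpoints"}, Verbs: []string{"list", "watch"}},
	}
	// The same role with list/watch on EndpointSlices added.
	withSlices := append(endpointsOnly, policyRule{
		APIGroups: []string{"discovery.k8s.io"},
		Resources: []string{"endpointslices"},
		Verbs:     []string{"list", "watch"},
	})

	fmt.Println("endpoints only:", allows(endpointsOnly, "discovery.k8s.io", "endpointslices", "list"))
	fmt.Println("with slices:   ", allows(withSlices, "discovery.k8s.io", "endpointslices", "list"))
}
```

In this model a request is allowed only when a single rule matches the group, the resource and the verb together, which is why list access to core endpoints says nothing about endpointslices.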

pkg/router/controller/factory/factory.go (resolved)
pkg/router/controller/router_controller.go (outdated, resolved)
pkg/router/controller/router_controller.go (outdated, resolved)
formatIPAddr := func(addr string) string {
	ip := net.ParseIP(addr)
	if ip != nil && strings.Count(addr, ":") >= 2 {
		return "[" + addr + "]"

Would it be better to use ip rather than addr in this return line? Also, I believe any valid IPv6 address parsed by net.ParseIP(addr) will meet the condition strings.Count(addr, ":") >= 2. Thoughts?

@frobware (Contributor Author)

Edge cases: this needs more thought and unit tests. For example, ::FFFF:127.0.0.1 is an IPv4-mapped IPv6 address that passes the colon-count check, yet net.ParseIP returns an IP whose To4() is non-nil:

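package main

import (
	"fmt"
	"net"
)

func main() {
	// An IPv4-mapped IPv6 address: it contains three colons, so it passes
	// the strings.Count(addr, ":") >= 2 check.
	ip := net.ParseIP("::FFFF:127.0.0.1")
	// net.IP's String method prints IPv4-mapped addresses in dotted-quad form.
	fmt.Println(ip)
	// To4 returns a non-nil 4-byte form, so Go treats this address as IPv4.
	fmt.Println("is IPv4", ip.To4())
}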

https://play.golang.org/p/4xXQTQNs45g
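On both review points: once net.ParseIP succeeds, any textual IPv6 address contains at least two colons (even "::" does), so the colon count mostly separates IPv6 text from IPv4 text; the edge case is that an IPv4-mapped address such as ::FFFF:127.0.0.1 passes it even though Go reports it as IPv4. Below is a minimal sketch of one alternative, assuming it is acceptable to emit IPv4-mapped addresses in dotted-quad form and to normalize the text through ip.String(). This formatIPAddr is a hypothetical variant, not the code that was merged.

```go
package main

import (
	"fmt"
	"net"
)

// formatIPAddr is a hypothetical variant of the helper under review. It
// brackets an address only when it parses as IPv6 with no IPv4 form, and it
// returns the canonical text produced by ip.String().
func formatIPAddr(addr string) string {
	ip := net.ParseIP(addr)
	if ip == nil {
		// Not an IP literal (for example a hostname): leave it unchanged.
		return addr
	}
	if ip.To4() != nil {
		// IPv4, including IPv4-mapped IPv6 such as ::ffff:127.0.0.1,
		// which ip.String() renders in dotted-quad form.
		return ip.String()
	}
	return "[" + ip.String() + "]"
}

func main() {
	for _, a := range []string{"10.0.0.1", "::FFFF:127.0.0.1", "fd00::1", "2001:DB8::1", "router.example.com"} {
		fmt.Printf("%-20s -> %s\n", a, formatIPAddr(a))
	}
}
```

Using ip instead of addr also canonicalizes the output (for example lower-case hex), which keeps generated config stable regardless of how the address was spelled. Whether rewriting IPv4-mapped addresses this way is right for the router is a design decision this sketch does not settle; the reviewed code keeps the caller's original spelling.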

@frobware (Contributor Author)

Still expecting this to fail CI haproxy tests where the system:router cluster role does not have privileges to list/watch endpointslices.

Needs: openshift/openshift-apiserver#125

Known failures are:

"[sig-network][Feature:Router] The HAProxy router converges when multiple routers are writing status [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should override the route host for overridden domains with a custom value [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should override the route host with a custom value [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should run even if it has no access to update status [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should serve a route that points to two services and respect weights [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should serve the correct routes when scoped to a single namespace and label set [Suite:openshift/conformance/parallel]"

@frobware (Contributor Author) commented Jul 24, 2020

With the latest changes, all of these now pass except:

"[sig-network][Feature:Router] The HAProxy router should serve a route that points to two services and respect weights [Suite:openshift/conformance/parallel]"

pkg/router/template/plugin.go (outdated, resolved)
pkg/router/template/plugin.go (resolved)
@frobware (Contributor Author)

@sgreene570 thanks for the reviews. Continuously pushing changes as I run into issues.

pkg/router/controller/factory/factory.go (resolved)
pkg/router/controller/factory/factory.go (outdated, resolved)
pkg/router/controller/factory/factory.go (outdated, resolved)
pkg/router/controller/factory/factory.go (outdated, resolved)
pkg/router/controller/router_controller.go (outdated, resolved)
pkg/router/template/plugin.go (resolved)
if serviceName := endpointSliceServiceName(eps); serviceName == "" {
	utilruntime.HandleError(fmt.Errorf("EndpointSlice %s/%s has no %q label", eps.Namespace, eps.Name, ServiceNameLabel))
} else {
	objMeta := eps.ObjectMeta.DeepCopy()
Contributor

Is using a copy necessary?
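The excerpt above derives a Service name from each EndpointSlice and reports slices that lack the label. Because each EndpointSlice carries a single addressType (IPv4, IPv6 or FQDN), a dual-stack Service is represented by separate slices per family, so the router has to collect all slices for a Service before building its backends. A minimal sketch of that grouping step follows. The endpointSlice type and groupByService function are hypothetical stand-ins for the discovery.k8s.io types and the router's helpers; kubernetes.io/service-name is the well-known label set by the EndpointSlice controller, and it is an assumption that the excerpt's ServiceNameLabel constant refers to the same label.

```go
package main

import (
	"fmt"
	"sort"
)

// serviceNameLabel is the well-known label the EndpointSlice controller sets
// to the owning Service's name.
const serviceNameLabel = "kubernetes.io/service-name"

// endpointSlice is a hypothetical, trimmed-down stand-in for a
// discovery.k8s.io EndpointSlice.
type endpointSlice struct {
	Namespace   string
	Name        string
	Labels      map[string]string
	AddressType string   // one family per slice: "IPv4", "IPv6" or "FQDN"
	Addresses   []string // flattened endpoint addresses
}

// groupByService returns slices keyed by "namespace/service". Slices without
// the service-name label are reported as errors and skipped.
func groupByService(slices []endpointSlice) (map[string][]endpointSlice, []error) {
	groups := make(map[string][]endpointSlice)
	var errs []error
	for _, eps := range slices {
		svc := eps.Labels[serviceNameLabel] // a nil map yields ""
		if svc == "" {
			errs = append(errs, fmt.Errorf("EndpointSlice %s/%s has no %q label", eps.Namespace, eps.Name, serviceNameLabel))
			continue
		}
		key := eps.Namespace + "/" + svc
		groups[key] = append(groups[key], eps)
	}
	return groups, errs
}

func main() {
	slices := []endpointSlice{
		{Namespace: "demo", Name: "web-abc12", Labels: map[string]string{serviceNameLabel: "web"}, AddressType: "IPv4", Addresses: []string{"10.128.0.5"}},
		{Namespace: "demo", Name: "web-def34", Labels: map[string]string{serviceNameLabel: "web"}, AddressType: "IPv6", Addresses: []string{"fd01::5"}},
		{Namespace: "demo", Name: "orphan-xyz", AddressType: "IPv4"},
	}
	groups, errs := groupByService(slices)

	// Print groups in a stable order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		for _, eps := range groups[k] {
			fmt.Println(k, eps.AddressType, eps.Addresses)
		}
	}
	for _, err := range errs {
		fmt.Println("error:", err)
	}
}
```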


for i := range objs {
	eps := objs[i].(*discoveryv1beta1.EndpointSlice)
	fullSet = append(fullSet, *eps.DeepCopy())
Contributor

Is using a copy necessary here?

@frobware (Contributor Author)

It comes out of the store, but I will check whether we mutate it at all; if not, we can use it as-is.
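Why the copy matters: objects returned by an informer's store or lister are shared with the cache and with every other consumer, and client-go's convention is to treat them as read-only and DeepCopy before mutating. If the code only reads them, using them as-is is safe. The sketch below uses a plain map of pointers as a hypothetical stand-in for the store and a hypothetical cachedSlice type to show that a shallow struct copy still aliases slice fields, while a deep copy does not.

```go
package main

import "fmt"

// cachedSlice is a hypothetical stand-in for an object held in an informer
// store; only the fields needed for the illustration are included.
type cachedSlice struct {
	Name      string
	Addresses []string
}

// deepCopy copies the struct and gives the copy its own Addresses backing
// array, roughly what a generated DeepCopy does for this shape.
func (s *cachedSlice) deepCopy() *cachedSlice {
	out := *s // copies Name and the slice header, not the backing array
	out.Addresses = append([]string(nil), s.Addresses...)
	return &out
}

func main() {
	// A plain map of pointers standing in for the shared store.
	store := map[string]*cachedSlice{
		"web-abc12": {Name: "web-abc12", Addresses: []string{"10.128.0.5"}},
	}

	// A shallow struct copy still shares the Addresses backing array,
	// so this write is visible through the store.
	shallow := *store["web-abc12"]
	shallow.Addresses[0] = "10.128.0.99"
	fmt.Println("store after shallow-copy write:", store["web-abc12"].Addresses[0])

	// Restore the cached value, then write through a deep copy instead.
	store["web-abc12"].Addresses[0] = "10.128.0.5"
	deep := store["web-abc12"].deepCopy()
	deep.Addresses[0] = "10.128.0.99"
	fmt.Println("store after deep-copy write:   ", store["web-abc12"].Addresses[0])
}
```

This is also why dereferencing a DeepCopy result (as in *eps.DeepCopy() in the excerpt above) is safe to append into a new collection, whereas dereferencing the cached pointer directly would share nested fields with the store.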

pkg/router/controller/router_controller.go (resolved)
@frobware (Contributor Author)

/retest

@frobware (Contributor Author) commented Jul 24, 2020

I tried to use the auto-commit suggestions; does this still build?

/retest

@frobware frobware force-pushed the endpointslices branch 2 times, most recently from dfa50bd to 1e94a11 Compare July 24, 2020 17:11
@frobware (Contributor Author)

@Miciah thanks for the review and suggestions. I've taken almost all of them in the now-squashed commits. I'm currently debugging one CI flake that may be related to this change. We need to discuss what it means to sort the complete subset of endpoints.
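One possible reading of sorting "the complete subset of endpoints" is: flatten the endpoints of every EndpointSlice that belongs to a Service, drop duplicates (the Kubernetes documentation notes that an endpoint may temporarily appear in more than one slice, so consumers should build a list of unique endpoints), and sort the result so the generated configuration does not depend on slice ordering. The sketch below assumes that reading; the endpoint type and mergeEndpoints function are hypothetical, not the router's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// endpoint is a hypothetical flattened endpoint: one address and port.
type endpoint struct {
	IP   string
	Port int32
}

// mergeEndpoints flattens the endpoints of all slices for one Service,
// removes duplicates and sorts by IP, then port, for a stable order.
func mergeEndpoints(slices [][]endpoint) []endpoint {
	seen := make(map[endpoint]bool)
	var out []endpoint
	for _, s := range slices {
		for _, ep := range s {
			if !seen[ep] {
				seen[ep] = true
				out = append(out, ep)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].IP != out[j].IP {
			return out[i].IP < out[j].IP
		}
		return out[i].Port < out[j].Port
	})
	return out
}

func main() {
	ipv4 := []endpoint{{"10.128.0.7", 8080}, {"10.128.0.5", 8080}}
	ipv6 := []endpoint{{"fd01::5", 8080}}
	stale := []endpoint{{"10.128.0.5", 8080}} // same endpoint seen in a second slice
	for _, ep := range mergeEndpoints([][]endpoint{ipv4, ipv6, stale}) {
		fmt.Println(ep.IP, ep.Port)
	}
}
```

Sorting compares address strings lexically, which gives a deterministic order but not a numeric one; that is enough for stable output, but it is a design choice rather than a requirement.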

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments

@frobware (Contributor Author)

Last failure was:

release "release-initial" failed: could not create watcher for pod: unknown (get pods)

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@frobware (Contributor Author)

/hold

I'm not sure whether openshift/kubernetes#300 fixes the upgrade failures, but there's no point in failing here repeatedly on the upgrade job.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@frobware (Contributor Author)

/hold cancel

openshift/kubernetes#300 now merged.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@frobware (Contributor Author)

/retest

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2020
@frobware (Contributor Author)

/hold

I pushed a2e6e2c to determine whether the CI failures are related to the switch to EndpointSlices.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@Miciah (Contributor) commented Jul 30, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2020
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware, Miciah

@Miciah (Contributor) commented Jul 31, 2020

error: build error: failed to pull image: error pulling image configuration: Get https://docker-registry.default.svc:5000/v2/ci-op-84vhvll6/pipeline/blobs/sha256:2ce56bf6f5c9e7bcf5a0c6ad0b27fb1e81bd87431e9888718387c324c1246fe2: EOF

/test e2e
/test e2e-upgrade

@frobware (Contributor Author)

/retest

@frobware (Contributor Author)

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 31, 2020
@openshift-merge-robot openshift-merge-robot merged commit a8577e5 into openshift:master Jul 31, 2020
@frobware frobware deleted the endpointslices branch May 1, 2024 12:04
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

7 participants