Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1688099: Default to HostNetwork on libvirt #165

Conversation

Miciah
Copy link
Contributor

@Miciah Miciah commented Mar 13, 2019

Default to HostNetwork on libvirt

  • pkg/operator/controller/controller.go (haTypeForInfra): Use HostNetwork as the default endpoint publishing strategy if the platform is libvirt.

Rename haTypeForInfra, enforceEffectiveHighAvailability

  • pkg/operator/controller/controller.go (haTypeForInfra): Rename...
    (publishingStrategyTypeForInfra): ...to this.
    (enforceEffectiveHighAvailability): Rename...
    (enforceEffectiveEndpointPublishingStrategy): ...to this.

This commit fixes bug 1688099.

https://bugzilla.redhat.com/show_bug.cgi?id=1688099

* pkg/operator/controller/controller.go (haTypeForInfra): Use HostNetwork
as the default endpoint publishing strategy if the platform is libvirt.
@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 13, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 13, 2019
@danehans
Copy link
Contributor

Can changing haType to publishingStrategyType be included or should it be addressed in a separate PR?

@Miciah
Copy link
Contributor Author

Miciah commented Mar 13, 2019

Can changing haType to publishingStrategyType be included or should it be addressed in a separate PR?

I'm fine with addressing it in this PR. I'll push a second commit to that end.

* pkg/operator/controller/controller.go (haTypeForInfra): Rename...
(publishingStrategyTypeForInfra): ...to this.
(enforceEffectiveHighAvailability): Rename...
(enforceEffectiveEndpointPublishingStrategy): ...to this.
@Miciah
Copy link
Contributor Author

Miciah commented Mar 13, 2019

We got several failures:

=== RUN   TestClusterIngressControllerCreateDelete
--- FAIL: TestClusterIngressControllerCreateDelete (35.11s)
	operator_test.go:174: failed to finalize IngressController openshift-ingress-operator/test: timed out waiting for the condition
=== RUN   TestRouterServiceInternalEndpoints
--- PASS: TestRouterServiceInternalEndpoints (1.06s)
=== RUN   TestClusterProxyProtocol
--- PASS: TestClusterProxyProtocol (1.04s)
=== RUN   TestClusterIngressUpdate
--- FAIL: TestClusterIngressUpdate (12.16s)
	operator_test.go:422: failed to observe clean-up of CA certificate configmap: timed out waiting for the condition
=== RUN   TestClusterIngressScale
--- PASS: TestClusterIngressScale (3.09s)
=== RUN   TestRouterCACertificate
--- PASS: TestRouterCACertificate (1.25s)
=== RUN   TestHostNetworkEndpointPublishingStrategy
--- FAIL: TestHostNetworkEndpointPublishingStrategy (61.19s)
	operator_test.go:752: failed to get deployment: timed out waiting for the condition
	operator_test.go:756: failed to delete the ingresscontroller: timed out waiting for the condition

Looking at the operator logs, it looks like TestClusterIngressControllerCreateDelete may be suffering from https://bugzilla.redhat.com/show_bug.cgi?id=1683515:

2019-03-13T15:52:15.360Z        ERROR   operator.init.kubebuilder.controller    controller/controller.go:217    Reconciler error        {"controller": "operator-controller", "request": "openshift-ingress-operator/test", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test: [failed to delete DNS record &{{ map[Name:ci-op-0k8rjd04-43abb-x8r72-int kubernetes.io/cluster/ci-op-0k8rjd04-43abb-x8r72:owned]} ALIAS *.test.ci-op-0k8rjd04-43abb.origin-ci-int-aws.dev.rhcloud.com -> a9ef207f845a711e9ad6d0e31b390578-755582900.us-east-1.elb.amazonaws.com} for ingress openshift-ingress-operator/test: failed to update alias in zone Z201R7TPP4F8H3: couldn't update DNS record in zone Z201R7TPP4F8H3: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.test.ci-op-0k8rjd04-43abb.origin-ci-int-aws.dev.rhcloud.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: f912e25b-45a7-11e9-b7e5-7f4ecbc7eb19, failed to delete DNS record &{{Z328TAU66YDJH7 map[]} ALIAS *.test.ci-op-0k8rjd04-43abb.origin-ci-int-aws.dev.rhcloud.com -> a9ef207f845a711e9ad6d0e31b390578-755582900.us-east-1.elb.amazonaws.com} for ingress openshift-ingress-operator/test: failed to update alias in zone Z328TAU66YDJH7: couldn't update DNS record in zone Z328TAU66YDJH7: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.test.ci-op-0k8rjd04-43abb.origin-ci-int-aws.dev.rhcloud.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: f91b4699-45a7-11e9-a4cb-cf7e945d48f7]", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test: [failed to delete DNS record &{{ map[Name:ci-op-0k8rjd04-43abb-x8r72-int kubernetes.io/cluster/ci-op-0k8rjd04-43abb-x8r72:owned]} ALIAS *.test.ci-op-0k8rjd04-43abb.origin-ci-int-aws.dev.rhcloud.com -> a9ef207f845a711e9ad6d0e31b390578-755582900.us-east-1.elb.amazonaws.com} for ingress openshift-ingress-operator/test: failed to update alias in zone Z201R7TPP4F8H3: couldn't update DNS record in zone Z201R7TPP4F8H3: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.test.ci-op-0k8rjd04-43abb.origin-ci-int-aws.dev.rhcloud.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: f912e25b-45a7-11e9-b7e5-7f4ecbc7eb19, failed to delete DNS record &{{Z328TAU66YDJH7 map[]} ALIAS *.test.ci-op-0k8rjd04-43abb.origin-ci-int-aws.dev.rhcloud.com -> a9ef207f845a711e9ad6d0e31b390578-755582900.us-east-1.elb.amazonaws.com} for ingress openshift-ingress-operator/test: failed to update alias in zone Z328TAU66YDJH7: couldn't update DNS record in zone Z328TAU66YDJH7: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.test.ci-op-0k8rjd04-43abb.origin-ci-int-aws.dev.rhcloud.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: f91b4699-45a7-11e9-a4cb-cf7e945d48f7]"}]}

If we never delete the ingresscontroller that TestClusterIngressControllerCreateDelete creates, then TestClusterIngressUpdate will fail because it requires that there be no ingresscontroller using the default certificate.

As for TestHostNetworkEndpointPublishingStrategy, I suspect that the ingresscontroller that it creates never attains the desired number of replicas because it cannot schedule the pod since the default ingresscontroller plus the ingresscontroller that TestClusterIngressControllerCreateDelete creates are using up all the worker nodes.

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 13, 2019
@ironcladlou
Copy link
Contributor

/lgtm
/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Miciah
Copy link
Contributor Author

Miciah commented Mar 13, 2019

level=error msg="\t* module.vpc.aws_route_table_association.route_net[2]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.2: timeout while waiting for state to become 'success' (timeout: 5m0s)"

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit ba6f7e6 into openshift:master Mar 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants