azure: generate loopback kubeconfig to access API locally #2085

@jhixson74 jhixson74 commented Jul 24, 2019

This code generates a kubeconfig that uses localhost for API access.

This is necessary due to a limitation with Azure internal load balancers. See limitation #2 here: https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview#limitations

"Unlike public Load Balancers which provide outbound connections when transitioning from private IP addresses inside the virtual network to public IP addresses, internal Load Balancers do not translate outbound originated connections to the frontend of an internal Load Balancer as both are in private IP address space. This avoids potential for SNAT port exhaustion inside unique internal IP address space where translation is not required. The side effect is that if an outbound flow from a VM in the backend pool attempts a flow to frontend of the internal Load Balancer in which pool it resides and is mapped back to itself, both legs of the flow don't match and the flow will fail."

https://jira.coreos.com/browse/CORS-1094
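
To illustrate the idea in the description above, here is a minimal sketch of deriving a loopback API endpoint from a cluster endpoint. It is not the code from this PR: `loopbackServer` is a hypothetical helper, `api.example.com` is a placeholder host, and the sketch assumes the loopback kubeconfig differs from the regular one only in its server URL.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// loopbackServer rewrites the host of an API server URL to "localhost",
// keeping the scheme and port. Hypothetical helper for illustration only.
func loopbackServer(server string) (string, error) {
	u, err := url.Parse(server)
	if err != nil {
		return "", fmt.Errorf("parsing server URL %q: %w", server, err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("server URL %q needs a scheme and a host", server)
	}
	if port := u.Port(); port != "" {
		u.Host = net.JoinHostPort("localhost", port)
	} else {
		u.Host = "localhost"
	}
	return u.String(), nil
}

func main() {
	// Placeholder endpoint; a real cluster would use its own API URL.
	server, err := loopbackServer("https://api.example.com:6443")
	if err != nil {
		panic(err)
	}
	fmt.Println(server)
}
```

A client on the bootstrap host that uses the rewritten URL talks to the local API server directly, so its traffic never has to pass through the internal load balancer frontend.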

…I access

This code generates a kubeconfig that uses localhost for API access. This avoids
clients getting black-holed by hitting the load balancer which is only in front
of the bootstrap node during bootstrapping.
@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 24, 2019
@abhinavdahiya
Contributor

The title and commit do not match the changes in the PR.

@jhixson74 jhixson74 changed the title from "Azure: Restrict all clients on bootstrap host to localhost for k8s API access" to "azure: generate loopback kubeconfig to access API locally" Jul 25, 2019
@abhinavdahiya
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 25, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, jhixson74

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2019
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 6e3bad2 into openshift:master Jul 26, 2019
@openshift-ci-robot
Contributor

@jhixson74: The following test failed; say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-scaleup-rhel7 | bf59ebf | link | /test e2e-aws-scaleup-rhel7 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

wking added a commit to wking/openshift-installer that referenced this pull request Aug 20, 2019
…ys for etcd-signer

Since the pivots to prefer loopback Kube-API access:

* bf59ebf (azure: generate loopback kubeconfig to access API
  locally, 2019-07-17, openshift#2085).
* 82d81d9 (data/data/bootstrap: use loopback kubeconfig for API
  access, 2019-07-24, openshift#2086).
* openshift/cluster-bootstrap@61d1428bea (pkg/start: use loopback
  kubeconfig to talk to API, 2019-07-23,
  openshift/cluster-bootstrap#28).
* possibly more

logs on the bootstrap machine have contained distracting errors like
these reported in [1]:

  $ grep 'not localhost\|etcd-signer' journal-bootstrap.log
  ...
  Aug 20 10:33:56 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com podman[8366]: 2019-08-20 10:33:56.090073216 +0000 UTC m=+2.644782091 container start d0dcc42a1335c1224df35a48a279f63f1cb7a03c94de5ebb29e2633e6ee6c429 (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f20394d571ff9a28aed9366434521d221d8d743a6efe2a3d6c6ad242198a522e, name=etcd-signer)
  Aug 20 10:33:58 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com openshift.sh[2867]: error: unable to recognize "./99_kubeadmin-password-secret.yaml": Get https://localhost:6443/api?timeout=32s: x509: certificate is valid for api.bm1.oc4, not localhost
  Aug 20 10:34:01 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com approve-csr.sh[2870]: Unable to connect to the server: x509: certificate is valid for api.bm1.oc4, not localhost
  ...
  Aug 20 10:43:55 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com openshift.sh[2867]: error: unable to recognize "./99_kubeadmin-password-secret.yaml": Get https://localhost:6443/api?timeout=32s: x509: certificate is valid for api.bm1.oc4, not localhost
  Aug 20 10:43:59 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com podman[15272]: 2019-08-20 10:43:59.68789639 +0000 UTC m=+0.188325679 container died d0dcc42a1335c1224df35a48a279f63f1cb7a03c94de5ebb29e2633e6ee6c429 (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f20394d571ff9a28aed9366434521d221d8d743a6efe2a3d6c6ad242198a522e, name=etcd-signer)
  ...

With this commit, we pass the localhost cert to etcd-signer so we can
form the TLS connection to gracefully say "sorry, I'm not really a
Kube API server".  Fixes [2].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1743661
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1743840
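
The x509 errors in the log excerpt above come from hostname verification: a certificate whose names cover only api.bm1.oc4 does not verify when the client dials localhost. The sketch below reproduces that check with throwaway self-signed certificates. It is an illustration under assumptions, not the etcd-signer change itself: `selfSigned` is a hypothetical helper, and the second certificate simply adds localhost to its names to stand in for a cert that is valid for loopback access.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned returns a throwaway self-signed certificate whose subject
// alternative names are dnsNames. Hypothetical helper for illustration only.
func selfSigned(dnsNames ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example-serving-cert"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	for _, names := range [][]string{
		{"api.bm1.oc4"},              // names from the log excerpt
		{"api.bm1.oc4", "localhost"}, // assumed loopback-capable cert
	} {
		cert, err := selfSigned(names...)
		if err != nil {
			panic(err)
		}
		// VerifyHostname returns nil when the name matches one of the SANs.
		fmt.Printf("names %v, dialing localhost: %v\n", names, cert.VerifyHostname("localhost"))
	}
}
```

With only api.bm1.oc4 in the certificate, the check for localhost returns an error; once localhost is among the names, the TLS handshake can complete and the server can answer the request itself.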
jhixson74 pushed a commit to jhixson74/installer that referenced this pull request Dec 6, 2019
…ys for etcd-signer
