Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use HTTPS as etcd-apiserver protocol when mTLS is enabled #77561

Merged
merged 1 commit into from Jul 29, 2019

Conversation

@wenjiaswe
Copy link
Contributor

commented May 7, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change

/kind bug

/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
#70144 added mTLS for etcd server and kube-apiserver connection, however, ETCD_SERVER is still using http not https, so the fix didn't really secure anything. This PR will fix that.

  • #74690 did the same thing, it was merged but reverted (#77158) for performance regression investigation (#77080). Now I am adding this back in. We will test this PR with all necessary performance tests and make sure it doesn't introduce unexpected performance regression before we remove [WIP] label.
  • This PR also changed etcd metrics port to 2382 when mTLS is enabled, otherwise kubelet wouldn't be able to access https://localhost:2379/metrics and https://localhost:2379/health.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Use HTTPS as etcd-apiserver protocol when mTLS between etcd and kube-apiserver on master is enabled, change etcd metrics/health port to 2382.

cc @wojtek-t @mikedanese

@mikedanese

This comment has been minimized.

Copy link
Member

commented May 8, 2019

looks good. let me know when you want an lgtm

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented May 8, 2019

@mikedanese thanks for your quick response! I am waiting for @wojtek-t to help to do some performance test. There is performance regression and it is possible that #74690 is one of the cause so we reverted it in #77158. But since there are a lot of PRs at the same period of time so it's hard to tell how much impact does it have if it does cause performance regression. Now I am creating a new PR. And when @wojtek-t has time, he will help to do the GCE-5000 performance test for isolated performance investigation on this one PR. Once we are ready, I will sure ping you, thanks!

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented May 8, 2019

/test pull-kubernetes-dependencies

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented May 9, 2019

/assign @wojtek-t to help with GCE-5000 performance test.

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented May 9, 2019

/sig api-machinery

@wenjiaswe wenjiaswe force-pushed the wenjiaswe:fix-etcd-server branch from 8673ddd to 9ae120d Jun 3, 2019

@k8s-ci-robot k8s-ci-robot added sig/gcp and removed needs-rebase labels Jun 3, 2019

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jun 4, 2019

/retest

@wenjiaswe wenjiaswe force-pushed the wenjiaswe:fix-etcd-server branch from 9ae120d to 7c1f0ba Jun 7, 2019

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2019

/test pull-kubernetes-e2e-gce-100-performance

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2019

/retest

@wenjiaswe wenjiaswe force-pushed the wenjiaswe:fix-etcd-server branch from 7c1f0ba to bc7a2fc Jul 2, 2019

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jul 2, 2019

/test pull-kubernetes-e2e-gce-100-performance

@wenjiaswe wenjiaswe force-pushed the wenjiaswe:fix-etcd-server branch from bc7a2fc to 1393ed8 Jul 2, 2019

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jul 24, 2019

For record, this PR passed the gce_5000 test run separately, checked with @krzysied .

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jul 24, 2019

@mm4tt it looks like there is still some issue with setting up prometheus stack. This is gce_500 test, and gce_5000 test has passed for this PR. Would you mind helping take another look?

W0724 22:50:28.428] F0724 22:50:28.414485 101549 clusterloader.go:238] Error while setting up prometheus stack: timed out waiting for the condition

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Jul 25, 2019

@mm4tt - would your PR work differently in kubemark than in GCE tests? It seems that you fixed gce, but kubemark is still failing.

@mm4tt

This comment has been minimized.

Copy link
Contributor

commented Jul 25, 2019

Doh, forgot that for kubemark we have separate ServiceMonitors. kubernetes/perf-tests#689 should fix the issue

@mm4tt

This comment has been minimized.

Copy link
Contributor

commented Jul 25, 2019

/retest

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jul 25, 2019

/priority important-soon

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jul 25, 2019

/assign @mikedanese @wojtek-t I think this is ready for final review. Thanks!

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Jul 26, 2019

This looks reasonable to me, but I would like @mikedanese to also take a look.

For now I'm temporarily holding it - I would like this to merge on Sunday when there are less merges to allow easier comparison on 5k-scalability tests.
I will hold cancel on Sunday morning.
/hold

@wojtek-t wojtek-t self-assigned this Jul 26, 2019

@wenjiaswe

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2019

@wojtek-t Thanks! Just want to let you know that I have ran gce_5000 test and the test passed. I just sent you the result:) But still, I have no problem waiting till Sunday for some more 5k comparison:)

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Jul 26, 2019

@wojtek-t Thanks! Just want to let you know that I have ran gce_5000 test and the test passed. I just sent you the result:) But still, I have no problem waiting till Sunday for some more 5k comparison:)

Thanks - I though you've run only 2k-node tests. In that case, I'm less concerned.
But I would like someone from security team (@mikedanese ?) to take a look if what you did is what we want from security POV.

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Jul 29, 2019

I just realized that this is almost a copy of #74690 which was already approved by @mikedanese
So let's try to go ahead with it.

/lgtm
/approve
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm and removed do-not-merge/hold labels Jul 29, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Jul 29, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wenjiaswe, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot

This comment has been minimized.

Copy link

commented Jul 29, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments
@fejta-bot

This comment has been minimized.

Copy link

commented Jul 29, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

commented Jul 29, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 3be827e into kubernetes:master Jul 29, 2019

23 checks passed

cla/linuxfoundation wenjiaswe authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Jul 29, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.