Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apiserver: change default reconciler to LeaseEndpoint #58474

Conversation

rphillips
Copy link
Member

@rphillips rphillips commented Jan 18, 2018

What this PR does / why we need it:

Graduates the lease-endpoint-reconciler flag to beta, and defaults the endpoint reconciler to LeaseEndpoint instead of MasterCount.

  • Default Lease Endpoint Reconciler to on

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #57617

Special notes for your reviewer:

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 18, 2018
@rphillips
Copy link
Member Author

/test pull-kubernetes-bazel-test

@rphillips
Copy link
Member Author

/test pull-kubernetes-e2e-kops-aws

@rphillips rphillips force-pushed the feat/change_default_reconciler_type branch from 12eff62 to 8e1b2c7 Compare January 18, 2018 22:07
@rphillips
Copy link
Member Author

/cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-cluster-lifecycle-pr-reviews

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jan 22, 2018
@errordeveloper
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 22, 2018
@errordeveloper
Copy link
Member

/assign @deads2k

@jamiehannaford
Copy link
Contributor

@kubernetes/sig-testing-misc Are there any issues or recommended way to test HA apiserver?

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Jan 22, 2018
@rphillips
Copy link
Member Author

@jamiehannaford There are not any test suites written for HA E2E API server testing.

do we need to block this on HA E2E testing?

cc @kubernetes/sig-api-machinery-api-reviews

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Jan 22, 2018
@liggitt
Copy link
Member

liggitt commented Jan 23, 2018

do we need to block this on HA E2E testing?

Honestly, the lack of any HA e2e tests should have blocked the existing master count reconciler. I'd expect at least a minimal e2e test before making changes in this area.

For those using the existing reconciler, how does an upgrade work? What does a rolling upgrade look like with a mix of reconcilers running? Given that upgrades are a tricky thing to get right with HA, that seems like another key area to have an e2e test in place.

@errordeveloper
Copy link
Member

I hear that kops issue effects many other PRs. I wonder who we should ask about the status of it... (Ping @justinsb).

@jamiehannaford
Copy link
Contributor

@rphillips Maybe it's worth pinging some folks in SIG testing (meeting info here) and ask whether no pre-existing e2e tests should block this. They might also have ideas about how we can bootstrap some initial test coverage.

@marun
Copy link
Contributor

marun commented Jan 23, 2018

@liggitt Could this behavior be validated in isolation in an integration test in advance of the more involved setup that would be required for e2e?

@rphillips
Copy link
Member Author

@jamiehannaford I should be able to attend the sig-testing meeting today.

@errordeveloper
Copy link
Member

I also wonder where and how has this been used/tested. Was there a flag or feature gate that some folks have tried out?

@rphillips
Copy link
Member Author

@errordeveloper the api server has a --endpoint-reconciler-type flag documented alpha.

@liggitt
Copy link
Member

liggitt commented Feb 19, 2018

@liggitt Could this behavior be validated in isolation in an integration test in advance of the more involved setup that would be required for e2e?

possibly. I could envision an integration test that started 3 API servers in sequence, with various reconcilers, and monitored the kubernetes service during startup to ensure the transitions are what we'd expect, and the end state is what we want.

  • 0 servers -> rolling start of 3 master count -> rolling restart to 3 master count
  • 0 servers -> rolling start of 3 master count -> rolling restart of 3 leasing with downtime longer than lease time
  • 0 servers -> rolling start of 3 leasing -> rolling restart of 3 leasing with downtime longer than lease time

etc

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Apr 30, 2018
@rphillips rphillips force-pushed the feat/change_default_reconciler_type branch from cfdf794 to f2ef723 Compare April 30, 2018 14:10
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 30, 2018
@rphillips
Copy link
Member Author

/test pull-kubernetes-kubemark-e2e-gce

@@ -94,7 +94,7 @@ func NewServerRunOptions() *ServerRunOptions {
EnableLogsHandler: true,
EventTTL: 1 * time.Hour,
MasterCount: 1,
EndpointReconcilerType: string(reconcilers.MasterCountReconcilerType),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the help text on the apiserver-count flag to indicate it only applies when using --endpoint-reconciler-type=master-count?

do we also want to mark the master-count reconciler deprecated this release, for removal in 1.13? I can't think of a reason we'd want to keep it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put that into this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the release note as well.

@rphillips rphillips force-pushed the feat/change_default_reconciler_type branch from f2ef723 to 0227534 Compare April 30, 2018 15:07
@rphillips
Copy link
Member Author

updated the PR with the comment change on apiserver-count.

@rphillips
Copy link
Member Author

/retest

@rphillips
Copy link
Member Author

flakiness this morning with tests

@liggitt
Copy link
Member

liggitt commented Apr 30, 2018

opened #63302 for the volume binding stress flake

@liggitt
Copy link
Member

liggitt commented Apr 30, 2018

/retest

@liggitt
Copy link
Member

liggitt commented Apr 30, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 30, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: errordeveloper, liggitt, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2018
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit d39eac9 into kubernetes:master Apr 30, 2018
@ixdy
Copy link
Member

ixdy commented May 2, 2018

I'm a little worried this PR may have caused the verify job to start failing, but I can't conclusively determine that yet.

Please take a look at #63378 for details and comment if you can provide any additional insight. Thanks!

ixdy added a commit to ixdy/kubernetes that referenced this pull request May 2, 2018
…e_default_reconciler_type"

This reverts commit d39eac9, reversing
changes made to 9feb486.
@liggitt
Copy link
Member

liggitt commented May 2, 2018

Reverting in #63380 until the cause for the apiserver not exiting is found

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 2, 2018
@liggitt
Copy link
Member

liggitt commented May 3, 2018

issue fixed, re-enabled in #63383

k8s-github-robot pushed a commit that referenced this pull request May 4, 2018
Automatic merge from submit-queue (batch tested with PRs 63315, 63383, 63318, 63439). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Re-enable lease reconciler, fix shutdown race

Fixes #63378
Fixes #57617

* Fixes the openapi script to wait for the apiserver on shutdown (like all the other scripts do)
* Fixes the apiserver shutdown to not hang forever if the kubernetes service reconciler cannot persist to etcd
* Readds #58474 to make the default the lease reconciler

```release-note
kube-apiserver: the default `--endpoint-reconciler-type` is now `lease`. The `master-count` endpoint reconciler type is deprecated and will be removed in 1.13.
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Graduate the Lease Endpoint Reconciler to beta in v1.11
9 participants