Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gce/upgrade.sh: Prompt if etcd version is unspecified. #57121

Merged
merged 1 commit into from Dec 14, 2017

Conversation

enisoc
Copy link
Member

@enisoc enisoc commented Dec 13, 2017

We shouldn't upgrade etcd without first warning the user that some etcd
version transitions can't be undone. We don't know what version the user
currently has, so we require either an explicit version and image, or an
interactive acknowledgement of this caveat.

This is modeled after the STORAGE_MEDIA_TYPE prompt just above.

@BenTheElder @krousey @wojtek-t If we do this, it would probably require all upgrade jobs to explicitly specify the etcd version, or otherwise add some variable to bypass this check. That's unfortunate coupling, but I think we need this or something like it to protect users. What do you think?

Fixes #57013

/hold
until we decide how to fix upgrade/downgrade e2e tests to account for this.

NONE

@enisoc enisoc added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/gcp sig/release Categorizes an issue or PR as relevant to SIG Release. labels Dec 13, 2017
@enisoc enisoc added this to the v1.9 milestone Dec 13, 2017
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 13, 2017
Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right to me.

@BenTheElder
Copy link
Member

This PR SGTM, agree that more coupling is unfortunate but it seems like if we want to force a version we should definitely set it in the job config(s). I don't think this is unreasonable at all, especially since it seems it depends on which version(s) we are upgrading between.

@BenTheElder
Copy link
Member

/retest
garbage collector tests are so flaky :(

@wojtek-t
Copy link
Member

/approve no-issue

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 13, 2017
@dims
Copy link
Member

dims commented Dec 13, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 13, 2017
@enisoc
Copy link
Member Author

enisoc commented Dec 13, 2017

@BenTheElder Can you help me figure out all the places we'd need to plumb in etcd versions to the test jobs, so this PR won't break them?

@BenTheElder
Copy link
Member

@enisoc will do, it should just be jobs doing master upgrades right?

@enisoc
Copy link
Member Author

enisoc commented Dec 13, 2017

@BenTheElder Thanks! Yeah, master upgrade or downgrade. The new check in this PR shouldn't trigger for node-only migrations.

@BenTheElder
Copy link
Member

I'm pretty certain I have all the jobs that upgrade or downgrade, but I'm not sure which etcd versions they need to pin to yet, @krousey previously pinned some jobs here: kubernetes/test-infra#5909 kubernetes/test-infra#5920

@enisoc
Copy link
Member Author

enisoc commented Dec 13, 2017

We have a couple options for which etcd versions to pin for each job: either the 1.8 default (etcd 3.0.17) or the 1.9 default (etcd 3.1.10).

Ideally we should also pin the etcd version during initial kube-up, so we know for sure what we're starting with.

I think these are the main scenarios we're going to recommend to users:

  • upgrade 1.8 -> 1.9 (allow etcd upgrade)
    • pin to 3.0.17 to bring up 1.8 cluster
    • pin to 3.1.10 to upgrade to 1.9
  • downgrade 1.9 -> 1.8 (stay on new etcd)
    • pin to 3.1.10 to bring up 1.9 cluster
    • pin to 3.1.10 to downgrade to 1.8

If we have enough jobs to cover other scenarios as well, we could do these:

  • upgrade 1.8 -> 1.9 (stay on old etcd)
    • pin to 3.0.17 to bring up 1.8 cluster
    • pin to 3.0.17 to upgrade to 1.9
  • downgrade 1.9 -> 1.8 (stay on old etcd)
    • pin to 3.0.17 to bring up 1.9 cluster
    • pin to 3.0.17 to downgrade to 1.8

@enisoc
Copy link
Member Author

enisoc commented Dec 13, 2017

Along those lines, do the -cluster upgrade/downgrade tests also incidentally do master upgrades? Or do they only touch nodes?

@BenTheElder
Copy link
Member

I believe they do master as well per:

// ClusterUpgrade indicates that both master and nodes are
// being upgraded.
ClusterUpgrade

(these jobs have --ginkgo.focus=\\[Feature:ClusterUpgrade\\] ...)

@BenTheElder
Copy link
Member

Also we have upgrade/downgrade jobs for 1.6/1.7 at the moment, but we can probably safely ignore 1.6 related tests...

We shouldn't upgrade etcd without first warning the user that some etcd
version transitions can't be undone. We don't know what version the user
currently has, so we require either an explicit version and image, or an
interactive acknowledgement of this caveat.

This is modeled after the STORAGE_MEDIA_TYPE prompt just above.
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 13, 2017
@enisoc
Copy link
Member Author

enisoc commented Dec 13, 2017

Please take another look at the changes.

After talking with @BenTheElder, we realized that hard-coding the versions is more complex than I thought. In particular, we don't currently have the ability to separately specify the starting and ending etcd version for an upgrade. Adding this capability would require coordinating changes across both the test code and the job configs.

As a workaround, we decided instead to add a variable to let tests bypass the check, as if the user has said, "Yes, continue with the default upgrade behavior. I understand and accept the consequences." We believe this is a more realistic scenario than testing only the case where the user pins the etcd version prior to upgrading.

Copy link
Member

@BenTheElder BenTheElder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 13, 2017
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder, dims, enisoc, wojtek-t

Associated issue: #57013

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request Current

@BenTheElder @dims @enisoc @gmarek @vishh

Note: This pull request is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required
Pull Request Labels
  • sig/gcp sig/release: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/cleanup: Adding tests, refactoring, fixing old bugs.
Help

@enisoc enisoc added retest-not-required and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 13, 2017
@enisoc
Copy link
Member Author

enisoc commented Dec 14, 2017

/hold cancel

@enisoc enisoc merged commit bba84d7 into kubernetes:master Dec 14, 2017
@enisoc enisoc deleted the gce-upgrade-warning branch December 14, 2017 00:42
@xiangpengzhao
Copy link
Contributor

@enisoc since this PR gets merged, should we revert kubernetes/test-infra#5909 (and kubernetes/test-infra#5920)
cc @krousey

@enisoc
Copy link
Member Author

enisoc commented Dec 14, 2017

@xiangpengzhao No, we still need that fix. This PR on its own would have no effect on e2e tests (it wouldn't have fixed the problem there). This is just adding a warning for users.

@xiangpengzhao
Copy link
Contributor

@enisoc thanks! I misunderstood sth here :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/release Categorizes an issue or PR as relevant to SIG Release. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gce-1.9-1.8-downgrade fails due to etcd crashloopbacking
10 participants