
Node upgrades: basic mechanism #6082

Closed
mbforbes opened this issue Mar 27, 2015 · 8 comments
Assignees
mbforbes
Labels
area/upgrade
priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.)
sig/cluster-lifecycle (Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.)

Comments

@mbforbes
Contributor

See issue #6079 as the roll-up for node upgrades, including the specification of the minimum requirement that this issue will fulfill, as well as improvements that will come after.

This is the worklist item for a minimum viable node upgrade mechanism for 1.0.

Plan:

  • in-place: yes; restart the node
  • provider: GCE, then GKE

Outline:

Provisioning a new node is blocked on allowing new nodes to dynamically join a running cluster (#6087). In the meantime, the mechanism will do an in-place upgrade, blowing away as much node state as necessary (a rough shell sketch follows the step list below).

  • For each node:
    1. Stop running binaries
    2. Get the node back to as close to a "blank" state as possible. This involves trying to un-configure what salt did; ideally, this looks like running salt in reverse.
    3. Grab the latest config from the metadata server (including the new startup script)
    4. Upgrade the kernel/OS
    5. Reboot the machine, which also runs the new startup script, which also runs salt
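
A minimal sketch of that per-node sequence, assuming a GCE node that re-reads its startup script from the metadata server on boot. The service names, metadata keys, and paths below are illustrative assumptions, not the actual cluster scripts:

```bash
#!/usr/bin/env bash
# Hypothetical in-place node upgrade sketch (GCE). Service names, metadata
# keys, and paths are assumptions for illustration only.
set -o errexit
set -o pipefail

# 1. Stop running binaries.
sudo service kubelet stop || true
sudo service kube-proxy stop || true

# 2. Best-effort "un-salt": remove state that salt laid down. Truly running
#    salt in reverse isn't possible, so this is only approximate.
sudo rm -rf /srv/salt /srv/pillar /etc/salt/minion.d

# 3. Grab the latest config (including the new startup script) from the
#    GCE metadata server and stash a copy.
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/startup-script" \
  | sudo tee /var/run/new-startup-script.sh > /dev/null

# 4. Upgrade the kernel/OS.
sudo apt-get update && sudo apt-get -y dist-upgrade

# 5. Reboot; the new startup script runs on boot, which re-runs salt.
sudo reboot
```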
@mbforbes mbforbes added the priority/important-soon and area/upgrade labels Mar 27, 2015
@mbforbes mbforbes self-assigned this Mar 27, 2015
@mbforbes mbforbes added this to the v1.0 milestone Mar 27, 2015
@mbforbes mbforbes mentioned this issue Mar 27, 2015
@mbforbes mbforbes added the sig/cluster-lifecycle label Mar 27, 2015
@alex-mohr
Contributor

I thought salt was supposed to drive things to a specified end-state. Ideally, each salt stanza is idempotent, such that running it from, e.g., a base container VM results in the same state as re-running it a second time on its previous output?

If so, would it also be reasonable to audit those salt blocks and make sure they can also take the output of previous versions of the salt config and drive the node to the same state as running against a bare container VM? Would that be more robust than an "unsalt" config?

That would be something like the following (a rough shell sketch follows the list):

  • For each node:
    1. Stop running binaries
    2. Grab the latest salt config from the metadata server
    3. Run salt
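
A minimal sketch of that flow, plus an idempotency audit, assuming masterless salt-call on the node; the metadata key, archive format, and service names are illustrative assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical "re-run salt in place" flow (masterless salt-call assumed).
set -o errexit

# 1. Stop running binaries so salt can safely rewrite their configs.
sudo service kubelet stop || true
sudo service kube-proxy stop || true

# 2. Grab the latest salt config from the metadata server (illustrative key
#    and tarball format).
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/salt-tree" \
  | sudo tar -xz -C /srv

# Audit: a dry run on a node built from an older config should report only the
# changes needed to converge; a second dry run after the real run should report
# none at all if the stanzas are truly idempotent.
sudo salt-call --local state.highstate test=True

# 3. Run salt for real.
sudo salt-call --local state.highstate
```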

I wonder, if we really wanted to bet on salt, whether we would make each stanza also responsible for stopping any relevant server before it twiddles the config when there's a change. That might be too prone to race conditions, though, so pausing the various servers first might also be easier.

@alex-mohr
Contributor

I realize I may have misunderstood: does "stop running binaries" mean e.g. kubelet, or do you mean all containers as well?

Also, if the kernel version is changing, we'll need a reboot in there somewhere...?

@multilinear

This sounds much, much harder than just destroying and re-creating the machine from scratch. Idempotency in machine setup is a minefield no one has yet crossed: there are a lot of gotchas, and it won't work at all if you happen to hit an inconsistent intermediate state, if a machine hangs, etc. There are a lot of ways it can break.

Given that Kubernetes is designed primarily for virtualized environments, what's the advantage of trying to reuse already-set-up machines instead of just chucking them and getting new ones?

@multilinear

Basically, I'm arguing that fast machine setup is a far easier problem than idempotent machine setup.

@zmerlynn
Member

We wanted to start with the "throw each node away" plan, but as pointed out in the initial comment, it's blocked by dynamic clustering right now. We can't add nodes dynamically for a few reasons (on GCE, for example, they need a cbr0 allocation). Those restrictions are going away soon.

At the limit, though, we believe that people aren't going to want to deal with downtime, so this (upgrade existing) may be the plan we need anyway.

@multilinear

Aaahh, thanks for the explanation. That makes some sense.

It would certainly be nice if you can get it to work... it's just that something that works 98% of the time would be very, very bad.

@zmerlynn
Member

@alex-mohr: Salt is only as state-aware as the stanza files dictate. The biggest problem is that this is very brittle. If you do something like add a file in version X and forget to add a file.delete for it in version Y, and it's no longer applicable yet still has a systemic effect, you've just failed. So now you have to figure out how to test for that, etc.

This is why I was talking about crazy things like ephemeral boot disks. But let me finish #6070 so we stop talking about generalities.

@mbforbes
Contributor Author

mbforbes commented Apr 1, 2015

After a couple of days of implementation, we discussed this path yesterday, and it turns out this actually isn't what we want to build.

The following matrix has rows of "what we're going to upgrade" and columns of "how:"

| what to upgrade | in-place | new node |
| --- | --- | --- |
| kubelet, kube-proxy | #6099 | --- |
| kubelet, kube-proxy, docker, kernel/OS | #6082 | #6088 |

(This issue is the one in the bottom-left.)

The problem is that, because we will have a fixed set of supported version tuples (#4855), doing a kernel/OS upgrade with apt-get dist-upgrade or a Docker update will likely move us onto an unsupported version tuple unless there are explicit mechanisms limiting what can be upgraded to. Of course, #6099 has exactly the same problem, in that newer versions of kubelet and kube-proxy must be supported against the same Docker and kernel/OS versions, but the scope is smaller and we control all of the pieces.
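
A sketch of the kind of explicit guard that would be needed, assuming a known list of supported (docker, kernel) pairs; the version numbers and the check itself are illustrative, not taken from #4855:

```bash
#!/usr/bin/env bash
# Hypothetical post-upgrade guard: fail loudly if the node's (docker, kernel)
# pair is not in the supported set. All version numbers here are made up.
set -o errexit

SUPPORTED_TUPLES=(
  "docker=1.5 kernel=3.16"
  "docker=1.6 kernel=3.19"
)

docker_version=$(docker --version | awk '{print $3}' | cut -d. -f1,2)
kernel_version=$(uname -r | cut -d. -f1,2)
current="docker=${docker_version} kernel=${kernel_version}"

for tuple in "${SUPPORTED_TUPLES[@]}"; do
  if [[ "${tuple}" == "${current}" ]]; then
    echo "OK: ${current} is a supported version tuple."
    exit 0
  fi
done

echo "ERROR: ${current} is not a supported version tuple." >&2
exit 1
```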

There are also the problems that @multilinear pointed out (and heavily discussed in #3333) about leftover node state. #6099 won't entirely avoid this either, but given we'll only be upgrading the kube-* binaries, the considerations will be smaller in scope.

I'm closing this issue now as it's a non-priority, and moving work over to #6099 (as #6088 is still blocked by #6087).

@mbforbes mbforbes closed this as completed Apr 1, 2015
@mbforbes mbforbes removed this from the v1.0 milestone Apr 1, 2015