Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test. #45748

Merged
merged 2 commits into from
Jun 4, 2017

Conversation

mml
Copy link
Contributor

@mml mml commented May 12, 2017

This is still a WIP... needs to be squashed at least, and I don't think it's currently passing until I increase the scale of the RC, but please have a look at the general outline. Thanks!

Fixes #38336

@kow3ns @bdbauer @krousey @erictune @maisem @davidopp

On GCE, node upgrades will now respect PodDisruptionBudgets, if present.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 12, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels May 12, 2017
@mml
Copy link
Contributor Author

mml commented May 12, 2017

/assign krousey

@mml mml added this to the v1.7 milestone May 12, 2017
@mml
Copy link
Contributor Author

mml commented May 12, 2017

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels May 12, 2017
@mml mml removed the release-note-none Denotes a PR that doesn't merit a release note. label May 12, 2017
@mml
Copy link
Contributor Author

mml commented May 12, 2017

No, actually probably need a release note.

@mml mml assigned krousey and unassigned gmarek and mwielgus May 12, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 12, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 12, 2017
@mml
Copy link
Contributor Author

mml commented May 12, 2017

OK! This is now passing for me. Whee!

@mml mml force-pushed the reliable-node-upgrade branch 2 times, most recently from def5ce8 to 0ee98f4 Compare May 15, 2017 21:17
@@ -73,7 +73,7 @@ func (t *ServiceUpgradeTest) Test(f *framework.Framework, done <-chan struct{},
case MasterUpgrade:
t.test(f, done, true)
default:
t.test(f, done, false)
t.test(f, done, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be the default for two reasons.

  1. The enum could expand to upgrade types that you don't intend to be tested this way.
  2. Some cloud providers might not support a graceful node upgrade like this. There's should be a check for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mml
Copy link
Contributor Author

mml commented May 15, 2017

@krousey PTAL

@mml mml added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels May 15, 2017
@mml
Copy link
Contributor Author

mml commented May 15, 2017

@krousey it's squashed now and should have a decent release note.

@krousey
Copy link
Contributor

krousey commented May 16, 2017

I only did a quick review earlier. Here's a more in-depth review. Overall it looks good.

fi
echo "Rolling update ${update} is still in ${result} state."
sleep 10
echo "== Waiting for ${instance} to become young. ==" >&2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Become young? That's... confusing. Why not wait for recreation to complete?

sleep 10
echo "== Waiting for ${instance} to become young. ==" >&2
while true; do
age=$(( $(date +%s) - $(date --date=$("${KUBE_ROOT}/cluster/kubectl.sh" get node "${instance}" --output='jsonpath={.metadata.creationTimestamp}' ) +%s) ))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very dense to read. I would rather something that checked the recreate operation's status. This can't really tell the difference between a recreate and a system crash/restart. Also, the 60 second window just makes me uncomfortable with timing flakiness. If this comes really quick after starting the cluster, you may mistake the instance as having been restarted. If for some reason this takes too long, then you might miss the window.

What about checking the status of the recreate operation? Something like

gcloud compute operations list --limit=1 --filter="targetLink:*${instance} AND operationType:recreate" --format='get(status)'

should print "DONE" when it's complete. It's probably as dense, but avoids timing ambiguities and date math. You shouldcheck with gcloud operations list --format=json to make sure I got the filters right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also worth noting that this currently works because kubelet deletes and recreates the node object on upgrade. This would not work if kubelet switched to just updating the node object.

Copy link
Contributor Author

@mml mml May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about the timing flakiness, and I agree I'd like to be sure that the instance has been recreated.

But both of those can happen and it can still be the case that the kubelet hasn't deleted and recreated the node object, so we still need to make sure that the node is Ready, and we need to make sure we're checking that the new node is healthy, because we may observe the old node with the same name still being Ready.

I think I can do this if the script itself deletes the node once it's been drained. The order would be

  1. kubectl drain
  2. kubectl delete node
  3. gcloud ... recreate-instances
  4. poll gcloud compute operations list
  5. poll kubectl get node waiting for SchedulingDisabled=false and Ready=true.

This depends on a different assumption about kubelet's behavior, that it won't re-add itself after I delete it. That seems to be the case if nothing happens on the node, but if kubelet randomly gets restarted (or changes its behavior), it will get re-added, and it will no longer be drained. IOW, deleting the node opens the door for the node to acquire pods again after the code thinks it's drained. A different race.

So having written all that, I'm inclined to poll the operation. You're right that seems cleaner. But I still need a way to distinguish old node from new node, and I can't think of a better way than polling for age. I guess I could poll for version with kubectl get -o jsonpath='{.status.nodeInfo}' node/foo. That would work as long as we're sure that we're actually changing kubelet versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I didn't mean not checking the nodes reported status. I just meant that checking the operation status is preferred to checking timestamps to see if the node rebooted. You still have all the other issues you mentioned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's another problem with the operation polling approach. The recreate isn't a single operation. It's two, apparently kicked off in succession asynchronously. So I need to wait for the insert operation to be DONE. But I need to make sure it's the right insert operation. Consider the output below if I'm working on rx0c. I don't want to accidentally match on the insert from 05-12 in this case if it's today.

This looks like I need more timestamp math. WDYT?

% gcloud compute operations list --limit=6 --filter='targetLink:us-central1-f/instances/e2e-minion-group-*' --format='table[box](insertTime,startTime,endTime,operationType,targetLink,status)' --sort-by='~insertTime'                         
┌───────────────────────────────┬───────────────────────────────┬───────────────────────────────┬────────┬───────────────────────────────────────────────┬────────┐
│           TIMESTAMP           │           START_TIME          │            END_TIME           │  TYPE  │                     TARGET                    │ STATUS │
├───────────────────────────────┼───────────────────────────────┼───────────────────────────────┼────────┼───────────────────────────────────────────────┼────────┤
│ 2017-05-16T15:18:37.154-07:00 │ 2017-05-16T15:18:37.709-07:00 │ 2017-05-16T15:18:57.966-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE   │
│ 2017-05-16T15:17:26.260-07:00 │ 2017-05-16T15:17:26.599-07:00 │ 2017-05-16T15:18:35.020-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE   │
│ 2017-05-16T15:14:30.903-07:00 │ 2017-05-16T15:14:31.369-07:00 │ 2017-05-16T15:14:52.354-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-6lw8 │ DONE   │
│ 2017-05-16T15:13:11.547-07:00 │ 2017-05-16T15:13:11.905-07:00 │ 2017-05-16T15:14:28.942-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-6lw8 │ DONE   │
│ 2017-05-12T16:20:35.272-07:00 │ 2017-05-12T16:20:35.990-07:00 │ 2017-05-12T16:20:54.404-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE   │
│ 2017-05-12T16:19:26.596-07:00 │ 2017-05-12T16:19:26.908-07:00 │ 2017-05-12T16:20:33.207-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE   │
└───────────────────────────────┴───────────────────────────────┴───────────────────────────────┴────────┴───────────────────────────────────────────────┴────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you don't think it would be sufficient to grab the latest insert op?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can watch for the UID/externalID of the node object to change. Or you can keep fetching the instance from GCE and wait for the ID to change. This is the authoritative way to know that the instance did indeed get recreated.
Once the ID changes, you need to wait for the node to enter Ready state.

gcloud compute instances describe $INSTANCE --format='get(id)'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krousey, no because the "latest insert op" will actually be an old insert op for the first minute or so (maybe longer).

@maisem this is good. I cache the id before I kick off recreate-instances, then poll until that id changes. Then I poll get nodes until the externalID matches. Then I can start polling node state.

% cluster/kubectl.sh get no e2e-minion-group-rx0c --output=jsonpath='{.spec.externalID}'
7990879616757056595
% gcloud compute instances describe e2e-minion-group-rx0c --format='get(id)'
7990879616757056595

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maisem I think you're the best one to review this file. Can you PTAL?

@@ -490,13 +501,18 @@ func (j *ServiceTestJig) newRCTemplate(namespace string) *v1.ReplicationControll
Labels: j.Labels,
},
Spec: v1.ReplicationControllerSpec{
Replicas: func(i int) *int32 { x := int32(i); return &x }(1),
Replicas: func(i int) *int32 { x := int32(i); return &x }(2),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... not your fault, but could use just var replicas int32 = 2 at the top of this function and use &replicas here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Selector: j.Labels,
Template: &v1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: j.Labels,
},
Spec: v1.PodSpec{
Affinity: &v1.Affinity{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is starting to change all of the services e2e tests. I don't know enough about them to say that this is OK or not. You should find the people who own those tests to review this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is PROBABLY alright, but would we be safer to only set this for a testcase that is explicitly targetting this behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is actually a hook in RunOrFail where I can add this logic (tweak function). I'll externalize it and only set it up for this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. @krousey can you take a look at how I structured it? I used the existing tweak hook, but the implementation is a little ugly.

@mml
Copy link
Contributor Author

mml commented May 25, 2017

cc @monopole

@@ -36,6 +36,8 @@ type ServiceUpgradeTest struct {

func (ServiceUpgradeTest) Name() string { return "service-upgrade" }

func shouldTestPDBs() bool { return framework.ProviderIs("gce") }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be enabled for GKE as well.

@maisem
Copy link

maisem commented May 25, 2017

@mml - upgrade.sh changes LGTM.

@mml mml assigned maisem and unassigned krousey Jun 2, 2017
Copy link

@maisem maisem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after comment fix.

@@ -72,6 +79,9 @@ func (t *ServiceUpgradeTest) Test(f *framework.Framework, done <-chan struct{},
switch upgrade {
case MasterUpgrade:
t.test(f, done, true)
case NodeUpgrade:
// Node upgrades should test during disruption only on GCE for now.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCE/GKE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mml mml dismissed krousey’s stale review June 2, 2017 00:12

Addressed.

@mml mml added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2017
@vishh
Copy link
Contributor

vishh commented Jun 2, 2017

/approve

@lavalamp
Copy link
Member

lavalamp commented Jun 2, 2017

/approve since @krousey is on vacation

(although someone else should review the shell :) )

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2017
mml added 2 commits June 1, 2017 17:58
Respect PDBs during node upgrades and add test coverage to the
ServiceTest upgrade test.  Modified that test so that we include pod
anti-affinity constraints and a PDB.
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2017
@maisem
Copy link

maisem commented Jun 2, 2017

@k8s-bot unit test this

@maisem
Copy link

maisem commented Jun 2, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, maisem, mml, vishh

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@fejta
Copy link
Contributor

fejta commented Jun 2, 2017

@k8s-bot pull-kubernetes-federation-e2e-gce test this
ref: #46827

1 similar comment
@shashidharatd
Copy link

@k8s-bot pull-kubernetes-federation-e2e-gce test this
ref: #46827

@mml
Copy link
Contributor Author

mml commented Jun 2, 2017

@shashidharatd thanks!

@mml
Copy link
Contributor Author

mml commented Jun 2, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this #38275

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jun 2, 2017

@mml: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
Jenkins unit/integration c71165ad34cd33873a77c8d3975c038cdfe91658 link @k8s-bot unit test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mml
Copy link
Contributor Author

mml commented Jun 2, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 3837d95 into kubernetes:master Jun 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet