use kubectl-with-retry on pause & resume #26243

Merged: k8s-github-robot merged 1 commit into kubernetes:master on May 26, 2016

Conversation

metral
Contributor

@metral metral commented May 25, 2016

Attempts to fix #25645 by using kubectl-with-retry for rollout {pause,resume} instead of calling kubectl directly, as the other rollout {pause,resume} tests in this script already do. (resume is changed as well for good measure.)

@k8s-bot

k8s-bot commented May 25, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels May 25, 2016
@eparis
Contributor

eparis commented May 25, 2016

ok to test

@metral
Contributor Author

metral commented May 25, 2016

The e2e and Node e2e tests are hitting the following errors:

ERROR: (gcloud.compute.firewall-rules.delete) Some requests did not succeed:
 - The resource 'projects/kubernetes-jenkins-pull/global/firewalls/e2e-gce-agent-pr-3-0-minion-e2e-gce-agent-pr-3-0-http-alt' was not found
ERROR: (gcloud.compute.firewall-rules.delete) Some requests did not succeed:
 - The resource 'projects/kubernetes-jenkins-pull/global/firewalls/e2e-gce-agent-pr-3-0-minion-e2e-gce-agent-pr-3-0-nodeports' was not found
Bringing down cluster using provider: gce
ERROR: gcloud crashed (OSError): [Errno 39] Directory not empty: '/usr/local/share/google/google-cloud-sdk.staging/lib/googlecloudsdk/third_party/apis'

If you would like to report this issue, please run the following command:
  gcloud feedback
Project: kubernetes-jenkins-pull
Zone: us-central1-f
INSTANCE_GROUPS=
NODE_NAMES=
Bringing down cluster
ERROR: gcloud failed to load: No module named argcomplete
WARNING: You are creating a legacy network. Using --mode=legacy will be required in future releases.
ERROR: (gcloud.compute.networks.create) Some requests did not succeed:
 - Quota 'ROUTES' exceeded.  Limit: 170.0

2016/05/25 06:28:42 e2e.go:206: Error running up: exit status 1
2016/05/25 06:28:42 e2e.go:202: Step 'up' finished in 47.621947144s
2016/05/25 06:28:42 e2e.go:114: Error starting e2e cluster. Aborting.

@eparis
Contributor

eparis commented May 25, 2016

@k8s-bot e2e test this flake: #26271

@@ -1087,13 +1087,11 @@ __EOF__
kube::test::get_object_assert deployment "{{range.items}}{{$deployment_image_field}}:{{end}}" "${IMAGE_NGINX}:${IMAGE_NGINX}:"
kube::test::if_has_string "${output_message}" "Object 'Kind' is missing"
## Pause the deployment
output_message=$(! kubectl rollout pause -f hack/testdata/recursive/deployment --recursive 2>&1 "${kube_flags[@]}")
Member

Would you change the title comment of this test too (L1077)? These tests are about "--recursive" and the "rollout" commands, but not about "Rollback a deployment". It's somewhat misleading. Changing each sub-title would help too (maybe add "recursively" to the end of each sub-title, and mention that we expect error message "Object 'Kind' is missing").

Contributor Author

Sure thing

@janetkuo janetkuo added release-note-none Denotes a PR that doesn't merit a release note. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. kind/flake Categorizes issue or PR as related to a flaky test. and removed release-note-label-needed labels May 25, 2016
kube::test::get_object_assert deployment "{{range.items}}{{.spec.paused}}:{{end}}" "true:true:"
kube::test::if_has_string "${output_message}" "Object 'Kind' is missing"
Member

I believe we still want this check?

Contributor Author

@metral metral May 25, 2016

We do, but I removed it because switching from kubectl to kubectl-with-retry was the only quick fix I could think of for the flake.

However, kubectl-with-retry redirects stderr to a file that it deletes once it finishes executing, so I can no longer inspect the output for the "Object 'Kind' is missing" message as I was doing before.

Member

@janetkuo janetkuo May 25, 2016

We can have a retry function with customized error message check, or we can pass the expected error message to that retry function. WDYT?
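A minimal sketch of what passing the expected error message to a retry function could look like. Everything named here (retry_then_expect, fake_kubectl, RETRY_DELAY) is hypothetical and not taken from the Kubernetes test scripts, and the transient "the object has been modified" message is only an assumption about what such a helper would retry on. A stub stands in for kubectl so the example runs on its own.

```shell
#!/usr/bin/env bash
# Sketch only: retry_then_expect and fake_kubectl are hypothetical names,
# not functions from the Kubernetes test scripts.

# Usage: retry_then_expect RETRY_ON EXPECTED COMMAND [ARGS...]
# Reruns COMMAND while its stderr contains RETRY_ON (up to 4 attempts), then
# succeeds only if the last captured stderr contains EXPECTED.
retry_then_expect() {
  local retry_on="$1" expected="$2" attempt err
  shift 2
  for attempt in 1 2 3 4; do
    # "2>&1 >/dev/null" captures stderr only; stdout is discarded.
    err=$("$@" 2>&1 >/dev/null) || true
    case "${err}" in
      *"${retry_on}"*) sleep "${RETRY_DELAY:-1}" ;;  # transient error: try again
      *) break ;;                                    # anything else: stop retrying
    esac
  done
  case "${err}" in
    *"${expected}"*) return 0 ;;
    *) printf 'unexpected stderr: %s\n' "${err}" >&2; return 1 ;;
  esac
}

# Stand-in for kubectl: fails once with a transient error, then with the error
# the test expects. The call count lives in a file because each call runs
# inside a command substitution (a subshell).
calls_file=$(mktemp)
echo 0 > "${calls_file}"
fake_kubectl() {
  local n
  n=$(cat "${calls_file}")
  echo $((n + 1)) > "${calls_file}"
  if [ "${n}" -eq 0 ]; then
    echo "the object has been modified" >&2
  else
    echo "error: Object 'Kind' is missing" >&2
  fi
  return 1
}

RETRY_DELAY=0  # no backoff in this demo
if retry_then_expect "the object has been modified" "Object 'Kind' is missing" \
    fake_kubectl rollout pause; then
  result="expected error seen after $(cat "${calls_file}") calls"
else
  result="check failed"
fi
echo "${result}"
```

Keeping the retry condition and the expected message as separate arguments means the same helper could serve both tests that expect a failure and tests that only need the retry.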

Member

Do we still need 2>&1 when we're using kubectl-with-retry?
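A small sketch of why the outer 2>&1 may be redundant, assuming the wrapper sends the wrapped command's stderr to a file as described earlier in this thread. The names wrapper, noisy, and err_file are hypothetical, and a stub command stands in for kubectl.

```shell
#!/usr/bin/env bash
# Sketch only: wrapper, noisy and err_file are hypothetical names.
err_file="$(mktemp)"

# Send the wrapped command's stderr to a file instead of the caller's stderr.
wrapper() {
  "$@" 2> "${err_file}" || true
}

# Stand-in command that writes to both streams and fails.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
  return 1
}

# The outer 2>&1 has nothing from noisy to merge: its stderr is already in the file.
captured=$(wrapper noisy 2>&1)
in_file=$(cat "${err_file}")
rm -f "${err_file}"

echo "captured: ${captured}"
echo "in file:  ${in_file}"
```

Under that assumption, only stdout reaches the command substitution, so the error text has to be read from the file rather than from the captured output.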

Member

@janetkuo janetkuo May 25, 2016

Maybe creating a new retry function with your own check is easier, and we can add a comment saying something like "remove this function once #20437 is fixed"

@metral
Contributor Author

metral commented May 25, 2016

The e2e test is hitting flake #26210

@janetkuo
Member

@k8s-bot e2e test this issue: #IGNORE

@metral
Contributor Author

metral commented May 25, 2016

@janetkuo I've gone ahead and addressed your feedback:

  • cleared up test title comments to describe the tests more effectively
  • removed 2>&1 from kubectl-with-retry as it wasn't needed
  • modified kubectl-with-retry slightly to allow me to preserve the error file for review/usage in my tests, allowing me to put the checks that I originally took out, back in

PTAL
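The third bullet could look roughly like the sketch below: a wrapper that deletes its stderr file by default but lets the caller opt in to keeping it. This is an illustration under assumptions, not the change made in this PR. The names run_with_err_file, KEEP_ERR_FILE, and ERR_FILE are hypothetical, and a stub replaces kubectl.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_err_file, KEEP_ERR_FILE and ERR_FILE are hypothetical
# names chosen for this example.
ERR_FILE="$(mktemp -d)/cmd-error"

# Runs a command with stderr redirected to ERR_FILE. The file is deleted
# afterwards unless the caller sets KEEP_ERR_FILE=true.
run_with_err_file() {
  "$@" 2> "${ERR_FILE}" || true
  if [ "${KEEP_ERR_FILE:-false}" != true ]; then
    rm -f "${ERR_FILE}"
  fi
}

# Stand-in for a failing kubectl call.
fake_kubectl() {
  echo "error: Object 'Kind' is missing" >&2
  return 1
}

# Default behaviour: the error file is gone after the call.
run_with_err_file fake_kubectl rollout pause
if [ -e "${ERR_FILE}" ]; then
  default_state="kept"
else
  default_state="removed"
fi

# Opt in to keeping the file, read it, then clean up in the caller.
KEEP_ERR_FILE=true run_with_err_file fake_kubectl rollout pause
output_message=$(cat "${ERR_FILE}")
rm -f "${ERR_FILE}"

echo "${default_state}: ${output_message}"
```

With the file preserved, the caller can run the same string check on output_message that the original test performed, and is then responsible for removing the file.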

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 25, 2016
@janetkuo
Member

Thanks, looks good now. Please squash and I'll apply the tag.

@metral
Contributor Author

metral commented May 25, 2016

@janetkuo all set, thanks!

@janetkuo janetkuo added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 25, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented May 26, 2016

GCE e2e build/test passed for commit 54e6d23.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 128e7f1 into kubernetes:master May 26, 2016
Labels

  • kind/flake: Categorizes issue or PR as related to a flaky test.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

test-cmd flake: kubectl rollout pause deployments --recursive
6 participants