Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple node-problem-detector release from kubernetes #73288

Merged
merged 2 commits into from
Feb 28, 2019

Conversation

wangzhen127
Copy link
Member

@wangzhen127 wangzhen127 commented Jan 24, 2019

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

This PR decouples node-problem-detector (NPD) release from kubernetes. Before, NPD version is hard coded in configure.sh and node e2e tests.This PR

  • allows configuring node-problem-detector release path, version, hash and custom flags on GCI clusters (GKE and GCE)
  • adds NPD cluster e2e test
  • allows configuring NPD image version in node e2e test.
  • bumps default NPD image version in node e2e test and fixes the test

With this PR, we can create presubmit and CI jobs for NPD PRs (kubernetes/node-problem-detector#236). This will prevent future test breaks and make sure NPD is working correctly.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Node-Problem-Detector configuration is now decoupled from the Kubernetes release on GKE/GCE.

Manual cluster test:
Run the following command with this PR using an old version of NPD and customized release path and flags. Current NPD used is v0.6.0.

NODE_PROBLEM_DETECTOR_VERSION="v0.5.0" \
NODE_PROBLEM_DETECTOR_TAR_HASH="650ecfb2ae495175ee43706d0bd862a1ea7f1395" \
NODE_PROBLEM_DETECTOR_RELEASE_PATH=https://storage.googleapis.com/zhenw-gke-dev-public \
NODE_PROBLEM_DETECTOR_CUSTOM_FLAGS="--v=2 --logtostderr --system-log-monitors=/home/kubernetes/node-problem-detector/config/kernel-monitor.json --custom-plugin-monitors=/home/kubernetes/node-problem-detector/config/kernel-monitor-counter.json --port=20256" \
kubetest --build --up \
--provider=gce \
--gcp-project=zhenw-gke-dev \
--gcp-zone=us-central1-c

After cluster is up, manually verify a node:

  1. kube-env of the VM instance: environment variables are correctly set.
  2. sudo journalctl -u kube-node-installation.service: correct NPD version is downloaded.
  3. sudo cat /etc/systemd/system/node-problem-detector.service: NPD flags are set correctly.
  4. sudo journalctl -u node-problem-detector.service: NPD is running fine.

Node E2E test:
This PR bumps the version to v0.6.2. Run the following command with this PR using a different version of NPD.

make test-e2e-node FOCUS="NodeProblemDetector" SKIP="" REMOTE=true CLEANUP=true DELETE_INSTANCES=true EXTRA_ENVS="NODE_PROBLEM_DETECTOR_IMAGE=k8s.gcr.io/node-problem-detector:v0.5.0"
  1. Verify the tests pass.
  2. Verify the log that NPD v0.5.0 is used (instead of v0.6.2), which indicates that we can customize the NPD version in node e2e tests.

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note-none Denotes a PR that doesn't merit a release note. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 24, 2019
@wangzhen127
Copy link
Member Author

/sig node
/assign @Random-Liu

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 24, 2019
@wangzhen127 wangzhen127 force-pushed the npd-config branch 2 times, most recently from 7c584dc to 84697b1 Compare January 28, 2019 07:20
@wangzhen127
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Feb 1, 2019
@wangzhen127
Copy link
Member Author

/retest

@wangzhen127
Copy link
Member Author

/test pull-kubernetes-e2e-gke

@wangzhen127
Copy link
Member Author

/retest

@wangzhen127
Copy link
Member Author

Also added E2E tests, passing on both GKE and GCE. PTAL

@wangzhen127 wangzhen127 force-pushed the npd-config branch 3 times, most recently from 2fb138e to 33a9414 Compare February 6, 2019 06:44
@wangzhen127
Copy link
Member Author

/retest

@wangzhen127
Copy link
Member Author

/test pull-kubernetes-e2e-gke

@Random-Liu Random-Liu added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Feb 25, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 25, 2019
@Random-Liu
Copy link
Member

@dchen1107 for top level approval.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 27, 2019
@wangzhen127
Copy link
Member Author

/test pull-kubernetes-e2e-gke

@k8s-ci-robot
Copy link
Contributor

@wangzhen127: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gke 6df207b link /test pull-kubernetes-e2e-gke

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wangzhen127
Copy link
Member Author

/skip

@wangzhen127
Copy link
Member Author

Just fixed the merge conflict. @Random-Liu Can you take another look?

@Random-Liu
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 28, 2019
@dchen1107
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, Random-Liu, wangzhen127

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2019
@k8s-ci-robot k8s-ci-robot merged commit 02b8056 into kubernetes:master Feb 28, 2019
@wangzhen127 wangzhen127 deleted the npd-config branch February 28, 2019 16:24
k8s-ci-robot added a commit that referenced this pull request Mar 28, 2019
…73288-upstream-release-1.12

Automated cherry pick of #73288 to release-1.12: Decouple node-problem-detector release from kubernetes
k8s-ci-robot added a commit that referenced this pull request Apr 1, 2019
…73288-upstream-release-1.11

Automated cherry pick of #73288 to release-1.11: Decouple node-problem-detector release from kubernetes
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Apr 1, 2019
k8s-ci-robot added a commit that referenced this pull request Apr 4, 2019
…73288-upstream-release-1.13

Automated cherry pick of #73288 to release-1.13: Decouple node-problem-detector release from kubernetes
@wangzhen127
Copy link
Member Author

wangzhen127 commented Apr 10, 2019

This change will be available in Kubernetes v1.11.10+, v1.12.8+, v1.13.6+ and v1.14.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants