Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install and use crictl in gce kube-up.sh #63357

Merged
merged 3 commits into from May 16, 2018

Conversation

@Random-Liu
Copy link
Member

Random-Liu commented May 2, 2018

Download and use crictl in gce kube-up.sh.

This PR:

  1. Downloads crictl v1.0.0-beta.0 onto the node, which supports CRI v1alpha2. We'll upgrade it to v1.0.0-beta.1 soon after the release is cut.
  2. Change kube-docker-monitor to kube-container-runtime-monitor, and let it use crictl to do health monitoring.
  3. Change e2e-image-puller to use crictl. Because of #63355, it doesn't work now. But in crictl v1.0.0-beta.1, we are going to statically link it, and the e2e-image-puller should work again.
  4. Use systemctl kill --kill-who=main instead of pkill, the reason is that:
    a. pkill docker will send SIGTERM to all processes including dockerd, docker-containerd, docker-containerd-shim. This is not a problem for Docker 17.03 CE, because containerd-shim in containerd 0.2.x doesn't exit with SIGERM (see code). However, containerd-shim in containerd 1.0+ does exit with SIGTERM (see code). This means that pkill docker and pkill containerd will kill all shim processes for Docker 17.11+ and containerd 1.0+.
    b. We can use pkill -x instead. However, docker systemd service name is docker, but daemon process name is dockerd. We have to introduce another environment variable to specify "daemon process name". Given so, it seems easier to just use systemctl kill which only requires systemd service name. systemctl kill --kill-who=main will make sure only main process receives SIGTERM.

Signed-off-by: Lantao Liu lantaol@google.com

/cc @filbranden @yujuhong @feiskyer @mrunalp @kubernetes/sig-node-pr-reviews @kubernetes/sig-cluster-lifecycle-pr-reviews

Release note:

Kubernetes cluster on GCE have crictl installed now. Users can use it to help debug their node. The documentation of crictl can be found https://github.com/kubernetes-incubator/cri-tools/blob/master/docs/crictl.md.
@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented May 2, 2018

/test pull-kubernetes-e2e-gke

/test pull-kubernetes-verify

@Random-Liu Random-Liu force-pushed the Random-Liu:install-and-use-crictl branch 2 times, most recently from 004cdf5 to 22d1a0f May 2, 2018

sed -i -e "s@{{ *container_runtime *}}@${CONTAINER_RUNTIME_NAME:-docker}@g" "${configmap_yaml}"
local -r file="$1"
sed -i -e "s@{{ *container_runtime *}}@${CONTAINER_RUNTIME_NAME:-docker}@g" "${file}"
container_runtime_endpoint=${CONTAINER_RUNTIME_ENDPOINT:-"unix:///var/run/dockershim.sock"}

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

Make this variable local (and readonly too).

Do both substitutions using a single sed command:

  sed -i \
      -e "s@{{ *container_runtime *}}@${CONTAINER_RUNTIME_NAME:-docker}@g" \
      -e "s@{{ *container_runtime_endpoint *}}@${container_runtime_endpoint#unix://}@g" \
      "${file}"

Also remove the quotes around "unix://" as they do nothing... Removing them means you no longer have "nested quotes."

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done

@@ -2375,8 +2379,9 @@ EOF
# Starts an image-puller - used in test clusters.
function start-image-puller {
echo "Start image-puller"
cp "${KUBE_HOME}/kube-manifests/kubernetes/gci-trusty/e2e-image-puller.manifest" \
/etc/kubernetes/manifests/
e2e_image_puller_manifest="${KUBE_HOME}/kube-manifests/kubernetes/gci-trusty/e2e-image-puller.manifest"

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

Make this local -r too.

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done.


# Create crictl config file.
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: ${CONTAINER_RUNTIME_ENDPOINT:-"unix:///var/run/dockershim.sock"}

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

nitpick: Remove the quotes.

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done

sed -i -e "s@{{ *container_runtime *}}@${CONTAINER_RUNTIME_NAME:-docker}@g" "${configmap_yaml}"
local -r file="$1"
sed -i -e "s@{{ *container_runtime *}}@${CONTAINER_RUNTIME_NAME:-docker}@g" "${file}"
container_runtime_endpoint=${CONTAINER_RUNTIME_ENDPOINT:-"unix:///var/run/dockershim.sock"}

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

nitpick: Remove the quotes, this is enough:

local -r container_runtime_endpoint=${CONTAINER_RUNTIME_ENDPOINT:-unix:///var/run/dockershim.sock}

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done.

local -r initial_attempts=5
local attempt=1
local -r crictl="${KUBE_HOME}/bin/crictl"
local -r container_runtime="${CONTAINER_RUNTIME_NAME:-"docker"}"

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

nitpick: Remove the inner quotes. (You could remove all quotes here too, that would be fine.)

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

It seems that google convention is to always quote ${}.

Removed quote for docker.

function container_runtime_monitoring {
# Container runtime startup takes time. Make initial attempts before starting
# killing the container runtime.
local -r initial_attempts=5

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

max_attempts ?

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done.

local -r crictl="${KUBE_HOME}/bin/crictl"
local -r container_runtime="${CONTAINER_RUNTIME_NAME:-"docker"}"
until timeout timeout 60 "${crictl}" pods > /dev/null || (( attempt == initial_attempts ))
do

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

nitpick: do in the same line:

  until timeout 60 "${crictl}" pods >/dev/null || (( attempt == initial_attempts )) ; do

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done.

until timeout timeout 60 "${crictl}" pods > /dev/null || (( attempt == initial_attempts ))
do
echo "$attempt initial attempt \"${crictl} pods\"! Trying again in $attempt seconds..."
sleep $(( attempt++ ))

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

For correctness... quotes around the expression:

  sleep "$(( attempt++ ))"

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done

local attempt=1
local -r crictl="${KUBE_HOME}/bin/crictl"
local -r container_runtime="${CONTAINER_RUNTIME_NAME:-"docker"}"
until timeout timeout 60 "${crictl}" pods > /dev/null || (( attempt == initial_attempts ))

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

Personally, I think it would be best to move the check for reaching max attempts inside the loop, in which case you could actually print a message saying you're giving up before breaking out of the loop...

Yes, I think in fact that might be critical, otherwise if you reach max attempts you will just proceed with the code. Is that really correct?

Well, maybe fallthrough is what you want, but I think at least logging a message about it would be good, then it's clear what happened here.

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Yes, I think in fact that might be critical, otherwise if you reach max attempts you will just proceed with the code. Is that really correct?

Yeah, that is what we want. The logic here is just to make sure we give docker some time to startup, before we start killing it (similar with the sleep 120 below for kubelet).

I can move the attempt inside.

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done.

do
echo "$attempt initial attempt \"${crictl} pods\"! Trying again in $attempt seconds..."
sleep $(( attempt++ ))
done
while [ 1 ]; do

This comment has been minimized.

@filbranden

filbranden May 2, 2018

Member

Not really something you touched... But can you please change this to while : ; do ? Using [ 1 ] is just too ugly!

This comment has been minimized.

@Random-Liu

Random-Liu May 3, 2018

Author Member

Done. Changed to while true.

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented May 3, 2018

@filbranden Addressed comments. :)

@Random-Liu Random-Liu force-pushed the Random-Liu:install-and-use-crictl branch from 7899a9b to 5295af7 May 3, 2018

@filbranden
Copy link
Member

filbranden left a comment

Looks good, thanks for taking the time to address the comments!

@Random-Liu Random-Liu force-pushed the Random-Liu:install-and-use-crictl branch 2 times, most recently from b9a39f3 to bb144de May 3, 2018

Random-Liu added some commits May 2, 2018

Install and use crictl in gce kube-up.sh
Signed-off-by: Lantao Liu <lantaol@google.com>
Collect logs for health monitor services.
Signed-off-by: Lantao Liu <lantaol@google.com>

@Random-Liu Random-Liu force-pushed the Random-Liu:install-and-use-crictl branch from bb144de to c23d0be May 4, 2018

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented May 4, 2018

I added a commit to make health monitor still use docker ps for docker.

The reason is that:

  # We still need to use `docker ps` when container runtime is "docker". This is because
  # dockershim is still part of kubelet today. When kubelet is down, crictl pods
  # will also fail, and docker will be killed. This is undesirable especially when
  # docker live restore is disabled.
@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented May 4, 2018

/test pull-kubernetes-e2e-containerd-gce

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented May 4, 2018

/retest

@Random-Liu Random-Liu force-pushed the Random-Liu:install-and-use-crictl branch from c23d0be to 70087af May 4, 2018

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented May 4, 2018

/test pull-kubernetes-node-e2e-containerd

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented May 4, 2018

/test pull-kubernetes-bazel-test

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented May 4, 2018

/test containerd

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented May 4, 2018

@Random-Liu: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-containerd-gce 70087af link /test pull-kubernetes-e2e-containerd-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@luxas

This comment has been minimized.

Copy link
Member

luxas commented May 12, 2018

local -r file="$1"
local -r container_runtime_endpoint="${CONTAINER_RUNTIME_ENDPOINT:-unix:///var/run/dockershim.sock}"
sed -i \
-e "s@{{ *container_runtime *}}@${CONTAINER_RUNTIME_NAME:-docker}@g" \

This comment has been minimized.

@yujuhong

yujuhong May 14, 2018

Member

Maybe we should just set the default value for CONTAINER_RUNTIME_ENDPOINT and CONTAINER_RUNTIME_NAME in cluster/gce/util.sh, so that we don't have to populate this in multiple places.

This comment has been minimized.

@Random-Liu

Random-Liu May 15, 2018

Author Member

Offline discussed. :)

Some bootstrapping tools only use this script, and relies on the default.

if [[ "${CONTAINER_RUNTIME:-docker}" != "docker" ]]; then
healthcheck_command="${crictl} pods"
fi
until timeout 60 ${healthcheck_command} > /dev/null; do

This comment has been minimized.

@yujuhong

yujuhong May 14, 2018

Member

Move comment in line 28 to here since it describe this block of code

This comment has been minimized.

@Random-Liu

Random-Liu May 15, 2018

Author Member

Will do.

This comment has been minimized.

@Random-Liu

Random-Liu May 15, 2018

Author Member

Done.

break
fi
echo "$attempt initial attempt \"${healthcheck_command}\"! Trying again in $attempt seconds..."
sleep "$(( attempt++ ))"

This comment has been minimized.

@yujuhong

yujuhong May 14, 2018

Member

The max_attempts is 5, so the code will sleep maximum 1+2+3+4 = 10 seconds

  • Is 10s sufficient for startup?
  • Is it necessary to use in 1-second incremental periods?

This comment has been minimized.

@Random-Liu

Random-Liu May 15, 2018

Author Member

10s is sufficient for startup, but I'll change this to (( 2 ** attempt++ ))

This comment has been minimized.

@Random-Liu

Random-Liu May 15, 2018

Author Member

Done

Still use `docker ps` for docker health monitoring.
Signed-off-by: Lantao Liu <lantaol@google.com>

@Random-Liu Random-Liu force-pushed the Random-Liu:install-and-use-crictl branch from 70087af to f952b09 May 15, 2018

@yujuhong

This comment has been minimized.

Copy link
Member

yujuhong commented May 15, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 15, 2018

@dchen1107

This comment has been minimized.

Copy link
Member

dchen1107 commented May 15, 2018

/lgtm

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented May 15, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, Random-Liu, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented May 15, 2018

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@Random-Liu @dchen1107 @yujuhong

Pull Request Labels
  • sig/cluster-lifecycle sig/node: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/feature: New functionality.
Help
@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented May 16, 2018

Automatic merge from submit-queue (batch tested with PRs 63167, 63357). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 7b8bb6e into kubernetes:master May 16, 2018

16 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation Random-Liu authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Job succeeded.
Details
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@Random-Liu Random-Liu deleted the Random-Liu:install-and-use-crictl branch May 16, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.