Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimze detecting if calico exists #9873

Merged

Conversation

hangscer8
Copy link
Member

@hangscer8 hangscer8 commented Mar 9, 2023

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

/kind bug

What this PR does / why we need it:

  • optimize detect the existence of calico, because sometimes the file calicoctl.sh exists ,but not the related crds, such as clusterinformations.
  • Add check for ClusterInformation of calico.
TASK [network_plugin/calico : Get current calico version] **********************
fatal: [node1]: FAILED! => {"changed": false,"stdout": "v3.24.5", "stdout_lines": ["v3.24.5"] , "cmd": "set -o pipefail && /usr/local/bin/calicoctl.sh version | grep 'Client Version:' | awk '{ print $3}'", "delta": "0:00:02.873448", "end": "2023-03-09 17:19:44.681338", "msg": "non-zero return code", "rc": 1, "start": "2023-03-09 17:19:41.807890", "stderr": "Unable to retrieve Cluster Version or Type: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org \"default\" not found", "stderr_lines": ["Unable to retrieve Cluster Version or Type: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org \"default\" not found"]}

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

[Calico] Optimize the detection of calico existence 

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 9, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 9, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @hangscer8. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 9, 2023
@hangscer8 hangscer8 changed the title fix get current calico version fix getting current calico version Mar 9, 2023
@floryut
Copy link
Member

floryut commented Mar 9, 2023

@hangscer8 Please check #9861

This part has changed a lot recently ;)

@hangscer8 hangscer8 changed the title fix getting current calico version Ignore error while getting current calico version Mar 10, 2023
@cyclinder
Copy link
Contributor

check version fails because calico's crd(clusterinformations.crd.projectcalico.org) is not yet installed, which causes calicoctl version to fail, in this case, the exit code should not be 1, we should ignore this error indeed.

But is there a better way to resolve it? There are some thoughts:

  • ignore this error
  • add a new task to check if stdout matches the version of the regular expression:
grep  -Eo 'v[0-9]+.[0-9]+.[0-9]' <<< stdout

/cc @zhan9san Any thoughts?

@cyclinder
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 10, 2023
@zhan9san
Copy link
Contributor

@cyclinder

Thanks for making me involved.

I don't think ignoring error is a good option. Instead I prefer to catch the expected error.

@zhan9san
Copy link
Contributor

IMO, first time we introduce the calico check.yml is to work around some compatibility issues when we perform the calico upgrade.

Can we check whether calico's crd(clusterinformations.crd.projectcalico.org) is installed like below?

- name: Check if calico exists
stat:
path: "{{ bin_dir }}/calicoctl.sh"
register: calico_exists
run_once: True
delegate_to: "{{ groups['kube_control_plane'][0] }}"

@cyclinder
Copy link
Contributor

I'm curious why the crds doesn't exist when calico is upgraded. check_version is only happen in calico upgrade?

@zhan9san
Copy link
Contributor

I'm curious why the crds doesn't exist when calico is upgraded.

Yeah, it's weird that crds doesn't exist when calico is upgraded.

check_version is only happen in calico upgrade?

It always run in cluster.yml -> role: kubernetes/preinstall -> Run calico checks

Except the fresh install, as if it is a fresh install, calicoctl.sh would not exist, and Check that current calico version is enough for upgrade block would not run as its condition(when: calico_exists.stat.exists) is false.

- { role: kubernetes/preinstall, tags: preinstall }

- name: Run calico checks
include_role:
name: network_plugin/calico
tasks_from: check
when:
- kube_network_plugin == 'calico'
- not ignore_assert_errors

I'll try to reproduce this issue.

@hangscer8
Copy link
Member Author

I'm curious why the crds doesn't exist when calico is upgraded.

Yeah, it's weird that crds doesn't exist when calico is upgraded.

check_version is only happen in calico upgrade?

It always run in cluster.yml -> role: kubernetes/preinstall -> Run calico checks

Except the fresh install, as if it is a fresh install, calicoctl.sh would not exist, and Check that current calico version is enough for upgrade block would not run as its condition(when: calico_exists.stat.exists) is false.

- { role: kubernetes/preinstall, tags: preinstall }

- name: Run calico checks
include_role:
name: network_plugin/calico
tasks_from: check
when:
- kube_network_plugin == 'calico'
- not ignore_assert_errors

I'll try to reproduce this issue.

It is possible that i sometimes kill the e2e ci job which runs cluster.yml playbook on the target ci node to install k8s cluster when the job is running, it causes the half installation of calico.

@zhan9san
Copy link
Contributor

zhan9san commented Mar 10, 2023

It is possible that i sometimes kill the e2e ci job which runs cluster.yml playbook on the target ci node to install k8s cluster when the job is running, it causes the half installation of calico.

So the issue becomes how to detect the calico is installed correctly.

How about checking both calicoctl.sh and crd?

@hangscer8
Copy link
Member Author

hangscer8 commented Mar 10, 2023

It is possible that i sometimes kill the e2e ci job which runs cluster.yml playbook on the target ci node to install k8s cluster when the job is running, it causes the half installation of calico.

So the issue becomes how to detect the calico is installed correctly.

How about checking both calicoctl.sh and crd?

I think it is a good idea. I will fix that.

@hangscer8 hangscer8 changed the title Ignore error while getting current calico version [WIP] Ignore error while getting current calico version Mar 10, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 10, 2023
@hangscer8 hangscer8 changed the title [WIP] Ignore error while getting current calico version [WIP] FIX getting current calico version Mar 10, 2023
@hangscer8 hangscer8 force-pushed the fix_fet_current_calico_version branch from 7375217 to 5efa8a4 Compare March 10, 2023 07:21
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 10, 2023
@hangscer8 hangscer8 changed the title [WIP] FIX getting current calico version [WIP] Optimze detecting if calico exists already Mar 10, 2023
@hangscer8 hangscer8 changed the title [WIP] Optimze detecting if calico exists already [WIP] Optimze detecting if calico exists Mar 10, 2023
@hangscer8 hangscer8 changed the title [WIP] Optimze detecting if calico exists Optimze detecting if calico exists Mar 10, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 10, 2023
@hangscer8 hangscer8 force-pushed the fix_fet_current_calico_version branch 2 times, most recently from 09beeb0 to e55d1a4 Compare March 10, 2023 07:37
@hangscer8 hangscer8 force-pushed the fix_fet_current_calico_version branch 2 times, most recently from 3c66930 to 6539fe3 Compare March 10, 2023 08:01
Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
@hangscer8 hangscer8 force-pushed the fix_fet_current_calico_version branch from 6539fe3 to 0fa8412 Compare March 10, 2023 08:11
@cyclinder
Copy link
Contributor

Seems lgtm

/cc @floryut @zhan9san, Can you take a final look?

@zhan9san
Copy link
Contributor

Thanks @hangscer8

LGTM

@cyclinder
Copy link
Contributor

Thanks!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2023
Copy link
Member

@floryut floryut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hangscer8 Thank you
And thanks guys for the discussion and review
/kind Network

@k8s-ci-robot k8s-ci-robot added the kind/network Network section in the release note label Mar 10, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: floryut, hangscer8

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2023
@floryut floryut removed the kind/bug Categorizes issue or PR as related to a bug. label Mar 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit 99115ad into kubernetes-sigs:master Mar 10, 2023
nolimitkun pushed a commit to nolimitkun/kubespray that referenced this pull request Mar 19, 2023
Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
@yankay yankay mentioned this pull request May 15, 2023
pedro-peter pushed a commit to pedro-peter/kubespray that referenced this pull request May 8, 2024
Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/network Network section in the release note lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants