netchecker: Error occurred while checking the agents. Details: unknown (get agents.network-checker.ext netchecker-agent-xxxxx) #3281

Closed
TheBurnDoc opened this issue Sep 10, 2018 · 4 comments · Fixed by #3705
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@TheBurnDoc

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Environment:

  • Cloud provider or hardware configuration:
    Bare metal VMs

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Ubuntu 18.04

  • Version of Ansible (ansible --version):
    2.6.3

Kubespray version (commit) (git rev-parse --short HEAD):
b79dd60

Network plugin used:
calico

Copy of your inventory file:

[all]
kube00 ansible_host=kube00.REDACTED ip=REDACTED etcd_member_name=etcd1
kube01 ansible_host=kube01.REDACTED ip=REDACTED etcd_member_name=etcd2
kube02 ansible_host=kube02.REDACTED ip=REDACTED etcd_member_name=etcd3
kube03 ansible_host=kube03.REDACTED ip=REDACTED
kube04 ansible_host=kube04.REDACTED ip=REDACTED

[kube-master]
kube00
kube01

[kube-node]
kube02
kube03
kube04

[etcd]
kube00
kube01
kube02

[k8s-cluster:children]
kube-node
kube-master

Command used to invoke ansible:
pipenv run ansible-playbook --become --inventory inventories/kubespray/hosts.ini kubespray/cluster.yml

Output of ansible run:

Anything else we need to know:

After a "fresh install" of netchecker, curl http://localhost:31081/api/v1/connectivity_check produces the following error:

Error occurred while checking the agents. Details: unknown (get agents.network-checker.ext netchecker-agent-xxxxx)

The netchecker-server log has this repeating:

E0910 17:15:25.308402       1 storer_k8s.go:110] unknown (get agents.network-checker.ext netchecker-agent-hostnet-2b4hm)
I0910 17:15:25.310800       1 storer_k8s.go:129] Updated agent netchecker-agent-hostnet-2b4hm unknown (put agents.network-checker.ext netchecker-agent-hostnet-2b4hm)
E0910 17:15:25.310846       1 storer_k8s.go:133] unknown (put agents.network-checker.ext netchecker-agent-hostnet-2b4hm)
[negroni] 2018-09-10T17:15:25Z | 0 | 	 5.088171ms | netchecker-service:8081 | POST /api/v1/agents/netchecker-agent-hostnet-2b4hm 
[negroni] 2018-09-10T17:15:25Z | 0 | 	 20.881µs | netchecker-service:8081 | GET /api/v1/ping 

Is this a netchecker bug, a kubespray bug, or a problem with my environment/config?

@mirwan
Contributor

mirwan commented Sep 10, 2018

I'm facing the same issue since the switch from l23network/k8s-netchecker to Mirantis/k8s-netchecker-* images.

@TheBurnDoc
Author

@mirwan I suspect a netchecker bug; I've submitted an issue on their repo.

@Atoms added the kind/bug label Sep 12, 2018
@pahaz

pahaz commented Sep 29, 2018

My workaround

Change variables:

# netchecker
agent_img: "quay.io/l23network/k8s-netchecker-agent:v1.0"
server_img: "quay.io/l23network/k8s-netchecker-server:v1.0"

Apply changes:

ansible-playbook -i inventory/mycluster/hosts.ini -bvv cluster.yml --tags netchecker

Test it (ssh root@node1):

root@node1:~# curl http://localhost:31081/api/v1/connectivity_check
{"Message":"All 6 pods successfully reported back to the server","Absent":null,"Outdated":null}

@jjo
Contributor

jjo commented Oct 3, 2018

I'm also consistently getting this on a VM-like deployment (LXC containers, actually).
@pahaz's workaround fixed it perfectly, thanks! This is what I ran:

ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml --tags netchecker \
--extra-vars '{ 
  deploy_netchecker: True,
  netcheck_agent_img_repo: "quay.io/l23network/k8s-netchecker-agent",
  netcheck_server_img_repo: "quay.io/l23network/k8s-netchecker-server",
  netcheck_agent_tag: "v1.0",
  netcheck_server_tag: "v1.0"
}'
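
For anyone who prefers not to inline the overrides on the command line, the same variables could probably be kept in a small vars file. This is only a sketch: the variable names and values are the ones from the command above, while the file name `netchecker-overrides.yml` is made up for illustration.

```yaml
# netchecker-overrides.yml (hypothetical file name)
# Same variables as in the --extra-vars example above, pinning the older
# l23network images as a workaround.
deploy_netchecker: true
netcheck_agent_img_repo: "quay.io/l23network/k8s-netchecker-agent"
netcheck_server_img_repo: "quay.io/l23network/k8s-netchecker-server"
netcheck_agent_tag: "v1.0"
netcheck_server_tag: "v1.0"
```

It would then be passed with Ansible's `@file` form of `--extra-vars`, for example `ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml --tags netchecker -e @netchecker-overrides.yml`.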

AlexeyKasatkin added a commit to AlexeyKasatkin/kargo that referenced this issue May 8, 2019
…lusterRole

So that it could access the resource after it is created.

Corresponding issues:
Mirantis/k8s-netchecker-server#125
kubernetes-sigs#3281
k8s-ci-robot pushed a commit that referenced this issue May 9, 2019
* Add sha256 hashes for calicoctl v3.6.1

Hashes are added to calicoctl_binary_checksums for both amd and arm platforms.

* Add rules for "network-checker.ext" resource to "netchecker-server" ClusterRole

So that it could access the resource after it is created.

Corresponding issues:
Mirantis/k8s-netchecker-server#125
#3281
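
As described in the commit message above, the fix grants the "netchecker-server" ClusterRole access to the custom resource named in the error (`agents` in the `network-checker.ext` API group). A minimal sketch of what such a rule might look like follows. The verb list and the surrounding metadata are assumptions for illustration, not a copy of the merged manifest, and any rules the role already has are omitted here and would need to be kept.

```yaml
# Hypothetical sketch of the added RBAC rule; not copied from the merged change.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: netchecker-server
rules:
  # Assumed rule: lets the server read and write the agent custom resources
  # that show up in the errors above (get/put agents.network-checker.ext).
  - apiGroups: ["network-checker.ext"]
    resources: ["agents"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

If a missing rule like this is the cause, `curl http://localhost:31081/api/v1/connectivity_check` should stop returning the `unknown (get agents.network-checker.ext ...)` error once the role is updated.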
unbreakab1e pushed a commit to joomcode/kubespray that referenced this issue Jun 5, 2019
LuckySB pushed a commit to southbridgeio/kubespray that referenced this issue Aug 3, 2019