Get role working on CentOS 7 (and Red Hat, by extension) #3

Closed
geerlingguy opened this issue May 9, 2018 · 5 comments

Comments

@geerlingguy
Owner

geerlingguy commented May 9, 2018

Currently, I have almost everything working, except that it now gets stuck on the kubeadm init task (it wasn't getting stuck earlier today, so some fix I applied for something else must be blocking init from completing):

TASK [role_under_test : Initialize the Kubernetes master with kubeadm init.] ***
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

Failed build: https://travis-ci.org/geerlingguy/ansible-role-kubernetes/jobs/377042422

Probably something blocking/breaking kubelet startup (this was happening for a variety of reasons earlier in this role's as-yet short history).

@geerlingguy
Owner Author

geerlingguy commented May 9, 2018

Yep, some output from journalctl -eu kubelet:

May 09 22:23:23 0b9252601962 kubelet[1182]: E0509 22:23:23.018605    1182 eviction_manager.go:246] eviction manager: failed to get get summary stats: failed to get node info: node "0b9252601962" not found
May 09 22:23:23 0b9252601962 kubelet[1182]: W0509 22:23:23.132950    1182 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
May 09 22:23:25 0b9252601962 kubelet[1182]: E0509 22:23:25.689124    1182 event.go:209] Unable to write event: 'Post https://172.17.0.2:6443/api/v1/namespaces/default/events: dial tcp 172.17.0.2:6443: getsockopt: connection refused' (may retry after sleeping)
May 09 22:23:25 0b9252601962 kubelet[1182]: E0509 22:23:25.689171    1182 event.go:144] Unable to write event '&v1.Event {TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"0b9252601962.152d19dbdd88901a", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64) (nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences []v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"0b9252601962", UID:"0b9252601962", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientDisk", Message:"Node 0b9252601962 status is now: NodeHasSufficientDisk", Source:v1.EventSource{Component:"kubelet", Host:"0b9252601962"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbeb4fac632e2a01a, ext:295113266, loc:(*time.Location)(0x5b9f020)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbeb4fac632e2a01a, ext:295113266, loc:(*time.Location)(0x5b9f020)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}' (retry limit exceeded!)
May 09 22:23:25 0b9252601962 kubelet[1182]: E0509 22:23:25.689370    1182 event.go:209] Unable to write event: 'Post https://172.17.0.2:6443/api/v1/namespaces/default/events: dial tcp 172.17.0.2:6443: getsockopt: connection refused' (may retry after sleeping)
May 09 22:23:26 0b9252601962 kubelet[1182]: E0509 22:23:26.429219    1182 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://172.17.0.2:6443/api/v1/pods?fieldSelector=spec.nodeName%3D0b9252601962&limit=500&resourceVersion=0: dial tcp 172.17.0.2:6443: getsockopt: connection refused
May 09 22:23:26 0b9252601962 kubelet[1182]: E0509 22:23:26.448023    1182 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://172.17.0.2:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.0.2:6443: getsockopt: connection refused
May 09 22:23:26 0b9252601962 kubelet[1182]: E0509 22:23:26.450937    1182 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://172.17.0.2:6443/api/v1/nodes?fieldSelector=metadata.name%3D0b9252601962&limit=500&resourceVersion=0: dial tcp 172.17.0.2:6443: getsockopt: connection refused

@geerlingguy
Owner Author

Well, digging deeper, at least locally, the problem was a docker-in-docker aufs issue: you can't store aufs volumes inside the host aufs volume directory, so you have to make sure /var/lib/docker is a volume in the containing Docker container. See the warning in http://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/

@geerlingguy
Owner Author

Everything worked great after I added --volume=/var/lib/docker to my docker run command. We'll see if Travis CI is happy as well.
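
For anyone reproducing this locally, here is a minimal sketch of what that docker run invocation might look like. Only the --volume=/var/lib/docker option comes from this thread; the image name, the init path, and the --privileged flag are assumptions for illustration, and the DOCKER variable is a hypothetical hook that lets the command be dry-run.

```shell
#!/bin/sh
set -eu

# Assumptions for illustration (none of these are named in the thread):
# a systemd-enabled CentOS 7 test image, its init binary, and --privileged.
IMAGE="${IMAGE:-geerlingguy/docker-centos7-ansible:latest}"
INIT="${INIT:-/usr/lib/systemd/systemd}"
# DOCKER is overridable so the command can be dry-run with DOCKER=echo.
DOCKER="${DOCKER:-docker}"

start_outer_container() {
  # --volume=/var/lib/docker creates an anonymous volume, so the inner
  # Docker daemon's storage is not nested inside the outer container's
  # aufs-backed filesystem.
  "$DOCKER" run --detach --privileged \
    --volume=/var/lib/docker \
    "$IMAGE" "$INIT"
}
```

Calling start_outer_container starts the containing container; setting DOCKER=echo first prints the arguments instead of running anything.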

@geerlingguy
Owner Author

Well... it works locally :P, but Travis CI seems to hang.

@geerlingguy
Owner Author

Works now. It looks like it may have been related to systemd needing a daemon reload after the kubelet options changed.
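
As a rough sketch of the reload step described above, assuming kubelet runs as a systemd unit named kubelet: after its options or unit file change, systemd has to re-read its unit definitions before a restart picks them up. The role itself would presumably do this through an Ansible task or handler rather than a script, so treat the shell below as an illustration of the sequence only. The SYSTEMCTL variable is a hypothetical hook for dry runs.

```shell
#!/bin/sh
set -eu

# SYSTEMCTL is overridable so the sequence can be dry-run with SYSTEMCTL=echo.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

reload_kubelet() {
  # Re-read unit files and drop-ins so the changed kubelet options are seen.
  "$SYSTEMCTL" daemon-reload
  # Restart kubelet so it actually starts with the new options.
  "$SYSTEMCTL" restart kubelet
}
```

Run reload_kubelet as root on the node after changing the kubelet configuration.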

geerlingguy added a commit that referenced this issue Jun 1, 2018
vulturm pushed a commit to vulturm/ansible-role-kubernetes that referenced this issue Mar 23, 2020
=============================================
This represents the squashed history, as it is not too relevant for us.

Below are all the commit messages:
=============================================

Change a line to trigger a Travis CI build.

A few small fixes for automated test builds.

More updates to make things work better in various situations.

Fix idempotence for web ui enablement.

Fix dashboard UI service.

Fix kube utils installation not working on Debian.

Fix install on CentOS 7.

Fix idempotence for Flannel networking task.

Really fix idempotence for Flannel tasks, and get CentOS mostly working.

Add more variables and docs.

Space in the defaults file. [ci skip]

Issue geerlingguy#3: Allow failures on CentOS 7 Travis CI build for now.

Fixes geerlingguy#2: Make role work with nodes joining master.

Spellcheck.

Reload kubelet unit file if config is changed.

Fixes geerlingguy#3: CentOS builds now passing.

Change order when applying flannel templates.

Issue geerlingguy#5: Add more configuration ability to default Flannel network manifests.

Tick kubernetes stable version up from 1.10 to 1.11

Fixes geerlingguy#10: Set kubernetes_join_command more reliably.

Fixes geerlingguy#15: Add kubelet extra args to the correct file for 1.11 and beyond.

Fixes geerlingguy#16: CentOS 7 configuration of KUBELET_EXTRA_ARGS was broken.

Update master-setup.yml

Add option for additional kubeadm init options

Update main.yml

kubeadm_init_opts default value

Changes for the comments under PR geerlingguy#19

Switch tests to use Molecule.

Issue geerlingguy#17: Attempt to fix installation on CentOS.

Issue geerlingguy#18: Attempt to fix version pinning issues on RedHat and Debian.

Fix boolean on Debian setup, add more tests.

Fixing lint issues

incredibly sloppy day.

fix typo kuberenetes

PR geerlingguy#24 follow-up: Use verbosity instead of debug variable for debug info.

Update tests for optimum efficiency.

Fix YAML error in molecule config.

Fixes failing Ubuntu 18.04 test.

Fix some new ansible-lint issues.

Issue geerlingguy#33: Set default Kubernetes version to 1.13.1.

Bump Kubernetes RHEL package to 1.13.3.

Fix ansible-lint issue - ignore rule 306.

Update kubelet-setup.yml

geerlingguy#42

Use same options for all tests and default to Ansible IP correctly.

Remove unused tests.

Fixes geerlingguy#54: Update to Kubernetes 1.15.

Fixes geerlingguy#55: Support and test Debian 10 Buster.

Fix typo referenced in geerlingguy#49

Update main.yml

Create FUNDING.yml

YAML syntax fix.

Add kubernetes_join_command_extra_opts variable.

calico cni choice

PR geerlingguy#53 follow-up: Requested changes for simplicity.

PR geerlingguy#53 follow-up: Add test for calico networking.

PR geerlingguy#53 follow-up: Remove extra conditional.

Bump to Kubernetes 1.16.

Default to calico 3.10 manifest.

PR geerlingguy#53 follow-up: Remove extra unnecessary loop.

Fix README formatting.

Add a test for CentOS 8.

Update molecule configuration to work with 3.0.

Update molecule configuration to work with 3.0.

Update molecule configuration to work with 3.0.

Make sure molecule lint script has set -e option.

Add probot/stale configuration to repository for stale issues.