Deployment stuck when allow-privileged is set to true #262

Closed
ktsakalozos opened this Issue Apr 21, 2017 · 17 comments

ktsakalozos commented Apr 21, 2017

Here is how to reproduce. Deploy with conjure-up and set allow-privileged to true in kubernetes-worker and kubernetes-master.

The status log of kubernetes-master shows that it is stuck in the waiting state: http://paste.ubuntu.com/24425472/ and here is the juju status: http://paste.ubuntu.com/24425479/

Looking at the logs of kubernetes-master (full log: http://paste.ubuntu.com/24425487/), you find this:

2017-04-21 08:11:38 INFO update-status Run configure hook of "cdk-addons" snap
The Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.152.183.10": provided IP is already allocated
2017-04-21 08:11:38 INFO update-status Traceback (most recent call last):
2017-04-21 08:11:38 INFO update-status   File "/snap/cdk-addons/10/apply", line 85, in <module>
2017-04-21 08:11:38 INFO update-status     main()
2017-04-21 08:11:38 INFO update-status   File "/snap/cdk-addons/10/apply", line 13, in main
2017-04-21 08:11:38 INFO update-status     apply_addons()
2017-04-21 08:11:38 INFO update-status   File "/snap/cdk-addons/10/apply", line 56, in apply_addons
2017-04-21 08:11:38 INFO update-status     "--force"
2017-04-21 08:11:38 INFO update-status   File "/snap/cdk-addons/10/apply", line 65, in kubectl
2017-04-21 08:11:38 INFO update-status     return subprocess.check_output(cmd)
2017-04-21 08:11:38 INFO update-status   File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
2017-04-21 08:11:38 INFO update-status     **kwargs).stdout
2017-04-21 08:11:38 INFO update-status   File "/usr/lib/python3.5/subprocess.py", line 708, in run
2017-04-21 08:11:38 INFO update-status     output=stdout, stderr=stderr)
2017-04-21 08:11:38 INFO update-status subprocess.CalledProcessError: Command '['/snap/cdk-addons/10/kubectl', 'apply', '-f', '/root/snap/cdk-addons/10/addons', '--recursive', '--namespace=kube-system', '-l', 'kubernetes.io/cluster-service=true', '--prune=true', '--force']' returned non-zero exit status 1

And here is the error we hit with kubectl:

root@ip-172-31-3-173:/home/ubuntu# '/snap/cdk-addons/10/kubectl' 'apply' '-f' '/root/snap/cdk-addons/10/addons' '--recursive' '--namespace=kube-system' '-l' 'kubernetes.io/cluster-service=true' '--prune=true' '--force'
deployment "kubernetes-dashboard" configured
service "kubernetes-dashboard" configured
service "monitoring-grafana" configured
deployment "heapster-v1.3.0" configured
service "heapster" configured
replicationcontroller "monitoring-influxdb-grafana-v4" configured
service "monitoring-influxdb" configured
deployment "kube-dns" configured
serviceaccount "kube-dns" configured
The Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.152.183.10": provided IP is already allocated
ktsakalozos commented Apr 21, 2017

Reported for LXD here: kubernetes/kubernetes#40386. Not quite the same, but similar.

Cynerva commented Apr 21, 2017

I do think this is a duplicate of kubernetes/kubernetes#40386 - there's nothing special about LXD there, and nothing special about allow-privileged=true here. It's a failure that happens randomly, with a low chance, on any deployment.

This happens because the kube-dns service has a hard-coded *.*.*.10 IP address. If other services are created before kube-dns, it's possible for them to claim that IP, which eventually causes a collision.

IMO the best solution would be to move nginx-ingress-controller and default-http-backend into the cdk-addons snap, and make sure everything is deployed in the right order there.
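
For context, here is a minimal sketch of the failure mode (assumptions: the service CIDR is 10.152.183.0/24 and kube-dns pins the conventional tenth address of that range, as in the error above). ClusterIP allocation is effectively first-come, first-served, so any Service created before kube-dns can claim the address the kube-dns manifest hard-codes:

```python
# Illustrative sketch only, not the cdk-addons code.
import ipaddress

def conventional_dns_ip(service_cidr):
    """Return the tenth address of the service CIDR, the usual kube-dns ClusterIP."""
    net = ipaddress.ip_network(service_cidr)
    return str(net.network_address + 10)

# Suppose these ClusterIPs were already handed out to Services created earlier.
already_allocated = {"10.152.183.1", "10.152.183.10"}

dns_ip = conventional_dns_ip("10.152.183.0/24")  # -> "10.152.183.10"
if dns_ip in already_allocated:
    # This is the situation behind the apiserver error in the issue:
    # spec.clusterIP: Invalid value: "10.152.183.10": provided IP is already allocated
    print("collision: kube-dns cannot claim", dns_ip)
```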

lazypower commented Apr 21, 2017

+1 @Cynerva

Please move forward with that approach so we can avoid collisions. It's unfortunate that this is a thing. How would we allow ingress to still be toggled? I assume via configuring the cdk-addons snap, which would then trigger the workload to be deployed or removed appropriately?

Cynerva commented Apr 21, 2017

I believe @ktsakalozos is looking into this actually.

> How would we allow ingress to still be toggled? I assume via configuring the cdk-addons snap, which would then trigger the workload to be deployed or removed appropriately?

Yep. We do that with the dashboard addons already.

We also discussed a potential alternative where we remove the hard-coded IP from the kube-dns service and configure kubelet to use whatever IP it gets assigned.

I used to think this would cause a chicken-and-egg problem (kubelet needs the kube-dns IP, the kube-dns pod needs kubelet), but actually I don't think there's anything stopping us from creating all the addons before kubelet is running. We can't deploy pods at that point, but that should resolve itself once kubelet is brought up, I think.
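
A rough sketch of that alternative (an assumption about how it could work, not the actual charm code): create kube-dns without a pinned spec.clusterIP, read back whatever address the API server allocated, and only then hand that address to kubelet:

```python
# Hedged sketch: query the ClusterIP the API server assigned to kube-dns.
import json
import subprocess

def assigned_dns_ip(kubectl="kubectl"):
    out = subprocess.check_output(
        [kubectl, "get", "service", "kube-dns",
         "--namespace=kube-system", "-o", "json"])
    return json.loads(out)["spec"]["clusterIP"]

# kubelet would then be started (or restarted) with something like:
#   kubelet --cluster-dns=<assigned IP> --cluster-domain=cluster.local ...
```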

lazypower commented Apr 21, 2017

+1 I think we should pilot this and see if it causes issues. That, or look at what other solutions are doing in this regard, as it's not a CDK-only problem. Thanks for triaging @Cynerva

ktsakalozos commented Apr 21, 2017

Yeah, the first step is to find a way to reproduce this error consistently, because it seems to be too rare to hit reliably. We were thinking about reducing the pool of IPs Kubernetes has available so that collisions happen more often.
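
As a back-of-the-envelope illustration of that idea (the CIDR and counts below are assumptions for the sketch, not the team's actual test plan), shrinking the service range makes it much more likely that some earlier Service lands on the .10 address before kube-dns is created:

```python
# Toy simulation only: estimate how often the ".10" address is already taken
# when kube-dns is created, with ClusterIPs drawn at random from the pool.
import ipaddress
import random

def collision_rate(cidr, services_before_dns, trials=10000):
    net = ipaddress.ip_network(cidr)
    hosts = [str(h) for h in net.hosts()]
    dns_ip = str(net.network_address + 10)
    hits = sum(
        dns_ip in random.sample(hosts, min(services_before_dns, len(hosts)))
        for _ in range(trials)
    )
    return hits / trials

print(collision_rate("10.152.183.0/24", 3))  # rare with the full /24
print(collision_rate("10.152.183.0/28", 3))  # frequent with a tiny pool
```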

ktsakalozos commented Apr 26, 2017

These PRs should address the issue:
kubernetes/kubernetes#44945
juju-solutions/cdk-addons#18

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Apr 26, 2017

Merge pull request #44945 from ktsakalozos/bug/dns-fix
Automatic merge from submit-queue

Send dns details only after cdk-addons are configured

**What this PR does / why we need it**: This is a bugfix on the deployment of Kubernetes via Juju. See issue below.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #40386 and
juju-solutions/bundle-canonical-kubernetes#262

**Special notes for your reviewer**:

**Release note**:

```
Fix KubeDNS issue in Juju deployments. 
```
lazypower commented Jun 21, 2017

Prereqs have been merged. Closing for now. If the issue is not resolved, please re-open and comment here.

lazypower closed this Jun 21, 2017

mach-kernel commented Aug 5, 2017

I followed the documentation to deploy this and ran into a similar class of issue... It works fine until you actually restart the bare-metal box hosting the VM. Power was not lost, it was a graceful shutdown, etc., and it still yields this:

kubernetes-master/0*  waiting   idle   0        10.52.203.16    6443/tcp        Waiting to retry addon deployment
  flannel/0           waiting   idle            10.52.203.16                    Waiting for Flannel
kubernetes-worker/0*  waiting   idle   1        10.52.203.242   80/tcp,443/tcp  Waiting for kubelet to start.
  flannel/1*          waiting   idle            10.52.203.242                   Waiting for Flannel

OK -- no problem. Let's SSH into the master node and investigate.

unit-kubernetes-master-0.log
2017-08-05 16:04:23 INFO collect-metrics The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
2017-08-05 16:04:24 INFO config-changed Traceback (most recent call last):
2017-08-05 16:04:24 INFO config-changed   File "/snap/cdk-addons/71/apply", line 93, in <module>
2017-08-05 16:04:24 INFO config-changed     main()
2017-08-05 16:04:24 INFO config-changed   File "/snap/cdk-addons/71/apply", line 12, in main
2017-08-05 16:04:24 INFO config-changed     render_templates()
2017-08-05 16:04:24 INFO config-changed   File "/snap/cdk-addons/71/apply", line 23, in render_templates
2017-08-05 16:04:24 INFO config-changed     "num_nodes": get_node_count()
2017-08-05 16:04:24 INFO config-changed   File "/snap/cdk-addons/71/apply", line 77, in get_node_count
2017-08-05 16:04:24 INFO config-changed     output = kubectl("get", "nodes", "-o", "name")
2017-08-05 16:04:24 INFO config-changed   File "/snap/cdk-addons/71/apply", line 73, in kubectl
2017-08-05 16:04:24 INFO config-changed     return subprocess.check_output(cmd)
2017-08-05 16:04:24 INFO config-changed   File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
2017-08-05 16:04:24 INFO config-changed     **kwargs).stdout
2017-08-05 16:04:24 INFO config-changed   File "/usr/lib/python3.5/subprocess.py", line 708, in run
2017-08-05 16:04:24 INFO config-changed     output=stdout, stderr=stderr)
2017-08-05 16:04:24 INFO config-changed subprocess.CalledProcessError: Command '['/snap/cdk-addons/71/kubectl', 'get', 'nodes', '-o', 'name']' returned non-zero exit status 1

Interesting. kubectl also tells us to go away, so this error message makes sense. The documentation recommends that we scp ~/config from kubernetes-master/0, so what if we do kubectl --config=~/config? That seems to get us what we want:

Command "clusterinfo" is deprecated, use "cluster-info" instead
Kubernetes master is running at http://localhost:8080

Taking a look at that apply script, two functions seem important:

def get_snap_config(name, required=True):
    path = os.path.join(os.environ["SNAP_DATA"], "config", name)
    with open(path) as f:
        value = f.read().rstrip()
    if not value and required:
        raise MissingSnapConfig("%s is required" % name)
    return value

def kubectl(*args):
    cmd = [os.path.join(os.environ["SNAP"], "kubectl")]
    kubeconfig = get_snap_config("kubeconfig", required=False)
    if kubeconfig:
        cmd += ["--kubeconfig", kubeconfig]
    cmd += list(args)
    return subprocess.check_output(cmd)

MissingSnapConfig isn't being raised, so I assume get_snap_config is returning what it should at runtime... but is it perhaps pointing at a different/incorrect config file? How should I go about fixing this without making a mess?
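
One way to narrow that down (a hedged debugging sketch; the /var/snap/cdk-addons/current fallback below is an assumption based on the standard snap layout, not taken from the charm) is to check what the snap's kubeconfig entry actually contains, since an empty value would make kubectl fall back to the localhost:8080 default seen in the log, while a stale path would point it at the wrong cluster:

```python
# Print the kubeconfig value the apply script would pass to kubectl, if any.
import os

snap_data = os.environ.get("SNAP_DATA", "/var/snap/cdk-addons/current")
path = os.path.join(snap_data, "config", "kubeconfig")
try:
    with open(path) as f:
        value = f.read().rstrip()
    if value:
        print("apply would invoke kubectl with --kubeconfig", value)
    else:
        print("kubeconfig entry is empty; kubectl falls back to localhost:8080")
except FileNotFoundError:
    print("no kubeconfig entry at", path)
```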

ktsakalozos commented Aug 7, 2017

Hi @mach-kernel, you mention that you have a bare-metal host running a VM. Could you describe your setup in more detail? Is it possible the hosted VMs are LXD containers? I see you posted your question in #357 as well.

Would you be able to run https://github.com/juju-solutions/cdk-field-agent to collect the state and logs of your infrastructure?

mach-kernel commented Aug 7, 2017

@ktsakalozos I have a VMDK snapshot of that image (I'm running "bare-metal" on an ESXi host) too if needed. I'll spin it up and get you logs by this evening. LXD-related discussion is in the other issue.

Thanks for the quick response!

gstanden commented Sep 2, 2017

I may be having a similar problem. Can anyone help with a fix for this? Thanks.

ubuntu@kube1:~$ juju status
Model                         Controller                Cloud/Region         Version  SLA
conjure-canonical-kubern-8cc  conjure-up-localhost-d3d  localhost/localhost  2.2.1    unsupported

App                    Version  Status   Scale  Charm                  Store       Rev  OS      Notes
easyrsa                3.0.1    active       1  easyrsa                jujucharms   15  ubuntu  
etcd                   2.3.8    active       3  etcd                   jujucharms   48  ubuntu  
flannel                0.7.0    active       4  flannel                jujucharms   26  ubuntu  
kubeapi-load-balancer  1.10.3   active       1  kubeapi-load-balancer  jujucharms   25  ubuntu  exposed
kubernetes-master      1.7.4    waiting      1  kubernetes-master      jujucharms   47  ubuntu  
kubernetes-worker      1.7.4    waiting      3  kubernetes-worker      jujucharms   52  ubuntu  exposed

Unit                      Workload  Agent      Machine  Public address  Ports           Message
easyrsa/0*                active    idle       6        10.2.33.158                     Certificate Authority connected.
etcd/0                    active    idle       1        10.2.33.120     2379/tcp        Healthy with 3 known peers
etcd/1                    active    idle       3        10.2.33.250     2379/tcp        Errored with 0 known peers
etcd/2*                   active    idle       2        10.2.33.234     2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle       4        10.2.33.71      443/tcp         Loadbalancer ready.
kubernetes-master/0*      waiting   executing  5        10.2.33.246     6443/tcp        (update-status) Waiting to retry addon deployment
  flannel/0*              active    idle                10.2.33.246                     Flannel subnet 10.1.86.1/24
kubernetes-worker/0*      waiting   idle       7        10.2.33.46      80/tcp,443/tcp  Waiting for kubelet,kube-proxy to start.
  flannel/3               active    idle                10.2.33.46                      Flannel subnet 10.1.95.1/24
kubernetes-worker/1       active    idle       0        10.2.33.197     80/tcp,443/tcp  Kubernetes worker running.
  flannel/2               active    idle                10.2.33.197                     Flannel subnet 10.1.88.1/24
kubernetes-worker/2       active    idle       8        10.2.33.121     80/tcp,443/tcp  Kubernetes worker running.
  flannel/1               active    idle                10.2.33.121                     Flannel subnet 10.1.67.1/24

Machine  State    DNS          Inst id        Series  AZ  Message
0        started  10.2.33.197  juju-12b89b-0  xenial      Running
1        started  10.2.33.120  juju-12b89b-1  xenial      Running
2        started  10.2.33.234  juju-12b89b-2  xenial      Running
3        started  10.2.33.250  juju-12b89b-3  xenial      Running
4        started  10.2.33.71   juju-12b89b-4  xenial      Running
5        started  10.2.33.246  juju-12b89b-5  xenial      Running
6        started  10.2.33.158  juju-12b89b-6  xenial      Running
7        started  10.2.33.46   juju-12b89b-7  xenial      Running
8        started  10.2.33.121  juju-12b89b-8  xenial      Running

Relation           Provides               Consumes               Type
certificates       easyrsa                etcd                   regular
certificates       easyrsa                kubeapi-load-balancer  regular
certificates       easyrsa                kubernetes-master      regular
certificates       easyrsa                kubernetes-worker      regular
cluster            etcd                   etcd                   peer
etcd               etcd                   flannel                regular
etcd               etcd                   kubernetes-master      regular
cni                flannel                kubernetes-master      regular
cni                flannel                kubernetes-worker      regular
loadbalancer       kubeapi-load-balancer  kubernetes-master      regular
kube-api-endpoint  kubeapi-load-balancer  kubernetes-worker      regular
cni                kubernetes-master      flannel                subordinate
kube-control       kubernetes-master      kubernetes-worker      regular
cni                kubernetes-worker      flannel                subordinate

ubuntu@kube1:~$
tvansteenburgh commented Sep 2, 2017

@gstanden It'll be difficult to debug with the status output alone. Can you run https://github.com/juju-solutions/cdk-field-agent to collect the state and logs of your infrastructure?

marcoceppi commented Sep 2, 2017

tvansteenburgh commented Sep 2, 2017

I have a localhost LXD deployment running with allow-privileged="true" on both master and worker, and everything is fine. There must be more to it than just allow-privileged on LXD.

ybaumy commented Jun 30, 2018

Is this still open? I'm running into something at least very similar, which results in a forever-pending state for a few pods.


ybaumy@kube1604:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                               READY     STATUS    RESTARTS   AGE
default       default-http-backend-mkdj7                         0/1       Pending   0          2h
default       nginx-ingress-kubernetes-worker-controller-nlcc2   0/1       Pending   0          2h
kube-system   heapster-v1.5.3-6c9b557676-mhxsl                   0/4       Pending   0          2h
kube-system   kube-dns-7b479ccbc6-qtxwm                          0/3       Pending   0          2h
kube-system   kubernetes-dashboard-6948bdb78-lxtft               0/1       Pending   0          2h
kube-system   metrics-server-v0.2.1-85646b8b5d-nt2gg             0/2       Pending   0          2h
kube-system   monitoring-influxdb-grafana-v4-b54d59784-l47mt     0/2       Pending   0          2h

2018-06-30 08:49:01 DEBUG certificates-relation-changed The connection to the server localhost:8080 was refused - did you specify the right host or port?
2018-06-30 08:49:01 DEBUG certificates-relation-changed Traceback (most recent call last):
2018-06-30 08:49:01 DEBUG certificates-relation-changed   File "/snap/cdk-addons/415/apply", line 145, in <module>
2018-06-30 08:49:01 DEBUG certificates-relation-changed     main()
2018-06-30 08:49:01 DEBUG certificates-relation-changed   File "/snap/cdk-addons/415/apply", line 13, in main
2018-06-30 08:49:01 DEBUG certificates-relation-changed     if render_templates():
2018-06-30 08:49:01 DEBUG certificates-relation-changed   File "/snap/cdk-addons/415/apply", line 21, in render_templates
2018-06-30 08:49:01 DEBUG certificates-relation-changed     node_count = get_node_count()
2018-06-30 08:49:01 DEBUG certificates-relation-changed   File "/snap/cdk-addons/415/apply", line 129, in get_node_count
2018-06-30 08:49:01 DEBUG certificates-relation-changed     output = kubectl("get", "nodes", "-o", "name")
2018-06-30 08:49:01 DEBUG certificates-relation-changed   File "/snap/cdk-addons/415/apply", line 125, in kubectl
2018-06-30 08:49:01 DEBUG certificates-relation-changed     return subprocess.check_output(cmd)
2018-06-30 08:49:01 DEBUG certificates-relation-changed   File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
2018-06-30 08:49:01 DEBUG certificates-relation-changed     **kwargs).stdout
2018-06-30 08:49:01 DEBUG certificates-relation-changed   File "/usr/lib/python3.5/subprocess.py", line 708, in run
2018-06-30 08:49:01 DEBUG certificates-relation-changed     output=stdout, stderr=stderr)
2018-06-30 08:49:01 DEBUG certificates-relation-changed subprocess.CalledProcessError: Command '['/snap/cdk-addons/415/kubectl', 'get', 'nodes', '-o', 'name']' returned non-zero exit status 1
2018-06-30 08:49:01 INFO juju-log certificates:83: Addons are not ready yet.
2018-06-30 08:49:23 INFO juju-log certificates:83: Invoking reactive handler: reactive/kubernetes_master.py:544:send_cluster_dns_detail
2018-06-30 08:49:24 DEBUG certificates-relation-changed active
2018-06-30 08:49:24 DEBUG certificates-relation-changed active
2018-06-30 08:49:24 DEBUG certificates-relation-changed active
2018-06-30 08:49:24 INFO juju-log certificates:83: Checking system pods status: heapster-v1.5.3-54cdccd684-ntvmm=Pending, kube-dns-7b479ccbc6-qtxwm=Pending, kubernetes-dashboard-6948bdb78-lxtft=Pending, metrics-server-v0.2.1-85646b8b5d-nt2gg=Pending, monitoring-influxdb-grafana-v4-b54d59784-l47mt=Pending
2018-06-30 08:49:34 INFO juju-log certificates:83: Checking system pods status: heapster-v1.5.3-54cdccd684-ntvmm=Pending, kube-dns-7b479ccbc6-qtxwm=Pending, kubernetes-dashboard-6948bdb78-lxtft=Pending, metrics-server-v0.2.1-85646b8b5d-nt2gg=Pending, monitoring-influxdb-grafana-v4-b54d59784-l47mt=Pending
2018-06-30 08:49:44 INFO juju-log certificates:83: Checking system pods status: heapster-v1.5.3-54cdccd684-ntvmm=Pending, kube-dns-7b479ccbc6-qtxwm=Pending, kubernetes-dashboard-6948bdb78-lxtft=Pending, metrics-server-v0.2.1-85646b8b5d-nt2gg=Pending, monitoring-influxdb-grafana-v4-b54d59784-l47mt=Pending
2018-06-30 08:49:55 INFO juju-log certificates:83: Checking system pods status: heapster-v1.5.3-54cdccd684-ntvmm=Pending, kube-dns-7b479ccbc6-qtxwm=Pending, kubernetes-dashboard-6948bdb78-lxtft=Pending, metrics-server-v0.2.1-85646b8b5d-nt2gg=Pending, monitoring-influxdb-grafana-v4-b54d59784-l47mt=Pending
Cynerva commented Jul 2, 2018

@ybaumy Nope, this issue is closed. Please open a new issue and attach a cdk-field-agent archive, and we'll do our best to help you out.
