
Stack Build Error #22

Closed
svenmueller opened this issue Sep 13, 2015 · 7 comments

Comments

@svenmueller
Contributor

Hi,

When I use the latest version of https://github.com/metral/corekube/blob/master/corekube-cloudservers.yaml, I get an error when creating the stack on Rackspace. Any idea?

Resource CREATE failed: resources.kubernetes_minions: Property error: resources[1].properties.networks[0].network: Error validating value '00000000-0000-0000-0000-000000000000': SSL certificate validation has failed: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
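
In case the certificate validation is failing on the client side (it can also be internal to the service, which seems to have been the case here given how it later cleared up), one quick check from the same machine, independent of heat, is to inspect the chain directly. The hostname below is only an example; substitute whichever endpoint the error points at:

# Prints the server's certificate chain and the local verification result;
# "Verify return code: 0 (ok)" means the chain validates from this client.
openssl s_client -connect identity.api.rackspacecloud.com:443 -servername identity.api.rackspacecloud.com </dev/null 2>/dev/null | grep "Verify return code"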

Thanks,
Sven

@metral
Owner

metral commented Sep 17, 2015

Hello,

I just tried running it and found no issues.

The command I issued is the same as in the readme:
heat stack-create foobar --template-file corekube-cloudservers.yaml -P keyname=<RAX_SSH_KEY>
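
To watch the stack come up, the stock heatclient commands work (same stack name as above):

heat stack-list              # overall stack status; wait for CREATE_COMPLETE
heat resource-list foobar    # per-resource state
heat event-list foobar       # events, useful when a resource fails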

Are you still running into issues? If so, could you provide some more info/logs of the issue?

@svenmueller
Contributor Author

Hi,

The issue mentioned above disappeared again (it looks like Rackspace made some changes/fixes under the hood...). But I still have problems: now the CoreOS cluster is not working:

kubernetes-master ~ # fleetctl list-units
Error retrieving list of units from repository: googleapi: Error 503: fleet server unable to communicate with etcd
kubernetes-master ~ # journalctl -u etcd.service
-- Logs begin at Sat 2015-09-19 22:49:30 UTC, end at Sat 2015-09-19 22:54:39 UTC. --
Sep 19 22:49:42 kubernetes-master systemd[1]: Started etcd.
Sep 19 22:49:42 kubernetes-master systemd[1]: Starting etcd...
Sep 19 22:49:42 kubernetes-master etcd[1073]: [etcd] Sep 19 22:49:42.559 INFO      | Discovery via http://10.182.65.214:2379 using prefix discovery/<TOKEN>.
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Unit entered failed state.
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Failed with result 'exit-code'.
Sep 19 22:49:53 kubernetes-master systemd[1]: etcd.service: Service hold-off time over, scheduling restart.
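
A quick reachability check against that discovery endpoint, run from the master (IP/port taken from the log line above; <TOKEN> stays the placeholder from the log, and the key path is assumed from the prefix it prints):

# Does the private discovery etcd answer at all?
curl -m 5 http://10.182.65.214:2379/version

# If so, what has registered under the discovery prefix? (etcd v2 keys API)
curl -m 5 http://10.182.65.214:2379/v2/keys/discovery/<TOKEN>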

Does it work for you, @metral?

@svenmueller
Contributor Author

Hi @metral,

Any updates on this? Does etcd work for you (e.g. on the kubernetes master)?

Thx Sven

@metral
Owner

metral commented Oct 2, 2015

Apologies for my lack of a response. Have you tried starting a clean deployment, or are you trying to fix an existing one? Please provide me with more information to help recreate your issue.


@svenmueller
Contributor Author

Hi @metral,

Yep, I always destroy the old stack and create a new one using the Heat template (I repeated this a couple of times to see if it is reproducible). After the stack is ready, I SSH into the Kubernetes master node, where I can see that there are issues with etcd.

kubernetes-master ~ # journalctl -u etcd.service
-- Logs begin at Sat 2015-09-19 22:49:30 UTC, end at Sat 2015-09-19 22:54:39 UTC. --
Sep 19 22:49:42 kubernetes-master systemd[1]: Started etcd.
Sep 19 22:49:42 kubernetes-master systemd[1]: Starting etcd...
Sep 19 22:49:42 kubernetes-master etcd[1073]: [etcd] Sep 19 22:49:42.559 INFO      | Discovery via http://10.182.65.214:2379 using prefix discovery/<TOKEN>.
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Unit entered failed state.
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Failed with result 'exit-code'.
Sep 19 22:49:53 kubernetes-master systemd[1]: etcd.service: Service hold-off time over, scheduling restart.
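
Happy to pull more state if it helps, e.g. (generic systemd checks, nothing corekube-specific):

systemctl status etcd.service                  # current state plus the most recent log lines
systemctl cat etcd.service                     # effective unit file, including cloud-config drop-ins
journalctl -u etcd.service --no-pager -n 100   # the real error is usually the etcd line just before the exit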

Thx for the support :)

@metral
Owner

metral commented Oct 2, 2015

This is very odd. I've done two clean deployments, one just now and one when you originally opened the issue, but I'm still not running into the issues you're describing. My steps from beginning to end in the ORD region (condensed into a script after the log below):

  • heat stack-create corekube --template-file corekube-cloudservers.yaml -P keyname=<SSH_REGION_KEY>
  • wait a couple of minutes until the stack is created
  • SSH into the overlord to check its progress
    • from the overlord: docker logs overlord - this can take up to an additional 4-5 minutes after Heat creates the stack
    • once this is done and it says it has seen & deployed to all 4 machines (the default: 1 master + 3 minions), I log into the k8s master
  • on the k8s master my etcd.service is just fine and from there I can use k8s as expected - here is my etcd.service log:
-- Logs begin at Fri 2015-10-02 17:43:38 UTC, end at Fri 2015-10-02 17:51:10 UTC. --
Oct 02 17:43:49 kubernetes-master systemd[1]: Started etcd.
Oct 02 17:43:49 kubernetes-master systemd[1]: Starting etcd...
Oct 02 17:43:49 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:49.999 INFO      | Discovery via http://10.210.104.64:2379 using prefix discovery/GhOOQ7AxAQr0wgygYd6eHrgkk7pNuQsX.
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.013 INFO      | Discovery found peers [http://10.210.104.74:7001]
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.013 INFO      | Discovery fetched back peer list: [10.210.104.74:7001]
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.018 INFO      | Send Join Request to http://10.210.104.74:7001/join
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.029 INFO      | kubernetes_master joined the cluster via peer 10.210.104.74:7001
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.032 INFO      | etcd server [name kubernetes_master, listen on :4001, advertised url http://10.210.104.78:4001]
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.032 INFO      | peer server [name kubernetes_master, listen on :7001, advertised url http://10.210.104.78:7001]
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.032 INFO      | kubernetes_master starting in peer mode
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.032 INFO      | kubernetes_master: state changed from 'initialized' to 'follower'.
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.080 INFO      | kubernetes_master: peer added: 'overlord'
Oct 02 17:43:53 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:53.669 INFO      | kubernetes_master: peer added: 'kubernetes_minion_0'
Oct 02 17:44:03 kubernetes-master etcd[1052]: [etcd] Oct  2 17:44:03.234 INFO      | kubernetes_master: peer added: 'kubernetes_minion_2'
Oct 02 17:44:16 kubernetes-master etcd[1052]: [etcd] Oct  2 17:44:16.633 INFO      | kubernetes_master: peer added: 'kubernetes_minion_1'
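
The same flow condensed into a script (the overlord/master IPs are placeholders for the addresses Heat reports, and core is the default CoreOS user):

# 1. create the stack, then wait for CREATE_COMPLETE
heat stack-create corekube --template-file corekube-cloudservers.yaml -P keyname=<SSH_REGION_KEY>
heat stack-list

# 2. on the overlord, watch it register & deploy to all 4 machines
ssh core@<OVERLORD_IP> docker logs -f overlord

# 3. on the k8s master, confirm etcd and fleet are healthy
ssh core@<MASTER_IP> 'journalctl -u etcd.service --no-pager -n 20 && fleetctl list-machines'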

Can you provide the steps you're taking? From your issues it seems that your discovery node is not setting up the private etcd server that both the overlord and k8s depend on, but I'm not sure why it's having issues.

Could you try deploying again from scratch, or provide me with more information from your discovery node's log files for the container running it: docker logs discovery
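
On the discovery node that would be roughly (container name as above; the local port is an assumption based on the discovery URL in your log):

docker ps                                        # is the discovery container running at all?
docker logs discovery
curl -m 5 http://127.0.0.1:2379/v2/stats/self    # does the private etcd answer locally?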

@metral
Owner

metral commented Oct 12, 2015

Closing due to inactivity. Please reopen if the issue persists.

@metral metral closed this as completed Oct 12, 2015