Update vagrant to include kubeworkers and refactor edge, worker loop #1365

Closed
wants to merge 3 commits into base: master from andreimc:master

Conversation

9 participants
andreijs (Contributor) commented Apr 19, 2016

Hey guys,

Here is the updated vagrant file for kubeworker.

  • [done] Installs cleanly on a fresh build of most recent master branch
  • [done] Upgrades cleanly from the most recent release
  • [done] Updates documentation relevant to the changes
Vagrantfile Outdated
@@ -26,6 +30,36 @@ else
config_hash = config_hash.merge(YAML.load(File.read(config_path)))
end

def spin_up(config_hash:, config:, server_array:, hostvars:, hosts:, server_type:)

siddharthist (Contributor), Apr 20, 2016:

This function could use an explaining comment
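
For context, a minimal hedged sketch (not the PR's actual implementation) of what a commented version of this helper might look like, assuming it registers Ansible hostvars and defines one VirtualBox VM per entry in server_array; every key and option name not visible in the diff is an assumption:

    # Sketch only: spin_up walks server_array, derives each node's IP from the
    # configured "<server_type>_ip_start" value, registers the node's hostvars
    # for the Ansible provisioner, and defines a VM named "<server_type>-NNN".
    def spin_up(config_hash:, config:, server_array:, hostvars:, hosts:, server_type:)
      server_array.each_with_index do |_server, index|
        name = format("%s-%03d", server_type, index + 1)               # e.g. "kubeworker-001"
        ip   = "#{config_hash["#{server_type}_ip_start"]}#{index + 1}" # e.g. "192.168.100.101"

        # Inventory data consumed by the Ansible provisioner.
        hostvars[name] = {
          "ansible_ssh_host" => ip,
          "private_ipv4"     => ip,
          "public_ipv4"      => ip,
          "role"             => server_type,
        }
        (hosts[server_type] ||= []) << name

        # The VirtualBox VM itself.
        config.vm.define name do |node|
          node.vm.hostname = name
          node.vm.network "private_network", ip: ip
          node.vm.provider "virtualbox" do |vb|
            vb.memory = config_hash["#{server_type}_memory"]
          end
        end
      end
    end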

siddharthist (Contributor) commented Apr 20, 2016

This looks great, thanks @andreijs! I left just a few comments

Vagrantfile Outdated
"ansible_ssh_host" => ip,
"private_ipv4" => ip,
"public_ipv4" => ip,
"role" => server_type

siddharthist (Contributor), Apr 20, 2016:

server_type could just be renamed to "role" for consistency
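
If the rename were adopted, it might look like this purely illustrative, hypothetical helper (not code from the PR), where the keyword argument shares the name of the hostvars key it fills:

    # Hypothetical helper, only to illustrate the naming suggestion: the
    # keyword is called role, matching the "role" key in the resulting hash.
    def hostvars_for(ip:, role:)
      {
        "ansible_ssh_host" => ip,
        "private_ipv4"     => ip,
        "public_ipv4"      => ip,
        "role"             => role,
      }
    end

    # hostvars_for(ip: "192.168.100.101", role: "kubeworker")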

Vagrantfile Outdated
end
end
spin_up(config_hash: config_hash, config: config, hostvars: hostvars, server_array: workers, hosts: hosts, server_type: 'worker')
spin_up(config_hash: config_hash, config: config, hostvars: hostvars, server_array: workers, hosts: hosts, server_type: 'kubeworker')

siddharthist (Contributor), Apr 20, 2016:

Shouldn't this server_array be kubeworkers or similar?
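
A sketch of what the corrected pair of calls might look like, assuming a separate kubeworkers array exists alongside workers (that variable name is an assumption, not something visible in the diff):

    # Sketch only: the second call passes the (assumed) kubeworkers array
    # instead of reusing workers, so kubeworker nodes get their own entries.
    spin_up(config_hash: config_hash, config: config, hostvars: hostvars,
            server_array: workers,     hosts: hosts, server_type: 'worker')
    spin_up(config_hash: config_hash, config: config, hostvars: hostvars,
            server_array: kubeworkers, hosts: hosts, server_type: 'kubeworker')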

siddharthist (Contributor) commented Apr 20, 2016

@andreimc

andreimc (Contributor) commented Apr 20, 2016

@siddharthist thanks for your feedback, I have made the changes pointed out 👍

consul_package: consul-0.6.3
consul_ui_package: consul-ui-0.6.3
consul_package: consul-0.6.4
consul_ui_package: consul-ui-0.6.4

siddharthist (Contributor), Apr 20, 2016:

Can we put this in a separate PR? Doesn't seem directly related.

andreimc (Contributor), Apr 20, 2016:

Sorry, I didn't realize I committed this


A base IP address which will have its last digit appended. For example, if
``worker_ip_start`` is set to "192.168.100.10", the first worker node will
have the IP address 192.168.100.101, the second will have 192.168.100.102,
etc.

.. data:: worker_memory, control_memory, edge_memory
.. data:: worker_memory, control_memory, edge_memory, kubeworker_memeory

ryane (Contributor), Apr 20, 2016:

small typo: s/memeory/memory
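
As an aside, the *_ip_start convention documented in the hunk above can be illustrated with a short Ruby sketch (the base value is taken from the docs; the node count is an assumption):

    # The node index is appended to the configured base address, so
    # "192.168.100.10" plus node 1 yields "192.168.100.101".
    worker_ip_start = "192.168.100.10"
    (1..3).each do |i|
      puts "#{worker_ip_start}#{i}"   # => 192.168.100.101, ...102, ...103
    end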

siddharthist (Contributor) commented Apr 20, 2016

I got this error on vagrant up:

TASK: [kubernetes-addons | create/update skydns replication controller] ******* 
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/skydns-rc.yaml) command (rc=1): Error from server: error when creating "/etc/kubernetes/manifests/skydns-rc.yaml": namespaces "kube-system" not found


FATAL: all hosts have already failed -- aborting
andreimc (Contributor) commented Apr 20, 2016

I get this one sometimes; I haven't seen the one you got, @siddharthist:

TASK: [kubernetes-addons | create/update elasticsearch service] ***************
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/es-svc.yaml) command (rc=1): Error from server: error when creating "/etc/kubernetes/manifests/es-svc.yaml": Internal error occurred: failed to allocate a serviceIP: cannot allocate resources of type serviceipallocations at this time

andreimc (Contributor) commented Apr 21, 2016

Everything works for me apart from Kube UI and nginx-consul starting on kubeworker, ref #1346:

PLAY RECAP ********************************************************************
mesos | wait for zookeeper service to be registered -------------------- 38.28s
common | install system utilities -------------------------------------- 37.09s
consul | wait for leader ----------------------------------------------- 30.54s
kubernetes | pull hyperkube docker image ------------------------------- 24.81s
kubernetes-master | wait for apiserver to come up ---------------------- 16.20s
etcd | restart skydns -------------------------------------------------- 14.56s
kubernetes-master | download kubernetes binaries ----------------------- 14.36s
mantlui | ensure nginx-mantlui docker image is present ----------------- 12.17s
kubernetes-node | download kubernetes binaries ------------------------- 12.07s
zookeeper | install zookeepercli package ------------------------------- 10.47s
control-01                 : ok=271  changed=206  unreachable=0    failed=0
edge-001                   : ok=122  changed=91   unreachable=0    failed=0
kubeworker-001             : ok=152  changed=106  unreachable=0    failed=0
localhost                  : ok=0    changed=0    unreachable=0    failed=0
worker-001                 : ok=123  changed=88   unreachable=0    failed=0

➜  mantl (master) ✔ vagrant ssh kubeworker-001
No vagrant-config.yml found, using defaults
Last login: Wed Apr 20 23:54:41 2016 from control-01
[vagrant@kubeworker-001 ~]$
siddharthist (Contributor) commented Apr 21, 2016

I got another error this time:

TASK: [kubernetes-addons | create/update grafana service] ********************* 
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/grafana-service.yaml) command (rc=1): Error from server: error when creating "/etc/kubernetes/manifests/grafana-service.yaml": Internal error occurred: failed to allocate a serviceIP: cannot allocate resources of type serviceipallocations at this time


FATAL: all hosts have already failed -- aborting
andreimc (Contributor) commented Apr 22, 2016

@siddharthist can you give me your machine specs: OS, Ansible version, etc.?

siddharthist (Contributor) commented Apr 22, 2016

@andreimc I have Vagrant 1.8.1 and Oracle VM VirtualBox Manager 5.0.16_OSE. My host's version of Ansible doesn't/shouldn't affect anything; the VMs are provisioned from the control node.

andreimc (Contributor) commented Apr 22, 2016

@siddharthist I really don't know why it fails; a co-worker and I both tried to spin it up and it worked fine. Vagrantfile updates should not really cause Ansible to fail... maybe get someone else to try it.

ryane modified the milestone: 1.1 (Apr 22, 2016)

ryane (Contributor) commented Apr 26, 2016

I'm also having a lot of trouble getting kubernetes to run on Vagrant. Provisioning fails intermittently with various errors. Here are a couple I have seen repeatedly:

TASK: [kubernetes-addons | create or update dashboard] ************************
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/kubernetes-dashboard.yaml) command (rc=1): You have exposed your service on an external port on all nodes in your
cluster.  If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:30000) to serve traffic.

See http://releases.k8s.io/release-1.2/docs/user-guide/services-firewalls.md for more details.
service "kubernetes-dashboard" created
TASK: [kubernetes-addons | create or update dashboard] ************************
failed: [control-01] => {"failed": true}
msg: error running kubectl (/bin/kubectl --server=http://localhost:8085/ --namespace=kube-system create --filename=/etc/kubernetes/manifests/kubernetes-dashboard.yaml) command (rc=1): replicationcontroller "kubernetes-dashboard" created

Repeated provisioning attempts might ultimately complete, but I am still seeing various problems with Kubernetes:

  1. UI not accessible

  2. No nodes registered

    $ kubectl get nodes
    
    # no results
    
  3. Errors running kubectl

    $ kubectl get po
    Error from server: an error on the server has prevented the request from succeeding
    

@BrianHicks @Zogg any ideas on this?

SillyMoo commented Apr 26, 2016

I get the same issue, but if I re-run with a 'vagrant provision' it all springs to life. Looks like a timing issue to me (I know that the Ansible scripts wait for hyperkube to be pulled, but do they wait for it to actually be up and listening?).

SillyMoo commented Apr 26, 2016

OK, I tell a bit of a lie. Ansible finishes OK, hyperkube is running, and I see a node in kubectl. However, I can't actually get Kubernetes to pull any images (the pod just sits there, no image pull events, and no sign on the kubeworker that any images are being pulled).

andreimc force-pushed the andreimc:master branch from 0b7445b to b091ae5 (Apr 28, 2016)

andreimc (Contributor) commented Apr 28, 2016

With the latest master merged in, it fails to restart skydns; not sure what would be causing it. It just hangs for a while, then I get the following error message:

NOTIFIED: [dnsmasq | restart dnsmasq] *****************************************
changed: [control-01]
changed: [kubeworker-001]
changed: [edge-001]

PLAY [role=worker] ************************************************************

TASK: [mesos | install mesos packages] ****************************************
FATAL: no hosts matched or all hosts have already failed -- aborting

Not sure why.

stevendborrelli (Contributor) commented May 3, 2016

Docker fails on this:

TASK: [docker | enable docker] ************************************************ 
failed: [control-01] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

But this is due to the new docker implementation not creating a /etc/sysconfig/mantl-storage file on non-lvm based systems and not a problem with this PR.

andreimc (Contributor) commented May 4, 2016

The problems with this will be fixed after #1409 and #1410 are merged in.

siddharthist (Contributor) commented May 4, 2016

@andreijs @andreimc Can you rebase this? Both those PRs have been merged.

andreimc (Contributor) commented May 4, 2016

@siddharthist up to date.

ryane (Contributor) commented May 4, 2016

Had a successful build but I am back to

Internal Server Error (500)

Get https://10.254.0.1:443/api/v1/replicationcontrollers: dial tcp 10.254.0.1:443: getsockopt: connection refused

when trying to access the Kubernetes UI. 10.254.0.1 is the cluster ip for the kubernetes service.

kubectl get svc --namespace=default
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.254.0.1   <none>        443/TCP   1h

Logs from the kubernetes-dashboard pod's container:

2016/05/04 12:50:26 Incoming HTTP/1.0 GET /api/v1/replicationcontrollers request from 10.10.99.1:33787
2016/05/04 12:50:26 Getting list of all replication controllers in the cluster
2016/05/04 12:50:26 Get https://10.254.0.1:443/api/v1/replicationcontrollers: dial tcp 10.254.0.1:443: getsockopt: connection refused
2016/05/04 12:50:26 Outcoming response to 10.10.99.1:33787 with 500 status code
andreimc (Contributor) commented May 7, 2016

Hey guys, I had some time to spin this up in Vagrant. I still get a 502 for the Kube UI :( Ansible ran OK, though.

andreimc (Contributor) commented May 7, 2016

I get the following on control-01 when I try to list pods:

[vagrant@control-01 ~]$ kubectl --namespace kube-system get pods
Error from server: an error on the server has prevented the request from succeeding

ryane modified the milestones: 1.2, 1.1 (May 12, 2016)

manishrajkarnikar commented May 14, 2016

@andreimc @siddharthist curious: how is the file groups_var/all/kubernetes_vars.yml read in a Vagrant run? Or is it required at all?

Zogg (Contributor) commented May 16, 2016

@manishrajkarnikar groups_var/all/kubernetes_vars.yml usage in a local Ansible run shouldn't be different from the remote case.
As far as I remember, yes, kubernetes_vars.yml was mandatory to have the Kubernetes roles play nice.

manishrajkarnikar commented May 16, 2016

@Zogg I don't see it being mentioned in the Vagrantfile. I added it to my Vagrantfile as a raw parameter and was able to get a K8s multi-node cluster up and running. I couldn't get a single master and slave node going, though, probably because of the bug reported in the Vagrantfile.
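
For reference, a minimal sketch of that kind of workaround, assuming the vars file is passed to Vagrant's Ansible provisioner through raw_arguments; the playbook name is an assumption and the vars path is the one mentioned above:

    # Sketch only: explicitly feed kubernetes_vars.yml to the Ansible
    # provisioner. Playbook name and vars path are assumptions, not the
    # repository's actual layout.
    Vagrant.configure("2") do |config|
      config.vm.provision "ansible" do |ansible|
        ansible.playbook      = "sample.yml"
        ansible.raw_arguments = ["--extra-vars", "@groups_var/all/kubernetes_vars.yml"]
      end
    end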

ryane and others added some commits May 31, 2016

andreimc force-pushed the andreimc:master branch from 7543b2f to f877500 (Jun 12, 2016)

andreijs (Contributor) commented Jun 12, 2016

Opening a new PR from a branch; closing this.
