vagrant origin create registry fails #391

Closed
dlbewley opened this Issue Jul 24, 2015 · 8 comments

dlbewley commented Jul 24, 2015

What should the first steps be after vagrant provision succeeds? There is no pointer in README_vagrant.md.

Based on these docs I'm attempting to deploy a docker registry.

  • Permissions are restrictive:
[vagrant@ose3-master ~]$ ls -la /etc/openshift/master
total 164
drwxr-xr-x. 2 root root  4096 Jul 24 19:49 .
drwxr-xr-x. 4 root root    43 Jul 24 19:50 ..
-rw-r--r--. 1 root root  1115 Jul 24 19:48 admin.crt
-rw-------. 1 root root  1679 Jul 24 19:48 admin.key
-rw-------. 1 root root  5716 Jul 24 19:48 admin.kubeconfig
-rw-r--r--. 1 root root  1066 Jul 24 19:48 ca.crt
-rw-------. 1 root root  1675 Jul 24 19:48 ca.key
-rw-r--r--. 1 root root     1 Jul 24 19:50 ca.serial.txt
-rw-r--r--. 1 root root  2201 Jul 24 19:48 etcd.server.crt
-rw-------. 1 root root  1679 Jul 24 19:48 etcd.server.key
-rw-r--r--. 1 root root  3431 Jul 24 19:49 master-config.yaml
-rw-r--r--. 1 root root  1070 Jul 24 19:48 master.etcd-client.crt
-rw-------. 1 root root  1675 Jul 24 19:48 master.etcd-client.key
-rw-r--r--. 1 root root  1070 Jul 24 19:48 master.kubelet-client.crt
-rw-------. 1 root root  1679 Jul 24 19:48 master.kubelet-client.key
-rw-r--r--. 1 root root  2201 Jul 24 19:48 master.server.crt
-rw-------. 1 root root  1679 Jul 24 19:48 master.server.key
-rw-r--r--. 1 root root  1119 Jul 24 19:48 openshift-master.crt
-rw-------. 1 root root  1675 Jul 24 19:48 openshift-master.key
-rw-------. 1 root root  5760 Jul 24 19:48 openshift-master.kubeconfig
-rw-r--r--. 1 root root  1127 Jul 24 19:48 openshift-registry.crt
-rw-------. 1 root root  1675 Jul 24 19:48 openshift-registry.key
-rw-------. 1 root root  5780 Jul 24 19:48 openshift-registry.kubeconfig
-rw-r--r--. 1 root root  1119 Jul 24 19:48 openshift-router.crt
-rw-------. 1 root root  1679 Jul 24 19:48 openshift-router.key
-rw-------. 1 root root  5764 Jul 24 19:48 openshift-router.kubeconfig
-rw-r--r--. 1 root root 35495 Jul 24 19:49 policy.json
-rw-r--r--. 1 root root   459 Jul 24 19:49 scheduler.json
-rw-------. 1 root root  1675 Jul 24 19:48 serviceaccounts.private.key
-rw-------. 1 root root   459 Jul 24 19:48 serviceaccounts.public.key
  • Fix: relax the permissions. Should this be done in the playbook?
[vagrant@ose3-master ~]$ export KUBECONFIG=/etc/openshift/master/admin.kubeconfig
[vagrant@ose3-master ~]$ export CREDENTIALS=/etc/openshift/master/openshift-registry.kubeconfig
[vagrant@ose3-master ~]$ sudo chmod +r $KUBECONFIG $CREDENTIALS
[vagrant@ose3-master ~]$ ls -l  $KUBECONFIG $CREDENTIALS
-rw-r--r--. 1 root root 5716 Jul 24 19:48 /etc/openshift/master/admin.kubeconfig
-rw-r--r--. 1 root root 5780 Jul 24 19:48 /etc/openshift/master/openshift-registry.kubeconfig
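
As an aside, a less invasive alternative (assuming sudo is acceptable here) would be to leave the permissions alone and run the admin commands as root, pointing them at the files directly:

[vagrant@ose3-master ~]$ sudo oadm registry --create \
    --credentials=/etc/openshift/master/openshift-registry.kubeconfig \
    --config=/etc/openshift/master/admin.kubeconfig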

Attempt to create registry

[vagrant@ose3-master ~]$ oadm registry --create --credentials=$CREDENTIALS --config=$KUBECONFIG
deploymentconfigs/docker-registry
services/docker-registry
[vagrant@ose3-master ~]$ oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   0/1       Pending   0          9s
  • Failure
[vagrant@ose3-master ~]$ oc get pods
NAME                       READY     STATUS         RESTARTS   AGE
docker-registry-1-deploy   0/1       ExitCode:255   0          59s
[vagrant@ose3-master ~]$ oc logs docker-registry-1-deploy
F0724 20:24:14.746997       1 deployer.go:64] couldn't get deployment default/docker-registry-1: Get https://ose3-master.example.com:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1: dial tcp: lookup ose3-master.example.com: no such host
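
For what it's worth, a quick sanity check of name resolution from the node that ran the pod. Note that getent goes through the libc resolver (including /etc/hosts, which vagrant-hostmanager manages on the VMs), while the deployer binary in the container only sees the container's own resolv.conf, so my guess is that a lookup can succeed on the node yet still fail inside the container:

[vagrant@ose3-node1 ~]$ getent hosts ose3-master.example.com
[vagrant@ose3-node1 ~]$ cat /etc/resolv.conf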

Nodes Status

I'm not sure why there is an apparent DNS failure, because all 3 machines can reach each other.

  • master is reachable
[vagrant@ose3-node1 ~]$ curl -k https://ose3-master.example.com:8443/healthz
ok
[vagrant@ose3-node2 ~]$ curl -k https://ose3-master.example.com:8443/healthz
ok
  • docker status
[vagrant@ose3-node1 ~]$ sudo docker images
REPOSITORY                            TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
docker.io/openshift/origin-deployer   v1.0.3              eb7436dcd694        4 days ago          413.1 MB
docker.io/openshift/origin-pod        v1.0.3              0809cb1f232c        4 days ago          1.105 MB

[vagrant@ose3-node1 ~]$ sudo docker ps -a
CONTAINER ID        IMAGE                              COMMAND                CREATED             STATUS                        PORTS               NAMES
1c7e4304dd10        openshift/origin-deployer:v1.0.3   "/usr/bin/openshift-   11 minutes ago      Exited (255) 11 minutes ago                       k8s_deployment.c9387ef1_docker-registry-1-deploy_default_d5e6b23a-3241-11e5-9061-080027893417_effb7c43
0ae2ba517215        openshift/origin-pod:v1.0.3        "/pod"                 12 minutes ago      Exited (0) 11 minutes ago                         k8s_POD.d324c42e_docker-registry-1-deploy_default_d5e6b23a-3241-11e5-9061-080027893417_55bf5e84

[vagrant@ose3-node1 ~]$ sudo docker logs 1c7e4304dd10
F0724 20:24:14.746997       1 deployer.go:64] couldn't get deployment default/docker-registry-1: Get https://ose3-master.example.com:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1: dial tcp: lookup ose3-master.example.com: no such host

[vagrant@ose3-node1 ~]$ sudo docker logs 0ae2ba517215
dlbewley commented Jul 24, 2015

To answer my own question about next steps after the playbook, this seems to be a pretty good place to go: https://docs.openshift.com/enterprise/3.0/admin_guide/install/docker_registry.html

dlbewley commented Jul 26, 2015

Enable DNS Server on Vagrant Host

I added vagrant-landrush after seeing it was simpler to use than vagrant-dnsmasq.

diff --git a/Vagrantfile b/Vagrantfile
index a832ae8..bfa13ac 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -11,6 +11,8 @@ Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
   deployment_type = ENV['OPENSHIFT_DEPLOYMENT_TYPE'] || 'origin'
   num_nodes = (ENV['OPENSHIFT_NUM_NODES'] || 2).to_i

+  config.landrush.enabled = true
+  config.landrush.tld = 'example.com'
   config.hostmanager.enabled = true
   config.hostmanager.manage_host = true
   config.hostmanager.include_offline = true
@@ -39,6 +41,7 @@ Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
     config.vm.define "node#{node_index}" do |node|
       node.vm.hostname = "ose3-node#{node_index}.example.com"
       node.vm.network :private_network, ip: "192.168.100.#{200 + n}"
+      node.landrush.host_ip_address =  "192.168.100.#{200 + n}"
       config.vm.provision "shell", inline: "nmcli connection reload; systemctl restart network.service"
     end
   end
@@ -47,6 +50,7 @@ Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
     master.vm.hostname = "ose3-master.example.com"
     master.vm.network :private_network, ip: "192.168.100.100"
     master.vm.network :forwarded_port, guest: 8443, host: 8443
+    master.landrush.host_ip_address = "192.168.100.100"
     config.vm.provision "shell", inline: "nmcli connection reload; systemctl restart network.service"
     master.vm.provision "ansible" do |ansible|
       ansible.limit = 'all'
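
For anyone reproducing this, the plugin itself has to be installed on the host first (standard Vagrant plugin install, not visible in the diff above):

$ vagrant plugin install landrush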

This creates a DNS server on the host and tells the nodes to use it:

$ cat /etc/resolver/example.com
# Generated by landrush, a vagrant plugin
nameserver 127.0.0.1
port 10053
$ vagrant landrush ls
ose3-node1.example.com          192.168.100.200
ose3-node2.example.com          192.168.100.201
ose3-master.example.com         192.168.100.100
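
The landrush server can also be queried directly from the host to confirm it answers (it listens on port 10053, per the resolver file above):

$ dig -p 10053 @127.0.0.1 ose3-master.example.com +short   # should return 192.168.100.100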

[vagrant@ose3-node1 ~]$ cat /etc/resolv.conf
# Generated by NetworkManager
search home.bewley.net example.com
nameserver 10.0.2.3

DNS Resolution In the Nodes

  • Now host $(hostname) and dig work on the VMs:
[vagrant@ose3-master ~]$ sudo yum -y install bind-utils
[vagrant@ose3-master ~]$ grep nameserver /etc/resolv.conf
nameserver 10.0.2.3
[vagrant@ose3-master ~]$ dig ose3-node1.example.com @10.0.2.3 +short
192.168.100.200
[vagrant@ose3-master ~]$ dig ose3-node2.example.com @10.0.2.3 +short
192.168.100.201
[vagrant@ose3-master ~]$ host $(hostname)
ose3-master.example.com has address 192.168.100.100
Host ose3-master.example.com not found: 3(NXDOMAIN)
Host ose3-master.example.com not found: 3(NXDOMAIN)

[vagrant@ose3-node2 ~]$ dig ose3-master.example.com @10.0.2.3 +short
192.168.100.100
[vagrant@ose3-node2 ~]$ host $(hostname)
ose3-node2.example.com has address 192.168.100.201
Host ose3-node2.example.com not found: 3(NXDOMAIN)
Host ose3-node2.example.com not found: 3(NXDOMAIN)
  • Registry creation still fails with a DNS error.
[vagrant@ose3-master ~]$ oc get pods
NAME                       READY     STATUS         RESTARTS   AGE
docker-registry-1-deploy   0/1       ExitCode:255   0          4m
[vagrant@ose3-master ~]$ oc logs docker-registry-1-deploy
F0726 02:41:00.845168       1 deployer.go:64] couldn't get deployment default/docker-registry-1: Get https://ose3-master.example.com:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1: dial tcp: lookup ose3-master.example.com: no such host
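
A couple of generic oc commands that might surface more detail on the failed deployment (nothing specific to this setup):

[vagrant@ose3-master ~]$ oc describe pod docker-registry-1-deploy
[vagrant@ose3-master ~]$ oc get events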

DNS Resolution In the Container

Looking closer at the failed container and its DNS settings:

[vagrant@ose3-node2 ~]$ sudo docker ps -a
CONTAINER ID        IMAGE                              COMMAND                CREATED             STATUS                           PORTS               NAMES
cbb44b91a61c        openshift/origin-deployer:v1.0.3   "/usr/bin/openshift-   About an hour ago   Exited (255) About an hour ago                       k8s_deployment.25e680b6_docker-registry-1-deploy_default_382a8468-333f-11e5-b4c7-080027893417_da26b35b
dff43b80c38c        openshift/origin-pod:v1.0.3        "/pod"                 About an hour ago   Exited (0) About an hour ago                         k8s_POD.d324c42e_docker-registry-1-deploy_default_382a8468-333f-11e5-b4c7-080027893417_942b60fb

[vagrant@ose3-node2 ~]$ sudo docker inspect --format='{{.HostConfig.Dns}}' cbb44b91a61c
[10.0.2.15 10.0.2.3]

[vagrant@ose3-node2 ~]$ sudo docker inspect --format='{{.HostConfig.DnsSearch}}' cbb44b91a61c
[default.svc.cluster.local svc.cluster.local cluster.local home.bewley.net example.com]

[vagrant@ose3-node2 ~]$ sudo docker inspect --format='{{.ResolvConfPath}}' cbb44b91a61c
/var/lib/docker/containers/dff43b80c38c98c754be52a206b703db8499622a2251d4b10b110e0c4d8e28d2/resolv.conf

[vagrant@ose3-node2 ~]$ sudo cat /var/lib/docker/containers/dff43b80c38c98c754be52a206b703db8499622a2251d4b10b110e0c4d8e28d2/resolv.conf
nameserver 10.0.2.15
nameserver 10.0.2.3
search default.svc.cluster.local svc.cluster.local cluster.local home.bewley.net example.com
options ndots:5

Questions

  • Why is 10.0.2.15 listed? This is the enp0s3 interface on each of the VMs. (I hadn't noticed they were all the same before...)
[vagrant@ose3-node2 ~]$ ip addr ls enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:89:34:17 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 79831sec preferred_lft 79831sec
    inet6 fe80::a00:27ff:fe89:3417/64 scope link
       valid_lft forever preferred_lft forever

[vagrant@ose3-node2 ~]$ dig ose3-master.example.com. @10.0.2.15 +short

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7_1.2 <<>> ose3-master.example.com. @10.0.2.15 +short
;; global options: +cmd
;; connection timed out; no servers could be reached
  • How does the resolv.conf get defined for the container?
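
On that last question, my understanding (not verified against this exact build) is that the kubelet composes the container's resolv.conf from the node's own /etc/resolv.conf plus the cluster DNS and search domains it was configured with, which in Origin come from the node config. So the node config seems worth checking; path assumed to be the default for this install:

[vagrant@ose3-node2 ~]$ sudo grep -E 'dns(Domain|IP)' /etc/openshift/node/node-config.yaml

If dnsIP ends up as the node's NAT address (10.0.2.15 here), that would explain the first nameserver entry.
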
gtseamus commented Sep 21, 2015

This is still a major issue. Is there any update on how to fix it? I've been at this for weeks with no solution. The whole point of using things like Vagrant is to be able to replicate the setup for multiple people.

urashidmalik commented Nov 19, 2015

+1

ghost commented Nov 19, 2015

+1

tbielawa commented Nov 15, 2016

We no longer support Vagrant in openshift-ansible. The Vagrantfile was removed in #2654. Please see https://github.com/openshift/openshift-ansible-contrib for Vagrant support.

tbielawa closed this Nov 15, 2016

ianmiell commented Dec 1, 2016

I maintain a Vagrant script for Ansible, documented here:

https://medium.com/@zwischenzugs/a-complete-openshift-cluster-on-vagrant-step-by-step-7465e9816d98#.xeiaw5zct

Code:
https://github.com/ianmiell/shutit-openshift-cluster

There's also a Chef version (on the chef branch).
