Following Tutorial In Readme.md #99

Closed · JulianHBuecher opened this issue Apr 28, 2022 · 20 comments

@JulianHBuecher

Specs

  • Ubuntu 20.04.4 LTS (Focal Fossa)
  • Docker 20.10.14, build a224086
  • Docker-Compose 1.29.2, build 5becea4c
  • kind 0.12.0
  • containerlab 0.25.1
  • kvm ✅

Problem

Hi,

today I tried the tutorial from the README.md. After several cleanups and restarts, I did not get it to work. Every time the metal-core was created, I got the following error:

deploy-partition | TASK [ansible-common/roles/systemd-docker-service : start service metal-core] ***
deploy-partition | changed: [leaf01]
deploy-partition | changed: [leaf02]
deploy-partition | 
deploy-partition | TASK [ansible-common/roles/systemd-docker-service : ensure service is started] ***
deploy-partition | ok: [leaf02]
deploy-partition | ok: [leaf01]
deploy-partition | 
deploy-partition | TASK [metal-roles/partition/roles/metal-core : wait for metal-core to listen on port] ***
deploy-partition | fatal: [leaf01]: FAILED! => changed=false 
deploy-partition |   elapsed: 300
deploy-partition |   msg: metal-core did not come up
deploy-partition | fatal: [leaf02]: FAILED! => changed=false 
deploy-partition |   elapsed: 300
deploy-partition |   msg: metal-core did not come up
deploy-partition | 
deploy-partition | PLAY RECAP *********************************************************************
deploy-partition | leaf01                     : ok=65   changed=47   unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   
deploy-partition | leaf02                     : ok=59   changed=43   unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   
deploy-partition | 
deploy-partition exited with code 2
docker exec vms /mini-lab/manage_vms.py --names machine01,machine02 create
Formatting '/machine01.img', fmt=qcow2 size=5368709120 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/machine02.img', fmt=qcow2 size=5368709120 cluster_size=65536 lazy_refcounts=off refcount_bits=16
QEMU 4.2.1 monitor - type 'help' for more information
(qemu) qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64 -name machine01 -uuid e0ab02d2-27cd-5a5e-8efc-080ba80cf258 -m 2G -boot n -drive if=virtio,format=qcow2,file=/machine01.img -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd -drive if=pflash,format=raw,file=/usr/share/OVMF/OVMF_VARS.fd -serial telnet:127.0.0.1:4000,server,nowait -enable-kvm -nographic -net nic,model=virtio,macaddr=aa:c1:ab:87:4e:82 -net nic,model=virtio,macaddr=aa:c1:ab:c1:29:2c -net tap,fd=30 30<>/dev/tap2 -net tap,fd=40 40<>/dev/tap3 &
qemu-system-x86_64 -name machine02 -uuid 2294c949-88f6-5390-8154-fa53d93a3313 -m 2G -boot n -drive if=virtio,format=qcow2,file=/machine02.img -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd -drive if=pflash,format=raw,file=/usr/share/OVMF/OVMF_VARS.fd -serial telnet:127.0.0.1:4001,server,nowait -enable-kvm -nographic -net nic,model=virtio,macaddr=aa:c1:ab:90:3a:db -net nic,model=virtio,macaddr=aa:c1:ab:46:52:e4 -net tap,fd=50 50<>/dev/tap4 -net tap,fd=60 60<>/dev/tap5 &
QEMU 4.2.1 monitor - type 'help' for more information
(qemu) qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@leaf01 -i files/ssh/id_rsa 'systemctl restart metal-core'
Warning: Permanently added 'leaf01,172.17.0.4' (ECDSA) to the list of known hosts.
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@leaf02 -i files/ssh/id_rsa 'systemctl restart metal-core'
Warning: Permanently added 'leaf02,172.17.0.3' (ECDSA) to the list of known hosts.

The error tells me that the host does not support a requested feature. I have found similar issues with other virtualization software such as Podman (see containers/podman#11479).

Is there something I missed during the configuration of my machine or software?
Hopefully you can help me out here.
Best regards, Julian

@Gerrit91 (Contributor)

Hey Julian,

when the metal-core does not come up, it is likely that it cannot reach the metal-api in the kind cluster. You can enter the leaf switch using make ssh-leaf01 and check the logs using journalctl -lu metal-core, which should show you an error.
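
A minimal sketch of those two steps, run from the mini-lab checkout:

❯ make ssh-leaf01
root@mini-lab-leaf01:mgmt-vrf:~# journalctl -lu metal-core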

Just a small assumption: I can imagine that name resolution of 0.0.0.0.nip.io sometimes does not work well on some distros. Personally, I add the following line to my host's local /etc/hosts file to make name resolution more stable:

127.0.0.1	api.0.0.0.0.nip.io rethinkdb.0.0.0.0.nip.io
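
If you prefer a one-liner, appending the entry from a shell works as well (a sketch; adjust to taste):

❯ echo '127.0.0.1 api.0.0.0.0.nip.io rethinkdb.0.0.0.0.nip.io' | sudo tee -a /etc/hosts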

@Gerrit91 (Contributor)

The host doesn't support requested feature error should be irrelevant. It also happens on our CI runners (not sure if the logs are visible to externals, though): https://github.com/metal-stack/mini-lab/runs/6094031813?check_suite_focus=true#step:6:739

@majst01 (Contributor) commented Apr 28, 2022

Under some circumstances systemd-resolved does not resolve *.nip.io domains. Ubuntu uses systemd-resolved by default, and this can be worked around by adding the entries @Gerrit91 mentioned to the hosts file.
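
One way to check whether systemd-resolved is the culprit (a sketch, assuming a stock Ubuntu setup with dig installed):

❯ resolvectl query api.0.0.0.0.nip.io      # asks systemd-resolved's stub resolver
❯ dig @1.1.1.1 api.0.0.0.0.nip.io +short   # bypasses it and asks an external DNS server

If the second command resolves but the first one fails, systemd-resolved is dropping the nip.io lookups.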

@JulianHBuecher (Author)

Hi @Gerrit91 and @majst01,
thank you very much for your quick responses.
I have adjusted my hosts file and gave it another run... unfortunately, the metal-core still does not come up...
These are the logs from leaf01.
Hopefully they are of use to you:

Apr 28 20:24:49 mini-lab-leaf01 docker[15650]: {"level":"debug","timestamp":"2022-04-28T22:24:49+02:00","caller":"api/registerSwitch.go:72","msg":"skip interface, because only swp* switch ports are reported to metal-api","interface":"docker0","MAC":"02:42:78:01:87:47"}
Apr 28 20:24:49 mini-lab-leaf01 docker[15650]: {"level":"debug","timestamp":"2022-04-28T22:24:49+02:00","caller":"api/registerSwitch.go:64","msg":"skip interface, because it is contained in the blacklist","interface":"vniInternet","blacklist":["vniInternet"]}
Apr 28 20:24:49 mini-lab-leaf01 docker[15650]: {"level":"debug","timestamp":"2022-04-28T22:24:49+02:00","caller":"api/registerSwitch.go:72","msg":"skip interface, because only swp* switch ports are reported to metal-api","interface":"vlanInternet","MAC":"9a:3a:27:c3:b0:7c"}
Apr 28 20:24:49 mini-lab-leaf01 docker[15650]: {"level":"debug","timestamp":"2022-04-28T22:24:49+02:00","caller":"api/registerSwitch.go:72","msg":"skip interface, because only swp* switch ports are reported to metal-api","interface":"vrfInternet","MAC":"06:d5:27:be:6d:52"}

...
Apr 28 20:10:29 mini-lab-leaf01 docker[15616]: {"level":"error","timestamp":"2022-04-28T22:10:29+02:00","caller":"bus/eventbus.go:401","msg":"  1 [mini-lab-machine/core] error querying nsqlookupd (http://0.0.0.0.nip.io:4161/lookup?topic=mini-lab-machine) - Get \"http://0.0.0.0.nip.io:4161/lookup?topic=mini-lab-machine\": dial tcp 172.17.0.1:4161: i/o timeout","stacktrace":"github.com/metal-stack/metal-lib/bus.bridgeNsqLogToCoreLog\n\t/go/pkg/mod/github.com/metal-stack/metal-lib@v0.9.0/bus/eventbus.go:401\ngithub.com/metal-stack/metal-lib/bus.(*ConsumerRegistration).Output\n\t/go/pkg/mod/github.com/metal-stack/metal-lib@v0.9.0/bus/eventbus.go:157\ngithub.com/nsqio/go-nsq.(*Consumer).log\n\t/go/pkg/mod/github.com/nsqio/go-nsq@v1.0.8/consumer.go:1169\ngithub.com/nsqio/go-nsq.(*Consumer).queryLookupd\n\t/go/pkg/mod/github.com/nsqio/go-nsq@v1.0.8/consumer.go:474\ngithub.com/nsqio/go-nsq.(*Consumer).lookupdLoop\n\t/go/pkg/mod/github.com/nsqio/go-nsq@v1.0.8/consumer.go:397"}

Apr 28 20:10:31 mini-lab-leaf01 docker[15616]: {"level":"error","timestamp":"2022-04-28T22:10:31+02:00","caller":"bus/eventbus.go:401","msg":"  1 [mini-lab-machine/core] error querying nsqlookupd (http://0.0.0.0.nip.io:4161/lookup?topic=mini-lab-machine) - Get \"http://0.0.0.0.nip.io:4161/lookup?topic=mini-lab-machine\": dial tcp 172.17.0.1:4161: i/o timeout","stacktrace":"github.com/metal-stack/metal-lib/bus.bridgeNsqLogToCoreLog\n\t/go/pkg/mod/github.com/metal-stack/metal-lib@v0.9.0/bus/eventbus.go:401\ngithub.com/metal-stack/metal-lib/bus.(*ConsumerRegistration).Output\n\t/go/pkg/mod/github.com/metal-stack/metal-lib@v0.9.0/bus/eventbus.go:157\ngithub.com/nsqio/go-nsq.(*Consumer).log\n\t/go/pkg/mod/github.com/nsqio/go-nsq@v1.0.8/consumer.go:1169\ngithub.com/nsqio/go-nsq.(*Consumer).queryLookupd\n\t/go/pkg/mod/github.com/nsqio/go-nsq@v1.0.8/consumer.go:474\ngithub.com/nsqio/go-nsq.(*Consumer).lookupdLoop\n\t/go/pkg/mod/github.com/nsqio/go-nsq@v1.0.8/consumer.go:397"}

Is there another network interface I have to configure to solve this problem?

@GrigoriyMikhalkin (Contributor)

@JulianHBuecher
So you added the line 127.0.0.1 api.0.0.0.0.nip.io to /etc/hosts and you still see this error, correct?
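
A quick way to confirm that the entry is actually being picked up (a sketch; getent consults /etc/hosts before DNS in the default nsswitch order):

❯ getent hosts api.0.0.0.0.nip.io
127.0.0.1       api.0.0.0.0.nip.io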

@JulianHBuecher (Author) commented Apr 28, 2022

Hi @GrigoriyMikhalkin,
I added it to the file and restarted systemd-resolved to make the changes effective.
But I still got the same error.

@Gerrit91 (Contributor)

dial tcp 172.17.0.1:4161: i/o timeout indicates that your leaf docker containers (the ones running the grigoriymikh/sandbox docker image) did not start in the default bridge docker network. Can you please look up which network the containers are running in?

On my machine it looks like:

❯ docker inspect e9b2864b9190
...
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "7bff4afa1f1e81ce37105b2c281395686665a2d1ae835e805d55e8df48424ded",
                    "EndpointID": "9b00ac866185ed062d0be0296c9a329b4fc82ca617d66b9020783c173585800f",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
...

The gateway address has to be 172.17.0.1 so that they can communicate with the kind cluster (this is a known limitation of the lab and is briefly mentioned in the requirements section).
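
A more targeted variant of the same check (a sketch; the filter and the Go template are assumptions, adjust as needed):

❯ docker ps --filter ancestor=grigoriymikh/sandbox:latest -q \
    | xargs docker inspect -f '{{.Name}}: {{range $net, $cfg := .NetworkSettings.Networks}}{{$net}} (gateway {{$cfg.Gateway}}){{end}}'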

@JulianHBuecher (Author) commented Apr 29, 2022

Hi Gerrit,
I looked up the network settings for both leaves and found the following:

> docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED          STATUS          PORTS                                                                                                                                              NAMES
4b6f85fc789e   grigoriymikh/sandbox:latest               "/usr/local/bin/igni…"   45 minutes ago   Up 45 minutes                                                                                                                                                      ignite-8257a712b51fdfb0
c75361e69f71   ghcr.io/metal-stack/mini-lab-vms:latest   "/mini-lab/vms_entry…"   45 minutes ago   Up 45 minutes                                                                                                                                                      vms
43b64ce7a0d5   grigoriymikh/sandbox:latest               "/usr/local/bin/igni…"   45 minutes ago   Up 45 minutes                                                                                                                                                      ignite-e45d2a58f4f8edd8
...

The first leaf:

> docker inspect 4b6f85fc789e
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "79797117b7911a8c7d2d9b34e5ab0dca082cea607e37d0f5a19c03a34f796466",
                    "EndpointID": "9636d45f7dc1af1516f54555413053fb4b6205f4eb7f519b4967521daa30761e",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.4",
                    "IPPrefixLen": 16,
...
                }
            }

And the second:

> docker inspect 43b64ce7a0d5
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "79797117b7911a8c7d2d9b34e5ab0dca082cea607e37d0f5a19c03a34f796466",
                    "EndpointID": "856f028b60b4cc236bb0d0863233a896990d534bd2a22f83e9243a8e3e0baf34",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
...
                }
            }

After checking the network list from docker, I assume they both run in the same network:

> docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
79797117b791   bridge    bridge    local
43763f527aec   host      host      local
827bb2ff0a65   kind      bridge    local
8386d1eb5469   none      null      local

EDIT:
Additionally, I tried to ping 172.18.0.2 (the IP of the metal-api) in the kind cluster from leaf01 (make ssh-leaf01), and it is not reachable. Possibly this is the problem.

@Gerrit91 (Contributor) commented May 2, 2022

Thanks for looking it up. Looks good to me. I guess we'll need to inspect the other end of the line, then.

Are there any other error logs during the provisioning with Ansible? You can use eval $(make dev-env) to point your kubectl at the kind cluster. Are all pods running? Can you reach the metal-api in the kind cluster from your host machine with curl?

❯ k get po -A
NAMESPACE             NAME                                                        READY   STATUS      RESTARTS       AGE
ingress-nginx         ingress-nginx-controller-84589bbb6b-j989p                   1/1     Running     0              2m56s
kube-system           coredns-64897985d-2c7cf                                     1/1     Running     0              3m7s
kube-system           coredns-64897985d-gs9vj                                     1/1     Running     0              3m7s
kube-system           etcd-metal-control-plane-control-plane                      1/1     Running     0              3m21s
kube-system           kindnet-qtkmr                                               1/1     Running     0              3m7s
kube-system           kube-apiserver-metal-control-plane-control-plane            1/1     Running     0              3m23s
kube-system           kube-controller-manager-metal-control-plane-control-plane   1/1     Running     0              3m23s
kube-system           kube-proxy-tcn9h                                            1/1     Running     0              3m7s
kube-system           kube-scheduler-metal-control-plane-control-plane            1/1     Running     0              3m21s
local-path-storage    local-path-provisioner-5ddd94ff66-94tkb                     1/1     Running     0              3m7s
metal-control-plane   ipam-db-0                                                   2/2     Running     1 (2m4s ago)   2m28s
metal-control-plane   masterdata-api-65d875cc48-gms6v                             1/1     Running     0              90s
metal-control-plane   masterdata-db-0                                             2/2     Running     1 (111s ago)   2m27s
metal-control-plane   metal-api-6fb848c8b4-6kngj                                  1/1     Running     0              90s
metal-control-plane   metal-api-create-masterdata-t62zd                           0/1     Completed   0              78s
metal-control-plane   metal-api-initdb-ftdlm                                      0/1     Completed   0              2m24s
metal-control-plane   metal-api-liveliness-27524554-l6klz                         0/1     Completed   0              22s
metal-control-plane   metal-api-migrate-db-rzmnd                                  0/1     Completed   0              80s
metal-control-plane   metal-db-0                                                  2/2     Running     1 (116s ago)   2m29s
metal-control-plane   nsq-lookupd-5bffdc656f-ktb7p                                1/1     Running     0              2m31s
metal-control-plane   nsqd-0                                                      2/2     Running     0              2m31s
❯ curl http://api.0.0.0.0.nip.io:8080/metal/v1/health
{
 "status": "healthy",
 "message": "",
 "services": {
  "rethinkdb": {
   "status": "healthy",
   "message": ""
  }
 }
}

@JulianHBuecher (Author)

Hi @Gerrit91,
thank you for your continued help.

So, I checked the cluster and the API. Here are the outputs:

$ kubectl get po -A
NAMESPACE             NAME                                                        READY   STATUS      RESTARTS      AGE
ingress-nginx         ingress-nginx-controller-84589bbb6b-hnnps                   1/1     Running     0             12m
kube-system           coredns-64897985d-9msnb                                     1/1     Running     0             13m
kube-system           coredns-64897985d-phs8x                                     1/1     Running     0             13m
kube-system           etcd-metal-control-plane-control-plane                      1/1     Running     0             13m
kube-system           kindnet-xl46l                                               1/1     Running     0             13m
kube-system           kube-apiserver-metal-control-plane-control-plane            1/1     Running     0             13m
kube-system           kube-controller-manager-metal-control-plane-control-plane   1/1     Running     0             13m
kube-system           kube-proxy-vvld9                                            1/1     Running     0             13m
kube-system           kube-scheduler-metal-control-plane-control-plane            1/1     Running     0             13m
local-path-storage    local-path-provisioner-5ddd94ff66-2cg68                     1/1     Running     0             13m
metal-control-plane   ipam-db-0                                                   2/2     Running     1 (11m ago)   12m
metal-control-plane   masterdata-api-65d875cc48-jjmvm                             1/1     Running     0             10m
metal-control-plane   masterdata-db-0                                             2/2     Running     1 (11m ago)   12m
metal-control-plane   metal-api-6fb848c8b4-ccj5b                                  1/1     Running     0             10m
metal-control-plane   metal-api-create-masterdata-fn7r6                           0/1     Completed   0             10m
metal-control-plane   metal-api-initdb-8g6vt                                      0/1     Completed   0             12m
metal-control-plane   metal-api-liveliness-27525338-922br                         0/1     Completed   0             55s
metal-control-plane   metal-api-migrate-db-mlsdc                                  0/1     Completed   0             10m
metal-control-plane   metal-db-0                                                  2/2     Running     1 (11m ago)   12m
metal-control-plane   nsq-lookupd-5bffdc656f-b9rfp                                1/1     Running     0             12m
metal-control-plane   nsqd-0                                                      2/2     Running     0             12m

... and executing the curl:

$ curl http://api.0.0.0.0.nip.io:8080/metal/v1/health
{
 "status": "healthy",
 "message": "",
 "services": {
  "rethinkdb": {
   "status": "healthy",
   "message": ""
  }
 }
}

... I couldn't believe it.
So I thought about what I could do next... Possibly the ingress has a problem or something like that. Then I found this very interesting line in the log dump:

$ kubectl logs -n ingress-nginx -f ingress-nginx-controller-84589bbb6b-hnnps
W0502 19:28:09.745882      14 controller.go:422] Error getting Service "metal-control-plane/metal-console": no object matching key "metal-control-plane/metal-console" in local store
172.18.0.1 - - [02/May/2022:19:28:16 +0000] "GET /metal/v1/health HTTP/1.1" 200 121 "-" "ansible-httpget" 139 0.001 [metal-control-plane-metal-api-8080] [] 10.244.0.18:8080 121 0.004 200 c13da215c2999d73fe38334171cb94eb
I0502 19:28:28.673237      14 status.go:299] "updating Ingress status" namespace="metal-control-plane" ingress="control-plane-ingress" currentValue=[] newValue=[{IP:10.96.234.61 Hostname: Ports:[]}]
I0502 19:28:28.681374      14 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"metal-control-plane", Name:"control-plane-ingress", UID:"f6c2653b-35c2-449c-b18d-9ac282822556", APIVersion:"networking.k8s.io/v1", ResourceVersion:"1377", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0502 19:28:28.681983      14 controller.go:422] Error getting Service "metal-control-plane/metal-console": no object matching key "metal-control-plane/metal-console" in local store
172.18.0.1 - - [02/May/2022:19:35:09 +0000] "GET /metal/v1/health HTTP/1.1" 200 121 "-" "curl/7.68.0" 102 0.001 [metal-control-plane-metal-api-8080] [] 10.244.0.18:8080 121 0.004 200 1b8aa90ed3d9662f0a61be005a4ee2ec

Is it possibly a problem that the metal-console is not reachable by the ingress? Or is this just the next step in the flow?

@Gerrit91 (Contributor) commented May 3, 2022

Okay, so your control plane also looks fine. We are running a bit out of options, but I am pretty sure we'll find it, so thanks for hanging in. 👍

The metal-console is just an optional component that does not work in the mini-lab because it requires a BMC for accessing a machine's serial console. The service should not be added to the ingress config, but it shouldn't do any harm either. I created a pull request to clean this up: #100.

Can you also reach the metal-api from a leaf switch?

❯ make ssh-leaf01
ssh -o StrictHostKeyChecking=no -i files/ssh/id_rsa root@leaf01
Last login: Tue May  3 06:41:52 2022 from 0.0.0.0.nip.io
root@mini-lab-leaf01:mgmt-vrf:~# curl http://api.0.0.0.0.nip.io:8080/metal/v1/health
{
 "status": "healthy",
 "message": "",
 "services": {
  "rethinkdb": {
   "status": "healthy",
   "message": ""
  }
 }
}

Also the following request (failing endpoint from your metal-core logs) should return a response:

root@mini-lab-leaf01:mgmt-vrf:~# curl http://0.0.0.0.nip.io:4161
{"message":"NOT_FOUND"}

I hope that the docker-compose version does not cause any issues. I am running v2.1.1, where I think they did a complete rewrite from Python to Go. I remember I had to make small changes to this project when docker-compose v2 was introduced.
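
To check which generation is installed (a sketch; v1 is the standalone Python binary, v2 ships as a docker CLI plugin):

❯ docker-compose version   # v1, standalone
❯ docker compose version   # v2, plugin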

@JulianHBuecher (Author) commented May 3, 2022

Hi @Gerrit91,
I have to thank you for your patience and your tireless help. Now we have hit the problem. The leaf cannot resolve the hostname...

root@mini-lab-leaf01:mgmt-vrf:~# curl http://api.0.0.0.0.nip.io:8080/metal/v1/health -v
* Hostname was NOT found in DNS cache
*   Trying 172.17.0.1...
# Same for
root@mini-lab-leaf01:mgmt-vrf:~# curl http://0.0.0.0.nip.io:4161 -v
* Rebuilt URL to: http://0.0.0.0.nip.io:4161/
* Hostname was NOT found in DNS cache
*   Trying 172.17.0.1...

So I updated my Docker-Compose installation to version v2.3.3 and tried it again (updating the files from docker-compose commands to docker compose). Unfortunately, to no effect: the same host resolution error appeared...

@majst01 (Contributor) commented May 4, 2022

Hi Julian,

out of curiosity, do you have a Mac?

@Gerrit91 (Contributor) commented May 4, 2022

It's indeed suspicious that reaching the host system through the Docker gateway does not work, as this is not so uncommon to do on a Linux machine for development purposes. On other operating systems this trick will not work (see here for Docker on Mac).
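
For reference: on Docker Desktop, containers can reach the host via the special name host.docker.internal; on plain Linux, Docker 20.10+ can emulate this with an extra flag (a sketch):

❯ docker run --rm --add-host=host.docker.internal:host-gateway alpine ping -c 1 host.docker.internal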

@JulianHBuecher (Author)

Hi @majst01,
I have a Mac, but it is not the environment I'm running the mini-lab on. For this I have a Dell XPS 15 running Ubuntu in a dual-boot setup, which I connect to via RDP. With other provisioning providers like Tinkerbell in combination with Vagrant, I have not run into any problems so far :D

And hi @Gerrit91, I don't get it either... Are you running the lab on your local machine or inside a VM? Maybe I should try it on a fresh installation inside VirtualBox or so.

@Gerrit91 (Contributor) commented May 4, 2022

Hi, I am running it on my local machine. A VM is probably tough, as it would require nested virtualization, which can quickly make things more complicated than expected.

I still have to think about why you cannot reach your host system through the Docker bridge. It would be interesting to see if it works for you with a minimal example. Something like:

❯ docker run -d --rm -it -p 5000:80 nginx:alpine  
❯ docker run --rm -it alpine wget -O- 172.17.0.1:5000
Connecting to 172.17.0.1:5000 (172.17.0.1:5000)
writing to stdout
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

@JulianHBuecher (Author) commented May 4, 2022

Hi @Gerrit91,
as we would say in Germany, "hier liegt wohl der Hund begraben" (roughly: this seems to be where the problem lies)...

To reach the host at 172.17.0.1, I have to attach the container to the host network:

julian@Julian-XPS-15:~$ docker run --rm -it --network host alpine wget -O- 172.17.0.1:5000
Connecting to 172.17.0.1:5000 (172.17.0.1:5000)
writing to stdout
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and

Otherwise, the command hangs while connecting:

julian@Julian-XPS-15:~$ docker run --rm -it  alpine wget -O- 172.17.0.1:5000
Connecting to 172.17.0.1:5000 (172.17.0.1:5000)
...

The direct call via the bridge network works fine...

julian@Julian-XPS-15:~$ docker run --rm -it alpine wget -O- 172.17.0.2:80
Connecting to 172.17.0.2:80 (172.17.0.2:80)
writing to stdout
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>

Additionally, I did a little research and found an interesting Stack Overflow question about this kind of problem. For reference, see https://stackoverflow.com/questions/31324981/how-to-access-host-port-from-docker-container
Possibly this could explain why the call does not work.

Edit:
Additionally, I tried the same experiment on my Windows machine with Docker Desktop. There, this simple setup works as expected... Possibly there is a problem with my machine here. I cannot explain it... or there are some tricks that Docker Desktop does and the Linux runtime does not.

@Gerrit91 (Contributor) commented May 6, 2022

Running the leaf switches in the host network will not work. There is a lot of networking going on that you definitely do not want on your host system. In the Stack Overflow question you posted, there is one suggestion regarding iptables. Have you tried that already?

@JulianHBuecher (Author)

Hi @Gerrit91,
yesterday I solved my problem... The reason for the broken communication between the containers and the leaves was my firewall settings... While configuring my RDP client, I had enabled ufw and totally forgot about it.
So yesterday I disabled it, and voilà, the mini-lab and your little example worked again...

I have to apologise for all the hassle... next time I'll think twice before reaching out to you again... But now I can test it for my project. Thank you very much for your help, guys. I really appreciate it.
Best regards, Julian
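
(For anyone running into the same thing: instead of disabling ufw entirely, allowing traffic from the Docker bridge subnets should be enough; a minimal sketch, assuming the default subnet ranges seen in this thread:

❯ sudo ufw allow from 172.17.0.0/16   # default docker bridge
❯ sudo ufw allow from 172.18.0.0/16   # kind network
)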

@Gerrit91 (Contributor) commented May 6, 2022

You don't have to apologize. I am really happy that you wanted to try it out and shared the problem. Maybe it helps someone else, too. You are also invited to our metal-stack Slack channel if you have smaller questions.
