
Support for bare-metal workers #113

Closed
ByteAlex opened this issue Nov 8, 2020 · 30 comments
Labels
enhancement (New feature or request), stale

Comments

@ByteAlex

ByteAlex commented Nov 8, 2020

Hello,

is it possible to add servers from the Hetzner Robot to the cluster created with the CCM?

I've been using a K3s cluster that I bootstrapped manually, and when I tried to install the hcloud CCM, the hcloud:// provider did not work for any of the servers, neither the Cloud nor the Robot ones.

Now I've bootstrapped a cluster using kubeadm and followed the instructions, and the hcloud:// provider seems to be working. However, I still have my bare-metal servers, and before I let them join the cluster and possibly break the CCM, I'd rather ask for clarification first.

My expectations would be:

  • Pods requesting a PVC can't be scheduled on a Robot server (see the sketch at the end of this comment for a manual approximation)
  • Robot servers will be added to Load Balancers using their "external IP"

Thank you!
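For reference, a blunt manual approximation of the first expectation, assuming nothing automatic exists yet, would be to taint the Robot nodes and only add a toleration to workloads that do not need hcloud volumes; the taint key below is made up for illustration:

$ kubectl taint node <robot-node> example.com/robot-server=true:NoSchedule   # hypothetical taint key, <robot-node> is a placeholder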

@malikkirchner

The bare metal support would be highly appreciated. A label that causes the CCM to ignore bare-metal nodes would be fine as an intermediate step; that would keep the CCM functional and useful in the meantime.
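For the load-balancer side specifically there is already an upstream well-known exclusion label that the service controller honors; this is only a partial stopgap, it does not help with the node lifecycle, and whether this CCM's load-balancer code respects it would need to be verified:

$ kubectl label node <robot-node> node.kubernetes.io/exclude-from-external-load-balancers=true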

@LKaemmerling
Member

Additional (already closed) issues:
#9
#5

There are a few problems with adding dedicated servers as real "nodes" to the k8s cluster.

  1. Dedicated servers have a completely different API; using a Hetzner Cloud token does not allow fetching data about a root server.
  2. Based on the spec (https://kubernetes.io/docs/concepts/architecture/cloud-controller/#node-controller), k8s deactivates all nodes that are not known to the cloud provider (see the check below).

We will look into how we can improve this, but I cannot promise anything.
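You can see what the node controller matches on by listing each node's providerID; cloud servers get an hcloud:// ID from the CCM, while Robot servers never will (plain kubectl, nothing hcloud-specific):

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID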

@ctodea

ctodea commented Dec 18, 2020

Any update on this?

@batistein
Contributor

@malikkirchner

malikkirchner commented Mar 18, 2021

@ctodea we managed to get a cluster working, where most nodes, including the master, are cloud servers. And some nodes are root servers, e.g. for databases. Basically the root servers should be mostly ignored by the CCM and CSI plugin. Maybe this helps:

You need to connect the root servers via vSwitch, though.

Maybe #172 results in a mainline solution ...

@ctodea

ctodea commented Mar 18, 2021

Many thanks for the update @malikkirchner @batistein
I'll give it a try, but unfortunately I guess it won't be any time soon.

@identw

identw commented Mar 21, 2021

@ctodea we managed to get a cluster working, where most nodes, including the master, are cloud servers. And some nodes are root servers, e.g. for databases. Basically the root servers should be mostly ignored by the CCM and CSI plugin. Maybe this helps:

Hi @malikkirchner
I can see from the code that you are skipping creating routes for root servers because the API doesn't allow it (https://github.com/xelonic/hcloud-cloud-controller-manager/blob/root-server-support/hcloud/routes.go#L104). But I don't understand how pod-to-pod communication between cloud and dedicated nodes works for you.
For example:
10.240.0.2 - cloud node, 10.244.0.0/24 pod network on the cloud node
10.240.1.2 - dedicated node, 10.244.1.0/24 pod network on the dedicated node

But you can't create the route 10.244.1.0/24 via 10.240.1.2 in the API. So how does communication between pods in the 10.244.0.0/24 and 10.244.1.0/24 networks work for you?
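For concreteness, the route that the route controller would normally create, and that the API refuses when the gateway is a Robot/vSwitch address, corresponds to something like the following, assuming the hcloud CLI's network add-route subcommand and the example addresses above:

$ hcloud network add-route <network-name> --destination 10.244.1.0/24 --gateway 10.240.1.2   # rejected when the gateway is a Robot/vSwitch address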

@malikkirchner

Hi @identw,

that is an excellent point, I do not know and was wondering myself. According to #133 (comment) that should never have worked. We are using kubeadm to set up the cluster and Cilium as the CNI plugin. I am happy to share the exact config if you are interested.

I have two guesses as to how this can 'work'. Either the vSwitch does some routing that I do not understand, or Cilium somehow manages to route to the root server. Leakage over the public interface is ruled out by the root server's Hetzner firewall.

It is possible, though, that this is a bug that will be fixed and then stop working, like #133. If so, I was wondering whether it would make sense to use a WireGuard peer-to-peer layer between all nodes, as a kind of unified substrate for Cilium.

Any clarification on this topic is highly appreciated.

@identw

identw commented Mar 22, 2021

@malikkirchner

that is an excellent point, I do not know and was wondering myself

Cilium uses an overlay network between nodes (vxlan or geneve) by default; maybe you haven't disabled it?
Check your cilium configmap. For example:

$ kubectl -n kube-system get cm cilium-config -o yaml | grep "tunnel"
  tunnel: vxlan

This configuration will work either way, even without Hetzner Cloud networks and the vSwitch.
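For contrast, and hedged because the exact keys depend on the Cilium version, a native-routing setup, which is the case where the cloud routes actually matter, would show something like this in the same ConfigMap instead:

$ kubectl -n kube-system get cm cilium-config -o yaml | grep -E "tunnel|native-routing|auto-direct"
  auto-direct-node-routes: "false"
  native-routing-cidr: 10.244.0.0/16
  tunnel: disabled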

I was wondering if it would make sense, to use a layer of wireguard peer-to-peer between all nodes, kinda as a unified substrate for cilium

For Cilium this is not necessary, since it already knows how to build tunnels between nodes and does so by default. If encryption is required, Cilium supports IPsec (https://docs.cilium.io/en/v1.9/gettingstarted/encryption/).

Also, I recommend paying attention to latency when connecting a vSwitch to the cloud network:

ping from cloud node to dedicated node via public ip:

$ ping 135.181.96.131
PING 135.181.96.131 (135.181.96.131) 56(84) bytes of data.
64 bytes from 135.181.96.131: icmp_seq=1 ttl=59 time=0.442 ms
64 bytes from 135.181.96.131: icmp_seq=2 ttl=59 time=0.372 ms
64 bytes from 135.181.96.131: icmp_seq=3 ttl=59 time=0.460 ms
64 bytes from 135.181.96.131: icmp_seq=4 ttl=59 time=0.539 ms

ping from cloud node to same dedicated node via vswitch:

$ ping 10.240.1.2
PING 10.240.1.2 (10.240.1.2) 56(84) bytes of data.
64 bytes from 10.240.1.2: icmp_seq=1 ttl=63 time=47.4 ms
64 bytes from 10.240.1.2: icmp_seq=2 ttl=63 time=47.0 ms
64 bytes from 10.240.1.2: icmp_seq=3 ttl=63 time=46.9 ms
64 bytes from 10.240.1.2: icmp_seq=4 ttl=63 time=46.9 ms

~0.5ms via public network vs ~46.5ms via private network =(.

@malikkirchner

malikkirchner commented Mar 22, 2021

@identw thank you for the hint, you are right: our Cilium uses vxlan as the tunnel. That explains why it works. We deploy Istio on top of Cilium, so I guess there is no real need for Cilium encryption for us at the moment. As I understand it, enabling Cilium encryption also conflicts with some Istio features.

The ping from a cloud server to the dedicated server via vSwitch is not that bad for us:

# ping starfleet-janeway 
PING starfleet-janeway (10.0.1.2) 56(84) bytes of data.
64 bytes from starfleet-janeway (10.0.1.2): icmp_seq=1 ttl=63 time=3.70 ms
64 bytes from starfleet-janeway (10.0.1.2): icmp_seq=2 ttl=63 time=3.57 ms

Our cloud nodes are hosted in nbg1-dc3 and the dedicated server lives in fsn1-dc15. I guess it would be even better if we moved the cloud nodes to Falkenstein.

FYI we encountered a problem with Cilium and systemd in Debian bullseye, buster is fine: cilium/cilium#14658.

@identw

identw commented Mar 22, 2021

@malikkirchner

As I understand enabling the Cilium encryption also conflicts with some features of Istio.

I mentioned encryption because you wrote about WireGuard. Encryption is optional.

The ping from a cloud server to the dedicated server via vSwitch is not that bad for us:

Not so bad. I tested in the hel1 location (dedicated node from hel1-dc4, cloud node from hel1-dc2).

FYI we encountered a problem with Cilium and systemd in Debian bullseye, buster is fine: cilium/cilium#14658.

Thank you, interesting. I actually also use Cilium without kube-proxy, but I have not seen this bug.

@github-actions
Contributor

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

@github-actions github-actions bot added the stale label May 21, 2021
@Bessonov

further action occurs

@LKaemmerling added the enhancement (New feature or request) label and removed the stale label on May 21, 2021
@Donatas-L

I saw that someone made a repo (https://github.com/identw/hetzner-cloud-controller-manager) to solve this; has anyone tried it?

@randrusiak

Any updates here? @LKaemmerling, are you going to implement support for root servers soon?

@hendrikkiedrowski

@Donatas-L I tried it. It works great, with a few caveats. It would need a bit of attention from the community to keep pace with the development by the Hetzner team. @LKaemmerling, you may also want to have a look here; maybe you can take this idea ;)

@github-actions
Contributor

github-actions bot commented Nov 5, 2021

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

@acjohnson

acjohnson commented Nov 16, 2021

I am also interested in using bare-metal workers via vSwitch and have it working with the Calico CNI. Any chance this could be mainlined in the hcloud-cloud-controller-manager?

@wethinkagile

If we want to push the European cloud forward, we need to push the excellent Hetzner to grow beyond itself. That way, many open-source cloud projects and startups with GDPR/DSGVO-compliant ISMSs will be able to get founded in Europe. tl;dr: yes, I'm interested too.

@acjohnson

acjohnson commented Nov 17, 2021

I went ahead and rebased the work that @malikkirchner did against master from this repo and built a new image with a few fixes that seemed to be required to use Hetzner Robot servers via vSwitch/Cloud Networks.

src: https://github.com/acjohnson/hcloud-cloud-controller-manager/tree/root-server-support
image: https://hub.docker.com/r/acjohnson/hcloud-cloud-controller-manager

This seems to work almost perfectly, with only a couple of transient messages in the cloud controller's logs, such as:

I1117 01:31:27.718391       1 util.go:39] hcloud/getServerByName: server with name kube02 not found, are the name in the Hetzner Cloud and the node name identical?
E1117 01:31:27.718445       1 node_controller.go:245] Error getting node addresses for node "kube02": error fetching node by provider ID: hcloud/instances.NodeAddressesByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: , and error by node name: hcloud/instances.NodeAddresses: instance not found

...but otherwise load balancer creation works and ignores all nodes that have the instance.hetzner.cloud/is-root-server=true label set.

I'd file a PR but this really isn't my work, just a few fixes on top of what y'all have already done.

Hoping something more legit will make its way into this repo but for now this will have to do.
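For anyone trying that image: marking the Robot nodes looks roughly like this, assuming the label is applied by hand rather than by the controller itself (kube02 is just the node name from the log excerpt above):

$ kubectl label node kube02 instance.hetzner.cloud/is-root-server=true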

@acjohnson

@LKaemmerling would you consider reopening this issue? There is a fair bit of support for this feature and quite a bit of hacking that has gone into it already.

@malikkirchner

@acjohnson thank you for improving on Boris' change.

@maaft

maaft commented Oct 5, 2022

Uhm, why is this closed? Currently it does not work. What can I do, please? Are there any step-by-step instructions for how I can provision a load balancer connected to my 3 root servers?

@batistein
Contributor

@batistein
Contributor

It's already fully integrated with: https://github.com/syself/cluster-api-provider-hetzner

@maaft

maaft commented Oct 5, 2022

Ah, yes. I read about that CAPI provider a few days ago. Thanks, mate!

@maaft

maaft commented Oct 5, 2022

I'm getting Cloud provider could not be initialized: unknown cloud provider "hetzner" in the logs.

Any Idea how to fix this?

@batistein
Contributor

batistein commented Oct 5, 2022

Sounds like you have the wrong provider argument in the deployment... Did you only replace the image? See: https://github.com/syself/hetzner-cloud-controller-manager/blob/master/deploy/ccm.yaml#L63
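A quick way to check which argument is actually deployed, assuming the deployment is named hcloud-cloud-controller-manager in kube-system as in the stock manifest:

$ kubectl -n kube-system get deployment hcloud-cloud-controller-manager -o yaml | grep -e "--cloud-provider"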

@maaft

maaft commented Oct 5, 2022

Well, after removing the "old" ccm, I installed the suggested one with:

kubectl apply -f https://github.com/syself/hetzner-cloud-controller-manager/releases/latest/download/ccm.yaml

Which contains:

containers:
  - image: quay.io/syself/hetzner-cloud-controller-manager:v1.13.0-0.0.1
    name: hcloud-cloud-controller-manager
    command:
      - "/bin/hetzner-cloud-controller-manager"
      - "--cloud-provider=hetzner"
      - "--leader-elect=false"
      - "--allow-untagged-cloud"

Are any Slack/Discord channels available? I don't want to spam this issue further.

@batistein
Contributor

Kubernetes Slack workspace, channel #hetzner
