Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible wrong Cloud-Init network-config generated file #51

Open
ricottatosta opened this issue Dec 28, 2023 · 28 comments
Open

Possible wrong Cloud-Init network-config generated file #51

ricottatosta opened this issue Dec 28, 2023 · 28 comments

Comments

@ricottatosta
Copy link

What steps did you take and what happened:
Sorry I submit this as a bug, but maybe it isn't.
When deploying a cluster with Talos as provider for bootstrap and controlplane, Talos' init process finds a cloud-init drive, but then complains about network-config file. Talos' error says: "network-config metadata version=0 is not supported", maybe because it starts with "network:". Is it supported? Reading the manual, it shouldn't.

cloud-init manual

What did you expect to happen:
Maybe cluster deployment should generate a network-config file starting without a top level "network:".

Environment:

  • Cluster-api-provider-proxmox version: 0.1.0
  • Kubernetes version: (use kubectl version): 1.28.3
  • OS (e.g. from /etc/os-release): Talos 1.5.5
@ricottatosta ricottatosta added the kind/bug Something isn't working label Dec 28, 2023
@mcbenjemaa mcbenjemaa added kind/support and removed kind/bug Something isn't working labels Dec 29, 2023
@mcbenjemaa
Copy link
Member

mcbenjemaa commented Dec 29, 2023

@ricottatosta

As I can see, you're trying to use Talos as a control-plane and bootstrap provider.
It's recommended that you submit this issue in talos-control-plane-provider

@ricottatosta
Copy link
Author

Thanks for changing my submission with type 'support'.
The issue involves cidata.iso created by capmox to pass configuration via cloud-init, and talos vm boot image that uses it at boot time.
Or am I supposed to believe that Talos bootstrap and controlplane are involved in creating cidata.iso?
Maybe the issue should be submitted in "talos os".

@mcbenjemaa
Copy link
Member

mcbenjemaa commented Dec 29, 2023

Hmmm,
CAPMOX Now only creates cidata supported by Debian and Ubuntu distributions.
If it's related to supporting Talos Cloud-init, you're right.

I don't know how Talos load the cloud-init iso,

If you know something, I can help you add support for Talos.

@ricottatosta
Copy link
Author

As far as I know, Talos expects cloud-init datasource to be in "nocloud" format. In this format, network-config file can't start with "network:".
Because of the presence of "network:", Talos doesn't find the top level key "version: 2" and assumes it to be "version: 0".
I can do nothing, that's the way you build cidata.iso and depends on how cloud-init expects data to be arranged in distributions like Debian and Ubuntu. You should find a way, either arranging data in a way that is good for all situations or let the user choose what kind of datasource format to produce in cidata.iso.
I know this means more effort. But I think Talos and CAPMOX are great together.

@ricottatosta
Copy link
Author

A couple of usefull links:
Talos NoCloud
Cloud-Init NoCloud datasource

@mcbenjemaa
Copy link
Member

Will check this out

@ricottatosta
Copy link
Author

This is the console output of Talos NoCloud booting image regarding the issue:

[talos] found config disk (cidata) at /dev/sr0
ISO 9660 Extensions: IEEE_P1282
[talos] fetching meta config from: cidata/meta-data
[talos] fetching network config from: cidata/network-config
[talos] fetching machine config from: cidata/user-data
[talos] restarting platform network config {"component": "controller-runtime", "controller": "network.PlatformConfigController", ..., "error": "network-config metadata version=0 is not supported"}

@ricottatosta ricottatosta changed the title Possible wrong Coud-Init network-config generated file Possible wrong Cloud-Init network-config generated file Jan 7, 2024
@mcbenjemaa
Copy link
Member

I will check this if it works with other netplans distros, and then I will implement it

@ricottatosta
Copy link
Author

I'm really looking forward to hearing good news from you.

@mcbenjemaa
Copy link
Member

I will test it on ubuntu but can you tested on Talos,

@ricottatosta
Copy link
Author

ricottatosta commented Jan 16, 2024

What am I supposed to test? Release 0.1.1?
Does it contain any changes in the code involved in the issue?

@mcbenjemaa
Copy link
Member

Currently, we support only Netplan-based distros,
We will try to add more support for other network-configs.

If someone wants to take effort and add this, it will be great.

@ricottatosta
Copy link
Author

OK. It was expected.
Unfortunately I'm not good at programming in Go. And the solution I found out is not suitable for being integrated in your code.
As my solution is quite simple (just delete a bunch of characters from a template string), I'll patch your code every time you release a new version.
Thank you for your support.

@mcbenjemaa
Copy link
Member

We will try to support this soon.

@ricottatosta
Copy link
Author

ricottatosta commented Jan 20, 2024

I would like to share my experience about using CAPMOX with TALOS.
This is the template I use in pkg/cloudinit/network.go:

version: 2
ethernets:
{{- range $index, $element := .NetworkConfigData }}
  eth{{ $index }}:
    match:
      macaddress: {{ $element.MacAddress }}
    dhcp4: false
    addresses:
    {{- if $element.IPAddress }}
      - {{ $element.IPAddress }}
    {{- end }}
    {{- if $element.IPV6Address }}
      - {{ $element.IPV6Address }}
    {{- end }}
  {{- if eq $index 0 }}
    {{- if $element.Gateway }}
    gateway4: {{ $element.Gateway }}
    {{- end }}
    {{- if $element.Gateway6 }}
    gateway6: {{ $element.Gateway6 }}
    {{- end }}
    {{- if $element.DNSServers }}
    nameservers:
      addresses:
      {{- range $element.DNSServers }}
        - {{ . }}
      {{- end -}}
    {{- end -}}
  {{- end -}}
{{- end -}}

TALOS dislikes defining static routes in place of gateways and perhaps even defining nameservers for each device.
For the rest, it works like a charm.

@mcbenjemaa
Copy link
Member

Thanks for sharing,
I guess the best solution for this is to support another version of network-config thats actually different from the netplan config.

@mcbenjemaa
Copy link
Member

We have released a new version.

@ricottatosta
Copy link
Author

ricottatosta commented Jan 28, 2024

I'm trying to make nocloud template string work like the netplan one as much as possible. But there is an issue.
If I omit to define a gateway, the injector refuses to build the cloudinit image complaining it wants it.
Is there a way to define only address and netmask for a network device?

@mcbenjemaa
Copy link
Member

Oh, no
The gateway is required

@mcbenjemaa
Copy link
Member

I will create new issue, so we can take this as a feature to support multiple cloud-init network-config.

However, DHCP will be included in the next release, i don't know if that helps you.

@mcbenjemaa
Copy link
Member

Issue is added

#94

@65278
Copy link
Collaborator

65278 commented Apr 24, 2024

I'm trying to make nocloud template string work like the netplan one as much as possible. But there is an issue. If I omit to define a gateway, the injector refuses to build the cloudinit image complaining it wants it. Is there a way to define only address and netmask for a network device?

One workaround I've found is to add an illegal gateway (169.254.255.254 for example). Cluster-api-provider-ipam-in-cluster will accept this, and netplan will ignore it when applying with a warning.
I have not tested this with cloud-init network config v1.

@isZumpo
Copy link
Contributor

isZumpo commented May 16, 2024

I have been trying to get the talos bootstrap provider and proxmox infrastructure provider together the whole morning. Running into the network-config metadata version=0 is not supported issue... So glad to finally find this github issue about it, meaning that I probably did nothing wrong in my configuration :)

Has there been any progress on the topic since January? Could possibly some sort of variable be added that makes it generate the cidata in the nocloud format such that it is compatible with talos?

@wikkyk
Copy link
Collaborator

wikkyk commented May 31, 2024

Unfortunately not. We don't use Talos and as such we can't commit to adding support for it. Patches welcome, of course, and we are more than happy to accept additional maintainers :-)

@ricottatosta
Copy link
Author

ricottatosta commented May 31, 2024

CAPMOX works well with TALOS. But it needs some patches to the CAPMOX's code.
Furthermore, it is possible to make your cluster "elastic", even self-managed (without an external cluster that creates and manages it).
All it needs to do is modifying network.go file located at pkg/cloudinit/ in the source code and rebuild docker image.
If something ready to use is needed, there is a docker image at https://hub.docker.com/r/ricottatosta/cluster-api-provider-proxmox already patched for TALOS. After deploying CAPMOX, patch CAPMOX deployment manifest and let its container image point at ricottatosta/cluster-api-provider-proxmox:[tag].
Last version (0.3.0) has the following patch applied:

...
const (
	/* network-config template. */
	networkConfigTPl = `version: 2
renderer: networkd
ethernets:
{{- range $index, $element := .NetworkConfigData }}
  eth{{ $index }}:
    match:
      macaddress: {{ $element.MacAddress }}
    dhcp4: {{ if $element.DHCP4 }}true{{ else }}false{{ end }}
    dhcp6: {{ if $element.DHCP6 }}true{{ else }}false{{ end }}
  {{- if or (and (not $element.DHCP4) $element.IPAddress) (and (not $element.DHCP6) $element.IPV6Address) }}
    addresses:
    {{- if $element.IPAddress }}
      - {{ $element.IPAddress }}
    {{- end }}
    {{- if $element.IPV6Address }}
      - '{{ $element.IPV6Address }}'
    {{- end }}
  {{- if eq $index 0 }}
    {{- if and $element.Gateway (not $element.DHCP4) }}
    gateway4: {{ $element.Gateway }}
    {{- end }}
    {{- if and $element.Gateway6 (not $element.DHCP6) }}
    gateway6: '{{ $element.Gateway6 }}'
    {{- end }}
    {{- if $element.DNSServers }}
    nameservers:
      addresses:
      {{- range $element.DNSServers }}
        - {{ . }}
      {{- end -}}
    {{- end -}}
  {{- end -}}
  {{- end -}}
{{- end -}}
{{- $vrf := 0 -}}
{{- range $index, $element := .NetworkConfigData }}
{{- if eq $element.Type "vrf" }}
{{- if eq $vrf 0 }}
vrfs:
{{- $vrf := 1 }}
{{- end }}
  {{$element.Name}}:
    table: {{ $element.Table }}
    {{- if $element.Routes }}{{ template "routes" $element }}{{- end -}}
    {{- if $element.FIBRules }}{{ template "rules" $element }}{{- end -}}
    {{- if $element.Interfaces }}
    interfaces:
    {{- range $element.Interfaces }}
      - {{ . }}
    {{- end -}}
    {{- end -}}
{{- end -}}
{{- end -}}
{{- define "rules" }}
    routing-policy:
    {{- range $index, $rule := .FIBRules }}
      - {
      {{- if $rule.To }} "to": "{{$rule.To}}", {{ end -}}
      {{- if $rule.From }} "from": "{{$rule.From}}", {{ end -}}
      {{- if $rule.Priority }} "priority": {{$rule.Priority}}, {{ end -}}
      {{- if $rule.Table }} "table": {{$rule.Table}}, {{ end -}} }
    {{- end }}
{{- end -}}
{{- define "routes" }}
    routes:
    {{- range $index, $route := .Routes }}
      - {
      {{- if $route.To }} "to": "{{$route.To}}", {{ end -}}
      {{- if $route.Via }} "via": "{{$route.Via}}", {{ end -}}
      {{- if $route.Metric }} "metric": {{$route.Metric}}, {{ end -}}
      {{- if $route.Table }} "table": {{$route.Table}}, {{ end -}} }
    {{- end }}
{{- end -}}
`
)
...

It's not tested against vrf. My use case is two ethernets, public and private.
As mentioned earlier, it works like a charm.

@isZumpo
Copy link
Contributor

isZumpo commented Jun 3, 2024

CAPMOX works well with TALOS. But it needs some patches to the CAPMOX's code. Furthermore, it is possible to make your cluster "elastic", even self-managed (without an external cluster that creates and manages it). All it needs to do is modifying network.go file located at pkg/cloudinit/ in the source code and rebuild docker image. If something ready to use is needed, there is a docker image at https://hub.docker.com/r/ricottatosta/cluster-api-provider-proxmox already patched for TALOS. After deploying CAPMOX, patch CAPMOX deployment manifest and let its container image point at ricottatosta/cluster-api-provider-proxmox:[tag]. Last version (0.3.0) has the following patch applied:

...
const (
	/* network-config template. */
	networkConfigTPl = `version: 2
renderer: networkd
ethernets:
{{- range $index, $element := .NetworkConfigData }}
  eth{{ $index }}:
    match:
      macaddress: {{ $element.MacAddress }}
    dhcp4: {{ if $element.DHCP4 }}true{{ else }}false{{ end }}
    dhcp6: {{ if $element.DHCP6 }}true{{ else }}false{{ end }}
  {{- if or (and (not $element.DHCP4) $element.IPAddress) (and (not $element.DHCP6) $element.IPV6Address) }}
    addresses:
    {{- if $element.IPAddress }}
      - {{ $element.IPAddress }}
    {{- end }}
    {{- if $element.IPV6Address }}
      - '{{ $element.IPV6Address }}'
    {{- end }}
  {{- if eq $index 0 }}
    {{- if and $element.Gateway (not $element.DHCP4) }}
    gateway4: {{ $element.Gateway }}
    {{- end }}
    {{- if and $element.Gateway6 (not $element.DHCP6) }}
    gateway6: '{{ $element.Gateway6 }}'
    {{- end }}
    {{- if $element.DNSServers }}
    nameservers:
      addresses:
      {{- range $element.DNSServers }}
        - {{ . }}
      {{- end -}}
    {{- end -}}
  {{- end -}}
  {{- end -}}
{{- end -}}
{{- $vrf := 0 -}}
{{- range $index, $element := .NetworkConfigData }}
{{- if eq $element.Type "vrf" }}
{{- if eq $vrf 0 }}
vrfs:
{{- $vrf := 1 }}
{{- end }}
  {{$element.Name}}:
    table: {{ $element.Table }}
    {{- if $element.Routes }}{{ template "routes" $element }}{{- end -}}
    {{- if $element.FIBRules }}{{ template "rules" $element }}{{- end -}}
    {{- if $element.Interfaces }}
    interfaces:
    {{- range $element.Interfaces }}
      - {{ . }}
    {{- end -}}
    {{- end -}}
{{- end -}}
{{- end -}}
{{- define "rules" }}
    routing-policy:
    {{- range $index, $rule := .FIBRules }}
      - {
      {{- if $rule.To }} "to": "{{$rule.To}}", {{ end -}}
      {{- if $rule.From }} "from": "{{$rule.From}}", {{ end -}}
      {{- if $rule.Priority }} "priority": {{$rule.Priority}}, {{ end -}}
      {{- if $rule.Table }} "table": {{$rule.Table}}, {{ end -}} }
    {{- end }}
{{- end -}}
{{- define "routes" }}
    routes:
    {{- range $index, $route := .Routes }}
      - {
      {{- if $route.To }} "to": "{{$route.To}}", {{ end -}}
      {{- if $route.Via }} "via": "{{$route.Via}}", {{ end -}}
      {{- if $route.Metric }} "metric": {{$route.Metric}}, {{ end -}}
      {{- if $route.Table }} "table": {{$route.Table}}, {{ end -}} }
    {{- end }}
{{- end -}}
`
)
...

It's not tested against vrf. My use case is two ethernets, public and private. As mentioned earlier, it works like a charm.

Thanks! This patch appears to be working. I am now getting past the point where it was complaining that: "network-config metadata version=0 is not supported" :)

On a related point, how are you setting up the initial network for the control plane using Talos and Proxmox? I am attempting to use the VIP solution built into Talos but it seems to not be working... If you don't mind it would be very nice to see an example of your TalosControlPlane and ProxmoxCluster objects

@ricottatosta
Copy link
Author

ricottatosta commented Jun 3, 2024

For networking I use Cilium without kube-proxy.
For VIP I use kube-vip in BGP mode as daemonset.
Following is what you asked for:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: k8s-test
  namespace: k8s-test
spec:
  allowedNodes:
    - pve1
    - pve2
    - pve3
  controlPlaneEndpoint:
    host: 10.100.150.150 (my VIP)
    port: 6443
  dnsServers:
    - 10.100.150.1
  ipv4Config:
    addresses:
      - 10.100.150.151-10.100.150.159
    gateway: 10.100.150.254
    prefix: 24


apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: k8s-test-control-plane
  namespace: k8s-test
spec:
  controlPlaneConfig:
    controlplane:
      generateType: controlplane
      talosVersion: v1.6.1
      configPatches:
        - op: add
          path: /machine/network/extraHostEntries
          value:
            - ip: 127.0.0.1
              aliases:
                - kubernetes
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: ProxmoxMachineTemplate
    name: k8s-test-control-plane
  replicas: 1
  version: 1.28.3

@wikkyk
Copy link
Collaborator

wikkyk commented Jun 10, 2024

@ricottatosta Thank you for this - can you submit it as a PR, please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants