network.RouteSpecController can't set IPv6 (SLAAC) gateway because of file exists #8558

Closed
Tracked by #8549
MindTooth opened this issue Apr 8, 2024 · 9 comments · Fixed by #8579
@MindTooth
Contributor

MindTooth commented Apr 8, 2024

Bug Report

Description

I'm struggling to get IPv6 working properly, specifically the gateway IP. Our platform uses SLAAC for IPv6. The network.RouteSpecController controller fails with a "file exists" error when adding the route for the gateway IP. The full error is in the logs below.

Rebooting sometimes works around the problem, but not reliably, hence this issue.

I have the following customization:

machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
        vip:
          ip: 10.10.10.100

Please let me know if I can provide more information. I'm a bit at a loss and would appreciate any help I can get.

Logs

10.10.10.52: user: warning: [2024-04-07T18:55:33.883943449Z]: [talos] controller failed {"component": "controller-runtime", "controller": "network.RouteSpecController", "error": "1 error occurred:\n\t* error adding route: netlink receive: file exists, message {Family:10 DstLength:0 SrcLength:0 Tos:0 Table:0 Protocol:4 Scope:0 Type:1 Flags:0 Attributes:{Dst:<nil> Src:<nil> Gateway:<gateway> OutIface:8 Priority:1024 Table:254 Mark:0 Pref:<nil> Expires:<nil> Metrics:<nil> Multipath:[]}}\n\n"}

Replaced gateway IP with <gateway> for privacy.
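For readers decoding the message dump in the log above, here is a minimal sketch that maps the numeric fields to their Linux rtnetlink names. The constant values come from the Linux headers (`linux/rtnetlink.h` and the Linux value of `AF_INET6`). The `decode_route` helper and the dict layout are hypothetical, for illustration only, and are not part of Talos.

```python
import errno
import os

# Linux constants, hardcoded on purpose: socket.AF_INET6 differs between
# operating systems, and this log comes from a Linux kernel.
FAMILIES = {2: "AF_INET", 10: "AF_INET6"}
PROTOCOLS = {2: "RTPROT_KERNEL", 3: "RTPROT_BOOT", 4: "RTPROT_STATIC",
             9: "RTPROT_RA", 16: "RTPROT_DHCP"}
TYPES = {1: "RTN_UNICAST"}
TABLES = {253: "RT_TABLE_DEFAULT", 254: "RT_TABLE_MAIN", 255: "RT_TABLE_LOCAL"}
SCOPES = {0: "RT_SCOPE_UNIVERSE", 253: "RT_SCOPE_LINK", 254: "RT_SCOPE_HOST"}


def decode_route(msg):
    """Hypothetical helper: translate numeric rtnetlink fields into names."""
    return {
        "family": FAMILIES.get(msg["Family"], str(msg["Family"])),
        "protocol": PROTOCOLS.get(msg["Protocol"], str(msg["Protocol"])),
        "type": TYPES.get(msg["Type"], str(msg["Type"])),
        "table": TABLES.get(msg["Table"], str(msg["Table"])),
        "scope": SCOPES.get(msg["Scope"], str(msg["Scope"])),
        # A destination prefix length of 0 means "match everything",
        # i.e. a default route.
        "is_default_route": msg["DstLength"] == 0,
        "priority": msg["Priority"],
    }


# Field values copied from the log line (Table is the attribute value, 254).
logged = {"Family": 10, "DstLength": 0, "Protocol": 4, "Scope": 0,
          "Type": 1, "Table": 254, "Priority": 1024}
decoded = decode_route(logged)

for key, value in decoded.items():
    print(f"{key}: {value}")

# "file exists" in the log is the message text for the EEXIST errno.
print(errno.errorcode[errno.EEXIST], "->", os.strerror(errno.EEXIST))
```

Read this way, the failing request is an attempt to add a static IPv6 default route with priority 1024 to the main table, and the kernel rejects it because it considers a matching route to be present already.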

I can send a pcap capture privately. I'm not keen on sharing it publicly, as we use routable addresses.

Environment

  • Talos version: [talosctl version --nodes <problematic nodes>]
$ talosctl --talosconfig talosconfig version -n 10.10.10.51
Client:
	Tag:         v1.6.7
	SHA:         46c8ac10
	Built:       
	Go version:  go1.21.8 X:loopvar
	OS/Arch:     darwin/arm64
Server:
	NODE:        10.10.10.51
	Tag:         v1.6.7
	SHA:         46c8ac10
	Built:       
	Go version:  go1.21.8 X:loopvar
	OS/Arch:     linux/amd64
	Enabled:     RBAC
  • Kubernetes version: [kubectl version --short]
$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3
  • Platform: OpenStack
@MindTooth MindTooth changed the title from "network.RouteSpecController cam" to "network.RouteSpecController can't set IPv6 gateway" on Apr 8, 2024
@MindTooth MindTooth changed the title from "network.RouteSpecController can't set IPv6 gateway" to "network.RouteSpecController can't set IPv6 (SLAAC) gateway because of file exists" on Apr 8, 2024
@smira
Member

smira commented Apr 8, 2024

Can you please provide the output of talosctl get routes -o yaml and talosctl get routespecs -o yaml?

If you want to keep it private, you can encrypt it to my key: https://github.com/smira.gpg

@MindTooth
Contributor Author

MindTooth commented Apr 8, 2024

> Can you please provide the output of talosctl get routes -o yaml and talosctl get routespecs -o yaml?
>
> If you want to keep it private, you can encrypt it to my key: https://github.com/smira.gpg

Sure. Rename the .tgz file to .tar.gpg and the .png.tgz to .png.gpg. Seems GitHub has some restrictions on upload.

The PNG shows a packet in Wireshark that my team has verified to be a correct announcement from the router.

route_logs.tgz
icmp6_134.png.tgz

@smira
Member

smira commented Apr 10, 2024

I think I know what the problem is: the route priority duplicates that of the v4 default route.
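A check like the one described above can be sketched in a few lines. This is a minimal illustration, assuming the route specs have already been loaded into plain dicts. The key names (`family`, `dst`, `gateway`, `priority`), the convention that an empty `dst` marks a default route, and the sample addresses are all assumptions made for the example and may not match the actual `talosctl get routespecs -o yaml` output.

```python
from collections import defaultdict


def default_route_priority_clashes(routes):
    """Group default routes by priority and keep priorities used more than once.

    Assumption: an empty "dst" marks a default route in the input dicts.
    """
    by_priority = defaultdict(list)
    for route in routes:
        if route.get("dst", "") == "":
            by_priority[route["priority"]].append(route)
    return {prio: group for prio, group in by_priority.items() if len(group) > 1}


# Made-up sample data using documentation address ranges.
sample = [
    {"family": "inet4", "dst": "", "gateway": "192.0.2.1", "priority": 1024},
    {"family": "inet6", "dst": "", "gateway": "2001:db8::1", "priority": 1024},
    {"family": "inet6", "dst": "2001:db8::/64", "gateway": "", "priority": 1024},
]

clashes = default_route_priority_clashes(sample)
for priority, group in sorted(clashes.items()):
    families = ", ".join(route["family"] for route in group)
    print(f"priority {priority} shared by default routes: {families}")
```

The sketch only flags equal priorities. Whether such a clash actually produces the "file exists" error depends on how Talos and the kernel identify routes, which this thread does not spell out.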

smira added a commit to smira/talos that referenced this issue Apr 10, 2024
Fixes siderolabs#8558

Similar fix is done for other platforms, but not OpenStack.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
@smira smira self-assigned this Apr 10, 2024
@MindTooth
Contributor Author

Thank you. 🙏🏻

Regarding the milestone on the PR: will this not be backported to 1.7? If not, I can of course build a new image myself. 😄

@smira
Member

smira commented Apr 11, 2024

It will be backported to 1.7 and 1.6.

smira added a commit to smira/talos that referenced this issue Apr 12, 2024
Fixes siderolabs#8558

Similar fix is done for other platforms, but not OpenStack.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit bfbd02a)
@MindTooth
Contributor Author

Just to verify: this should be part of talos-v1.7.0-beta.1, right? I still see the same problem. It seems a reboot is needed.

[Image: Screenshot 2024-04-18 at 09 35 44]

@smira
Member

smira commented Apr 18, 2024

@MindTooth I can't guess this way. If you hit an error, let's repeat the cycle - logs and support data.

@MindTooth
Contributor Author

Of course. My bad. 😄

❯ talosctl --talosconfig talosconfig version -n 10.10.10.51
Client:
	Tag:         v1.6.7
	SHA:         46c8ac10
	Built:       
	Go version:  go1.21.8 X:loopvar
	OS/Arch:     darwin/arm64
Server:
	NODE:        10.10.10.51
	Tag:         v1.7.0-beta.1
	SHA:         77581447
	Built:       
	Go version:  go1.22.2
	OS/Arch:     linux/amd64
	Enabled:     RBAC

Route logs: route_logs_2.tgz - (decrypt first)

@smira
Member

smira commented Apr 19, 2024

The route has the same "bad" priority, and I can't guess with limited data. Talos persists the platform network config if it fails to download a new one. If you need that to be fixed, please submit full information as a new issue:

  • OpenStack network metadata dump
  • Talos boot logs
  • Dump of the route specs

According to the tests, it should work.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 19, 2024