[BUG] host.k3d.internal breaks on system reboot #1221

Open
mdshack opened this issue Feb 15, 2023 · 7 comments · Fixed by #1453
Labels: bug (Something isn't working), priority/high


mdshack commented Feb 15, 2023

What did you do

  • How was the cluster created?

    • k3d cluster create [clustername] --volume ... --volume ... --registry-config [local path to registry config] --agents 1 --servers 1 --port 8081:8081@loadbalancer --port ... --port ... --port ... --k3s-arg --disable=traefik@server:0 --k3s-arg --disable=metrics-server@server:0
  • What did you do afterwards?

  1. Shell into the cluster agent: docker exec -it k3d-relay-agent-0 /bin/sh
  2. wget a running registry that exists on the host machine successfully: wget host.k3d.internal:5000
    [refer to Screenshots or terminal output -> Successful wget below]
  3. Restart host machine (an actual power cycle)
  4. Shell into the cluster agent: docker exec -it k3d-relay-agent-0 /bin/sh
  5. wget a running registry that exists on the host machine without success: wget host.k3d.internal:5000
    [refer to Screenshots or terminal output -> Unsuccessful wget below]
  6. Stop your cluster: k3d cluster stop [clustername]
  7. Start your cluster: k3d cluster start [clustername]
  8. Repeat steps 1-2 and ensure you get another successful wget (the full sequence is condensed into the sketch after this list)
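
Condensed, the repro looks roughly like this (the cluster name relay is inferred from the agent container name above; the elided flags are omitted):

docker exec -it k3d-relay-agent-0 /bin/sh -c "wget host.k3d.internal:5000"   # succeeds
sudo reboot                                                                  # power-cycle the host machine
docker exec -it k3d-relay-agent-0 /bin/sh -c "wget host.k3d.internal:5000"   # fails: bad address
k3d cluster stop relay && k3d cluster start relay
docker exec -it k3d-relay-agent-0 /bin/sh -c "wget host.k3d.internal:5000"   # succeeds again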

What did you expect to happen

Expected host.k3d.internal:5000 to still be reachable after the machine restarts.

Screenshots or terminal output


Successful wget

Connecting to host.k3d.internal:5000 (172.20.0.1:5000)
saving to 'index.html'
'index.html' saved

Unsuccessful wget

wget: bad address 'host.k3d.internal:5000'

Which OS & Architecture

  • output of k3d runtime-info
arch: x86_64
cgroupdriver: cgroupfs
cgroupversion: "1"
endpoint: /var/run/docker.sock
filesystem: extfs
name: docker
os: Ubuntu 20.04.5 LTS
ostype: linux
version: 20.10.23

Which version of k3d

  • output of k3d version
k3d version v5.4.6
k3s version v1.24.4-k3s1 (default)

Which version of docker

  • output of docker version and docker info
Client: Docker Engine - Community
 Version:           20.10.23
 API version:       1.41
 Go version:        go1.18.10
 Git commit:        7155243
 Built:             Thu Jan 19 17:36:25 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.23
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.10
  Git commit:       6051f14
  Built:            Thu Jan 19 17:34:14 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.15
  GitCommit:        5b842e528e99d4d4c1686467debf2bd4b88ecd86
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.10.0-docker)
  compose: Docker Compose (Docker Inc., v2.15.1)
  scan: Docker Scan (Docker Inc., v0.23.0)

Server:
 Containers: 11
  Running: 6
  Paused: 0
  Stopped: 5
 Images: 56
 Server Version: 20.10.23
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 5b842e528e99d4d4c1686467debf2bd4b88ecd86
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.15.0-60-generic
 Operating System: Ubuntu 20.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.42GiB
 Name: ....
 ID: ...
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: ...
 Registry: ...
 Labels:
 Experimental: false
 Insecure Registries:
  ...:5001
  ....:5002
  localhost:32000
  ....:5003
  ...:5000
  127.0.0.0/8
 Registry Mirrors:
  ...:5001/
 Live Restore Enabled: false
mdshack added the bug (Something isn't working) label on Feb 15, 2023

jtele2 commented Feb 15, 2023

I am having the same problem. Was working fine up until about a week ago.


bruciebruce commented Feb 15, 2023

I have also been hit by this issue. The cluster is up on AWS and builds fine with all pods running, but when the EC2 instance is rebooted, most of the pods enter a crash loop.

ubuntu@ip-10-1-1-102:~$ k3d runtime-info
arch: x86_64
cgroupdriver: cgroupfs
cgroupversion: "1"
endpoint: /var/run/docker.sock
filesystem: extfs
name: docker
os: Ubuntu 20.04.5 LTS
ostype: linux
version: 20.10.12

ubuntu@ip-10-1-1-102:~$ k3d version
k3d version v5.4.7
k3s version v1.25.6-k3s1 (default)

ubuntu@ip-10-1-1-102:~$ docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 5
  Running: 5
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.15.0-1028-aws
 Operating System: Ubuntu 20.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 61.81GiB
 Name: ip-10-1-1-102
 ID: KB2D:4ZRM:F5IX:G6ZP:JVCV:ORCW:D3FO:GG4J:N5RH:ZJVR:J2QL:TAPO
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false



bruciebruce commented Feb 27, 2023

Trying to use Kind instead.

@CormacLennon

Has anyone found a workaround for this yet? It's a pain starting and stopping the cluster to get this back. Is there any way we can query the k3d instance to find out what the IP ought to be?

@DanHorrocksBurgess

> Has anyone found a workaround for this yet? It's a pain starting and stopping the cluster to get this back. Is there any way we can query the k3d instance to find out what the IP ought to be?

Did you happen to find a workaround for this?

We're experiencing the same problem; our developers locally have to stop and start the cluster every time they reboot their machines to fix DNS resolution.


CormacLennon commented Sep 1, 2023

> Has anyone found a workaround for this yet? It's a pain starting and stopping the cluster to get this back. Is there any way we can query the k3d instance to find out what the IP ought to be?

> Did you happen to find a workaround for this?

> We're experiencing the same problem; our developers locally have to stop and start the cluster every time they reboot their machines to fix DNS resolution.

I wrote a PowerShell function for our developers to run to fix the issue:

function Repair-ClusterCoreDns()
{
    # Look up the current container IPs (the names match our "energy" cluster; adjust for yours)
    $server0  = docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' k3d-energy-server-0
    $serverlb = docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' k3d-energy-serverlb
    $registry = docker inspect --format='{{with index .NetworkSettings.Networks \"k3d-energy\"}}{{.IPAddress}}{{end}}' k3d-myregistry.localhost
    $hostK3dInternal = Get-HostK3dInternal

    # Rebuild the NodeHosts value; the leading "|" keeps it a YAML literal block
    $ips = "|
    $hostK3dInternal host.k3d.internal
    $server0 k3d-energy-server-0
    $serverlb k3d-energy-serverlb
    $registry k3d-myregistry.localhost
"
    $patch = 'data:
  NodeHosts: ' + $ips

    Write-Output "Adding the following entries to the coredns config map:"
    Write-Output $patch
    kubectl patch configmap/coredns -n kube-system --type merge --patch $patch
}
Set-Alias -Name fixdns -Value Repair-ClusterCoreDns -Force
Export-ModuleMember -Function Repair-ClusterCoreDns -Alias fixdns

function Get-HostK3dInternal()
{
    $hostIp = ""
    # Resolve host.k3d.internal from inside a container attached to the cluster network
    $dnsEntries = docker exec k3d-energy-tools /bin/sh -c "getent ahostsv4 host.k3d.internal"

    foreach ($dnsEntry in $dnsEntries) {
        # getent columns: <address> <socket type> <name>
        $chunks = $dnsEntry.Split(" ") | Where-Object { $_ }

        if ($chunks[2] -eq "host.k3d.internal") {
            $hostIp = $chunks[0]
        }
    }

    if ($hostIp -eq "") {
        Write-Host 'FAILURE: Could not resolve host.k3d.internal. Please ensure the k3d-energy-tools container is running'
    }
    return $hostIp
}

It's not perfect, but it works. The important line for figuring out what the host.k3d.internal IP should be is:

docker exec k3d-energy-tools /bin/sh -c "getent ahostsv4 host.k3d.internal"

I only figured that out by reading the source, and since it's undocumented it's liable to change, but it works for now.
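
For anyone not on PowerShell, here is the same idea as a rough shell sketch; the container and network names (k3d-energy-tools, k3d-energy-server-0, k3d-energy-serverlb) are taken from the function above and will differ for your cluster:

# Sketch only: rebuild the coredns NodeHosts entry after a host reboot
HOST_IP=$(docker exec k3d-energy-tools sh -c "getent ahostsv4 host.k3d.internal" \
  | awk '$3 == "host.k3d.internal" {print $1; exit}')
SERVER_IP=$(docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' k3d-energy-server-0)
LB_IP=$(docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' k3d-energy-serverlb)
# The merge patch replaces NodeHosts wholesale, so list every host entry you still need
kubectl patch configmap coredns -n kube-system --type merge -p "data:
  NodeHosts: |
    $HOST_IP host.k3d.internal
    $SERVER_IP k3d-energy-server-0
    $LB_IP k3d-energy-serverlb
"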


Lanchez commented Sep 28, 2023

This is also a problem with local registries, and it can easily be replicated: local registries break when used from the cluster after a cluster restart (a minimal command sketch follows the steps below).

  1. Create cluster with a local registry
  2. Check that coredns ConfigMap has proper entries
  3. Restart docker daemon
  4. Local registries break when accessed from the cluster, and the coredns ConfigMap is missing the entries
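
A rough sketch of those steps, assuming a throwaway cluster and registry name:

k3d cluster create regtest --registry-create regtest-registry
kubectl get configmap coredns -n kube-system -o yaml   # NodeHosts lists host.k3d.internal and the registry
sudo systemctl restart docker
kubectl get configmap coredns -n kube-system -o yaml   # NodeHosts entries are missing; registry access from the cluster fails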
