nomad left pause-amd64 containers alive if drain_on_shutdown is used #17299

Closed
suikast42 opened this issue May 23, 2023 · 5 comments · Fixed by #17455
Assignees
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. stage/needs-investigation theme/driver/docker type/bug

Comments

@suikast42
Contributor

I am on Nomad 1.5.6.

Every time I reboot the Ubuntu 22.04 VM, there are pause containers left behind.

These containers do not consume any memory or CPU.

I have GC enabled in the client config, but it has no effect:

plugin "docker" {
  config {
    allow_privileged = false
    disable_log_collection  = false
#    volumes {
#      enabled = true
#      selinuxlabel = "z"
#    }
    infra_image = "{{nomad_infra_image}}"
    infra_image_pull_timeout ="30m"
    extra_labels = ["job_name", "job_id", "task_group_name", "task_name", "namespace", "node_name", "node_id"]
    logging {
      type = "journald"
       config {
          labels-regex =".*"
       }
    }
    gc{
      container = true
      dangling_containers{
        enabled = true
      # period = "3m"
      # creation_grace = "5m"
      }
    }

  }
}
CONTAINER ID   IMAGE                                                      COMMAND                  CREATED             STATUS         PORTS     NAMES
8bbbdd9bd7cb   registry.cloud.private/suikast42/logunifier:0.1.1          "/logunifier -config…"   5 minutes ago       Up 5 minutes             logunifier-880a05ce-2dfe-ac5f-a3eb-c0fdcde6bcb6
229e3975f7c3   prom/blackbox-exporter:v0.24.0                             "/bin/blackbox_expor…"   6 minutes ago       Up 6 minutes             blackbox-task-9a8cf574-b7f5-23fc-ae46-44de47ae6e1c
a1ced0db67fe   10.21.21.41:5000/traefik:v2.10.1                           "/entrypoint.sh trae…"   6 minutes ago       Up 6 minutes             traefik-54665384-005c-a298-9ebd-6c4f110cbdee
ac00bc7b5962   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 6 minutes ago       Up 6 minutes             nomad_init_880a05ce-2dfe-ac5f-a3eb-c0fdcde6bcb6
a41df1e4a6ea   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 6 minutes ago       Up 6 minutes             nomad_init_9a8cf574-b7f5-23fc-ae46-44de47ae6e1c
581cb58b77a4   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 About an hour ago   Up 7 minutes             nomad_init_c1f6660d-bee3-bcea-1a52-43f8307bad07
63cb098bf749   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 14 hours ago        Up 7 minutes             nomad_init_e84b912c-50e6-ecec-9b2a-85e44ce4825f
e562b1afdf58   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 24 hours ago        Up 7 minutes             nomad_init_6a754458-08cf-0326-6b68-7bfecd44b95a
da3819a02e31   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 24 hours ago        Up 7 minutes             nomad_init_501de7f0-bd13-78b7-8967-02b91e62764d
bc4f4df0d861   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 31 hours ago        Up 7 minutes             nomad_init_cb2ace0d-4eac-45fa-17a8-902193f581f7
058219c80f51   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 31 hours ago        Up 7 minutes             nomad_init_8fdf7135-cd79-2175-5512-0cdfc6698806
9bdb7cf5373f   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 32 hours ago        Up 7 minutes             nomad_init_2537b847-5ce5-f5cf-cde8-613344206a87
1dfc05610d67   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 32 hours ago        Up 7 minutes             nomad_init_82cf76dc-37fc-8226-6df8-7a98da3238ea
aadc899cdadc   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 3 days ago          Up 7 minutes             nomad_init_64d45e1d-2e71-a531-2823-073264ab8c91
7c270ed72430   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 5 days ago          Up 7 minutes             nomad_init_b5f25ba3-eece-6a12-ad3e-4a5a6ac1ddcf
cefd98ab6156   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 5 days ago          Up 7 minutes             nomad_init_7a274784-170d-a3a2-6a09-117f8b4aa51d
1a9d6c5b2291   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 6 days ago          Up 7 minutes             nomad_init_733ca4a4-a8b0-c21e-25d6-494f3811e94b
9dc5621ed2f6   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 7 days ago          Up 7 minutes             nomad_init_ddf2525a-9659-1160-6813-0f835a109f38
60cae637f900   registry.cloud.private/google_containers/pause-amd64:3.2   "/pause"                 10 days ago         Up 7 minutes             nomad_init_0c710e44-1916-6b69-f3c2-d4bb6ec92ba9
@suikast42
Contributor Author

That's strange. The pause container gets a termination signal after every reboot but stays alive.

[screenshot]

@suikast42 suikast42 changed the title from "nomad left pause-amd64 containers alive" to "nomad left pause-amd64 containers alive if drain_on_shutdown is used" May 28, 2023
@suikast42
Contributor Author

I am using the node drain mechanism:

  drain_on_shutdown {
    deadline           = "1h"
    force              = false
    ignore_system_jobs = false
  }

The TimeoutStopSec in the systemd unit is set to 1h as well.

But somehow the init containers stay alive after every node reboot.

I made an ugly workaround with a post-start step in the Nomad agent systemd unit:

#!/bin/bash
# Kill any pause containers that were left behind after a reboot.
CONTAINER_IDS=$(docker ps | grep "amd64" | awk '{ print $1 }')
if [ -n "$CONTAINER_IDS" ]; then
    docker kill $CONTAINER_IDS
fi
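
A somewhat more targeted variant (a sketch, not part of the original workaround) would filter on the nomad_init_ name prefix that the pause containers carry in the docker ps output above, instead of grepping for "amd64":

#!/bin/bash
# Hypothetical alternative: match pause containers by their
# nomad_init_<alloc ID> names rather than by the image tag.
IDS=$(docker ps -q --filter "name=nomad_init_")
if [ -n "$IDS" ]; then
    docker kill $IDS
fi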

@shoenig shoenig self-assigned this Jun 2, 2023
@shoenig shoenig added the theme/driver/docker and stage/accepted labels Jun 2, 2023
@shoenig shoenig added this to Needs Triage in Nomad - Community Issues Triage via automation Jun 2, 2023
@shoenig
Member

shoenig commented Jun 5, 2023

@suikast42 I haven't been able to reproduce what you're seeing. Can you paste more of the Client spec, in particular what you have for

leave_on_terminate = true
leave_on_interrupt = true

And then also include your systemd unit file for the Nomad Client agent.

When you reboot the VM, is that sending a signal to the Nomad agent?

@suikast42
Contributor Author

So here are my config files.

Systemd unit
[Unit]
# When using Nomad with Consul it is not necessary to start Consul first. These
# lines start Consul before Nomad as an optimization to avoid Nomad logging
# that Consul is unavailable at startup.
Description=Nomad
Documentation=https://www.nomadproject.io/docs/
Wants=network-online.target,containerd.service,docker.service,consul.service
After=network-online.target,containerd.service,docker.service,consul.service



[Service]
ExecStartPre=/bin/bash -c '(while ! nc -z -v -w1 consul.service.consul 8501 2>/dev/null; do echo "Waiting for consul.service.consul 8501 to open..."; sleep 1; done); sleep 1'

# Nomad server should be run as the nomad user. Nomad clients
# should be run as root
User=root

Group=root



ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
# See issue https://github.com/hashicorp/nomad/issues/17299
# See issue https://github.com/suikast42/nomadder/issues/138
ExecStartPre=/etc/nomad.d/nomad_kill_pause_containers.sh
# The Nomad client has drain_on_shutdown enabled.
# This drains the node and marks it as ineligible.
# Make the node eligible again.
ExecStartPost=systemctl restart nomad.eligtion.service
# Use node drain over client config drain_on_shutdown
# Enable this section if you disable the option drain_on_shutdown
#ExecStop=/etc/nomad.d/nomad_node_drain.sh

KillMode=process
KillSignal=SIGINT
LimitNOFILE=65536
LimitNPROC=infinity
Restart=on-failure
RestartSec=2

## Configure unit start rate limiting. Units which are started more than
## *burst* times within an *interval* time span are not permitted to start any
## more. Use `StartLimitIntervalSec` or `StartLimitInterval` (depending on
## systemd version) to configure the checking interval and `StartLimitBurst`
## to configure how many starts per interval are allowed. The values in the
## commented lines are defaults.

# StartLimitBurst = 5

## StartLimitIntervalSec is used for systemd versions >= 230
StartLimitIntervalSec = 10s

# drain_on_shutdown +  30s
TimeoutStopSec=2m30s
## StartLimitInterval is used for systemd versions < 230
# StartLimitInterval = 10s

TasksMax=infinity
#The default systemd configuration for Nomad should set OOMScoreAdjust=-1000 to avoid OOMing the Nomad process.
OOMScoreAdjust=-1000

[Install]
WantedBy=multi-user.target
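
Since TimeoutStopSec is intended to be the drain_on_shutdown deadline plus a 30s buffer, the value systemd actually enforces can be double-checked (a sketch using standard systemctl properties, not part of the original post, assuming the unit is installed as nomad.service):

# Show the effective stop timeout systemd applies to the unit
systemctl show nomad.service --property=TimeoutStopUSec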
Nomad agent config
log_level = "DEBUG"
name = "worker-01"
datacenter = "nomadder1"
data_dir =  "/opt/services/core/nomad/data"
bind_addr = "0.0.0.0" # the default

leave_on_interrupt= true
#https://github.com/hashicorp/nomad/issues/17093
#systemctl kill -s SIGTERM nomad will suppress node drain if
#leave_on_terminate set to false
leave_on_terminate = true

advertise {
  # Defaults to the first private IP address.
  http = "10.21.21.42"
  rpc  = "10.21.21.42"
  serf = "10.21.21.42"
}
client {
  enabled = true
  network_interface = "eth1"
  meta {
    node_type= "worker"
    connect.log_level = "debug"
    connect.sidecar_image= "registry.cloud.private/envoyproxy/envoy:v1.26.2"
  }
  server_join {
    retry_join =  ["10.21.21.41"]
    retry_max = 0
    retry_interval = "15s"
  }
  # Either leave_on_interrupt or leave_on_terminate must be set
  # for this to take effect.
  drain_on_shutdown {
    deadline           = "2m"
    force              = false
    ignore_system_jobs = false
  }
  host_volume "ca_cert" {
    path      = "/usr/local/share/ca-certificates/cloudlocal"
    read_only = true
  }
  host_volume "cert_ingress" {
    path      = "/etc/opt/certs/ingress"
    read_only = true
  }
  ## Cert consul client
  ## Needed for consul_sd_configs
  ## Should be deleted after resolve https://github.com/suikast42/nomadder/issues/100
  host_volume "cert_consul" {
    path      = "/etc/opt/certs/consul"
    read_only = true
  }

  ## Cert consul client
  ## Needed for jenkins
  ## Should be deleted after resolve https://github.com/suikast42/nomadder/issues/100
  host_volume "cert_nomad" {
    path      = "/etc/opt/certs/nomad"
    read_only = true
  }

  ## Cert docker client
  ## Needed for jenkins
  ## Should be deleted after migrating to vault
  host_volume "cert_docker" {
    path      = "/etc/opt/certs/docker"
    read_only = true
  }

  host_network "public" {
    interface = "eth0"
    #cidr = "203.0.113.0/24"
    #reserved_ports = "22,80"
  }
  host_network "default" {
      interface = "eth1"
  }
  host_network "private" {
    interface = "eth1"
  }
  host_network "local" {
    interface = "lo"
  }

  reserved {
  # cpu (int: 0) - Specifies the amount of CPU to reserve, in MHz.
  # cores (int: 0) - Specifies the number of CPU cores to reserve.
  # memory (int: 0) - Specifies the amount of memory to reserve, in MB.
  # disk (int: 0) - Specifies the amount of disk to reserve, in MB.
  # reserved_ports (string: "") - Specifies a comma-separated list of ports to reserve on all fingerprinted network devices. Ranges can be specified by using a hyphen separating the two inclusive ends. See also host_network for reserving ports on specific host networks.
    cpu    = 1000
    memory = 2048
  }
  max_kill_timeout  = "1m"
}

tls {
  http = true
  rpc  = true

  ca_file   = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  cert_file = "/etc/opt/certs/nomad/nomad.pem"
  key_file  = "/etc/opt/certs/nomad/nomad-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}

consul{
  ssl= true
  address = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"
  # this works only with ACL enabled
  allow_unauthenticated= true
  ca_file   = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  grpc_ca_file   = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  cert_file = "/etc/opt/certs/consul/consul.pem"
  key_file  = "/etc/opt/certs/consul/consul-key.pem"
}


telemetry {
  collection_interval = "1s"
  disable_hostname = true
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}

plugin "docker" {
  config {
    allow_privileged = false
    disable_log_collection  = false
#    volumes {
#      enabled = true
#      selinuxlabel = "z"
#    }
    infra_image = "registry.cloud.private/google_containers/pause-amd64:3.2"
    infra_image_pull_timeout ="30m"
    extra_labels = ["job_name", "job_id", "task_group_name", "task_name", "namespace", "node_name", "node_id"]
    logging {
      type = "journald"
       config {
          labels-regex =".*"
       }
    }
    gc{
      container = true
      dangling_containers{
        enabled = true
      # period = "3m"
      # creation_grace = "5m"
      }
    }

  }
}

How can I check which signal is sent from the OS to the systemd service?
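
One way to check (a sketch using standard systemd tooling, not from the original thread, assuming the unit is installed as nomad.service):

# Show the stop signal and kill mode systemd is configured to use
systemctl show nomad.service --property=KillSignal,KillMode

# Inspect the journal from the previous boot to see how the unit was stopped
journalctl -u nomad.service -b -1 --no-pager | tail -n 50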

@shoenig
Member

shoenig commented Jun 6, 2023

Ah, I was finally able to reproduce it, @suikast42, thanks for the extra info. I'm not sure what the underlying problem is yet, but at least I can investigate now.

Nomad - Community Issues Triage automation moved this from Needs Triage to Done Jun 9, 2023
shoenig added a commit that referenced this issue Jun 9, 2023
…#17455)

This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, the node comes back up. See the issue below for
full repro conditions.

Basically in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil - which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
would not be able to find the pause container to remove. Now, we manually
find the pause container by scanning them and looking for the associated
allocID.

Fixes #17299
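
Since the pause containers are named nomad_init_<alloc ID> (visible in the docker ps output above), the lookup described in the commit can be illustrated from the CLI (a rough sketch of the idea, not the actual Go implementation in the fix):

# List any pause container whose name embeds a given allocation ID
ALLOC_ID="880a05ce-2dfe-ac5f-a3eb-c0fdcde6bcb6"   # example alloc ID taken from the docker ps output above
docker ps -a --filter "name=nomad_init_${ALLOC_ID}" --format '{{.ID}} {{.Names}} {{.Status}}'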