Broken NOMAD_HOST_PORT_<label> for "host" mode in Nomad 0.9 #5587

Closed
ole-lukoe opened this issue Apr 20, 2019 · 6 comments · Fixed by #5641

@ole-lukoe

Nomad version

Nomad v0.9.0 (18dd590)

Operating system and Environment details

Ubuntu 16.04

Issue

I'm using the ability to allocate random port numbers in "host" networking mode to bind Docker containers to the LAN interface. With the latest release, however, the NOMAD_HOST_PORT_<label> variables are all equal to 0.

Desired

$ set | grep NOMAD_HOST_PORT
NOMAD_HOST_PORT_http='27700'
NOMAD_HOST_PORT_tcp='26954'

It works in 0.8.7

Now

$ set | grep NOMAD_HOST_PORT
NOMAD_PORT_http='0'
NOMAD_PORT_tcp='0'

Job file (if appropriate)

task "statsd" { 
    driver = "docker"
    config {
        network_mode = "host"
        image = "prom/statsd-exporter"
        port_map {
            http = 9102
            tcp = 9125
        }
        args = [
            "--statsd.mapping-config=/statsd/statsd.conf",
            "--web.listen-address=${NODE_LOCAL_IP}:${NOMAD_HOST_PORT_http}",
            "--statsd.listen-tcp=${NODE_LOCAL_IP}:${NOMAD_HOST_PORT_tcp}",
        ]
    }

    template {
        data = <<EOH
        {{- with node }}
        NODE_LOCAL_IP="{{ .Node.Address }}"{{ end }}
        EOH
        destination = "secrets/file.env"
        env         = true
    }

    service {
        name = "statsd-web"
        port = "http"
    }
    resources {
        cpu    = 200
        memory = 256
        network {
            port "http" { }
            port "tcp" { }
        }
    }
}
@cgbaker
Contributor

cgbaker commented Apr 22, 2019

Hi @ole-lukoe ,

I wasn't able to reproduce the issue with v0.9.0 (18dd590). The job spec above was failing because the /statsd/statsd.conf file was missing:

$ nomad job run repro.nomad
==> Monitoring evaluation "b1a70182"
    Evaluation triggered by job "repro"
    Allocation "f013a2e4" created: node "256030af", group "repro"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "b1a70182" finished with status "complete"

$ nomad alloc logs -stderr f013
...
time="2019-04-22T15:42:35Z" level=fatal msg="Error loading config:open /statsd/statsd.conf: no such file or directory" source="main.go:202"

I commented out that argument (--statsd.mapping-config) and re-ran the provided job spec. The docker container was up and running:

$ docker exec -ti 8b8 env | grep HOST_PORT
NOMAD_HOST_PORT_http=23829
NOMAD_HOST_PORT_tcp=23151

$ docker inspect 8b8 | jq '.[0].Config.Env[] | select(startswith("NOMAD_HOST_PORT"))'
"NOMAD_HOST_PORT_http=23829"
"NOMAD_HOST_PORT_tcp=23151"

$ docker inspect 8b8 | jq '.[0].Config.Cmd'
[
  "--web.listen-address=127.0.0.1:23829",
  "--statsd.listen-tcp=127.0.0.1:23151"
]

Can you please post the status of the allocation and the result of docker inspect on the running container? Also, the Now result pasted above doesn't look quite right... Maybe a copy-paste error?

@kpettijohn

I have been playing with Nomad 0.9.0 and noticed similar behavior with the docker driver, but in my case I am using the default network_mode. I suspect it has something to do with my Nomad client configuration and how it's attempting to fingerprint IP addresses, but that is more of a guess at this point.

Job file

job "test" {
  datacenters = ["pjsh"]
  type = "service"

  group "nginx" {
    task "httpsrv" {
      driver = "docker"
      config {
        image = "nginx"
        port_map {
          nginx = 80
        }
      }

      resources {
        cpu    = 100
        memory = 64
        network {
          mbits = 20
          port "nginx" {}
        }
      }
    }
  }
}

Job alloc

nomad run test.hcl 
==> Monitoring evaluation "4f1a9926"
    Evaluation triggered by job "test"
    Allocation "f9172b3f" created: node "c466ee7f", group "nginx"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "4f1a9926" finished with status "complete"

Verified the container is running.

nomad alloc logs f91
10.10.10.10 - - [24/Apr/2019:01:02:44 +0000] "GET / HTTP/1.1" 200 612 "-" "..." "-"

Docker inspect and exec output

Verify the correct container ID.

docker ps |grep f91
5f5a20faa650        nginx               "nginx -g 'daemon of…"   24 minutes ago      Up 24 minutes       10.10.10.31:23958->80/tcp, 10.10.10.31:23958->80/udp   httpsrv-f9172b3f-39b2-3e8a-dd27-a08aa65431b6

Docker exec env output.

docker exec -it 5f5a20faa650 env|grep -E 'HOST|PORT|IP|ADDR'
HOSTNAME=5f5a20faa650
NOMAD_ADDR_nginx=:0
NOMAD_HOST_PORT_nginx=0
NOMAD_IP_nginx=
NOMAD_PORT_nginx=0

Docker inspect output.

docker inspect 5f5a20faa650 | jq '.[0].Config.Env[] | select(test("^NOMAD_(IP|ADDR|PORT|HOST_PORT)"))'
"NOMAD_ADDR_nginx=:0"
"NOMAD_HOST_PORT_nginx=0"
"NOMAD_IP_nginx="
"NOMAD_PORT_nginx=0"
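
The symptom in the output above (port variables set to 0) can be checked mechanically. What follows is only a minimal sketch, not part of Nomad: `zeroPortVars` is a hypothetical helper that takes `KEY=VALUE` strings, such as the entries of `.Config.Env` from `docker inspect`, and returns the names of the `NOMAD_PORT_*` and `NOMAD_HOST_PORT_*` variables whose value is `0`.

```go
package main

import (
	"fmt"
	"strings"
)

// zeroPortVars is a hypothetical helper (not a Nomad API). It returns the
// names of NOMAD_PORT_* and NOMAD_HOST_PORT_* variables whose value is "0",
// in the order they appear in env.
func zeroPortVars(env []string) []string {
	var zero []string
	for _, kv := range env {
		// Split only on the first "=" so values containing "=" stay intact.
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			continue
		}
		name, value := parts[0], parts[1]
		isPort := strings.HasPrefix(name, "NOMAD_PORT_") ||
			strings.HasPrefix(name, "NOMAD_HOST_PORT_")
		if isPort && value == "0" {
			zero = append(zero, name)
		}
	}
	return zero
}

func main() {
	// Values copied from the docker inspect output above.
	env := []string{
		"NOMAD_ADDR_nginx=:0",
		"NOMAD_HOST_PORT_nginx=0",
		"NOMAD_IP_nginx=",
		"NOMAD_PORT_nginx=0",
	}
	for _, name := range zeroPortVars(env) {
		fmt.Println("no port assigned:", name)
	}
}
```

A healthy container, like the one in cgbaker's comment, would yield an empty list.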

Nomad client configuration

Container Linux by CoreOS stable (2023.5.0)

Nomad v0.9.0 (18dd590)

data_dir = "/var/lib/nomad"
bind_addr = "10.10.10.31"
datacenter = "pjsh"
log_level = "DEBUG"
client {
  enabled = true
  network_interface = "enp7s0"
}
consul {
  address = "127.0.0.1:8500"
}

telemetry {
  collection_interval = "1s"
  disable_hostname = true
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}

Docker version:

docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.8
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:16:31 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:16:31 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Consul version:

consul version
Consul v1.4.4

@dansteen

I had this problem as well when running client version 0.9.1 against server version 0.8.4. It resolved itself once I updated the server.

@kpettijohn

@dansteen thanks for the info! Turns out I was running a 0.8.7 server and bumping to 0.9.1 resolved the issue as you noted. Thanks again!

@notnoop
Contributor

notnoop commented May 2, 2019

Thanks for raising this and for the hint about 0.8.7 server! I was able to reproduce it and I aim to fix it soon, as we do want to support 0.9 clients against 0.8 servers to ease upgrades (we don't recommend this configuration for long though).

@notnoop notnoop self-assigned this May 2, 2019
notnoop pushed a commit that referenced this issue May 2, 2019
Fixes #5587

When a nomad 0.9 client is handling an alloc generated by a nomad 0.8
server, we should check the alloc.TaskResources for networking details
rather than task.Resources.

We check alloc.TaskResources for networking for other tasks in the task
group [1], so it's a bit odd that we used the task.Resources struct
here.  TaskRunner also uses `alloc.TaskResources`[2].

The task.Resources struct in 0.8 was sparsely populated, resulting in
0 being stored in the port mapping env vars:

```
vagrant@nomad-server-01:~$ nomad version
Nomad v0.8.7 (21a2d93+CHANGES)
vagrant@nomad-server-01:~$ nomad server members
Name                    Address      Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad-server-01.global  10.199.0.11  4648  alive   true    2         0.8.7  dc1         global
vagrant@nomad-server-01:~$ nomad alloc status -json 5b34649b | jq '.Job.TaskGroups[0].Tasks[0].Resources.Networks'
[
  {
    "CIDR": "",
    "Device": "",
    "DynamicPorts": [
      {
        "Label": "db",
        "Value": 0
      }
    ],
    "IP": "",
    "MBits": 10,
    "ReservedPorts": null
  }
]
vagrant@nomad-server-01:~$ nomad alloc status -json 5b34649b | jq '.TaskResources'
{
  "redis": {
    "CPU": 500,
    "DiskMB": 0,
    "IOPS": 0,
    "MemoryMB": 256,
    "Networks": [
      {
        "CIDR": "",
        "Device": "eth1",
        "DynamicPorts": [
          {
            "Label": "db",
            "Value": 21722
          }
        ],
        "IP": "10.199.0.21",
        "MBits": 10,
        "ReservedPorts": null
      }
    ]
  }
}
```

Also, updated the test values to mimic how Nomad 0.8 structs are
represented, and made its result match the non compact values in
`TestEnvironment_AsList`.

[1] https://github.com/hashicorp/nomad/blob/24e9040b18a4f893e2f353288948a0f7cd9d82e4/client/taskenv/env.go#L624-L639
[2] https://github.com/hashicorp/nomad/blob/master/client/allocrunner/taskrunner/task_runner.go#L287-L303
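
To illustrate the difference the commit message describes, here is a rough sketch in Go. It does not reproduce Nomad's actual code: `Port`, `NetworkResource`, and `portEnv` are hypothetical, simplified stand-ins that cover only the fields visible in the JSON output above. The point is that the same env-building logic yields zeros when fed the job spec's network block and real values when fed the allocation's `TaskResources`.

```go
package main

import (
	"fmt"
	"sort"
)

// Port and NetworkResource are hypothetical, trimmed-down stand-ins for the
// fields shown in the JSON above; they are not Nomad's real structs.
type Port struct {
	Label string
	Value int
}

type NetworkResource struct {
	IP           string
	DynamicPorts []Port
}

// portEnv derives NOMAD_IP_, NOMAD_HOST_PORT_ and NOMAD_ADDR_ style variables
// from a list of networks. It is an illustration, not Nomad's env builder.
func portEnv(networks []NetworkResource) map[string]string {
	env := make(map[string]string)
	for _, n := range networks {
		for _, p := range n.DynamicPorts {
			env["NOMAD_IP_"+p.Label] = n.IP
			env["NOMAD_HOST_PORT_"+p.Label] = fmt.Sprintf("%d", p.Value)
			env["NOMAD_ADDR_"+p.Label] = fmt.Sprintf("%s:%d", n.IP, p.Value)
		}
	}
	return env
}

// printEnv prints the variables in sorted order so the output is stable.
func printEnv(title string, env map[string]string) {
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(title)
	for _, k := range keys {
		fmt.Printf("  %s=%s\n", k, env[k])
	}
}

func main() {
	// Job spec as sent by a 0.8 server: the port is requested, not yet assigned.
	jobSpec := []NetworkResource{{DynamicPorts: []Port{{Label: "db", Value: 0}}}}
	// alloc.TaskResources: the scheduler has filled in the IP and the port.
	allocated := []NetworkResource{{
		IP:           "10.199.0.21",
		DynamicPorts: []Port{{Label: "db", Value: 21722}},
	}}

	printEnv("from the job spec's task resources:", portEnv(jobSpec))
	printEnv("from alloc.TaskResources:", portEnv(allocated))
}
```

With the job spec input, the sketch produces an empty IP, a port of 0, and an address of `:0`, which matches the values reported earlier in this thread.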
notnoop pushed a commit that referenced this issue May 8, 2019
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 24, 2022