Consul Connect service health checks not accessible? #9907

Open
evandam opened this issue Jan 27, 2021 · 11 comments

@evandam

evandam commented Jan 27, 2021

Nomad version

Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

Operating system and Environment details

Ubuntu 18.04

Issue

When a service binds its port to localhost only (e.g. 127.0.0.1:8080), Consul health checks cannot reach it, and I'm unable to use check options like expose or address_mode to work around this.

I would expect this to be a pretty common approach if I understand correctly (to avoid leaking ports that could be accessed outside of Consul Connect). Can the guides/docs add steps for health checks in https://www.nomadproject.io/docs/integrations/consul-connect?

Reproduction steps

Using the following job, try adding expose = true or address_mode = "driver" to the check and note the errors.

With expose = true:

❯ nomad job run debug/python_http.hcl
Error submitting job: Unexpected response code: 500 (error in job mutator expose-check: unable to determine local service port for service check app->python-http->python-http-health)

This happens even if I pass port = "8080" in the check configuration.

With address_mode = "driver":

The job is deployed, but the task fails with the following log:

failed to setup alloc: pre-run hook "group_services" failed: error getting address for check "python-http-health": cannot use address_mode="driver": no driver network exists

Job file (if appropriate)

job "python-http" {
  datacenters = ["kitchen"]

  group "app" {
    network {
      mode = "bridge"
      port "http" {}
    }

    task "python-http" {
      driver = "docker"

      config {
        image = "python:3"
        command = "python3"
        args = [
          "-m",
          "http.server",
          "-b",
          "127.0.0.1",
          "${NOMAD_PORT_http}",
        ]
      }

      env {
        PYTHONUNBUFFERED = "1"
      }

      resources {
        cpu = 20
        memory = 100
      }
    }

    service {
      name = "python-http"
      port = "http"

      check {
        type     = "http"
        name     = "python-http-health"
        path     = "/"
        interval = "10s"
        timeout  = "3s"
        # address_mode = "driver"
        # expose = true
      }

      connect {
        sidecar_service {}
      }
    }
  }
}
@evandam
Author

evandam commented Jan 27, 2021

After a decent amount of trial and error, it looks like the problem is caused by using a named port in the service instead of a hard-coded port.

I'm not sure if this is a bug or expected behavior, but it's certainly confusing. Any chance docs could capture this either way?
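
For readers who want to see what that change looks like, here is a minimal, untested sketch of the job file above rewritten with a hard-coded service port. It assumes the application listens on 8080 (the port mentioned in the issue description) and introduces a hypothetical dynamic port label, "healthcheck", for the exposed check listener. It follows the pattern discussed in this thread and has not been verified against a cluster.

```hcl
# Sketch only: values come from the job file above unless marked hypothetical.
job "python-http" {
  datacenters = ["kitchen"]

  group "app" {
    network {
      mode = "bridge"

      # Hypothetical label: a dynamic port used only for the exposed check.
      port "healthcheck" {}
    }

    service {
      name = "python-http"

      # Hard-coded port the application listens on inside the network
      # namespace (assumed to be 8080 here), instead of a named port.
      port = "8080"

      check {
        type     = "http"
        name     = "python-http-health"
        path     = "/"
        port     = "healthcheck"
        expose   = true
        interval = "10s"
        timeout  = "3s"
      }

      connect {
        sidecar_service {}
      }
    }

    task "python-http" {
      driver = "docker"

      config {
        image   = "python:3"
        command = "python3"

        # The server still binds to localhost only, on the hard-coded port.
        args = ["-m", "http.server", "-b", "127.0.0.1", "8080"]
      }

      env {
        PYTHONUNBUFFERED = "1"
      }

      resources {
        cpu    = 20
        memory = 100
      }
    }
  }
}
```

The only differences from the original job are the literal service port, the separate port label on the check, and the literal port in the task arguments.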

@tgross tgross added theme/consul theme/docs Documentation issues and enhancements stage/needs-investigation labels Jan 28, 2021
@idrennanvmware
Contributor

idrennanvmware commented Jan 28, 2021

@evandam, given that you're running in the mesh, is there a reason you aren't using hard-coded ports? Since it's all internal, there's no chance of conflict. Here's an example of how we're doing it:

 group "<redacted>-group" {
    count = [[ .api.count ]]

    constraint {
      attribute = "${meta.general_compute_linux}"
      value     = "true"
    }
    
    network {
      mode = "bridge"
      port "exposed"{}
    }

    service {
      name         = "<redacted>"
      tags         = [ "http" ]
      port         = "9090"
      check {
        expose   = true
        type     = "http"
        port     = "exposed"
        path     = "/hc"
        interval = "10s"
        timeout  = "5s"
      }
      
      connect {
        sidecar_service {
          proxy {}
        }
      }
    }

and our task (snipped)

task "<redacted>" {
   driver = "docker"
  
   config {
     image        = "<redacted>"
     volumes      = [
       "local/overrides:/app/overrides"
     ]
     cpu_hard_limit = true
   }

   env {
     ASPNETCORE_URLS         = "http://+:9090"
   }

   resources {
     cpu    = [[ .api.resources.cpu ]] # Mhz
     memory = [[ .api.resources.memory ]] # MB
   }
 }
}

@evandam
Author

evandam commented Jan 28, 2021

Hey @idrennanvmware, after learning this was the issue there's not necessarily a requirement to use named ports, but generally I like using them for readability. I also wouldn't have expected the behavior to be different when using named/hard-coded ports, so it just seems like a point of confusion.

@krishicks
Contributor

Hey @evandam! Thanks for raising the issue.

What do you think about the following update?

The port in the service stanza is the port the API service listens on. The
Envoy proxy will automatically route traffic to that port inside the network
-namespace.
+namespace. Note that this cannot be a named port; it must be a hard-coded port
+value.

@krishicks krishicks self-assigned this Feb 8, 2021
@evandam
Author

evandam commented Feb 8, 2021

Sounds good to me, thanks!

@xeroc

xeroc commented Mar 10, 2021

This explains my issue here.

Thanks for making it clear

@tgross tgross moved this from In Progress to Needs Triage in Nomad - Community Issues Triage Mar 18, 2021
tgross pushed a commit that referenced this issue Mar 24, 2021
@tgross tgross added type/enhancement and removed theme/docs Documentation issues and enhancements labels Mar 24, 2021
@tgross tgross removed this from Needs Triage in Nomad - Community Issues Triage Mar 24, 2021
@tgross
Member

tgross commented Mar 24, 2021

#10225 will fix the docs, and I'm going to keep this issue open as a feature request to fix.

@mircea-c

Any timeline on this fix at the moment? It's a real pain not being able to use dynamic ports in service definitions.

@Oloremo
Contributor

Oloremo commented Jul 15, 2022

Any updates on this?

@bradydean

I've noticed that dynamic port labels can be used without causing any errors (granted I still have errors, but I think they're unrelated). Is this expected now?

@ElectroTiger

ElectroTiger commented Oct 9, 2023

As of October 2023, the workaround documented here seems to enable usage of dynamic ports: https://discuss.hashicorp.com/t/port-mapping-with-nomad-and-consul-connect/16738/5
