Unable to define http/s health checks for Network Loadbalancer #2708

Closed
jkroepke opened this issue Dec 19, 2017 · 28 comments · Fixed by #2906
Labels: bug (addresses a defect in current functionality), regression (pertains to a degraded workflow resulting from an upstream patch or internal enhancement)
Milestone: v1.7.0

Comments

@jkroepke

Hi there,

Thank you for opening an issue. Please note that we try to keep the Terraform issue tracker reserved for bug reports and feature requests. For general usage questions, please see: https://www.terraform.io/community.html.

Terraform Version

Terraform 0.11.1 with AWS Provider 1.6

Affected Resource(s)

Please list the resources as a list, for example:

  • aws_lb_target_group

If this issue appears to affect multiple resources, it may be an issue with Terraform's core, so please mention this.

Terraform Configuration Files

resource "aws_lb_target_group" "tcp" {
 # ...
 protocol    = "TCP"
 # ...
 health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = "10"
    port                = "443"
    path                = "/healthz"
    protocol            = "HTTPS"
    interval            = 30
    matcher             = "200-399"
  }
}

Debug Output

Please provide a link to a GitHub Gist containing the complete debug output: https://www.terraform.io/docs/internals/debugging.html. Please do NOT paste the debug output in the issue; just paste a link to the Gist.

Expected Behavior

Set up the expected network load balancer, as in AWS provider 1.5.

Actual Behavior

* module.lb_internal_master.aws_lb_target_group.tcp: 1 error(s) occurred:

* module.lb_internal_master.aws_lb_target_group.tcp: arn:aws:elasticloadbalancing:eu-central-1:191844718867:targetgroup/openshift-tg-internal-master-443/652998c18664d76d: custom matcher is not supported for target_groups with TCP protocol

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply

Important Factoids

N/A

References

N/A

@gaelL

gaelL commented Dec 19, 2017

Hi, same issue here with the path.

Looking at the AWS doc http://docs.aws.amazon.com/fr_fr/elasticloadbalancing/latest/APIReference/API_CreateTargetGroup.html

The HTTP health check parameters path and matcher should be available for Network Load Balancers.

Might be related to a6d1266

@jasonkuehl

I just found this error myself.

Terraform 0.11.1

  • provider.aws: version = "~> 1.6"

Example code

resource "aws_lb_target_group" "testexternal" {
  name     = "testexternal"
  protocol = "TCP"
  port     = 22
  vpc_id      = "${aws_vpc.bla.id}"

  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
  }
}

resource "aws_lb" "testexternal" {
  name                        = "testserver"

  load_balancer_type          = "network"
  internal                    = false
  subnets                     = ["${module.subnet.ELB-subnet-ids}"]
  enable_deletion_protection  = true
}

resource "aws_lb_listener" "testexternal" {
  load_balancer_arn = "${aws_lb.testexternal.arn}"
  protocol          = "TCP"
  port              = "22"

  default_action {
    target_group_arn = "${aws_lb_target_group.testexternal.arn}"
    type             = "forward"
  }
}

resource "aws_lb_target_group_attachment" "testexternal" {
  target_group_arn = "${aws_lb_target_group.testexternal.arn}"
  target_id        = "${aws_instance.bla-002.id}"
  port             = 22
}

@meyertime

I ran into this issue as well.

My theory is that it is checking the rules based on the target group protocol (which in this case is TCP) when it should be checking against the health check protocol (which is HTTPS). Neither path nor matcher applies to TCP health checks, but both apply to HTTP and HTTPS health checks, even when the target group's protocol is TCP.

I might also mention that even version 1.5 had some wonky behavior regarding health checks on network load balancers / TCP target groups. This is because network load balancers have certain restrictions on the health checks that application load balancers do not: The unhealthy threshold must equal the healthy threshold, timeout is fixed to 10 and cannot be changed, interval has only two possible values (10 and 30), and matcher (if applicable) is fixed to 200-399.

Unfortunately, terraform would still try to manage these parameters even if they weren't supplied. This led to errors in some cases. In others, it would work upon creation, but then a subsequent apply would detect changes to those parameters and attempt to fix them, resulting in errors. This made it necessary to specify all the parameters and make sure they were valid for network load balancers. For instance, matcher = "200-399" had to be specified in this case in order to avoid errors, even though matcher is always that value and cannot be changed. However, now in 1.6, terraform won't let you specify matcher. I haven't had a chance to see what happens in 1.6 when you don't specify a matcher, though, because in this case we need to specify a path, and that is not allowed now either. So we have to revert to 1.5 for now to work around this.

In any case, the rules will have to take into account not only the protocol of the target group (which determines whether it's an alb or nlb) but also the protocol of the health check.

@apparentlymart apparentlymart added bug Addresses a defect in current functionality. regression Pertains to a degraded workflow resulting from an upstream patch or internal enhancement. labels Dec 20, 2017
@apparentlymart
Member

Hi all! Sorry for the regression here.

It seems that this is caused by the additional validation checks added in #2380. The goal of these changes was to catch more errors at plan time that were previously only caught at apply time, regarding the various subtle differences between application and network load balancers.

@deftflux is correct that the validation code is checking the target group protocol to recognize whether a given target group is an application or network target group, but indeed it does seem like the health check protocol is the correct thing to check in this case, per the relevant API documentation, which describes this particular property (Matcher) as being for "HTTP/HTTPS health checks" rather than for network load balancers in particular.

It seems that the same bug exists for HealthCheckPath. The docs also seem to disagree with our implementation about the timeout attribute: we currently permit it only for HTTP/HTTPS target groups, but the docs suggest it can work for all target groups, with a different range of valid values and a different default depending on the health check protocol.

@apparentlymart
Member

After playing with this some more it seems like the current validation is correct here, per what's enforced by the underlying API. After weakening the check in the provider, I see the following error from the remote API during apply:

InvalidConfigurationRequest: Custom health check matchers are not supported for health checks for target groups with the TCP protocol

The logic I'd implemented -- based on the documentation -- was to allow custom health check matchers if the healthcheck protocol is HTTP, but it seems that there is an undocumented additional restriction that Matcher may not be set for TCP target groups, regardless of the healthcheck protocol.

However, I see that you all saw something working prior to this validation being added, so I'm now trying to figure out what the old implementation (prior to 1.6) was actually doing in this scenario that was allowing it to work.

@apparentlymart
Member

As far as I can tell, this was only working before because it was totally ignoring these attributes:

https://github.com/terraform-providers/terraform-provider-aws/blob/840a82babd3ef0deed25ca7e06104f998577bbab/aws/resource_aws_lb_target_group.go#L218-L224

So while indeed this wasn't an error before, it seems like it was never actually working. In principle we could restore the previous behavior of just silently ignoring these arguments for TCP target groups, but that seems counter to Terraform's usual goal of doing what it says it will do, or failing loudly if it can't.

Given that these attributes were not functional before anyway, I'd like to propose that we move forward with these additional checks in place (arguably it was a bug that these checks were not present before) and require removing these previously-non-functional attributes from configuration when upgrading to 1.6 and above. Of course we ideally would've noticed this change in behavior and included it in the 1.6 changelog, which we can do now retroactively although it won't be visible within the v1.6.0 tag's version of the changelog since that is now frozen.

Please let me know if any of you have a use-case where including these arguments even though they are ignored is important; we can then think about how we might strike a compromise to retain the now-more-correct validation while still making those use-cases work.

Sorry for the accidental undocumented compatibility break here! 😖

@meyertime

Thanks for looking into this @apparentlymart !

I did a little testing, and there is still a problem with 1.6. Consider this test configuration:

resource "aws_lb_target_group" "foo" {
    name = "tf-nlb-health-check-test"
    protocol = "TCP"
    port = "1234"
    vpc_id = "${local.vpc}"

    health_check {
        protocol = "HTTPS"
        port = 12345
        #path = "/custom/path"
        #matcher = "200-399"
        interval = 30
        #timeout = 10
        healthy_threshold = 3
        unhealthy_threshold = 3
    }
}

The lines that are commented out are the ones that I used in 1.5 but that are considered invalid now in 1.6. In 1.5 with those lines uncommented, it works for both creation and subsequent plan and apply.

With those lines commented out, 1.6 will create the target group successfully. However, subsequent plan or apply will produce the following error:

aws_lb_target_group.foo: Refreshing state... (ID: arn:aws:elasticloadbalancing:us-east-1:...nlb-health-check-test/f37488d894f4b0a6)

Error: Error running plan: 1 error(s) occurred:

* aws_lb_target_group.foo: 1 error(s) occurred:

* aws_lb_target_group.foo: arn:aws:elasticloadbalancing:us-east-1:365567845318:targetgroup/tf-nlb-health-check-test/f37488d894f4b0a6: custom matcher is not supported for target_groups with TCP protocol

Apparently, terraform is detecting a change in the matcher, presumably because normally the default matcher is assumed by terraform to be 200, but for TCP target groups, AWS locks it to 200-399. That's why previously, explicitly specifying the correct default fixed the problem, but that workaround is no longer possible in 1.6. (However, it is strange that this error happens while refreshing the state rather than when applying a change.)

So it looks like these attributes are being correctly ignored when creating a TCP target group, but not when managing an existing TCP target group.

As far as requiring the removal of these attributes, I suppose it is a bit of a bug that we had to explicitly specify them previously. But we could still maintain backwards compatibility if we allowed the only valid value to be specified, as in the commented lines above. I would definitely be in favor of at least fixing it so that we do not have to specify those attributes, however.

@arminbuerkle

I'm currently experiencing the same problem.

I created an aws_lb_target_group in v1.5.0 with matcher = "200-399".
When I upgraded to v1.6.0 I got the following message, even after removing the matcher attribute:

custom matcher is not supported for target_groups with TCP protocol

When I downgrade to v1.5.0 again and remove the matcher attribute I get:

Error modifying Target Group: ValidationError: Health check matcher HTTP code cannot be empty

How is an upgrade supposed to happen with the current implementation?
Currently the only option I have is to stay on v1.5.0 with matcher set.

@meyertime

You are right, @arminbuerkle. Currently in 1.6, there is no way to have an HTTP/S health check for a TCP target group without getting errors on subsequent plan or apply.

@apparentlymart
Member

Thanks for that extra detail @deftflux, and sorry for the silence while I was on my holiday break.

Indeed it does seem like there is an issue here based on what you described. I suppose what's happening here is that we're reading back some server-provided defaults from the API that are then causing validation to fail.

Probably the best solution for that would be to add some extra checks to the Read implementation to force the relevant attributes to be saved as empty regardless of what the API returns, so that the values in state stay consistent with the empty values we now require in the configuration.

@jantman

jantman commented Jan 8, 2018

I guess I may have gotten a bit lost in the discussion here, but I'm hitting this issue as well. Via both awscli and the Console UI, I can create an NLB with an HTTP health check against a custom path. Assuming the AWS Console UI is correct: a TCP target group can have an HTTP health check, and the configurable properties on it are Path, Port, and Healthy Threshold. Unhealthy Threshold, Timeout, Interval, and Matcher ("Success Codes" in the UI) are grayed out and fixed values.
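
For illustration, a minimal sketch of the shape described above in 0.11-era syntax (the resource name, port numbers, and path are placeholders of mine, not values from this thread); this is what the provider should accept once the validation follows the Console's rules:

resource "aws_lb_target_group" "example" {
  name     = "example-nlb-tg"
  protocol = "TCP"
  port     = 443
  vpc_id   = "${var.vpc_id}"

  health_check {
    protocol          = "HTTP"
    path              = "/healthz" # configurable per the Console UI
    port              = 8080       # configurable per the Console UI
    healthy_threshold = 3          # configurable; AWS forces the unhealthy threshold to match
    # interval, timeout, and matcher are fixed for TCP target groups, so they stay unset
  }
}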

@whereisaaron
Contributor

Hi @jantman, yes, you are correct: you can create TCP Target Groups with HTTP health checks, and Terraform should be letting us do that too, but isn't right now.

What you can't do is create a TCP Target Group with a TCP health check and then change the health check to HTTP (likewise for any protocol change, even HTTP -> HTTPS). Changing the health check protocol requires destroying and recreating the Target Group.

Likewise, unhealthy threshold, timeout, interval, and success codes (matcher) can be set on creation, but all require the Target Group to be recreated to change them (which seems harsh!).

Another note about 'interval': in the UI, for HTTP/HTTPS health checks you can freely set the interval, but for TCP health checks you can only choose 10 or 30 seconds. That might be an API restriction too, though I guess Terraform could leave that validation up to AWS.
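
On the recreation point: a common mitigation (an assumption on my part, not something tested in this thread) is to give the Target Group a name_prefix and create_before_destroy, so that when a replacement is planned the new group can be created alongside the old one without a name collision. Note this only helps once Terraform actually plans the replacement, which is the missing piece here:

resource "aws_lb_target_group" "example" {
  name_prefix = "tg-" # generated suffix avoids a name clash with the group being replaced
  protocol    = "TCP"
  port        = 443
  vpc_id      = "${var.vpc_id}"

  lifecycle {
    create_before_destroy = true # create the replacement group before destroying the old one
  }
}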

@bflad
Member

bflad commented Jan 12, 2018

This has been released in terraform-provider-aws version 1.7.0. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@bflad bflad added this to the v1.7.0 milestone Jan 12, 2018
@whereisaaron
Contributor

PR #2906 lets you create HTTP/HTTPS health checks now 🎉, but it doesn't fully resolve validation. The protocol can be changed between HTTP and HTTPS, but changing to or from TCP should trigger a recreation plan for the Target Group.

* module.nlb.aws_lb_target_group.nlb[0]: 1 error(s) occurred:

* aws_lb_target_group.nlb.0: Error modifying Target Group: InvalidConfigurationRequest: You cannot change the health check protocol for a target group with the TCP protocol
        status code: 400, request id: 1234567-f7db-11e7-8d49-f37d7e7f4cf3

@iancward
Contributor

iancward commented Apr 4, 2018

@whereisaaron I just ran into this issue switching from TCP to HTTP/HTTPS (the Target Group needs to be re-created and Terraform doesn't know that); was a separate issue ever opened to address that?

@whereisaaron
Contributor

@iancward no, sorry, I don't know of any issue for this bug where recreation isn't triggered. I currently have to intervene manually.

@maneesh8

@whereisaaron it doesn't let me change to a custom health check with the TCP protocol, while the AWS console does let me. So I don't think this is something rejected by the AWS APIs.

wking added a commit to wking/openshift-installer that referenced this issue Dec 16, 2018
As suggested by Dani Comnea [1].  When we switched to network load
balancers in 16dfbb3 (data/aws: use nlbs instead of elbs,
2018-11-01, openshift#594), we replaced things like:

  resource "aws_elb" "api_internal" {
    ...
    health_check {
      healthy_threshold   = 2
      unhealthy_threshold = 2
      timeout             = 3
      target              = "SSL:6443"
      interval            = 5
    }
    ...
  }

with:

  resource "aws_lb_target_group" "api_internal" {
    ...
    health_check {
      healthy_threshold   = 3
      unhealthy_threshold = 3
      interval            = 10
      port                = 6443
      protocol            = "TCP"
    }
  }

This resulted in logs like [2]:

  [core@ip-10-0-11-88 ~]$ sudo crictl ps
  CONTAINER ID        IMAGE                                                                                                                                           CREATED             STATE               NAME                    ATTEMPT
  1bf4870ea6eea       registry.svc.ci.openshift.org/openshift/origin-v4.0-2018-12-15-160933@sha256:97eac256dde260e8bee9a5948efce5edb879dc6cb522a0352567010285378a56   2 minutes ago       Running             machine-config-server   0
  [core@ip-10-0-11-88 ~]$ sudo crictl logs 1bf4870ea6eea
  I1215 20:23:07.088210       1 bootstrap.go:37] Version: 3.11.0-356-gb7ffe0c7-dirty
  I1215 20:23:07.088554       1 api.go:54] launching server
  I1215 20:23:07.088571       1 api.go:54] launching server
  2018/12/15 20:24:17 http: TLS handshake error from 10.0.20.86:28372: EOF
  2018/12/15 20:24:18 http: TLS handshake error from 10.0.20.86:38438: EOF
  2018/12/15 20:24:18 http: TLS handshake error from 10.0.47.69:26320: EOF
  ...

when the health check opens a TCP connection (in this case to the
machine-config server on 49500) and then hangs up without completing
the TLS handshake.  Network load balancers [3,4] do not have an analog
to the classic load balancers' SSL protocol [5,6,7], so we're using
HTTPS.

There's some discussion in [8] about the best way to perform
unauthenticated liveness checks on the Kubernetes API server.  For
now, I'm assuming that both 200 and 401 responses to /healthz requests
indicate a functional server, and we can evaluate other response
status codes as necessary.  Checking against a recent cluster:

  $ curl -i https://wking-api.devcluster.openshift.com:6443/healthz
  curl: (60) Peer's Certificate issuer is not recognized.
  More details here: http://curl.haxx.se/docs/sslcerts.html

  curl performs SSL certificate verification by default, using a "bundle"
   of Certificate Authority (CA) public keys (CA certs). If the default
   bundle file isn't adequate, you can specify an alternate file
   using the --cacert option.
  If this HTTPS server uses a certificate signed by a CA represented in
   the bundle, the certificate verification probably failed due to a
   problem with the certificate (it might be expired, or the name might
   not match the domain name in the URL).
  If you'd like to turn off curl's verification of the certificate, use
   the -k (or --insecure) option.
  $ curl -ik https://wking-api.devcluster.openshift.com:6443/healthz
  HTTP/1.1 200 OK
  Cache-Control: no-store
  Date: Sun, 16 Dec 2018 06:18:23 GMT
  Content-Length: 2
  Content-Type: text/plain; charset=utf-8

I don't know if the network load balancer health checks care about
certificate validity or not.  I guess we'll see how CI testing handles
this.

Ignition is only exposed inside the cluster, and checking that from a
master node:

  [core@ip-10-0-26-134 ~]$ curl -i https://wking-api.devcluster.openshift.com:49500/
  curl: (60) Peer's Certificate issuer is not recognized.
  More details here: http://curl.haxx.se/docs/sslcerts.html

  curl performs SSL certificate verification by default, using a "bundle"
   of Certificate Authority (CA) public keys (CA certs). If the default
   bundle file isn't adequate, you can specify an alternate file
   using the --cacert option.
  If this HTTPS server uses a certificate signed by a CA represented in
   the bundle, the certificate verification probably failed due to a
   problem with the certificate (it might be expired, or the name might
   not match the domain name in the URL).
  If you'd like to turn off curl's verification of the certificate, use
   the -k (or --insecure) option.
  [core@ip-10-0-26-134 ~]$ curl -ik https://wking-api.devcluster.openshift.com:49500/
  HTTP/1.1 404 Not Found
  Content-Type: text/plain; charset=utf-8
  X-Content-Type-Options: nosniff
  Date: Sun, 16 Dec 2018 06:30:14 GMT
  Content-Length: 19

  404 page not found

Unfortunately, setting matcher [9] is not allowed for network load
balancers (e.g. see [10,11]).  Setting it leads to errors like:

  ERROR  * module.vpc.aws_lb_target_group.api_internal: 1 error occurred:
  ERROR  * aws_lb_target_group.api_internal: Error creating LB Target Group: InvalidConfigurationRequest: Custom health check matchers are not supported for health checks for target groups with the TCP protocol
  ERROR  status code: 400, request id: 25a53d63-00fe-11e9-80c5-59885e191c9c

So I've left it unset here, and we'll just hope the 401s don't start
happening.

[1]: openshift#923
[2]: https://groups.google.com/d/msg/openshift-4-dev-preview/Jmt6AK0EJR4/Ed3W7yZyBQAJ
[3]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
[4]: https://www.terraform.io/docs/providers/aws/r/lb_target_group.html#protocol
[5]: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-healthchecks.html
[6]: https://www.terraform.io/docs/providers/aws/r/elb.html#target
[7]: hashicorp/terraform-provider-aws#6866
[8]: kubernetes/kubernetes#43784
[9]: https://www.terraform.io/docs/providers/aws/r/lb_target_group.html#matcher
[10]: https://github.com/terraform-providers/terraform-provider-aws/pull/2906/files#diff-375aea487c27a6ada86edfd817ba2401R612
[11]: hashicorp/terraform-provider-aws#2708 (comment)
wking added a commit to wking/machine-config-operator that referenced this issue Dec 19, 2018
The server currently 404s the root path.  From a master:

  [core@ip-10-0-26-134 ~]$ curl -ik https://wking-api.devcluster.openshift.com:49500/
  HTTP/1.1 404 Not Found
  Content-Type: text/plain; charset=utf-8
  X-Content-Type-Options: nosniff
  Date: Sun, 16 Dec 2018 06:30:14 GMT
  Content-Length: 19

  404 page not found

but we need a reliable response in the range 200-399 to satisfy our
network load balancer health checks, which do not support configurable
response status codes [1,2] (these are Terraform links, but they
discuss an AWS restriction that is not Terraform-specific).  This
commit adds a /healthz endpoint which always 204s (when the server is
alive to handle it).

[1]: hashicorp/terraform-provider-aws#2708 (comment)
[2]: https://github.com/terraform-providers/terraform-provider-aws/pull/2906/files#diff-375aea487c27a6ada86edfd817ba2401R612
wking added a commit to wking/openshift-installer that referenced this issue Jan 9, 2019
(This commit message is nearly identical to the Dec 16 commit quoted above. The differences: it cites the kubernetes/kubernetes#43784 discussion as suggesting that 401s are possible in some configurations, and it checks the new /healthz endpoint from openshift/machine-config-operator@d0a7ae21 (server: Add /healthz, 2019-01-04, openshift/machine-config-operator#267) instead of the root path, which 404s; /healthz on a recent cluster returned HTTP/1.1 200 OK with body "ok". The matcher restriction and references [1]-[11] are unchanged.)
@MichalPloski

Unfortunately the problem persists with Terraform v0.11.11:

resource "aws_lb_target_group" "hapee_nlb_target" {
  name = "hapee-test-nlb-tg"

  vpc_id = "${aws_vpc.default.id}"

  port     = 443
  protocol = "TCP"

  health_check {
    interval            = 30
    path                = "/haproxy_status"
    port                = 8080
    timeout             = 5
    healthy_threshold   = 3
    unhealthy_threshold = 3
    matcher             = "200,202"
  }
}

aws_lb_target_group.hapee_nlb_target: Error creating LB Target Group: InvalidConfigurationRequest: Custom health check timeouts are not supported for health checks for target groups with the TCP protocol

@reid-harrison

@MichalPloski yes, this looks to still be validated along with other similar parameters: https://github.com/terraform-providers/terraform-provider-aws/blob/d0edc835f07ef347937892b691b9ab0a602b2372/aws/resource_aws_lb_target_group.go#L673

That check technically allows 0 to be set in the case of a TCP health check, but unfortunately the timeout parameter then fails validation against the allowed range 2-60.

With terraform-provider-aws = 1.38.0:

Error: aws_lb_target_group.my_tg: expected health_check.0.timeout to be in the range (2 - 60), got 0

@red8888

red8888 commented Feb 26, 2019

Why is this closed? It still seems to not be working. Or maybe there is a workaround?

@red8888

red8888 commented Feb 26, 2019

Whoops, never mind. I can see in the AWS UI that you actually can't change the timeout at all (greyed out).

@reid-harrison

@red8888 It is still an issue for me because I would like to re-use the aws_lb_target_group resource for both HTTP and TCP protocols (as a module, for example). I should be able to set the health_check timeout argument to 0 or 10 for a TCP target group, but with this bug Terraform will always throw an error when health_check timeout is set while the protocol is TCP.
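
One way to keep a single module serving both protocols in 0.11-era Terraform (a sketch under the assumption that the module exposes hypothetical variables protocol, port, vpc_id, and health_check_timeout; none of these names come from this thread) is to declare two resource blocks toggled by count, so the TCP variant never mentions timeout at all:

resource "aws_lb_target_group" "tcp" {
  count    = "${var.protocol == "TCP" ? 1 : 0}"
  name     = "example-tcp"
  protocol = "TCP"
  port     = "${var.port}"
  vpc_id   = "${var.vpc_id}"

  health_check {
    # timeout intentionally omitted: AWS fixes it for TCP target groups
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }
}

resource "aws_lb_target_group" "http" {
  count    = "${var.protocol == "TCP" ? 0 : 1}"
  name     = "example-http"
  protocol = "${var.protocol}"
  port     = "${var.port}"
  vpc_id   = "${var.vpc_id}"

  health_check {
    timeout = "${var.health_check_timeout}"
  }
}

Callers would then read the ARN from whichever resource has count = 1, e.g. "${element(concat(aws_lb_target_group.tcp.*.arn, aws_lb_target_group.http.*.arn), 0)}".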

@sqlaide

sqlaide commented Jun 18, 2019

Is there a workaround for this? Why was this closed?

@kaushikreddi9

I am using provider.aws version = "~> 2.23" and I am still facing this issue. Let me know how it can be resolved, or if there is any workaround.

@djordje-petrovic

Had the same issue. TF version 0.11.4, provider 2.27.0. Solved it by setting matcher to 200-399. Anything else would fail with the same error the rest are having.

@kaushikreddi9

Had the same issue. TF version 0.11.4, provider 2.27.0. Solved it by setting matcher to 200-399. Anything else would fail with the same error the rest are having.

I have made the changes you suggested, but I am still facing the same issue. Below is the NLB target group resource:

resource "aws_lb_target_group" "nlb_target" {
name = "nlb-target"
port = XXX
protocol = "TCP"
vpc_id = "XXX"

health_check {
healthy_threshold = 3
unhealthy_threshold = 3
timeout = 6
protocol = "HTTPS"
port = XXX
path = "XXX"
interval = 30
matcher = "200,399"
}
}

Let me know if I need to configure any additional parameters.

@kaushikreddi9

I have fixed the issue by setting the values below:

  timeout  = 10
  matcher  = "200-399"
  interval = 30
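
Assembled back into the resource from the earlier comment, the working shape would be roughly as follows (a sketch only; the XXX placeholders are kept from the original comment, and note the matcher uses a hyphen, not a comma):

resource "aws_lb_target_group" "nlb_target" {
  name     = "nlb-target"
  port     = XXX
  protocol = "TCP"
  vpc_id   = "XXX"

  health_check {
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 10        # the fixed value AWS enforces for TCP target groups
    protocol            = "HTTPS"
    port                = XXX
    path                = "XXX"
    interval            = 30        # NLB health checks allow only 10 or 30
    matcher             = "200-399" # the only matcher NLB target groups accept
  }
}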

etwillbefine added a commit to goci-io/aws-api-gateway-proxy that referenced this issue Oct 6, 2019
as per hashicorp/terraform-provider-aws#2708
and try to use HTTP as protocol to support a health check path
@ghost

ghost commented Nov 1, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 1, 2019