Unable to set Loadbalancer Service Health Probe Port #1505

Closed
vsabella opened this issue Apr 15, 2022 · 7 comments · Fixed by #2452
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@vsabella
Contributor

vsabella commented Apr 15, 2022

Reopen from #1499

When configuring health checks for an Azure Load Balancer you can specify the probe path and protocol, but it is not possible to specify the port of the health probe.

Istio's ingress gateway and similar services do not expose health checks on their actual traffic endpoints (since they can operate across multiple protocols); instead they expose a general health check on a separate status port for the gateway.

In AWS this is easily done with the service.beta.kubernetes.io/aws-load-balancer-healthcheck-port annotation, but it does not seem to be possible in Azure.
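
For comparison, a minimal sketch of how that AWS annotation is applied on a Service of type LoadBalancer (the Service name and port values here are illustrative only):

apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway   # hypothetical name, for illustration
  annotations:
    # Point the load balancer health check at a dedicated status port
    # instead of the traffic ports.
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "15021"
spec:
  type: LoadBalancer
  ports:
    - name: http2
      protocol: TCP
      port: 80
      targetPort: 8080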

This prevents us from configuring health probes for Istio, which need to target the "status ready" port rather than the actual backend port.

Example

  • Istio is hosted on two ports, 80 and 443, which will not serve any traffic until a VirtualService is deployed.
  • The health check needs to be an HTTP check against the status port; let's say that's 15021.
  • The probe protocol and endpoint can be set, but the health check port cannot be.

For the following spec:

spec:
  ports:
    - name: http2
      protocol: TCP
      port: 80
      targetPort: 8080
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443

The generated probe needs to look like this:

[Image: the desired Azure load balancer health probe, pointing at the Istio status port 15021 instead of the backend ports]
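
A sketch of what that probe would need to contain, based on the description above (the request path /healthz/ready is assumed here as Istio's conventional readiness endpoint; it is not specified in the issue):

# Desired Azure load balancer health probe, shared by both frontend ports
protocol: Http
port: 15021                   # Istio status port, not the backend ports 8080/8443
requestPath: /healthz/ready   # assumed Istio readiness path, for illustration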

@vsabella
Contributor Author

@MartinForReal I've started down this path:
vsabella@5303352

If that looks like it makes sense as a fix, I'll also complete it for the MixedLB mode /port_{num}-... annotations.

@vsabella
Contributor Author

It also looks like, since priority is given to port.AppProtocol, we are unable to:

  1. set appProtocol to the correct value (https)
  2. expect the health probe to be a different protocol (say http)

The ability to decouple the health check from the actual service entry is key to getting Istio ingressgateway to have correct health checks / drain behavior / etc...

	// AppProtocol set on the service port always wins; the health probe
	// protocol annotation is only consulted when AppProtocol is nil, and
	// the final fallback is a plain TCP probe.
	if port.AppProtocol == nil {
		if port.AppProtocol, err = consts.GetAttributeValueInSvcAnnotation(annotations, consts.ServiceAnnotationLoadBalancerHealthProbeProtocol); err != nil {
			return nil, fmt.Errorf("failed to parse annotation %s: %w", consts.ServiceAnnotationLoadBalancerHealthProbeProtocol, err)
		}
		if port.AppProtocol == nil {
			port.AppProtocol = to.StringPtr(string(network.ProtocolTCP))
		}
	}

@MartinForReal
Contributor

Well, this is a pleasant surprise. :-)

That seems like a reasonable fix. If we add two additional annotations, will that work for your scenario?

port_{num}-probe-protocol
port_{num}-probe-port

Priority would then go to the two new annotations.

@feiskyer @nilo19 @lzhecheng Any advice is appreciated!

@vsabella Thanks for the contribution!

@vsabella
Contributor Author

That's exactly what I was thinking, and it sounds reasonable. I'll have a PR for you later this week.
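
For reference, a sketch of how the proposed per-port annotations might be used on the Service from the issue description, using the names exactly as proposed above (the final annotation keys, including any service.beta.kubernetes.io/ style prefix, may differ in the eventual PR):

metadata:
  annotations:
    # Proposed: per-port override of the health probe protocol and port,
    # decoupling the probe from port.AppProtocol and the backend targetPort.
    port_80-probe-protocol: "Http"
    port_80-probe-port: "15021"
    port_443-probe-protocol: "Http"
    port_443-probe-port: "15021"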

@MartinForReal
Contributor

Hi @vsabella, do you still have bandwidth to work on this?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 18, 2022
@rainest
Contributor

rainest commented Sep 30, 2022

/remove-lifecycle stale

Checking on this as well; we recently encountered it when our users started migrating to 1.24. Is the fork fix still under active development, or is there any ongoing work from Microsoft to implement these annotations?

FWIW, in our case the affected HTTP services:

  • are not likely to serve 200s for GET /, as they're proxy instances that often only route requests that include some configured host header.
  • require a TLS client certificate before accepting any requests.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 30, 2022