
OCPBUGS-7621-health-Ex-LB: Adding h. check for vSphere external LB se… #62828

Closed
wants to merge 1 commit

Conversation


@dfitzmau dfitzmau commented Jul 26, 2023

OCPBUGS-7621

Version(s):
4.10 through 4.14

Link to docs preview:
Configuring an external load balancer

QE review:

  • QE has approved this change.

Additional information:

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 26, 2023

ocpdocs-previewbot commented Jul 26, 2023

🤖 Updated build preview is available at:
https://62828--docspreview.netlify.app

Build log: https://circleci.com/gh/ocpdocs-previewbot/openshift-docs/20582

@dfitzmau dfitzmau force-pushed the OCPBUGS-7621-health-Ex-LB branch 2 times, most recently from 718215d to f0413b9 on July 26, 2023 16:12

To determine that an external load balancer functions correctly, perform a health check on each backend service. A healthy backend service must establish a connection to all health check probes/connection requests.

Create a `Pod` object/health check resource and then specify parameters the probe checks. The settings you apply to the resource depend on the type of external load balancer that you configured for your cluster. Ensure that you create a resource for each backend service. You can then issue the following command to perform a health check on the resource:
@dfitzmau dfitzmau Jul 26, 2023

Pod object or health check resource?


Contributor

I'm not sure about this one. @cybertron do you have any thoughts on this (or know someone who would)?

Member

I'm a little confused about what's going on here. We're creating a health-check pod (from what image?) and then adding probes on it? Is the probe config below being consumed by something in the pod? Health checking the pod itself isn't going to work, unless it's got some mechanism to forward the checks to the external LB. I guess I'm unfamiliar with this resource and there's not enough here for me to connect the dots.

My assumption has been that external load balancers need to be monitored externally. I suppose you could run the monitoring in the cluster, but if something does fail, then your monitoring is unavailable too.

Edit: I see this is based on a KCP which is equally vague about what is happening here. I think we need a bunch more clarification from the bug reporter about what they're asking for. The KCP is missing a lot of context, like where do these health check snippets go? I assume it was obvious to the writer, but as someone who isn't familiar with this I have no idea what to do with them.


The health-check pod doesn't make sense to me.

When configuring an external load balancer for an OpenShift cluster, it is not sufficient to determine whether a backend (for example, the API, Machine Config, or Ingress) is up or down by the port alone.

All commercial load balancers, including HAProxy, support the idea of "health probes." These need to be configured IN the external load balancer. The solutions article provides examples of what these should be.

Take the ingress controller, for example. If we were to configure that in, say, an F5, we would configure the F5 with each of the ingress controllers as backends. In addition, we would configure a health probe.

Path: HTTP:1936/healthz/ready
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 5 seconds
Interval: 10 seconds
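As a hedged sketch, the settings above might translate into an HAProxy backend stanza roughly like the following (the backend name, server names, and RFC 5737 example addresses are assumptions for illustration, not taken from the solutions article):

```
# Hypothetical HAProxy backend for two ingress routers.
# "option httpchk" issues the HTTP GET probe, "check port 1936" targets
# the router health port, and inter/fall/rise mirror the Interval,
# Unhealthy threshold, and Healthy threshold settings listed above.
backend ingress_https
    mode tcp
    balance source
    option httpchk GET /healthz/ready
    timeout check 5s
    server router0 192.0.2.10:443 check port 1936 inter 10s fall 2 rise 2
    server router1 192.0.2.11:443 check port 1936 inter 10s fall 2 rise 2
```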

The F5 would then use two data points to determine whether the ingress controller is available:

First, it would use the ingress controller ports (default 80 and 443).

Second, it would do an HTTP GET to the backend IP on port 1936. If that returns successfully, then the ingress controller is healthy.

Only if both of these points are true will the F5 send traffic to that ingress controller.
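The HTTP GET half of that probe can be sketched in shell. The snippet below is a minimal, runnable stand-in for what the load balancer does internally: a local `python3 -m http.server` substitutes for the ingress controller's health endpoint so the logic can be exercised anywhere; against a real cluster the URL would instead be the backend IP on port 1936 (a placeholder here, not a literal value from this PR).

```shell
# Minimal sketch of the load balancer's HTTP health probe.
# A local python3 http.server stands in for the ingress controller's
# health endpoint; in a real setup the probe would target
# http://<backend-ip>:1936/healthz/ready (hypothetical placeholder).
python3 -m http.server 19360 --bind 127.0.0.1 &
SERVER_PID=$!
sleep 1

# The probe succeeds only on an HTTP 200 response, mirroring the
# "HTTP GET to the backend IP on port 1936" check described above.
STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:19360/)

kill "$SERVER_PID"

if [ "$STATUS" = "200" ]; then
    echo "backend healthy"
else
    echo "backend unhealthy"
fi
```

A real load balancer additionally applies the interval and threshold settings, only marking a backend healthy or unhealthy after the configured number of consecutive probe results.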

I've attached a visual to show what this would look like.

[Attached diagram: Fiserv_F5_Proposed-Page-1.drawio]

This only shows the ingress controller, but a similar setup would apply to the API and the Machine Config API.

HTH,

-CFH


.Verification

To determine that an external load balancer functions correctly, perform a health check on each backend service. A healthy backend service must establish a connection to all health check probes/connection requests.
Contributor Author

Health check probes or connection requests?


@dfitzmau

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 21, 2023
@clarkhale

@dfitzmau Here are the diagrams we talked about, in draw.io format (zipped, so GitHub would accept them). They are in three tabs for the three cases: API, Machine Config API, and Ingress.

OCPBUGS-7621-LB-Diagrams.zip

@dfitzmau

Closing PR. Work moved to #65501

@dfitzmau dfitzmau closed this Oct 24, 2023