-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-33787: Tolerate the absence of ingress capability on HyperShift clusters #886
OCPBUGS-33787: Tolerate the absence of ingress capability on HyperShift clusters #886
Conversation
@alebedev87: This pull request references NE-1319 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Skipping CI for Draft Pull Request. |
cc41180
to
5e8103c
Compare
5e8103c
to
8f3a66e
Compare
@alebedev87: This pull request references NE-1319 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@alebedev87: This pull request references NE-1319 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alebedev87
This change does not make the operator to go downgraded when the ingress is disabled but whats basically because you are prematurely exiting all of the updated controllers. I doubt that the console will be working in this state.
Have you tested it ?
@@ -114,8 +121,27 @@ func (c *CLIDownloadsSyncController) Sync(ctx context.Context, controllerContext | |||
} | |||
|
|||
statusHandler := status.NewStatusHandler(c.operatorClient) | |||
|
|||
infrastructureConfig, err := c.infrastructureClient.Get(ctx, api.ConfigResourceName, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ingress can be enabled/disabled during the clusters lifecycle then we need to keep the fetching logic per controller, but we need to use listers rather then client.
If the ingress can be re-enabled then please move the fetching logic starter.go
and pass the config to each of the controllers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the standalone OpenShift the Ingress capability will always be enabled. On HyperShift: once enabled it cannot be disabled.
} | ||
|
||
// Disable the route check for external control plane topology (hypershift) if the ingress capability is disabled. | ||
if util.IsExternalControlPlaneWithIngressDisabled(infrastructureConfig, clusterVersionConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by doing this you are basically disabling the operator from doing its work by syncing the console routes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhadvig: From what I see both of the functions used by the route controller (SyncCustomRoute
,SyncDefaultRoute
) are related to the component route implementation (custom hostname or tls). The component routes are not supported on HyperShift, so I think we are safe to skip them.
@csrwng: can you please confirm that the component routes are not supported on HyperShift?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct. If you've disabled ingress you are also indirectly disabling component routes, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I put my followup comment into a wrong conversation. Here it is:
Also, in the absence of the ingress operator the required RBAC won't be created for the components (including the console operator): Component routes chapter of the EP.
|
||
// Disable the oauth client update for external control plane topology (hypershift) if the ingress capability is disabled. | ||
// HyperShift handles the updating of the oauth client for the console. | ||
if util.IsExternalControlPlaneWithIngressDisabled(infrastructureConfig, clusterVersionConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by doing this you are basically disabling the operator from doing its work by syncing the console oauthclients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "syncing the console oauthclients" you mean adding redirectURI
? From what I understand this part will be handled by HyperShift as it takes care of some of the OAuthClient
s already. I suppose it's because the authentication server is not exposed via a route.
@csrwng: What do you think? If we don't disable the updates of OAuthClient
the console operator will go degraded. However it's not yet clear what should be put as the redirectURI
in case the ingress is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to disable updates of the OAuthClient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new API has been implemented to support an alternative ingress.
} | ||
|
||
// Disable the client download check for external control plane topology (hypershift) if the ingress capability is disabled. | ||
if controllersutil.IsExternalControlPlaneWithIngressDisabled(infrastructureConfig, clusterVersionConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by doing this you are basically disabling the operator from doing its work by syncing the console CLI downloads CRDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, oc-cli-downloads
contains the links to download the oc
client from the download route:
$ oc get consoleclidownloads oc-cli-downloads -o yaml | yq .spec.links[].href
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/amd64/linux/oc.tar
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/amd64/mac/oc.zip
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/amd64/windows/oc.zip
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/arm64/linux/oc.tar
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/arm64/mac/oc.zip
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/ppc64le/linux/oc.tar
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/s390x/linux/oc.tar
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/oc-license
So, the risk is to have the broken links in /command-line-tools
path of the console UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csrwng : can we accept the risk I described above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should be also prematurely exiting the DownloadsDeployment controller and remove the deployment itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new API has been implemented to setup alternative ingress for the downloads.
Yes. You are right, the console is not not working out of the box because the routes are left unserved. I was playing with AWS Load Balancer Operator (ALBO) to find the right
|
/retest |
8f3a66e
to
c206e4e
Compare
Rebased from |
/assign @jhadvig |
/retest |
So this case will need to be handle in the service controller, which will then update the service spec accordingly. If I understand it, console itself wont be available though the service if the NodePort is not set? If thats the case I think it needs to be address in this change.
Are these steps documented somewhere? I feel like there is too many gaps - in order to merge this change. What would be the benefit for Hypershift user if he disables the Ingress capability, but the console wont be working? I feel that at minimum we should have a documented way how to make the console work.
|
Should we simply disable console for now if we know it won't work, even with manual load balancer configuration? The console reads the OAuth URLs from the well-known endpoint. If the OAuth metadata can be updated in that document, console should discover the right URL. |
The console is still an important part of HyperShift based offerings like ROSA. The HyperShift engineers prefer the console to be running even without the default ingress (router). The way to expose it is a subject to a discussion, one of the ways to expose the console is described in the doc I added today.
ROSA HCP clusters already have the auth URL which is not an OpenShift route. This makes the authentication from console work smoothly even without the router. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @alebedev87 for the PR. Looks good.
Adding couple of comments. PTAL
docs/alb-ingress-rosa-hcp.md
Outdated
The AWS Load Balancer Controller routes the traffic between cluster nodes. | ||
Therefore the services need to be exposed via a port on the node network: | ||
```bash | ||
cat <<EOF | oc apply -f - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two services should be handled by the service controller, so there is less space for user error.
Lets add them to /bindata/assets/services/
as console-service-np
and downloads-service-np
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added new node port service manifests and made them always applied.
|
||
### Create Ingress resources for the services | ||
|
||
To provision ALBs create the following resources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this could be also done automatically by the console-operator if ALB setup is detected? I see that CERTIFICATE_ARN
needs to be provided though. Could we fetch it from some CM or Secret on the cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this could be also done automatically by the console-operator if ALB setup is detected?
So far, we didn't see much automation around the certificate creation from the ALBO users. Probably because this requires quite some some effort to manage the cloud credentials. We can think about it as a followup enhancement.
$ oc patch console.operator.openshift.io cluster --type=merge -p "{\"spec\":{\"ingress\":{\"consoleURL\":\"${CONSOLE_ALB_HOST}\",\"clientDownloadsURL\":\"${DOWNLOADS_ALB_HOST}\"}}}" | ||
``` | ||
|
||
## Notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
docs/alb-ingress-rosa-hcp.md
Outdated
|
||
3. When the ingress capability is disabled, the console operator skips the implementation of the component route customization. | ||
|
||
4. To simulate a disabled ingress set the desired replicas to zero in the default ingress controller: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but will this simulation set the routes to be un-Admitted ? Asking just to have a better understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the routes will still be admitted. I rephrased it to:
To simulate the absence of ingress connectivity when the ingress capability is disabled, set the desired replicas to zero in the default ingress controller
} | ||
|
||
// Do not proceed to the console route checks if alernative ingress is requested. | ||
if len(operatorConfig.Spec.Ingress.ConsoleURL) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets do this at the beginning. It will be cheaper to return prematurely since we have the operatorConfig at the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this controller is running in two instances, one for the console
route and the other one for the downloads
route.
Check https://github.com/openshift/console-operator/blob/master/pkg/console/starter/starter.go#L363-L395
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved upper and took the download instance into account.
@@ -141,3 +143,85 @@ func TestLabelFilter(t *testing.T) { | |||
} | |||
|
|||
} | |||
|
|||
func TestIsExternalControlPlaneWithIngressDisabled(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇
@@ -47,12 +47,13 @@ func DefaultConfigMap( | |||
nodeOperatingSystems []string, | |||
copiedCSVsDisabled bool, | |||
telemeterConfig map[string]string, | |||
consoleHost string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the point of adding this additional param, but I would rather avoid it, since for the non ALB setup, consoleHost
will be the same as the host int the activeConsoleRoute.spec
.
Lets rather create a new struct, which would include both of these types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since for the non ALB setup, consoleHost will be the same as the host int the activeConsoleRoute.spec
Well, most of the time it's true. The route.spec.host
is assigned automatically by the apiserver from the default ingress domain without the knowledge about the inrgesscontrollers which is only present in the route's status. That's why in operator/sync_v400.go
I pass down the url returned by routesub.GetActiveRouteInfo()
which returns only the admitted host.
beb12ae
to
47090ae
Compare
Always apply nodeport services, to avoid the chicken and egg problem with alternative ingress: |
/retest |
47090ae
to
35e503b
Compare
|
35e503b
to
24f4886
Compare
@alebedev87: This pull request references Jira Issue OCPBUGS-33787, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
As discussed with @jhadvig: create |
…ft clusters - Implement alternative ingress fields from the console operator config API - Skip component route customizations if ingress capability is disabled - Use NodePort type for console and downloads services if ingress capability is disabled - Add document to describe how to implement alternative ingress on ROSA
24f4886
to
7c4d777
Compare
|
/retest |
/retest CI outage. |
|
Thanks @alebedev87 for the PR. LGTM |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alebedev87, jhadvig The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Overriding the TC due to flaking e2e test |
@jhadvig: Overrode contexts on behalf of jhadvig: ci/prow/e2e-aws-console In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@alebedev87: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@alebedev87: Jira Issue OCPBUGS-33787: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-33787 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] This PR has been included in build openshift-enterprise-console-operator-container-v4.17.0-202406101611.p0.gc608dec.assembly.stream.el9 for distgit openshift-enterprise-console-operator. |
Part of the implementation of Make Ingress optional for HyperShift EP. Note that this change targets only HyperShift managed deployments and should not impact standalone OpenShift installations.
This PR: