New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-22399: Disable UWM Telemetry remote writer when MGMT cluster is disconnected #3332
OCPBUGS-22399: Disable UWM Telemetry remote writer when MGMT cluster is disconnected #3332
Conversation
@jparrill: This pull request references Jira Issue OCPBUGS-22399, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
✅ Deploy Preview for hypershift-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
/jira refresh |
@jparrill: This pull request references Jira Issue OCPBUGS-22399, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest |
/retest required |
@jparrill: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest-required |
/test e2e-aws |
/lgtm |
/retest-required |
d5bb8e6
to
71a8ad2
Compare
@jparrill: This pull request references Jira Issue OCPBUGS-22399. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@jparrill: This pull request references Jira Issue OCPBUGS-22399, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
hypershift-operator/main.go
Outdated
if opts.EnableUWMTelemetryRemoteWrite { | ||
// to remotely write telemetry metrics. The UWM stack will be disabled if the | ||
// cluster is IPv6. | ||
mgmtNetworkConfig := manifests.OpenshiftNetworkConfiguration() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we disabling telemetry for all clusters having ipv6 in the management cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are assuming that if the first entry of the serviceNetwork in the MGMT cluster is IPv6, the HostedClusters are disconnected. As I discussed with @sjenning I don't know any other better way to infer if the cluster is disconnected or not. Any suggestion?
hypershift-operator/main.go
Outdated
if err := mgr.GetClient().Get(ctx, crclient.ObjectKeyFromObject(mgmtNetworkConfig), mgmtNetworkConfig); err != nil { | ||
return fmt.Errorf("failed to get network config resource: %w", err) | ||
} | ||
if opts.EnableUWMTelemetryRemoteWrite && !utilsnet.IsIPv6String(mgmtNetworkConfig.Spec.ServiceNetwork[0]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the error that this would though in disconnected atm? it's not clear by reading the ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the HO pod get blocked in the reconciliation and the log shows:
{"level":"error","ts":"2023-12-20T15:23:01Z","msg":"Reconciler error","controller":"deployment","controllerGroup":"apps","controllerKind":"Deployment","Deployment":{"name":"operator","namespace":"hypershift"},"namespace":"hypershift","name":"operator","reconcileID":"451fde3c-eb1b-4cf0-98cb-ad0f8c6a6288","error":"cannot get telemeter client secret: Secret \"telemeter-client\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
{"level":"debug","ts":"2023-12-20T15:23:01Z","logger":"events","msg":"Failed to ensure UWM telemetry remote write: cannot get telemeter client secret: Secret \"telemeter-client\" not found","type":"Warning","object":{"kind":"Deployment","namespace":"hypershift","name":"operator","uid":"c6628a3c-a597-4e32-875a-f5704da2bdbb","apiVersion":"apps/v1","resourceVersion":"4091099"},"reason":"ReconcileError"}
It's intrusive in the sense of that block the HC reconciliation not just the normal functioning of the operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the HO pod get blocked in the reconciliation
Can you clarify why? The error you shared is from a different controller, it shouldn't block HC reconciliation
The error you shared says it can't get a secret, how is that related to disconnected?
Because we are assuming that if the first entry of the serviceNetwork in the MGMT cluster is IPv6, the HostedClusters are disconnected
I don't think that's a fair assumption. If --enable-uwm-telemetry-remote-write is enabled and telemetry is not reached because it's disconnected we should just log it.
/hold I think we just want the user to explicitly tell us with the existing flag, not intuit the intent through the |
Ok, makes sense, I will convert this PR in a more documentation oriented one, and engage ACM/MCE people in the loop in order to get this documented in their side also. |
8b002a4
to
e937597
Compare
e937597
to
ba89e86
Compare
…cted envs Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
ba89e86
to
bd8c922
Compare
/approve |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jparrill, sjenning The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@jparrill: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@jparrill: Jira Issue OCPBUGS-22399: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-22399 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[ART PR BUILD NOTIFIER] This PR has been included in build ose-hypershift-container-v4.16.0-202312210011.p0.g146072b.assembly.stream for distgit hypershift. |
/cherry-pick release-4.15 |
@jparrill: new pull request created: #3380 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Fix included in accepted release 4.16.0-0.nightly-2024-01-09-085011 |
This PR is disabling the UWM telemetry remote writer controller when the management cluster is working in disconnected mode. We asume that this mode is enabled when the first Service network entry is an IPv6, being that entry the first citizen for OVN.
Follow up To-Do:
Signed-off-by: Juan Manuel Parrilla Madrid jparrill@redhat.com
Which issue(s) this PR fixes
Fixes #OCPBUGS-22399