OCPBUGS-33896: add alert data to upgrade health in oc adm upgrade status #1740

Merged

PratikMahajan
Contributor

@PratikMahajan PratikMahajan commented Apr 22, 2024

Adds alerts that fire during the upgrade to the
upgrade health section.

By default, all alerts that started firing after the upgrade was
initiated appear in the upgrade health section.

There is also a set of allowed alerts, which are shown even if they
started firing before the upgrade started.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 22, 2024
@openshift-ci openshift-ci bot requested review from deads2k and mfojtik April 22, 2024 21:51
@PratikMahajan PratikMahajan changed the title [WIP] add alert data to upgrade health in oc adm upgrade status [WIP] OTA-1157: add alert data to upgrade health in oc adm upgrade status Apr 22, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 22, 2024
@openshift-ci-robot

openshift-ci-robot commented Apr 22, 2024

@PratikMahajan: This pull request references OTA-1157 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@petr-muller
Member

/cc
/uncc @deads2k @mfojtik

@openshift-ci openshift-ci bot requested review from petr-muller and removed request for mfojtik and deads2k April 23, 2024 10:56
pkg/cli/admin/upgrade/status/health.go
@@ -21,6 +22,7 @@ const (
scopeKindClusterOperator allowedScopeKind = "ClusterOperator"
scopeKindNode allowedScopeKind = "Node"
scopeKindMachineConfigPool allowedScopeKind = "MachineConfigPool"
scopeKindAlert allowedScopeKind = "Alert"
Member

I do not think we need this. This is used to tie insights to resources in the cluster: either we can do that, and then we know the exact kind, or we cannot, and then the list of related resources can be empty.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2024
@PratikMahajan PratikMahajan force-pushed the oc-adm-upgrade-alert branch 2 times, most recently from d1f9fe4 to 6349223 Compare April 25, 2024 23:23
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2024
@PratikMahajan PratikMahajan force-pushed the oc-adm-upgrade-alert branch 3 times, most recently from 707e13b to 054436e Compare April 26, 2024 18:41
@PratikMahajan PratikMahajan changed the title [WIP] OTA-1157: add alert data to upgrade health in oc adm upgrade status OTA-1157: add alert data to upgrade health in oc adm upgrade status Apr 26, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 26, 2024
var alertData AlertData
alertBytes, err := o.getAlerts(ctx)
if err != nil {
fmt.Println("Unable to fetch alerts from thanos, ignoring alerts in 'Update Health': ", err)
Member

nit: "thanos" -> "Thanos", to match the casing preferred by the Thanos folks, for example:

Join users and companies that are using Thanos in production.

Member

Although actually, this call site has no idea that we're reaching out to Thanos; that's an implementation detail. I'd expect inspectalerts to provide sufficient context for debugging failed requests if we fail to fetch alerts, and here we can probably just do:

fmt.Fprintf(o.ErrOut, "warning: Unable to fetch alerts, ignoring alerts in 'Update Health': %v\n", err)

which will also log this warning to ErrOut, as we already do here.

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
insights := parseAlertDataToInsights(tt.alertData, tt.startedAt)
if got := len(insights); got != tt.expectedCount {
Member

While it's easier to write this kind of test, all you're testing is "are we skipping this alert or not?". It may be worth an expected []updateInsight for comparison with the returned value, so we can also check on things like "does the updateInsightImpact have the property values we expect for this alert?".
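
The idea of comparing against a full expected value, rather than only a count, might look like the sketch below. The `insight` type, `toInsights`, and `diffInsights` are hypothetical simplifications standing in for `updateInsight` and `parseAlertDataToInsights`; the real types in this PR have more fields.

```go
package main

import (
	"fmt"
	"reflect"
)

// insight is a hypothetical, simplified stand-in for updateInsight.
type insight struct {
	Summary string
	Level   string
}

// toInsights is a hypothetical stand-in for the function under test.
func toInsights(alertNames []string) []insight {
	var out []insight
	for _, name := range alertNames {
		out = append(out, insight{Summary: "Alert firing: " + name, Level: "warning"})
	}
	return out
}

// diffInsights returns an empty string when got and want are deeply equal,
// and a description of the mismatch otherwise.
func diffInsights(got, want []insight) string {
	if reflect.DeepEqual(got, want) {
		return ""
	}
	return fmt.Sprintf("got %+v, want %+v", got, want)
}

func main() {
	tests := []struct {
		name     string
		alerts   []string
		expected []insight
	}{
		{name: "no alerts", alerts: nil, expected: nil},
		{
			name:     "one alert",
			alerts:   []string{"ExampleAlert"},
			expected: []insight{{Summary: "Alert firing: ExampleAlert", Level: "warning"}},
		},
	}
	for _, tt := range tests {
		if d := diffInsights(toInsights(tt.alerts), tt.expected); d != "" {
			fmt.Printf("%s: FAIL: %s\n", tt.name, d)
			continue
		}
		fmt.Printf("%s: ok\n", tt.name)
	}
}
```

Note that `reflect.DeepEqual` treats a nil slice and an empty non-nil slice as different, so the expected value for the empty case has to match what the function actually returns.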

@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 30, 2024
@PratikMahajan PratikMahajan force-pushed the oc-adm-upgrade-alert branch 2 times, most recently from 3b6bd3d to d309f99 Compare May 1, 2024 22:45
Member

@petr-muller petr-muller left a comment

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 16, 2024
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 16, 2024
@petr-muller
Member

/lgtm

@petr-muller
Member

petr-muller commented May 16, 2024

/label acknowledge-critical-fixes-only

This is part of a TechPreview feature that we want to deliver in 4.16; all functionality is gated behind an opt-in envvar.

@openshift-ci openshift-ci bot added acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. lgtm Indicates that a PR is ready to be merged. labels May 16, 2024
Contributor

openshift-ci bot commented May 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, PratikMahajan, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 1073b29 and 2 for PR HEAD d5f52ff in total

@petr-muller
Member

/retitle OCPBUGS-33896: add alert data to upgrade health in oc adm upgrade status

@openshift-ci openshift-ci bot changed the title OTA-1157: add alert data to upgrade health in oc adm upgrade status OCPBUGS-33896: add alert data to upgrade health in oc adm upgrade status May 17, 2024
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 17, 2024
@openshift-ci-robot

@PratikMahajan: This pull request references Jira Issue OCPBUGS-33896, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

@petr-muller
Member

/retest

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 4978e7d and 1 for PR HEAD d5f52ff in total

Contributor

openshift-ci bot commented May 18, 2024

@PratikMahajan: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 3214619 into openshift:master May 18, 2024
13 checks passed
@openshift-ci-robot

@PratikMahajan: Jira Issue OCPBUGS-33896: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-33896 has been moved to the MODIFIED state.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-enterprise-cli-container-v4.17.0-202405180411.p0.g3214619.assembly.stream.el9 for distgit openshift-enterprise-cli.
All builds following this will include this PR.

@petr-muller
Member

/cherry-pick release-4.16

@openshift-cherrypick-robot

@petr-muller: new pull request created: #1771

Labels
  • acknowledge-critical-fixes-only: Indicates if the issuer of the label is OK with the policy.
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
7 participants