-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-0.65] Add alert to notify of nmstate removal #1302
[release-0.65] Add alert to notify of nmstate removal #1302
Conversation
/cc @RamLavi |
9ec9af3
to
ebeab21
Compare
@sradco can you also review this ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add an e2e to alerts_tests?
test/e2e/workflow/deployment_test.go
Outdated
@@ -408,12 +408,33 @@ var _ = Describe("NetworkAddonsConfig", func() { | |||
var ( | |||
configSpec cnao.NetworkAddonsConfigSpec | |||
) | |||
checkMetricValues := func(expectedMetricValueMap map[string]string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we recently moved endpoints tests to here. can you do the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also means adding the monitoring lane, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ebeab21
to
1835e5b
Compare
labels: | ||
severity: warning | ||
kubernetes_operator_part_of: kubevirt | ||
kubernetes_operator_component: cluster-network-addons-operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to highlight this alert in the UI virtualization overview page.
can we add a label that will tell the UI to show this alert on the top of the virtualization overview page for better visibility ?
for exampe:
kubevirt_ui: highlighted
does that makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a naming convention for alert labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have, but @sradco may confirm that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest
kubevirt_console_scope: [cluster,virtualmachine]
But IIUC, we need to provide more info in case an alert is cluster scoped (that's not case of the alert in this PR, just to make the mechanism general).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yaacov Can you please explain what is the reasoning to highlight this specific alert?
I believe alerts should be ordered based on their severity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for the label should be part of OCP metrics documentation.
Im not sure if we should have it as mandatory as the severity.
The issue is that the label naming impacts also OCP and all other operators.
We should propose this in the monitoring forum and get their opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep it as priority.
makes sense to me too
Im not sure if we should have it as mandatory as the severity.
IMHO it should not be mandatory (default to medium if missing)
my reasoning is that unlike severity that is clear to define by the rule maintainer, priority may depend on things outside the scope of the alert rules maintainer, the cases where the rule maintainer wish to raise/lower priority are the odd cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created openshift/enhancements#1077. Please review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from openshift/enhancements#1077 is looks like "priority: high"
@rhrazdil can you add the "priority" label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I've missed your comment.
Added the priority: high
label in a new commit
labels: | ||
severity: warning | ||
kubernetes_operator_part_of: kubevirt | ||
kubernetes_operator_component: cluster-network-addons-operator | ||
- alert: CnaoDown | ||
annotations: | ||
summary: CNAO pod is down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may want to be more verbose if we highlight this alert.
maybe cluster network addons operator pod is not found, the operator deploy additional networking components required by kubevirt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to highlight the CnaoNmstateMigration alert, but yes, I get your point
Adding a new e2e test module and lane for monitoring. Signed-off-by: Ram Lavi <ralavi@redhat.com>
1835e5b
to
04ceb17
Compare
0cf3891
to
5567456
Compare
/test pull-e2e-cnao-nmstate-functests-release-0.65 |
Adding explicit hold so we don't merge this preemptively /hold |
data/monitoring/prom-rule.yaml
Outdated
@@ -22,6 +22,7 @@ spec: | |||
expr: sum(kubevirt_cnao_nmstate_deployed or vector(0)) > 0 and sum(kubevirt_nmstate_operator_deployments or vector(0)) == 0 | |||
for: 5m | |||
labels: | |||
priority: high |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhrazdil Please move the Priority label below the severity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sradco I have nothing against moving it, but could you explain, what is the reasoning?
p is before s alphabetically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO semantically, the required fields should go before the optional ones,
using that way of thinking, severity
that is required label (and more important) will go before the optional priority
.
I have nothing against going with alphabetical order if it's the common practice for labels, so that is also good option, don't know what is best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhrazdil I agree with Kobi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I take it. Updated
46cadb7
to
faf97e0
Compare
/hold cancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check if we really need to define kubevirt_cnao_nmstate_deployed
as gauge, if you can define it with pure prometheus given metrics (like it seems you already have succeeded), then we can remove the gauges, which will allow us to remove some of the cherry picks which will make things simpler IMO)
pkg/monitoring/metrics.go
Outdated
@@ -33,10 +33,15 @@ var ( | |||
Name: "kubevirt_cnao_cr_kubemacpool_deployed", | |||
Help: "Kubemacpool is deployed by Cnao CR", | |||
}) | |||
nmstateHandlerDeployed = prometheus.NewGauge( | |||
prometheus.GaugeOpts{ | |||
Name: "kubevirt_cnao_nmstate_deployed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm.. I don't get something..
if you already defined kubevirt_cnao_nmstate_deployed
with kube_daemonset_labels
, i.e. without the use of a GAUGE variable, why do you need to define one here? isn't this redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the variable in this force-push.
faf97e0
to
7cc8bcd
Compare
Updated (removed the redundant variable). Are not really neccessary, we could keep the tests in workflow/deployment, if you prefer |
7cc8bcd
to
79db7a4
Compare
kubernetes-nmstate is removed in the next CNAO version. This change adds an alert that notifies users who have knmstate deployed with CNAO, and point them to runbook, that explains that standalone kubernetes-nmstate operator should be installed. Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
79db7a4
to
511b995
Compare
Kudos, SonarCloud Quality Gate passed!
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: RamLavi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/release-note-none |
Hello @yaacov, |
@rhrazdil I think it is important to add this alerts also to main branch. In case the user upgrade directly to the next major version and not to the next minor z-stream. |
We're blocking an upgrade when user upgrades from CNV 4.10 to CNV 4.11 and has kubernetes-nmstate deployed So this alert has no meaning on 4.11 |
kubernetes-nmstate is removed in the next CNAO version.
This change adds an alert that notifies users who have
knmstate deployed with CNAO, and point them to runbook,
that explains that standalone kubernetes-nmstate operator
should be installed.
Signed-off-by: Radim Hrazdil rhrazdil@redhat.com
What this PR does / why we need it:
Special notes for your reviewer:
Release note: