proposal: Monitoring code refactor #219
/cc @sradco |
Very nice initiative, thanks!
Thanks @machadovilaca! Separating monitoring logic from business logic is very important, and it's great to see this being pushed forward. 👍 I feel like further explanation is needed here. Specifically, I'd like to understand: IOW, is this change solely trying to achieve uniformity between the different repositories, or does it also actually help separate monitoring from business logic? I'm asking since even now we have a dedicated In addition, in the proposal I once created, I thought about this problem from the opposite direction. Instead of aiming for every repo to align with a certain structure, I think it's reasonable for every repo to adopt some kind of declarative configuration that describes which fields need to be monitored and how. I'm not sure how feasible this direction really is, but my point is to shift the burden from developers to automation as much as possible, especially because I think a lot of repositories use monitoring in pretty similar ways (e.g. track fields X and Y, alert when X is bigger than 123, etc.). I'm not experienced enough in the monitoring field to know whether that's feasible and how, but I would like to hear your perspective on that direction.
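To make the declarative idea in the comment above more concrete, here is a minimal sketch of what such a configuration could look like. It is only an illustration: the `MetricSpec` type, the `alertAbove` field, and the JSON layout are assumptions invented for this example and are not part of the proposal or of any KubeVirt repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MetricSpec is a hypothetical declarative description of one monitored field.
// Neither the type nor its JSON keys come from the proposal.
type MetricSpec struct {
	Name       string   `json:"name"`
	Help       string   `json:"help"`
	AlertAbove *float64 `json:"alertAbove,omitempty"` // optional threshold; nil means "no alert"
}

// ParseSpecs decodes a JSON list of metric specs.
func ParseSpecs(data []byte) ([]MetricSpec, error) {
	var specs []MetricSpec
	if err := json.Unmarshal(data, &specs); err != nil {
		return nil, fmt.Errorf("parsing metric specs: %w", err)
	}
	return specs, nil
}

// FiringAlerts returns the names of metrics whose current value
// exceeds the threshold declared in their spec.
func FiringAlerts(specs []MetricSpec, values map[string]float64) []string {
	var firing []string
	for _, s := range specs {
		v, ok := values[s.Name]
		if ok && s.AlertAbove != nil && v > *s.AlertAbove {
			firing = append(firing, s.Name)
		}
	}
	return firing
}

func main() {
	// "track X and Y fields, alert when X is bigger than 123"
	config := []byte(`[
		{"name": "x", "help": "field X", "alertAbove": 123},
		{"name": "y", "help": "field Y"}
	]`)
	specs, err := ParseSpecs(config)
	if err != nil {
		panic(err)
	}
	fmt.Println(FiringAlerts(specs, map[string]float64{"x": 200, "y": 5}))
}
```

In a real setup the threshold would more likely be rendered into a Prometheus alerting rule than evaluated in process; the in-process check only keeps the sketch self-contained.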
@iholder101 First of all, thank you for your review. I agree with you: a declarative configuration for metrics and rules is a solution I would like to have one day. Unfortunately, I think that goal is still far down the road since, in my opinion, our technical debt keeps growing and it is becoming more difficult to update and 'evolve' monitoring features. At this stage, my goal is to, as you said, align all repositories with a certain structure, have a package that all repositories import and use (reducing code duplication and making it easier to add new monitoring features in the future), and clean up business logic code. This way, I think it would later be easier to update the library package to parse and register metrics and rules from declarative configurations, move the collection logic of external resources to external repositories, and even have an external controller/operator if needed or wanted. Related to
Yes, KubeVirt already has a monitoring directory, and some other components, like HCO, also have one. But I think there is still a lot of work to be done. Besides the components creating and registering metrics in very different ways, not all metrics are registered there (for example kubevirt_virt_controller_leading). If we want to follow the Kubernetes Metrics Stability framework, it would take a lot of manual, repetitive work in kubevirt/kubevirt alone. The metric documentation generator also misses a lot of metrics. I think this is hard work, but it is necessary to allow us to continuously improve our monitoring systems, and important if we ever want to reach some of the ideas you also had.
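As a rough illustration of the "package that all repositories import" idea from the reply above, the sketch below shows one central place where metrics are declared, so that duplicates are rejected and a documentation generator can list every metric. The `Registry` and `MetricOpts` names are hypothetical and are not the API of any existing library; the help text for `kubevirt_virt_controller_leading` is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// MetricOpts is a hypothetical description of a single metric.
type MetricOpts struct {
	Name string
	Help string
}

// Registry is a hypothetical central store for metric definitions.
type Registry struct {
	metrics map[string]MetricOpts
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{metrics: make(map[string]MetricOpts)}
}

// Register records a metric definition and rejects incomplete or duplicate entries.
func (r *Registry) Register(m MetricOpts) error {
	if m.Name == "" || m.Help == "" {
		return fmt.Errorf("metric needs both a name and a help text: %+v", m)
	}
	if _, exists := r.metrics[m.Name]; exists {
		return fmt.Errorf("metric %q is already registered", m.Name)
	}
	r.metrics[m.Name] = m
	return nil
}

// List returns all registered metrics sorted by name,
// e.g. as input for a documentation generator.
func (r *Registry) List() []MetricOpts {
	out := make([]MetricOpts, 0, len(r.metrics))
	for _, m := range r.metrics {
		out = append(out, m)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	reg := NewRegistry()
	// The help text below is illustrative, not the real one.
	err := reg.Register(MetricOpts{
		Name: "kubevirt_virt_controller_leading",
		Help: "Indication for an operating virt-controller.",
	})
	if err != nil {
		panic(err)
	}
	for _, m := range reg.List() {
		fmt.Printf("%s: %s\n", m.Name, m.Help)
	}
}
```

Because every metric goes through `Register`, a documentation generator that walks `List` cannot miss one, which is the gap described in the comment.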
@machadovilaca @iholder101 Following the idea of a generic approach, I saw this initiative from Meta incubator. |
@enp0s3 Didn't test it, but since we are working with operators and At the time, I found this issue investigating a library that, personally, I think was even more interesting than |
Hey @machadovilaca! Let's step back a bit and discuss what problem we are really trying to solve here. The basic problem, as I see it, is that every monitoring change needs to happen in many different places. For example, if another label needs to be added to a certain metric, changes have to be made in many different repositories. This is hard, takes a lot of time, requires going through code reviews, costs CI resources, and makes it mentally harder to decide to change anything. The reality is that it's very hard to maintain monitoring that is supposed to be aligned and uniform across many different repositories. A side effect of this approach is that repositories diverge from one another over time, making maintenance even harder as time goes by. As I see it, this proposal suggests investing a lot of effort in solving the side-effect problem, by introducing conventions and linters that would try to reduce the divergence of repos from one another. My humble opinion is that we should invest, even if it takes longer, in a solution that addresses the root cause here, which is performing changes in many repos manually. What concerns me the most in this approach:
I think that a better approach going forward is to introduce some kind of a framework / library that repositories can import and have under WDYT? |
@machadovilaca @iholder101 More or less what I was thinking of:
Following the work started in kubevirt/kubevirt#10044 , and according to the kubevirt/community#219 proposal, this PR refactors monitoring recording rules and alerts Signed-off-by: avlitman <alitman@redhat.com>
Apply monitoringlinter, which is designed to enforce kubevirt/community#219 by ensuring that monitoring-related practices are implemented within the pkg/monitoring directory using operator-observability methods. Signed-off-by: assafad <aadmi@redhat.com>
* Apply monitoringlinter Apply monitoringlinter, which is designed to enforce kubevirt/community#219 by ensuring that monitoring-related practices are implemented within the pkg/monitoring directory using operator-observability methods. Signed-off-by: assafad <aadmi@redhat.com> * Apply operator-observability rules linter Apply operator-observability rules linter which ensures that kubevirt alerts and recording rules definitions are following the monitoring best practices. Signed-off-by: assafad <aadmi@redhat.com> --------- Signed-off-by: assafad <aadmi@redhat.com>
This design doc is proposing a code refactor for the monitoring logic in all
KubeVirt components.