NETOBSERV-1131: Filter for Duplicate=false in metrics #387
Conversation
@jotak: This pull request references NETOBSERV-1131 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Codecov Report
```
@@           Coverage Diff           @@
##             main     #387   +/-   ##
=======================================
  Coverage   53.72%   53.72%
=======================================
  Files          45       45
  Lines        5515     5515
=======================================
  Hits         2963     2963
  Misses       2342     2342
  Partials      210      210
```
Flags with carried forward coverage won't be shown.
LGTM, thanks @jotak
Is it worth having an extra metric from `namespace_flows_total` with the `duplicate: false` filter?
That raises an interesting point. Personally, I see this metric more as a health metric than a network-insight metric, because it tells about flows rather than bytes or packets, and flow count is an arbitrary measure for netobserv. If it's there for health reasons, filtering out duplicates doesn't make sense IMO: it's an indicator of the load that the netobserv components deal with, answering the question "which namespaces are most responsible for what netobserv is enduring?", and removing duplicates would hide some of that load. That said, it's currently part of our NetObserv dashboard, not of NetObserv Health, which is perhaps a mistake?
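As an illustration of that health reading, a top-consumers query over this metric could look like the sketch below, assuming it is exposed as `netobserv_namespace_flows_total` with a `SrcK8S_Namespace` label (both names are assumptions here, to be checked against the actual dashboard):

```promql
# Top 5 namespaces by flow ingestion rate over the last minute,
# as a proxy for the load each namespace puts on the netobserv pipeline.
# Metric and label names are assumptions for illustration.
topk(5, sum by (SrcK8S_Namespace) (rate(netobserv_namespace_flows_total[1m])))
```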
/ok-to-test
New images:
They will expire after two weeks. Catalog source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-06c9459
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
```
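Once this manifest is applied to the cluster (e.g. with `oc apply -f <file>.yaml`), the development catalog should show up as an installable source in OperatorHub, and the 1m `registryPoll` interval keeps it refreshed as new images are pushed.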
/label qe-approved
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jotak. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Requires FLP PR: netobserv/flowlogs-pipeline#448
Some observations when comparing byte rates from the plugin versus from Prometheus metrics, for instance between two nodes, with a PromQL query along the lines of the sketch below:
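A sketch of such a query, assuming a per-node byte counter named `netobserv_node_egress_bytes_total` and a `DstK8S_HostName` label (both names are assumptions, not necessarily what was actually used in the comparison):

```promql
# Byte rate toward one destination node, broken down by source node.
# Metric and label names are assumptions for illustration.
sum by (SrcK8S_HostName) (
  rate(netobserv_node_egress_bytes_total{
    DstK8S_HostName="ip-10-0-137-27.eu-west-3.compute.internal"
  }[1m])
)
```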
On the plugin's side, similarly: filtering on destination node name "ip-10-0-137-27.eu-west-3.compute.internal", using Scope=Node, and observing the top-5 flow rates in the Overview.
Before this PR, I see increased values on the Prometheus side, in a range going from +20% to +80%.
After this PR, I still see increased values on the Prometheus side, but much more reasonable ones, from +10% to +25%.
(data)
I don't have a definitive explanation for why I keep seeing increased values, but my hypothesis is that they come from differences in rate calculation between Loki and Prometheus. Also, the metrics view in the console provides better resolution than our overview charts, which might play a part in the discrepancies. In any case, I think this PR brings values back into a much more acceptable range.
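On the resolution point, a hedged illustration (reusing the hypothetical metric name from above): rating the same counter over different windows can yield visibly different values for bursty traffic, which alone can account for part of a gap between two views of the same data.

```promql
# Shorter windows track bursts closely; longer windows smooth them out.
# Metric name is a placeholder, as above.
rate(netobserv_node_egress_bytes_total[30s])  # high-resolution view
rate(netobserv_node_egress_bytes_total[5m])   # smoothed view
```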