
NETOBSERV-1131: Filter for Duplicate=false in metrics #387

Merged: 2 commits into netobserv:main on Jul 10, 2023

Conversation

@jotak (Member) commented on Jun 29, 2023

Requires FLP PR: netobserv/flowlogs-pipeline#448

Some observations from comparing byte rates in the plugin versus Prometheus metrics, e.g. between two nodes, with a PromQL query such as:

sum(rate(netobserv_node_ingress_bytes_total{DstK8S_HostName="ip-10-0-137-27.eu-west-3.compute.internal"}[90s])) by (SrcK8S_HostName, DstK8S_HostName)

On the plugin side, similarly: filtering on destination node name="ip-10-0-137-27.eu-west-3.compute.internal", using Scope=Node and observing the top 5 flow rates in the Overview:

Before this PR, values on the Prometheus side were inflated by anywhere from +20% to +80%.
After this PR, values on the Prometheus side are still higher, but much more reasonable: from +10% to +25%.
(data)

I don't have a definitive explanation for why I still see higher values, but my hypothesis is that they come from differences in rate calculation between Loki and Prometheus. Also, the metrics view in the console provides better resolution than our Overview charts, which might contribute to the discrepancies. In any case, this PR brings values back into a much more acceptable range.
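
For context, as the title says, the change filters the generated metrics on Duplicate=false, so flow records marked as duplicates are no longer counted. A minimal sketch of what such a filtered flowlogs-pipeline metric definition could look like, assuming the key/value filter syntax from netobserv/flowlogs-pipeline#448 (the field names below are illustrative, not the exact generated config):

# Illustrative FLP metric definition; the filter field names are assumptions
- name: node_ingress_bytes_total
  type: counter
  valueKey: Bytes
  labels:
    - SrcK8S_HostName
    - DstK8S_HostName
  filter:
    key: Duplicate
    value: "false"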

@openshift-ci-robot (Collaborator) commented on Jun 29, 2023

@jotak: This pull request references NETOBSERV-1131 which is a valid jira issue.

In response to this: (quotes the PR description verbatim; see above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@codecov bot commented on Jun 29, 2023

Codecov Report

Merging #387 (c38b09d) into main (57a6951) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #387   +/-   ##
=======================================
  Coverage   53.72%   53.72%           
=======================================
  Files          45       45           
  Lines        5515     5515           
=======================================
  Hits         2963     2963           
  Misses       2342     2342           
  Partials      210      210           
Flag        Coverage Δ
unittests   53.72% <ø> (ø)

Flags with carried forward coverage won't be shown.

@jpinsonneau (Contributor) previously approved these changes on Jul 3, 2023, leaving a comment:

LGTM, thanks @jotak

Is it worth having an extra metric from namespace_flows_total with the duplicate: false filter?

@jotak (Member, Author) commented on Jul 3, 2023

Is it worth having an extra metric from namespace_flows_total with the duplicate: false filter?

That raises an interesting point. Personally, I see this metric more as a health metric than a network-insight metric, because it tells about flows rather than bytes or packets, and flow counts are a measure arbitrary to netobserv. If it's there for health reasons, filtering out duplicates doesn't make sense IMO: it's an indicator of the load that the netobserv components deal with, answering the question "which namespaces are most responsible for what netobserv is enduring?". Removing duplicates would hide some of that load.

That said, it's part of our NetObserv dashboard, not of NetObserv Health, which is perhaps a mistake?
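
To illustrate that health reading, a query such as the one below (a sketch only; it assumes netobserv_namespace_flows_total carries source/destination namespace labels analogous to the node metric above) would surface the namespaces producing the most flow records, duplicates included:

topk(5, sum(rate(netobserv_namespace_flows_total[2m])) by (SrcK8S_Namespace, DstK8S_Namespace))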

KalmanMeth previously approved these changes on Jul 5, 2023.
@jotak dismissed the stale reviews from KalmanMeth and jpinsonneau via c38b09d on Jul 5, 2023, 07:48.
The openshift-ci bot removed, then re-added, the lgtm label on Jul 5, 2023.
@memodi (Contributor) commented on Jul 7, 2023

/ok-to-test

The openshift-ci bot added the ok-to-test label (to set manually when a PR is safe to test; triggers an image build on the PR) on Jul 7, 2023.
@github-actions bot commented on Jul 7, 2023

New images:

  • quay.io/netobserv/network-observability-operator:06c9459
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-06c9459
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-06c9459

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-06c9459
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
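
To try these images, one would typically save the snippet above to a file and apply it with oc apply -f catalog-source.yaml (the file name is arbitrary); the operator then becomes installable from the "NetObserv development catalog" entry in OperatorHub.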

@memodi (Contributor) commented on Jul 7, 2023

/label qe-approved

The openshift-ci bot added the qe-approved label (QE has approved this pull request) on Jul 7, 2023.
@jotak (Member, Author) commented on Jul 10, 2023

/approve

@openshift-ci bot commented on Jul 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot merged commit baa4259 into netobserv:main on Jul 10, 2023. 11 checks passed.
Labels: approved, jira/valid-reference, lgtm, ok-to-test, qe-approved

6 participants