
Conversation

@jotak (Member) commented Sep 23, 2022

  • Allow prefixing operational metrics (this introduces a global setting
    for metrics, which will probably be extended later)
  • Unify metrics usage across stages: e.g. channel size was tracked only
    in the netflow ingester; stages now share a common struct that generates
    metrics. More metrics now include the "stage" label.
  • Add new gauges for tracking channel sizes
  • Some renaming
  • Updated metrics doc

Some refactoring was needed to make doc generation work with metrics that are not defined as globals (the metric definitions need to be decoupled from their instantiation in the Prometheus registry). The doc is updated to include labels and is sorted alphabetically.

Also, I had to move health & operationalMetrics into the same package to avoid repetitions like operationalmetrics.Metrics, which also meant moving health_test into the pipeline package (to avoid an import cycle).

Breaking changes: these only concern metric names (not the API); see below.
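As an illustration of the prefix setting, here is a minimal sketch of how a prefixed operational metric could be created with the Prometheus client library; `MetricsSettings` and `NewPrefixedGaugeVec` are assumed names for this example, not necessarily the actual code in this PR:

```go
package operational

import "github.com/prometheus/client_golang/prometheus"

// MetricsSettings is a hypothetical global settings struct for operational
// metrics; only the prefix is shown here, but it may be extended later.
type MetricsSettings struct {
	Prefix string
}

// NewPrefixedGaugeVec creates a gauge whose name is prepended with the
// configured prefix (e.g. "netobserv_" when FLP is started from the operator).
func NewPrefixedGaugeVec(settings MetricsSettings, name, help string, labels ...string) *prometheus.GaugeVec {
	return prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: settings.Prefix + name,
		Help: help,
	}, labels)
}
```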

@jotak force-pushed the add-queue-size-gauges branch from 04954c0 to d4157c0 on September 23, 2022 16:41
@jotak (Member, Author) commented Sep 23, 2022

Here are the details of the operational metrics.
The names shown here don't include any prefix. When FLP is started from the operator, it is configured with the "netobserv_" prefix, which is then prepended to all these metric names.
(Note that metrics produced by third parties, e.g. goflow, are not affected by this change.)

| Name | Description | Type | Labels | Change |
|------|-------------|------|--------|--------|
| conntrack_input_records | The total number of input records per classification. | counter | classification | unchanged |
| conntrack_memory_connections | The total number of tracked connections in memory. | gauge | (none) | unchanged |
| conntrack_output_records | The total number of output records. | counter | type | unchanged |
| encode_prom_errors | Total errors during metrics generation | counter | error, metric, key | unchanged |
| ingest_flows_processed | Number of flow logs processed by the ingester | counter | stage | renamed from `ingest_collector_flow_logs_processed`, added `stage` label |
| metrics_processed | Number of metrics processed | counter | stage | renamed from `encode_prom_metrics_processed`, added `stage` label |
| records_written | Number of output records written | counter | stage | renamed from `loki_records_written`, added `stage` label, now also works with the Kafka writer |
| stage_duration_ms | Pipeline stage duration in milliseconds | histogram | stage | unchanged |
| stage_in_queue_size | Pipeline stage input queue size (number of elements in queue) | gauge | stage | renamed from `ingest_collector_queue_length`, generalized to all stages |
| stage_out_queue_size | Pipeline stage output queue size (number of elements in queue) | gauge | stage | new metric |
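As a rough illustration of how the new queue-size gauges relate to channel sizes, here is a hedged sketch; the helper name, channel type, and update interval are assumptions for the example, not the actual implementation:

```go
package operational

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// observeQueueSize is a hypothetical helper: it periodically reports the
// current length of a stage's channel as a queue-size gauge (such as
// stage_in_queue_size), labeled with the stage name.
func observeQueueSize(gauge *prometheus.GaugeVec, stage string, ch <-chan interface{}, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		gauge.WithLabelValues(stage).Set(float64(len(ch)))
	}
}
```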

@jotak marked this pull request as ready for review on September 23, 2022 16:58
```go
	}
}

// TODO / FIXME / FIGUREOUT: seems like we have 2 input channels for Loki? (this one, and see also pipeline_builder.go / getStageNode / StageWrite)
```
@jotak (Member, Author) commented:

hey @mariomac / @eranra, what do you think about that? It seems that we have 2 chained channels here for the write_loki input: this one and the one defined in pipeline_builder.go for all stages. Any reason for that? Should we get rid of this one?

@mariomac commented Sep 26, 2022:

This channel seems to act as a buffer to avoid blocking the previous pipeline stage. Probably we'd need to verify that removing the in channel wouldn't affect performance.

When we are able to use Go 1.18, maybe we could upgrade to a newer upstream version of the go-pipes library that allows defining the buffer length between pipeline stages.
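For context, a buffered channel like the one discussed acts as a small queue that decouples a slow consumer from its producer; a minimal, self-contained sketch (not the actual pipeline code, and the buffer size is made up):

```go
package main

import "fmt"

func main() {
	// A buffered channel acts as a small queue between two pipeline stages:
	// the producer can stay ahead of the consumer by up to the buffer
	// capacity, and only blocks once the buffer is full.
	in := make(chan map[string]interface{}, 1000) // buffer size is illustrative

	done := make(chan struct{})
	go func() {
		for record := range in {
			fmt.Println(record) // placeholder for the actual Loki write
		}
		close(done)
	}()

	in <- map[string]interface{}{"SrcAddr": "10.0.0.1"}
	close(in)
	<-done
}
```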

@codecov-commenter commented Sep 27, 2022

Codecov Report

Merging #314 (08d3dbe) into main (d2d178b) will increase coverage by 0.46%.
The diff coverage is 73.72%.

```diff
@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
+ Coverage   69.65%   70.11%   +0.46%
==========================================
  Files          82       84       +2
  Lines        4729     4832     +103
==========================================
+ Hits         3294     3388      +94
+ Misses       1240     1229      -11
- Partials      195      215      +20
```

| Flag | Coverage Δ |
|------|------------|
| unittests | 70.11% <73.72%> (+0.46%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| cmd/confgenerator/main.go | 0.00% <0.00%> (ø) |
| cmd/flowlogs-pipeline/main.go | 0.00% <0.00%> (ø) |
| cmd/operationalmetricstodoc/main.go | 0.00% <0.00%> (ø) |
| pkg/confgen/config.go | 33.33% <ø> (ø) |
| pkg/config/config.go | 50.00% <10.00%> (-13.64%) ⬇️ |
| pkg/operational/metrics.go | 66.01% <66.01%> (ø) |
| pkg/pipeline/pipeline_builder.go | 70.03% <70.58%> (-0.98%) ⬇️ |
| pkg/pipeline/write/write_loki.go | 70.43% <75.00%> (-0.39%) ⬇️ |
| pkg/pipeline/encode/encode_prom.go | 77.82% <80.00%> (+0.45%) ⬆️ |
| pkg/pipeline/encode/encode_kafka.go | 69.86% <83.33%> (+0.84%) ⬆️ |
| ... and 18 more | |


```go
if val, found := flow[info.Filter.Key]; found {
	sVal, ok := val.(string)
	if !ok {
		sVal = fmt.Sprintf("%v", val)
```


Nit: maybe fmt.Sprint(val) is slightly faster.
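For reference, a tiny self-contained illustration of the suggested substitution (the value here is made up):

```go
package main

import "fmt"

func main() {
	var val interface{} = 42
	// fmt.Sprint formats the value without parsing a format string, so it can
	// be marginally cheaper than fmt.Sprintf("%v", val) while producing the
	// same output for a single argument.
	sVal, ok := val.(string)
	if !ok {
		sVal = fmt.Sprint(val)
	}
	fmt.Println(sVal) // prints "42"
}
```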

@jotak (Member, Author):

done

@jotak force-pushed the add-queue-size-gauges branch from 08d3dbe to 82dd179 on September 27, 2022 09:24
@jotak added the breaking-change label (This pull request has breaking changes. They should be described in the PR description.) on Sep 27, 2022
@jotak merged commit 0949fbe into netobserv:main on Sep 27, 2022
Comment on lines +30 to +35
```go
type MetricDefinition struct {
	Name   string
	Help   string
	Type   metricType
	Labels []string
}
```
A contributor commented:

@jotak IIUC, Type metricType is only used by the auto-documentation.
So, theoretically, one could create a MetricDefinition of type Counter and pass it to NewGauge(), causing misleading documentation.
If I'm correct, is it worth adding a check in NewGauge()/NewCounter()/NewHistogram() to validate the type?

@jotak (Member, Author):

that's correct, sure we can do what you suggest
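A hedged sketch of the kind of check being suggested, building on the MetricDefinition struct quoted above; it assumes metricType is a string-backed type and that a constant like TypeGauge exists (both are assumptions for this example, and "fmt" must be imported):

```go
// TypeGauge is a hypothetical constant of the metricType type quoted above;
// the actual constant names may differ.
const TypeGauge metricType = "gauge"

// verifyType fails fast when a MetricDefinition declared with one type is
// instantiated as another, which would otherwise only surface as misleading
// auto-generated documentation.
func verifyType(def MetricDefinition, expected metricType) error {
	if def.Type != expected {
		return fmt.Errorf("metric %q is declared as %v but instantiated as %v", def.Name, def.Type, expected)
	}
	return nil
}

// NewGauge, NewCounter and NewHistogram would each call verifyType with
// their respective expected type before creating and registering the metric.
```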
