
feat: expose metrics for prometheus to scrape #67

Merged · 4 commits · Apr 26, 2024

Conversation

gruyaume (Contributor) commented Apr 25, 2024

Description

Here we expose a metrics endpoint for Prometheus to scrape on the network function. For now, we only expose the default Go metrics, which let users know whether the service is running and provide other valuable information (e.g. memory usage, number of goroutines).

Screenshot

For example, we can now have a dashboard that tells us the status of the network function:

[Screenshot: dashboard showing the status of the network function]

Implementation

We take the same approach to metrics as the AMF: we create a metrics/telemetry.go file and instantiate the metrics server during service startup.

Notes

If approved, we will open similar PRs in every network function.

Future Considerations

With this in place, it will be straightforward to add bespoke metrics to the network function.

Reference

ajaythakurintel commented

Few quick questions:

1. I do not see any problem in supporting a metrics endpoint in AUSF.
2. I would be interested in knowing what metrics you are planning to expose.
3. Does it make sense to send any of these new metrics to the Metric Function Pod as well?

Thanks & good start.

gruyaume (Contributor, Author) commented Apr 25, 2024

> Few quick questions:
> 1. I do not see any problem in supporting a metrics endpoint in AUSF.
> 2. I would be interested in knowing what metrics you are planning to expose.
> 3. Does it make sense to send any of these new metrics to the Metric Function Pod as well?
> Thanks & good start.

1. Awesome.
2. For the AUSF, in addition to the baseline Go metrics, which are already useful on their own (see screenshot above), we would like to maintain an ausf_ue_authentication CounterVec metric that counts UE authentication requests. Labels will include the serving network name, the authentication type, and the result. Note that for security reasons, this metric will not include SUPI or SUCI information. Adding this metric would be done in a different PR.
3. We did not plan on adding it to metricfunc as we do not use this component.

gruyaume marked this pull request as ready for review April 25, 2024 21:21
gab-arrobo (Contributor) commented

> Few quick questions:
> 3. Does it make sense to send any of these new metrics to the Metric Function Pod as well?
> Thanks & good start.

> 1. We did not plan on adding it to metricfunc as we do not use this component.

What approach do you use to collect the metrics exposed by the different NFs (currently amf and smf)? Why don't you use the metricfunc component/NF? I am asking because I am interested in understanding this better and seeing whether we can have a "unified" approach to collecting metrics :-)

gruyaume (Contributor, Author) commented Apr 25, 2024

> What approach do you use to collect the metrics exposed by the different NFs (currently amf and smf)? Why don't you use the metricfunc component/NF? I am asking because I am interested in understanding this better and seeing whether we can have a "unified" approach to collecting metrics :-)

Our approach

In the namespace in which we have the control plane, we also deploy Grafana Agent which scrapes all of the network functions that expose metrics. We then integrate Grafana Agent with our Observability stack (Grafana, Prometheus, Loki) which runs in a separate namespace. This allows us to centralise observability (logs, metrics, alert rules, and dashboards).

Here's a crude visualisation:

[Diagram: network functions in the control-plane namespace scraped by Grafana Agent, which forwards to the observability stack (Grafana, Prometheus, Loki) in a separate namespace]
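For reference, the scrape side of this setup might look roughly like the following Prometheus-style job (the job name, namespace, service name, and port are all hypothetical):

```yaml
scrape_configs:
  - job_name: ausf
    # Hypothetical target: the AUSF service exposing /metrics
    # in the control-plane namespace.
    static_configs:
      - targets: ["ausf.control-plane.svc:9090"]
```

Grafana Agent then forwards these scraped series to the central Prometheus via remote write.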

Why we don't use metricfunc

1. It prevents metrics from being tied to their originators

It's important that metrics are tied to their originating network function, especially for system metrics (e.g. up, memstats). Whether the AMF is up or not should be reported only by the AMF itself, not by something in the middle. In addition, our observability stack adds labels that associate each metric with its Juju charm and namespace, and we want to keep this topology.

2. We don't benefit from it

metricfunc is an additional workload, and we would not benefit from maintaining it. For us it would be added effort for no benefit, as the same metrics are already exposed by the individual network functions.

thakurajayL (Contributor) commented

I am fine. Fix the conflict.

thakurajayL previously approved these changes Apr 26, 2024
gab-arrobo (Contributor) commented

@gruyaume, please rebase this PR.
Also, do you expect additional PRs related to metrics exposure or other changes in this repo (ausf)? Or would it be good to create a patch release after your PR is merged?

gruyaume (Contributor, Author) commented

> @gruyaume, please rebase this PR. Also, do you expect additional PRs related to metrics exposure or other changes in this repo (ausf)? Or would it be good to create a patch release after your PR is merged?

There will be another PR at some point to add the ausf_ue_authentication metric mentioned earlier, but this won't happen soon. We will start by adding metrics to every NF first.

gab-arrobo (Contributor) left a comment


+1

gab-arrobo merged commit c646646 into omec-project:master on Apr 26, 2024. 8 checks passed.

4 participants