
feat: expose metrics for prometheus to scrape #67

Merged · 4 commits · Apr 26, 2024

Conversation

gruyaume (Contributor) commented Apr 25, 2024

Description

Here we expose a metrics endpoint for Prometheus to scrape on the network function. For now, we only expose the default Go metrics, which let users know whether the service is running and provide other valuable information (e.g. memory usage, number of goroutines).

Screenshot

For example, we can now have a dashboard that tells us the status of the network function:

[Screenshot: dashboard showing the status of the network function]

Implementation

We take the same approach to metrics as the AMF: we create a metrics/telemetry.go file and instantiate the metrics server during service startup.

Notes

If approved, we will open similar PRs in every network function.

Future Considerations

With this in place, it will be straightforward to add bespoke metrics to the network function.

Reference

ajaythakurintel commented

Few quick questions:

1. I do not see any problem in supporting a metrics endpoint in AUSF.
2. I would be interested in knowing what metrics you are planning to expose.
3. Does it make sense to send any of these new metrics to the Metric Function Pod as well?

Thanks & good start.

gruyaume (Contributor, Author) commented Apr 25, 2024

> Few quick questions:
> 1. I do not see any problem in supporting a metrics endpoint in AUSF.
> 2. I would be interested in knowing what metrics you are planning to expose.
> 3. Does it make sense to send any of these new metrics to the Metric Function Pod as well?
> Thanks & good start.

1. Awesome.
2. For the AUSF, in addition to the baseline Go metrics, which are already useful on their own (see screenshot above), we would like to maintain an ausf_ue_authentication CounterVec metric that counts UE authentication requests. Labels will include the serving network name, the authentication type, and the result. Note that for security reasons, this metric will not include SUPI or SUCI information. Adding this metric would be done in a different PR.
3. We did not plan on adding it to metricfunc as we do not use this component.

gruyaume marked this pull request as ready for review April 25, 2024 21:21
gab-arrobo (Contributor) commented

> Few quick questions:
> 3. Does it make sense to send any of these new metrics to the Metric Function Pod as well?
> Thanks & good start.

> 1. We did not plan on adding it to metricfunc as we do not use this component.

What approach do you use to collect the metrics exposed by the different NFs (currently amf and smf)? Why don't you use the metricfunc component/NF? I am asking because I am interested in understanding this better and seeing whether we can have a "unified" approach to collecting metrics :-)

gruyaume (Contributor, Author) commented Apr 25, 2024

> What approach do you use to collect the metrics exposed by the different NFs (currently amf and smf)? Why don't you use the metricfunc component/NF? I am asking because I am interested in understanding this better and seeing whether we can have a "unified" approach to collecting metrics :-)

Our approach

In the namespace in which we have the control plane, we also deploy Grafana Agent which scrapes all of the network functions that expose metrics. We then integrate Grafana Agent with our Observability stack (Grafana, Prometheus, Loki) which runs in a separate namespace. This allows us to centralise observability (logs, metrics, alert rules, and dashboards).

Here's a crude visualisation:

[Diagram: network functions in the control-plane namespace scraped by Grafana Agent, which forwards to the observability stack (Grafana, Prometheus, Loki) in a separate namespace]
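For reference, the scrape side of this setup might look roughly like the following Prometheus-style job (the job name, namespace, service name, and port are all hypothetical):

```yaml
scrape_configs:
  - job_name: ausf
    # Hypothetical target: the AUSF service exposing /metrics
    # in the control-plane namespace.
    static_configs:
      - targets: ["ausf.control-plane.svc:9090"]
```

Grafana Agent then forwards these scraped series to the central Prometheus via remote write.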

Why we don't use metricfunc

1. It prevents metrics from being tied to their originators

It's important that metrics are tied to their originating network function, especially for system metrics (e.g. up, memstats). Whether the AMF is up or not should be reported only by the AMF itself, not by something in the middle. In addition, our observability stack adds labels that associate each metric with its Juju charm and namespace, and we want to keep this topology.

2. We don't benefit from it

metricfunc is an additional workload, and we would not benefit from maintaining it. For us it would be added effort for no benefit, as the same metrics are already exposed by the individual network functions.

thakurajayL (Contributor) commented

I am fine. Fix the conflict.

thakurajayL previously approved these changes Apr 26, 2024
gab-arrobo (Contributor) commented

@gruyaume, please rebase this PR.
Also, do you expect additional PRs related to metrics exposure or other changes in this repo (ausf)? Or would it be good to create a patch release after your PR is merged?

gruyaume (Contributor, Author) commented

> @gruyaume, please rebase this PR. Also, do you expect additional PRs related to metrics exposure or other changes in this repo (ausf)? Or would it be good to create a patch release after your PR is merged?

There will be another PR at some point to add the ausf_ue_authentication metric mentioned earlier, but this won't happen soon. We will start by adding metrics to every NF first.

gab-arrobo (Contributor) left a comment


+1

gab-arrobo merged commit c646646 into omec-project:master on Apr 26, 2024. 8 checks passed.

4 participants