Metric hacky friday #455

Merged
merged 21 commits into from Jul 26, 2023

Contributor

@gertzakis gertzakis commented Mar 24, 2023

Some basic metrics for the Golden Config plugin (#453). @Kircheneer helped me and provided the baseline :)

metric_gc_jobs exposes the number of successful vs. failed GC-related jobs
metric_golden_config exposes the number of devices configured per GC feature
metric_compliance exposes the number of compliant vs. non-compliant devices per feature

Below are the metrics exposed at /metrics/:

# HELP nautobot_gc_backup_total Nautobot Golden Config Backups
# TYPE nautobot_gc_backup_total gauge
nautobot_gc_backup_total{seconds="86400",status="success"} 0.0
nautobot_gc_backup_total{seconds="86400",status="failure"} 1.0
# HELP nautobot_gc_intended_total Nautobot Golden Config Intended
# TYPE nautobot_gc_intended_total gauge
nautobot_gc_intended_total{seconds="86400",status="success"} 1.0
nautobot_gc_intended_total{seconds="86400",status="failure"} 0.0
# HELP nautobot_gc_compliance_total Nautobot Golden Config Compliance
# TYPE nautobot_gc_compliance_total gauge
nautobot_gc_compliance_total{seconds="86400",status="success"} 1.0
nautobot_gc_compliance_total{seconds="86400",status="failure"} 0.0
# HELP nautobot_gc_devices_per_feature Nautobot Golden Config Devices per feature
# TYPE nautobot_gc_devices_per_feature gauge
nautobot_gc_devices_per_feature{device="syslog"} 1.0
# HELP nautobot_gc_compliant_devices_by_feature_total Nautobot Golden Config Compliance
# TYPE nautobot_gc_compliant_devices_by_feature_total gauge
nautobot_gc_compliant_devices_by_feature_total{compliant="true",feature="syslog"} 1.0
nautobot_gc_compliant_devices_by_feature_total{compliant="false",feature="syslog"} 0.0
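
For readers unfamiliar with the output above, here is a minimal sketch of how such lines are laid out in the Prometheus text exposition format. It is not the code from this PR (which builds the metrics with GaugeMetricFamily); `render_gauge` is a hypothetical helper written only for illustration, and it does not escape special characters in label values.

```python
from typing import Iterable, Mapping, Tuple

Sample = Tuple[Mapping[str, str], float]


def render_gauge(name: str, help_text: str, samples: Iterable[Sample]) -> str:
    """Render one gauge family as HELP/TYPE lines followed by one line per sample.

    Hypothetical helper for illustration only; label values are not escaped.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Labels are emitted in insertion order as key="value" pairs.
        label_text = ",".join(f'{key}="{val}"' for key, val in labels.items())
        lines.append(f"{name}{{{label_text}}} {float(value)}")
    return "\n".join(lines)


compliance_text = render_gauge(
    "nautobot_gc_compliant_devices_by_feature_total",
    "Nautobot Golden Config Compliance",
    [
        ({"compliant": "true", "feature": "syslog"}, 1),
        ({"compliant": "false", "feature": "syslog"}, 0),
    ],
)
print(compliance_text)
```

Each distinct combination of label values becomes its own time series once scraped, which is why the compliant and non-compliant counts appear as two separate lines per feature.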

I also updated a lot of files to pass black, and changed the CI to use Nautobot version 1.5.13, which is needed for the metrics functionality.

backup_gauges = GaugeMetricFamily(
    "nautobot_gc_backup_total", "Nautobot Golden Config Backups", labels=["seconds", "status"]
)
time_delta_to_include = PLUGIN_SETTINGS.get("metrics", {}).get("time_delta", timedelta(days=1))
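
As a rough illustration of the window-based approach in the snippet above, the sketch below shows the settings lookup falling back to one day and a count of runs inside that window. The empty `PLUGIN_SETTINGS` dict, the `count_by_status` helper, and the `(completed_at, succeeded)` tuples are all assumptions made for this example, and deriving the `seconds` label from `total_seconds()` is a guess based on the `seconds="86400"` output, not something confirmed by the PR code.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the plugin settings; no "metrics" key is configured here.
PLUGIN_SETTINGS = {}

# Same lookup as in the diff: fall back to one day when nothing is configured.
time_delta_to_include = PLUGIN_SETTINGS.get("metrics", {}).get("time_delta", timedelta(days=1))

# Assumption: the "seconds" label is the window length in whole seconds.
seconds_label = str(int(time_delta_to_include.total_seconds()))


def count_by_status(job_runs, now, window):
    """Count successful and failed runs that completed within `window` before `now`."""
    counts = {"success": 0, "failure": 0}
    for completed_at, succeeded in job_runs:
        if now - completed_at <= window:
            counts["success" if succeeded else "failure"] += 1
    return counts


now = datetime(2023, 3, 24, 12, 0)
job_runs = [
    (now - timedelta(hours=2), False),  # inside the window, failed
    (now - timedelta(days=3), True),    # outside the window, ignored
]
window_counts = count_by_status(job_runs, now, time_delta_to_include)
print(seconds_label, window_counts)
```

With this approach the reported values depend on the configured window, which is the point the review comments below go on to question.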
Contributor

Is the last day the most common metric? I would have expected just the status in general.

Contributor

So what you propose is

success_count = GoldenConfig.objects.filter(backup_last_attempt_date=F('backup_last_success_date')).count()
_attempt_count = GoldenConfig.objects.filter(backup_last_attempt_date__isnull=False).count()
failure_count = _attempt_count - success_count

?
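
To make the proposed query easier to follow without a database, here is a plain-Python sketch of the same counting logic. `BackupState` is a hypothetical stand-in for the two GoldenConfig date fields named in the query, not the real model: a backup counts as successful when the last attempt and the last success share a timestamp, and failures are the remaining attempted records.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BackupState:
    """Hypothetical stand-in for the two GoldenConfig date fields used in the query."""

    backup_last_attempt_date: Optional[datetime]
    backup_last_success_date: Optional[datetime]


def backup_counts(records):
    """Mirror the proposed ORM logic: success means last attempt equals last success."""
    # Records that were never attempted are excluded, as with the isnull=False filter.
    attempted = [r for r in records if r.backup_last_attempt_date is not None]
    success_count = sum(
        1 for r in attempted if r.backup_last_attempt_date == r.backup_last_success_date
    )
    return {"success": success_count, "failure": len(attempted) - success_count}


earlier = datetime(2023, 3, 20, 8, 0)
latest = datetime(2023, 3, 24, 8, 0)
records = [
    BackupState(latest, latest),   # last attempt succeeded
    BackupState(latest, earlier),  # last attempt failed after an earlier success
    BackupState(None, None),       # never attempted, not counted
]
state_counts = backup_counts(records)
print(state_counts)
```

Unlike a window-based count, this reflects only the current state of each object, so no time interval needs to be configured.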

Contributor Author

The status of the last Job run is exposed by capacity_metrics (which will be migrated into Nautobot core), example visualization here. I like exposing the counts of successful/attempted Job runs within a given interval, because those metrics will be stored in Prometheus and you can then query and process them as you like (e.g. visualize one year).

Contributor

We had a discussion and agreed that, on the operational side of things, the metric that Ken proposed would be easier to handle. The time dimension then comes from the time-series database. We just have a gauge that displays the current number of successful/failing backups/intended/compliance/etc. Does that make sense? The key point here is GoldenConfig object status vs. JobResult object status.

Contributor

Thanks! And much better said than I could have.

Contributor Author

Thank you! I've updated the metrics to reflect your comments. Please check again!

Contributor

@nkallergis nkallergis left a comment

Looks good after the latest commit.

@gertzakis gertzakis mentioned this pull request Apr 4, 2023
11 tasks
@itdependsnetworks
Contributor

Can you address the merge conflicts? We may get one more release out before adding this, still thinking about it.

@itdependsnetworks
Contributor

We are good in principle here; we are just determining which release.

@chadell
Contributor

chadell commented Jul 26, 2023

Any plans to release this?

@itdependsnetworks itdependsnetworks merged commit b48a3fe into nautobot:next Jul 26, 2023
16 checks passed
jmpettit pushed a commit to jmpettit/nautobot-app-golden-config that referenced this pull request Jan 30, 2024
* POC for Prometheus instrumentation

* Advance POC.

* add compliance job metric

* forgot yield

* add simple metric rules per feature

* added docstring

* clean up stuff

* cleanup

* update invoke

* fix pylint

* fix black

* fix CI for 1.5.3 for metrics

* fix pylint

* fix metrics based on comments

* fix metrics

* update lock file

* fix linters

* update `poetry.lock`

---------

Co-authored-by: Leo Kirchner <leo.kirchner98@gmail.com>

5 participants