[Blog] Prometheus blog post ideas around extending Prometheus #2997

@wbollock

Hello! I'm interested in writing a post for the Prometheus blog and would like some feedback on topic ideas. I'm considering a few ideas, mostly around how my organization uses Prometheus. I'm happy to flesh out any of these ideas further if they are appealing and would be a good fit for the blog.

Idea 1: Jsonnet

My organization uses jsonnet to manage Prometheus rules at scale. Right now we have roughly 2000 alert rules and 200 recording rules created via jsonnet. We've also recently switched to sjsonnet for a ~20x templating speed improvement. Jsonnet helps us keep our rules DRY and enables lots of neat templating and configuration.
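To give a rough idea of what "keeping rules DRY" means here, below is a minimal sketch written in Python rather than Jsonnet. It is not our actual setup: the service names, the `http_requests_total` metric, the thresholds, and the helper names `error_rate_alert` and `rule_group` are all hypothetical and only illustrate generating many similar alert rules from one template.

```python
import json

# Hypothetical services mapped to error-ratio thresholds (illustrative only).
SERVICES = {"checkout": 0.01, "search": 0.05}


def error_rate_alert(service: str, threshold: float) -> dict:
    """Build one alerting rule from a shared template."""
    # The metric and label names are assumptions, not a real schema.
    expr = (
        f'sum(rate(http_requests_total{{service="{service}",code=~"5.."}}[5m]))'
        f' / sum(rate(http_requests_total{{service="{service}"}}[5m]))'
        f' > {threshold}'
    )
    return {
        "alert": f"{service.capitalize()}HighErrorRate",
        "expr": expr,
        "for": "10m",
        "labels": {"severity": "page", "service": service},
    }


def rule_group(services: dict) -> dict:
    """Wrap the generated rules in the groups/rules layout of a rule file."""
    rules = [error_rate_alert(name, thr) for name, thr in sorted(services.items())]
    return {"groups": [{"name": "error-rate", "rules": rules}]}


if __name__ == "__main__":
    print(json.dumps(rule_group(SERVICES), indent=2))
```

Adding a service then means adding one entry to the map instead of copying a whole rule, which is the property Jsonnet gives us at a much larger scale.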

Idea 2: Sloth

slok/sloth is an application we use to generate SLOs with PromQL (disclaimer: I am a maintainer for a small part of the codebase). A simple Sloth SLO configuration generates 27 or so recording rules that help calculate error budget and enable multi-window, multi-burn-rate (MWMB) alerting. We have hundreds of SLOs and have used Sloth for a long time. Other tools in this space include pyrra, and OpenSLO is a related project. I would be most comfortable discussing our extensive use of Sloth, but I am wary of an official Prometheus blog post that "picks a winner", so to speak, so I can discuss all of the related tools as well.
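For readers unfamiliar with MWMB alerting, the arithmetic behind it can be sketched as follows. This is a simplified illustration assuming a 30-day SLO period and the commonly cited budget/window pairs from the Google SRE Workbook; it is not Sloth's generated output, and the function names are my own.

```python
SLO_PERIOD_HOURS = 30 * 24  # assumption: 30-day SLO period


def burn_rate_threshold(budget_fraction: float, window_hours: float) -> float:
    """Burn rate that spends `budget_fraction` of the budget in `window_hours`."""
    return budget_fraction * SLO_PERIOD_HOURS / window_hours


def error_ratio_threshold(slo_target: float, budget_fraction: float,
                          window_hours: float) -> float:
    """Error ratio corresponding to that burn rate for a given SLO target."""
    return burn_rate_threshold(budget_fraction, window_hours) * (1 - slo_target)


def should_alert(long_ratio: float, short_ratio: float, slo_target: float,
                 budget_fraction: float, window_hours: float) -> bool:
    """Fire only if both the long and the short window exceed the threshold."""
    limit = error_ratio_threshold(slo_target, budget_fraction, window_hours)
    return long_ratio > limit and short_ratio > limit


if __name__ == "__main__":
    # (budget fraction consumed, long window in hours)
    for budget, hours in [(0.02, 1), (0.05, 6), (0.10, 72)]:
        print(hours, round(burn_rate_threshold(budget, hours), 2))
```

Requiring the short window to agree with the long one is what lets the alert stop firing soon after the errors stop, instead of waiting for the long window to drain.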

Idea 3: Pint

cloudflare/pint is a Prometheus rule linter that is invaluable for maintaining a large Prometheus rules code base with contributors of varying PromQL skill levels. Pint checks for common footguns and best practices with a great library of built-in checks. Pint tells PR authors whether their metrics even exist, how many alerts might fire right away for their proposed alert, the cost of expensive Prometheus recording rules, and much more.

Idea 4: Combine all three (or some subset) of these Prometheus ecosystem "extensions"

It could also be one blog post talking about all three of those Prometheus extensions in combination. They've really helped us internally maintain a giant repo of in-house Prometheus rules with over 200 contributors!

Idea 5: SLO Review

I'd also be interested in writing a post on SLO Review using Prometheus metrics. This would likely need to include a discussion of Sloth and/or Pyrra to cover generating SLOs with PromQL. From Alex Hidalgo's Implementing Service Level Objectives:

SLO reviews
Pick a service and review its SLOs. Suggest improvements, and document the team’s implementation efforts as a case study.

This would be a little tougher to make solely about Prometheus rather than about the concept of SLOs in general and SRE-related guidance. I've been conducting SLO reviews for 2+ years and have a lot to say about their purpose, effectiveness, and getting eyes on your metrics.
