NO-ISSUE: Bump openshift/prometheus-alertmanager to v0.32.0 #129

Merged
openshift-merge-bot[bot] merged 148 commits into openshift:main from simonpasquier:pr/128
Apr 24, 2026

Conversation

@simonpasquier

No description provided.

TheMeier and others added 30 commits January 21, 2026 08:27
Signed-off-by: Christoph Maser <christoph.maser+github@gmail.com>
Signed-off-by: prombot <prometheus-team@googlegroups.com>
* docs: add new maintainers

---------

Signed-off-by: Christoph Maser <christoph.maser+github@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Ethan Hunter <fc.spaceman@gmail.com>
HRT uses this internally and has had positive experiences with this
project. Thus, this change makes it easier for them to upstream their
changes, and makes our project easier to work with.

`buf` also has some nice additional tooling, which we could make use of.

Finally, we want to get rid of `gogoproto`, and having this tooling will
be helpful.

Fixes: prometheus#4945

Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
This change does not include any fixes to `config.Load`. Those fixes
should be handled separately, since a project is never completely
fuzzed.

We also don't add any changes to `.github/workflows`. This fuzzer
currently has findings, which it detects frequently, so even a simple
smoke test would be very noisy. Moreover, the goal is to use the
OSS-Fuzz project's infrastructure rather than GitHub's.

`go test` ensures that the fuzzer passes at least once with the default
seed data provided. Therefore, `make test` treats `FuzzLoad` as a trivial
unit test. This avoids fuzz targets becoming outdated.
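
As a rough illustration of the target's shape (a minimal sketch assuming `config.Load` keeps its string-based signature; the real fuzzer's seed corpus may differ):

```go
package config

import "testing"

// FuzzLoad feeds arbitrary strings to Load. Bad input should yield an error,
// never a panic.
func FuzzLoad(f *testing.F) {
	// Seed data so a plain `go test` run exercises the target at least once.
	f.Add("route:\n  receiver: default\nreceivers:\n- name: default\n")
	f.Fuzz(func(t *testing.T, in string) {
		_, _ = Load(in)
	})
}
```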

Related to prometheus#4912

Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
Signed-off-by: prombot <prometheus-team@googlegroups.com>
Signed-off-by: Christoph Maser <christoph.maser+github@gmail.com>
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
To be able to move notifier-specific code out of the config and notify packages,
some shared code needs to move to other/new packages. Since commoncfg.Secret
already does what config.Secret does, this replaces all usages of config.Secret.
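
A before/after sketch of the kind of substitution this implies (struct and field names here are illustrative, not an actual notifier config):

```go
import commoncfg "github.com/prometheus/common/config"

// Before: APIKey config.Secret  `yaml:"api_key,omitempty"`
// After:  the shared Secret type from prometheus/common is used instead,
// which also masks the value as <secret> when marshalled.
type exampleReceiverConfig struct {
	APIKey commoncfg.Secret `yaml:"api_key,omitempty"`
}
```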

Signed-off-by: Christoph Maser <christoph.maser+github@gmail.com>
While updating a pointer is mostly atomic, it's better to be safe and use the
proper type.
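
For illustration, the pattern in question is roughly the following (type and field names are hypothetical, not the ones touched by this commit):

```go
import "sync/atomic"

type limits struct{ maxAlerts int }

// holder replaces a bare *limits field that is read and swapped from several
// goroutines with an atomic.Pointer, giving loads and stores well-defined
// memory semantics.
type holder struct {
	current atomic.Pointer[limits]
}

func (h *holder) update(l *limits) { h.current.Store(l) }
func (h *holder) get() *limits     { return h.current.Load() }
```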

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
* add NotificationReason to notification context
* improve dedup stage to include the reason to notify
* fix bugs, add tests, address code review

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
Add omitempty tag to Text field to prevent sending empty text values
to Mattermost, which causes 400 errors.
Per the Mattermost API spec, text is only required if attachments are not set.
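
Sketched out, the change amounts to the following (the payload struct here is illustrative; the real one lives in the notifier code):

```go
import "encoding/json"

// With omitempty, an empty Text is dropped from the JSON body, so Mattermost
// no longer rejects requests that only carry attachments.
type webhookPayload struct {
	Text        string            `json:"text,omitempty"`
	Attachments []json.RawMessage `json:"attachments,omitempty"`
}
```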

Related to prometheus-operator/prometheus-operator#8363

Signed-off-by: Jayapriya Pai <janantha@redhat.com>
- migrate to google.golang.org/protobuf
- drop github.com/gogo/protobuf
- drop github.com/matttproud/golang_protobuf_extensions
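
In practice this means generated code and call sites move to the official runtime; as an illustrative example (assuming the nflogpb messages are regenerated for google.golang.org/protobuf):

```go
import (
	"google.golang.org/protobuf/proto"

	pb "github.com/prometheus/alertmanager/nflog/nflogpb"
)

// Marshalling goes through the official runtime rather than gogo's generated
// Marshal methods.
func encodeEntry(e *pb.MeshEntry) ([]byte, error) {
	return proto.Marshal(e)
}
```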

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* move buf config into a single file

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>

* update generation script

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>

---------

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* [dispatcher] benchmark suite

A simple benchmark testing the dispatcher's Groups() call and, indirectly
via the mem alert provider, the dispatcher's ingestion.

Although the mem alert provider takes a lock and then puts the alerts into
a channel, it blocks if the dispatcher is not processing the alerts, so this
measures the backpressure and how quickly we can ingest alerts concurrently.
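
For reference, the Groups() benchmarks have roughly this shape (a sketch; newBenchDispatcher is a hypothetical setup helper, not part of the actual suite):

```go
package dispatch

import (
	"testing"
	"time"

	"github.com/prometheus/alertmanager/types"
)

func BenchmarkGroups(b *testing.B) {
	// Hypothetical helper: builds a Dispatcher with the given number of
	// routes and pre-ingested aggregation groups.
	d := newBenchDispatcher(b, 500, 5000)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Ask for every group: no route filter, no alert filter.
		d.Groups(
			func(*Route) bool { return true },
			func(*types.Alert, time.Time) bool { return true },
		)
	}
}
```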

These are the benchmark results on main, and after the whole concurrent
dispatcher series:

goos: linux
goarch: amd64
pkg: github.com/prometheus/alertmanager/dispatch
cpu: AMD EPYC Processor (with IBPB)
                                                                 │ bench_dispatch_main.txt │    bench_dispatch_concurrent.txt    │
                                                                 │         sec/op          │    sec/op     vs base               │
Groups/500_routes,_5000_groups-40                                             47.34m ±  4%   47.66m ±  3%        ~ (p=0.394 n=6)
Groups/120_routes,_10000_groups-40                                            116.4m ±  1%   117.3m ±  3%   +0.79% (p=0.026 n=6)
IngestionUnderGroupsLoad/500_routes,_0/s_Groups()_callers-40                  5.495m ± 57%   2.290m ± 54%  -58.33% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_1_10/s_Groups()_callers-40               5.591m ±  5%   2.291m ± 17%  -59.02% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_10_10/s_Groups()_callers-40              6.760m ±  7%   3.117m ± 38%  -53.90% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_25_10/s_Groups()_callers-40              12.47m ± 15%   17.70m ± 70%        ~ (p=0.394 n=6)
IngestionUnderGroupsLoad/500_routes,_25_20/s_Groups()_callers-40             11.872m ± 33%   7.978m ± 76%  -32.80% (p=0.041 n=6)
geomean                                                                       14.98m         10.37m        -30.77%

                                                                 │ bench_dispatch_main.txt │    bench_dispatch_concurrent.txt     │
                                                                 │          B/op           │     B/op       vs base               │
Groups/500_routes,_5000_groups-40                                            9.275Mi ±  0%   9.132Mi ±  1%   -1.54% (p=0.002 n=6)
Groups/120_routes,_10000_groups-40                                           18.98Mi ±  0%   18.76Mi ±  0%   -1.15% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_0/s_Groups()_callers-40                 4.604Mi ±  1%   4.614Mi ±  1%        ~ (p=0.240 n=6)
IngestionUnderGroupsLoad/500_routes,_1_10/s_Groups()_callers-40              5.197Mi ±  4%   4.831Mi ±  3%   -7.04% (p=0.004 n=6)
IngestionUnderGroupsLoad/500_routes,_10_10/s_Groups()_callers-40            10.414Mi ± 14%   6.856Mi ± 16%  -34.16% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_25_10/s_Groups()_callers-40             27.43Mi ± 11%   27.38Mi ± 63%        ~ (p=0.937 n=6)
IngestionUnderGroupsLoad/500_routes,_25_20/s_Groups()_callers-40             27.21Mi ± 33%   12.93Mi ± 85%  -52.49% (p=0.002 n=6)
geomean                                                                      11.85Mi         9.892Mi        -16.50%

                                                                 │ bench_dispatch_main.txt │    bench_dispatch_concurrent.txt     │
                                                                 │        allocs/op        │   allocs/op    vs base               │
Groups/500_routes,_5000_groups-40                                             210.7k ±  0%   209.2k ±   2%        ~ (p=0.065 n=6)
Groups/120_routes,_10000_groups-40                                            440.8k ±  0%   437.5k ±   1%   -0.77% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_0/s_Groups()_callers-40                  18.89k ±  0%   19.13k ±   1%   +1.30% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_1_10/s_Groups()_callers-40               36.70k ± 21%   25.07k ±  23%  -31.67% (p=0.004 n=6)
IngestionUnderGroupsLoad/500_routes,_10_10/s_Groups()_callers-40             189.57k ± 22%   85.69k ±  38%  -54.79% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_25_10/s_Groups()_callers-40              660.1k ± 12%   692.4k ±  73%        ~ (p=0.937 n=6)
IngestionUnderGroupsLoad/500_routes,_25_20/s_Groups()_callers-40              657.4k ± 37%   259.0k ± 123%  -60.61% (p=0.002 n=6)
geomean                                                                       176.3k         131.4k         -25.50%

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

* Update copyright line

Co-authored-by: Siavash Safi <git@hosted.run>
Signed-off-by: Guido Trotter <ultrotter@gmail.com>

* Fix number of routes in the second Groups test

Co-authored-by: Siavash Safi <git@hosted.run>
Signed-off-by: Guido Trotter <ultrotter@gmail.com>

---------

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Guido Trotter <ultrotter@gmail.com>
Co-authored-by: Guido Trotter <guido@hudson-trading.com>
Co-authored-by: Siavash Safi <git@hosted.run>
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* add support multiple matcher set silences

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
… the tree is always the top index

This allows us to have a count for each route, and to be able to move
the route indexing from "pointer to the route object" to its integer
index.
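
As an illustration of what the integer index enables (a sketch; only the Idx field comes from this series, the helper is hypothetical):

```go
// assignIdx walks the route tree depth-first and gives every route a stable
// integer index, so per-route data can live in a preallocated slice instead
// of a map keyed by *Route. The returned value is the total number of routes,
// which the top-level call therefore always knows.
func assignIdx(r *Route, next int) int {
	r.Idx = next
	next++
	for _, child := range r.Routes {
		next = assignIdx(child, next)
	}
	return next
}
```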

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
…t is only used internally for the recursion)

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
- Add a new stopped state
- Remove the mtx lock
- Change aggrGroupsPerRoute map to a new preallocated slice
- Change aggrGroupsNum to an atomic int
- Change done chan to a waitgroup
- Change state to atomic

Add a new type holding the lock at the fingerprint-to-aggrGroup map level
(TBD whether we can make this a sync.Map)

Preallocate the route slice and its content objects

Remove the WaitForLoading call in Groups, which is redundant after
LoadingDone.

Remove copying the immutable slice in Groups; now only the inner maps are
copied, holding their individual locks. This also saves copying when there
is a route filter, which can be checked without holding a lock.

Simplify Stop() so it doesn't need a lock, by storing the state atomically
and then calling cancel(). Stop is safe to call multiple times since cancel()
is safe to call multiple times, and waiting on the finished channel can be
done multiple times, unlike listening on the done chan.
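
A minimal sketch of the resulting Stop path (field and constant names are illustrative):

```go
import (
	"context"
	"sync/atomic"
)

const (
	stateRunning int32 = iota
	stateStopped
)

type dispatcher struct {
	state    atomic.Int32
	cancel   context.CancelFunc
	finished chan struct{} // closed when Run() returns
}

// Stop needs no lock: the state is stored atomically, cancel() is idempotent,
// and receiving from an already-closed channel returns immediately, so Stop
// can be called any number of times.
func (d *dispatcher) Stop() {
	d.state.Store(stateStopped)
	d.cancel()
	<-d.finished
}
```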

In groupAlert only hold the lock for the current route slice. Note that
the limit check is done holding only this lock. This is safe now as only
one alert can be ingested at a time, but if we enable parallel alert
ingestion with N workers, we may overshoot the limit by N. We deem this
acceptable as N is small, and the limit is a safety measure to avoid too
many AGs, not something that will substantially break between e.g. 1000
and 1016. We will update the documentation about the limit when we make
the ingestion parallel. We could use Add() and then check, and thereby
undershoot the limit, or use CompareAndSwap and retry, but that would make
performance worse.

Dispatch tests needed fixes to add the Idx in the manually created
Route, and to not pass a nil Route. The way the maintenance test
populates the aggrgroup also changes with the new system.

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Since the waitgroup had Add(1) in New and then Done() during Run(), it
changes nothing to instead create the channel in New and close it during
Run(). This avoids creating a separate goroutine to listen for its closure
during Groups.
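
In other words (a sketch with illustrative names):

```go
type dispatcher struct {
	loadingDone chan struct{} // created in New, closed once Run has loaded
}

func newDispatcher() *dispatcher {
	return &dispatcher{loadingDone: make(chan struct{})}
}

func (d *dispatcher) run() {
	// ... replay existing alerts into aggregation groups ...
	close(d.loadingDone)
	// ... main dispatch loop ...
}

func (d *dispatcher) groups() {
	<-d.loadingDone // returns immediately once run() has closed the channel
	// ... snapshot and return the groups ...
}
```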

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
* [dispatcher] Convert groups to a sync map

This allows us to avoid taking the lock in most cases.

The exception is that we still need it when adding an aggrGroup, as
multiple goroutines might otherwise be trying to add theirs for the
same fingerprint.
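
Sketched with illustrative names, the read path is lock-free and only the insert path serialises on the mutex:

```go
import (
	"sync"

	"github.com/prometheus/common/model"
)

type aggrGroup struct{ /* ... */ }

type routeGroups struct {
	mtx    sync.Mutex
	groups sync.Map // model.Fingerprint -> *aggrGroup
}

func (r *routeGroups) get(fp model.Fingerprint) (*aggrGroup, bool) {
	v, ok := r.groups.Load(fp)
	if !ok {
		return nil, false
	}
	return v.(*aggrGroup), true
}

// getOrCreate takes the mutex only when the group does not exist yet, so two
// ingesters racing on the same fingerprint end up sharing one group.
func (r *routeGroups) getOrCreate(fp model.Fingerprint, newGroup func() *aggrGroup) *aggrGroup {
	if ag, ok := r.get(fp); ok {
		return ag
	}
	r.mtx.Lock()
	defer r.mtx.Unlock()
	if ag, ok := r.get(fp); ok { // re-check under the lock
		return ag
	}
	ag := newGroup()
	r.groups.Store(fp, ag)
	return ag
}
```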

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

* Even with the sync.Map, snapshot the groups in each route to avoid contention when the map is dirty

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

* Remove lock outside of the per route sync.Map

Use sync map swap operations, with max retries, to avoid one extra lock.

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

* keep groups len in routeAggrGroups

This allows us to allocate the right amount of memory when we read them
for Groups(). The number may not be precise, but will work in most
cases.
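
Concretely, the idea is an approximate counter kept next to the sync.Map, used only as a capacity hint when snapshotting (a sketch; the extra-slack constant is hypothetical):

```go
import (
	"sync"
	"sync/atomic"
)

type aggrGroup struct{ /* ... */ }

const groupsSliceExtra = 8 // hypothetical slack for entries added mid-snapshot

type routeAggrGroups struct {
	groups sync.Map     // fingerprint -> *aggrGroup
	length atomic.Int64 // approximate entry count, updated on add/delete
}

// snapshot preallocates using the approximate length; being off by a few
// entries only costs an occasional slice growth, never correctness.
func (r *routeAggrGroups) snapshot() []*aggrGroup {
	out := make([]*aggrGroup, 0, int(r.length.Load())+groupsSliceExtra)
	r.groups.Range(func(_, v any) bool {
		out = append(out, v.(*aggrGroup)) // cast once per entry
		return true
	})
	return out
}
```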

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

* Use a constant instead of the concurrency field for the Groups() slice copy extra size

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

* cast sync.Map Range value to aggrGroup only once

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

* Remove 'dontdestroy' property of the alert store

- just pass a DestroyIfEmpty value to DeleteIfNotModified
- never destroy in GC

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

---------

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Co-authored-by: Guido Trotter <guido@hudson-trading.com>
* [dispatcher] increase concurrency of alert ingestion

We create multiple goroutines in Run(): one for maintenance, one for the
start timer, and N for ingestion.
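
Schematically, Run() now fans out roughly like this (the worker count, helper methods and channel type are illustrative):

```go
import (
	"context"
	"sync"

	"github.com/prometheus/alertmanager/types"
)

const ingestWorkers = 4 // illustrative; the actual value was tuned in a follow-up

func (d *dispatcher) run(ctx context.Context, alerts <-chan *types.Alert) {
	go d.runMaintenance(ctx) // hypothetical: periodic GC of empty groups
	go d.startTimer(ctx)     // hypothetical: flush timer handling

	var wg sync.WaitGroup
	for i := 0; i < ingestWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for a := range alerts {
				d.processAlert(a) // route the alert into its aggregation groups
			}
		}()
	}
	wg.Wait()
}
```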

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

* decrease concurrency to avoid some contention on channels and ag locks

Signed-off-by: Guido Trotter <guido@hudson-trading.com>

---------

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Co-authored-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
openshift-ci bot requested review from rexagod and slashpai on April 23, 2026 09:19
openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Apr 23, 2026
simonpasquier force-pushed the pr/128 branch 2 times, most recently from 84d11ab to 06693bf, on April 23, 2026 13:23
@simonpasquier
Author

/test test

We need to commit the frontend assets in order for the build to succeed
in our downstream environment.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
@simonpasquier
Author

/test test

@simonpasquier
Author

/verified by tests

openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) on Apr 24, 2026
@openshift-ci-robot

@simonpasquier: This PR has been marked as verified by tests.


In response to this:

/verified by tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@simonpasquier
Author

/test e2e-agnostic-cmo

@simonpasquier
Author

/hold

I want to test with cluster-bot that the UI is working as expected using port-forward.

openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Apr 24, 2026
@simonpasquier
Author

/hold cancel

I didn't manage to spin up a cluster with cluster-bot but the UI looks good after building and testing locally on my laptop. It'd be easier to merge the PR and check with the next nightly payload.

openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Apr 24, 2026
@simonpasquier
Author

My cluster eventually launched and the UI looks good:

(screenshot of the Alertmanager UI attached)

@simonpasquier
Author

/verified by @simonpasquier

@openshift-ci-robot

@simonpasquier: This PR has been marked as verified by @simonpasquier.


In response to this:

/verified by @simonpasquier

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@slashpai
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 24, 2026
@danielmellado

/lgtm

@openshift-ci

openshift-ci Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielmellado, simonpasquier, slashpai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [danielmellado,simonpasquier,slashpai]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot

/retest-required

Remaining retests: 0 against base HEAD 1af91d5 and 2 for PR HEAD 23db1f4 in total

@openshift-ci

openshift-ci Bot commented Apr 24, 2026

@simonpasquier: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-merge-bot[bot] merged commit 50fcc4a into openshift:main on Apr 24, 2026
11 checks passed

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
verified: Signifies that the PR passed pre-merge verification criteria.
