NO-ISSUE: Bump openshift/prometheus-alertmanager to v0.32.0#129
NO-ISSUE: Bump openshift/prometheus-alertmanager to v0.32.0#129openshift-merge-bot[bot] merged 148 commits intoopenshift:mainfrom
Conversation
Signed-off-by: Christoph Maser <christoph.maser+github@gmail.com>
Signed-off-by: prombot <prometheus-team@googlegroups.com>
* docs: add new maintainers --------- Signed-off-by: Christoph Maser <christoph.maser+github@gmail.com> Signed-off-by: gotjosh <josue.abreu@gmail.com> Co-authored-by: gotjosh <josue.abreu@gmail.com> Co-authored-by: Ethan Hunter <fc.spaceman@gmail.com>
chore: setup stale action
HRT uses this internally, and have positive experiences with this project. Thus, this change facilitates them upstreaming their changes, and makes are project easier to work with. `buf` also has some nice additional tooling, which we could make use of. Finally, we want to get rid off `gogoproto`, and having tooling will be helpful. Fixes: prometheus#4945 Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
This change does not include any fixes to `config.Load`. Those fixes should be handled separately, since a project is never completely fuzzed. We also don't add any change to the `.github/workflows`. This fuzzer currently has findings, which it detects frequently. Thus, even a simple smoke test would be very noisy. Moreover, the goal is to use the infrastructure of the OSS-fuzz project rather than GitHubs. `go` ensures that the fuzzer can pass at least once, with the default data provided. Therefore, `make test` treats `FuzzLoad` as a trivial unit-test. This avoids fuzzing targets becoming outdated. Related to prometheus#4912 Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
Signed-off-by: prombot <prometheus-team@googlegroups.com>
Signed-off-by: Christoph Maser <christoph.maser+github@gmail.com>
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
To be able to move notifier specific code out of the config and notify packages, some shared code needs to be moved to other/new packages. Since commoncfg.Secret already does what config.Secret does, this replaces all usages of config.Secret. Signed-off-by: Christoph Maser <christoph.maser+github@gmail.com>
while updating a pointer is mostly atomic, better to be safe and use the proper type. Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
* add NotificationReason to notification context * improve dedup stage to include the reason to notify * fix bugs, add tests, address code review Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
Add omitempty tag to Text field to prevent sending empty text values to Mattermost, which causes 400 errors. Per the Mattermost API spec, text is only required if attachments are not set. Related to prometheus-operator/prometheus-operator#8363 Signed-off-by: Jayapriya Pai <janantha@redhat.com>
- migrate to google.golang.org/protobuf - drop github.com/gogo/protobuf - drop github.com/matttproud/golang_protobuf_extensions Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* move buf config into a single file Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com> * update generation script Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com> --------- Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* [dispatcher] benchmark suite
Simple benchmark testing the dispatcher Groups() call, and the
dispatcher ingestion, indirectly via the mem alert provider.
While the mem alert provider takes a lock and then puts the alerts into
a channel, it does block if the dispatcher is not processing the alerts,
so this way we measure the backpressure and how much we can ingest
alerts with concurrency.
This is the result of these patches on main, and after all the
concurrent dispatcher series:
goos: linux
goarch: amd64
pkg: github.com/prometheus/alertmanager/dispatch
cpu: AMD EPYC Processor (with IBPB)
│ bench_dispatch_main.txt │ bench_dispatch_concurrent.txt │
│ sec/op │ sec/op vs base │
Groups/500_routes,_5000_groups-40 47.34m ± 4% 47.66m ± 3% ~ (p=0.394 n=6)
Groups/120_routes,_10000_groups-40 116.4m ± 1% 117.3m ± 3% +0.79% (p=0.026 n=6)
IngestionUnderGroupsLoad/500_routes,_0/s_Groups()_callers-40 5.495m ± 57% 2.290m ± 54% -58.33% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_1_10/s_Groups()_callers-40 5.591m ± 5% 2.291m ± 17% -59.02% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_10_10/s_Groups()_callers-40 6.760m ± 7% 3.117m ± 38% -53.90% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_25_10/s_Groups()_callers-40 12.47m ± 15% 17.70m ± 70% ~ (p=0.394 n=6)
IngestionUnderGroupsLoad/500_routes,_25_20/s_Groups()_callers-40 11.872m ± 33% 7.978m ± 76% -32.80% (p=0.041 n=6)
geomean 14.98m 10.37m -30.77%
│ bench_dispatch_main.txt │ bench_dispatch_concurrent.txt │
│ B/op │ B/op vs base │
Groups/500_routes,_5000_groups-40 9.275Mi ± 0% 9.132Mi ± 1% -1.54% (p=0.002 n=6)
Groups/120_routes,_10000_groups-40 18.98Mi ± 0% 18.76Mi ± 0% -1.15% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_0/s_Groups()_callers-40 4.604Mi ± 1% 4.614Mi ± 1% ~ (p=0.240 n=6)
IngestionUnderGroupsLoad/500_routes,_1_10/s_Groups()_callers-40 5.197Mi ± 4% 4.831Mi ± 3% -7.04% (p=0.004 n=6)
IngestionUnderGroupsLoad/500_routes,_10_10/s_Groups()_callers-40 10.414Mi ± 14% 6.856Mi ± 16% -34.16% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_25_10/s_Groups()_callers-40 27.43Mi ± 11% 27.38Mi ± 63% ~ (p=0.937 n=6)
IngestionUnderGroupsLoad/500_routes,_25_20/s_Groups()_callers-40 27.21Mi ± 33% 12.93Mi ± 85% -52.49% (p=0.002 n=6)
geomean 11.85Mi 9.892Mi -16.50%
│ bench_dispatch_main.txt │ bench_dispatch_concurrent.txt │
│ allocs/op │ allocs/op vs base │
Groups/500_routes,_5000_groups-40 210.7k ± 0% 209.2k ± 2% ~ (p=0.065 n=6)
Groups/120_routes,_10000_groups-40 440.8k ± 0% 437.5k ± 1% -0.77% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_0/s_Groups()_callers-40 18.89k ± 0% 19.13k ± 1% +1.30% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_1_10/s_Groups()_callers-40 36.70k ± 21% 25.07k ± 23% -31.67% (p=0.004 n=6)
IngestionUnderGroupsLoad/500_routes,_10_10/s_Groups()_callers-40 189.57k ± 22% 85.69k ± 38% -54.79% (p=0.002 n=6)
IngestionUnderGroupsLoad/500_routes,_25_10/s_Groups()_callers-40 660.1k ± 12% 692.4k ± 73% ~ (p=0.937 n=6)
IngestionUnderGroupsLoad/500_routes,_25_20/s_Groups()_callers-40 657.4k ± 37% 259.0k ± 123% -60.61% (p=0.002 n=6)
geomean 176.3k 131.4k -25.50%
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
* Update copyright line
Co-authored-by: Siavash Safi <git@hosted.run>
Signed-off-by: Guido Trotter <ultrotter@gmail.com>
* Fix number of routes in the second Groups test
Co-authored-by: Siavash Safi <git@hosted.run>
Signed-off-by: Guido Trotter <ultrotter@gmail.com>
---------
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Guido Trotter <ultrotter@gmail.com>
Co-authored-by: Guido Trotter <guido@hudson-trading.com>
Co-authored-by: Siavash Safi <git@hosted.run>
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* add support multiple matcher set silences Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
… the tree is always the top index This allows us to have a count for each route, and to be able to move the route indexing from "pointer to the route object" to its integer index. Signed-off-by: Guido Trotter <guido@hudson-trading.com>
…t is only used internally for the recursion) Signed-off-by: Guido Trotter <guido@hudson-trading.com>
- Add a new stopped state - Remove the mtx lock - Change aggrGroupsPerRoute map to a new preallocated slice - Change aggrGroupsNum to an atomic int - Change done chan to a waitgroup - Change state to atomic Add a new type holding the lock at the map fingerprint to aggrgroup level (tbd if we can make this a sync.Map) Preallocate the route slice and its content objects Remove the WaitForLoading call in Groups, which is redundant after LoadingDone. Remove copying the immutable slice in Groups, now only copy the inside maps, holding their individual lock. This also saves copying if there is a route filter, which can be checked without holding a lock. Simplify Stop() to not need a lock by storing the state atomically then calling cancel(). Stop is safe to call more times since cancel() is safe to call more times, and waiting on the finished channel can be done more times, unlike listening on the done chan. In groupAlert only hold the lock for the current route slice. Note that the limit check is done holding only this lock. This is safe now as only one alert can be ingested at a time, but if we enable parallel alert ingestion with N workers, we may overshoot the limit by N. We deem this to be ok as N is smaller than the number of workers, and the limit is a safety to avoid too many AGs, not something that will substantially break between e.g. 1000 and 1016. We will update the documentation about the limit when we will make the ingestion parallel. We could use Add() then check, and then undershoot the limit, or CompareAndSwap and retry but that's making performance worse. Dispatch tests needed fixes to add the Idx in the manually created Route, and to not pass a nil Route. The way the maintenance test populates the aggrgroup also changes with the new system. Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Since the waitgroup had Add(1) in New, and then was Done() during Run(), it changes nothing to instead create the channel in New, and close it during Run(). This avoid creating a separate goroutine to listen to its closure during Groups. Signed-off-by: Guido Trotter <guido@hudson-trading.com>
* [dispatcher] Convert groups to a sync map This allows us to avoid taking the lock in most cases. The exception is that we still need it when adding an aggrgroup, as multiple goroutines might be otherwise trying to add theirs, for the same fp. Signed-off-by: Guido Trotter <guido@hudson-trading.com> * Even with sync map snapshot the groups in each route to avoid contention when the map is dirty Signed-off-by: Guido Trotter <guido@hudson-trading.com> * Remove lock outside of the per route sync.Map Use sync map swap operations, with max retries, to avoid one extra lock. Signed-off-by: Guido Trotter <guido@hudson-trading.com> * keep groups len in routeAggrGroups This allows us to allocate the right amount of memory when we read them for Groups(). The number may not be precise, but will work in most cases. Signed-off-by: Guido Trotter <guido@hudson-trading.com> * Use a constant instead of the concurrency field for the Groups() slice copy extra size Signed-off-by: Guido Trotter <guido@hudson-trading.com> * cast sync.Map Range value to aggrGroup only once Signed-off-by: Guido Trotter <guido@hudson-trading.com> * Remove 'dontdestroy' property of the alert store - just pass a DestroyIfEmpty value to DeleteIfNotModified - never destroy in GC Signed-off-by: Guido Trotter <guido@hudson-trading.com> --------- Signed-off-by: Guido Trotter <guido@hudson-trading.com> Co-authored-by: Guido Trotter <guido@hudson-trading.com>
* [dispatcher] increase concurrency of alert ingestion We create multiple goroutines on Run(), one for maintenance, one for start timer, and N for ingestion. Signed-off-by: Guido Trotter <guido@hudson-trading.com> * decrease concurrency to avoid some contention on channels and ag locks Signed-off-by: Guido Trotter <guido@hudson-trading.com> --------- Signed-off-by: Guido Trotter <guido@hudson-trading.com> Co-authored-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
84d11ab to
06693bf
Compare
|
/test test |
We need to commit the frontend assets in order for the build to succeed in our downstream environment. Signed-off-by: Simon Pasquier <spasquie@redhat.com>
|
/test test |
|
/verified by tests |
|
@simonpasquier: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-agnostic-cmo |
|
/hold I want to test with cluster-bot that the UI is working as expected using port-forward. |
|
/hold cancel I didn't manage to spin up a cluster with cluster-bot but the UI looks good after building and testing locally on my laptop. It'd be easier to merge the PR and check with the next nightly payload. |
|
/verified by @simonpasquier |
|
@simonpasquier: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danielmellado, simonpasquier, slashpai The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@simonpasquier: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |

No description provided.