What did you do?
Hi! I have Alertmanager running in Kubernetes, deployed via the community Helm chart. After upgrading from chart version 1.33.1 to 1.35.0 (Alertmanager v0.31.1 to v0.32.2 respectively) I started getting the "excessive retries creating aggregation group" error shown in the logs below. This doesn't seem to be a Helm chart issue: the diff shows the only change in the chart is the image version bump, and no other config changes have been made. Alertmanager runs as a cluster of 3 instances. I have no alert grouping set up (group_by: ['...']), and a lot of alerts go through this cluster with relatively complex routing/matchers, but I never saw this issue on v0.31.1.
The error seems to hit various alerts at random, and the debug logs show that some of them are actually in the resolved state when it happens. I have looked at the metrics and nothing stood out apart from alertmanager_dispatcher_aggregation_groups, which usually hovers around 200 but spikes to 300+, together with a slight spike in alertmanager_dispatcher_alert_processing_duration_seconds_count, and that's when the errors show up. This is worrying because the comment in dispatch.go specifically says this is caused by either a "bug or extreme contention".
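For reference, these are roughly the queries I've been graphing to correlate the spikes with the errors (the pod label is just how my scrape config labels the replicas; adjust as needed):

# aggregation groups per replica: normally ~200 here, spikes to 300+ when the errors appear
max by (pod) (alertmanager_dispatcher_aggregation_groups)

# alert processing rate over the same window, which shows a slight bump at the same time
sum by (pod) (rate(alertmanager_dispatcher_alert_processing_duration_seconds_count[5m]))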
Please let me know if further troubleshooting is needed to pinpoint the cause of the issue.
What did you expect to see?
No bug/regression when upgrading to a newer version.
What did you see instead? Under which circumstances?
"excessive retries creating aggregation group" errors (see logs below) after upgrading from v0.31.1 to v0.32.2.
System information
Kubernetes (Talos Linux v1.11.1, kernel 6.12.45-talos)
Alertmanager version
alertmanager, version 0.32.0 (branch: HEAD, revision: 685a2a1c6bb01b2c17bc1bfae995cb3416c1115e)
build user: root@e5ae55633a39
build date: 20260408-18:08:22
go version: go1.26.2
platform: linux/amd64
tags: netgo
Alertmanager configuration file
global: {}
inhibit_rules:
  - equal:
      - cluster
      - namespace
      - pod
    source_matchers:
      - alertname = KubePodCrashLooping
    target_matchers:
      - alertname = KubeContainerWaiting
receivers:
  <lots-of-receivers>
route:
  group_by:
    - '...'
  group_interval: 5m
  group_wait: 10s
  receiver: "null"
  repeat_interval: 3h
  routes:
    <lots-of-routes>
templates:
  - /etc/alertmanager/*.tmpl
Prometheus version
Prometheus configuration file
Logs
...
time=2026-04-16T14:47:11.144Z level=ERROR source=dispatch.go:531 msg="excessive retries creating aggregation group" component=dispatcher fingerprint=4198ce13258e8234 route={}/{} alert=KubeletRestarted retries=101
time=2026-04-16T14:47:11.163Z level=ERROR source=dispatch.go:531 msg="excessive retries creating aggregation group" component=dispatcher fingerprint=4198ce13258e8234 route={}/{} alert=KubeletRestarted retries=101
time=2026-04-16T14:58:11.161Z level=ERROR source=dispatch.go:531 msg="excessive retries creating aggregation group" component=dispatcher fingerprint=4198ce13258e8234 route={}/{} alert=KubeletRestarted retries=101
time=2026-04-16T15:12:12.750Z level=ERROR source=dispatch.go:531 msg="excessive retries creating aggregation group" component=dispatcher fingerprint=db1705f493471c37 route={}/{} alert=HostNetworkInterfaceSaturationSpike retries=101
time=2026-04-16T15:21:52.746Z level=ERROR source=dispatch.go:531 msg="excessive retries creating aggregation group" component=dispatcher fingerprint=db1705f493471c37 route={}/{} alert=HostNetworkInterfaceSaturationSpike retries=101
time=2026-04-16T15:22:12.736Z level=ERROR source=dispatch.go:531 msg="excessive retries creating aggregation group" component=dispatcher fingerprint=db1705f493471c37 route={}/{} alert=HostNetworkInterfaceSaturationSpike retries=101
time=2026-04-16T15:22:12.736Z level=ERROR source=dispatch.go:531 msg="excessive retries creating aggregation group" component=dispatcher fingerprint=db1705f493471c37 route={}/{} alert=HostNetworkInterfaceSaturationSpike retries=101
time=2026-04-16T15:22:32.746Z level=ERROR source=dispatch.go:531 msg="excessive retries creating aggregation group" component=dispatcher fingerprint=db1705f493471c37 route={}/{} alert=HostNetworkInterfaceSaturationSpike retries=101
...