
Delete metrics increment on no-op reconcile when DaemonSet/HPA/PDB are not configured #8438

@jemag

Description:

The resource_delete_total and resource_delete_duration_seconds metrics are incremented on every reconcile cycle for resources that do not exist and were never created.

With a Deployment-based EnvoyProxy, the controller appears to call deleteDaemonSet(), deleteHPA(), and deletePDB() on every reconcile, because the createOrUpdate* functions seem to treat a nil render as a signal to delete.

After running for a few days, I'm seeing ~25k deletes for DaemonSet, HPA, and PDB - while the Deployment (which actually exists) shows zero deletes as expected.

This makes the metrics noisy and makes it hard to track actual delete operations. It is also misleading on dashboards and does not seem to match the description of the metric:

[Three screenshots attached to the original issue]

Basically, if a resource is not actually deleted, then I wouldn't expect it to be counted as such.

Repro steps:

  1. Deploy Envoy Gateway with default config (Deployment mode, no HPA/PDB)
  2. Create a Gateway and let it reconcile for a while
  3. Query metrics:
     resource_delete_total
     resource_delete_duration_seconds_sum
  4. Observe DaemonSet/HPA/PDB counters climbing while Deployment stays at 0

Environment:

  • Envoy Gateway v1.7.0
  • Kubernetes 1.34.2
  • Default EnvoyProxy config (Deployment with 2 replicas, no autoscaling or PDB)

Logs:

N/A - no errors, just metric behavior


A possible fix would be to check whether DeleteAllOf actually deleted anything before recording metrics, or to skip the delete call entirely if a Get shows the resource doesn't exist.
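
To make the second option concrete, here is a rough sketch of the idea, not the actual Envoy Gateway code. As far as I know, controller-runtime's DeleteAllOf returns only an error and no count of deleted objects, so the sketch checks for existence first. Everything in it (objectStore, memStore, deleteMetrics, deleteIfExists, simulateReconciles, and the "envoy" object name) is hypothetical and only stands in for the real client and metric types.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// objectStore is a hypothetical stand-in for the Kubernetes client used by
// the infra manager. It is not an Envoy Gateway or controller-runtime type.
type objectStore interface {
	Exists(ctx context.Context, kind, name string) (bool, error)
	Delete(ctx context.Context, kind, name string) error
}

// memStore is an in-memory objectStore keyed by "kind/name".
type memStore map[string]bool

func (s memStore) Exists(_ context.Context, kind, name string) (bool, error) {
	return s[kind+"/"+name], nil
}

func (s memStore) Delete(_ context.Context, kind, name string) error {
	delete(s, kind+"/"+name)
	return nil
}

// deleteMetrics is a hypothetical stand-in for resource_delete_total and
// resource_delete_duration_seconds, keyed by resource kind.
type deleteMetrics struct {
	Total    map[string]int
	Duration map[string]time.Duration
}

func newDeleteMetrics() *deleteMetrics {
	return &deleteMetrics{
		Total:    map[string]int{},
		Duration: map[string]time.Duration{},
	}
}

// deleteIfExists records delete metrics only when an object was found and
// removed. It reports whether a delete actually happened.
func deleteIfExists(ctx context.Context, s objectStore, m *deleteMetrics, kind, name string) (bool, error) {
	found, err := s.Exists(ctx, kind, name)
	if err != nil {
		return false, err
	}
	if !found {
		return false, nil // no-op reconcile: leave the metrics untouched
	}
	start := time.Now()
	if err := s.Delete(ctx, kind, name); err != nil {
		return false, err
	}
	m.Total[kind]++
	m.Duration[kind] += time.Since(start)
	return true, nil
}

// simulateReconciles runs n reconcile cycles in Deployment mode. Each cycle
// tries to delete the unconfigured kinds. Extra "kind/name" keys can be
// seeded to model a resource that really exists and must be cleaned up.
func simulateReconciles(n int, existing ...string) *deleteMetrics {
	ctx := context.Background()
	store := memStore{"Deployment/envoy": true}
	for _, key := range existing {
		store[key] = true
	}
	m := newDeleteMetrics()
	kinds := []string{"DaemonSet", "HorizontalPodAutoscaler", "PodDisruptionBudget"}
	for i := 0; i < n; i++ {
		for _, kind := range kinds {
			if _, err := deleteIfExists(ctx, store, m, kind, "envoy"); err != nil {
				panic(err)
			}
		}
	}
	return m
}

func main() {
	// A leftover PDB exists once; DaemonSet and HPA never existed.
	m := simulateReconciles(1000, "PodDisruptionBudget/envoy")
	fmt.Println(m.Total["DaemonSet"], m.Total["HorizontalPodAutoscaler"], m.Total["PodDisruptionBudget"])
	// prints "0 0 1"
}
```

With this shape, 1000 reconciles count exactly one delete for the PDB that really existed and none for the kinds that were never created. The trade-off is an extra read per kind per reconcile, which is probably cheap if it is served from the controller's cache.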


Labels: area/infra-mgr, help wanted, triage