Describe the bug
When autoscaling.enable: true is configured in the Helm chart, the NGF controller updates the deployment and modifies the spec.replicas field, conflicting with the HPA. This causes the deployment to scale up and down within the same second, resulting in constant pod churn and preventing the HPA from scaling up or down consistently.
To Reproduce
- Deploy NGF with autoscaling enabled using these Helm values:

  nginx:
    autoscaling:
      enable: true
      metrics:
        - external:
            metric:
              name: <some-external-metric-providing-connection-count-across-all-replicas>
            target:
              type: Value
              value: 20000
          type: External
      minReplicas: 1
      maxReplicas: 10
- Wait for the HPA to trigger a scale-down event (the watch commands after this list can help catch it)
- Observe scaling events:
kubectl get events -n ngf --sort-by='.lastTimestamp' -o custom-columns='when:lastTimestamp,msg:message,reason:reason,obj:involvedObject.name,cmp:source.component' | grep -E "SuccessfulRescale|ScalingReplicaSet"
- Check which manager last updated the deployment's replicas:
kubectl get deployment nginx-public-gateway-nginx -n ngf --show-managed-fields -o json | \
jq '.metadata.managedFields[] | select(.fieldsV1."f:spec"."f:replicas") | {manager: .manager, operation: .operation, time: .time}'
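While waiting for the scale-down in step 2, it can help to watch the HPA and the deployment side by side so the flip-flop is visible as it happens (a minimal sketch using the resource names from the commands above):

> kubectl get hpa -n ngf -w

> kubectl get deployment nginx-public-gateway-nginx -n ngf -w -o jsonpath='{.spec.replicas}{"\n"}'

When the bug occurs, the deployment's spec.replicas briefly drops to the HPA's new desired count and then snaps back within the same second.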
Expected behavior
When autoscaling.enable: true is set, the NGF controller should:
- Create the HPA resource
- Not change the spec.replicas field after the HPA is created
- Allow the HPA to be the sole controller managing the replica count
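As a rough illustration of the last two points (this is not NGF's actual applied object; the selector, labels, and image below are placeholders), the data-plane Deployment the control plane writes could simply omit spec.replicas whenever autoscaling.enable is true, so the controller never claims ownership of that field and the HPA is left to manage it:

  # Sketch only: Deployment as NGF might apply it when autoscaling is enabled
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: nginx-public-gateway-nginx
  spec:
    # replicas intentionally omitted; the HPA owns this field
    selector:
      matchLabels:
        app.kubernetes.io/instance: nginx-public-gateway
    template:
      metadata:
        labels:
          app.kubernetes.io/instance: nginx-public-gateway
      spec:
        containers:
          - name: nginx
            image: nginx

This mirrors the usual convention in Helm charts and operators: when an HPA is enabled, leave replicas unset (or carry forward the live value) instead of writing a fixed number on every reconcile.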
Your environment
- Version of NGINX Gateway Fabric: 2.1.2 (commit: 877c415, date: 2025-09-25T19:31:07Z)
- Kubernetes Version: v1.32.6
- Platform: Azure Kubernetes Service (AKS)
- Exposure method: Service type LoadBalancer
- Helm Chart Version: nginx-gateway-fabric-2.1.2
Observed behavior
Events show the deployment scaling up and down within the same second:
> kubectl get events -n immy-routing --sort-by='.lastTimestamp' -o custom-columns='when:lastTimestamp,msg:message,reason:reason,obj:involvedObject.name,cmp:source.component' | grep -E "SuccessfulRescale|ScalingReplicaSet"
2025-10-02T18:17:53Z New size: 10; reason: external metric datadogmetric@immy-routing:nginx-connection-count-connections(nil) above target SuccessfulRescale nginx-public-gateway-nginx horizontal-pod-autoscaler
2025-10-02T18:19:38Z Scaled down replica set nginx-public-gateway-nginx-57b699c549 from 10 to 8 ScalingReplicaSet nginx-public-gateway-nginx deployment-controller
2025-10-02T18:19:38Z Scaled up replica set nginx-public-gateway-nginx-57b699c549 from 8 to 10 ScalingReplicaSet nginx-public-gateway-nginx deployment-controller
2025-10-02T18:21:38Z Scaled down replica set nginx-public-gateway-nginx-57b699c549 from 10 to 9 ScalingReplicaSet nginx-public-gateway-nginx deployment-controller
2025-10-02T18:21:38Z Scaled up replica set nginx-public-gateway-nginx-57b699c549 from 9 to 10 ScalingReplicaSet nginx-public-gateway-nginx deployment-controller
2025-10-02T18:25:23Z Scaled up replica set ngf-nginx-gateway-fabric-74db69c968 from 0 to 1 ScalingReplicaSet ngf-nginx-gateway-fabric deployment-controller
2025-10-02T18:25:26Z Scaled down replica set ngf-nginx-gateway-fabric-7b99997d79 from 1 to 0 ScalingReplicaSet ngf-nginx-gateway-fabric deployment-controller
2025-10-02T18:25:39Z New size: 9; reason: All metrics below target SuccessfulRescale nginx-public-gateway-nginx horizontal-pod-autoscaler
2025-10-02T18:51:42Z Scaled down replica set nginx-public-gateway-nginx-57b699c549 from 9 to 8 ScalingReplicaSet nginx-public-gateway-nginx deployment-controller
2025-10-02T18:51:42Z New size: 8; reason: All metrics below target SuccessfulRescale nginx-public-gateway-nginx horizontal-pod-autoscaler
Checking the managed fields confirms that the NGF controller (the "gateway" field manager) modified replicas in the same second as the HPA:
> kubectl get deployment nginx-public-gateway-nginx -n immy-routing --show-managed-fields -o json | \
jq '.metadata.managedFields[] | select(.fieldsV1."f:spec"."f:replicas") | {manager: .manager, operation: .operation, time: .time}'
{
"manager": "gateway",
"operation": "Update",
"time": "2025-10-02T18:51:42Z"
}
And the replica count set by the HPA has been overwritten back to the old value:
> kubectl get deployment nginx-public-gateway-nginx -n immy-routing -o json | jq '.spec.replicas'
9
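For comparison, the HPA's own view of the desired count can be read from its status (the HPA name is assumed to match the deployment; adjust if it differs):

> kubectl get hpa nginx-public-gateway-nginx -n immy-routing -o custom-columns='NAME:.metadata.name,DESIRED:.status.desiredReplicas,CURRENT:.status.currentReplicas'

While the bug is active, DESIRED should show the value the HPA last computed (8 in the events above) while the deployment's spec.replicas has already been reset to 9.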
Additional context
Suspected root cause: the NGF controller updates the deployment, including the spec.replicas field, even when the HPA is enabled, resulting in a race condition:
- HPA decides to scale down (e.g., 10 → 8 replicas)
- HPA updates the deployment's .spec.replicas to 8
- Deployment terminates the excess pods
- NGF controller reconciles and resets .spec.replicas back to the old value (e.g., 10)
- Deployment spins the pods up again
Impact on production:
- Pods are terminated and recreated every ~2 minutes (matching the HPA scale-down period)
- Thousands of websocket connections dropped on each restart
- Connection storms after scale-downs cause metric spikes
- HPA unable to effectively manage scaling due to constant interference