HPA kills new pods instantly after creation #561

Closed
lorrx opened this issue Apr 21, 2024 · 5 comments · Fixed by #596

lorrx commented Apr 21, 2024

Describe your Issue

When I activate HPA in the Helm chart, a pod is initially scheduled (which is correct). As soon as I synchronize files, the CPU load of this pod increases to over 60%, so the HPA tries to schedule new pods. These are scheduled, but they are killed immediately after container creation. This means there is never more than one pod running at a time, although there should be 5.
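
For context, this is roughly the kind of HorizontalPodAutoscaler object involved here (a minimal sketch; the object names and the 60% utilization target are assumptions on my part, not necessarily what the chart renders):

# Sketch of an HPA scaling a Deployment on ~60% average CPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nextcloud            # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nextcloud          # placeholder name
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60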

Logs and Errors

There are no errors in the logs. The termination seems to be caused by Kubernetes itself.

Describe your Environment

  • Kubernetes distribution: k3s

  • Helm Version (or App that manages helm): ArgoCD version v2.10.7+b060053

  • Helm Chart Version: 4.6.6

  • values.yaml:

nextcloud:
  host: 10.3.28.0
  configs:
    custom.config.php: |
      <?php
        $CONFIG = array(
          "check_data_directory_permissions"=> false, # fix data directory permissions error
          "trusted_domains" => array (
            $_ENV["NEXTCLOUD_TRUSTED_DOMAINS"], # fix probes 400 error
          ),
          'trusted_proxies' => array(
            0 => '127.0.0.1',
            1 => '10.0.0.0/8',
          ),
          "forwarded_for_headers" => array("HTTP_X_FORWARDED_FOR"),
        );
  containerPort: 8080
  extraVolumes:
    - name: nginx-cache
      emptyDir: { }
  extraVolumeMounts:
    - name: nginx-cache
      mountPath: "/var/cache/nginx" # fix permission denied error
  securityContext:
    runAsUser: 901000
    runAsGroup: 901000
    runAsNonRoot: true
  podSecurityContext:
    runAsUser: 901000
    runAsGroup: 901000
    runAsNonRoot: true
service:
  type: LoadBalancer
internalDatabase:
  enabled: false
image:
  flavor: fpm
nginx:
  enabled: true
  image:
    repository: nginxinc/nginx-unprivileged
    tag: 1.25 # https://hub.docker.com/r/nginxinc/nginx-unprivileged/tags
  containerPort: 8080
  resources:
    limits:
      cpu: 200m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 64Mi
  securityContext:
    runAsUser: 901000
    runAsGroup: 901000
    runAsNonRoot: true
externalDatabase:
  enabled: true
  type: postgresql
  host: nextcloud-postgresql-primary
  database: nextcloud
  user: nextcloud
  password: nextcloud
hpa:
  enabled: true
  minPods: 1
  maxPods: 5
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi
persistence:
  enabled: true
  existingClaim: pvc-k8s-nextcloud-app
  nextcloudData:
    enabled: true
    existingClaim: pvc-k8s-nextcloud-data
livenessProbe:
  enabled: true
readinessProbe:
  enabled: true
startupProbe:
  enabled: true

Additional context, if any

I found a possible solution for this issue. As mentioned in this StackOverflow article, the replicas field should not be set on the Deployment resource if an HPA definition is used.
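
A minimal sketch of that idea in the chart's deployment template (the actual template structure and value names may differ):

# Only render replicas when the HPA is disabled, so the HPA owns the replica count
spec:
  {{- if not .Values.hpa.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}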

I am using NFS as persistent storage with the NFS CSI driver. The PVC has RWX access mode.


lorrx commented Apr 21, 2024

Additional information:
The problem seems to be in ArgoCD: when the self-heal option is disabled, all replicas are scheduled as expected.

I suspect that ArgoCD detects a diff in the deployment.yml (replicas: 1 != replicas: 3) and then reverts it. So removing the replicas option when HPA is enabled could really be the solution.
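
Until the chart stops rendering replicas, another possible workaround on the Argo CD side is to tell the Application to ignore the replicas diff via the standard ignoreDifferences field (sketch):

# Argo CD Application spec excerpt: keep self-heal from reverting the HPA-managed replica count
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas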

jessebot (Collaborator) commented

@lorrx thanks for the issue and the updates 🙏. If you have found a solution that works both in Argo CD and via Helm directly on a k8s cluster, please feel free to submit a PR to correct the issue.

jessebot (Collaborator) commented

@lorrx if you can, could you try pointing your Argo CD Application (or ApplicationSet) at my fix/dont-set-replicas-in-pod-if-hpa-enabled branch?

Sorry if I'm over-explaining, but just in case you need the info, it would be something like this for your source:

      source:
        repoURL: 'https://github.com/jessebot/nextcloud-helm'
        targetRevision: fix/dont-set-replicas-in-pod-if-hpa-enabled
        path: charts/nextcloud/

If you're using an Argo Project, then you'll also need to add https://github.com/jessebot/nextcloud-helm as an allowed source repo. Let me know if this works for you and we can work on getting that PR merged 🙏 Thanks!
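
In case it helps, allowing the fork in an Argo CD AppProject just means adding it to sourceRepos (sketch; the project name is a placeholder):

# AppProject excerpt: allow the fork as a source repository
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: your-project         # placeholder
  namespace: argocd
spec:
  sourceRepos:
    - 'https://github.com/jessebot/nextcloud-helm'   # fork containing the fix branch
    # ...plus any repositories the project already allows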

jessebot added the needs info (Not enough information provided) and hpa (horizontal pod autoscaling - triggers a testing workflow) labels on Jul 23, 2024
jessebot (Collaborator) commented

update: I think we're actually good to merge the above PR based on #596 (comment), which would auto-close this Issue. If that happens, and it's still broken, we can absolutely re-open this Issue, or you can open a second one. Either way, happy to help :)

jessebot removed the needs info (Not enough information provided) label on Jul 23, 2024
jessebot self-assigned this on Jul 23, 2024

lorrx commented Jul 28, 2024

> @lorrx if you can, could you try pointing your Argo CD Application (or ApplicationSet) at my fix/dont-set-replicas-in-pod-if-hpa-enabled branch?
>
> Sorry if I'm over-explaining, but just in case you need the info, it would be something like this for your source:
>
>       source:
>         repoURL: 'https://github.com/jessebot/nextcloud-helm'
>         targetRevision: fix/dont-set-replicas-in-pod-if-hpa-enabled
>         path: charts/nextcloud/
>
> If you're using an Argo Project, then you'll also need to add https://github.com/jessebot/nextcloud-helm as an allowed source repo. Let me know if this works for you and we can work on getting that PR merged 🙏 Thanks!

Many thanks for the improvement. I will test the fix, but it might take some time. I will reopen this issue or create a new one if something does not work.
