HPA kills new pods instantly after creation #561

Closed
lorrx opened this issue Apr 21, 2024 · 5 comments · Fixed by #596

lorrx commented Apr 21, 2024

Describe your Issue

When I activate HPA in the Helm chart, a pod is initially scheduled (which is correct). As soon as I synchronize files, the CPU load of this pod increases to over 60%, so the HPA tries to schedule new pods. These are scheduled, but they are killed immediately after container creation. This means there is never more than one pod running at a time, although there should be 5.
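
For context, this is roughly the kind of HorizontalPodAutoscaler object involved here (a minimal sketch; the object names and the 60% utilization target are assumptions on my part, not necessarily what the chart renders):

# Sketch of an HPA scaling a Deployment on ~60% average CPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nextcloud            # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nextcloud          # placeholder name
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60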

Logs and Errors

There are no errors in the logs. The termination seems to be caused by Kubernetes itself.

Describe your Environment

  • Kubernetes distribution: k3s

  • Helm Version (or App that manages helm): ArgoCD version v2.10.7+b060053

  • Helm Chart Version: 4.6.6

  • values.yaml:

nextcloud:
  host: 10.3.28.0
  configs:
    custom.config.php: |
      <?php
        $CONFIG = array(
          "check_data_directory_permissions"=> false, # fix data directory permissions error
          "trusted_domains" => array (
            $_ENV["NEXTCLOUD_TRUSTED_DOMAINS"], # fix probes 400 error
          ),
          'trusted_proxies' => array(
            0 => '127.0.0.1',
            1 => '10.0.0.0/8',
          ),
          "forwarded_for_headers" => array("HTTP_X_FORWARDED_FOR"),
        );
  containerPort: 8080
  extraVolumes:
    - name: nginx-cache
      emptyDir: { }
  extraVolumeMounts:
    - name: nginx-cache
      mountPath: "/var/cache/nginx" # fix permission denied error
  securityContext:
    runAsUser: 901000
    runAsGroup: 901000
    runAsNonRoot: true
  podSecurityContext:
    runAsUser: 901000
    runAsGroup: 901000
    runAsNonRoot: true
service:
  type: LoadBalancer
internalDatabase:
  enabled: false
image:
  flavor: fpm
nginx:
  enabled: true
  image:
    repository: nginxinc/nginx-unprivileged
    tag: 1.25 # https://hub.docker.com/r/nginxinc/nginx-unprivileged/tags
  containerPort: 8080
  resources:
    limits:
      cpu: 200m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 64Mi
  securityContext:
    runAsUser: 901000
    runAsGroup: 901000
    runAsNonRoot: true
externalDatabase:
  enabled: true
  type: postgresql
  host: nextcloud-postgresql-primary
  database: nextcloud
  user: nextcloud
  password: nextcloud
hpa:
  enabled: true
  minPods: 1
  maxPods: 5
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi
persistence:
  enabled: true
  existingClaim: pvc-k8s-nextcloud-app
  nextcloudData:
    enabled: true
    existingClaim: pvc-k8s-nextcloud-data
livenessProbe:
  enabled: true
readinessProbe:
  enabled: true
startupProbe:
  enabled: true

Additional context, if any

I found a possible solution for this issue. As mentioned in this StackOverflow article, the replicas field should not be set on the Deployment resource if an HPA definition is used.
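
A minimal sketch of that idea in the chart's deployment template (the actual template structure and value names may differ):

# Only render replicas when the HPA is disabled, so the HPA owns the replica count
spec:
  {{- if not .Values.hpa.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}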

I am using NFS as persistent storage with the NFS CSI driver. The PVC has RWX access mode.


lorrx commented Apr 21, 2024

Additional information:
The problem seems to be in ArgoCD: when the self-heal option is disabled, all replicas are scheduled as expected.

I suspect that ArgoCD detects a diff in the deployment.yml (replicas: 1 != replicas: 3) and then reverts it. So removing the replicas option when HPA is enabled could really be the solution.
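
Until the chart stops rendering replicas, another possible workaround on the Argo CD side is to tell the Application to ignore the replicas diff via the standard ignoreDifferences field (sketch):

# Argo CD Application spec excerpt: keep self-heal from reverting the HPA-managed replica count
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas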

jessebot (Collaborator) commented

@lorrx thanks for the issue and the updates 🙏. If you have found a solution that works both in Argo CD and via Helm directly on a k8s cluster, please feel free to submit a PR to correct the issue.

jessebot (Collaborator) commented

@lorrx if you can, could you try pointing your Argo CD Application (or ApplicationSet) at my fix/dont-set-replicas-in-pod-if-hpa-enabled branch?

Sorry if I'm over-explaining, but just in case you need the info, it would be something like this for your source:

      source:
        repoURL: 'https://github.com/jessebot/nextcloud-helm'
        targetRevision: fix/dont-set-replicas-in-pod-if-hpa-enabled
        path: charts/nextcloud/

If you're using an Argo Project, then you'll also need to add https://github.com/jessebot/nextcloud-helm as an allowed source repo. Let me know if this works for you and we can work on getting that PR merged 🙏 Thanks!
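
In case it helps, allowing the fork in an Argo CD AppProject just means adding it to sourceRepos (sketch; the project name is a placeholder):

# AppProject excerpt: allow the fork as a source repository
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: your-project         # placeholder
  namespace: argocd
spec:
  sourceRepos:
    - 'https://github.com/jessebot/nextcloud-helm'   # fork containing the fix branch
    # ...plus any repositories the project already allows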

jessebot added the needs info (Not enough information provided) and hpa (horizontal pod autoscaling - triggers a testing workflow) labels on Jul 23, 2024
jessebot (Collaborator) commented

update: I think we're actually good to merge the above PR based on #596 (comment), which would auto-close this Issue. If that happens, and it's still broken, we can absolutely re-open this Issue, or you can open a second one. Either way, happy to help :)

jessebot removed the needs info (Not enough information provided) label on Jul 23, 2024
jessebot self-assigned this on Jul 23, 2024

lorrx commented Jul 28, 2024

> @lorrx if you can, could you try pointing your Argo CD Application (or ApplicationSet) at my fix/dont-set-replicas-in-pod-if-hpa-enabled branch?
>
> Sorry if I'm over-explaining, but just in case you need the info, it would be something like this for your source:
>
>       source:
>         repoURL: 'https://github.com/jessebot/nextcloud-helm'
>         targetRevision: fix/dont-set-replicas-in-pod-if-hpa-enabled
>         path: charts/nextcloud/
>
> If you're using an Argo Project, then you'll also need to add https://github.com/jessebot/nextcloud-helm as an allowed source repo. Let me know if this works for you and we can work on getting that PR merged 🙏 Thanks!

Many thanks for the improvement. I will test the fix, but it might take some time. I will reopen this issue or create a new one if something does not work.
