
After a couple of hours of not using the runner, it will become offline #556

Open
ni-aackerman opened this issue Apr 17, 2023 · 8 comments

Comments

@ni-aackerman

So after everything is up and running, it did run some jobs and all was good, but then, after not using the runner for some hours, it will become offline and I have to delete the runner pool and delete the runner from GitHub so that it gets recreated. Is there some setting like an idle timeout or something like that? I do have the minimum runners set to 1 in my CRD.

@ni-aackerman
Author

Hello?

@davidkarlsen
Collaborator

What do the logs of the pod say? I am no longer working with tietoevry and no longer have write access to the repo, but I might be able to shed some light on what is wrong.

@ni-skopp

Hi @davidkarlsen

we have this in the logs of the pod:

# Runner removal

Cannot connect to server, because config files are missing. Skipping removing runner from the server.
Does not exist. Skipping Removing .credentials
Does not exist. Skipping Removing .runner


--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication


√ Connected to GitHub

# Runner Registration




A runner exists with the same name
A runner exists with the same name runner-pool-pod-s4jd2.

It looks like the runner can't be removed, which causes a crash loop since a runner with the same name is already registered. Do you know where to look to find out why it fails like this?

Cannot connect to server, because config files are missing.

@davidkarlsen
Collaborator

Then there is already a runner registered with this name in the GitHub console; force-remove those. It is probably wise to set the pod restartPolicy to Never, then the pods won't reappear and end up in this state.
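
If clicking through the console is cumbersome, the stale registration can also be removed via the GitHub API. A minimal sketch using the gh CLI, assuming the runners are registered at the repository level (OWNER/REPO and RUNNER_ID are placeholders; for organisation-level runners the path is orgs/ORG/actions/runners instead):

# List self-hosted runners with their id and status (online/offline)
gh api repos/OWNER/REPO/actions/runners --jq '.runners[] | {id, name, status}'

# Force-remove a stale runner by its id
gh api -X DELETE repos/OWNER/REPO/actions/runners/RUNNER_ID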

@davidkarlsen
Collaborator

BTW: Love traktor ;-)

@ni-skopp

I think this is the problem: we always have to force-remove it and sometimes recreate the runner pool. I've added restartPolicy: Never, let's see if it helps:

apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool
  namespace: github-actions-runner-operator
spec:
  minRunners: 1
  maxRunners: 9
  reconciliationPeriod: 1m
  podTemplateSpec:
    metadata:
      annotations:
        "prometheus.io/scrape": "true"
        "prometheus.io/port": "3903"
    spec:
      restartPolicy: Never
      affinity:

@ni-skopp

@davidkarlsen just wanted to confirm that we haven't seen any issues for a week now, looks like it helped, thank you!

@ni-aackerman
Author

ni-aackerman commented Jul 24, 2023

@davidkarlsen it happened again, even with the restartPolicy: Never.

In the meantime, until a fix is found, I ended up setting up a CronJob that runs every day inside Kubernetes to delete the pod. This way a new one is spawned and automatically registered in GitHub.
I know it's not a real solution, it just lets us focus on other things that are more urgent.
Here is the manifest for such a CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: delete-runner-pod
spec:
  schedule: "00 22 * * *"  # daily at 22:00 UTC (midnight Berlin summer time)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: kubectl
            image: bitnami/kubectl  # Using a kubectl container image
            command:
            - /bin/sh
            - -c
            - kubectl get pods --no-headers=true | awk '/^runner-pool-pod-/ {print $1}' | xargs -I {} kubectl delete pod {} --grace-period=0 --force
          restartPolicy: OnFailure
      ttlSecondsAfterFinished: 172800 # delete job after 48 hours
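
For the kubectl container in this CronJob to be allowed to delete pods, its ServiceAccount needs matching RBAC permissions. A minimal sketch, assuming the CronJob runs in the same namespace as the runner pods (the name runner-pod-deleter is illustrative):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: runner-pod-deleter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: runner-pod-deleter
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: runner-pod-deleter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: runner-pod-deleter
subjects:
- kind: ServiceAccount
  name: runner-pod-deleter

The job's pod spec then needs serviceAccountName: runner-pod-deleter so kubectl runs with these permissions instead of the default service account's.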
