Unable to register Arm64 runners #18092

Closed
welteki opened this issue May 29, 2024 · 8 comments
Assignees
Labels
area/testing priority/critical-urgent type/flake

Comments

@welteki

welteki commented May 29, 2024

Which GitHub Action / Prow Jobs are flaking?

We noticed an issue with Arm64 jobs running on Actuated. Jobs stay queued because we have insufficient permissions to get a JIT config for registering the runners.

If any changes were made recently, permissions for the Actuated app should be checked at the organisation and repo level.

cc @alexellis
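For context, fetching a JIT config boils down to a single REST call against the GitHub API, so the failure is easy to reproduce outside of Actuated. Below is a rough sketch (not Actuated's actual code - the repo, runner name, labels and token handling are placeholders) of what that request and the permission failure look like:

```python
# Illustrative sketch only: request a just-in-time runner config for a repo.
# The token must belong to an app/installation with runner administration
# permission; a 403 here is the "insufficient permissions" failure described above.
import requests

GITHUB_API = "https://api.github.com"
TOKEN = "<installation-access-token>"  # placeholder

def generate_jit_config(owner: str, repo: str, runner_name: str) -> dict:
    resp = requests.post(
        f"{GITHUB_API}/repos/{owner}/{repo}/actions/runners/generate-jitconfig",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "name": runner_name,
            "runner_group_id": 1,                # default runner group
            "labels": ["self-hosted", "arm64"],
        },
        timeout=30,
    )
    resp.raise_for_status()
    # The response includes "encoded_jit_config", which the runner binary
    # consumes to register itself for exactly one job.
    return resp.json()

if __name__ == "__main__":
    print(generate_jit_config("etcd-io", "etcd", "actuated-arm64-example"))
```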

Which tests are flaking?

Arm64 tests running on managed runners.

GitHub Action / Prow Job link

No response

Reason for failure (if possible)

Insufficient permissions for the Actuated app at the organisation or repo level to create a JIT config for registering runners.

Anything else we need to know?

No response

@welteki
Author

welteki commented May 29, 2024

Looking at one of the queued Arm64 jobs we see a message that self-hosted runners are disabled.

> Self-hosted runners in the repository are disabled by your administrator
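For anyone else checking, the backlog is also visible from the API side; a quick sketch (the token is a placeholder) that lists the runs currently sitting in the queue:

```python
# Illustrative sketch: list workflow runs stuck in "queued" for a repository.
import requests

GITHUB_API = "https://api.github.com"
TOKEN = "<personal-access-token>"  # placeholder

def queued_runs(owner: str, repo: str) -> list[dict]:
    resp = requests.get(
        f"{GITHUB_API}/repos/{owner}/{repo}/actions/runs",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        params={"status": "queued", "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["workflow_runs"]

if __name__ == "__main__":
    runs = queued_runs("etcd-io", "etcd")
    print(f"{len(runs)} queued runs")
    for run in runs:
        print(run["id"], run["name"], run["created_at"])
```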

@jmhbnz
Member

jmhbnz commented May 29, 2024

Hi @welteki - thanks for raising this. I am not aware of any changes at our end. Checking our actions queue, I can see a number of arm64 jobs have been sitting in the queue for a while: https://github.com/etcd-io/etcd/actions?query=is%3Aqueued++.

Has something changed on the Actuated side or is there a GitHub change/outage potentially?

If we think it is a permissions change on the etcd side I will need to reach out to Kubernetes project folks as etcd-io org is now under the Kubernetes GitHub Enterprise account.

/assign @jmhbnz

@jmhbnz added the priority/critical-urgent label May 29, 2024
@alexellis
Contributor

We've seen the message "Self-hosted runners in the repository are disabled by your administrator" once or twice before - this happens when people are fiddling with the settings at the org/enterprise level without realising the impact the change would have.

@welteki
Author

welteki commented May 29, 2024

Hi @jmhbnz,

> Has something changed on the Actuated side or is there a GitHub change/outage potentially?

We did not make any changes on the Actuated side. There is no outage reported for GitHub and we are not seeing any issues with jobs for other organisations.

> If we think it is a permissions change on the etcd side I will need to reach out to Kubernetes project folks as etcd-io org is now under the Kubernetes GitHub Enterprise account.

I can see jobs queued for other repos in the etcd-io org as well: https://github.com/etcd-io/bbolt/actions?query=is%3Aqueued+. All jobs display the same message "Self-hosted runners in the repository are disabled by your administrator", so I think the cause is a permissions change at the org level.

@jmhbnz
Member

jmhbnz commented May 29, 2024

Thanks team - I've started a Slack chat with the Kubernetes GitHub management folks (https://kubernetes.slack.com/archives/C01672LSZL0/p1717015195019279) so we can check from that side 🙏🏻

@jmhbnz
Member

jmhbnz commented May 29, 2024

Confirmed the issue was on the Kubernetes GitHub org admins' side and has been corrected:

> @jmhbnz I was reviewing some enterprise-level settings related to a non-etcd related security issue. I’ve reverted the setting change.. can you try again?

Jobs are now being picked up again. Thanks for raising the alarm @welteki.

@jmhbnz closed this as completed May 29, 2024
@alexellis
Contributor

Good 👍

We've requested that the 64 jobs that were still pending get started now on new VMs. Let us know if you see any further issues with jobs not starting, etc.

@alexellis
Contributor

In this circumstance, it looks like the jobs that are pending are stuck in a deadlock because their definitions disallow self-hosted runners. They will need to be cancelled/restarted.
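For anyone clearing the backlog, this is roughly what a cancel-and-rerun looks like via the API (a sketch only; `gh run cancel` / `gh run rerun` do the same thing, and the token and run ID below are placeholders):

```python
# Illustrative sketch: cancel a deadlocked run, then queue it again so it can
# be picked up by the (now re-enabled) self-hosted runners.
import time
import requests

GITHUB_API = "https://api.github.com"
TOKEN = "<token-with-actions-write>"  # placeholder
HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}

def cancel_and_rerun(owner: str, repo: str, run_id: int) -> None:
    base = f"{GITHUB_API}/repos/{owner}/{repo}/actions/runs/{run_id}"
    requests.post(f"{base}/cancel", headers=HEADERS, timeout=30).raise_for_status()
    time.sleep(10)  # give the cancellation a moment to complete
    requests.post(f"{base}/rerun", headers=HEADERS, timeout=30).raise_for_status()

if __name__ == "__main__":
    cancel_and_rerun("etcd-io", "etcd", 123456789)  # placeholder run ID
```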

[Screenshot: 2024-05-30 at 08:27:10]
