Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: 🌱 Use equinix runner instead of ubuntu for e2e tests #1775

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kashifest
Copy link
Member

No description provided.

@metal3-io-bot metal3-io-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2024
@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from kashifest. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 7, 2024
@kashifest kashifest force-pushed the use/equinix-runners branch 2 times, most recently from 82911d2 to 6d50f95 Compare June 7, 2024 06:54
@tuminoid
Copy link
Member

/retest

@tuminoid
Copy link
Member

Please add proper description and link to the relevant GH doc.

It seems, e2e pipeline did not get a runner, as they are just "failed" with no run logs. Do we really want to use Equinix as they seem to be off most of the time?

@kashifest
Copy link
Member Author

Please add proper description and link to the relevant GH doc.

Its a WIP so not ready for review, I will add description once we know it works.

@kashifest
Copy link
Member Author

It seems, e2e pipeline did not get a runner, as they are just "failed" with no run logs. Do we really want to use Equinix as they seem to be off most of the time?

We were suggested to try them in the CNCF cluster issue here cncf/cluster#266 (comment) . I have replied back what we experienced here.

@Rozzii Rozzii modified the milestones: BMO - v0.7.0, BMO - v0.6.2 Jun 28, 2024
@Rozzii
Copy link
Member

Rozzii commented Jul 12, 2024

/retest

@Rozzii
Copy link
Member

Rozzii commented Jul 12, 2024

/ok-to-test

@metal3-io-bot metal3-io-bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jul 12, 2024
@kashifest
Copy link
Member Author

/retest

@metal3-io-bot metal3-io-bot added needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 1, 2024
@kashifest
Copy link
Member Author

/retest

@tuminoid
Copy link
Member

/retest

Still queued after 2.5 hours.

@idvoretskyi
Copy link

@kashifest we'll investigate what's going on with the runners and get back to you.

@RobertKielty
Copy link

@RobertKielty
Copy link

We are still working on determining the root cause of the problem.

Since the last update, we have reviewed and checked org-level settings and restarted a cluster listener that should be picking this job up. I canceled and restarted the failed jobs a few times. We will continue to work on this and update you again in an hour.

Thanks again for your patience. Let me know if this is an immediate blocker for the project. If not we appreciate your adoption of the runner and giving us this opportunity to figure out how to fix this.

Copy link

@RobertKielty RobertKielty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We think that changing this line back to

runs-on: ${{ inputs.runner }}

will enable the job to be picked up by the appropriate CNCF Managed External GitHub Action runner.

Signed-off-by: Kashif Khan <kashif.khan@est.tech>
@tuminoid
Copy link
Member

It seems to have picked it up, but fails very early.

Run sudo usermod -a -G libvirt $USER
Usage: usermod [options] LOGIN
...

@kashifest
Copy link
Member Author

It seems to have picked it up, but fails very early.

Run sudo usermod -a -G libvirt $USER
Usage: usermod [options] LOGIN
...

Yeah, @RobertKielty what kind of OS and which versions are these runners using? It would be good if you can point us to detailed information on these runners.

@RobertKielty
Copy link

RobertKielty commented Oct 23, 2024

@kashifest, you are correct that documentation is needed to describe the runners; full end-user documentation is not yet ready, as this iteration of the runners was deployed last week.

The cluster and external runners are defined in the ci directory on the cncf/automation repo.

So we can review the code that created the runners.

The equinix-4cpu-16gb runner is deployed from the following directory which contains a Helm Chart and associated values file. (Runner Helm Charts and related components are deployed using ArgoCD but that should not be relevant to this specific problem)

https://github.com/cncf/automation/tree/main/ci/cluster/equinix/runners/4cpu-16gb

The Container Image is defined here
https://github.com/cncf/automation/tree/main/ci/gha-runner-image

Looking at the Dockerfile we have

USER root
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends \
    build-essential \
    npm \
    git \
    curl \
    jq \
    unzip \
    kmod

USER runner

The GHA runner image we use is based on ghcr.io/actions/actions-runner:latest from

https://github.com/actions/runner

We add typically needed packages as root and then revert to the runner user

Apologies again for not having full documentation in place; it is on our roadmap. Any and all feedback is very much appreciated!

@kashifest kashifest force-pushed the use/equinix-runners branch 4 times, most recently from c06670d to dca3155 Compare October 23, 2024 12:23
@metal3-io-bot metal3-io-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 23, 2024
Signed-off-by: Kashif Khan <kashif.khan@est.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
Status: BMO WIP
Development

Successfully merging this pull request may close these issues.

6 participants