🐛 fix: target vllm-d deploy runners with openshift label #4017

Merged
clubanderson merged 1 commit into main from fix/deploy-runner-label on Mar 31, 2026

@clubanderson
Collaborator

Summary

  • The deploy-vllm-d job used runs-on: [self-hosted, kc], which matched both vllm-d and scooter-cks runners
  • When GitHub assigned the job to a scooter-cks runner, it failed because those runners have no RBAC and target the wrong cluster entirely
  • This was the root cause of the intermittent "Runner SA lacks RBAC permissions" failures that have been misdiagnosed four times as a vllm-d RBAC issue
  • Fix: add the openshift label to the selector → [self-hosted, kc, openshift]; only vllm-d runners carry all three labels (pok-prod runners also have openshift, but use kc-pok rather than kc)

Runner inventory

| Runner Set | Cluster | Labels |
|---|---|---|
| npmh7 (2 pods) | vllm-d | self-hosted, kc, openshift |
| dmcrj (2 pods) | scooter-cks | self-hosted, kc, cks, scooter |
| tsspd (2 pods) | pok-prod | self-hosted, kc-pok, openshift |
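
The effect of the selector change can be checked against this inventory. When runs-on lists several labels, a self-hosted runner is eligible only if it carries every one of them, so matching reduces to a subset test. The sketch below is a small simulation of that rule, not GitHub's scheduler; the function name `eligible_runner_sets` is made up for illustration, and the label sets are copied from the inventory above.

```python
# Label sets copied from the runner inventory in this PR.
RUNNER_SETS = {
    "npmh7": {"self-hosted", "kc", "openshift"},       # vllm-d
    "dmcrj": {"self-hosted", "kc", "cks", "scooter"},  # scooter-cks
    "tsspd": {"self-hosted", "kc-pok", "openshift"},   # pok-prod
}


def eligible_runner_sets(runs_on):
    """Return the runner sets whose labels include every label in runs_on."""
    required = set(runs_on)
    return sorted(
        name for name, labels in RUNNER_SETS.items() if required <= labels
    )


# Old selector: both vllm-d and scooter-cks runners qualify.
print(eligible_runner_sets(["self-hosted", "kc"]))               # ['dmcrj', 'npmh7']
# New selector: only the vllm-d runners qualify.
print(eligible_runner_sets(["self-hosted", "kc", "openshift"]))  # ['npmh7']
```

Note that tsspd is excluded in both cases because its label is kc-pok, not kc; the openshift label alone would not be enough to single out vllm-d.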

Test plan

  • Merge and verify next deploy-vllm-d job succeeds
  • Verify it always picks an npmh7 runner (check runner name in job logs)
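
The second check could be scripted once job metadata is available. This is a rough sketch under two assumptions: the job records look like entries from the GitHub REST API's "List jobs for a workflow run" response (each with `name` and `runner_name`), and runner pod names contain the runner set suffix `npmh7`. The helper name and the sample records are hypothetical.

```python
def jobs_on_wrong_runner(jobs, job_name="deploy-vllm-d", expected="npmh7"):
    """Return (job name, runner name) pairs for job_name runs on other runners."""
    wrong = []
    for job in jobs:
        if job.get("name") != job_name:
            continue  # ignore unrelated jobs in the same workflow run
        runner = job.get("runner_name") or ""
        # Assumption: the runner set suffix appears in the runner pod name.
        if expected not in runner:
            wrong.append((job["name"], runner))
    return wrong


# Hypothetical sample records; real runner names may differ.
sample_jobs = [
    {"name": "deploy-vllm-d", "runner_name": "kc-npmh7-runner-abcde"},
    {"name": "deploy-vllm-d", "runner_name": "kc-dmcrj-runner-fghij"},
    {"name": "build", "runner_name": "hosted-runner-1"},
]
print(jobs_on_wrong_runner(sample_jobs))
```

An empty result across several runs would be consistent with the selector working as intended; any entry points at a run that landed outside the npmh7 set.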

…cks from stealing deploy jobs

The deploy-vllm-d job used `runs-on: [self-hosted, kc]` which matched
both vllm-d (npmh7) and scooter-cks (dmcrj) runners. When GitHub
assigned the job to a scooter-cks runner, it failed because those
runners lack RBAC and target the wrong cluster. Adding the `openshift`
label narrows the selector to only vllm-d runners.

Signed-off-by: Andrew Anderson <andy@clubanderson.com>
Copilot AI review requested due to automatic review settings March 31, 2026 17:59
@kubestellar-prow kubestellar-prow bot added the dco-signoff: yes Indicates the PR's author has signed the DCO. label Mar 31, 2026
@kubestellar-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign clubanderson for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubestellar-prow kubestellar-prow bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 31, 2026
@clubanderson clubanderson merged commit eec972d into main Mar 31, 2026
11 checks passed
@kubestellar-prow kubestellar-prow bot deleted the fix/deploy-runner-label branch March 31, 2026 17:59
@netlify

netlify bot commented Mar 31, 2026

Deploy Preview for kubestellarconsole canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | b4d6bf0 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/kubestellarconsole/deploys/69cc0b841f27c50008d3dacd |

@github-actions
Contributor

👋 Hey @clubanderson — thanks for opening this PR!

🤖 This project is developed exclusively using AI coding assistants.

Please do not attempt to code anything for this project manually.
All contributions should be authored using an AI coding tool such as:

This ensures consistency in code style, architecture patterns, test coverage,
and commit quality across the entire codebase.


This is an automated message.

@github-actions
Contributor

Thank you for your contribution! Your PR has been merged.

Check out what's new:

Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes intermittent failures in the deploy-vllm-d GitHub Actions job by narrowing the self-hosted runner selection so the job only lands on the intended OpenShift-backed vllm-d runner pool.

Changes:

  • Update deploy-vllm-d runner selector to include the openshift label ([self-hosted, kc, openshift]) to avoid accidentally matching scooter-cks runners.
