fix: remove existing test repo before upload#519

Merged
drbh merged 8 commits into
mainfrom
update-e2e-to-init-new-repo
May 13, 2026

Collaborator

@drbh drbh commented May 4, 2026

This PR deletes the existing test repo from staging before upload, which forces repo creation on every run. This should also fix the case where a staging repo has had its refs removed but the repo itself remains, which causes the upload to fail on the /refs call.

@drbh drbh force-pushed the update-e2e-to-init-new-repo branch from b9e8f0c to bbba328 Compare May 12, 2026 18:13
E2E_BRANCH: e2e-${{ github.event.pull_request.number || github.run_id }}-${{ github.run_attempt }}

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
Member

Could you explain this change?

Collaborator Author

This change scopes the concurrency group to the workflow alone, instead of the workflow plus the PR.

That way the workflow runs one instance at a time overall rather than one per PR, so when multiple PRs are open their runs are serialized and cannot race with each other.

Since this PR adds a step that deletes the staging repo, parallel runs could hypothetically race: one run could delete the repo after another run has uploaded but before that run downloads it in the get_kernel call.

So this change should guard against that race condition.
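
The race described above can be sketched as a small simulation. This is only an illustrative model, not code from this PR: the staging Hub is a plain dict, the repo name `test-repo` and the run names `A` and `B` are hypothetical, and each E2E run is a generator so that a schedule can interleave its delete, upload, and download steps.

```python
from typing import Dict, Generator, List, Optional

REPO = "test-repo"  # hypothetical repo name shared by all runs


def e2e_run(store: Dict[str, str], run_id: str) -> Generator[str, None, Optional[str]]:
    """One E2E run, split into three steps so a scheduler can interleave runs."""
    store.pop(REPO, None)      # step 1: delete the existing test repo
    yield "deleted"
    store[REPO] = run_id       # step 2: upload, which recreates the repo
    yield "uploaded"
    return store.get(REPO)     # step 3: download (None if the repo is gone)


def simulate(schedule: List[str]) -> Dict[str, Optional[str]]:
    """Advance runs A and B in the given order and record what each downloads."""
    store: Dict[str, str] = {}
    runs = {"A": e2e_run(store, "A"), "B": e2e_run(store, "B")}
    results: Dict[str, Optional[str]] = {}
    for name in schedule:
        try:
            next(runs[name])
        except StopIteration as stop:
            results[name] = stop.value  # the value returned by the download step
    return results


# Parallel runs: B deletes the repo between A's upload and A's download.
interleaved = simulate(list("AABABB"))
# Serialized runs, as a workflow-wide concurrency group would enforce.
serialized = simulate(list("AAABBB"))
```

In the interleaved schedule run A finds no repo at download time, while in the serialized schedule both runs download what they uploaded.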

Member

@sayakpaul sayakpaul left a comment

Thanks! Left some comments.

- name: Delete existing test repo
  run: |
    cat > /tmp/delete_repos.py << 'PYEOF'
    from huggingface_hub import HfApi
Member

Can't we just use delete_repo() without having to initialize the API?

Collaborator Author

Oh, I wasn't aware that was possible. Can you share a snippet we can use to delete a repo in a more concise way? I was under the impression we'd need to create the API object pointing to the staging URL.

Happy to switch to a better approach!
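
For context, here is a minimal sketch of the explicit-client approach under discussion. It is not the script from this PR. It assumes a `huggingface_hub` version whose `delete_repo` accepts `missing_ok`; the helper names, the `HF_TOKEN` variable, and any endpoint or repo id passed in are placeholders.

```python
import os
from typing import Optional


def delete_test_repo(api, repo_id: str, repo_type: Optional[str] = None) -> None:
    """Delete repo_id through an HfApi-like client, ignoring a missing repo."""
    # missing_ok=True turns the call into a no-op when the repo does not exist,
    # so the step also succeeds on a staging environment with no leftover repo.
    api.delete_repo(repo_id=repo_id, repo_type=repo_type, missing_ok=True)


def build_staging_api(endpoint: str):
    """Create an explicit client bound to a non-default Hub endpoint."""
    # Imported lazily so the helper above stays usable without the library.
    from huggingface_hub import HfApi

    # HF_TOKEN is an assumed variable name for the CI token.
    return HfApi(endpoint=endpoint, token=os.environ.get("HF_TOKEN"))
```

The library also exports a module-level `delete_repo` that takes the same arguments and uses the default client, whose endpoint is normally configured through the `HF_ENDPOINT` environment variable. Whether that is enough for the staging setup here depends on how the runner is configured.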

Member

# Remove the existing test repo if it exists so we test repo creation in the upload step
- name: Delete existing test repo
  run: |
    cat > /tmp/delete_repos.py << 'PYEOF'
Member

Why not keep it under the scripts or workflows directory?

Collaborator Author

We can move it if you think that's a better location. However, since the snippet is so small and is only really expected to be used in this case, I thought it made sense to inline it.

Let me know what you think!

CUDA_TAG=$(echo "$VARIANT" | grep -oP 'cu\d+')
echo "Installing torch matching variant $VARIANT (CUDA tag: $CUDA_TAG)"
uv sync --all-extras --dev
# E2E validates this checkout, so use the local kernels-data binding instead of the released wheel from uv.lock.
Member

Why is this needed?

Collaborator Author

This was added to ensure the E2E test uses the latest kernels-data from the repo instead of the version from the lock file.

I ran into an issue caused by a difference in the API used to load the metadata in a previous action run: https://github.com/huggingface/kernels/actions/runs/25754879907/job/75640986832#step:5:36

Reinstalling the kernels-data library after syncing makes sure the test uses the latest changes (and not the version pinned in uv.lock).

Member

@sayakpaul sayakpaul left a comment

Thanks for the answers!

@drbh drbh merged commit 7098da1 into main May 13, 2026
68 checks passed
danielfleischer pushed a commit to danielfleischer/kernels that referenced this pull request May 17, 2026
* fix: remove existing test repo before upload (huggingface#519)

* fix: remove existing test repo before upload

* fix: add missing content type

* fix: prefer removing repos via hub library

* fix: use lib from nix shell on runner

* fix: disallow more than one instance of E2E running at once to avoid race conditions

* fix: prefer using ci token

* fix: update e2e to use trust_remote_code for the dummy user

* fix: prefer using latest kernels-data in test

* fix: update nix warns to throws (huggingface#540)

* feat: bump cute dsl/cutlass (huggingface#545)

* feat: add to vouched (huggingface#551)

* hook up skill in the cli and add docs.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
sayakpaul added a commit that referenced this pull request May 20, 2026
* XPU Skill

Adds a new skill under kernel-builder/skills/xpu-kernels/, alongside the
existing cuda-kernels and rocm-kernels skills, bringing Intel XPU
support to kernel-builder. Target hardware is Intel Battlemage / Arc Pro
B70 (Xe2) via the Intel XPU Backend for
Triton (https://github.com/intel/intel-xpu-backend-for-triton).

The skill packages the Xe-Forge (https://github.com/IntelLabs/Xe-Forge)
workflow — an LLM-driven loop that transforms PyTorch code into
optimized Triton kernels for Intel XPU — into the hf-kernels skill
format. Xe-Forge has been used to produce measured speedups on
KernelBench Level 2 fused kernels (bf16) and Flash Attention
forward (fp16); full results live in that repo.

* hook up skill in the CI and add docs (#1)

* fix: remove existing test repo before upload (#519)

* fix: remove existing test repo before upload

* fix: add missing content type

* fix: prefer removing repos via hub library

* fix: use lib from nix shell on runner

* fix: disallow more than one instance of E2E running at once to avoid race conditions

* fix: prefer using ci token

* fix: update e2e to use trust_remote_code for the dummy user

* fix: prefer using latest kernels-data in test

* fix: update nix warns to throws (#540)

* feat: bump cute dsl/cutlass (#545)

* feat: add to vouched (#551)

* hook up skill in the cli and add docs.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Copilot <copilot@github.com>

* Update version bumping scripts with the `--major` option (#550)

* Update version bumping scripts with the `--major` option

With this change the script supports both major and minor version
bumping. For example:

Codebase at `0.10.1.dev0`

```
  (none)          -> 0.10.1
  --major         -> 0.11.0
  --dev           -> 0.10.1.dev1
  --dev --major   -> 0.11.0.dev0
```

Codebase at `0.10.1`:

```
  (none)          -> 0.10.2
  --major         -> 0.11.0
  --dev           -> 0.10.2.dev0
```

These are the typical version bumping workflows within the project.

* Sync .PHONY targets

* upload: fix benchmark deletion filter to match upload filter (#543)

* get_local_kernel api changed, leaving backend (second arg) empty for auto discovery (#555)

* Paths fix

Fixing some paths due to the skill living in the agent-specific
location, outside of `kernel-builder/skills/xpu-kernels/`.

* update enum

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com>
Co-authored-by: Erik Kaunismäki <erik.kaum@gmail.com>