fix: remove existing test repo before upload #519
Co-authored-by: Copilot <copilot@github.com>
b9e8f0c to bbba328
E2E_BRANCH: e2e-${{ github.event.pull_request.number || github.run_id }}-${{ github.run_attempt }}

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
Could you explain this change?
This change scopes the concurrency group to the workflow alone, instead of the workflow plus the PR.
That way only one run of this workflow is active at a time, even when multiple PRs are open, which avoids race conditions.
Since this change adds a step that deletes the staging repo, parallel runs could hypothetically race: one run could delete the repo after another run had uploaded to it but before that run downloaded it in the get_kernel call.
Serializing the runs guards against that race condition.
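The race described above can be illustrated with a toy, in-memory model. This is only a sketch and not the project's CI code: `StagingHub`, `run_schedule`, and the two schedules are hypothetical names, and the model assumes both runs target the same repo name.

```python
class StagingHub:
    """In-memory stand-in for the staging Hub: just a set of repo names."""

    def __init__(self):
        self.repos = set()

    def delete(self, repo):
        # Deleting a repo that does not exist is a no-op in this model.
        self.repos.discard(repo)

    def upload(self, repo):
        # Uploading creates the repo if it is missing.
        self.repos.add(repo)

    def download(self, repo):
        if repo not in self.repos:
            raise FileNotFoundError(repo)


def run_schedule(schedule, repo="test-repo"):
    """Apply (job, step) pairs in order and record whether each step worked."""
    hub = StagingHub()
    outcomes = []
    for job, step in schedule:
        try:
            getattr(hub, step)(repo)
            outcomes.append((job, step, "ok"))
        except FileNotFoundError:
            outcomes.append((job, step, "missing"))
    return outcomes


# Two runs in parallel: B deletes the repo between A's upload and A's download.
INTERLEAVED = [
    ("A", "delete"), ("A", "upload"),
    ("B", "delete"),
    ("A", "download"),
    ("B", "upload"), ("B", "download"),
]

# One run at a time, as a workflow-wide concurrency group would enforce.
SERIALIZED = [
    ("A", "delete"), ("A", "upload"), ("A", "download"),
    ("B", "delete"), ("B", "upload"), ("B", "download"),
]
```

In the interleaved schedule, run A's download finds the repo missing; in the serialized schedule every step succeeds.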
sayakpaul left a comment
Thanks! Left some comments.
- name: Delete existing test repo
  run: |
    cat > /tmp/delete_repos.py << 'PYEOF'
    from huggingface_hub import HfApi
Can't we just use delete_repo() without having to initialize the API?
Oh, I wasn't aware that was possible. Can you share a snippet that deletes a repo more concisely? I was under the impression we'd need to create the API object pointing to the staging URL.
Happy to switch to a better approach!
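For reference, a minimal sketch of the client-based approach discussed here. It is not the code from this PR: `delete_test_repo` and `make_api` are hypothetical helper names, and the endpoint, token, and repo name are left as parameters because the thread does not state them. `huggingface_hub` also exposes a module-level `delete_repo`, which goes through the library's default client, so whether it reaches staging depends on how the default endpoint is configured.

```python
def delete_test_repo(api, repo_id: str) -> None:
    """Delete `repo_id`, treating an already-missing repo as success."""
    # missing_ok=True asks delete_repo not to raise when the repo does not exist.
    api.delete_repo(repo_id=repo_id, missing_ok=True)


def make_api(endpoint: str, token: str):
    """Build a client for a non-default Hub endpoint (for example, staging)."""
    # Imported lazily so delete_test_repo can be exercised without the library.
    from huggingface_hub import HfApi

    return HfApi(endpoint=endpoint, token=token)
```

A call would then look like `delete_test_repo(make_api(staging_url, ci_token), "some-user/test-repo")`, where all three values are placeholders.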
# Remove the existing test repo if it exists so we test repo creation in the upload step
- name: Delete existing test repo
  run: |
    cat > /tmp/delete_repos.py << 'PYEOF'
Why not keep it under the scripts or workflows directory?
We can move it if you think that's a better location. However, since the snippet is so small and is only really expected to be used in this case, I thought it made sense to inline it.
Let me know what you think!
CUDA_TAG=$(echo "$VARIANT" | grep -oP 'cu\d+')
echo "Installing torch matching variant $VARIANT (CUDA tag: $CUDA_TAG)"
uv sync --all-extras --dev
# E2E validates this checkout, so use the local kernels-data binding instead of the released wheel from uv.lock.
This was added to ensure the E2E test uses the latest kernels-data in the repo instead of the version from the lock file.
I ran into an issue with a difference in the API used to load the metadata in a previous action run: https://github.com/huggingface/kernels/actions/runs/25754879907/job/75640986832#step:5:36
Reinstalling the kernels-data library after syncing makes sure the test uses the latest changes (and not the version pinned in uv.lock).
* fix: remove existing test repo before upload (huggingface#519)
* fix: remove existing test repo before upload
* fix: add missing content type
* fix: prefer removing repos via hub library
* fix: use lib from nix shell on runner
* fix: disallow more than one instance of E2E running at once to avoid race conditions
* fix: prefer using ci token
* fix: update e2e to use trust_remote_code for the dummy user
* fix: prefer using latest kernels-data in test
* fix: update nix warns to throws (huggingface#540)
* feat: bump cute dsl/cutlass (huggingface#545)
* feat: add to vouched (huggingface#551)
* hook up skill in the cli and add docs.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
* XPU Skill

Adds a new skill under kernel-builder/skills/xpu-kernels/, alongside the existing cuda-kernels and rocm-kernels skills, bringing Intel XPU support to kernel-builder. Target hardware is Intel Battlemage / Arc Pro B70 (Xe2) via the Intel XPU Backend for Triton (https://github.com/intel/intel-xpu-backend-for-triton).

The skill packages the Xe-Forge (https://github.com/IntelLabs/Xe-Forge) workflow, an LLM-driven loop that transforms PyTorch code into optimized Triton kernels for Intel XPU, into the hf-kernels skill format. Xe-Forge has been used to produce measured speedups on KernelBench Level 2 fused kernels (bf16) and Flash Attention forward (fp16); full results live in that repo.

* hook up skill in the CI and add docs (#1)
* fix: remove existing test repo before upload (#519)
* fix: remove existing test repo before upload
* fix: add missing content type
* fix: prefer removing repos via hub library
* fix: use lib from nix shell on runner
* fix: disallow more than one instance of E2E running at once to avoid race conditions
* fix: prefer using ci token
* fix: update e2e to use trust_remote_code for the dummy user
* fix: prefer using latest kernels-data in test
* fix: update nix warns to throws (#540)
* feat: bump cute dsl/cutlass (#545)
* feat: add to vouched (#551)
* hook up skill in the cli and add docs.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Copilot <copilot@github.com>

* Update version bumping scripts with the `--major` option (#550)
* Update version bumping scripts with the `--major` option

With this change the script supports both major and minor version bumping. For example:

Codebase at `0.10.1.dev0`:

```
(none)        -> 0.10.1
--major       -> 0.11.0
--dev         -> 0.10.1.dev1
--dev --major -> 0.11.0.dev0
```

Codebase at `0.10.1`:

```
(none)  -> 0.10.2
--major -> 0.11.0
--dev   -> 0.10.2.dev0
```

These are the typical version bumping workflows within the project.

* Sync .PHONY targets
* upload: fix benchmark deletion filter to match upload filter (#543)
* get_local_kernel api changed, leaving backend (second arg) empty for auto discovery (#555)
* Paths fix

Fixing some paths due to the skill living in the agent-specific location, outside of `kernel-builder/skills/xpu-kernels/`.

* update enum

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com>
Co-authored-by: Erik Kaunismäki <erik.kaum@gmail.com>
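The version-bumping examples in the commit message above can be reproduced with a short sketch. This is inferred only from those listed examples and is not the project's actual script: `bump` is a hypothetical function, and the result for `--dev --major` starting from a release version is an assumption, since the commit message does not list that case.

```python
import re

_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?")


def bump(version: str, major: bool = False, dev: bool = False) -> str:
    """Return the next version for the given flags, following the listed examples."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    first, second, third = (int(match.group(i)) for i in (1, 2, 3))
    dev_number = match.group(4)

    if major:
        # --major bumps the second component and resets the third.
        base = f"{first}.{second + 1}.0"
        return f"{base}.dev0" if dev else base

    if dev_number is not None:
        # Currently on a dev version: either advance the dev counter or release it.
        base = f"{first}.{second}.{third}"
        return f"{base}.dev{int(dev_number) + 1}" if dev else base

    # Currently on a release: move to the next patch version.
    base = f"{first}.{second}.{third + 1}"
    return f"{base}.dev0" if dev else base
```

For example, `bump("0.10.1.dev0", major=True, dev=True)` returns `"0.11.0.dev0"`, matching the last row of the first example block.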
This PR deletes the existing test repo from staging before upload so that we force repo creation each time. This should also fix issues where a staging repo has had its refs removed but the repo itself remains, which causes upload to fail on the /refs call.