-
Notifications
You must be signed in to change notification settings - Fork 94
fix: remove existing test repo before upload #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
327a399
39e6e13
ed45c37
bbba328
09d34b7
ed0c47e
0ba827a
a915b9e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,7 @@ env: | |
| E2E_BRANCH: e2e-${{ github.event.pull_request.number || github.run_id }}-${{ github.run_attempt }} | ||
|
|
||
| concurrency: | ||
| group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} | ||
| group: ${{ github.workflow }} | ||
| cancel-in-progress: true | ||
|
|
||
| jobs: | ||
|
|
@@ -45,6 +45,23 @@ jobs: | |
| env: | ||
| USER: runner | ||
|
|
||
| # Remove the existing test repo if it exists so we test repo creation in the upload step | ||
| - name: Delete existing test repo | ||
| run: | | ||
| cat > /tmp/delete_repos.py << 'PYEOF' | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not keep it under
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we can move it if you think thats a better location - however since the snippet is so small and only really expected to be used in this case I thought it made sense to inline. lmk what you think! |
||
| from huggingface_hub import HfApi | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can't we just use
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh I wasn't aware that was possible? can you share a snippet we can use to delete a repo in a more concise way? I was under the impression we'd need to create the api object pointing to the staging url. Happy to make changes to a better approach!
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| import os | ||
|
|
||
| api = HfApi( | ||
| endpoint=os.environ["HF_ENDPOINT"] | ||
| ) | ||
| repo_id = os.environ["E2E_REPO_ID"] | ||
| for repo_type in ["kernel", "model"]: | ||
| api.delete_repo(repo_id, repo_type=repo_type, missing_ok=True) | ||
| print(f"Deleted {repo_type} repo (or it did not exist)") | ||
| PYEOF | ||
| nix-shell -p python3Packages.huggingface-hub --run "python /tmp/delete_repos.py" | ||
|
|
||
| - name: Init kernel project | ||
| run: | | ||
| cd /tmp | ||
|
|
@@ -121,6 +138,8 @@ jobs: | |
| CUDA_TAG=$(echo "$VARIANT" | grep -oP 'cu\d+') | ||
| echo "Installing torch matching variant $VARIANT (CUDA tag: $CUDA_TAG)" | ||
| uv sync --all-extras --dev | ||
| # E2E validates this checkout, so use the local kernels-data binding instead of the released wheel from uv.lock. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this needed?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this was added to ensure the E2E test is using the latest kernel-data in the repo, instead of the version from the lock file. I ran into an issue with a difference in the api used to load the metadata in a previous action run https://github.com/huggingface/kernels/actions/runs/25754879907/job/75640986832#step:5:36 and by reinstalling the kernels-data library after syncing, it make sure to use the latest changes (and not the one pinned in the uv.lock) |
||
| uv pip install --reinstall ../kernels-data/bindings/python | ||
| uv pip install --upgrade torch --index-url https://download.pytorch.org/whl/$CUDA_TAG | ||
| uv run --no-sync python -c "import torch; print(f'torch={torch.__version__}, cuda={torch.version.cuda}, cxx11_abi={torch.compiled_with_cxx11_abi()}')" | ||
|
|
||
|
|
@@ -131,7 +150,7 @@ jobs: | |
| import torch | ||
| from kernels import get_kernel | ||
|
|
||
| kernel = get_kernel('${{ env.E2E_REPO_ID }}', revision='${{ env.E2E_BRANCH }}') | ||
| kernel = get_kernel('${{ env.E2E_REPO_ID }}', revision='${{ env.E2E_BRANCH }}', trust_remote_code=True) | ||
|
|
||
| x = torch.randn(1024, 1024, dtype=torch.float32, device='cuda') | ||
| result = kernel.kernels_upload_test(x) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change updates the
concurrencygroup to be scoped to this workflow instead of the workflow + the pr.this way the actions should only run one time per workflow, and not per pr, so in the case multiple PR's are open, it should only run this workflow one at a time to avoid race conditions.
since this change adds a step to delete the staging repo, if the workflows ran in parallel then it could hypothetically have a race condition if one action deleted the repo after the other action uploaded and before it downloaded in the
get_kernelcall.so this change should guard against that race condition