Add support for targeted GPU architecture builds #171
Enables targeting specific compute capabilities during the build process by adding an optional arch parameter to the workflow inputs. This provides more granular control over the build configuration while maintaining backward compatibility with the existing default behavior.
Enables building packages for multiple GPU architectures (SM 80, 86, 89, 90, 100, 120) by adding an architecture matrix parameter to the publish workflow. Extends compatibility across NVIDIA GPU generations and lets each wheel be compiled for its own target architecture.
Enables building for a single compute capability to reduce build time when targeting specific GPU architectures. Updates wheel naming convention to include arch identifier when specified, ensuring proper artifact identification for architecture-specific builds.
Introduces automatic detection of the preferred SM (Streaming Multiprocessor) architecture from the current CUDA device to improve wheel filename specificity. The detection function safely handles cases where CUDA is unavailable or detection fails by returning None. This enhancement allows for more precise wheel identification based on the actual hardware capabilities rather than relying solely on CUDA version information. Removes unused imports to clean up the codebase.
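A minimal sketch of what such a detection helper could look like, assuming PyTorch is the source of device information. The function name `detect_preferred_sm_arch` is hypothetical and is not taken from the PR; only `torch.cuda.is_available()` and `torch.cuda.get_device_capability()` are real PyTorch calls.

```python
from typing import Optional


def detect_preferred_sm_arch() -> Optional[str]:
    """Return the current CUDA device's compute capability as e.g. "90", or None.

    Hypothetical helper: the name and return format are assumptions.
    """
    try:
        # Imported lazily so the helper still works when PyTorch is not installed.
        import torch

        if not torch.cuda.is_available():
            return None
        # get_device_capability() returns a (major, minor) tuple, e.g. (9, 0).
        major, minor = torch.cuda.get_device_capability()
        return f"{major}{minor}"
    except Exception:
        # Missing torch, driver problems, or any other failure: report "unknown".
        return None


print(detect_preferred_sm_arch())
```

On a machine without a usable CUDA device this prints `None`, which is the case the wheel naming code has to handle.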
Pull Request Overview
This PR adds support for targeted GPU architecture builds by introducing an architecture input parameter to the build workflows. This allows building wheels optimized for specific NVIDIA GPU compute capabilities rather than the default multi-architecture approach.
- Adds architecture matrix support in publish workflow for broader GPU compatibility
- Implements automatic SM architecture detection for improved wheel naming
- Introduces single-architecture build capability to speed up development builds
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| setup.py | Adds SM architecture detection function and updates wheel naming to include architecture info |
| .github/workflows/publish.yml | Adds architecture matrix for building wheels across multiple GPU generations |
| .github/workflows/build.yml | Adds architecture input parameter for targeted builds |
| .github/workflows/_build.yml | Implements architecture-specific build logic and wheel naming |
      # Determine wheel URL based on CUDA version, torch version, python version and OS
    - wheel_filename = f"{PACKAGE_NAME}-{flash_version}+cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}-{python_version}-{python_version}-{platform_name}.whl"
    + wheel_filename = f"{PACKAGE_NAME}-{flash_version}+sm{sm_arch}cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}-{python_version}-{python_version}-{platform_name}.whl"
Copilot AI, Sep 20, 2025
The wheel filename will include 'sm{sm_arch}' even when sm_arch is None, resulting in 'smNone' in the filename. This should be handled conditionally to maintain backward compatibility when CUDA is unavailable or detection fails.
Suggested change:
    - wheel_filename = f"{PACKAGE_NAME}-{flash_version}+sm{sm_arch}cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}-{python_version}-{python_version}-{platform_name}.whl"
    + sm_arch_str = f"sm{sm_arch}" if sm_arch is not None else ""
    + wheel_filename = f"{PACKAGE_NAME}-{flash_version}+{sm_arch_str}cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}-{python_version}-{python_version}-{platform_name}.whl"
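To illustrate the conditional naming, here is a small self-contained sketch. The function `build_wheel_filename`, its parameters, and the sample values are all hypothetical and chosen only for illustration; the point is that the `+` separator stays in place whether or not an SM architecture is known.

```python
from typing import Optional


def build_wheel_filename(
    package_name: str,
    version: str,
    cuda_version: str,
    torch_version: str,
    cxx11_abi: str,
    python_tag: str,
    platform_name: str,
    sm_arch: Optional[str] = None,
) -> str:
    """Build a wheel filename, adding an "sm<arch>" prefix only when known."""
    # An unknown architecture (None or "") contributes nothing to the name.
    sm_part = f"sm{sm_arch}" if sm_arch else ""
    # The "+" always separates the version from the local build tag.
    local_tag = f"{sm_part}cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}"
    return (
        f"{package_name}-{version}+{local_tag}"
        f"-{python_tag}-{python_tag}-{platform_name}.whl"
    )


# Sample values are made up for illustration.
common = dict(
    package_name="example_pkg",
    version="1.0.0",
    cuda_version="12",
    torch_version="2.8",
    cxx11_abi="TRUE",
    python_tag="cp312",
    platform_name="linux_x86_64",
)
print(build_wheel_filename(**common, sm_arch="90"))
print(build_wheel_filename(**common, sm_arch=None))
```

With these inputs the first name contains `+sm90cu12...` and the second contains `+cu12...`, so the fallback matches the pre-existing naming scheme and never produces `smNone`.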
    if [ -n "${MATRIX_ARCH}" ]; then
      tmpname=sm${MATRIX_ARCH}cu${WHEEL_CUDA_VERSION}torch${MATRIX_TORCH_VERSION}cxx11abi${{ inputs.cxx11_abi }}
    else
      tmpname=cu${WHEEL_CUDA_VERSION}torch${MATRIX_TORCH_VERSION}cxx11abi${{ inputs.cxx11_abi }}
    fi
    wheel_name=$(ls dist/*whl | xargs -n 1 basename | sed "s/-/+$tmpname-/2")
Copilot AI, Sep 20, 2025
[nitpick] The wheel naming logic is duplicated between setup.py and the workflow file. Consider extracting this logic to a shared function or script to avoid inconsistencies and reduce maintenance burden.
Suggested change:
    - if [ -n "${MATRIX_ARCH}" ]; then
    -   tmpname=sm${MATRIX_ARCH}cu${WHEEL_CUDA_VERSION}torch${MATRIX_TORCH_VERSION}cxx11abi${{ inputs.cxx11_abi }}
    - else
    -   tmpname=cu${WHEEL_CUDA_VERSION}torch${MATRIX_TORCH_VERSION}cxx11abi${{ inputs.cxx11_abi }}
    - fi
    - wheel_name=$(ls dist/*whl | xargs -n 1 basename | sed "s/-/+$tmpname-/2")
    + # Use shared Python script to generate wheel name suffix
    + wheel_suffix=$(python scripts/wheel_name.py \
    +   --arch "${MATRIX_ARCH}" \
    +   --cuda-version "${WHEEL_CUDA_VERSION}" \
    +   --torch-version "${MATRIX_TORCH_VERSION}" \
    +   --cxx11-abi "${{ inputs.cxx11_abi }}")
    + wheel_name=$(ls dist/*whl | xargs -n 1 basename | sed "s/-/+${wheel_suffix}-/2")
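The suggestion refers to a `scripts/wheel_name.py` that is not shown in the PR. The following is one possible sketch of such a script, assuming the four flags used in the suggested workflow snippet; the function names and the sample values are hypothetical. It also includes a pure-Python equivalent of the `sed "s/-/+...-/2"` step, which replaces the second hyphen of the wheel basename.

```python
import argparse
from typing import List, Optional


def wheel_suffix(arch: str, cuda_version: str, torch_version: str, cxx11_abi: str) -> str:
    """Build the local version tag; an empty arch yields the default name."""
    sm_part = f"sm{arch}" if arch else ""
    return f"{sm_part}cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}"


def insert_local_version(wheel_basename: str, suffix: str) -> str:
    """Mimic sed "s/-/+SUFFIX-/2": replace the second hyphen with "+SUFFIX-"."""
    parts = wheel_basename.split("-", 2)
    if len(parts) < 3:
        # Fewer than two hyphens: sed would leave the name unchanged.
        return wheel_basename
    name, version, rest = parts
    return f"{name}-{version}+{suffix}-{rest}"


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="Print a wheel name suffix.")
    # --arch may be passed as an empty string when no architecture is selected.
    parser.add_argument("--arch", default="")
    parser.add_argument("--cuda-version", required=True)
    parser.add_argument("--torch-version", required=True)
    parser.add_argument("--cxx11-abi", required=True)
    args = parser.parse_args(argv)
    suffix = wheel_suffix(args.arch, args.cuda_version, args.torch_version, args.cxx11_abi)
    print(suffix)
    return suffix


# Example invocation with made-up values; a real script would instead
# call main() under an `if __name__ == "__main__":` guard.
suffix = main(["--arch", "90", "--cuda-version", "12",
               "--torch-version", "2.8", "--cxx11-abi", "TRUE"])
print(insert_local_version("example_pkg-1.0.0-cp312-cp312-linux_x86_64.whl", suffix))
```

If `setup.py` imported `wheel_suffix` from the same module, both code paths would share a single definition of the naming rule, which is the inconsistency the review comment is concerned about.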
Introduce an architecture input parameter to the build workflow, allowing for targeted builds for specific GPU architectures. Enhance the publish workflow with an architecture matrix for broader compatibility across NVIDIA GPU generations. Implement automatic detection of the preferred SM architecture for improved wheel naming and artifact identification. Clean up the codebase by removing unused imports.