
Conversation

@LoserCheems
Collaborator

Introduce an architecture input parameter to the build workflow, enabling targeted builds for specific GPU architectures. Enhance the publish workflow with an architecture matrix for broader compatibility across NVIDIA GPU generations. Implement automatic detection of the preferred SM architecture for improved wheel naming and artifact identification. Clean up the codebase by removing unused imports.

Adds an optional arch parameter to the build workflow inputs, enabling builds targeted at a specific compute capability. This gives more granular control over the build configuration while preserving the existing default behavior, and building for a single compute capability reduces build time when only one GPU architecture is needed.

Adds an architecture matrix (80, 86, 89, 90, 100, 120) to the publish workflow so that packages are built for multiple GPU architectures, expanding compatibility across NVIDIA GPU generations and providing an optimized wheel for each target architecture.
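
As a rough sketch of how a single requested architecture could be translated into CUDA compile flags during the build (the FLASH_DMATTN_ARCH variable name and the flag handling below are illustrative assumptions, not taken from this PR):

import os

# Hypothetical env var name; the real workflow may pass the arch differently.
requested_arch = os.environ.get("FLASH_DMATTN_ARCH", "").strip()

# Default multi-architecture list mirrors the matrix used by the publish workflow.
all_archs = ["80", "86", "89", "90", "100", "120"]
archs = [requested_arch] if requested_arch else all_archs

# Translate compute capabilities into nvcc -gencode flags.
gencode_flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]
print(" ".join(gencode_flags))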

Updates the wheel naming convention to include an arch identifier when one is specified, ensuring proper artifact identification for architecture-specific builds.
Introduces automatic detection of the preferred SM (Streaming Multiprocessor) architecture from the current CUDA device to improve wheel filename specificity.

The detection function safely handles cases where CUDA is unavailable or detection fails by returning None. This enhancement allows for more precise wheel identification based on the actual hardware capabilities rather than relying solely on CUDA version information.
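
A minimal sketch of what such a detection helper might look like, assuming PyTorch is used to query the device; the function name and return format are illustrative rather than the exact setup.py implementation:

import torch

def detect_preferred_sm_arch():
    """Return the SM architecture of the current CUDA device (e.g. "90"), or None."""
    try:
        if not torch.cuda.is_available():
            return None
        major, minor = torch.cuda.get_device_capability()
        return f"{major}{minor}"
    except Exception:
        # Detection is best-effort; any failure falls back to None.
        return None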

Removes unused imports to clean up the codebase.
Copilot AI review requested due to automatic review settings September 20, 2025 12:02
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for targeted GPU architecture builds by introducing an architecture input parameter to the build workflows. This allows building wheels optimized for specific NVIDIA GPU compute capabilities rather than the default multi-architecture approach.

  • Adds architecture matrix support in publish workflow for broader GPU compatibility
  • Implements automatic SM architecture detection for improved wheel naming
  • Introduces single-architecture build capability to speed up development builds

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

setup.py: Adds an SM architecture detection function and updates wheel naming to include architecture info
.github/workflows/publish.yml: Adds an architecture matrix for building wheels across multiple GPU generations
.github/workflows/build.yml: Adds an architecture input parameter for targeted builds
.github/workflows/_build.yml: Implements architecture-specific build logic and wheel naming



# Determine wheel URL based on CUDA version, torch version, python version and OS
wheel_filename = f"{PACKAGE_NAME}-{flash_version}+cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}-{python_version}-{python_version}-{platform_name}.whl"
wheel_filename = f"{PACKAGE_NAME}-{flash_version}+sm{sm_arch}cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}-{python_version}-{python_version}-{platform_name}.whl"

Copilot AI Sep 20, 2025


The wheel filename will include 'sm{sm_arch}' even when sm_arch is None, resulting in 'smNone' in the filename. This should be handled conditionally to maintain backward compatibility when CUDA is unavailable or detection fails.

Suggested change
wheel_filename = f"{PACKAGE_NAME}-{flash_version}+sm{sm_arch}cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}-{python_version}-{python_version}-{platform_name}.whl"
sm_arch_str = f"sm{sm_arch}" if sm_arch is not None else ""
wheel_filename = f"{PACKAGE_NAME}-{flash_version}+{sm_arch_str}cu{cuda_version}torch{torch_version}cxx11abi{cxx11_abi}-{python_version}-{python_version}-{platform_name}.whl"

Comment on lines +188 to 193
if [ -n "${MATRIX_ARCH}" ]; then
tmpname=sm${MATRIX_ARCH}cu${WHEEL_CUDA_VERSION}torch${MATRIX_TORCH_VERSION}cxx11abi${{ inputs.cxx11_abi }}
else
tmpname=cu${WHEEL_CUDA_VERSION}torch${MATRIX_TORCH_VERSION}cxx11abi${{ inputs.cxx11_abi }}
fi
wheel_name=$(ls dist/*whl | xargs -n 1 basename | sed "s/-/+$tmpname-/2")

Copilot AI Sep 20, 2025


[nitpick] The wheel naming logic is duplicated between setup.py and the workflow file. Consider extracting this logic to a shared function or script to avoid inconsistencies and reduce maintenance burden.

Suggested change
if [ -n "${MATRIX_ARCH}" ]; then
tmpname=sm${MATRIX_ARCH}cu${WHEEL_CUDA_VERSION}torch${MATRIX_TORCH_VERSION}cxx11abi${{ inputs.cxx11_abi }}
else
tmpname=cu${WHEEL_CUDA_VERSION}torch${MATRIX_TORCH_VERSION}cxx11abi${{ inputs.cxx11_abi }}
fi
wheel_name=$(ls dist/*whl | xargs -n 1 basename | sed "s/-/+$tmpname-/2")
# Use shared Python script to generate wheel name suffix
wheel_suffix=$(python scripts/wheel_name.py \
--arch "${MATRIX_ARCH}" \
--cuda-version "${WHEEL_CUDA_VERSION}" \
--torch-version "${MATRIX_TORCH_VERSION}" \
--cxx11-abi "${{ inputs.cxx11_abi }}")
wheel_name=$(ls dist/*whl | xargs -n 1 basename | sed "s/-/+${wheel_suffix}-/2")
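
A minimal sketch of what the shared scripts/wheel_name.py suggested above might look like; the script does not exist in this PR, and its interface is assumed from the suggested invocation:

#!/usr/bin/env python
"""Hypothetical shared helper for composing the wheel-name suffix."""
import argparse

def main():
    parser = argparse.ArgumentParser(description="Print the local-version suffix used in wheel names.")
    parser.add_argument("--arch", default="", help="Optional SM architecture, e.g. 90")
    parser.add_argument("--cuda-version", required=True)
    parser.add_argument("--torch-version", required=True)
    parser.add_argument("--cxx11-abi", required=True)
    args = parser.parse_args()
    # Prepend the sm identifier only when an architecture was requested.
    arch_part = f"sm{args.arch}" if args.arch else ""
    print(f"{arch_part}cu{args.cuda_version}torch{args.torch_version}cxx11abi{args.cxx11_abi}")

if __name__ == "__main__":
    main()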

@LoserCheems LoserCheems merged commit c60a014 into main Sep 20, 2025
1 check failed
