@tianyu-l
Contributor

As titled.

Also fixes a breakage caused by #1086.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Apr 12, 2025
from torchtitan.tools.logging import logger


def is_sm89_or_later():
Contributor

I feel like this could be made more general; otherwise you are going to end up with an ever-growing fleet of specific functions, and users will have to check whether a given version-specific function exists...
Instead, how about a simple API that verifies the major and minor version?

is_sm_version_or_higher(major, minor=0)

and thus checking for SM90, for example, is just

is_sm_version_or_higher(9, 0)

This means that supporting newer architectures like Blackwell (SM100, SM120) does not need a new function: they can all use the same check in the future rather than each getting its own version-specific function.

Rough code, untested, just to give an idea:

import torch


def get_cuda_capability():
    # Returns (major, minor) for the current CUDA device, or None without CUDA.
    if torch.cuda.is_available():
        return torch.cuda.get_device_capability()
    return None


def is_sm_version_or_higher(major: int, minor: int = 0) -> bool:
    """
    Returns:
        bool: True if GPU meets or exceeds the specified version
    """
    capability = get_cuda_capability()
    # Tuples compare lexicographically, so (9, 0) >= (8, 9) is True.
    return capability is not None and capability >= (major, minor)
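
A slightly fuller sketch of the same idea, in case it helps. It splits the comparison into a pure function so it can be checked without a GPU, and shows how the existing is_sm89_or_later could become a thin wrapper. The names capability_at_least and current_capability are hypothetical and not part of torchtitan; the only torch calls assumed are torch.cuda.is_available() and torch.cuda.get_device_capability().

```python
from typing import Optional, Tuple


def capability_at_least(
    capability: Optional[Tuple[int, int]], major: int, minor: int = 0
) -> bool:
    """Pure comparison (hypothetical helper): no GPU or torch needed."""
    # Tuples compare element by element, so major is checked before minor.
    return capability is not None and tuple(capability) >= (major, minor)


def current_capability() -> Optional[Tuple[int, int]]:
    """(major, minor) of the current CUDA device, or None (hypothetical helper)."""
    try:
        import torch  # imported lazily so the pure helper works without torch
    except ImportError:
        return None
    if torch.cuda.is_available():
        return torch.cuda.get_device_capability()
    return None


def is_sm_version_or_higher(major: int, minor: int = 0) -> bool:
    return capability_at_least(current_capability(), major, minor)


def is_sm89_or_later() -> bool:
    # The existing specific check, expressed through the general one.
    return is_sm_version_or_higher(8, 9)
```

With this shape, a future check for SM100 would be is_sm_version_or_higher(10, 0) with no new function, and the comparison logic can be unit tested by passing capability tuples directly.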

Contributor

@lessw2020 left a comment

Looks good.
I left an inline comment: I do think that replacing the series of specific version-check functions with a single API that takes (major, minor) is more elegant and avoids future technical debt, since it scales to future versions intrinsically.
But that can be a future PR as well; this will work fine as is.

@tianyu-l merged commit 89cdc43 into main Apr 12, 2025
6 checks passed
@tianyu-l deleted the llama4 branch April 12, 2025 04:16
Development

Successfully merging this pull request may close these issues.

Getting ValueError: too many values to unpack (expected 2) at moe.py#L249

5 participants