feat: add Blackwell GPU (sm_120) CUDA support (#401)
Set TORCH_CUDA_ARCH_LIST in the CUDA build step to include 12.0+PTX for forward compatibility with Blackwell GPUs (RTX 5070 Ti, 5080, etc.). Pre-built PyTorch cu128 wheels only ship native kernels for sm_80/86/89/90. Without this, Blackwell GPU users get "no kernel image is available for execution on the device" at runtime.

Fixes jamiepine#386
Related: jamiepine#395, jamiepine#396, jamiepine#399, jamiepine#400
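For illustration only, here is a small sketch of how a build script could append the 12.0+PTX entry to an existing TORCH_CUDA_ARCH_LIST value without duplicating it. The helper name `with_arch` and the baseline list in the usage lines are hypothetical and do not come from this PR. The only assumption about PyTorch is that the variable accepts entries separated by semicolons or spaces.

```python
import re


def with_arch(arch_list: str, entry: str = "12.0+PTX") -> str:
    """Return arch_list with entry appended, unless it is already present.

    Hypothetical helper: accepts entries separated by ';' or whitespace
    and always returns a ';'-separated string.
    """
    parts = [p for p in re.split(r"[;\s]+", arch_list) if p]
    if entry not in parts:
        parts.append(entry)
    return ";".join(parts)


# Example baseline chosen for illustration; the real workflow value is not shown here.
example = with_arch("8.0;8.6;8.9;9.0")
print(example)
```

A build step could then export the returned string as TORCH_CUDA_ARCH_LIST before invoking the compiler. Calling the helper again on its own output leaves the value unchanged.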
Walkthrough: The GitHub Actions release workflow for Windows now explicitly specifies CUDA GPU architectures.
Applies the compatibility-checker portion of #367. Adds a check_cuda_compatibility() helper that compares the current device's compute capability against torch.cuda._get_arch_list() and returns a human-readable warning if the PyTorch build doesn't support it.

Wired into three places:
• HealthResponse gains a gpu_compatibility_warning field so clients can surface the issue in the UI
• Startup logs the warning at WARN level
• _get_gpu_status() appends "[UNSUPPORTED - see logs]" to the GPU label shown in settings

Skipped #367's other half: the switch from stable to nightly cu128 wheels across release.yml, build_binary.py, and justfile. That change is redundant with #401's TORCH_CUDA_ARCH_LIST=...12.0+PTX approach and would make builds non-deterministic, because nightly releases shift over time.

Co-Authored-By: nyzxor <216715770+nyzxor@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
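The comparison that check_cuda_compatibility() performs might look roughly like the sketch below. This is not the code from the commit. The function names `parse_arch` and `compatibility_warning` are hypothetical, and the matching rule is a simplifying assumption: a native sm_XY kernel is treated as usable on a device with the same major version and an equal or higher minor version, and a compute_XY (PTX) entry as usable on any equal or newer device. The logic is kept free of torch imports so it can be tested without a GPU. In a real helper, the capability tuple would presumably come from torch.cuda.get_device_capability() and the list from the build's arch list.

```python
import re
from typing import Optional, Sequence, Tuple

# Matches entries such as "sm_80", "sm_120", "compute_90", or "sm_90a".
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)[a-z]?$")


def parse_arch(entry: str) -> Optional[Tuple[str, int, int]]:
    """Split an arch entry into (kind, major, minor), or None if unrecognized."""
    m = _ARCH_RE.match(entry)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def compatibility_warning(
    capability: Tuple[int, int], arch_list: Sequence[str]
) -> Optional[str]:
    """Return a human-readable warning if no arch entry covers the device.

    Assumed rule (a simplification): sm_* needs the same major and a minor
    no higher than the device's; compute_* (PTX) needs a capability no
    higher than the device's.
    """
    major, minor = capability
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue  # ignore entries this sketch does not understand
        kind, a_major, a_minor = parsed
        if kind == "sm" and a_major == major and a_minor <= minor:
            return None
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return None
    built_for = ", ".join(arch_list) or "no architectures"
    return (
        f"GPU compute capability {major}.{minor} (sm_{major}{minor}) is not "
        f"supported by this PyTorch build (built for: {built_for})"
    )


# A Blackwell device (12.0) against a build that stops at sm_90:
print(compatibility_warning((12, 0), ["sm_80", "sm_86", "sm_89", "sm_90"]))
```

Returning None for the supported case keeps the call sites simple: the health response field, the startup log, and the GPU label can each check for a non-empty string.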
Adds TORCH_CUDA_ARCH_LIST to the CUDA build step in release.yml to include 12.0+PTX for Blackwell GPU forward compatibility. Five reports (#386, #395, #396, #399, #400) from RTX 50-series users describe a 'no kernel image' error caused by pre-built PyTorch cu128 not including sm_120. Fixes #386.

This contribution was developed with AI assistance (Claude Code).