v0.5: CUDA graph compilation
🎉 Enhancements
🐛 Bugfixes
- Fixed deadlock in sgmv_shrink kernel caused by imbalanced segments by @tgaddair in #156
- Fixed loading adapter from absolute S3 path by @tgaddair in #161
📝 Docs
- Update client docs with new endpoint source by @abidwael in #126
- Update client docs with new endpoint source by @abidwael in #146
🔧 Maintenance
- Reduce Docker size by removing duplicate torch install by @tgaddair in #144
- Remove CACHE_MANAGER from flash_causal_lm.py by @michaelfeil in #157
New Contributors
- @michaelfeil made their first contribution in #157
Full Changelog: v0.4.1...v0.5.0