v0.1.16
What's Changed
- [Fix] Postpone cuda import to the calling site by @Ubospica in #231
- [Feature] Support model vocab size being less than the tokenizer's by @Ubospica in #237
- [Style] Remove unused headers by @DarkSharpness in #219
- Fall back to Triton if we fail to compile for CUDA by @zbowling in #223
- [Feature] Build and run C++ Python tests by @DarkSharpness in #218
- [Fix] Fix missing dependency in CI by @DarkSharpness in #239
- Add dependency for ninja by @Ubospica in #244
- Bump to 0.1.16 by @Ubospica in #245
Full Changelog: v0.1.15...v0.1.16