v0.4.0

@angeloskath tagged this 15 Apr 13:06

This release mostly contains kernel improvements, along with a few features
and fixes. Namely:

- We have a new super fast causal-linear kernel written by NVIDIA's Julien
  Demouth
- We have faster clustered broadcast and clustered aggregate kernels written by
  Apoorv Vyas
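
For readers unfamiliar with what a causal-linear kernel computes, here is a
minimal NumPy reference sketch. It is not the new CUDA kernel and not this
library's API: the function names, the `elu(x) + 1` feature map and the `eps`
parameter are assumptions chosen for illustration. It shows the running-sum
formulation that makes causal linear attention linear in sequence length, next
to the masked quadratic form it is equivalent to.

```python
import numpy as np


def elu_feature_map(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so it is always positive.
    return np.where(x > 0, x + 1.0, np.exp(x))


def causal_linear_attention(q, k, v, eps=1e-6):
    """Reference causal linear attention for one sequence.

    q, k: arrays of shape (L, D); v: array of shape (L, M).
    Hypothetical helper for illustration, not part of the library.
    """
    q = elu_feature_map(q)
    k = elu_feature_map(k)
    seq_len, dim = q.shape
    state = np.zeros((dim, v.shape[1]))  # running sum of outer(k_j, v_j) for j <= i
    norm = np.zeros(dim)                 # running sum of k_j for j <= i
    out = np.empty((seq_len, v.shape[1]))
    for i in range(seq_len):
        state += np.outer(k[i], v[i])
        norm += k[i]
        out[i] = (q[i] @ state) / (q[i] @ norm + eps)
    return out


def causal_attention_quadratic(q, k, v, eps=1e-6):
    # Same result computed with an explicit lower-triangular (causal) mask.
    q = elu_feature_map(q)
    k = elu_feature_map(k)
    scores = np.tril(q @ k.T)
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)
```

The loop keeps only a `(D, M)` state per step, which is the property a fused
GPU kernel can exploit; the quadratic version is included only as a check.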

What should have been in this release but isn't because I didn't have time to
work on it :-) :

- Fancier masking that allows for different masks per sample while maintaining
  backwards compatibility
- Checkpointing for training huge models on single GPU machines
- 16-bit kernels for linear, local and clustered attention