v1.0.0a6: fused CE HVP kernel

noahgolmant released this 13 May 15:16

v1.0.0a6

1db045c

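Adds a fused cross-entropy (CE) Hessian-vector product (HVP) kernel in two implementations: one built with torch.compile (CPU/CUDA/MPS) and a hand-written Triton kernel (CUDA, online softmax). The implementation is auto-selected via fused="auto" on hf_lm_loss_of_output(). On an A100 at LM-scale vocabulary, the fused path is ~3.4× faster and uses ~2× less memory than the eager implementation (PR #47).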
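For readers unfamiliar with the math being fused, here is a minimal pure-Python sketch of a cross-entropy HVP with respect to a single row of logits. It is not the library's kernel. It assumes the HVP is taken with respect to the logits, where the Hessian is diag(p) - p pᵀ with p = softmax(logits), so Hv = p ⊙ (v - ⟨p, v⟩) and does not depend on the target label. The function names `ce_hvp_online` and `ce_hvp_dense` are hypothetical. The first uses an online-softmax style pass (running max plus rescaled accumulators) so the vocabulary-sized Hessian is never materialized; the second builds the dense Hessian only as a reference.

```python
import math


def ce_hvp_online(logits, v):
    """HVP of softmax cross-entropy w.r.t. one row of finite logits.

    Computes p * (v - <p, v>) without forming the dense Hessian.
    """
    m = float("-inf")  # running max of the logits seen so far
    s = 0.0            # running sum of exp(z - m)
    d = 0.0            # running sum of exp(z - m) * v
    for z, vi in zip(logits, v):
        m_new = max(m, z)
        scale = math.exp(m - m_new)  # rescale old accumulators to the new max
        w = math.exp(z - m_new)
        s = s * scale + w
        d = d * scale + w * vi
        m = m_new
    pv = d / s  # inner product <p, v>
    return [math.exp(z - m) / s * (vi - pv) for z, vi in zip(logits, v)]


def ce_hvp_dense(logits, v):
    """Reference: build H = diag(p) - p p^T explicitly and multiply by v."""
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    total = sum(e)
    p = [x / total for x in e]
    n = len(p)
    out = []
    for i in range(n):
        row = [(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
        out.append(sum(h * vj for h, vj in zip(row, v)))
    return out
```

The dense reference costs O(V²) memory per row for vocabulary size V, while the single-pass form needs only O(V). That gap is the general motivation for fusing this computation at LM-scale vocabularies.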
