Always run SiLU activation in float32 for LLaMA and Mistral #1540

Merged
Merged 2 commits into keras-team:master on Apr 1, 2024

Conversation

tirthasheshpatel (Contributor) commented:
PyTorch's SiLU kernels always compute in float32, even for half-precision inputs. Running the activation directly in half precision causes catastrophic cancellation and leads to large errors. This PR fixes the issue for both LLaMA and Mistral.

Here are the PyTorch implementations:

CPU Kernel: https://github.com/pytorch/pytorch/blob/35c493f2cf9b623bfdc7e6b34dc1cb39690a7919/aten/src/ATen/native/cpu/Activation.cpp#L1221-L1235

CUDA Kernel: https://github.com/pytorch/pytorch/blob/35c493f2cf9b623bfdc7e6b34dc1cb39690a7919/aten/src/ATen/native/cuda/ActivationSiluKernel.cu

Colab verifying this behavior: https://colab.research.google.com/drive/1v5CNVkWJtyIcQVbh-f51GKbqvrvfDyVd?usp=sharing
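
As an illustration of the fix, here is a minimal sketch, not the actual keras-nlp layer code; the helper name `silu_in_float32` and the use of `keras.ops` are assumptions for this example:

```python
from keras import ops

def silu_in_float32(x):
    # Hypothetical helper sketching the PR's approach: upcast to float32,
    # apply SiLU there, then cast back to the original dtype so the rest
    # of the half-precision computation is unchanged.
    input_dtype = x.dtype
    x = ops.cast(x, "float32")
    y = ops.silu(x)
    return ops.cast(y, input_dtype)
```

In the LLaMA and Mistral feedforward blocks, a cast like this would wrap only the gate activation; everything else stays in the model's compute dtype.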

mattdangerw (Member) left a comment:

Thanks! LGTM

@mattdangerw mattdangerw merged commit 3b3acb5 into keras-team:master Apr 1, 2024
11 checks passed
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request Apr 2, 2024
…am#1540)

* Fix discrepancy between HF LLaMA and our implementation

* Fix Mistral transformer decoder
SamanehSaadat pushed a commit to SamanehSaadat/keras-nlp that referenced this pull request Apr 10, 2024
…am#1540)

* Fix discrepancy between HF LLaMA and our implementation

* Fix Mistral transformer decoder