Always run SiLU activation in float32 for LLaMA and Mistral #1540

Merged
Merged 2 commits into keras-team:master on Apr 1, 2024

Conversation

tirthasheshpatel (Contributor) commented:
PyTorch's SiLU kernels always compute in float32, even for half-precision inputs. Running the activation directly in half precision causes catastrophic cancellation and leads to large errors. This PR fixes the issue for both LLaMA and Mistral.

Here are the PyTorch implementations:

CPU Kernel: https://github.com/pytorch/pytorch/blob/35c493f2cf9b623bfdc7e6b34dc1cb39690a7919/aten/src/ATen/native/cpu/Activation.cpp#L1221-L1235

CUDA Kernel: https://github.com/pytorch/pytorch/blob/35c493f2cf9b623bfdc7e6b34dc1cb39690a7919/aten/src/ATen/native/cuda/ActivationSiluKernel.cu

Colab verifying this behavior: https://colab.research.google.com/drive/1v5CNVkWJtyIcQVbh-f51GKbqvrvfDyVd?usp=sharing
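
As an illustration of the fix, here is a minimal sketch, not the actual keras-nlp layer code; the helper name `silu_in_float32` and the use of `keras.ops` are assumptions for this example:

```python
from keras import ops

def silu_in_float32(x):
    # Hypothetical helper sketching the PR's approach: upcast to float32,
    # apply SiLU there, then cast back to the original dtype so the rest
    # of the half-precision computation is unchanged.
    input_dtype = x.dtype
    x = ops.cast(x, "float32")
    y = ops.silu(x)
    return ops.cast(y, input_dtype)
```

In the LLaMA and Mistral feedforward blocks, a cast like this would wrap only the gate activation; everything else stays in the model's compute dtype.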

mattdangerw (Member) left a comment:

Thanks! LGTM

@mattdangerw mattdangerw merged commit 3b3acb5 into keras-team:master Apr 1, 2024
11 checks passed
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request Apr 2, 2024
…am#1540)

* Fix discrepancy between HF LLaMA and our implementation

* Fix Mistral transformer decoder
SamanehSaadat pushed a commit to SamanehSaadat/keras-nlp that referenced this pull request Apr 10, 2024
…am#1540)

* Fix discrepancy between HF LLaMA and our implementation

* Fix Mistral transformer decoder