
Gemma fixes - gelu #37

Merged 1 commit into google:main on Mar 4, 2024

Conversation

danielhanchen
Contributor

Just a few more Gemma fixes :) Currently checking for more as well!
Related PR: huggingface/transformers#29285, which showed that RoPE must be computed in float32 rather than float16, since float16 causes the positional encodings to lose accuracy.

  1. The activation function, according to https://twitter.com/danielhanchen/status/1763613620909580505 (waiting for confirmation), should be the approximate GELU and not the exact GELU. That is, gelu should actually be gelu_pytorch_tanh, which maps to PytorchGELUTanh and calls nn.functional.gelu(input, approximate="tanh"), whereas gelu calls nn.functional.gelu(input, approximate="none") (see the sketch below).
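
A minimal sketch (illustrative tensors only, not the repo's code) of the difference between the exact GELU and the tanh approximation this PR switches to:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)

# Exact GELU: x * Phi(x) computed via erf -- what plain `gelu` maps to.
exact = F.gelu(x, approximate="none")

# Tanh approximation -- what `gelu_pytorch_tanh` / PytorchGELUTanh maps to.
approx = F.gelu(x, approximate="tanh")

# The two outputs differ slightly; using the wrong variant shifts activations.
print((exact - approx).abs().max())
```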

Will scour for more and will add them here :) (Hopefully the gelu issue is the only one!!)


google-cla bot commented Mar 2, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@pengchongjin
Collaborator

Thanks for the contribution!

To clarify, have you included changes for the RoPE embedding dtype issue mentioned above? I only see the GeLU fix.

@danielhanchen
Contributor Author

@pengchongjin Oh, actually, now that I'm reading the repo's code, I realize I forgot about the RoPE embeddings part. I'm assuming torch.autocast will also lose accuracy during finetuning but should be fine during normal inference, although I haven't tested it yet sadly :(
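
For reference, a minimal sketch (assumed function name and arguments, not the repo's actual code) of the dtype fix described in huggingface/transformers#29285: compute the rotary angles in float32 and only cast the final cos/sin tables to the model dtype.

```python
import torch

def rope_cos_sin(seq_len: int, head_dim: int, base: float = 10000.0,
                 out_dtype: torch.dtype = torch.float16):
    # Keep the frequency/angle math in float32 so large positions and small
    # frequencies are not rounded away by float16.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    t = torch.arange(seq_len, dtype=torch.float32)
    freqs = torch.outer(t, inv_freq)          # (seq_len, head_dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)   # (seq_len, head_dim)
    # Only the final tables are cast down to the activation dtype.
    return emb.cos().to(out_dtype), emb.sin().to(out_dtype)

cos, sin = rope_cos_sin(seq_len=2048, head_dim=256)
```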

@pengchongjin
Collaborator

OK, thanks, let's check in the GeLU fix first.

@pengchongjin merged commit 71e07fd into google:main on Mar 4, 2024
1 check passed