
bert: fix layer norm epsilon value #1946

Merged
merged 1 commit into main on Feb 26, 2024

Conversation

cebtenzzre (Member)

ref https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/blob/7dbbc90392e2f80f3d3c277d6e90027e55de9125/config.json#L13

This is a quick-and-dirty fix since this code is going to be replaced anyway. It would be more correct to read layer_norm_eps when we convert to GGUF, and load that hyperparameter from the GGUF at inference time.
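The more correct approach described above could be sketched as follows. This is a minimal illustration, not the project's actual conversion code; the file path is hypothetical, but the `layer_norm_eps` key and its 1e-12 default match Hugging Face's BERT config format.

```python
import json

def read_layer_norm_eps(config_path, default=1e-12):
    # BERT-style models store the layer norm epsilon under "layer_norm_eps";
    # Hugging Face's BertConfig defaults it to 1e-12 when the key is absent.
    with open(config_path) as f:
        config = json.load(f)
    return float(config.get("layer_norm_eps", default))

# During conversion this value would be written into the GGUF metadata,
# then read back at inference time instead of being hard-coded.
```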

The difference between an epsilon of 1e-6 in LLaMA 1 and 1e-5 in LLaMA 2 created a significant difference in perplexity, so this parameter was added to ggml_norm and ggml_rms_norm soon after LLaMA 2 came out. Until the switch to GGUF, it defaulted to 5e-6 as a suitable middle ground, and the user could customize it at inference time via a command-line option.

The difference between 1e-5 and 1e-12 is certainly more significant... if only we had benchmarks for this code.
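To see why the value matters, here is a plain layer norm sketch (not the actual ggml code): when the activation variance is small, epsilon dominates the denominator, and 1e-5 versus 1e-12 produces visibly different outputs.

```python
import math

def layer_norm(xs, eps):
    # Normalize to zero mean and unit variance; eps keeps the
    # denominator away from zero when the variance is tiny.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

# Small-variance input (variance here is 5e-7, comparable to eps=1e-5,
# so the two settings scale the output very differently).
xs = [1.000, 1.001, 0.999, 1.000]
out_small_eps = layer_norm(xs, eps=1e-12)
out_large_eps = layer_norm(xs, eps=1e-5)
```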

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
@cebtenzzre cebtenzzre merged commit 007d469 into main Feb 26, 2024
6 of 10 checks passed
@cebtenzzre cebtenzzre deleted the fix-bert-norm-eps branch February 26, 2024 18:09