bert: fix layer norm epsilon value #1946
Merged
ref https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/blob/7dbbc90392e2f80f3d3c277d6e90027e55de9125/config.json#L13
This is a quick-and-dirty fix since this code is going to be replaced anyway. It would be more correct to read layer_norm_eps when we convert to GGUF, and load that hyperparameter from the GGUF at inference time.
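A minimal sketch of the conversion-side half of that idea, assuming the standard Hugging Face `config.json` layout (the helper name and the `1e-12` fallback, which is the `transformers` BERT default, are illustrative, not the actual conversion script):

```python
import json

def read_layer_norm_eps(config_path, default=1e-12):
    # BERT-family HF checkpoints store the value under "layer_norm_eps";
    # fall back to the transformers BERT default (1e-12) when absent.
    with open(config_path) as f:
        config = json.load(f)
    return float(config.get("layer_norm_eps", default))
```

The converter would then write this value into the GGUF as a hyperparameter, and inference would read it back instead of hard-coding an epsilon.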
The difference between an epsilon of 1e-6 in LLaMA 1 and 1e-5 in LLaMA 2 created a significant difference in perplexity, so an eps parameter was added to ggml_norm and ggml_rms_norm soon after LLaMA 2 came out. Until the switch to GGUF, the default was 5e-6, a suitable middle ground, and the user could override the parameter at inference time via a command-line option.
The difference between 1e-5 and 1e-12 is certainly more significant... if only we had benchmarks for this code.