Fix the rotary embedding computation in LLaMA #1544
Conversation
Also run the reverse embedding stage in compute_dtype instead of full precision. This is how HF does it, so it helps get the numerics closer to the HF implementation.
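For context on the fix itself: the rotation angles in a rotary embedding depend on absolute token position, so during cached generation the positions of new tokens have to be offset by `start_index`. Below is a minimal standalone sketch of that dependence (illustrative only, not the keras-nlp layer; the function and argument names are made up for the example):

```python
import numpy as np

def rotary_angles(seq_len, head_dim, start_index=0, base=10000.0):
    # Rotation angles depend on absolute position, so during cached
    # generation the new tokens' positions must be offset by start_index.
    # Ignoring the offset rotates every new token as if it sat at position 0.
    positions = np.arange(start_index, start_index + seq_len, dtype="float32")
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2, dtype="float32") / head_dim))
    return np.outer(positions, inv_freq)  # shape [seq_len, head_dim // 2]

# At generation step 10, the single new token should use position 10, not 0.
print(rotary_angles(seq_len=1, head_dim=8, start_index=10)[0, 0])  # 10.0
```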
# If `cache_update_index` is a tensor, RotaryEmbedding expects it
# to have dtype `self.compute_dtype`.
start_index = ops.cast(
    start_index, self.rotary_embedding_layer.compute_dtype
)
Won't this run into the same problems as #5 in https://unsloth.ai/blog/gemma-bugs? float16 and bfloat16 are both bad for an incrementing integer.
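A quick standalone illustration of that concern (plain NumPy, not project code):

```python
import numpy as np

# float16 has an 11-bit significand, so integers above 2048 are no longer
# exactly representable and an incrementing index silently stops advancing.
print(np.float16(2048) + np.float16(1))  # 2048.0 -- the increment is lost
print(np.float16(2049))                  # 2048.0 -- rounds back down
# bfloat16 is even coarser (8-bit significand), so exact integers stop at 256.
```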
  # [batch_shape, seq_len, num_key_value_heads, head_dim]
  # -> [batch_shape, seq_len, num_heads, head_dim]
- key = ops.repeat(key, repeats=self.num_key_value_groups, axis=2)
- value = ops.repeat(value, repeats=self.num_key_value_groups, axis=2)
+ key = ops.repeat(key, repeats=self._num_key_value_groups, axis=2)
This should not have an underscore, I think.
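For readers skimming the diff above, here is a minimal runnable sketch of what the repeat does in grouped-query attention, assuming Keras 3's `keras.ops`; the shapes and variable names are illustrative, not the model's:

```python
import numpy as np
from keras import ops

# Toy grouped-query attention shapes: 8 query heads sharing 2 key/value heads.
batch, seq_len, num_key_value_heads, head_dim = 1, 4, 2, 16
num_heads = 8
num_key_value_groups = num_heads // num_key_value_heads  # 4

key = np.zeros((batch, seq_len, num_key_value_heads, head_dim), dtype="float32")
# Repeat each key/value head along the heads axis so the key tensor lines up
# with the query heads before attention.
key = ops.repeat(key, repeats=num_key_value_groups, axis=2)
print(ops.shape(key))  # (1, 4, 8, 16)
```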
* Fix rotary embedding computation in LLaMA. Also run the reverse embedding stage in compute_dtype instead of full-precision; this is how HF does it, so helps get the numerics closer.
* Don't cast start_index; save rope keys
* Remove underscore from num_key_value_heads
The LLaMA backbone ignored the start_index parameter when computing the rotary embeddings, which led to numerical issues during generation. This PR fixes that, along with the reverse embedding layer in both Mistral and LLaMA: run the reverse embedding stage in compute_dtype instead of full precision. This is how HF does it, so it helps get the numerics closer.
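A hedged sketch of the reverse-embedding change described above, assuming Keras 3's `keras.ops`; the function and argument names are illustrative rather than the actual keras-nlp implementation:

```python
from keras import ops

def reverse_embedding(hidden_states, embedding_weights, compute_dtype="bfloat16"):
    # Project hidden states back onto the tied embedding matrix to get logits,
    # doing the matmul in compute_dtype rather than promoting to float32.
    hidden_states = ops.cast(hidden_states, compute_dtype)
    kernel = ops.cast(ops.transpose(embedding_weights), compute_dtype)
    return ops.matmul(hidden_states, kernel)  # [batch, seq_len, vocab_size]
```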