Fix backward rope after YaRN #3974

Merged
6 commits merged into ggerganov:master on Nov 7, 2023

Conversation

@xaedes (Collaborator) commented Nov 6, 2023

The rope backward process was broken after the YaRN RoPE implementation (#2268), due to missing changes in the backward functions.

The code for the rope backward process is nearly identical to the forward process: the only difference is the sign of the sin values.

To avoid future regressions, this PR removes the near-duplicate backward functions and reuses the forward code:

For this, a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`. The sin values are negated when `forward` is false.
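
To see why one sign flip suffices: RoPE rotates each pair of values by an angle theta, and the gradient of a rotation is the rotation by -theta, which is the same formula with the sin negated. A minimal sketch of the idea (illustrative names, not the actual ggml kernels):

```c
// Minimal sketch (illustrative, not the actual ggml code): one rotated
// pair of values, with forward and backward sharing the same body.
#include <math.h>
#include <stdbool.h>

static void rope_pair(float * x0, float * x1, float theta, bool forward) {
    const float cos_theta = cosf(theta);
    // backward is the rotation by -theta: cos(-theta) == cos(theta) and
    // sin(-theta) == -sin(theta), so only the sin value changes sign
    const float sin_theta = forward ? sinf(theta) : -sinf(theta);

    const float v0 = *x0;
    const float v1 = *x1;
    *x0 = v0*cos_theta - v1*sin_theta;
    *x1 = v0*sin_theta + v1*cos_theta;
}
```

Applying `rope_pair` with `forward == true` and then with `forward == false` returns the original pair, which is exactly the inverse relationship the backward pass relies on.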

Additionally, the rope parameters in `ggml_compute_backward` are now correctly propagated to the backward process.

As part of this, the unused `ggml_rope_xpos_back` was removed: it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all of them without having to switch between different rope_back variants.
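
A hedged sketch of the propagation idea, in self-contained form (the parameter layout, indices, and names below are illustrative assumptions, not the exact ggml code): the backward step unpacks every rope parameter the forward op recorded and hands all of them to the single backward entry point.

```c
// Illustrative only: unpack all rope parameters from an op_params-style
// buffer and pass every one of them to a single backward entry point.
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// floats are stored bitwise in the int32-addressed parameter buffer
static float get_f32(const int32_t * params, int i) {
    float v;
    memcpy(&v, params + i, sizeof(v));
    return v;
}

static void put_f32(int32_t * params, int i, float v) {
    memcpy(params + i, &v, sizeof(v));
}

// the single backward function accepting the full parameter set; with
// only one variant, the backward pass cannot pick a rope_back flavor
// that silently ignores some of the parameters
static void rope_back(int n_dims, int mode, int n_ctx,
                      float freq_base, float freq_scale,
                      float ext_factor, float attn_factor) {
    printf("rope_back: n_dims=%d mode=%d n_ctx=%d freq_base=%.1f "
           "freq_scale=%.2f ext_factor=%.2f attn_factor=%.2f\n",
           n_dims, mode, n_ctx, freq_base, freq_scale,
           ext_factor, attn_factor);
}

int main(void) {
    // pretend the forward op stored its parameters like this
    int32_t params[8] = {0};
    params[0] = 128;  // n_dims
    params[1] = 0;    // mode
    params[2] = 4096; // n_ctx
    put_f32(params, 3, 10000.0f); // freq_base
    put_f32(params, 4, 1.0f);     // freq_scale
    put_f32(params, 5, 0.0f);     // ext_factor
    put_f32(params, 6, 1.0f);     // attn_factor

    // the backward pass propagates *all* recorded parameters
    rope_back(params[0], params[1], params[2],
              get_f32(params, 3), get_f32(params, 4),
              get_f32(params, 5), get_f32(params, 6));
    return 0;
}
```

With only one `rope_back` variant, any parameter added to the forward op has a single backward call site to update, instead of several near-duplicates that can silently fall out of sync.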
@xaedes mentioned this pull request on Nov 6, 2023
@cebtenzzre (Collaborator) commented:
xpos is only implemented in `ggml_compute_forward_rope_f32`; we should probably fix that.

@cebtenzzre linked an issue on Nov 6, 2023 that may be closed by this pull request
@CoruNethron commented Nov 7, 2023

It's probably a good idea to rename the function after this change, so that it doesn't only mention forward, e.g. `ggml_compute_directed_rope_32` or whatever.
Also, does this PR mean that we will, in theory, be able to finetune models like this 128k ctx one? Leaving aside the fact that it would take a lot of time and memory.

@ggerganov merged commit e9c1cec into ggerganov:master on Nov 7, 2023
32 checks passed
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
* fix backward process of rope

* fix finetune rope call to use correct default attn_factor of 1.0f

* remove unused `ggml_rope_xpos_back`

* fix comments explaining the sine sign in ggml_forward_rope

* add missing function arguments in declaration

* fix function argument type in declaration
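
For context on the `attn_factor` fix in the commit list above, a hedged sketch of why the default matters, assuming YaRN-style magnitude scaling (the helper below is illustrative, not the ggml API): `attn_factor` multiplies the magnitude correction applied to the rotated values, so 1.0f is the neutral default, while an accidental 0.0f would scale the rope output to zero and break training.

```c
// Illustrative only (not the ggml API): YaRN-style magnitude correction.
// attn_factor acts as a multiplier on the corrective scale, so the
// neutral default is 1.0f; with 0.0f every rotated value is zeroed out.
#include <math.h>

static float yarn_mscale(float freq_scale, float attn_factor) {
    if (freq_scale >= 1.0f) {
        return attn_factor; // no context compression -> no extra correction
    }
    // common YaRN form of the correction for compressed frequencies
    return attn_factor * (0.1f * logf(1.0f / freq_scale) + 1.0f);
}
```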
Successfully merging this pull request may close these issues.

train-text-from-scratch and finetune nan loss on iter=2