Question related to _yarn_linear_ramp_mask #60

chizhang118 · 2024-04-09T22:25:58Z

I have a question about the _yarn_linear_ramp_mask implementation, specifically linear_func = (torch.arange(dim, dtype=torch.float32) - min) / (max - min). Here the ramp is computed over the dimension index rather than the number of rotations. In the paper's definition of the ramp function, however, r, alpha, and beta all relate to the number of rotations, not the dimension: r(d) = L/lambda is the number of rotations, and it is this value that is compared with alpha and beta.

So does the implementation match the definition in the paper?

Could anyone help me understand this part?

disperaller · 2024-07-09T04:03:45Z

It is just a mask, not the actual r values.
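To make the relationship between the two views concrete, here is a minimal sketch in plain Python (no torch) of how a ramp over dimension indices can encode the same thresholds as a ramp over rotation counts. It assumes the standard RoPE wavelength lambda_i = 2*pi*base^(2i/dim), so that r(i) = L/lambda_i can be inverted to find the index at which a given rotation count occurs. The function names, the defaults (base=10000, context_len=2048, dim=128), and the example thresholds 32 and 1 are all illustrative assumptions, not taken from this thread, and the ramp here runs over the dim/2 frequency pairs, which is also an assumption about how the mask is applied.

```python
import math

def rotations_at_index(i, dim, base=10000.0, context_len=2048):
    """r(i) = L / lambda_i: rotations completed over the context by pair i."""
    wavelength = 2 * math.pi * base ** (2 * i / dim)
    return context_len / wavelength

def index_for_rotations(num_rotations, dim, base=10000.0, context_len=2048):
    """Inverse of rotations_at_index: the (fractional) pair index whose
    rotation count equals num_rotations."""
    return dim * math.log(context_len / (num_rotations * 2 * math.pi)) / (2 * math.log(base))

def linear_ramp(lo, hi, n):
    """Clamp (i - lo) / (hi - lo) to [0, 1] for i in 0..n-1 (index space)."""
    if lo == hi:
        hi += 0.001  # avoid division by zero
    return [min(1.0, max(0.0, (i - lo) / (hi - lo))) for i in range(n)]

dim = 128
# Hypothetical rotation thresholds (illustrative values only).
many_rotations, few_rotations = 32, 1

# Convert the rotation thresholds into index bounds, then ramp over indices.
lo = max(math.floor(index_for_rotations(many_rotations, dim)), 0)
hi = min(math.ceil(index_for_rotations(few_rotations, dim)), dim - 1)
mask = linear_ramp(lo, hi, dim // 2)

# Indices at or below lo rotate many times (mask 0); indices at or above hi
# rotate few times (mask 1); indices in between are blended linearly.
print(lo, hi, mask[lo], mask[min(hi, dim // 2 - 1)])
```

Under these assumptions the rotation counts enter only through the two bounds. Between the bounds the mask is linear in the index, which is not the same as being linear in r, since r falls off exponentially with the index. That is one reading of the reply above: the mask is a blending weight over dimensions, not the r values themselves.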
