I have a question about the `_yarn_linear_ramp_mask` implementation: `linear_func = (torch.arange(dim, dtype=torch.float32) - min) / (max - min)`. This computes the ramp over the dimension index rather than over the number of rotations. However, in the paper's definition of the ramp function, r, alpha, and beta all seem to refer to the number of rotations rather than the dimension: r(d) = L/λ_d (where λ_d is the wavelength of dimension d) is the number of rotations, and it is compared against alpha and beta.
So does the implementation match what the paper states?
Could anyone help me understand this part?
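A small numerical comparison may help frame the question. This is a hypothetical sketch, not the library's code. The helper names (`num_rotations`, `dim_for_rotations`, `paper_gamma`, `index_gamma`, `compare`) are made up for illustration. The sketch assumes the paper's wavelength λ_d = 2π·b^(2d/|D|) with d the rotary frequency-pair index, illustrative defaults |D| = 128, b = 10000, L = 4096, α = 1, β = 32, and that the implementation clamps the ramp to [0, 1] and later uses it as `1 - ramp`, so that 1 means "keep the original frequency". Because r(d) decreases monotonically in d, the thresholds α and β can be converted into fractional dimension bounds, and a ramp over the dimension index then saturates on exactly the same dimensions as the paper's ramp over r. Between the bounds, however, log r is linear in d, so a ramp that is linear in d is linear in log r rather than in r. The two ramps agree at the endpoints, but at the midpoint index r = sqrt(αβ), giving 0.5 for the index ramp versus about 0.15 for the paper's formula under these defaults.

```python
import math

# Hypothetical helpers for illustration; NOT the library's actual functions.


def num_rotations(d: float, dim: int, base: float, context_len: int) -> float:
    """Paper's r(d) = L / lambda_d, with lambda_d = 2*pi*base**(2d/|D|)."""
    wavelength = 2 * math.pi * base ** (2 * d / dim)
    return context_len / wavelength


def dim_for_rotations(r: float, dim: int, base: float, context_len: int) -> float:
    """Inverse of num_rotations: the fractional index d at which r(d) == r."""
    return dim * math.log(context_len / (2 * math.pi * r)) / (2 * math.log(base))


def paper_gamma(r: float, alpha: float, beta: float) -> float:
    """Ramp as written in the paper: piecewise linear in the rotation count r."""
    if r < alpha:
        return 0.0
    if r > beta:
        return 1.0
    return (r - alpha) / (beta - alpha)


def index_gamma(d: float, low: float, high: float) -> float:
    """Ramp linear in the dimension index, like (arange(dim) - min) / (max - min).
    Assumed: clamped to [0, 1] and flipped with 1 - ramp, matching the paper's gamma."""
    ramp = min(max((d - low) / (high - low), 0.0), 1.0)
    return 1.0 - ramp


def compare(dim: int = 128, base: float = 10000.0, context_len: int = 4096,
            alpha: float = 1.0, beta: float = 32.0):
    # r(d) decreases with d, so beta (many rotations) maps to the smaller index.
    # Bounds are kept fractional here; an implementation may round them.
    low = dim_for_rotations(beta, dim, base, context_len)
    high = dim_for_rotations(alpha, dim, base, context_len)
    rows = []
    for d in range(dim // 2):  # one entry per rotary frequency pair
        r = num_rotations(d, dim, base, context_len)
        rows.append((d, r, paper_gamma(r, alpha, beta), index_gamma(d, low, high)))
    return low, high, rows


if __name__ == "__main__":
    low, high, rows = compare()
    print(f"ramp spans dims {low:.2f} .. {high:.2f}")
    for d, r, g_paper, g_index in rows:
        if low - 1 <= d <= high + 1:
            print(f"d={d:2d}  r={r:8.3f}  paper={g_paper:.3f}  index={g_index:.3f}")
```

Running it prints both ramps side by side for the dimensions inside the transition band, where the difference between "linear in r" and "linear in d" shows up.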