sin/cos positional encoding problem #379

Closed
chenxli opened this issue Jun 15, 2021 · 3 comments · Fixed by #387

chenxli commented Jun 15, 2021

The transformer's positional encoding does not seem to match the formula?
position_enc = np.array([[pos / np.power(10000, 2. * i / num_units) for i in range(num_units)] for pos in range(T)])
The formula is PE(pos, 2i) = sin(pos / 10000^{2i/d})
PE(pos, 2i+1) = cos(pos / 10000^{2i/d})
So shouldn't the denominator be np.power(10000, i // 2 * 2. / num_units)?
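
For reference, a minimal sketch of what the proposed change could look like as a complete function. The function name `sinusoidal_position_encoding` is hypothetical and not taken from the repository; `T` and `num_units` follow the snippet above, and `num_units` is assumed to be even.

```python
import numpy as np

def sinusoidal_position_encoding(T, num_units):
    """Build a (T, num_units) table where columns 2k and 2k+1 share one frequency."""
    # i // 2 maps column indices 0,1,2,3,... to pair indices 0,0,1,1,...
    position_enc = np.array([
        [pos / np.power(10000, (i // 2) * 2. / num_units) for i in range(num_units)]
        for pos in range(T)
    ])
    position_enc[:, 0::2] = np.sin(position_enc[:, 0::2])  # dim 2i
    position_enc[:, 1::2] = np.cos(position_enc[:, 1::2])  # dim 2i+1
    return position_enc
```

With this version, the even and odd column of each pair are the sine and cosine of the same angle, which is what the formula describes.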

@zanshuxun
Collaborator

pos / np.power(10000, 2. * i / num_units) corresponds to pos / 10000^{2i/d} in the formula, and the lines below then apply sin and cos to the even and odd columns respectively:

# Second part: apply sin to even columns and cos to odd columns.
position_enc[:, 0::2] = np.sin(position_enc[:, 0::2]) # dim 2i
position_enc[:, 1::2] = np.cos(position_enc[:, 1::2]) # dim 2i+1

This gives the formula's
PE(pos, 2i) = sin(pos / 10000^{2i/d})
PE(pos, 2i+1) = cos(pos / 10000^{2i/d})

So the implementation should be the same as the formula, I think.

@chenxli
Author

chenxli commented Jun 24, 2021

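Thanks for the reply. The statement that pos / np.power(10000, 2. * i / num_units) corresponds to pos / 10000^{2i/d} in the formula is fine, and applying sin/cos to the even/odd columns is fine too. The problem is the line np.array([[pos / np.power(10000, 2. * i / num_units) for i in range(num_units)] for pos in range(T)]), which actually produces
PE(pos, 2i) = sin(pos / 10000^{2*(2i)/d})
PE(pos, 2i+1) = cos(pos / 10000^{2*(2i+1)/d})
because in pos / np.power(10000, 2. * i / num_units) for i in range(num_units), i is the column index itself rather than the pair index i used in the formula.
You can plug in a few values to check.
According to the formula, the exponents should be 0/d, 0/d, 2/d, 2/d, 4/d, 4/d, ...
According to the code, they are 0/d, 2/d, 4/d, 6/d, 8/d, ...
By the formula, before sin/cos is applied the two values in each adjacent pair should be identical, whereas in the code they differ.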
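
The two exponent sequences above can be reproduced with a short sketch. The value `num_units = 8` is an arbitrary small dimension chosen for illustration, and the variable names are hypothetical.

```python
num_units = 8  # hypothetical small model dimension d, for illustration only

# Numerator x of the exponent x/d for each column index i
code_numerators = [2 * i for i in range(num_units)]            # what the code computes
formula_numerators = [(i // 2) * 2 for i in range(num_units)]  # what the formula asks for

print(code_numerators)     # [0, 2, 4, 6, 8, 10, 12, 14]
print(formula_numerators)  # [0, 0, 2, 2, 4, 4, 6, 6]
```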

@zanshuxun
Collaborator

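There is indeed a bug in the code here. Thank you very much for tracking it down so carefully. We will fix it in the next release (v0.8.7). Best wishes!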
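
One way to check for this kind of bug is the identity sin^2 + cos^2 = 1: if both columns of a pair use the same angle, their squares sum to 1 at every position. The sketch below is a hypothetical check under that assumption, not the actual patch in #387; `build` and `pair_norms` are illustrative helpers, and `num_units` is assumed to be even.

```python
import numpy as np

def build(T, num_units, exponent):
    """Build the table using exponent(i) / num_units as the power for column i."""
    enc = np.array([[pos / np.power(10000, exponent(i) / num_units)
                     for i in range(num_units)] for pos in range(T)])
    enc[:, 0::2] = np.sin(enc[:, 0::2])  # dim 2i
    enc[:, 1::2] = np.cos(enc[:, 1::2])  # dim 2i+1
    return enc

def pair_norms(enc):
    """Return sin^2 + cos^2 for every (2k, 2k+1) column pair."""
    return enc[:, 0::2] ** 2 + enc[:, 1::2] ** 2

T, d = 50, 16
buggy = build(T, d, lambda i: 2. * i)         # exponent as in the reported code
fixed = build(T, d, lambda i: (i // 2) * 2.)  # exponent as proposed in this issue

print(np.allclose(pair_norms(fixed), 1.0))  # True
print(np.allclose(pair_norms(buggy), 1.0))  # False
```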
