
Positional Encoding Clarification #12

Closed
PrashantRanjan09 opened this issue May 28, 2018 · 3 comments

PrashantRanjan09 commented May 28, 2018

@srush Thank you so much for this post. However, it would be great if you could help me with the following clarification regarding positional encoding.

The whole intent of using positional encoding is to give the model a sense of position (absolute or relative) and order. How does adding sine waves (for even dimensions) and cosine waves (for odd dimensions) embed this?
Also,

  1. For each position we get d_model (say 512) sinusoidal values, each at a different frequency. So for each position we have 512 sinusoidal components of different frequencies. What does each of these components signify? In other words, what does each dimension, with its own frequency, tell us about that particular position?
  2. You mentioned that for any fixed offset k, PE(pos+k) can be represented as a linear function of PE(pos).
    Are we saying that, because one encoding can be mapped to the other by a fixed linear transformation, the model can keep track of the relative position of any position with respect to any other (see the sketch after this list)?
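
For reference, a minimal NumPy sketch of the encoding described in the post (the function name and constants here are illustrative), together with a numerical check of the linear-offset property asked about in point 2: the matrix M below is built from k alone, yet it maps PE(pos) to PE(pos+k) for every pos.

```python
import numpy as np

def sinusoidal_pe(max_len, d_model):
    """pe[pos, 2i] = sin(pos * w_i), pe[pos, 2i+1] = cos(pos * w_i),
    with w_i = 1 / 10000**(2i / d_model)."""
    pos = np.arange(max_len)[:, None]                                     # (max_len, 1)
    freqs = 1.0 / np.power(10000.0, np.arange(0, d_model, 2) / d_model)   # (d_model/2,)
    angles = pos * freqs                                                  # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

d_model, k = 8, 5
pe = sinusoidal_pe(64, d_model)
freqs = 1.0 / np.power(10000.0, np.arange(0, d_model, 2) / d_model)

# Block-diagonal matrix of 2x2 rotations, one per frequency; it depends only on k.
M = np.zeros((d_model, d_model))
for j, w in enumerate(freqs):
    c, s = np.cos(k * w), np.sin(k * w)
    M[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]  # angle-addition identities

for pos in (3, 17, 40):
    assert np.allclose(M @ pe[pos], pe[pos + k])  # the same M works at every position
```

So each pair of dimensions behaves like a 2-D "clock hand" turning at its own speed (fast hands distinguish nearby positions, slow hands distinguish distant ones), and a fixed relative offset corresponds to a fixed rotation of all the hands.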

liangbright commented Jun 1, 2018

The sine wave is added directly to the word embedding vector. It is like attaching a name tag to someone's face... it feels a bit weird.


guillaume-chevalier commented Dec 26, 2018

This might help you: https://github.com/guillaume-chevalier/Linear-Attention-Recurrent-Neural-Network/blob/master/AnnotatedMultiHeadAttention.ipynb

  • First, the almost-original positional encoding is plotted, without any random offset.
  • Second, the frequencies are changed to more "perfect" or "natural" ones, so that it works like counting in binary, and the encodings are concatenated as extra features instead of being added. I still wonder why the original frequencies were chosen the way they were (I'd love to know), and why the encodings were added rather than concatenated; to me, concatenating makes more sense.
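
For concreteness, a toy sketch of the two options being compared (shapes only; the names and sizes are illustrative and not taken from the notebook):

```python
import numpy as np

def add_positions(tok_emb, pe):
    # Original Transformer: the encoding shares the embedding space, width stays d_model.
    return tok_emb + pe[: tok_emb.shape[0]]

def concat_positions(tok_emb, pe_feats):
    # Notebook-style variant: position gets its own columns, width grows to d_model + d_pe.
    return np.concatenate([tok_emb, pe_feats[: tok_emb.shape[0]]], axis=-1)

tok_emb = np.random.randn(10, 512)                                  # 10 tokens, d_model = 512
added = add_positions(tok_emb, np.random.randn(10, 512))            # shape (10, 512)
concatenated = concat_positions(tok_emb, np.random.randn(10, 64))   # shape (10, 576)
```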

@gussmith

Concatenating increases the dimensionality, and with it the number of parameters; keeping the dimensionality lower is one advantage of adding (a rough count is sketched below).

The addition is similar to the response of cells in the early visual cortex, for example V1. Many cells respond to a visual stimulus, say an edge, yet the response of every cell is additionally modulated by eye position (the angle of gaze) and by vergence (roughly, focus distance).
Thus, depending on where you look, the same visual stimulus will elicit a different response in a given neuron. The overall population of cells therefore encodes not only the visual stimulus across the visual field, but also the eye position (the direction in which the eye is looking).
Here the positional encoding plays a role a bit like the eye position.
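
A back-of-the-envelope count of that parameter cost, assuming square Q/K/V/output projections and, for the concatenation case, a positional block as wide as the embedding (both assumptions are mine):

```python
def attn_proj_params(d_in):
    # Four d_in x d_in projection matrices (Q, K, V, output); biases ignored.
    return 4 * d_in * d_in

d_model, d_pe = 512, 512
print(attn_proj_params(d_model))          # added:        4 * 512**2  = 1,048,576 per layer
print(attn_proj_params(d_model + d_pe))   # concatenated: 4 * 1024**2 = 4,194,304 per layer
```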

srush closed this as completed May 2, 2022