
Positional Encoding Clarification #12

Closed
PrashantRanjan09 opened this issue May 28, 2018 · 3 comments

PrashantRanjan09 commented May 28, 2018

@srush Thank you so much for this post. However, it would be great if you could help me with the following clarification regarding positional encoding.

The whole intent of using positional encoding is to give the model a sense of position (absolute or relative) and order. How does adding sine waves (for even dimensions) and cosine waves (for odd dimensions) embed this?
Also,

  1. For each position we get d_model (say 512) sinusoidal values, each at a different frequency. So for each position we have 512 sinusoidal components of different frequencies. What does each of these components signify? In other words, what does each dimension, with its own frequency, tell us about that particular position?
  2. You mentioned that for any fixed offset k, PE(pos+k) can be represented as a linear function of PE(pos).
    Are we saying that, because one encoding can be mapped to the other by a fixed linear transformation, the model can keep track of the relative position of any position with respect to any other (see the sketch after this list)?
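
For reference, a minimal NumPy sketch of the encoding described in the post (the function name and constants here are illustrative), together with a numerical check of the linear-offset property asked about in point 2: the matrix M below is built from k alone, yet it maps PE(pos) to PE(pos+k) for every pos.

```python
import numpy as np

def sinusoidal_pe(max_len, d_model):
    """pe[pos, 2i] = sin(pos * w_i), pe[pos, 2i+1] = cos(pos * w_i),
    with w_i = 1 / 10000**(2i / d_model)."""
    pos = np.arange(max_len)[:, None]                                     # (max_len, 1)
    freqs = 1.0 / np.power(10000.0, np.arange(0, d_model, 2) / d_model)   # (d_model/2,)
    angles = pos * freqs                                                  # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

d_model, k = 8, 5
pe = sinusoidal_pe(64, d_model)
freqs = 1.0 / np.power(10000.0, np.arange(0, d_model, 2) / d_model)

# Block-diagonal matrix of 2x2 rotations, one per frequency; it depends only on k.
M = np.zeros((d_model, d_model))
for j, w in enumerate(freqs):
    c, s = np.cos(k * w), np.sin(k * w)
    M[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]  # angle-addition identities

for pos in (3, 17, 40):
    assert np.allclose(M @ pe[pos], pe[pos + k])  # the same M works at every position
```

So each pair of dimensions behaves like a 2-D "clock hand" turning at its own speed (fast hands distinguish nearby positions, slow hands distinguish distant ones), and a fixed relative offset corresponds to a fixed rotation of all the hands.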

liangbright commented Jun 1, 2018

The sine wave is added directly to the word embedding vector. It is like attaching a name tag to someone's face... it feels a bit weird.


guillaume-chevalier commented Dec 26, 2018

This might help you: https://github.com/guillaume-chevalier/Linear-Attention-Recurrent-Neural-Network/blob/master/AnnotatedMultiHeadAttention.ipynb

  • First, the almost-original positional encoding is plotted, without any random offset.
  • Second, the frequencies are changed to more "perfect" or "natural" ones, so that it works like counting in binary, and the encodings are concatenated as extra features instead of being added. I still wonder why the original frequencies were chosen the way they were (I'd love to know), and why the encodings were added rather than concatenated; to me, concatenating makes more sense.
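
For concreteness, a toy sketch of the two options being compared (shapes only; the names and sizes are illustrative and not taken from the notebook):

```python
import numpy as np

def add_positions(tok_emb, pe):
    # Original Transformer: the encoding shares the embedding space, width stays d_model.
    return tok_emb + pe[: tok_emb.shape[0]]

def concat_positions(tok_emb, pe_feats):
    # Notebook-style variant: position gets its own columns, width grows to d_model + d_pe.
    return np.concatenate([tok_emb, pe_feats[: tok_emb.shape[0]]], axis=-1)

tok_emb = np.random.randn(10, 512)                                  # 10 tokens, d_model = 512
added = add_positions(tok_emb, np.random.randn(10, 512))            # shape (10, 512)
concatenated = concat_positions(tok_emb, np.random.randn(10, 64))   # shape (10, 576)
```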

@gussmith

Concatenating increases the dimensionality, and with it the number of parameters; keeping the dimensionality lower is one advantage of adding (a rough count is sketched below).

The addition is similar to the response of cells in the early visual cortex, for example V1. Many cells respond to a visual stimulus, say an edge, yet the response of every cell is additionally modulated by eye position (the angle of gaze) and by vergence (roughly, focus distance).
Thus, depending on where you look, the same visual stimulus will elicit a different response in a given neuron. The overall population of cells therefore encodes not only the visual stimulus across the visual field, but also the eye position (the direction in which the eye is looking).
Here the positional encoding plays a role a bit like the eye position.
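
A back-of-the-envelope count of that parameter cost, assuming square Q/K/V/output projections and, for the concatenation case, a positional block as wide as the embedding (both assumptions are mine):

```python
def attn_proj_params(d_in):
    # Four d_in x d_in projection matrices (Q, K, V, output); biases ignored.
    return 4 * d_in * d_in

d_model, d_pe = 512, 512
print(attn_proj_params(d_model))          # added:        4 * 512**2  = 1,048,576 per layer
print(attn_proj_params(d_model + d_pe))   # concatenated: 4 * 1024**2 = 4,194,304 per layer
```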

srush closed this as completed May 2, 2022