
Absolute positional encoding for grids #133

Closed
cswinter opened this issue Jan 3, 2022 · 8 comments

cswinter (Collaborator) commented Jan 3, 2022

Add an option similar to --translate that configures absolute positional encoding for environments with discrete positions with one embedding for each grid point.

cswinter self-assigned this Jan 3, 2022
cswinter removed their assignment Jan 13, 2022
cswinter (Collaborator, Author) commented

The implementation in #135 seems to work, but I'm not actually seeing any benefit over just raw position features on any of our existing environments. It would be good to show that this actually does something useful. It could be that there's still a bug, but otherwise perhaps what we should do is come up with the right toy environment that is designed specifically for the strengths of absolute positional encodings.

cswinter added the "good first issue" label Jan 13, 2022
Bam4d (Contributor) commented Jan 13, 2022

> perhaps what we should do is come up with the right toy environment that is designed specifically for the strengths of absolute positional encodings.

What I'm going to do next week is add some procedural generation + individual entity control to the clusters environments. My intuition is that relative encoding will benefit training speed for these.

Not sure I understand the difference between absolute and relative encoding, though. Is "absolute" the same as "top down", while "relative" translates all the coordinates relative to the actionable entities and calculates each action egocentrically for the entity?

cswinter (Collaborator, Author) commented

Conceptually, absolute positional encoding adds a feature vector to each entity depending on the position it is located at (e.g. x=3, y=7). This allows entities to attend to entities at a particular position, and to access information about the location of that entity. Relative positional encoding instead allows each entity to attend to other entities at a relative position, e.g. one position up and one to the left, and to access information about that relative difference in position. It's basically the difference between having features that give the absolute x/y positions of each entity and translating features to be centered around a given entity.

In terms of implementation, absolute positional encoding is fairly straightforward: you have one positional embedding vector for every position on your grid, then look up the positional embedding corresponding to each entity's position and add it to that entity's embedding: #135
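
The lookup-and-add step above can be sketched in a few lines of NumPy (an illustrative sketch with made-up shapes and names, not the actual #135 implementation):

```python
import numpy as np

# One learned embedding vector per grid cell; each entity looks up the
# embedding for its (x, y) position and adds it to its own embedding.
rng = np.random.default_rng(0)
grid_w, grid_h, d_model = 10, 10, 8

# Shape (grid_w * grid_h, d_model): one row per grid point.
pos_embedding = rng.normal(size=(grid_w * grid_h, d_model))

def add_absolute_pos(entity_embeddings, positions):
    """entity_embeddings: (n, d_model); positions: (n, 2) integer (x, y)."""
    idx = positions[:, 1] * grid_w + positions[:, 0]  # flatten (x, y) -> row index
    return entity_embeddings + pos_embedding[idx]

entities = rng.normal(size=(3, d_model))
positions = np.array([[3, 7], [0, 0], [9, 9]])
out = add_absolute_pos(entities, positions)
print(out.shape)  # (3, 8)
```

In a real model `pos_embedding` would be a trained parameter rather than a fixed random table.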

For relative positional encoding, we have a different relative position for each pairing of two entities. The way we implement it is by adding a relative positional key term to the attention operation that is different for every pair of entities (to allow attending to specific relative positions) and similarly adding a relative positional value term to the output of the attention operation (to insert information about the relative position of the entities we attended to): #139
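
A minimal sketch of how such relative key/value terms could enter a single attention head (illustrative only; shapes, clipping, and names are assumptions, and the real implementation is in #139):

```python
import numpy as np

# Relative positional attention: each ordered pair of entities has a
# relative (dx, dy) offset; a learned key and value vector is looked up
# per offset and folded into the attention scores and outputs.
rng = np.random.default_rng(1)
n, d = 4, 8
max_off = 5  # offsets clipped to [-max_off, max_off] per axis

q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))
positions = rng.integers(0, 5, size=(n, 2))

# Learned tables: one key/value vector per possible (dx, dy) offset.
rel_k = rng.normal(size=(2 * max_off + 1, 2 * max_off + 1, d))
rel_v = rng.normal(size=(2 * max_off + 1, 2 * max_off + 1, d))

# Relative offset for every pair, clipped and shifted to table indices.
off = positions[None, :, :] - positions[:, None, :]   # (n, n, 2)
off = np.clip(off, -max_off, max_off) + max_off
rk = rel_k[off[..., 0], off[..., 1]]                  # (n, n, d)
rv = rel_v[off[..., 0], off[..., 1]]                  # (n, n, d)

# Relative key term added to the attention scores ...
scores = (q @ k.T + np.einsum('id,ijd->ij', q, rk)) / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# ... and a relative value term added to the attention output.
out = weights @ v + np.einsum('ij,ijd->id', weights, rv)
print(out.shape)  # (4, 8)
```

The key term lets an entity attend to "whatever is one up and one left of me"; the value term injects that relative offset into what gets read out.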

I would expect relative positional encoding to work much better in any environment where the actor is not placed at a fixed position. There might still be some environments where absolute learned positional encoding works well and it would be good to have just as a baseline.

Bam4d (Contributor) commented Jan 24, 2022

I'm not sure I understand the difference between having the 'x' and 'y' in the feature vector and separating them out and learning separate features for the positions.

In my understanding you have an embedding function, e.g. Embedding_a = E([x, y, health]), but instead you create a feature vector by doing Embedding_b = E_pos([x, y]) + E([health]). In neural-network terms, isn't this linear combination of two feature vectors the same mathematically (i.e. Embedding_a = Embedding_b)? What am I missing here?

I can totally understand how relative positioning makes a big difference, but I'm not sure what the difference is with absolute.

Bam4d (Contributor) commented Jan 24, 2022

> In my understanding you have an embedding function, e.g. Embedding_a = E([x, y, health]), but instead you create a feature vector by doing Embedding_b = E_pos([x, y]) + E([health]). In neural-network terms, isn't this linear combination of two feature vectors the same mathematically (i.e. Embedding_a = Embedding_b)? What am I missing here?

Is the difference that the E_pos function is universal across all entities rather than separate per-entity?

cswinter (Collaborator, Author) commented

Well, in practice I've not actually seen any benefit from this positional encoding so far, so its usefulness has yet to be established.

The main reason the learned absolute positional encoding could work better is that, while in principle the network can learn the same positional encoding in both cases, it might require more computation/layers when we give it the x and y features directly. Specifically, in the very first layer, Embedding_a = W [x, y] is a linear combination of x and y, whereas Embedding_b = E[x, y] performs a lookup of the positional embedding vector corresponding to (x, y), which allows it to immediately express an arbitrary function of the position.
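
The expressivity gap can be shown with a toy example (not from the codebase, just an illustration): a checkerboard pattern over a grid is trivially expressible by a per-cell lookup table, but no linear (or affine) map of the raw (x, y) features can produce it.

```python
import numpy as np

# Target: a checkerboard value per cell of a 4x4 grid.
xs, ys = np.meshgrid(np.arange(4), np.arange(4), indexing='ij')
target = ((xs + ys) % 2).astype(float).ravel()  # shape (16,)

# Best affine fit w @ [x, y, 1] via least squares: for the checkerboard,
# the x and y columns carry no signal, so the fit collapses to the mean.
coords = np.stack([xs.ravel(), ys.ravel(), np.ones(16)], axis=1)
w, *_ = np.linalg.lstsq(coords, target, rcond=None)
linear_err = np.abs(coords @ w - target).max()

# Lookup table: one entry per cell, trivially exact.
table = target.copy()
lookup_err = np.abs(table - target).max()
print(linear_err > 0.4, lookup_err == 0.0)  # True True
```

A deeper network over raw (x, y) could of course learn the checkerboard too, which is exactly the "more computation/layers" point above.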

Another view on this is that it might allow the network to be much more sensitive to small changes in input. E.g. imagine we have 1000 (1D) positions: if we just pass this as a (normalized) x-feature, the difference between position 1 and position 2 might be very small and difficult for the network to pick up, despite being very important to the agent (the difference between walking into a trap, or still being one tile away).
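
Concretely, with the hypothetical numbers above (1000 positions normalized to [0, 1)), adjacent positions are only 0.001 apart in the raw feature, while their looked-up embeddings can be as distinct as any two learned vectors:

```python
import numpy as np

# 1000 1D positions, 8-dim embeddings (illustrative random stand-ins for
# learned parameters).
n_pos, d = 1000, 8
rng = np.random.default_rng(2)
pos_embedding = rng.normal(size=(n_pos, d))

feat_1, feat_2 = 1 / n_pos, 2 / n_pos
feature_gap = abs(feat_2 - feat_1)  # 0.001 in raw-feature space
embed_gap = np.linalg.norm(pos_embedding[1] - pos_embedding[2])

print(feature_gap)                  # 0.001
print(embed_gap > feature_gap)      # embeddings of neighbors stay distinct
```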

cswinter (Collaborator, Author) commented

> Is the difference that the E_pos function is universal across all entities rather than separate per-entity?

Good point, that's another difference. It's actually unclear whether this works to the benefit of the positional encoding or not, since per-entity positional encodings would be more expressive. We could add an option to make the positional encoding per-entity as well.

Bam4d (Contributor) commented Jan 24, 2022

I'm currently trying to set up big-cluster hyperparameter searches to run these kinds of experiments on some Griddly envs, so I'll try out various combinations.
