
Absolute positional encoding for grids #133

Closed
cswinter opened this issue Jan 3, 2022 · 8 comments

cswinter (Collaborator) commented Jan 3, 2022

Add an option similar to --translate that configures absolute positional encoding for environments with discrete positions with one embedding for each grid point.

cswinter self-assigned this Jan 3, 2022
cswinter removed their assignment Jan 13, 2022
cswinter (Collaborator, Author) commented

The implementation in #135 seems to work, but I'm not actually seeing any benefit over just raw position features on any of our existing environments. It would be good to show that this actually does something useful. It could be that there's still a bug, but otherwise perhaps what we should do is come up with the right toy environment that is designed specifically for the strengths of absolute positional encodings.

cswinter added the "good first issue" label Jan 13, 2022
Bam4d (Contributor) commented Jan 13, 2022

> perhaps what we should do is come up with the right toy environment that is designed specifically for the strengths of absolute positional encodings.

What I'm going to do next week is add some procedural generation + individual entity control to the clusters environments. My intuition is that relative encoding will benefit training speed for these.

Not sure I understand the difference between absolute and relative encoding, though. Is "absolute" the same as "top down", while "relative" translates all the coordinates relative to the actionable entities and calculates each action egocentrically for the entity?

cswinter (Collaborator, Author) commented

Conceptually, absolute positional encoding adds a feature vector to each entity depending on the position it is located at (e.g. x=3, y=7). This allows entities to attend to entities at a particular position, and to access information about the location of that entity. Relative positional encoding instead allows each entity to attend to other entities at a relative position, e.g. one position up and one to the left, and to access information about that relative difference in position. It's basically the difference between having features that give the absolute x/y positions of each entity and translating features to be centered around a given entity.

In terms of implementation, absolute positional encoding is fairly straightforward: you have one positional embedding vector for every position on your grid, then look up the positional embedding corresponding to each entity's position and add it to that entity's embedding: #135
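
The lookup-and-add step above can be sketched in a few lines of NumPy (an illustrative sketch with made-up shapes and names, not the actual #135 implementation):

```python
import numpy as np

# One learned embedding vector per grid cell; each entity looks up the
# embedding for its (x, y) position and adds it to its own embedding.
rng = np.random.default_rng(0)
grid_w, grid_h, d_model = 10, 10, 8

# Shape (grid_w * grid_h, d_model): one row per grid point.
pos_embedding = rng.normal(size=(grid_w * grid_h, d_model))

def add_absolute_pos(entity_embeddings, positions):
    """entity_embeddings: (n, d_model); positions: (n, 2) integer (x, y)."""
    idx = positions[:, 1] * grid_w + positions[:, 0]  # flatten (x, y) -> row index
    return entity_embeddings + pos_embedding[idx]

entities = rng.normal(size=(3, d_model))
positions = np.array([[3, 7], [0, 0], [9, 9]])
out = add_absolute_pos(entities, positions)
print(out.shape)  # (3, 8)
```

In a real model `pos_embedding` would be a trained parameter rather than a fixed random table.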

For relative positional encoding, we have a different relative position for each pairing of two entities. The way we implement it is by adding a relative positional key term to the attention operation that is different for every pair of entities (to allow attending to specific relative positions) and similarly adding a relative positional value term to the output of the attention operation (to insert information about the relative position of the entities we attended to): #139
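
A minimal sketch of how such relative key/value terms could enter a single attention head (illustrative only; shapes, clipping, and names are assumptions, and the real implementation is in #139):

```python
import numpy as np

# Relative positional attention: each ordered pair of entities has a
# relative (dx, dy) offset; a learned key and value vector is looked up
# per offset and folded into the attention scores and outputs.
rng = np.random.default_rng(1)
n, d = 4, 8
max_off = 5  # offsets clipped to [-max_off, max_off] per axis

q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))
positions = rng.integers(0, 5, size=(n, 2))

# Learned tables: one key/value vector per possible (dx, dy) offset.
rel_k = rng.normal(size=(2 * max_off + 1, 2 * max_off + 1, d))
rel_v = rng.normal(size=(2 * max_off + 1, 2 * max_off + 1, d))

# Relative offset for every pair, clipped and shifted to table indices.
off = positions[None, :, :] - positions[:, None, :]   # (n, n, 2)
off = np.clip(off, -max_off, max_off) + max_off
rk = rel_k[off[..., 0], off[..., 1]]                  # (n, n, d)
rv = rel_v[off[..., 0], off[..., 1]]                  # (n, n, d)

# Relative key term added to the attention scores ...
scores = (q @ k.T + np.einsum('id,ijd->ij', q, rk)) / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# ... and a relative value term added to the attention output.
out = weights @ v + np.einsum('ij,ijd->id', weights, rv)
print(out.shape)  # (4, 8)
```

The key term lets an entity attend to "whatever is one up and one left of me"; the value term injects that relative offset into what gets read out.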

I would expect relative positional encoding to work much better in any environment where the actor is not placed at a fixed position. There might still be some environments where absolute learned positional encoding works well and it would be good to have just as a baseline.

Bam4d (Contributor) commented Jan 24, 2022

I'm not sure I understand the difference between having the 'x' and 'y' in the feature vector and separating them out and learning separate features for the positions.

In my understanding you have an embedding function, e.g. Embedding_a = E([x, y, health]), but instead you create a feature vector by doing Embedding_b = E_pos([x, y]) + E([health]). In neural-network terms, isn't this linear combination of two feature vectors the same mathematically (i.e. Embedding_a = Embedding_b)? What am I missing here?

I can totally understand how relative positioning makes a big difference, but I'm not sure what the difference is with absolute.

Bam4d (Contributor) commented Jan 24, 2022

> In my understanding you have an embedding function, e.g. Embedding_a = E([x, y, health]), but instead you create a feature vector by doing Embedding_b = E_pos([x, y]) + E([health]). In neural-network terms, isn't this linear combination of two feature vectors the same mathematically (i.e. Embedding_a = Embedding_b)? What am I missing here?

Is the difference that the E_pos function is universal across all entities rather than separate per-entity?

cswinter (Collaborator, Author) commented

Well, in practice I've not actually seen any benefit from this positional encoding so far, so its usefulness has yet to be established.

The main reason the learned absolute positional encoding could work better is that, while in principle the network can learn the same positional encoding in both cases, it might require more computation/layers when we give it the x and y features directly. Specifically, in the very first layer, Embedding_a = W [x, y] is a linear combination of x and y, whereas Embedding_b = E[x, y] performs a lookup of the positional embedding vector corresponding to (x, y), which allows it to immediately express an arbitrary function of the position.
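
The expressivity gap can be shown with a toy example (not from the codebase, just an illustration): a checkerboard pattern over a grid is trivially expressible by a per-cell lookup table, but no linear (or affine) map of the raw (x, y) features can produce it.

```python
import numpy as np

# Target: a checkerboard value per cell of a 4x4 grid.
xs, ys = np.meshgrid(np.arange(4), np.arange(4), indexing='ij')
target = ((xs + ys) % 2).astype(float).ravel()  # shape (16,)

# Best affine fit w @ [x, y, 1] via least squares: for the checkerboard,
# the x and y columns carry no signal, so the fit collapses to the mean.
coords = np.stack([xs.ravel(), ys.ravel(), np.ones(16)], axis=1)
w, *_ = np.linalg.lstsq(coords, target, rcond=None)
linear_err = np.abs(coords @ w - target).max()

# Lookup table: one entry per cell, trivially exact.
table = target.copy()
lookup_err = np.abs(table - target).max()
print(linear_err > 0.4, lookup_err == 0.0)  # True True
```

A deeper network over raw (x, y) could of course learn the checkerboard too, which is exactly the "more computation/layers" point above.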

Another view on this is that it might allow the network to be much more sensitive to small changes in input. E.g. imagine we have 1000 (1D) positions: if we just pass this as a (normalized) x-feature, the difference between position 1 and position 2 might be very small and difficult for the network to pick up, despite being very important to the agent (the difference between walking into a trap, or still being one tile away).
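
Concretely, with the hypothetical numbers above (1000 positions normalized to [0, 1)), adjacent positions are only 0.001 apart in the raw feature, while their looked-up embeddings can be as distinct as any two learned vectors:

```python
import numpy as np

# 1000 1D positions, 8-dim embeddings (illustrative random stand-ins for
# learned parameters).
n_pos, d = 1000, 8
rng = np.random.default_rng(2)
pos_embedding = rng.normal(size=(n_pos, d))

feat_1, feat_2 = 1 / n_pos, 2 / n_pos
feature_gap = abs(feat_2 - feat_1)  # 0.001 in raw-feature space
embed_gap = np.linalg.norm(pos_embedding[1] - pos_embedding[2])

print(feature_gap)                  # 0.001
print(embed_gap > feature_gap)      # embeddings of neighbors stay distinct
```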

cswinter (Collaborator, Author) commented

> Is the difference that the E_pos function is universal across all entities rather than separate per-entity?

Good point, that's another difference. It's actually unclear whether this works to the benefit of the positional encoding or not, since per-entity positional encodings would be more expressive. We could add an option to make the positional encoding per-entity as well.

Bam4d (Contributor) commented Jan 24, 2022

I'm currently trying to set up big-cluster hyperparameter searches to run these kinds of experiments on some Griddly envs, so I'll try out various combinations.
