Absolute positional encoding for grids #133
Comments
The implementation in #135 seems to work, but I'm not actually seeing any benefit over just raw position features on any of our existing environments. It would be good to show that this actually does something useful. It could be there's still a bug, but otherwise perhaps what we should do is come up with the right toy environment that is designed specifically for the strengths of absolute positional encodings.
What I'm going to do next week is add some procedural generation + individual entity control to the clusters environments. My intuition is that relative encoding will benefit training speed for these. Not sure I'm understanding what the difference between absolute and relative encoding is though. Is "absolute" the same as "top down"? And "relative" translates all the coordinates relative to the actionable entities and calculates each action egocentrically to the entity?
Conceptually, absolute positional encoding adds a feature vector to each entity depending on the position it is located at (e.g. x=3, y=7). This allows entities to attend to entities at a particular position, and to access information about the location of that entity. Relative positional encoding instead allows each entity to attend to other entities at a relative position, e.g. one position up and one to the left, and to access information about that relative difference in position. It's basically the difference between having features that give the absolute x/y positions of each entity versus translating features to be centered around a given entity.

In terms of implementation, absolute positional encoding is fairly straightforward: you have one positional embedding vector for every position on your grid, and then look up and add the positional embedding to the embedding of each entity corresponding to its position: #135

For relative positional encoding, we have a different relative position for each pairing of two entities. The way we implement it is by adding a relative positional key term to the attention operation that is different for every pair of entities (to allow attending to specific relative positions), and similarly adding a relative positional value term to the output of the attention operation (to insert information about the relative position of the entities we attended to): #139

I would expect relative positional encoding to work much better in any environment where the actor is not placed at a fixed position. There might still be some environments where absolute learned positional encoding works well, and it would be good to have it as a baseline.
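To make the two variants concrete, here is a minimal NumPy sketch of both. This is illustrative only, not the actual #135/#139 implementation: the grid size, embedding dimension, random tables standing in for learned embeddings, and the identity q/k/v projections are all assumptions made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
W, H, D = 8, 8, 16  # hypothetical grid width/height and embedding dim

# --- Absolute positional encoding: one vector per grid cell ---
# A random table stands in for a learned embedding table.
abs_table = rng.normal(size=(H * W, D))

def add_absolute(entity_emb, xs, ys):
    """Look up each entity's cell embedding and add it to the entity embedding."""
    return entity_emb + abs_table[np.asarray(ys) * W + np.asarray(xs)]

abs_out = add_absolute(np.zeros((2, D)), xs=np.array([3, 0]), ys=np.array([7, 0]))
assert abs_out.shape == (2, D)

# --- Relative positional encoding: one vector per (dx, dy) offset ---
# Offsets range over [-(W-1), W-1] x [-(H-1), H-1], so tables have odd side lengths.
rel_k = rng.normal(size=(2 * H - 1, 2 * W - 1, D))  # extra key term
rel_v = rng.normal(size=(2 * H - 1, 2 * W - 1, D))  # extra value term

def relative_attention(x, xs, ys):
    """Single-head attention with relative positional key/value terms.

    Identity q/k/v projections are used to keep the sketch short.
    """
    q, k, v = x, x, x
    dx = np.subtract.outer(xs, xs) + W - 1  # (n, n) offset indices, shifted >= 0
    dy = np.subtract.outer(ys, ys) + H - 1
    # score[i, j] = q_i . k_j + q_i . rel_k[dy_ij, dx_ij]
    scores = q @ k.T + np.einsum('id,ijd->ij', q, rel_k[dy, dx])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # output_i = sum_j w_ij * v_j + sum_j w_ij * rel_v[dy_ij, dx_ij]
    return weights @ v + np.einsum('ij,ijd->id', weights, rel_v[dy, dx])

out = relative_attention(rng.normal(size=(3, D)),
                         xs=np.array([0, 3, 5]), ys=np.array([7, 7, 2]))
assert out.shape == (3, D)
```

The key structural difference is visible in the table shapes: the absolute table is indexed by a single entity's position, while the relative tables are indexed by the offset between a *pair* of entities, so they enter inside the attention operation rather than before it.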
I'm not sure I understand the difference between having the 'x' and 'y' in the feature vector versus separating it out and learning separate features for the positions. In my understanding you have an embedding function, e.g.: I can totally understand how relative positioning makes a big difference, but I'm not sure what the difference is with absolute.
Is the difference that the E_pos function is universal across all entities rather than separate per-entity?
Well, in practice I've not actually seen any benefit from this positional encoding so far, so its usefulness has yet to be established. The main reason why the learned absolute positional encoding could work better is that, while in principle the network can learn the same positional encoding in both cases, it might require more computation/layers in the case where we just give it the raw x/y features.

Another view on this is that it might allow the network to be much more sensitive to small changes in input. For example, imagine we have 1000 (1D) positions. If we just pass this as a (normalized) x-feature, the difference between position 1 and position 2 might be very small and difficult for the network to pick up, despite being very important to the agent (the difference between walking into a trap, or still being one tile away).
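The sensitivity argument above can be put in numbers. In this sketch (the 1000-position count comes from the comment; the embedding dimension and random table are assumptions), adjacent positions differ by only ~0.001 in the normalized raw feature, while learned per-position embeddings can be arbitrarily far apart regardless of how many positions exist:

```python
import numpy as np

n_pos, d = 1000, 16  # positions from the example above; embedding dim is assumed

# Raw normalized x-feature: adjacent positions differ by 1/999.
x_feat = np.linspace(0.0, 1.0, n_pos)
raw_gap = x_feat[1] - x_feat[0]
print(round(raw_gap, 6))  # prints 0.001001

# Per-position embeddings (random stand-in for a learned table): the distance
# between adjacent positions does not shrink as n_pos grows.
emb = np.random.default_rng(0).normal(size=(n_pos, d))
emb_gap = np.linalg.norm(emb[1] - emb[0])
assert emb_gap > raw_gap
```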
Good point, that's another difference. It's actually unclear whether this works to the benefit of the positional encoding or not, since per-entity positional encodings would be more expressive. We could add an option to make the positional encoding per-entity as well.
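The expressiveness/parameter trade-off being discussed shows up directly in the table shapes. A tiny sketch, with all sizes hypothetical:

```python
import numpy as np

P, E, D = 64, 10, 16  # grid positions, entity types, embedding dim (all hypothetical)

shared = np.zeros((P, D))         # one table, shared by every entity type
per_entity = np.zeros((E, P, D))  # a separate table per entity type:
                                  # E times the parameters, more expressive

assert per_entity.size == E * shared.size
```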
I'm currently trying to get big hyperparameter searches running on the cluster to do these kinds of experiments on some Griddly envs... so I'll try out various combinations.
Add an option, similar to `--translate`, that configures absolute positional encoding for environments with discrete positions, with one embedding for each grid point.