R-NaD with multiple heads #980

spktrm · 2022-12-21T22:07:16Z

In the paper for R-NaD, the policy is divided into heads that handle different aspects of the game. How does R-NaD work in this case? The code provided shows a single-headed policy output. Would there simply be N policies for N heads, all computed using the same formula as in the code?

lanctot · 2022-12-23T12:42:27Z

@perolat @bartdevylder any ideas?

bartdevylder · 2022-12-23T13:35:06Z

Hi, to generate a full trajectory (i.e. a full match of Stratego), different policy heads are indeed used depending on the game state: the deployment head during the deployment phase, and then alternately the piece-selection head and the target-square head. For the R-NaD algorithm itself, it does not really matter how or where a policy originated, as long as that policy can be changed by backprop (using the neural replicator dynamics update in our case). In practice, this means that during a learner update step on a batch of full trajectories, the weights of the different parts of the network are affected in different ways: the torso parameters are affected by all states of the game, while the weights defining each head are only updated on the game states where that head was active.
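To make the gradient-routing point concrete, here is a minimal sketch in plain Python, assuming a linear shared torso and one linear head per game phase. The head names, the `batch_gradients` helper and the toy parameters are hypothetical and not the OpenSpiel API, and a plain policy-gradient loss stands in for the NeuRD-based loss that R-NaD actually uses. The only thing the sketch is meant to show is which parameters receive gradient: each state routes through its active head, so a head accumulates gradient only from the states where it was used, while the torso accumulates gradient from every state.

```python
import math


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zeros_like(m):
    return [[0.0] * len(row) for row in m]


def batch_gradients(params, batch):
    """Hypothetical helper: accumulate gradients of a stand-in loss.

    params: {"torso": matrix, "heads": {head_name: matrix}}
    batch: list of dicts with keys "head", "obs", "action", "advantage".
    Per-state loss (policy-gradient stand-in, NOT the NeuRD loss):
        -advantage * log pi_head(action | state)
    """
    torso = params["torso"]
    grads = {
        "torso": zeros_like(torso),
        "heads": {name: zeros_like(w) for name, w in params["heads"].items()},
    }
    for state in batch:
        obs = state["obs"]
        head_w = params["heads"][state["head"]]  # only the active head is used
        feat = matvec(torso, obs)                 # shared torso features
        probs = softmax(matvec(head_w, feat))
        # d(loss)/d(logits) for -adv * log softmax(logits)[action]
        d_logits = [
            -state["advantage"] * ((1.0 if a == state["action"] else 0.0) - p)
            for a, p in enumerate(probs)
        ]
        # Gradient flows into the active head's weights only.
        g_head = grads["heads"][state["head"]]
        for i, dz in enumerate(d_logits):
            for j, f in enumerate(feat):
                g_head[i][j] += dz * f
        # Backpropagate through the active head into the shared torso.
        d_feat = [
            sum(head_w[i][j] * d_logits[i] for i in range(len(d_logits)))
            for j in range(len(feat))
        ]
        for j, df in enumerate(d_feat):
            for k, x in enumerate(obs):
                grads["torso"][j][k] += df * x
    return grads


# Toy parameters: three hypothetical heads with different action counts.
params = {
    "torso": [[0.5, -0.2], [0.1, 0.3]],
    "heads": {
        "deployment": [[0.2, 0.1], [-0.3, 0.4], [0.0, 0.1]],
        "piece_selection": [[0.1, -0.1], [0.2, 0.2]],
        "target_square": [[0.3, 0.0], [-0.1, 0.2], [0.1, -0.2], [0.0, 0.5]],
    },
}
# A batch in which the target-square head is never active.
batch = [
    {"head": "deployment", "obs": [1.0, 0.0], "action": 0, "advantage": 1.0},
    {"head": "piece_selection", "obs": [0.5, 1.0], "action": 1, "advantage": -0.5},
]
grads = batch_gradients(params, batch)
print(grads["heads"]["target_square"])  # all zeros: head was never active
```

In a real implementation an autodiff framework would compute these gradients, but the effect is the same: selecting the head per state is enough to route each state's update to the right head, so there is no need to run a separate learner per head.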

spktrm · 2022-12-25T02:38:30Z

Thank you for the response!

lanctot closed this as completed Apr 20, 2023

lanctot mentioned this issue Apr 20, 2023

[RNaD] Multiple Policy Heads #1053 (closed)
