R-NaD with multiple heads #980
In the R-NaD paper, the policy is divided into heads that handle different aspects of the game. How does R-NaD work in this case? The code provided shows a single-headed policy output. Would there simply be N policies for N heads, all of which are calculated using the same formula in the code?

@perolat @bartdevylder any ideas?

Hi, to generate a full trajectory (i.e. a full match of Stratego), different policy heads are indeed used depending on the game state: the deployment head during the deployment phase, and then the piece-selection head and the target-square head, alternating. For the R-NaD algorithm itself it does not really matter how or where a policy originated, as long as the policy can be changed by backprop (using the neural replicator dynamics update in our case). In practice this means that during a learner update step on a batch of full trajectories, the weights of the different parts of the network are affected in different ways: the torso parameters are affected by all states of the game, while the weights defining a given head are only updated according to game states where that head was active.

Thank you for the response!
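The per-head update described above can be illustrated with a minimal NumPy sketch. This is not the OpenSpiel implementation: the head names, shapes, and the uniform "advantage" placeholder (standing in for the NeuRD update term) are all hypothetical. The point is only that when each state activates exactly one head, the gradient for a head accumulates solely from the states where that head was active, and a head never activated by the batch receives a zero gradient.

```python
import numpy as np

# Hypothetical setup: three policy heads (e.g. deployment, piece-selection,
# target-square) on top of a shared torso. Each state in the batch activates
# exactly one head.
rng = np.random.default_rng(0)
torso_dim, n_heads, n_actions = 4, 3, 5
W_heads = rng.normal(size=(n_heads, torso_dim, n_actions))  # one weight matrix per head

# A batch of torso outputs and, per state, the index of the active head.
# Head 2 is deliberately never active in this batch.
batch = rng.normal(size=(8, torso_dim))
active_head = np.array([0, 0, 1, 1, 1, 1, 0, 1])

# Per-head gradient accumulators: only states where a head was active contribute.
grads = np.zeros_like(W_heads)
for h in range(n_heads):
    states = batch[active_head == h]          # states handled by head h
    logits = states @ W_heads[h]
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Placeholder signal standing in for the NeuRD update term.
    advantage = np.ones((states.shape[0], n_actions))
    grads[h] = states.T @ (probs * advantage)  # gradient touches head h only

# Heads with no active states in the batch receive a zero gradient.
print([bool(np.abs(grads[h]).sum() > 0) for h in range(n_heads)])
```

The torso, by contrast, would receive gradient contributions from every state in the batch, since its output feeds whichever head is active.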