R-NaD with multiple heads #980

Closed
spktrm opened this issue Dec 21, 2022 · 3 comments
Comments

@spktrm
Contributor

spktrm commented Dec 21, 2022

In the R-NaD paper, the policy is divided into heads that handle different aspects of the game. How does R-NaD work in this case? The code provided shows a single-headed policy output. Would there simply be N policies for N heads, all of which are calculated using the same formula as in the code?

@lanctot
Collaborator

lanctot commented Dec 23, 2022

@perolat @bartdevylder any ideas?

@bartdevylder
Collaborator

Hi, to generate a full trajectory (i.e. a full match of Stratego), different policy heads are indeed used depending on the game state: the deployment head during the deployment phase, and then the piece-selection head and the target-square head in alternation. For the R-NaD algorithm itself, it does not really matter how or where a policy originated, as long as such policies can be changed by backprop (using the neural replicator dynamics update in our case). In practice this means that during a learner update step on a batch of full trajectories, the weights of the different parts of the network are affected in different ways: the torso parameters are affected by all states of the game, while the weights defining each head are only updated according to game states where that head was active.
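
To make the last point concrete, here is a minimal sketch in plain Python. It is not the OpenSpiel R-NaD code: the tiny network, the parameter values, the trajectory, and the loss (a plain policy-gradient surrogate standing in for the NeuRD update) are all assumptions chosen for illustration, and finite differences stand in for backprop. It only shows that a head which never acted in the batch receives a zero gradient, while the shared torso receives gradient from every step.

```python
import math

# Hypothetical parameters: one shared torso and one weight vector per head.
# The head names follow the three heads described above; the values are made up.
params = {
    "torso": [0.5, -0.3],
    "deployment": [0.1, -0.2, 0.3],
    "piece_selection": [0.2, 0.0, -0.1],
    "target_square": [-0.3, 0.1, 0.2],
}


def policy(params, obs, head):
    """Shared torso feature followed by the softmax of the selected head."""
    h = math.tanh(sum(w * x for w, x in zip(params["torso"], obs)))
    logits = [w * h for w in params[head]]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss(params, trajectory):
    """Stand-in surrogate loss (NOT NeuRD): -advantage * log pi(action).

    Each step only evaluates the head that was active in that state.
    """
    total = 0.0
    for obs, head, action, advantage in trajectory:
        total += -advantage * math.log(policy(params, obs, head)[action])
    return total


def numerical_grad(params, trajectory, eps=1e-5):
    """Central finite differences, used here in place of backprop."""
    grads = {}
    for name, weights in params.items():
        grads[name] = []
        for i in range(len(weights)):
            plus = {k: list(v) for k, v in params.items()}
            minus = {k: list(v) for k, v in params.items()}
            plus[name][i] += eps
            minus[name][i] -= eps
            diff = loss(plus, trajectory) - loss(minus, trajectory)
            grads[name].append(diff / (2 * eps))
    return grads


# Hypothetical batch: (observation, active head, action taken, advantage).
# The target-square head is never active in this batch.
trajectory = [
    ([1.0, 0.5], "deployment", 0, 1.0),
    ([0.2, -1.0], "piece_selection", 2, -0.5),
]

grads = numerical_grad(params, trajectory)
for name, g in grads.items():
    print(name, g)
```

In this sketch the `target_square` gradient is exactly zero because no step in the batch evaluated that head, whereas `torso` collects contributions from both steps. With an autodiff framework the same effect falls out automatically as long as each step's loss only depends on the head that produced its action.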

@spktrm
Contributor Author

spktrm commented Dec 25, 2022

Thank you for the response!
