Feature/Population Based Training #285
Conversation
This pull request introduces 6 alerts when merging c234d31 into 7e6fd83 - view on LGTM.com.
This pull request introduces 6 alerts when merging 3a59f3c into 7e6fd83 - view on LGTM.com.
This pull request introduces 6 alerts when merging 24d6707 into 7e6fd83 - view on LGTM.com.
This pull request introduces 6 alerts when merging b25af4a into 7e6fd83 - view on LGTM.com.
This pull request introduces 6 alerts when merging 3fdfcba into 7e6fd83 - view on LGTM.com.
What?
Add the first example of population-based training (PBT) in Mava. This example uses the recurrent MAD4PG algorithm to train a population of 5 networks, using 5 trainers and 5 executors, on the debugging environment. The hyperparameters being tuned are the discount factor, the target update rate, and the target update period. This PR will remain a draft for now, as it still needs to be tested on a more complex environment over longer training runs.
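To make the setup concrete, here is a minimal illustrative sketch (plain Python, not Mava's actual API) of how a population of 5 members could each be initialised with its own sampled values for the three hyperparameters named above; the ranges and class names are assumptions for illustration only:

```python
import random
from dataclasses import dataclass


@dataclass
class MemberConfig:
    """Hypothetical per-member hyperparameters tuned by PBT."""
    discount: float            # discount factor
    target_update_rate: float  # soft target update rate
    target_update_period: int  # steps between target updates


def sample_config(rng: random.Random) -> MemberConfig:
    # Sample each hyperparameter from an assumed, illustrative range.
    return MemberConfig(
        discount=rng.uniform(0.95, 0.999),
        target_update_rate=rng.uniform(0.001, 0.05),
        target_update_period=rng.choice([50, 100, 200]),
    )


rng = random.Random(0)
# One config per member: 5 networks, 5 trainers, 5 executors.
population = [sample_config(rng) for _ in range(5)]
```

Each trainer/executor pair would then train its network under its own member's configuration.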
Why?
Population-based training allows the joint optimisation of hyperparameters and network parameters within a single training run.
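The core of that joint optimisation is an exploit/explore step: a poorly performing member copies the state of a well-performing one and then perturbs the copied hyperparameters. A minimal sketch of one such step, with `perturb`, the dict layout, and the scoring all illustrative assumptions rather than Mava's implementation:

```python
import copy
import random


def pbt_step(population, scores, rng, perturb=1.2):
    """One exploit/explore step: the worst-scoring member copies the
    best member's state, then multiplies each hyperparameter by
    `perturb` or `1 / perturb` at random."""
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    worst, best = ranked[0], ranked[-1]
    member = copy.deepcopy(population[best])  # exploit: inherit weights + hparams
    member["hparams"] = {
        k: v * (perturb if rng.random() < 0.5 else 1 / perturb)
        for k, v in member["hparams"].items()  # explore: perturb hparams
    }
    population[worst] = member
    return population


rng = random.Random(0)
population = [{"hparams": {"discount": 0.9 + 0.02 * i}} for i in range(5)]
scores = [1.0, 5.0, 3.0, 2.0, 4.0]  # illustrative episode returns per member
population = pbt_step(population, scores, rng)
```

Because network parameters are copied along with hyperparameters, no training progress is discarded when a member switches to a better configuration.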
How?
Various hooks have been added inside the MADDPG system. A PBT wrapper has also been added; it can wrap a MADDPG or MAD4PG system and override the appropriate hooks to add PBT to the system.
Extra