This issue is meant to act as an interactive roadmap for Mava's stable version release
We are happy to announce the Beta release of Mava! 🥳 Although we believe the Beta version already offers a great deal for building MARL systems, there is still much work to be done.
Below is a list of what we want to implement before releasing a stable, benchmarked and tested, first official version of Mava. Please note that this checklist refers primarily to core features and components underlying the working philosophy of the framework. For extensions and/or feature requests, such as additional system implementations, please refer to our wish list -> #247.
Checklist for first release
Create Nightly Release
Make a quick start notebook for Mava
Reach >90% code coverage for testing
Create general abstract system and builder classes.
Benchmark all systems on popular environments, including PettingZoo, SMAC (including system checkpoints and downloadable plotting data).
Benchmark system scaling.
Add system integration testing.
Improve the structure of the examples and fix issues, possibly by creating helper functions.
Single process versions (confirm if system.py needed).
Centralised variable source. Counters and all fixed-length variables can be stored here for distributed access.
Provide the option of using multiple trainers. This enables faster training for non-weight-sharing agents and allows for hyperparameter tuning.
Enhance weight sharing. Allow custom specification of which agents share weights and which do not. Each agent has a net_key instead of an agent_type; shared_weights then simply means that all agent net_keys are identical.
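To illustrate the proposed net_key scheme, here is a minimal sketch (function and variable names here are hypothetical, not Mava's actual API): each agent maps to a network key, and full weight sharing becomes the special case where every agent gets the same key.

```python
# Hypothetical sketch of the proposed net_key scheme (not Mava's actual API).
def assign_net_keys(agents, groups=None):
    """Map each agent to a network key.

    If `groups` is None, every agent gets its own network (no sharing).
    Otherwise `groups` maps a net_key to the list of agents sharing it.
    """
    if groups is None:
        return {agent: agent for agent in agents}
    net_keys = {}
    for key, members in groups.items():
        for agent in members:
            net_keys[agent] = key
    return net_keys


agents = ["agent_0", "agent_1", "agent_2"]
# Full sharing: one key for everyone (the old shared_weights=True behaviour).
shared = assign_net_keys(agents, {"net_0": agents})
# Partial sharing: agents 0 and 1 share a network, agent 2 has its own.
partial = assign_net_keys(
    agents, {"net_a": ["agent_0", "agent_1"], "net_b": ["agent_2"]}
)
```

Under this scheme, shared_weights is no longer a separate flag but simply a check that all net_keys coincide.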
Add logging info for executor/trainer waiting times.
Update the .keys() and .values() code to fix bugs.
Batch the executor policies that use the same networks for increased performance.
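A rough sketch of the batching idea, assuming a hypothetical `policy_fn` interface (this is not Mava's executor code): observations from agents sharing the same network are stacked so that one forward pass serves all of them.

```python
import numpy as np


# Sketch of batching observations for agents that share a network, so one
# forward pass serves all of them (illustrative, not Mava's executor code).
def batched_actions(policy_fn, observations, net_keys):
    actions = {}
    by_net = {}
    # Group agents by the network they use.
    for agent, obs in observations.items():
        by_net.setdefault(net_keys[agent], []).append((agent, obs))
    # One forward pass per network instead of one per agent.
    for net_key, items in by_net.items():
        batch = np.stack([obs for _, obs in items])
        outputs = policy_fn(net_key, batch)
        for (agent, _), out in zip(items, outputs):
            actions[agent] = out
    return actions


# Stand-in policy for demonstration: sums each observation.
policy_fn = lambda net_key, batch: batch.sum(axis=1)
obs = {"agent_0": np.array([1.0, 2.0]), "agent_1": np.array([3.0, 4.0])}
keys = {"agent_0": "net_0", "agent_1": "net_0"}
acts = batched_actions(policy_fn, obs, keys)
```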
Implement hybrid action space for MA-DDPG, MA-D4PG and MAPPO.
Add recurrence to QMIX and VDN by inheriting from MADRQN (enhancement).
Implement a partially observable debugging environment (flicker spread).
Handle legal actions in a generic way
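One generic approach, sketched here with illustrative names (not an existing Mava helper), is to push the logits of illegal actions to a large negative value before sampling or argmax:

```python
import numpy as np


# Generic legal-action masking sketch (illustrative, not Mava's actual helper).
def mask_illegal_logits(logits, legal_mask):
    """Set logits of illegal actions to a large negative value so they are
    never chosen by argmax and receive ~0 probability under softmax."""
    return np.where(legal_mask, logits, -1e9)


logits = np.array([2.0, 5.0, 1.0])
legal = np.array([True, False, True])
masked = mask_illegal_logits(logits, legal)
action = int(np.argmax(masked))  # action 1 is illegal, so action 0 is chosen
```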
_num_steps in MADDPG should be incremented outside the for loop in _update_target_network().
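The point of the fix, shown as a simplified sketch (class and method names here are illustrative, not the actual MADDPG code): incrementing inside the per-agent loop advances the counter once per agent per call, skewing the target-update period.

```python
# Illustrative sketch of the _num_steps fix (simplified, not actual Mava code).
class TargetUpdater:
    def __init__(self, period):
        self._num_steps = 0
        self._period = period

    def update(self, agents):
        # A buggy variant increments _num_steps inside this loop, advancing
        # the counter once per agent instead of once per trainer step.
        for agent in agents:
            pass  # copy online -> target variables when the period elapses
        self._num_steps += 1  # incremented once, outside the for loop


updater = TargetUpdater(period=100)
for _ in range(3):
    updater.update(["agent_0", "agent_1"])
```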
Fix typing for Robocup environment.
Investigate MADDPG/MAD4PG training at end of episode
Fix mypy import issue
Add/cleanup doc strings and comments
Add option to update executors every n environment steps instead of every step.
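A minimal sketch of the periodic-update option, with hypothetical names (not Mava's actual executor API): the executor counts environment steps and only fetches fresh weights every `update_period` steps.

```python
# Sketch of updating executor variables every n environment steps
# (names are illustrative; not Mava's actual executor API).
class PeriodicUpdater:
    def __init__(self, update_period):
        self._update_period = update_period
        self._steps = 0
        self.num_updates = 0

    def observe(self):
        self._steps += 1
        if self._steps % self._update_period == 0:
            # Here the executor would fetch fresh weights from the
            # variable source; we just count the updates.
            self.num_updates += 1


updater = PeriodicUpdater(update_period=4)
for _ in range(10):
    updater.observe()
```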
Support latest Acme and Reverb versions.
Create a create_system function for all the systems as is done in MADDPG. This simplifies the system code.
Add code duplication checker in Mava to check for major code duplication.
DeepMind Melting Pot environment suite integration.
Multi-agent MuJoCo integration.
dm-control soccer integration.
Fix the dockerfile. It seems to download unnecessary packages.
Fix the MA-DDPG and MA-D4PG evaluator: it still uses exploration.
Fix resource allocation in examples. Create a Mava function and change counters and replay to run on CPU. Assigned to @KaleabTessera .
Investigate inconsistencies in MAPPO between feedforward execution and sequence adders.
Possibly allow different types of algorithms in the same Mava system.
Implement a sequence adder wrapper to allow for information at end of episodes to be passed to sequences. This might allow us to implement some of QMIX's stability improvements.
Fix centralised and state-based MADD(4)PG in the shared-weights case. With shared weights, the critic cannot tell which agent's value function it should output. This is a problem in every setting except the fully cooperative one with a shared global reward. It can probably be fixed by simply appending a one-hot vector indicating which agent's value function should be computed.
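The one-hot idea can be sketched as follows (illustrative only, not the actual critic code): the agent's index is appended to the shared critic's input so it knows whose value function to compute.

```python
import numpy as np


# Sketch of disambiguating a shared critic with a one-hot agent ID
# (illustrative only; not Mava's actual critic code).
def critic_input(global_state, agent_index, num_agents):
    one_hot = np.zeros(num_agents)
    one_hot[agent_index] = 1.0
    return np.concatenate([global_state, one_hot])


state = np.array([0.1, 0.2, 0.3])
x0 = critic_input(state, 0, 3)  # the shared critic now knows it scores agent 0
```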
Add option to have recurrent critic networks as well.
Change loggers to only log after each episode (not after every step). This reduces unnecessary computation, since the loggers only write every 10 seconds by default.
Add a waiting function for the evaluator. It may not need to run continuously and could instead run only every few executor episodes.
Fix PPO. Why is the entropy weighting zero?
Mask losses with the discount values to zero out losses computed on zero-padded data.
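A minimal sketch of the masking, assuming padded steps carry a discount of exactly zero (illustrative, not Mava's actual loss code):

```python
import numpy as np


# Sketch of masking per-step losses on zero-padded sequences, using the
# discount values as the mask (illustrative, not Mava's actual loss code).
def masked_mean_loss(per_step_loss, discounts):
    mask = (discounts != 0.0).astype(per_step_loss.dtype)
    # Average only over the valid (non-padded) steps.
    return (per_step_loss * mask).sum() / np.maximum(mask.sum(), 1.0)


loss = np.array([1.0, 2.0, 3.0, 4.0])
disc = np.array([0.99, 0.99, 0.0, 0.0])  # last two steps are padding
avg = masked_mean_loss(loss, disc)
```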