Roadmap for Mava's next release #246

Closed
20 of 43 tasks
arnupretorius opened this issue Jun 23, 2021 · 0 comments

Comments

@arnupretorius
Collaborator

arnupretorius commented Jun 23, 2021

This issue is meant to act as an interactive roadmap for Mava's stable version release.

We are happy to announce the Beta release of Mava! 🥳 Although we believe the Beta version offers much in the way of building MARL systems, there is still much work to be done.

Below is a list of what we want to implement before releasing a stable, benchmarked and tested first official version of Mava. Please note that this checklist refers primarily to core features and components underlying the working philosophy behind the framework. For extensions and feature requests, such as additional system implementations, please refer to our wish list -> #247.

Checklist for first release

  • Create Nightly Release
  • Make a quick-start notebook for Mava
  • Reach >90% code coverage for testing
  • Create general abstract system and builder classes.
  • Benchmark all systems on popular environments, including PettingZoo and SMAC (with system checkpoints and downloadable plotting data).
  • Benchmark system scaling.
  • Add system integration testing.
  • Better structuring/fixes for examples, perhaps create helper functions
  • Single process versions (confirm if system.py needed).
  • Centralised variable source. Counters and all fixed-length variables can be stored here for distributed access.
  • Provide the option for using multiple trainers. This enables faster training for non-weight sharing agents and allows for hyperparameter tuning.
  • Enhancing shared weights. Allow for custom specification of which agents share weights and which do not. Each agent has a net_key instead of agent_type. shared_weights now just means all the agent net_keys are the same.
  • Add executor trainer waiting logging info
  • Update .keys() and .values() code to fix bugs
  • Batch the executor policies that use the same networks for increased performance.
  • Implement hybrid action space for MA-DDPG, MA-D4PG and MAPPO.
  • Recurrency for QMIX and VDN - Inherit from MADRQN.
  • Implement a partially observable debugging environment (flicker spread).
  • Handle legal actions in a generic way
  • _num_steps in MADDPG should be incremented outside the for loop in _update_target_network().
  • Fix typing for Robocup environment.
  • Investigate MADDPG/MAD4PG training at end of episode
  • Fix mypy import issue
  • Add/cleanup doc strings and comments
  • Add option to update executors every n environment steps instead of every step.
  • Support latest Acme and Reverb versions.
  • Create a create_system function for all the systems as is done in MADDPG. This simplifies the system code.
  • Add code duplication checker in Mava to check for major code duplication.
  • DeepMind Melting Pot environment suite integration.
  • Multi-agent MuJoCo integration.
  • dm-control soccer integration.
  • Fix the dockerfile. It seems to download unnecessary packages.
  • Fix MA-DDPG and MA-D4PG evaluator. Evaluator still has exploration in it.
  • Fix resource allocation in examples. Create a Mava function and change counters and replay to run on CPU. Assigned to @KaleabTessera.
  • Look at inconsistencies in MAPPO - between feedforward execution and sequence adders.
  • Different types of algorithms in the same Mava system maybe?
  • Implement a sequence adder wrapper to allow for information at end of episodes to be passed to sequences. This might allow us to implement some of QMIX's stability improvements.
  • Fix centralised and state-based MADD(4)PG when weights are shared. With shared_weights, the critic does not know which agent's value function it should output. This is a problem in every setting except the fully cooperative one with a shared global reward. It can probably be fixed by adding a one-hot vector indicating which agent's value function should be calculated.
  • Add option to have recurrent critic networks as well.
  • Change loggers to only log after each episode (not after every step). This reduces unnecessary computation as the loggers only write after 10 seconds by default.
  • Add some waiting function for the evaluator. Maybe it does not have to run continuously and can instead run only once every few executor episodes.
  • Fix PPO. Why is the entropy weighting zero?
  • Mask losses with the discount values to zero out losses computed on zero-padded data.
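
Two of the checklist items (masking losses on zero-padded data, and adding a one-hot agent indicator to a shared critic's input) can be illustrated in isolation. The following is a minimal pure-Python sketch and not Mava code: the function names `masked_mean_loss` and `append_agent_one_hot` are hypothetical, and it assumes that padded steps carry a discount of exactly 0.0 so that the discount can double as a mask.

```python
from typing import List, Sequence


def masked_mean_loss(
    per_step_loss: Sequence[float], discounts: Sequence[float]
) -> float:
    """Average a per-step loss over non-padded steps only.

    Assumption: zero-padded steps have discount 0.0. Note that a genuine
    terminal step may also have discount 0.0 and would be masked out too
    under this convention.
    """
    if len(per_step_loss) != len(discounts):
        raise ValueError("per_step_loss and discounts must have equal length")
    # 1.0 for real steps, 0.0 for padded steps.
    mask = [1.0 if d != 0.0 else 0.0 for d in discounts]
    num_valid = sum(mask)
    if num_valid == 0:
        # Fully padded sequence: nothing to learn from.
        return 0.0
    return sum(loss * m for loss, m in zip(per_step_loss, mask)) / num_valid


def append_agent_one_hot(
    critic_input: Sequence[float], agent_index: int, num_agents: int
) -> List[float]:
    """Append a one-hot agent indicator to a shared critic's input.

    This tells a weight-shared critic which agent's value function is
    being requested.
    """
    if not 0 <= agent_index < num_agents:
        raise ValueError("agent_index must be in [0, num_agents)")
    one_hot = [0.0] * num_agents
    one_hot[agent_index] = 1.0
    return list(critic_input) + one_hot


if __name__ == "__main__":
    # Last two steps are padding (discount 0.0), so only the first two count.
    print(masked_mean_loss([1.0, 3.0, 5.0, 7.0], [0.99, 0.99, 0.0, 0.0]))
    # Critic input for agent 1 of 3.
    print(append_agent_one_hot([0.5, 0.2], agent_index=1, num_agents=3))
```

Dividing by the number of valid steps rather than the sequence length keeps the loss scale independent of how much padding a sequence contains. Whether that is the right normalisation for each system is an open design choice.
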
@arnupretorius arnupretorius added the enhancement New feature or request label Jun 23, 2021
@arnupretorius arnupretorius self-assigned this Jun 23, 2021
@arnupretorius arnupretorius pinned this issue Jun 23, 2021
@arnupretorius arnupretorius changed the title from "Roadmap to Mava release V0.1.0" to "Mava Roadmap" Jul 6, 2021
@arnupretorius arnupretorius changed the title from "Mava Roadmap" to "Roadmap for Mava's next release" Jul 6, 2021
@arnupretorius arnupretorius unpinned this issue Sep 27, 2021
3 participants