Feature/Multiple trainers for MA-DDPG #253

Merged: 164 commits merged from feature/mava-scaling into develop on Oct 25, 2021

Conversation

@DriesSmit (Contributor) commented on Jun 29, 2021

What?

Implements a scaled-up version of MADDPG in which multiple trainers can now be used alongside multiple executors. A centralised variable server is also implemented that absorbs the responsibilities of the counter node, trainer checkpointing and the trainer variable source. The trainers and executors now read from and write to the centralised variable source directly. A multi-trainer example is included in which 3 trainers and 2 executors are used to train 3 non-weight-sharing agents on the debugging environment.
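
A minimal sketch of the read/write pattern described above, for illustration only: the names here (`VariableServer`, `get_variables`, `set_variables`) are hypothetical placeholders, and the actual implementation lives in the system code and in mava/systems/tf/variable_utils.py.

```python
# Hypothetical sketch of the centralised variable-source pattern described
# above. None of these names are Mava's real classes or methods; they only
# illustrate how trainers write to, and executors read from, one central node.

class VariableServer:
    """Single node holding network variables (and, in Mava, also the counter
    and checkpointing responsibilities)."""

    def __init__(self, variables):
        self._variables = dict(variables)  # e.g. {"policy_agent_0": [...], ...}

    def get_variables(self, names):
        # Executors (and trainers, for networks they do not own) read from here.
        return {name: self._variables[name] for name in names}

    def set_variables(self, updates):
        # Each trainer writes back only the variables it is responsible for.
        self._variables.update(updates)


def trainer_step(trainer, server):
    # Trainer side: after a learner step, push the updated variables.
    server.set_variables(trainer.step())


def executor_update(executor, server, names):
    # Executor side: periodically pull the latest weights before acting.
    executor.update(server.get_variables(names))
```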

Why?

Multiple trainers allow the trainer's work to be parallelised, just as is already done with executors. This also opens the door to hyperparameter tuning directly within Mava in future updates.

How?

Added a new Scaled MA-DDPG system that allows for the use of multiple trainers.
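
As a rough, self-contained illustration of the multi-trainer setup (e.g. the 3-trainer, 3-agent example above), the snippet below shows one way agent networks could be partitioned across trainers. The round-robin helper is hypothetical and is not the system's actual assignment logic.

```python
# Hypothetical round-robin partitioning of agent networks across trainers.
# This is illustrative only and not Mava's actual trainer-assignment code.

def assign_agents_to_trainers(agent_ids, num_trainers):
    """Assign each agent's networks to a trainer index, round-robin."""
    assignment = {trainer_id: [] for trainer_id in range(num_trainers)}
    for i, agent_id in enumerate(agent_ids):
        assignment[i % num_trainers].append(agent_id)
    return assignment

# With 3 trainers and 3 non-weight-sharing agents, each trainer owns one agent:
print(assign_agents_to_trainers(["agent_0", "agent_1", "agent_2"], num_trainers=3))
# {0: ['agent_0'], 1: ['agent_1'], 2: ['agent_2']}
```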

Extra

This PR uses changes proposed in updated-network-keys, so that PR should be merged first. Once it is merged, this PR can be moved out of draft status.

@KaleabTessera (Contributor) left a comment:

Thanks so much for this massive effort @DriesSmit 🙌 🏆 This is close to ready from my standpoint, I just had some comments and then we can do the benchmarks 🚀 🔥

mava/systems/tf/mad4pg/system.py (resolved)
mava/systems/tf/mad4pg/system.py (resolved)
mava/utils/adder_utils.py (outdated; resolved)
mava/utils/environments/render_utils.py (resolved)
mava/utils/sort_utils.py (resolved)
)

# Flush the writer.
self._writer.flush(self._max_in_flight_items)
Contributor:

So for certain adders, we previously flushed after every write, e.g. the sequence adder (self._writer.flush(self._max_in_flight_items)). Now we flush the writer once at the end (like we did for the transition adder). Not sure if this has any side effects?

Contributor Author:

Great observation! I had not considered this. Flush seems to wait for all the data to be written (please see here), so maybe it's fine to do this only at the end of a trajectory step. Not sure; I am happy either way. I just don't want to make the writing even slower if we flush too often. But if we do need to flush after each write, then we should change it.

Contributor:

Sorry, I phrased my previous message incorrectly. The flush is after every create_item, not every write.

Let's do a flush after every create_item. Acme did that recently across all writers (google-deepmind/acme@94711b1). Please check that we do it for the single-trainer case as well.

Contributor Author:

Done.

Contributor Author:

I changed this back to how it was previously. I think flushing after every create_item broke my training.
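
To make the two options in this thread concrete, here is a small sketch of both flush placements, assuming reverb's TrajectoryWriter API (`create_item` and `flush(block_until_num_items=...)`). The item-building logic is deliberately simplified and is not Mava's actual adder code.

```python
import reverb


def write_episode(writer: reverb.TrajectoryWriter,
                  steps,
                  table: str,
                  max_in_flight_items: int,
                  flush_every_item: bool) -> None:
    """Append an episode (an iterable of flat dicts) and create items,
    flushing either after every item or once at the end."""
    for step in steps:
        writer.append(step)
        # Build a trivial one-step item from the columns appended so far
        # (simplified; real adders build sequences/transitions here).
        trajectory = {name: column[-1:] for name, column in writer.history.items()}
        writer.create_item(table=table, priority=1.0, trajectory=trajectory)
        if flush_every_item:
            # Option (a): flush after every create_item, as Acme now does
            # (google-deepmind/acme@94711b1). Bounds the number of in-flight
            # items but may block on every insertion.
            writer.flush(block_until_num_items=max_in_flight_items)
    if not flush_every_item:
        # Option (b): a single flush at the end of the trajectory, which is
        # the behaviour this PR reverted to after option (a) appeared to
        # break training.
        writer.flush(block_until_num_items=max_in_flight_items)
```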

mava/systems/tf/maddpg/system.py (resolved)
mava/systems/tf/maddpg/system.py (outdated; resolved)
mava/systems/tf/maddpg/system.py (outdated; resolved)
mava/systems/tf/variable_utils.py (resolved)
@DriesSmit (Contributor Author) commented:

Trainer logging does not seem to be working for mava/examples/petting_zoo/sisl/multiwalker/recurrent/decentralised/run_maddpg.py. Fix this. Also compare MAD4PG multiwalker executor and training speeds against develop.

@DriesSmit (Contributor Author) commented:

> Trainer logging does not seem to be working for mava/examples/petting_zoo/sisl/multiwalker/recurrent/decentralised/run_maddpg.py. Fix this. Also compare MAD4PG multiwalker executor and training speeds against develop.

Fixed.

@arnupretorius (Collaborator) left a comment:

Nice @DriesSmit! 🙌 Just see my few comments. :)

mava/components/tf/architectures/__init__.py (outdated; resolved)
mava/components/tf/architectures/state_based.py (outdated; resolved)
mava/systems/tf/mad4pg/__init__.py (outdated; resolved)
mava/systems/tf/mad4pg/training.py (outdated; resolved)
mava/systems/tf/mad4pg/training.py (outdated; resolved)
mava/systems/tf/maddpg/training.py (outdated; resolved)
mava/wrappers/system_trainer_statistics.py (outdated; resolved)
mava/wrappers/system_trainer_statistics.py (outdated; resolved)
@arnupretorius (Collaborator) left a comment:

Thanks @DriesSmit!! 🔥

@arnupretorius merged commit 57ae3c8 into develop on Oct 25, 2021
@arnupretorius deleted the feature/mava-scaling branch on Oct 25, 2021 at 09:43

Labels: enhancement (New feature or request)
3 participants