Social Curiosity Module implementation and MOA fixes #179

Merged: 539 commits, Mar 19, 2021

Conversation

internetcoffeephone (Contributor):

@eugenevinitsky As discussed in our email conversation.

Highlights:

  • Optimizations/bugfixes in map_env
  • Move the MOA predictions to a single tensor evaluation rather than handling them in postprocessing
  • On the whole, experiments run a lot faster now
  • Upgrade to ray 0.8.4 and tensorflow-gpu 2.1.0, plus other miscellaneous library upgrades
  • Fix tests
  • Add isort, pre-commit and black config files.
  • Add a simple environment where a single agent has to pull a set of switches, then walk through a door. This can be used for testing purposes.
  • The addition of my own model, the social curiosity module (often denoted in the code as scm): a combination of the social influence reward with Burda's intrinsic curiosity module.
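As an illustration of the curiosity half of that combination, below is a minimal sketch of a Burda-style intrinsic curiosity reward. All names here are illustrative, not the actual classes in this PR; the real SCM lives in the model code.

import numpy as np

def curiosity_reward(phi_s, action_onehot, phi_s_next, forward_model):
    # forward_model is any callable mapping the concatenated
    # (phi(s_t), a_t) vector to a predicted phi(s_{t+1}) embedding.
    predicted = forward_model(np.concatenate([phi_s, action_onehot]))
    # Intrinsic reward: squared prediction error in feature space.
    # Transitions the forward model cannot predict yield high reward.
    return 0.5 * float(np.sum((predicted - phi_s_next) ** 2))

In the SCM, this error term is combined with the social influence reward rather than used on its own.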

Along with the corresponding black linting.
The expectation is that this value is never used, provided that lr_curriculum values are defined; the argument serves no purpose otherwise.
This prevents errors in eager execution.
Renamed A3CTFPolicy to A3CAuxTFPolicy for clarity.
Moved fcnet_hiddens to the model dict instead of the model/custom_options dict for the MOA.
Removed run_train_baseline and run_train_baseline_moa scripts. These will be re-added later.
Added default args model and small_model.
… in the model config.

Changed some comments to be more clear.
Comment on lines 8 to 14
min_workers: 14 #<NUM WORKERS IN CLUSTER>

# The maximum number of workers nodes to launch in addition to the head
# node. This takes precedence over min_workers.
max_workers: 14

initial_workers: 14
eugenevinitsky (Owner):

should set these to 0

internetcoffeephone (Contributor, author):

All 3? If so, done.
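For reference, a sketch of the adjusted values in ray_autoscale.yaml after this change (assuming the rest of the file is unchanged):

min_workers: 0

# The maximum number of worker nodes to launch in addition to the head
# node. This takes precedence over min_workers.
max_workers: 0

initial_workers: 0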

Comment on lines 1 to 12
# Find Python folder name so that this patch can run correctly on different versions of Python.
python_folder_name=$(ls venv/lib)

# Apply patches
sed -i '119s/tf.float32/tf.uint8/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/policy/dynamic_tf_policy.py # Hardcoded observation space to uint8.
sed -i '76s/np.float32/np.uint8/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/preprocessors.py # Same as above.
sed -i '231s/np.zeros(self.shape)/np.zeros(self.shape, dtype=self.observation_space.dtype)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/preprocessors.py # Change observation shape to what we actually provide
sed -i '214s/tf.int64/action_space.dtype/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/catalog.py # Change action shape to what we actually provide
sed -i '56s/tf.math.argmax(self.inputs, axis=1)/tf.math.argmax(self.inputs, axis=1, output_type=tf.int32)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/tf/tf_action_dist.py # Actions should not sample at int64, int32 is the lowest that multinomial takes
sed -i '84s/tf.multinomial(self.inputs, 1)/tf.multinomial(self.inputs, 1, output_dtype=tf.int32)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/tf/tf_action_dist.py # Same as above
sed -i '656i\ actions = np.array(actions, dtype=policy.action_space.dtype)' venv/lib/"$python_folder_name"/site-packages/ray/rllib/evaluation/sampler.py # Insert action to uint8 conversion to save even more memory
sed -i '164i\ return self.sess.run(self.variables)' venv/lib/"$python_folder_name"/site-packages/ray/experimental/tf_utils.py # Make initialization faster, as in https://github.com/ray-project/ray/pull/8491
eugenevinitsky (Owner):

can you clarify what this is for?

internetcoffeephone (Contributor, author):

Added clarification in the file.

eugenevinitsky (Owner):

Oh I see. Does the code still run without this or is this just for a speedup?

internetcoffeephone (Contributor, author, Jul 2, 2020):

It won't run without it, and it also provides a speedup. The last line reduces initialization time by up to 60%, and training time by <20% as well (although I haven't tested the latter extensively; it might just be a fluke).

The rest of the lines make observations 4 times smaller, because ray normally doesn't support uint8 observations and casts everything to float32. Note that this replaces one hard-coding with another, which is an ugly hack; but fixing ray properly would require restructuring in many places, and I feel I'm out of my depth there.
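To make the 4x figure concrete, a standalone numpy sketch (the observation shape is made up for illustration, not taken from map_env):

import numpy as np

obs = np.zeros((15, 15, 3), dtype=np.uint8)  # a small RGB observation window
print(obs.nbytes)                            # 675 bytes as uint8
print(obs.astype(np.float32).nbytes)         # 2700 bytes as float32: 4x larger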

eugenevinitsky (Owner):

I think this is a great idea; I just worry that it hard-codes us to a particular ray version in a very brittle way (for example, if the ray version changes even slightly, the patch file is a haul to fix). I don't have any good ideas yet, but if you have any thoughts on a way to make this more modular, let me know.

internetcoffeephone (Contributor, author):

I fully agree; it's an ugly hack. Any future-proof, non-hacky solution would have to fix ray-project/ray#7946. That's the only way to guarantee future compatibility without extra work on each patch.

The last line has already been patched in ray 0.8.6, by yours truly.

Comment on lines +1 to +2
To run scripts on AWS do the following:

eugenevinitsky (Owner):

Thank you!

internetcoffeephone (Contributor, author):

You wrote these instructions! I never updated them, and from a cursory glance, ray_autoscale.yaml is almost certainly broken. I have never used AWS or ray_autoscale.yaml, and while the regular requirements.txt changed a lot, I never updated requirements_autoscale.txt accordingly.

I don't have access to an AWS cluster, so if you want to see this working I'm afraid you'll have to bring the relevant files up to speed.

Is there a reason you were using the developer version of rllib on autoscale? Why not install it by including ray[rllib] in requirements.txt?

eugenevinitsky (Owner):

Haha, go me then! We were implementing some custom stuff, so we needed the developer version. It'd be better to stick it in the requirements now.
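For illustration, the release build could then be pinned in requirements.txt (using the ray version this PR upgrades to; the rllib extra pulls in rllib's dependencies):

ray[rllib]==0.8.4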

eugenevinitsky (Owner):

Wow, you've done a crazy amount of work. Going to tweet this out whenever it's merged so let me know if you have a handle I can tag.

If rollout_fragment_length is < episode length, the learning process becomes unstable around transitions between episodes.
Sort plot legend by label name. Cut off plotting at 5e8 steps.
Change latex table large numbers to scientific notation.
Change plotting to save as svg instead of eps, because eps does not support transparency.
Values smaller than the episode size (horizon in train.py, 1000 by default) lead to noisy learning.
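A minimal sketch of the corresponding rllib setting (values are illustrative; the point is that a fragment should cover a whole episode):

config = {
    "horizon": 1000,                  # episode length, as in train.py
    "rollout_fragment_length": 1000,  # >= horizon, so fragments don't straddle episodes
}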
Add PPO results, remove internetcoffeephone setup instructions, switch to eugenevinitsky (parent repo).
…sigma.

Rename variables in plotting code.
Print progress while plotting collective plots.
…dealing with a sample, not the full population.

This is done by setting ddof to 1.
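Concretely, in numpy (a toy array, not the plotting code's actual data):

import numpy as np

rewards = np.array([1.0, 2.0, 4.0])
print(np.std(rewards))           # population std: ddof=0, divides by n
print(np.std(rewards, ddof=1))   # sample std: ddof=1, divides by n - 1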
internetcoffeephone (Contributor, author):

> Wow, you've done a crazy amount of work. Going to tweet this out whenever it's merged so let me know if you have a handle I can tag.

I don't have a Twitter; if you could link to my personal website instead, that would be great! Also, feel free to mention that I'm job-hunting for ML Engineer positions.

Individual experiment plots now use the same code as collective plots to determine model/env from the filepath.
@eugenevinitsky eugenevinitsky merged commit 6de7a79 into eugenevinitsky:master Mar 19, 2021