Social Curiosity Module implementation and MOA fixes #179

Merged: 539 commits, Mar 19, 2021

Conversation

internetcoffeephone (Contributor):

@eugenevinitsky As discussed in our email conversation.

Highlights:

  • Optimizations/bugfixes in map_env
  • Move the MOA predictions to a single tensor evaluation rather than handling them in postprocessing
  • On the whole, experiments run a lot faster now
  • Upgrade to ray 0.8.4 and tensorflow-gpu 2.1.0, plus other miscellaneous library upgrades
  • Fix tests
  • Add isort, pre-commit and black config files.
  • Add a simple environment where a single agent has to pull a set of switches, then walk through a door. This can be used for testing purposes.
  • The addition of my own model, the social curiosity module (often denoted in the code as scm): a combination of the social influence reward with Burda's intrinsic curiosity module.
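As an illustration of the curiosity half of that combination, below is a minimal sketch of a Burda-style intrinsic curiosity reward. All names here are illustrative, not the actual classes in this PR; the real SCM lives in the model code.

import numpy as np

def curiosity_reward(phi_s, action_onehot, phi_s_next, forward_model):
    # forward_model is any callable mapping the concatenated
    # (phi(s_t), a_t) vector to a predicted phi(s_{t+1}) embedding.
    predicted = forward_model(np.concatenate([phi_s, action_onehot]))
    # Intrinsic reward: squared prediction error in feature space.
    # Transitions the forward model cannot predict yield high reward.
    return 0.5 * float(np.sum((predicted - phi_s_next) ** 2))

In the SCM, this error term is combined with the social influence reward rather than used on its own.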

Along with the corresponding black linting.
The expectation is that this value is never used, provided that lr_curriculum values are defined; the argument serves no purpose otherwise.
This prevents errors in eager execution.
Renamed A3CTFPolicy to A3CAuxTFPolicy for clarity.
Moved fcnet_hiddens to the model dict instead of the model/custom_options dict for the MOA.
Removed run_train_baseline and run_train_baseline_moa scripts. These will be re-added later.
Added default args model and small_model.
… in the model config.

Changed some comments to be more clear.
Comment on lines 8 to 14
min_workers: 14 #<NUM WORKERS IN CLUSTER>

# The maximum number of workers nodes to launch in addition to the head
# node. This takes precedence over min_workers.
max_workers: 14

initial_workers: 14
eugenevinitsky (Owner):

should set these to 0

internetcoffeephone (Contributor, author):

All 3? If so, done.
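For reference, a sketch of the adjusted values in ray_autoscale.yaml after this change (assuming the rest of the file is unchanged):

min_workers: 0

# The maximum number of worker nodes to launch in addition to the head
# node. This takes precedence over min_workers.
max_workers: 0

initial_workers: 0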

Comment on lines 1 to 12
# Find Python folder name so that this patch can run correctly on different versions of Python.
python_folder_name=$(ls venv/lib)

# Apply patches
sed -i '119s/tf.float32/tf.uint8/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/policy/dynamic_tf_policy.py # Hardcoded observation space to uint8.
sed -i '76s/np.float32/np.uint8/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/preprocessors.py # Same as above.
sed -i '231s/np.zeros(self.shape)/np.zeros(self.shape, dtype=self.observation_space.dtype)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/preprocessors.py # Change observation shape to what we actually provide
sed -i '214s/tf.int64/action_space.dtype/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/catalog.py # Change action shape to what we actually provide
sed -i '56s/tf.math.argmax(self.inputs, axis=1)/tf.math.argmax(self.inputs, axis=1, output_type=tf.int32)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/tf/tf_action_dist.py # Actions should not sample at int64, int32 is the lowest that multinomial takes
sed -i '84s/tf.multinomial(self.inputs, 1)/tf.multinomial(self.inputs, 1, output_dtype=tf.int32)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/tf/tf_action_dist.py # Same as above
sed -i '656i\ actions = np.array(actions, dtype=policy.action_space.dtype)' venv/lib/"$python_folder_name"/site-packages/ray/rllib/evaluation/sampler.py # Insert action to uint8 conversion to save even more memory
sed -i '164i\ return self.sess.run(self.variables)' venv/lib/"$python_folder_name"/site-packages/ray/experimental/tf_utils.py # Make initialization faster, as in https://github.com/ray-project/ray/pull/8491
eugenevinitsky (Owner):

can you clarify what this is for?

internetcoffeephone (Contributor, author):

Added clarification in the file.

eugenevinitsky (Owner):

Oh I see. Does the code still run without this or is this just for a speedup?

internetcoffeephone (Contributor, author, Jul 2, 2020):

It won't run without it, and it also provides a speedup. The last line reduces initialization time by up to 60%, and training time by <20% as well (although I haven't tested the latter extensively; it might just be a fluke).

The rest of the lines make observations 4 times smaller, because ray normally doesn't support uint8 observations and casts everything to float32. Note that this replaces one hard-coding with another, which is an ugly hack; but fixing ray properly would require restructuring in many places, and I feel I'm out of my depth there.
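To make the 4x figure concrete, a standalone numpy sketch (the observation shape is made up for illustration, not taken from map_env):

import numpy as np

obs = np.zeros((15, 15, 3), dtype=np.uint8)  # a small RGB observation window
print(obs.nbytes)                            # 675 bytes as uint8
print(obs.astype(np.float32).nbytes)         # 2700 bytes as float32: 4x larger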

eugenevinitsky (Owner):

I think this is a great idea; I just worry that it hard-codes us to a particular ray version in a very brittle way (for example, if the ray version changes even slightly, the patch file is a haul to fix). I don't have any good ideas yet, but if you have any thoughts on a way to make this more modular, let me know.

internetcoffeephone (Contributor, author):

I fully agree; it's an ugly hack. Any future-proof, non-hacky solution would have to fix ray-project/ray#7946. That's the only way to guarantee future compatibility without extra work on each patch.

The last line has already been patched in ray 0.8.6, by yours truly.

Comment on lines +1 to +2
To run scripts on AWS do the following:

eugenevinitsky (Owner):

Thank you!

internetcoffeephone (Contributor, author):

You wrote these instructions! I never updated them, and from a cursory glance, ray_autoscale.yaml is almost certainly broken. I have never used AWS or ray_autoscale.yaml, and while the regular requirements.txt changed a lot, I never updated requirements_autoscale.txt accordingly.

I don't have access to an AWS cluster, so if you want to see this working I'm afraid you'll have to bring the relevant files up to speed.

Is there a reason you were using the developer version of rllib on autoscale? Why not install it by including ray[rllib] in requirements.txt?

eugenevinitsky (Owner):

Haha, go me then! We were implementing some custom stuff, so we needed the developer version. It'd be better to stick it in the requirements now.
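For illustration, the release build could then be pinned in requirements.txt (using the ray version this PR upgrades to; the rllib extra pulls in rllib's dependencies):

ray[rllib]==0.8.4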

eugenevinitsky (Owner):

Wow, you've done a crazy amount of work. Going to tweet this out whenever it's merged so let me know if you have a handle I can tag.

If rollout_fragment_length is < episode length, the learning process becomes unstable around transitions between episodes.
Sort plot legend by label name. Cut off plotting at 5e8 steps.
Change latex table large numbers to scientific notation.
Change plotting to save as svg instead of eps, because eps does not support transparency.
Values smaller than the episode size (horizon in train.py, 1000 by default) lead to noisy learning.
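A minimal sketch of the corresponding rllib setting (values are illustrative; the point is that a fragment should cover a whole episode):

config = {
    "horizon": 1000,                  # episode length, as in train.py
    "rollout_fragment_length": 1000,  # >= horizon, so fragments don't straddle episodes
}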
Add PPO results, remove internetcoffeephone setup instructions, switch to eugenevinitsky (parent repo).
…sigma.

Rename variables in plotting code.
Print progress while plotting collective plots.
…dealing with a sample, not the full population.

This is done by setting ddof to 1.
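Concretely, in numpy (a toy array, not the plotting code's actual data):

import numpy as np

rewards = np.array([1.0, 2.0, 4.0])
print(np.std(rewards))           # population std: ddof=0, divides by n
print(np.std(rewards, ddof=1))   # sample std: ddof=1, divides by n - 1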
internetcoffeephone (Contributor, author):

> Wow, you've done a crazy amount of work. Going to tweet this out whenever it's merged so let me know if you have a handle I can tag.

I don't have a Twitter; if you could link to my personal website instead, that would be great! Also, feel free to mention that I'm job-hunting for ML Engineer positions.

Individual experiment plots now use the same code as collective plots to determine model/env from the filepath.
@eugenevinitsky eugenevinitsky merged commit 6de7a79 into eugenevinitsky:master Mar 19, 2021