Social Curiosity Module implementation and MOA fixes #179
Conversation
Linting to abide by flake8.
…lues in curiosity_model.
…y environment step.
Accompanying the according black linting.
The expectation is that this value is never used, provided that lr_curriculum values are defined.
The argument is pointless otherwise.
…enefits from vectorization.
This prevents errors in eager execution. Renamed A3CTFPolicy to A3CAuxTFPolicy for clarity.
Moved fcnet_hiddens to model dict instead of model/custom_options dict for moa. Removed run_train_baseline and run_train_baseline_moa scripts. These will be re-added later. Added default args model and small_model.
… in the model config. Changed some comments to be more clear.
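For reference, a minimal sketch of the new placement described above (the layer sizes and the contents of custom_options are illustrative, not the project's actual values):

```python
# Sketch: fcnet_hiddens now sits in the top-level "model" dict rather than
# under "model"/"custom_options". The [32, 32] layer sizes are made up
# for illustration.
config = {
    "model": {
        "fcnet_hiddens": [32, 32],  # read by RLlib's built-in FC network
        "custom_options": {
            # MOA-specific options remain here
        },
    },
}

assert "fcnet_hiddens" in config["model"]
assert "fcnet_hiddens" not in config["model"]["custom_options"]
```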
ray_autoscale.yaml
min_workers: 14 #<NUM WORKERS IN CLUSTER>

# The maximum number of workers nodes to launch in addition to the head
# node. This takes precedence over min_workers.
max_workers: 14

initial_workers: 14
should set these to 0
All 3? If so, done.
ray_uint8_patch.sh
# Find Python folder name so that this patch can run correctly on different versions of Python.
python_folder_name=$(ls venv/lib)

# Apply patches
sed -i '119s/tf.float32/tf.uint8/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/policy/dynamic_tf_policy.py # Hardcoded observation space to uint8.
sed -i '76s/np.float32/np.uint8/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/preprocessors.py # Same as above.
sed -i '231s/np.zeros(self.shape)/np.zeros(self.shape, dtype=self.observation_space.dtype)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/preprocessors.py # Change observation shape to what we actually provide
sed -i '214s/tf.int64/action_space.dtype/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/catalog.py # Change action shape to what we actually provide
sed -i '56s/tf.math.argmax(self.inputs, axis=1)/tf.math.argmax(self.inputs, axis=1, output_type=tf.int32)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/tf/tf_action_dist.py # Actions should not sample at int64, int32 is the lowest that multinomial takes
sed -i '84s/tf.multinomial(self.inputs, 1)/tf.multinomial(self.inputs, 1, output_dtype=tf.int32)/' venv/lib/"$python_folder_name"/site-packages/ray/rllib/models/tf/tf_action_dist.py # Same as above
sed -i '656i\ actions = np.array(actions, dtype=policy.action_space.dtype)' venv/lib/"$python_folder_name"/site-packages/ray/rllib/evaluation/sampler.py # Insert action to uint8 conversion to save even more memory
sed -i '164i\ return self.sess.run(self.variables)' venv/lib/"$python_folder_name"/site-packages/ray/experimental/tf_utils.py # Make initialization faster, as in https://github.com/ray-project/ray/pull/8491
can you clarify what this is for?
Added clarification in the file.
Oh I see. Does the code still run without this or is this just for a speedup?
It won't run without it, and it is for a speedup. The last line reduces initialization time by up to 60%, and training time by <20% as well (although I haven't tested the last part extensively, might just be a fluke).
The rest of the lines make observations 4 times smaller, because ray normally doesn't support this and casts everything to float32. Note that this replaces one hard-coding with another, which is an ugly hack; fixing ray properly would require restructuring in many places, and I feel like I'm out of my depth there.
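The 4x figure follows directly from element sizes; a stdlib-only illustration (the buffer length is arbitrary):

```python
from array import array

# float32 storage uses 4 bytes per element, uint8 storage uses 1,
# so keeping observations as uint8 instead of float32 shrinks them 4x.
obs_float32 = array("f", [0.0] * 1000)  # "f" = 4-byte float
obs_uint8 = array("B", [0] * 1000)      # "B" = 1-byte unsigned int

assert obs_float32.itemsize == 4 * obs_uint8.itemsize
```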
I think this is a great idea, I just worry that this hard-codes us to a particular ray version in a very difficult way (for example, if the ray version changes even slightly the patch file is a haul to fix). I don't have any good ideas yet but if you have any thoughts on a way to make this more modular let me know.
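One low-effort mitigation (a sketch, not a full fix; "0.8.5" is a placeholder, not the project's actual pinned version) would be to have the patch script refuse to run against any ray version other than the one its sed line numbers were written for:

```shell
#!/bin/sh
# Hypothetical guard for ray_uint8_patch.sh: the sed line numbers are only
# valid for one exact ray version, so bail out loudly on any other version.
# "0.8.5" is a placeholder target version, chosen for illustration.
target_ray_version="0.8.5"

check_ray_version() {
    # $1: installed version, e.g. $(python -c "import ray; print(ray.__version__)")
    if [ "$1" = "$target_ray_version" ]; then
        return 0
    fi
    echo "ray $1 does not match patch target $target_ray_version; aborting." >&2
    return 1
}
```

Calling check_ray_version before the sed block wouldn't make the patch modular, but it would at least turn a silently broken patch into an immediate failure when ray is upgraded.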
I fully agree, it's an ugly hack. Any future-proof non-hacky solution would have to fix ray-project/ray#7946. That's the only way to guarantee future compatibility without extra work on each patch.
The last line has already been patched in ray 0.8.6, by yours truly.
To run scripts on AWS do the following:
Thank you!
You wrote these instructions! I never updated them, and from a cursory glance ray_autoscale.yaml is almost certainly broken. I never used AWS, nor ray_autoscale.yaml. The regular requirements.txt changed a lot, while I never updated requirements_autoscale.txt.
I don't have access to an AWS cluster, so if you want to see this working I'm afraid you'll have to bring the relevant files up to speed.
Is there a reason you were using the developer version of rllib on autoscale? Why not install it by including ray[rllib] in requirements.txt?
Haha go me then! We were implementing some custom stuff so we needed the developer version. It'd be better to stick it in the requirements now.
Wow, you've done a crazy amount of work. I'm going to tweet this out whenever it's merged, so let me know if you have a handle I can tag.
If rollout_fragment_length is < episode length, the learning process becomes unstable around transitions between episodes.
Sort plot legend by label name. Cut off plotting at 5e8 steps.
Change latex table large numbers to scientific notation.
Change plotting to save as svg instead of eps, because eps does not support transparency.
… throwing an exception.
Values smaller than the episode size (horizon in train.py, 1000 by default) lead to noisy learning.
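In config terms (key names follow RLlib's naming in the ray 0.8.x line; the values mirror the train.py default mentioned above):

```python
# Keep rollout_fragment_length at least as large as the episode horizon so
# sample fragments never straddle an episode boundary, which is what makes
# learning unstable around episode transitions.
config = {
    "horizon": 1000,                  # episode length (train.py default)
    "rollout_fragment_length": 1000,  # >= horizon avoids cross-episode fragments
}

assert config["rollout_fragment_length"] >= config["horizon"]
```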
Add PPO results, remove internetcoffeephone setup instructions, switch to eugenevinitsky (parent repo).
…sigma. Rename variables in plotting code. Print progress while plotting collective plots.
…dealing with a sample, not the full population. This is done by setting ddof to 1.
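Bessel's correction in one stdlib line: statistics.stdev divides by n - 1, matching numpy's ddof=1, while pstdev divides by n (the data here is arbitrary):

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0]

# Population std divides by n; sample std divides by n - 1 (ddof=1), which
# corrects the bias introduced by estimating the mean from the sample itself.
population_std = statistics.pstdev(data)  # like np.std(data, ddof=0)
sample_std = statistics.stdev(data)       # like np.std(data, ddof=1)

assert sample_std > population_std
```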
I don't have a Twitter; if you could link to my personal website, that would be great! Also, feel free to mention that I'm job-hunting for ML Engineer positions.
Individual experiment plots now use the same code as collective plots to determine model/env from the filepath.
@eugenevinitsky As discussed in our email conversation.
Highlights: