Social Curiosity Module implementation and MOA fixes #179

Merged
merged 539 commits into from
Mar 19, 2021
Changes from 250 commits
539 commits
37291c4
Linting using black library (https://pypi.org/project/black/).
internetcoffeephone Dec 6, 2019
01e3730
Updated flake8 to abide by black line length.
internetcoffeephone Dec 6, 2019
7c40b83
Fixed rollout.py to work with new config file/args.
internetcoffeephone Dec 6, 2019
d99baf4
Change test_envs action_space reference from agent to env.
internetcoffeephone Dec 6, 2019
5a4f2c3
Removed unused/undefined variables.
internetcoffeephone Dec 6, 2019
e339ac4
Updated memory experiment run script.
internetcoffeephone Dec 6, 2019
3fef2de
Merge remote-tracking branch 'origin/curiosity' into curiosity
internetcoffeephone Dec 6, 2019
ecca1da
Fixed the filename string of the file generated in test_rollout.py
internetcoffeephone Dec 6, 2019
470f1ec
Merge remote-tracking branch 'origin/curiosity' into curiosity
internetcoffeephone Dec 6, 2019
c7a1c18
Removed deprecated key from travis.yml
internetcoffeephone Dec 6, 2019
74e67bf
Did TODO: original dimension size calculation instead of hardcoded va…
internetcoffeephone Dec 7, 2019
b51be1b
Linting.
internetcoffeephone Dec 7, 2019
4f35423
Made trainer name more clear in a3c_aux. No longer uses model name.
internetcoffeephone Dec 9, 2019
6824e10
Fixed agents being able to walk through switches or closed doors.
internetcoffeephone Dec 9, 2019
e46a1be
Changed rgb_arr so that it no longer has to be re-initialized on ever…
internetcoffeephone Dec 10, 2019
ff252b3
Added comment in plot_results.
internetcoffeephone Dec 10, 2019
f237f14
Black linting.
internetcoffeephone Dec 10, 2019
cc6cd57
Added isort and pre-commit.
internetcoffeephone Dec 10, 2019
cb621de
Added pyproject.toml to configure Black.
internetcoffeephone Dec 10, 2019
1b4e7fd
isort linting. Added isort.cfg
internetcoffeephone Dec 10, 2019
b43316e
Updated pyproject.toml config file to allow line length 101.
internetcoffeephone Dec 11, 2019
3453a43
Added lr to default_args and training scripts for debugging purposes.
internetcoffeephone Dec 11, 2019
a6c2f1c
Changed --use_gpu_for_driver to default to False.
internetcoffeephone Dec 12, 2019
e92e4d8
Increased default num_envs_per_worker. This is more efficient as it b…
internetcoffeephone Dec 12, 2019
05c5d12
Removed needless casts, enforced float32 across all parameters.
internetcoffeephone Dec 12, 2019
8c48d93
Added simple experiment runfile, used for local debugging.
internetcoffeephone Dec 16, 2019
8d17e80
Merged train_moa and train_curiosity scripts.
internetcoffeephone Dec 23, 2019
96f3602
Fixed a bug where conv_to_fcnet_v2.py would not use the hidden layers…
internetcoffeephone Dec 30, 2019
e075275
Fixed incorrect comments in curiosity_model.py
internetcoffeephone Jan 1, 2020
59b1c7d
Locked pytz requirement version.
internetcoffeephone Jan 1, 2020
275a861
Removed unused import.
internetcoffeephone Jan 1, 2020
176d8cf
Renamed conv_to_fcnet_v2.py to baseline_model.py, fixed it to work.
internetcoffeephone Jan 1, 2020
abc1dd8
Fixed A3C_aux curiosity trainer_name bug. It was not assigned in the …
internetcoffeephone Jan 1, 2020
9fb6475
Added support for A3C baseline training.
internetcoffeephone Jan 1, 2020
26c29ae
Added IMPALA baseline training support.
internetcoffeephone Jan 1, 2020
8b253ad
Updated simple_exp.py.
internetcoffeephone Jan 1, 2020
8f5cf3e
Added run_baseline_cleanup.sh training script.
internetcoffeephone Jan 1, 2020
d760797
Updated ray to 0.8.0, specified extra [rllib] dependencies.
internetcoffeephone Jan 2, 2020
aed2dde
Changed default sample/train batch size to 1000, default algorithm to…
internetcoffeephone Jan 2, 2020
2166a14
Updated run_baseline_cleanup.sh script to fix batch size/gpu issues.
internetcoffeephone Jan 2, 2020
a735579
Fixed float casting bug.
internetcoffeephone Jan 5, 2020
1a558a7
Updated moa_weight to behave consistently with aux_loss_weight.
internetcoffeephone Jan 5, 2020
6e77755
Implemented aux ppo.
internetcoffeephone Jan 5, 2020
96ad9dd
Renamed ppo_causal to ppo_aux.
internetcoffeephone Jan 5, 2020
fa4f7dc
Removed unused function/import that wasn't common, but specific to PPO.
internetcoffeephone Jan 6, 2020
db52757
Removed grid_search from parameters - these would cause deep copy to …
internetcoffeephone Jan 6, 2020
dec1cea
Simplified moa_model lines, functionality is equivalent.
internetcoffeephone Jan 8, 2020
5895d87
Renamed curriculum to schedule. These parameters were schedules all a…
internetcoffeephone Jan 10, 2020
4101799
Set default memory parameter to None, as ray can auto-detect this value.
internetcoffeephone Jan 10, 2020
346ae90
Fixed default memory parameter casting error.
internetcoffeephone Jan 10, 2020
d9f977c
Added run_baseline_switch.sh
internetcoffeephone Jan 10, 2020
43118db
Fixed typo.
internetcoffeephone Jan 10, 2020
933ab95
Changed num_cpus/num_gpus into more specific arguments.
internetcoffeephone Jan 12, 2020
9993235
Added ppo_sgd_minibatch_size as a config parameter.
internetcoffeephone Jan 12, 2020
3b1d240
Fixed wrongly named parameters in run scripts.
internetcoffeephone Jan 12, 2020
aeb8d59
Fixed gpus parameter to allow for fractional values.
internetcoffeephone Jan 12, 2020
b98e5ca
run_baseline_cleanup.sh updated to reflect optimal (speed-wise) test …
internetcoffeephone Jan 12, 2020
2eab6e5
Added uint8 preprocessor, and changed observations to uint8.
internetcoffeephone Jan 19, 2020
094be96
Fixed comment typo.
internetcoffeephone Jan 19, 2020
eb7f2de
Changed chars in map_env to actual chars, using 4x less memory.
internetcoffeephone Jan 20, 2020
eb904d0
Removed unused models. Moved KerasRNN to its own file: lstm.py.
internetcoffeephone Jan 20, 2020
cd50625
Converted forgotten strings to chars.
internetcoffeephone Jan 21, 2020
6653619
Combined rotation and map to color functions to prevent allocation of…
internetcoffeephone Jan 21, 2020
21fc9c2
Added forgotten converted char bytes in cleanup.py.
internetcoffeephone Jan 21, 2020
28d4169
Added rotation unit test for new combined rotation/map to color funct…
internetcoffeephone Jan 21, 2020
64153b7
Removed double opencv requirement, only one should be present at any …
internetcoffeephone Jan 21, 2020
46a0e9f
Attempt to fix test_rollout.py
internetcoffeephone Jan 21, 2020
1e0a6e2
Attempt #2 to fix test_rollout
internetcoffeephone Jan 21, 2020
9b44f56
Attempt #3 to fix test_rollout on travis.
internetcoffeephone Jan 21, 2020
ca1af2d
Fixed rollout.py to work with new env creation.
internetcoffeephone Jan 21, 2020
32bfd7d
Attempt number 4 to fix test_rollout.
internetcoffeephone Jan 21, 2020
eff1f7a
Upgrade to ray 0.8.3.
internetcoffeephone Mar 28, 2020
e2686c3
Update requirements to use newest versions of all libraries.
internetcoffeephone Mar 28, 2020
d3735ee
Remove libraries from requirements.txt that are already installed thr…
internetcoffeephone Mar 28, 2020
d3099e3
Remove replace_rnn_sequencing.py as the custom change has been merged…
internetcoffeephone Mar 30, 2020
b6d85da
Change sample_batch_size to rollout_fragment_length in run scripts, t…
internetcoffeephone Mar 30, 2020
410ab2b
Fix unrecognized webui_host.
internetcoffeephone Mar 30, 2020
40bf92f
Remove parameters incompatible with newest version of ray in ppo_aux.py.
internetcoffeephone Apr 2, 2020
242e18b
Change erroneous cast from uint8 to int32.
internetcoffeephone Apr 2, 2020
fc9a676
Refactor observation_space out of individual environments and into th…
internetcoffeephone Apr 8, 2020
136f72f
Removed custom preprocessor, moved uint8->float32 conversion to model…
internetcoffeephone Apr 10, 2020
ecf70df
Fix tests failing due to removed/added arguments in environment creat…
internetcoffeephone Apr 10, 2020
f0a46c2
Simplify casting logic, add names to layers for debugging, change str…
internetcoffeephone Apr 13, 2020
5141667
Fix moa to work with num_envs > 1.
internetcoffeephone Apr 13, 2020
447b1b0
Change padding order so that agents' own actions come first in counte…
internetcoffeephone Apr 13, 2020
f6abd4b
Input actions to moa as one hot rather than their absolute value. Sim…
internetcoffeephone Apr 14, 2020
3f6b8a5
Add ray patch script for incorrect hardcoded float32 values. Temporar…
internetcoffeephone Apr 14, 2020
ae3b604
Remove duplicate line.
internetcoffeephone Apr 14, 2020
d56bcbb
Refactor moa_model.py in preparation of scm.
internetcoffeephone Apr 14, 2020
9f08a95
Rewrite curiosity model into SCM, SCM is not functional yet. Refactor…
internetcoffeephone Apr 19, 2020
7e46737
Add run_baseline_harvest.sh.
internetcoffeephone Apr 19, 2020
cba6085
Fix wrong default_args.py commit.
internetcoffeephone Apr 19, 2020
59095c5
Set final baseline cleanup experiment parameters.
internetcoffeephone Apr 20, 2020
a7e6427
Add bounds check to agent action, and simplify bounds check expressio…
internetcoffeephone Apr 20, 2020
13caf24
Remove agent.get_pos() function, replace with direct property access.
internetcoffeephone Apr 20, 2020
89b9c69
Deduplicate code.
internetcoffeephone Apr 21, 2020
ac9bd34
Change switch tile representation from s,S to w,W.
internetcoffeephone Apr 21, 2020
6f88f5f
Prevent DEFAULT_COLORS from changing when using multiple envs.
internetcoffeephone Apr 21, 2020
1141fe3
Fix typos.
internetcoffeephone Apr 21, 2020
d15ade2
Move view_len initialization to map_env.
internetcoffeephone Apr 22, 2020
1189fbd
Separate the logical map from the color map.
internetcoffeephone Apr 22, 2020
2c3de24
Change world color map updating so that firing beams cover agents
internetcoffeephone Apr 22, 2020
ea3f552
Fix wrong beam comparison, it is now correctly compared to the tile b…
internetcoffeephone Apr 24, 2020
41fec1e
Optimize spawning of apples/waste in harvest and cleanup.
internetcoffeephone Apr 24, 2020
e2f40df
Add environment profiling scripts.
internetcoffeephone Apr 24, 2020
892967a
Update MOA run script parameters for use on remote machine.
internetcoffeephone Apr 27, 2020
80230da
Remove aux generalization, parameters now refer explicitly to MOA or …
internetcoffeephone Apr 27, 2020
d5c85ef
Fix hparams that were incorrectly renamed.
internetcoffeephone Apr 27, 2020
7493e93
Change plotting so that only reward plots have a fixed bottom of 0.
internetcoffeephone Apr 27, 2020
44181ba
Fix run scripts to incorporate the earlier aux -> moa/influence hpara…
internetcoffeephone Apr 27, 2020
43225d7
Add SCM hparams.
internetcoffeephone Apr 27, 2020
34d2b08
Change default args to use 8 num_envs_per_worker and default env to c…
internetcoffeephone Apr 27, 2020
8329bad
Remove superfluous NotImplementedError.
internetcoffeephone Apr 27, 2020
c4680f0
Remove superfluous parameter in scm_model.py.
internetcoffeephone Apr 27, 2020
7abdd06
Rename postprocessed_input to preprocessed_input in moa/scm models.
internetcoffeephone Apr 27, 2020
29290a2
Moved add_time_dimension to last possible moment for baseline_model.py.
internetcoffeephone Apr 27, 2020
7483394
Rename lstm.py to actor_critic_lstm.py to better reflect its contents.
internetcoffeephone Apr 28, 2020
dac0408
Refactor CNN/FC layer construction into their own methods.
internetcoffeephone Apr 28, 2020
a59b803
Refactor CNN/FC layer construction into common_layers.py, both baseli…
internetcoffeephone Apr 28, 2020
00a0ca3
"Hide" unused return value with _.
internetcoffeephone Apr 28, 2020
637aad8
Add several SCM hparams to train.py. Incomplete.
internetcoffeephone Apr 28, 2020
0b5e88d
SCM model building. Incomplete.
internetcoffeephone Apr 28, 2020
48dad39
Extra comments in moa_model.py to clarify model evaluation architecture.
internetcoffeephone Apr 30, 2020
85592d4
Change hparams of moa experiment run scripts. Reward schedule is unce…
internetcoffeephone Apr 30, 2020
712e3f8
Remove unused dictionary, simplify observation_space in map_env.py.
internetcoffeephone May 1, 2020
d068ae6
Change string occurrences of "total_influence_reward" with SOCIAL_INF…
internetcoffeephone May 1, 2020
daeb9a0
Clarified method description for _reshaped_one_hot_actions.
internetcoffeephone May 1, 2020
9397661
Fix broken plotting, add extrinsic_reward to metrics to be plotted.
internetcoffeephone May 1, 2020
dd551f7
Change MOA reward evaluation to be done in moa_model.py forward rathe…
internetcoffeephone May 4, 2020
3abbbd0
Rename trajectory to sample_batch in common_funcs_moa.py to enforce c…
internetcoffeephone May 5, 2020
b6077a0
Add comment to moa_model.py. Unpack unused elements with *_ instead o…
internetcoffeephone May 5, 2020
0d1c48d
Fix incorrect loss reporting so that weight is no longer applied twice.
internetcoffeephone May 5, 2020
3f2c6a2
Add model names, useful when debugging.
internetcoffeephone May 5, 2020
ebb589d
Update ray[rllib] requirement from 0.8.3 to 0.8.4.
internetcoffeephone May 5, 2020
1b35fb6
Make capitalization consistent in run_scripts.
internetcoffeephone May 5, 2020
27a61da
Change train.py so that local mode is automatically turned on upon de…
internetcoffeephone May 5, 2020
9a4e958
Fix comment.
internetcoffeephone May 5, 2020
8d1770e
Parameterize EXTRINSIC_REWARD dictionary key. Update a previously mis…
internetcoffeephone May 5, 2020
7d660cd
Complete SCM, it now runs without errors. Reshuffle functions to call…
internetcoffeephone May 5, 2020
51275dd
Change auto-local mode so that it no longer relies on __debug__ but o…
internetcoffeephone May 6, 2020
f5015c5
Rename provile.sh to profile_env.sh.
internetcoffeephone May 6, 2020
85d98aa
Add script for profiling train.py initialization.
internetcoffeephone May 6, 2020
6b90939
Remove unused run script parameter in run_baseline_switch.sh
internetcoffeephone May 8, 2020
2a18385
Clarify default_args.py comment.
internetcoffeephone May 8, 2020
d4e9680
Change --exp_name parameter to be None by default, change automatical…
internetcoffeephone May 8, 2020
c3c24c5
Remove commented out code.
internetcoffeephone May 8, 2020
23037e8
Add method description to batched_mse in scm_model.py
internetcoffeephone May 8, 2020
1fef555
Stop SCM backpropagation through MOA LSTM output in scm_model.py
internetcoffeephone May 8, 2020
00aa359
Add scm_forward_vs_inverse_loss_weight and moa/scm config validation.
internetcoffeephone May 8, 2020
d844684
Remove unused experiment.
internetcoffeephone May 8, 2020
37aa9e7
Remove two more unused experiments.
internetcoffeephone May 8, 2020
30aa230
Add SCM run scripts.
internetcoffeephone May 8, 2020
67cd736
Move cluster experiments to their own subdirectory in run_scripts.
internetcoffeephone May 8, 2020
526ef4a
Add code that generates LaTeX tables from run_scripts parameters.
internetcoffeephone May 11, 2020
9745a4f
Merge upstream changes.
internetcoffeephone May 11, 2020
6120f87
Merge curiosity into master.
internetcoffeephone May 11, 2020
aeea1b5
Save debug experiments in their own folder in ray_results.
internetcoffeephone May 12, 2020
ae53264
Move test_map to test_envs.py.
internetcoffeephone May 12, 2020
2ed5a46
Remove unused moa losses fetch, saving memory.
internetcoffeephone May 12, 2020
26abd73
Remove unused encoded observation scm losses fetch.
internetcoffeephone May 12, 2020
a49315c
Put __main__ code into a function, so that it can be profiled. Put un…
internetcoffeephone May 13, 2020
e482ead
Add DiscreteWithDType class so that map_env.action_space can have the…
internetcoffeephone May 13, 2020
d9b6cfe
Add to ray patch script to make it use correct sample dtypes for obse…
internetcoffeephone May 13, 2020
d3f1c29
Remove unused variable. rgb_arr is now a view object on the fully dra…
internetcoffeephone May 14, 2020
4621c00
Lint.
internetcoffeephone May 14, 2020
fc1b4e8
Disable ray dashboard/webui because it leaks memory.
internetcoffeephone May 14, 2020
4c1a315
Group default_args.py by type, fix --num_envs_per_worker dtype (from …
internetcoffeephone May 14, 2020
f626a6b
Update and expand setup/run instructions, fix broken link. Add credit…
internetcoffeephone May 18, 2020
94d78c6
Fix typo and link formatting.
internetcoffeephone May 18, 2020
f0fb009
Fix patch script typo.
internetcoffeephone May 18, 2020
18f3156
Change setup patch script to accommodate all versions of Python.
internetcoffeephone May 18, 2020
0b1a905
Escape spaces in ray patch script.
internetcoffeephone May 18, 2020
714d278
Add a fix for slow initialization speed to ray patch.
internetcoffeephone May 18, 2020
d497405
Change stop_at_episode_reward_min default value in default_args.py to…
internetcoffeephone May 19, 2020
7f41cfd
Refactor train.py to more cleanly separate the purpose of functions.
internetcoffeephone May 19, 2020
7055b3e
Modify the args train_batch_size and ppo_sgd_minibatch_size to be aut…
internetcoffeephone May 19, 2020
c94787a
Modify run scripts to use rollout_fragment_length=64 and num_envs_per…
internetcoffeephone May 19, 2020
8cdcb16
Remove ppo_sgd_minibatch_size from run script, as it is now automatic…
internetcoffeephone May 19, 2020
c3e1d9a
Add train_multiple_experiments.py, which can train run multiple diffe…
internetcoffeephone May 19, 2020
61d97e3
Fix incorrect stats fetch in ppo_scm.py.
internetcoffeephone May 20, 2020
600b700
Disambiguate mixin function so that calls from outside the class are …
internetcoffeephone May 20, 2020
2e03fc7
Add plottable stats to plot_results.py.
internetcoffeephone May 20, 2020
10be324
Fix bug where in case of automatic train_batch_size calculation, it w…
internetcoffeephone May 21, 2020
6644762
Change scm_forward_vs_inverse_loss_weight to value used by Burda et al.
internetcoffeephone May 21, 2020
bb57677
Revert erroneous moa_loss_weight change from https://github.com/inter…
internetcoffeephone May 21, 2020
cbaef1b
Add --resume arg to default_args, experiments can now be resumed.
internetcoffeephone May 21, 2020
261f954
Remove newlines from args while parsing args in train_multiple_experi…
internetcoffeephone May 21, 2020
5db37e2
Add docstring for update_nested_dict.
internetcoffeephone May 29, 2020
a4e2099
Upgrade ray from 0.8.4 to 0.8.5 to fix tensorflow device assignment e…
internetcoffeephone May 29, 2020
840864f
Replace tune hparam grid search with population-based training. Enabl…
internetcoffeephone Jun 1, 2020
fbb9a5c
Update run scripts for tuning.
internetcoffeephone Jun 2, 2020
530560e
Reduce checkpoint frequency, a checkpoint every 50 iterations is too …
internetcoffeephone Jun 2, 2020
7fe7d71
Add baseline experiments to train_multiple_experiments.py default val…
internetcoffeephone Jun 2, 2020
5be4cff
Enable unlimited retries on experiments that give errors.
internetcoffeephone Jun 4, 2020
3e99cf0
Re-add lr_schedule and influence_schedule to moa run scripts.
internetcoffeephone Jun 4, 2020
6f370cc
Change checkpoint frequency to 100, 500 was too few in practice.
internetcoffeephone Jun 4, 2020
153a498
Fix MOA loss reporting in a3c_moa.py by preventing moa_loss_weight fr…
internetcoffeephone Jun 9, 2020
b9d679d
Change PPO vf_loss_coeff hparam from 1e-4 to 0.5, which gives higher …
internetcoffeephone Jun 9, 2020
63f9074
Fix rotation view bug.
internetcoffeephone Jun 10, 2020
6d3910a
Fix bug where agent visibility was not calculated correctly.
internetcoffeephone Jun 10, 2020
ce7cf6c
Add test_agent_visibility to test_envs.py.
internetcoffeephone Jun 10, 2020
f6eda77
Change plot_results.py from using transparency to using a different l…
internetcoffeephone Jun 10, 2020
d9c082e
Set vf_share_layers to be True in the base PPO config dict. Previousl…
internetcoffeephone Jun 10, 2020
f0d1fa2
Fix plot_results.py unhandled exception error, remove try/except.
internetcoffeephone Jun 10, 2020
0fd8331
Remove old comment that is no longer applicable. Ray switched its pro…
internetcoffeephone Jun 10, 2020
694df1f
Add single env multiple model plotting. plot_results.py now draws the…
internetcoffeephone Jun 10, 2020
70e4495
Change plot_results.py filename outputs to include env and model wher…
internetcoffeephone Jun 10, 2020
cfc768c
Rename SOCIAL_INFLUENCE_REWARD parameter from total_influence_reward …
internetcoffeephone Jun 16, 2020
0314419
Removed ALL_ACTIONS from sample_batch. The only way in which it was s…
internetcoffeephone Jun 16, 2020
0275ec4
Fix influence reward off-by-one error.
internetcoffeephone Jun 16, 2020
aa29b7d
Rename variables in moa_model.py for clarity.
internetcoffeephone Jun 16, 2020
0bc5c7d
Fix bug where MOA would use wrong observation: it would use the obser…
internetcoffeephone Jun 16, 2020
227e3be
Fix bug where wrong agent visibility was used for calculating the inf…
internetcoffeephone Jun 23, 2020
766129c
Add default value for agent.prev_visible_agents.
internetcoffeephone Jun 23, 2020
33c00ab
Remove ternary operator, as this is ambiguous with arrays and throws …
internetcoffeephone Jun 23, 2020
6df722d
Fix off-by-one error in MOA loss calculation.
internetcoffeephone Jun 23, 2020
37ff551
Fix marginalized action probability calculation.
internetcoffeephone Jun 24, 2020
3f732f8
Clean up unused public variables in scm_model.py.
internetcoffeephone Jun 24, 2020
4045502
Refactor scm_model.py method so that it becomes static.
internetcoffeephone Jun 24, 2020
f2774e3
Add missing parameters, needed due to create_action_input_layer stati…
internetcoffeephone Jun 24, 2020
ef5ae78
Fix off-by-one error in curiosity reward. Make comments more clear in…
internetcoffeephone Jun 24, 2020
b7aa922
Change ppo_sgd_minibatch_size and PPO vf_loss_coeff to the values pre…
internetcoffeephone Jun 24, 2020
e978c73
Fix influence reward calculation.
internetcoffeephone Jun 25, 2020
860f886
Add a small mountain of docstrings. Some minor comment changes, and a…
internetcoffeephone Jun 25, 2020
93c3b96
Add color view checks to test_envs.py.
internetcoffeephone Jun 30, 2020
0526be1
Change plotting to only run if __name__ == "__main__".
internetcoffeephone Jun 30, 2020
851b4a7
Updated checkpoint rendering to work with new models/new version of ray.
internetcoffeephone Jun 30, 2020
6ad4926
Move video upscaling to correct place in rendering chain.
internetcoffeephone Jul 1, 2020
67e29a7
Clarify patch script with comments.
internetcoffeephone Jul 2, 2020
26bfe33
Change ray_autoscale.yaml default parameters.
internetcoffeephone Jul 2, 2020
18b6f70
Update run scripts for final experiments.
internetcoffeephone Jul 30, 2020
742af66
Change hparam tuning to tune minimal sets of hparams.
internetcoffeephone Jul 30, 2020
eba9bb5
Improve plotting.
internetcoffeephone Aug 13, 2020
e7eaeab
Update run_baseline_* scripts to reflect final experiment settings.
internetcoffeephone Aug 14, 2020
1c7a80c
Change latex table generation to only output ssd experiments.
internetcoffeephone Aug 18, 2020
7abf8f6
Change individual reward plot colors so that each model has its own c…
internetcoffeephone Aug 18, 2020
8b7fde6
Fix hparam plot legend to show both mean and individual experiments.
internetcoffeephone Aug 18, 2020
a748dd1
Change latex hparam table generation to correctly center overfull tab…
internetcoffeephone Aug 18, 2020
0106ef5
Add sliding window means/confidence intervals to plotting.
internetcoffeephone Sep 2, 2020
d10f28e
Change plotting to print a warning and use a default color instead of…
internetcoffeephone Sep 3, 2020
74a529e
Add arg option to train using collective reward.
internetcoffeephone Sep 3, 2020
4629636
Change rollout_fragment_length default value to 1000.
internetcoffeephone Sep 3, 2020
0ad2c28
Remove small_model arg, no longer needed.
internetcoffeephone Sep 3, 2020
66ef439
Update README.md.
internetcoffeephone Sep 4, 2020
5cdd605
Print progress while plotting.
internetcoffeephone Sep 9, 2020
9f43bcf
Change plotting to calculate 99.5% confidence interval rather than 2 …
internetcoffeephone Sep 10, 2020
45f0aae
Change standard deviation calculation to take into account that it's …
internetcoffeephone Sep 11, 2020
9c53fc3
Simplify plotting labels by removing CI from legend.
internetcoffeephone Sep 24, 2020
4aa693f
Refactor plotting code.
internetcoffeephone Sep 27, 2020
71324ab
Merge branch 'master' into master
eugenevinitsky Nov 19, 2020
1 change: 1 addition & 0 deletions .flake8
@@ -1,2 +1,3 @@
[flake8]
max-line-length = 101
extend-ignore = E203 # See https://github.com/PyCQA/pycodestyle/issues/373
6 changes: 6 additions & 0 deletions .isort.cfg
@@ -0,0 +1,6 @@
[settings]
line_length = 101
multi_line_output = 3
include_trailing_comma = True
known_third_party = cv2,gym,matplotlib,numpy,pandas,pytz,ray,setuptools
use_parentheses=True
18 changes: 18 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,18 @@
repos:
- repo: https://github.com/asottile/seed-isort-config
rev: v1.9.3
hooks:
- id: seed-isort-config
- repo: https://github.com/pre-commit/mirrors-isort
rev: v4.3.21
hooks:
- id: isort
- repo: https://github.com/ambv/black
rev: stable
hooks:
- id: black
language_version: python3.6
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: flake8
25 changes: 3 additions & 22 deletions .travis.yml
@@ -3,34 +3,16 @@ language: python
cache: pip

python:
- "3.5"
- "3.6.8"

os: linux

dist: trusty

sudo: required

before_install:
- sudo apt-get update
# Setup conda (needed for opencv, ray dependency)
# WARNING: enforces py3.5
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
- bash miniconda.sh -b -p $HOME/miniconda
- export PATH="$HOME/miniconda/bin:$PATH"
- hash -r
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- conda info -a
- python -V

# Set up requirements for running tests
- conda env create -f environment.yml
- source activate causal
dist: bionic

install:
- pip install flake8 .
- pip install pytest
- pip install -r requirements.txt

before_script:
- flake8 --version
@@ -39,4 +21,3 @@ before_script:
script:
- python setup.py install
- python -m pytest

41 changes: 32 additions & 9 deletions README.md
@@ -3,7 +3,7 @@
# Sequential Social Dilemma Games
This repo is an open-source implementation of DeepMind's Sequential Social Dilemma (SSD) multi-agent game-theoretic environments [[1]](https://arxiv.org/abs/1702.03037). SSDs can be thought of as analogous to spatially and temporally extended Prisoner's Dilemma-like games. The reward structure poses a dilemma because individual short-term optimal strategies lead to poor long-term outcomes for the group.

The implemented environments are structured to be compatible with OpenAIs gym environments (https://github.com/openai/gym) as well as RLlib's Multiagent Environment (https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/multi_agent_env.py)
The implemented environments are structured to be compatible with [OpenAI's gym environments](https://github.com/openai/gym) as well as [RLlib's Multiagent Environment](https://github.com/ray-project/ray/blob/master/rllib/env/multi_agent_env.py)

## Implemented Games

@@ -29,15 +29,38 @@ The above plot shows the empirical Schelling diagrams for both Cleanup (A) and Harvest (B)


# Setup instructions
* Create `causal` virtual environment: `conda env create -n causal environment.yml`
* Run `python setup.py develop`
* Activate your environment by running `source activate causal`, or `conda activate causal`.
```
git clone -b master https://github.com/internetcoffeephone/sequential_social_dilemma_games
cd sequential_social_dilemma_games
python3 -m venv venv # Create a Python virtual environment
. venv/bin/activate
pip3 install --upgrade pip setuptools wheel
python3 setup.py develop
pip3 install -r requirements.txt
. ray_uint8_patch.sh # Ray patch due to https://github.com/ray-project/ray/issues/7946
cd run_scripts
```

After the setup, you can run experiments like so:
- To train with default parameters (baseline model cleanup with 2 agents):
`python3 train.py`

- To train the MOA with 5 agents:
`python3 train.py --model moa --num_agents 5`

Many more options are available which can be found in [default_args.py](config/default_args.py). A collection of preconfigured training scripts can be found in [run_scripts](run_scripts).

Note that the initialization time grows with the number of agents and can be rather high (up to 12 minutes), possibly due to a [Ray bug](https://github.com/ray-project/ray/issues/5982#issuecomment-629217172).

# CUDA, cuDNN and tensorflow-gpu

To then set up the branch of Ray on which we have built the causal influence code, clone the repo to your desired folder:
`git clone https://github.com/natashamjaques/ray.git`.
If you run into any CUDA errors, make sure you have a [compatible set](https://www.tensorflow.org/install/source#tested_build_configurations) of CUDA/cuDNN/TensorFlow versions installed. However, beware of the following:
>The compatibility table given in the tensorflow site does not contain specific minor versions for cuda and cuDNN. However, if the specific versions are not met, there will be an error when you try to use tensorflow. [source](https://stackoverflow.com/a/53727997)

Next, go to the rllib folder:
` cd ray/python/ray/rllib ` and run the script `python setup-rllib-dev.py`. This will copy the rllib folder into the pip install of Ray and allow you to use the version of RLlib that is in your local folder by creating a softlink.
A configuration that works for me is:
- CUDA 10.1.105
- cuDNN 7.6.5
- tensorflow-gpu 2.1.0 (this is automatically installed with the above setup script, see [requirements.txt](requirements.txt))

# Tests
Tests are located in the test folder and can be run individually or all at once with `python -m pytest`. Many of the less obvious rules of the games can be understood by reading the tests, each of which outlines some aspect of the game.
@@ -65,4 +88,4 @@ Every environment that subclasses MapEnv probably needs to implement the following

# Contributors

This code base was developed by Eugene Vinitsky and Natasha Jaques; help with reproduction was provided by Joel Leibo, Antonio Castenada, and Edward Hughes.
This code base was developed by Eugene Vinitsky and Natasha Jaques; help with reproduction was provided by Joel Leibo, Antonio Castenada, and Edward Hughes. Additional development was done by Hugo Heemskerk.
Empty file added algorithms/__init__.py
Empty file.
17 changes: 17 additions & 0 deletions algorithms/a3c_baseline.py
@@ -0,0 +1,17 @@
from __future__ import absolute_import, division, print_function

from ray.rllib.agents.a3c.a3c import get_policy_class, make_async_optimizer, validate_config
from ray.rllib.agents.a3c.a3c_tf_policy import A3CTFPolicy
from ray.rllib.agents.trainer_template import build_trainer


def build_a3c_baseline_trainer(config):
a3c_trainer = build_trainer(
name="A3C",
default_config=config,
default_policy=A3CTFPolicy,
get_policy_class=get_policy_class,
validate_config=validate_config,
make_policy_optimizer=make_async_optimizer,
)
return a3c_trainer
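
For orientation, the trainer class returned by `build_a3c_baseline_trainer` is ultimately handed to `ray.tune`. The snippet below is a minimal usage sketch, not the PR's actual `train.py`; the `"cleanup_env"` registration name, worker counts, and stop criterion are illustrative assumptions.

```python
# Hypothetical usage sketch; the real wiring lives in run_scripts/train.py.
import ray
from ray import tune
from ray.rllib.agents.a3c.a3c import DEFAULT_CONFIG

from algorithms.a3c_baseline import build_a3c_baseline_trainer

ray.init()

# Start from the stock A3C config and point it at an already-registered
# multi-agent env (assumed registration name "cleanup_env").
config = DEFAULT_CONFIG.copy()
config["env"] = "cleanup_env"
config["num_workers"] = 2
config["rollout_fragment_length"] = 1000

trainer_cls = build_a3c_baseline_trainer(config)

# Stop criterion is illustrative only.
tune.run(trainer_cls, config=config, stop={"training_iteration": 10})
```
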
164 changes: 164 additions & 0 deletions algorithms/a3c_moa.py
@@ -0,0 +1,164 @@
"""Note: Keep in sync with changes to VTraceTFPolicy."""

from __future__ import absolute_import, division, print_function

from ray.rllib.agents.a3c.a3c import validate_config
from ray.rllib.agents.a3c.a3c_tf_policy import postprocess_advantages
from ray.rllib.agents.trainer_template import build_trainer
from ray.rllib.evaluation.postprocessing import Postprocessing
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.tf_policy import LearningRateSchedule
from ray.rllib.policy.tf_policy_template import build_tf_policy
from ray.rllib.utils import try_import_tf
from ray.rllib.utils.explained_variance import explained_variance
from ray.rllib.utils.tf_ops import make_tf_callable

from algorithms.common_funcs_moa import (
EXTRINSIC_REWARD,
SOCIAL_INFLUENCE_REWARD,
get_moa_mixins,
moa_fetches,
moa_postprocess_trajectory,
setup_moa_loss,
setup_moa_mixins,
)

tf = try_import_tf()


class A3CLoss(object):
def __init__(
self, action_dist, actions, advantages, v_target, vf, vf_loss_coeff=0.5, entropy_coeff=0.01,
):
log_prob = action_dist.logp(actions)

# The "policy gradients" loss
self.pi_loss = -tf.reduce_sum(log_prob * advantages)

delta = vf - v_target
self.vf_loss = 0.5 * tf.reduce_sum(tf.square(delta))
self.entropy = tf.reduce_sum(action_dist.entropy())
self.total_loss = self.pi_loss + self.vf_loss * vf_loss_coeff - self.entropy * entropy_coeff


def postprocess_a3c_moa(policy, sample_batch, other_agent_batches=None, episode=None):
"""Adds the policy logits, VF preds, and advantages to the trajectory."""

batch = moa_postprocess_trajectory(policy, sample_batch)
batch = postprocess_advantages(policy, batch)
return batch


def actor_critic_loss(policy, model, dist_class, train_batch):
logits, _ = model.from_batch(train_batch)
action_dist = dist_class(logits, model)
policy.loss = A3CLoss(
action_dist,
train_batch[SampleBatch.ACTIONS],
train_batch[Postprocessing.ADVANTAGES],
train_batch[Postprocessing.VALUE_TARGETS],
model.value_function(),
policy.config["vf_loss_coeff"],
policy.config["entropy_coeff"],
)

moa_loss = setup_moa_loss(logits, policy, train_batch)
policy.loss.total_loss += moa_loss.total_loss

# store this for future statistics
policy.moa_loss = moa_loss.total_loss

return policy.loss.total_loss


def add_value_function_fetch(policy):
fetch = {SampleBatch.VF_PREDS: policy.model.value_function()}
fetch.update(moa_fetches(policy))
return fetch


class ValueNetworkMixin(object):
def __init__(self):
@make_tf_callable(self.get_session())
def value(ob, prev_action, prev_reward, *state):
model_out, _ = self.model(
{
SampleBatch.CUR_OBS: tf.convert_to_tensor([ob]),
SampleBatch.PREV_ACTIONS: tf.convert_to_tensor([prev_action]),
SampleBatch.PREV_REWARDS: tf.convert_to_tensor([prev_reward]),
"is_training": tf.convert_to_tensor(False),
},
[tf.convert_to_tensor([s]) for s in state],
tf.convert_to_tensor([1]),
)
return self.model.value_function()[0]

self._value = value


def stats(policy, train_batch):
base_stats = {
"cur_lr": policy.cur_lr,
"policy_loss": policy.loss.pi_loss,
"policy_entropy": policy.loss.entropy,
"var_gnorm": tf.global_norm([x for x in policy.model.trainable_variables()]),
"vf_loss": policy.loss.vf_loss,
"cur_influence_reward_weight": tf.cast(
policy.cur_influence_reward_weight_tensor, tf.float32
),
SOCIAL_INFLUENCE_REWARD: train_batch[SOCIAL_INFLUENCE_REWARD],
EXTRINSIC_REWARD: train_batch[EXTRINSIC_REWARD],
"moa_loss": policy.moa_loss,
}
return base_stats


def grad_stats(policy, train_batch, grads):
return {
"grad_gnorm": tf.global_norm(grads),
"vf_explained_var": explained_variance(
train_batch[Postprocessing.VALUE_TARGETS], policy.model.value_function()
),
}


def clip_gradients(policy, optimizer, loss):
grads_and_vars = optimizer.compute_gradients(loss, policy.model.trainable_variables())
grads = [g for (g, v) in grads_and_vars]
grads, _ = tf.clip_by_global_norm(grads, policy.config["grad_clip"])
clipped_grads = list(zip(grads, policy.model.trainable_variables()))
return clipped_grads


def setup_mixins(policy, obs_space, action_space, config):
ValueNetworkMixin.__init__(policy)
LearningRateSchedule.__init__(policy, config["lr"], config["lr_schedule"])
setup_moa_mixins(policy, obs_space, action_space, config)


def build_a3c_moa_trainer(moa_config):
tf.keras.backend.set_floatx("float32")
trainer_name = "MOAA3CTrainer"
moa_config["use_gae"] = False

a3c_tf_policy = build_tf_policy(
name="A3CAuxTFPolicy",
get_default_config=lambda: moa_config,
loss_fn=actor_critic_loss,
stats_fn=stats,
grad_stats_fn=grad_stats,
gradients_fn=clip_gradients,
postprocess_fn=postprocess_a3c_moa,
extra_action_fetches_fn=add_value_function_fetch,
before_loss_init=setup_mixins,
mixins=[ValueNetworkMixin, LearningRateSchedule] + get_moa_mixins(),
)

trainer = build_trainer(
name=trainer_name,
default_policy=a3c_tf_policy,
default_config=moa_config,
validate_config=validate_config,
)

return trainer
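
For reference, the objective assembled by `actor_critic_loss` above is the standard A3C loss with the MOA loss added on top. Restating the code in LaTeX, with advantages $A_t$, value targets $V^{\text{target}}_t$, and the coefficients `vf_loss_coeff` and `entropy_coeff` written as $c_{\text{vf}}$ and $c_{\text{ent}}$:

$$
\mathcal{L}_{\text{total}}
= \underbrace{-\sum_t \log \pi_\theta(a_t \mid s_t)\, A_t}_{\text{pi\_loss}}
\;+\; c_{\text{vf}} \underbrace{\tfrac{1}{2}\sum_t \big(V_\theta(s_t) - V^{\text{target}}_t\big)^2}_{\text{vf\_loss}}
\;-\; c_{\text{ent}} \underbrace{\sum_t \mathcal{H}\big[\pi_\theta(\cdot \mid s_t)\big]}_{\text{entropy}}
\;+\; \mathcal{L}_{\text{MOA}},
$$

where $\mathcal{L}_{\text{MOA}}$ is the model-of-other-agents loss returned by `setup_moa_loss` (defined in `common_funcs_moa.py`, which is not part of the excerpt shown here).
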
13 changes: 13 additions & 0 deletions algorithms/common_funcs_baseline.py
@@ -0,0 +1,13 @@
class BaselineResetConfigMixin(object):
@staticmethod
def reset_policies(policies, new_config):
for policy in policies:
policy.entropy_coeff_schedule.value = lambda _: new_config["entropy_coeff"]
policy.config["entropy_coeff"] = new_config["entropy_coeff"]
policy.lr_schedule.value = lambda _: new_config["lr"]
policy.config["lr"] = new_config["lr"]

def reset_config(self, new_config):
self.reset_policies(self.optimizer.policies.values(), new_config)
self.config = new_config
return True
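
The `reset_config` hook above is what allows `tune` to reuse trainer actors when a scheduler such as population-based training (introduced later in this PR, commit 840864f) perturbs `lr` or `entropy_coeff`. The sketch below shows one plausible way to attach the mixin and run PBT; the class name, mutation values, metric, and env name are assumptions rather than the PR's actual setup.

```python
# Hypothetical sketch of combining BaselineResetConfigMixin with PBT; the
# actual setup lives in run_scripts/train.py and may differ.
from ray import tune
from ray.rllib.agents.a3c.a3c import DEFAULT_CONFIG
from ray.tune.schedulers import PopulationBasedTraining

from algorithms.a3c_baseline import build_a3c_baseline_trainer
from algorithms.common_funcs_baseline import BaselineResetConfigMixin

config = DEFAULT_CONFIG.copy()
config["env"] = "cleanup_env"  # assumed registration name

base_trainer_cls = build_a3c_baseline_trainer(config)


class A3CBaselineTrainer(BaselineResetConfigMixin, base_trainer_cls):
    """Trainer that can absorb a new lr/entropy_coeff without being rebuilt."""


pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=10,
    hyperparam_mutations={
        "lr": [1e-3, 1e-4, 1e-5],
        "entropy_coeff": [1e-2, 1e-3, 1e-4],
    },
)

# reuse_actors=True makes tune call reset_config() on perturbation instead of
# tearing the trainer down and recreating it.
tune.run(A3CBaselineTrainer, config=config, scheduler=pbt, reuse_actors=True)
```
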