# Scripts for Training

This notebook lists the main scripts used to train the agents employed in the experiments.
- Use the flag `-t` to set a tag for the trained model. The default is 'no_tag'. For the LunarLander task, the tag should include the special keyword `plus4` or `shapedVVHA` where indicated below, so that the `eval_lunarT.py` script handles the model properly.
- Use the flag `--seed` to set a seed for the experiment. The default is '7'.
- If the model weights folder is not empty, the script will not overwrite it and returns an error. Remove the folder before rerunning.
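
The three rules above can be sketched as follows. This is a minimal illustration, not the code of the training scripts: the function names, the `tag` attribute name, and the `<root>/<tag>` folder layout are assumptions made for the example; only the flag names and defaults come from the text.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two common flags, using the defaults stated in the text."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("-t", dest="tag", default="no_tag", help="tag for the trained model")
    parser.add_argument("--seed", type=int, default=7, help="seed for the experiment")
    return parser.parse_args(argv)


def prepare_weights_dir(root, tag):
    """Create the weights folder, refusing to reuse a non-empty one."""
    weights_dir = Path(root) / tag  # assumed layout: <root>/<tag>
    if weights_dir.exists() and any(weights_dir.iterdir()):
        raise FileExistsError(f"{weights_dir} is not empty; remove it before retraining")
    weights_dir.mkdir(parents=True, exist_ok=True)
    return weights_dir


if __name__ == "__main__":
    args = parse_args(["-t", "mytag_plus4"])
    print(args.tag, args.seed)  # mytag_plus4 7
```

Failing early on a non-empty folder, rather than overwriting, protects weights from a previous run that happened to use the same tag.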

## 1. LunarLander with Bomb

**a) Train the baseline agent (original environment without bomb)**
- Four dummy states are included so that the agent can later be retrained in the environment with the bomb:

`python3 train1_lunarT_SAC_baseline_4a.py -t mytag_plus4`

**Note**: include `plus4` in the run tag `-t` to indicate that four dummy states are added, so that `eval_lunarT.py` evaluates the model properly.
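
To make the idea of dummy states concrete, here is a minimal sketch of padding a baseline observation so that its size matches a larger state vector. It assumes the four dummy entries are zeros appended at the end; the actual position and fill value used by the training script are not stated here, and `pad_observation` is a hypothetical helper.

```python
def pad_observation(obs, n_dummy=4, fill=0.0):
    """Return a copy of obs with n_dummy placeholder entries appended (assumed layout)."""
    return list(obs) + [fill] * n_dummy


# An 8-value observation, the size of the standard LunarLander state vector.
baseline_obs = [0.0, 1.4, 0.1, -0.2, 0.0, 0.0, 0.0, 0.0]
padded = pad_observation(baseline_obs)
print(len(padded))  # 12
```

Keeping the input size fixed across both environments is what allows the baseline weights to be reused when retraining.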

**b) Retrain the SAC agent in the environment with bomb**

- No shaping (default):

`python3 train2_lunarTB_SAC_4c_4d.py -t mytag`

- Circle shaping:

`python3 train2_lunarTB_SAC_4c_4d.py -t mytag --shaping circle`

- Conservative shaping:

`python3 train2_lunarTB_SAC_4c_4d.py -t mytag --shaping VVHA`


**c) Retrain the SAC-I agent in the environment with bomb**

- No shaping + inhibitory policy network (selector):

`python3 train4s_lunarTB_SACI_selector.py -t mytag`

- Circle shaping:

`python3 train3_lunarTB_SACI_shapedC_5bcd.py -t mytag`

- Conservative shaping (VVHA):

`python3 train4_lunarTB_SACI_shapedVVHA_5e.py -t mytag_shapedVVHA`

**Note**: include `shapedVVHA` in the run tag `-t` so that `eval_lunarT.py` evaluates the model properly. This is needed because, in the SAC-I agent with VVHA shaping, the state is only updated in the inhibitory states.
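
The two tag keywords can be thought of as switches read from the tag string. The sketch below assumes simple substring matching; how `eval_lunarT.py` actually inspects the tag is not shown in this notebook, and `eval_options_from_tag` is a hypothetical helper.

```python
KEYWORDS = ("plus4", "shapedVVHA")  # the two special keywords named in this notebook


def eval_options_from_tag(tag):
    """Map each special keyword to whether it occurs in the run tag (assumed rule)."""
    return {keyword: keyword in tag for keyword in KEYWORDS}


print(eval_options_from_tag("mytag_plus4"))
# {'plus4': True, 'shapedVVHA': False}
print(eval_options_from_tag("mytag_shapedVVHA"))
# {'plus4': False, 'shapedVVHA': True}
```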

**d) Training SAC and SAC-I agents with different bomb frequencies (conflict levels)**
- Use the flag `--stopPct` to specify the stop frequency:

`python3 train2_lunarTB_SAC_4c_4d.py -t mytag --shaping VVHA --stopPct 0.75`

`python3 train4_lunarTB_SACI_shapedVVHA_5e.py -t mytag_shapedVVHA --stopPct 0.75`
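
When several conflict levels are needed, the two commands above can be generated in a loop. The sketch below only builds and prints the command lines. The frequency values other than 0.75 and the `p<percent>` tag suffix are illustrative assumptions, and `build_sweep` is a hypothetical helper.

```python
def build_sweep(stop_pcts, tag="mytag"):
    """Build the SAC and SAC-I (VVHA) command lines for each stop frequency."""
    commands = []
    for pct in stop_pcts:
        suffix = f"p{int(round(pct * 100))}"  # e.g. 0.75 -> "p75" (assumed naming)
        commands.append([
            "python3", "train2_lunarTB_SAC_4c_4d.py",
            "-t", f"{tag}_{suffix}", "--shaping", "VVHA", "--stopPct", str(pct),
        ])
        commands.append([
            "python3", "train4_lunarTB_SACI_shapedVVHA_5e.py",
            "-t", f"{tag}_{suffix}_shapedVVHA", "--stopPct", str(pct),
        ])
    return commands


for cmd in build_sweep([0.25, 0.5, 0.75]):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to launch the run. Using a distinct tag per frequency avoids the non-empty weights folder error, and the SAC-I tag keeps the `shapedVVHA` keyword.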


## 2. Mixed version of BipedalWalkerHardcore-v3

**a) Retrain SAC agent from baseline**

- Train the agent in the original BipedalWalkerHardcore-v3:

`python3 train5_bipedaMix_SAC_shapedR_3a.py -t mytag --stop_pct 1.0`

- Train the agent in a mixed version of BipedalWalkerHardcore-v3 with 90% hardcore:

`python3 train5_bipedaMix_SAC_shapedR_3a.py -t mytag --stop_pct 0.9`
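
One way to picture a "90% hardcore" mix is as a random choice between the hardcore and the plain environment. The sketch below assumes the choice is made once per episode, that the non-hardcore share uses `BipedalWalker-v3`, and that `--stop_pct` is the hardcore probability; none of these details is confirmed by this notebook, and `sample_env_id` is a hypothetical helper.

```python
import random


def sample_env_id(stop_pct, rng):
    """Pick the hardcore variant with probability stop_pct, else the plain one."""
    if rng.random() < stop_pct:
        return "BipedalWalkerHardcore-v3"
    return "BipedalWalker-v3"  # assumed non-hardcore counterpart


rng = random.Random(7)
episodes = [sample_env_id(0.9, rng) for _ in range(1000)]
share = episodes.count("BipedalWalkerHardcore-v3") / len(episodes)
print(f"hardcore share: {share:.2f}")  # close to 0.90
```

Under this reading, `--stop_pct 1.0` always selects the hardcore variant, which matches the "original BipedalWalkerHardcore-v3" case above.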

**b) Train SAC agent from scratch**

- Train the agent in a mixed version of BipedalWalkerHardcore-v3 with 90% hardcore:

`python3 train5_bipedaMix_SAC_shapedR_3a.py -t mytag --stop_pct 0.9 --from_scratch`

**c) Retrain SAC-I agent from baseline**

- Train the agent in a mixed version of BipedalWalkerHardcore-v3 with 90% hardcore:

`python3 train6_bipedalMix_SACI_shapedR.py -t mytag --stop_pct 0.9 --no_bonus`

**d) Retrain SAC-I agent from baseline + Inhibitory policy network (adaptive)**

- Train the agent in the original BipedalWalkerHardcore-v3:

`python3 train7_bipedalMix_SACI_adaptive.py -t mytag --stop_pct 1.0 --no_bonus`

- Train the agent in a mixed version of BipedalWalkerHardcore-v3 with 90% hardcore:

`python3 train7_bipedalMix_SACI_adaptive.py -t mytag --stop_pct 0.9 --no_bonus`


### (EXTRA)

**Train SAC-I with an additional 'bonus' shaping term that rewards getting out of a stuck position. The SAC-I agent benefits from this extra shaping, but the SAC agent does not: SAC learns to collect rewards from stuck positions. See the Evaluation script to inspect this SAC agent.**


**c) Retrain SAC-I agent from baseline (includes bonus reward by default)**

- Train the agent in the original BipedalWalkerHardcore-v3:

`python3 train6_bipedalMix_SACI_shapedR.py -t mytag --stop_pct 1.0`

- Train the agent in a mixed version of BipedalWalkerHardcore-v3 with 90% hardcore:

`python3 train6_bipedalMix_SACI_shapedR.py -t mytag --stop_pct 0.9`


**e) Train SAC-I agent from scratch (includes bonus reward by default)**

- Train the agent in a mixed version of BipedalWalkerHardcore-v3 with 90% hardcore:

`python3 train6_bipedalMix_SACI_shapedR.py -t mytag --stop_pct 0.9 --from_scratch`