## Which hyperparameters to tune?

- Choosing wrappers was enough in the inventory management problem.
- In other problems, wrappers alone may not lead to success.

### Hyperparameter tuning

- Deep RL algorithms (like PPO) have many hyperparameters: some inherited from deep learning, some specific to RL.

#### Hyperparameters in Deep Learning

- Network size of a fully connected network
- Learning rate
- Activation function
- Optimizer (e.g. vanilla SGD, RMSProp, Adam)

| Hyperparameter | `rllib` default value | `stable-baselines3` name | Search space |
| --- | --- | --- | --- |
| **Model hyperparams** | | | |
| `fcnet_hiddens`    | `[256, 256]` | `net_arch` | `[[64, 64], [256, 256]]` |
| `fcnet_activation` | `"tanh"`   | `activation_fn` | `["tanh", "relu"]` |
| **SGD hyperparams** | | | |
| `train_batch_size` | `4000` | `n_steps` | `[8, 16, 32, 64, 128, 256, 512, 1024, 2048]` |
| `sgd_minibatch_size` | `128` | `batch_size` | `[8, 16, 32, 64, 128, 256, 512]` |
| `num_sgd_iter` | `30` | `n_epochs` | `[1, 5, 10, 20]` |
| `lr` | `5e-5` | `learning_rate` | `[1e-3, 5e-4, 1e-4, 5e-5, 1e-5]` |
| **Common RL hyperparams** | | | |
| `gamma` | `0.99` | `gamma` | `[0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999]` |
| `exploration_config` $\rightarrow$ `type` | `"StochasticSampling"` | | |
| **Algorithm-specific hyperparams** | | | |
| `clip_param` | `0.3` | `clip_range` | `[0.1, 0.2, 0.3, 0.4]` |
| `kl_coeff` | `0.2` | | |
| `kl_target` | `0.01` | | |
| `lambda` | `1.0` | `gae_lambda` | `[0.8, 0.9, 0.92, 0.95, 0.98, 0.99, 1.0]` |
| `vf_loss_coeff` | `1.0` | `vf_coef` | |
| `vf_clip_param` | `10.0` | | |
| `entropy_coeff` | `0.0` | `ent_coef` | `loguniform(1e-8, 0.1)` |
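
The name correspondence in the table can be written down as a plain dictionary. This is a minimal sketch that uses only the names listed in the table. `rename_for_sb3` is a hypothetical helper, not part of either library, and it renames keys without converting values. The correspondence is also approximate: `stable-baselines3` expects `net_arch` and `activation_fn` inside `policy_kwargs` (with the activation as a `torch.nn` class rather than a string), and `n_steps` counts steps per environment rather than the total batch.

```python
# RLlib name -> stable-baselines3 name, copied from the table above.
# Hyperparameters without a listed stable-baselines3 name are left out.
RLLIB_TO_SB3 = {
    "fcnet_hiddens": "net_arch",
    "fcnet_activation": "activation_fn",
    "train_batch_size": "n_steps",
    "sgd_minibatch_size": "batch_size",
    "num_sgd_iter": "n_epochs",
    "lr": "learning_rate",
    "gamma": "gamma",
    "clip_param": "clip_range",
    "lambda": "gae_lambda",
    "vf_loss_coeff": "vf_coef",
    "entropy_coeff": "ent_coef",
}


def rename_for_sb3(rllib_config):
    """Hypothetical helper: rename keys only, values are passed through as-is.

    Returns the renamed dict and the list of keys that have no counterpart
    in the table.
    """
    renamed, unmapped = {}, []
    for key, value in rllib_config.items():
        if key in RLLIB_TO_SB3:
            renamed[RLLIB_TO_SB3[key]] = value
        else:
            unmapped.append(key)
    return renamed, unmapped


renamed, unmapped = rename_for_sb3({"lr": 5e-5, "lambda": 1.0, "kl_coeff": 0.2})
print(renamed)   # keys translated to stable-baselines3 names
print(unmapped)  # keys with no listed counterpart
```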

- Default values may not be optimal for learning in a particular environment.
- Hyperparameter tuning may improve performance beyond what the defaults achieve.
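
One simple way to explore the search space in the table is random search: draw a few candidate configurations and train with each. The sketch below only does the sampling step, using the Python standard library and the RLlib names from the table. `sample_config` and `sample_loguniform` are hypothetical helpers, and clamping the minibatch size to the batch size is an assumption made here, not something stated in the table.

```python
import math
import random

# Discrete choices copied from the "search space" column of the table.
SEARCH_SPACE = {
    "fcnet_hiddens": [[64, 64], [256, 256]],
    "fcnet_activation": ["tanh", "relu"],
    "train_batch_size": [8, 16, 32, 64, 128, 256, 512, 1024, 2048],
    "sgd_minibatch_size": [8, 16, 32, 64, 128, 256, 512],
    "num_sgd_iter": [1, 5, 10, 20],
    "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    "gamma": [0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999],
    "clip_param": [0.1, 0.2, 0.3, 0.4],
    "lambda": [0.8, 0.9, 0.92, 0.95, 0.98, 0.99, 1.0],
}
# Continuous range for entropy_coeff: loguniform(1e-8, 0.1).
ENTROPY_COEFF_RANGE = (1e-8, 0.1)


def sample_loguniform(rng, low, high):
    """Sample so that log(value) is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Draw one configuration, sampling each hyperparameter independently."""
    config = {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}
    config["entropy_coeff"] = sample_loguniform(rng, *ENTROPY_COEFF_RANGE)
    # Assumption: a minibatch larger than the whole batch is not meaningful,
    # so clamp it to the batch size.
    config["sgd_minibatch_size"] = min(
        config["sgd_minibatch_size"], config["train_batch_size"]
    )
    return config


rng = random.Random(0)  # fixed seed so the candidates are reproducible
candidates = [sample_config(rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Each candidate would then be used to train an agent, keeping the configuration with the best evaluation reward.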