In the same way as we used data-driven optimization to tune the gains
In the first part, a relatively simple neural network controller is built in PyTorch.
To implement and apply the Reinforce algorithm, the following steps are performed:
- Create a policy network that uses transfer learning
- Create an auxiliary function that samples control actions from the policy's output distribution
- Create an auxiliary function that runs multiple episodes per epoch
- Finally, put all the pieces together into a function that runs the Reinforce algorithm
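
To make the steps above concrete, here is a minimal sketch of how the pieces could fit together. It is an illustration under stated assumptions, not the implementation developed in this chapter: all names (`PolicyNetwork`, `select_action`, `run_epoch`, `reinforce`, `ToyEnv`) are hypothetical, the action space is assumed to be discrete, the environment is a made-up one-dimensional toy problem, and the transfer-learning step is omitted, so the network starts from random weights.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class PolicyNetwork(nn.Module):
    """Small MLP mapping a state to logits over discrete control actions."""

    def __init__(self, state_dim, n_actions, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_actions)
        )

    def forward(self, state):
        return self.net(state)


def select_action(policy, state):
    """Sample an action from the policy distribution and return its log-probability."""
    dist = Categorical(logits=policy(state))
    action = dist.sample()
    return action.item(), dist.log_prob(action)


def discounted_returns(rewards, gamma):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, accumulated backwards in time."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return torch.tensor(returns, dtype=torch.float32)


class ToyEnv:
    """Hypothetical 1-D plant: action 1 pushes right, action 0 left; reward is -|x|."""

    def __init__(self, horizon=20):
        self.horizon = horizon

    def reset(self):
        self.x, self.t = 1.0, 0
        return torch.tensor([self.x])

    def step(self, action):
        self.x += 0.1 if action == 1 else -0.1
        self.t += 1
        return torch.tensor([self.x]), -abs(self.x), self.t >= self.horizon


def run_epoch(env, policy, n_episodes, gamma):
    """Run several episodes; return the mean policy-gradient loss and mean total reward."""
    losses, total_reward = [], 0.0
    for _ in range(n_episodes):
        state, done = env.reset(), False
        log_probs, rewards = [], []
        while not done:
            action, log_prob = select_action(policy, state)
            state, reward, done = env.step(action)
            log_probs.append(log_prob)
            rewards.append(reward)
        returns = discounted_returns(rewards, gamma)
        # Minimizing -sum(log pi(a_t|s_t) * G_t) ascends the expected return.
        losses.append(-(torch.stack(log_probs) * returns).sum())
        total_reward += sum(rewards)
    return torch.stack(losses).mean(), total_reward / n_episodes


def reinforce(env, policy, epochs=50, episodes_per_epoch=8, gamma=0.99, lr=1e-2):
    """One gradient step per epoch on the loss averaged over that epoch's episodes."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        loss, mean_reward = run_epoch(env, policy, episodes_per_epoch, gamma)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        history.append(mean_reward)
    return history


if __name__ == "__main__":
    torch.manual_seed(0)
    rewards_per_epoch = reinforce(ToyEnv(), PolicyNetwork(state_dim=1, n_actions=2))
    print(rewards_per_epoch[0], rewards_per_epoch[-1])
```

Averaging the loss over several episodes before each optimizer step is one common way to reduce the variance of the gradient estimate. Subtracting a baseline from the returns is another frequently used refinement that this sketch leaves out.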