A toy demo based on the following paper:
Mania, H., Guy, A., & Recht, B. (2018). Simple random search provides a competitive approach to reinforcement learning. Retrieved from http://arxiv.org/abs/1803.07055
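This demo implements the basic random search (BRS) algorithm from Mania et al. 2018 (see Algorithm 1), which is essentially a finite-difference method: it perturbs the policy parameters in random directions and moves them toward the perturbations that earn more reward.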
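
For readers who want to see the update rule concretely, here is a minimal sketch of basic random search on a stand-in objective. It is not the code used for this demo: `basic_random_search`, `toy_reward`, `TARGET`, and all hyperparameter values are hypothetical names and choices made for illustration, and the grid world rollout is replaced by a simple quadratic reward so the block runs on its own.

```python
import numpy as np

# Hypothetical optimum of the stand-in reward (not part of the demo).
TARGET = np.array([1.0, -2.0, 0.5])


def toy_reward(theta):
    """Stand-in for an episode return: highest when theta equals TARGET."""
    return -np.sum((theta - TARGET) ** 2)


def basic_random_search(reward_fn, dim, n_iters=200, n_dirs=8,
                        step_size=0.5, noise_std=0.1, seed=0):
    """Sketch of basic random search with symmetric perturbations."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)  # start from the zero parameter vector
    for _ in range(n_iters):
        # Sample n_dirs random directions with standard normal entries.
        deltas = rng.standard_normal((n_dirs, dim))
        # Evaluate the reward at theta + noise and theta - noise.
        r_plus = np.array([reward_fn(theta + noise_std * d) for d in deltas])
        r_minus = np.array([reward_fn(theta - noise_std * d) for d in deltas])
        # Move along each direction in proportion to its reward difference.
        # The usual 1 / (2 * noise_std) finite-difference factor is left
        # out here and absorbed into step_size.
        theta = theta + (step_size / n_dirs) * (r_plus - r_minus) @ deltas
    return theta


theta = basic_random_search(toy_reward, dim=3)
print(theta)
```

In the grid world setting, `reward_fn` would instead run one episode with the policy defined by `theta` and return the total reward collected, so each iteration costs `2 * n_dirs` rollouts.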
Here's the learning curve on a 5x5 grid world, where the agent is trained to find the goal while avoiding the punishment.
Here's a sample path on the grid world.
- red dot: reward
- black dot: punishment