Evolutionary Strategies in PyTorch
Your system needs all the prerequisites for the minimal installation of OpenAI Gym. These differ by operating system, so please refer to the gym repository for detailed instructions for your build. You also need to install the PyTorch distribution of your choice. To run the training examples with CUDA ops, pass the --cuda flag.
pip install pytorch_es
You will find the strategy classes (one as of now) within evolutionary_strategies/strategies. These classes are designed to be used with PyTorch models and take two required arguments: a list of PyTorch Variables that correspond to the model's parameter layers, and a function that returns a reward. This can be achieved in the following manner:
    import copy
    from functools import partial

    from pytorch_es import EvolutionModule


    # weights comes first so that the partial below, which binds model by
    # keyword, can still be called with weights as a positional argument
    def get_reward(weights, model):
        """Run the model with the given weights and return a reward."""
        cloned_model = copy.deepcopy(model)
        for i, param in enumerate(cloned_model.parameters()):
            try:
                param.data = weights[i]
            except Exception:
                # fall back to the underlying tensor if weights[i] wraps one
                param.data = weights[i].data
        # run environment and return reward as an integer or float
        return 100


    model = generate_pytorch_model()  # placeholder: build your own model here

    # EvolutionModule runs the population in a ThreadPool, so
    # if you need to inject other arguments, you can do that
    # using the partial tool
    partial_func = partial(get_reward, model=model)
    mother_parameters = list(model.parameters())

    es = EvolutionModule(
        mother_parameters,
        partial_func,
        population_size=100,
        sigma=0.1,
        learning_rate=0.001,
        reward_goal=200,
        consecutive_goal_stopping=20,
        threadcount=10,
        cuda=cuda,  # boolean you define yourself, e.g. from the --cuda flag
        render_test=True
    )
- parameters (list of PyTorch Variables)
- reward_function => float (runs episode and returns a reward)
- consecutive_goal_stopping=None (stops after n tests consecutively return rewards equal-to or greater-than goal)
- save_path=None (path to save weights at test times)
- print_step=10 (frequency with which to run test and save weights)
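The stopping rule described for consecutive_goal_stopping can be illustrated with a small counter. This is only a sketch of the rule as worded above, not the code inside EvolutionModule; the GoalTracker class and its update method are hypothetical names introduced for illustration.

```python
class GoalTracker:
    """Tracks how many test rewards in a row have met the goal (hypothetical helper)."""

    def __init__(self, reward_goal, consecutive_goal_stopping):
        self.reward_goal = reward_goal
        self.consecutive_goal_stopping = consecutive_goal_stopping
        self.streak = 0

    def update(self, test_reward):
        """Record one test reward; return True when training should stop."""
        if self.reward_goal is None or self.consecutive_goal_stopping is None:
            return False  # stopping rule disabled
        if test_reward >= self.reward_goal:
            self.streak += 1
        else:
            self.streak = 0  # a single miss resets the streak
        return self.streak >= self.consecutive_goal_stopping


tracker = GoalTracker(reward_goal=200, consecutive_goal_stopping=3)
results = [tracker.update(r) for r in [200, 150, 210, 220, 230]]
print(results)  # [False, False, False, False, True]
```

Because the streak resets on any test below the goal, a larger consecutive_goal_stopping guards against stopping on a single lucky episode.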
You can run the examples in the following manner:
python examples/cartpole/train_pytorch.py --weights_path cartpole_weights.p
Solved in ~1200 iterations: population=100, sigma=0.01, learning_rate=0.001.
Solved in 200 iterations: population=10, sigma=0.1, learning_rate=0.001.
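To show how the population, sigma, and learning_rate values quoted above interact, here is a minimal sketch of one common evolution-strategies update, written in plain Python on a flat list of floats. It is not taken from EvolutionModule, whose internals may differ; es_step, toy_reward, and the target values are hypothetical and chosen only for illustration.

```python
import random


def es_step(params, reward_fn, population_size=100, sigma=0.1,
            learning_rate=0.001, rng=random):
    """One evolution-strategies update on a flat list of floats (illustrative)."""
    # Sample one Gaussian noise vector per population member.
    noises = [[rng.gauss(0.0, 1.0) for _ in params]
              for _ in range(population_size)]
    # Evaluate each perturbed candidate: params + sigma * noise.
    rewards = [reward_fn([p + sigma * n for p, n in zip(params, noise)])
               for noise in noises]
    # Normalize rewards so the step does not depend on the reward scale.
    mean = sum(rewards) / population_size
    std = (sum((r - mean) ** 2 for r in rewards) / population_size) ** 0.5
    if std == 0.0:
        return list(params)  # every candidate scored the same; nothing to learn
    advantages = [(r - mean) / std for r in rewards]
    # Move the parameters along the reward-weighted sum of the noise.
    scale = learning_rate / (population_size * sigma)
    return [p + scale * sum(a * noise[i] for a, noise in zip(advantages, noises))
            for i, p in enumerate(params)]


def toy_reward(weights):
    """Hypothetical reward: largest (zero) when weights match the target."""
    target = [0.5, -0.3, 0.8]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


rng = random.Random(0)  # fixed seed for repeatability
params = [0.0, 0.0, 0.0]
start_reward = toy_reward(params)
for _ in range(300):
    params = es_step(params, toy_reward, rng=rng)
final_reward = toy_reward(params)
print(start_reward, final_reward)
```

In this sketch sigma sets how far each candidate strays from the current parameters, population_size sets how many candidates are averaged per step, and learning_rate scales the resulting move, which is why the two runs above trade a smaller population against a larger sigma.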