Implementation of various Evolution Strategies, such as GA, PEPG, CMA-ES and OpenAI's ES, using a common interface.
The CMA-ES implementation is a wrapper around pycma.
Using Evolution Strategies Library
To use es.py, please check out the
The basic concept is:
    import numpy as np

    solver = EvolutionStrategy()
    while True:

        # ask the ES to give us a set of candidate solutions
        solutions = solver.ask()

        # create an array to hold the rewards.
        # solver.popsize = population size
        rewards = np.zeros(solver.popsize)

        # calculate the reward for each given solution
        # using your own evaluate() method
        for i in range(solver.popsize):
            rewards[i] = evaluate(solutions[i])

        # give rewards back to ES
        solver.tell(rewards)

        # get best parameter, reward from ES
        reward_vector = solver.result()

        # reward_vector[1] holds the best reward
        if reward_vector[1] > MY_REQUIRED_REWARD:
            break
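To make the ask/tell pattern above concrete, here is a minimal, self-contained sketch. ToyES is a hypothetical stand-in written for this example (a simple Gaussian search that recentres on the top quarter of candidates); it is not one of the solvers in es.py, and the evaluate() function is a made-up toy objective. Only the ask(), tell(), result() and popsize names mirror the loop above.

```python
import numpy as np


class ToyES:
    """Hypothetical stand-in solver exposing ask() / tell() / result()."""

    def __init__(self, num_params, popsize=32, sigma=0.1, seed=0):
        self.popsize = popsize
        self.sigma = sigma
        self.rng = np.random.default_rng(seed)
        self.mu = np.zeros(num_params)
        self.best_params = self.mu.copy()
        self.best_reward = -np.inf
        self.solutions = None

    def ask(self):
        # sample popsize candidates around the current mean
        noise = self.rng.standard_normal((self.popsize, self.mu.size))
        self.solutions = self.mu + self.sigma * noise
        return self.solutions

    def tell(self, rewards):
        # move the mean to the average of the top quarter of candidates
        rewards = np.asarray(rewards)
        elite = np.argsort(rewards)[::-1][: max(1, self.popsize // 4)]
        self.mu = self.solutions[elite].mean(axis=0)
        # remember the best candidate seen so far
        if rewards[elite[0]] > self.best_reward:
            self.best_reward = float(rewards[elite[0]])
            self.best_params = self.solutions[elite[0]].copy()

    def result(self):
        # (best parameters, best reward) seen so far
        return self.best_params, self.best_reward


def evaluate(params):
    # toy fitness: higher is better, maximum of 0 at params == 0.5
    return -float(np.sum((params - 0.5) ** 2))


solver = ToyES(num_params=3)
for generation in range(200):
    solutions = solver.ask()
    rewards = np.array([evaluate(s) for s in solutions])
    solver.tell(rewards)
    best_params, best_reward = solver.result()
    if best_reward > -1e-3:
        break
```

Because the training loop only touches ask(), tell() and result(), swapping in a different solver should not require changes to the loop itself.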
Parallel Processing Training with MPI
Please read the Evolving Stable Strategies article for more demos and use cases.
To use the training tool (relies on MPI):
python train.py bullet_racecar -n 8 -t 4
This launches training jobs with 32 workers (using 8 MPI processes). The best model will be saved as a .json file in log/. This model should train in a few minutes on a 2014 MacBook Pro.
If you have more compute and have access to a 64-core CPU machine, I recommend:
python train.py name_of_environment -e 16 -n 64 -t 4
This will calculate fitness values based on an average of 16 random runs, on 256 workers (64 MPI processes x 4). In my experience this works reasonably well for most tasks inside
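As a rough illustration of the arithmetic and averaging described above, the sketch below computes the worker count and averages a noisy fitness over 16 independently seeded runs. The rollout() function is a hypothetical noisy objective invented for this example; it is not how train.py evaluates an environment.

```python
import numpy as np


def rollout(params, seed):
    # hypothetical noisy episode: a true score plus seed-dependent noise
    rng = np.random.default_rng(seed)
    return -float(np.sum(params ** 2)) + rng.normal(scale=0.5)


def average_fitness(params, num_episodes=16, base_seed=0):
    # mean return over num_episodes independently seeded runs
    returns = [rollout(params, base_seed + k) for k in range(num_episodes)]
    return float(np.mean(returns))


# 64 MPI processes x 4 = 256 workers, as in the command above
num_processes, per_process = 64, 4
num_workers = num_processes * per_process

params = np.zeros(5)
fitness = average_fitness(params)
```

Averaging over more runs reduces the noise in each fitness estimate, at the cost of proportionally more simulation time per candidate.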
After training, to run pre-trained models:
python model.py bullet_ant log/name_of_your_json_file.json
[Figure: bullet_ant pybullet environment. PEPG.]
Another example: to run a minitaur duck model, run this locally:
python model.py bullet_minitaur_duck zoo/bullet_minitaur_duck.cma.256.json
[Figure: Custom Minitaur Env.]
In the .hist.json file, and on the screen output, we track the progress of training. The fields are ordered as follows:
- generation count
- time (seconds) taken so far
- average fitness
- worst fitness
- best fitness
- average standard deviation of params
- average timesteps taken
- max timesteps taken
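The field order listed above can be mapped onto named records when reading the log. The sketch below assumes each record is a JSON array holding exactly these eight values in this order; that layout is an assumption, so check it against your own .hist.json file (a row with extra columns would raise an error here). The sample data is made up for illustration.

```python
import json
from collections import namedtuple

# field names follow the ordering listed above (the names themselves are ours)
HistoryRow = namedtuple("HistoryRow", [
    "generation", "seconds_elapsed", "avg_fitness", "worst_fitness",
    "best_fitness", "avg_param_std", "avg_timesteps", "max_timesteps",
])

# made-up sample standing in for the contents of a .hist.json file
sample_json = (
    "[[1, 12.5, -3.2, -9.1, 0.4, 0.1, 210.0, 400],"
    " [2, 25.1, -1.0, -6.3, 2.2, 0.09, 305.5, 520]]"
)


def load_history(text):
    # parse the JSON text into a list of named rows
    return [HistoryRow(*row) for row in json.loads(text)]


history = load_history(sample_json)
best = max(history, key=lambda row: row.best_fitness)
```

To read a real log, pass the file's text to load_history(), for example via open(path).read().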
Using plot_training_progress.ipynb in an IPython notebook, you can plot the training logs from the .hist.json files. For example:
[Figure: Bullet Ant training progress.]
You need to install mpi4py, pybullet, gym, etc. to use the various environments, as well as roboschool/Box2D for some of the OpenAI gym envs.
On Windows, it is easiest to install mpi4py as follows:
- Download and install mpi_x64.Msi from the HPC Pack 2012 MS-MPI Redistributable Package
- Install a recent Visual Studio version with a C++ compiler
- Open a command prompt
    git clone https://github.com/mpi4py/mpi4py
    cd mpi4py
    python setup.py install
- Modify the train.py script, replacing mpirun with mpiexec and -np with -n
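If you would rather not hand-edit the launcher name for each platform, the substitution can be expressed as a small helper. This is only a sketch: mpi_launch_command() is a hypothetical function written for this example, and the program arguments shown are placeholders rather than the exact invocation used inside train.py.

```python
import sys


def mpi_launch_command(num_processes, program_args, windows=None):
    # choose the launcher and process-count flag for the platform
    if windows is None:
        windows = sys.platform.startswith("win")
    launcher, count_flag = ("mpiexec", "-n") if windows else ("mpirun", "-np")
    return [launcher, count_flag, str(num_processes)] + list(program_args)


# placeholder arguments for illustration only
cmd = mpi_launch_command(8, [sys.executable, "train.py", "bullet_racecar"],
                         windows=True)
```

The returned list can be passed to subprocess.check_call() in place of a hard-coded command.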