Python implementation of policy search and model training using $\alpha$-divergence minimization for Bayesian neural networks with latent variables. See:
Depeweg, Stefan, et al. "Learning and policy search in stochastic dynamical systems with Bayesian neural networks." arXiv preprint arXiv:1605.07127 (2016).
Requires the standard libraries for Theano-based models, plus Lasagne (I use 0.2.dev).
Insert industrialbenchmark_python in environment/:
Download the Python version of industrialbenchmark.
Move it to environment/industrialbenchmark
Generate batch of state transitions:
cd environment/
python make_data.py
This will generate a training set and a test set, stored in environment/out
cd experiments/
python train_model.py 0.5
This will train a BNN using bb-alpha with alpha=0.5.
After training, the model will be stored in experiments/models as a pickle file.
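A minimal sketch of how a stored pickle file could be loaded again is given here. The helper name `load_pickle` and the file name in the comment are hypothetical and not part of this repository; unpickling a trained model also assumes the repository's model classes are importable and that the same Python and Theano versions are used as during training.

```python
import pickle


def load_pickle(path):
    """Load a pickled object (for example a trained model) from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical file name; check experiments/models/ for the actual one.
# model = load_pickle("models/my_model.p")
```

The same helper would work for the policy files written to experiments/controller.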
Code will run on GPU/CPU. Parameters are chosen conservatively for GPU use. Consider decreasing sample size to 25 for CPU use.
Expected training time (i5-6600K CPU @ 4.0GHz, GTX 1060):
CPU:
50 samples: 21.5 hours
25 samples: 10.5 hours
GPU:
50 samples: 3.5 hours
25 samples: 2.0 hours
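To put the timings in perspective, the short calculation below derives the GPU speedup for each sample size. It assumes that the first pair of timings refers to the CPU and the second pair to the GPU; the dictionary layout is only for illustration.

```python
# Training times in hours, copied from the timings listed above.
# Assumption: the first pair is CPU, the second pair is GPU.
hours = {
    "CPU": {50: 21.5, 25: 10.5},
    "GPU": {50: 3.5, 25: 2.0},
}

# Speedup = CPU time divided by GPU time for the same sample size.
speedup = {n: hours["CPU"][n] / hours["GPU"][n] for n in (50, 25)}

for n in (50, 25):
    print("{} samples: GPU is about {:.1f}x faster".format(n, speedup[n]))
```

Under that assumption the GPU is roughly five to six times faster, and halving the sample size roughly halves the training time on both devices.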
cd experiments/
python train_controller.py 0.5
This will train a policy using the model from step 2 (requires that a model exists in models/).
After training the policy will be stored in experiments/controller as pickle file.
Code will run on CPU. For GPU use, one should pass only indexes to train_func using givens. In our experiments no speedup was obtained from GPU use.
An example policy evaluation script is given in environment/eval_pol.py
cd environment/
ipython
from eval_pol import evaluate
results = evaluate('../experiments/controller/AD_1.0.p')
Some helpful tips:
will sample n_samples from q(W) and resample the input noise.
For prediction use:
m,v = model.predict(np.tile(X,[n_samples,1,1]))
where X is n x d
m is n_samples x n x d
v is n_samples x n x d (constant output noise variance)
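The shapes above can be checked with a small NumPy sketch. `predict_stub` is a hypothetical stand-in for `model.predict` so that the example runs without a trained model; it only mimics the output shapes described above. Combining the per-sample outputs with the law of total variance is one common way to summarize them and is an assumption here, not something taken from this repository.

```python
import numpy as np


def predict_stub(X_tiled, noise_var=0.1):
    """Hypothetical stand-in for model.predict.

    Returns a mean per weight sample and a constant output noise variance,
    both with the same shape as the tiled input.
    """
    rng = np.random.RandomState(0)
    m = X_tiled + 0.01 * rng.randn(*X_tiled.shape)
    v = np.full(X_tiled.shape, noise_var)
    return m, v


n_samples, n, d = 50, 4, 3
X = np.zeros((n, d))                      # X is n x d

# Repeat X once per weight sample: shape (n_samples, n, d).
X_tiled = np.tile(X, [n_samples, 1, 1])
m, v = predict_stub(X_tiled)

# Summarize over the sample axis.
mean = m.mean(axis=0)                     # predictive mean, shape (n, d)
var = v.mean(axis=0) + m.var(axis=0)      # noise variance + spread of the means
```

Here `m.var(axis=0)` captures the disagreement between samples, while `v` contributes the constant output noise.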