Added 3 more gym environment examples. Small changes to pilco.py, mgp… #23

Merged 7 commits into master on Mar 29, 2019

Conversation

@kyr-pol (Collaborator) commented Mar 8, 2019

…r.py and additions to rewards.py, explained further in the pull request.

Added 3 extra tasks:

  • a pendulum swing-up
  • a double inverted pendulum stabilisation (MuJoCo)
  • a swimmer robot (MuJoCo)

Each task is solved in a separate file.

For the swing-up task, I modified the gym environment's initial conditions, starting the pendulum at the bottom position with zero velocity. PILCO in general needs a specific starting state to plan from successfully.
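
A minimal sketch of the kind of change involved (assuming the classic `Pendulum-v0` environment, whose internal state is `[angle, angular_velocity]`; not necessarily the exact code in this PR):

```python
import numpy as np
import gym

class SwingUpPendulum():
    """Pendulum-v0 started hanging down, at rest, on every reset."""
    def __init__(self):
        self.env = gym.make('Pendulum-v0').env  # unwrap to reach the internal state

    def reset(self):
        self.env.reset()
        self.env.state = np.array([np.pi, 0.0])  # angle = pi (bottom), zero velocity
        # observation is [cos(theta), sin(theta), theta_dot]
        return np.array([np.cos(np.pi), np.sin(np.pi), 0.0])

    def step(self, action):
        return self.env.step(action)
```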

For the double pendulum task, a wrapper is used that terminates the episode when the pendulum reaches the limits of its state space, since hitting those limits creates non-smooth behaviour that is hard for PILCO to model. Additionally, angles in radians are calculated from the sin/cos representation, reducing the number of state dimensions (think of this as a much simpler version of the state augmentation the original PILCO uses).
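
Roughly, the wrapper does something like the following (the observation layout and the termination threshold are illustrative assumptions, not the exact values used in the example):

```python
import numpy as np
import gym

class DoublePendulumWrapper(gym.Wrapper):
    """Replace sin/cos pairs by angles and stop before the state-space limits."""
    ANGLE_LIMIT = 0.5  # illustrative threshold, in radians

    def _convert(self, obs):
        # assumed layout: obs = [x, sin(th1), sin(th2), cos(th1), cos(th2), velocities...]
        th1 = np.arctan2(obs[1], obs[3])
        th2 = np.arctan2(obs[2], obs[4])
        return np.concatenate([[obs[0], th1, th2], obs[5:]])

    def reset(self, **kwargs):
        return self._convert(self.env.reset(**kwargs))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        obs = self._convert(obs)
        if np.any(np.abs(obs[1:3]) > self.ANGLE_LIMIT):
            done = True  # cut the episode before the non-smooth region
        return obs, reward, done, info
```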

For the swimmer, a wrapper is also used; it augments the state space with one extra state, which is the accumulated reward. In the original gym version the reward function uses a hidden state, which violates PILCO's assumptions. Still, no hidden information is accessed by PILCO; the formulation is just made compatible with its assumptions. Furthermore, I added a composite reward function that includes penalties for driving the robot's joints to their angle limits, again in order to maintain smooth behaviour that is easy for the GP to model.
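
The state-augmentation part of the wrapper looks roughly like this (a sketch of the idea rather than the merged code; the composite reward itself lives in rewards.py):

```python
import numpy as np
import gym

class SwimmerWrapper(gym.Wrapper):
    """Expose the accumulated reward as one extra, fully observed state."""
    def reset(self, **kwargs):
        self.total_reward = 0.0
        obs = self.env.reset(**kwargs)
        return np.append(obs, self.total_reward)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.total_reward += reward  # the reward now depends only on observed quantities
        return np.append(obs, self.total_reward), reward, done, info
```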

On another note, I fixed the noise in some of the runs, which helps conditioning, and I also added a fairly uninformative prior on the lengthscales and variances, just to penalise the extreme values that otherwise occur in the higher-dimensional tasks (this is something the original PILCO does too).
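
For reference, with the GPflow 1.x API this amounts to something like the snippet below (the hyperparameter values are illustrative, and I'm assuming the dynamics GPs are reachable as `pilco.mgpr.models`):

```python
import gpflow

def add_default_priors(pilco):
    """Attach weak Gamma priors and fix the likelihood noise on every dynamics GP.

    Must be called before the model is compiled.
    """
    for model in pilco.mgpr.models:
        model.kern.lengthscales.prior = gpflow.priors.Gamma(1.0, 10.0)  # penalise extreme lengthscales
        model.kern.variance.prior = gpflow.priors.Gamma(1.5, 2.0)       # penalise extreme signal variances
        model.likelihood.variance = 0.001            # fix the noise level,
        model.likelihood.variance.trainable = False  # which helps conditioning
```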

@kyr-pol (Collaborator, Author) commented Mar 8, 2019

I think we should add an option in the PILCO constructor for priors, because they have to be defined before the model is compiled (afaik), and for the moment I have hard coded them (they are general enough that they probably help with all environments, but still not best practice).
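
Something along these lines, purely as a hypothetical interface for discussion (the argument name and the priors format are illustrative, not an implemented API):

```python
from pilco.models.mgpr import MGPR  # assumed import path, matching the repo layout

class PILCO:
    def __init__(self, X, Y, priors=None, **kwargs):
        self.mgpr = MGPR(X, Y)  # existing dynamics-model construction
        if priors is not None:
            # attach the priors before anything is compiled
            for model in self.mgpr.models:
                model.kern.lengthscales.prior = priors.get('lengthscales')
                model.kern.variance.prior = priors.get('variance')
        # ... rest of the existing constructor ...
```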

@codecov-io commented Mar 8, 2019

Codecov Report

Merging #23 into master will decrease coverage by 4.81%.
The diff coverage is 47.22%.


@@            Coverage Diff            @@
##           master     #23      +/-   ##
=========================================
- Coverage   95.12%   90.3%   -4.82%     
=========================================
  Files           7       7              
  Lines         328     361      +33     
=========================================
+ Hits          312     326      +14     
- Misses         16      35      +19
| Impacted Files | Coverage Δ |
|---|---|
| pilco/models/pilco.py | 93.33% <100%> (+0.39%) ⬆️ |
| pilco/models/mgpr.py | 100% <100%> (ø) ⬆️ |
| pilco/rewards.py | 61.11% <26.92%> (-32%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c923040...659f0e7.

@kyr-pol (Collaborator, Author) commented Mar 8, 2019

Possibly the extra reward functions etc., if we think they are environment-specific, could be kept in the swimmer.py file.

Also, regarding the slight change in the policy optimisation function: I don't think we always have to cold-start the optimisation by randomising; we can run it once using the last values as initialisation, and then randomise.
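
For concreteness, a rough sketch of that scheme (method names like `optimize`, `predicted_return`, `randomize`, `get_params` and `set_params` are illustrative placeholders, not the repository's exact API):

```python
def optimise_policy_with_restarts(pilco, n_restarts=3):
    """Warm-start from the previous policy first, then try random restarts."""
    candidates = []
    for i in range(n_restarts + 1):
        if i > 0:
            pilco.controller.randomize()  # cold start only after the warm-started run
        pilco.optimize()                  # one round of policy optimisation
        candidates.append((pilco.predicted_return(), pilco.controller.get_params()))
    best_return, best_params = max(candidates, key=lambda c: c[0])
    pilco.controller.set_params(best_params)
    return best_return
```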

@nrontsis (Owner) commented

Amazing work; I will work on it later this week.

I definitely agree about the priors; an easy-to-use interface might be a great selling point for our implementation.

Furthermore, I think that we should:

  • Extract the recurring parts of the code into functions that are defined only once and used by all the examples.
  • Write a Readme/Jupyter notebook detailing the challenges of each environment. This would be a great resource for someone wanting to use PILCO and/or our implementation.

After this is done, we could include the environments in unit tests, requiring any new version of the library to solve them. This would allow automated testing of new ideas without having to manually check their validity on real-world examples.

@kyr-pol (Collaborator, Author) commented Mar 20, 2019

I did some work on these two points; check the added notebook. It's in progress, but what do you think of a structure more or less like that? I thought it'd be helpful for users getting started who are stuck on a task with PILCO running but not apparently learning.

At the end we could also add information more specific to what we did in the included examples.

@nrontsis merged commit 2bf469b into master on Mar 29, 2019