Can it be applied to 'Pendulum-v0'?? && memory problem #21
Comments
Hi Wonchul, I've worked with the Pendulum-v0 environment too.
Firstly, let me note that on the more_envs branch we are working on several extra environments (we have good results in a version of Pendulum-v0, the inverted double pendulum and the Swimmer). We are gonna merge some of that work onto master soon, after cleaning it up a bit, but you can take a look if you are looking for extra applications.
The problem with Pendulum-v0 is that although the dynamics are pretty simple, it's hard for PILCO to predict a trajectory, because the initial angle of the pendulum can be anything (I think it is initialised uniformly). That makes planning with a normally distributed prediction for every time-step, as PILCO does, not that useful. What I did was change this initialisation, in the gym source code, to a starting position with the pendulum at the bottom, with a reasonably small amount of starting uncertainty (~0.1). This is an easier task than the original gym one, but since the pendulum swing-up task is a standard control benchmark, we might still want to solve it this way (the version used in the original PILCO paper is like this too).
Now for the memory issues: I have encountered them too. There are a few things you can do, and they are related to T, the number of time-steps in the planning horizon, and N, the number of runs and consequently the number of data points you are working with.
- Reduce the time horizon, possibly by using subsampling. During planning, a number of matrices are created and held in memory simultaneously, proportional to the number of time steps in the planning horizon, so you might want to decrease that number. If you feel a longer time horizon is needed, you can use subsampling: basically repeating each action for m time-steps, and only showing PILCO the state every m time-steps. That way you can plan far enough ahead without the memory problems. There is a simple way to implement that by changing the rollout function in inverted_pendulum.py.
- Use sparse GPs. By using the num_induced_points argument when you call the PILCO constructor you can set the number of data points used for the GP inference.
Also beware that the default reward function and initial state of PILCO won't work for Pendulum-v0. Copying from inverted_pendulum.py in more_envs:
# NEEDS a different initialisation than the one in gym (change the reset() method),
# to (m_init, S_init)
SUBS=3
bf = 30
maxiter=50
max_action=2.0
target = np.array([1.0, 0.0, 0.0])
weights = np.diag([2.0, 2.0, 0.3])
m_init = np.reshape([-1.0, 0, 0.0], (1,3))
S_init = np.diag([0.01, 0.05, 0.01])
T = 40
J = 4
N = 8
restarts = True
SUBS is the subsampling rate, and target and weights are the reward function parameters. With these parameters I've had consistently good performance. |
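As a rough illustration of the two ideas above (starting the pendulum at the bottom, and subsampling the rollout), something along the following lines should work. Note this is a sketch, not the repo's actual code: BottomStartPendulum, rollout and policy are illustrative names, and the wrapper relies on gym's internal PendulumEnv state (state = [theta, theta_dot], with theta = pi at the bottom, matching m_init = [-1, 0, 0] above).
import numpy as np
import gym

SUBS = 3  # subsampling rate: each chosen action is repeated SUBS env steps

class BottomStartPendulum(gym.Wrapper):
    # Hypothetical wrapper: Pendulum-v0 with reset() changed so the pendulum
    # starts hanging down, with a small amount of noise, instead of at a
    # uniformly random angle.
    def reset(self):
        self.env.reset()
        theta = np.pi + 0.1 * np.random.randn()       # ~0.1 starting uncertainty
        theta_dot = 0.05 * np.random.randn()
        self.env.unwrapped.state = np.array([theta, theta_dot])
        return self.env.unwrapped._get_obs()          # obs = [cos(theta), sin(theta), theta_dot]

def rollout(env, policy, timesteps):
    # Collect (state, action) -> state-difference pairs, repeating each action
    # SUBS times and only recording the state every SUBS env steps.
    X, Y = [], []
    x = env.reset()
    for _ in range(timesteps):
        u = policy(x)
        for _ in range(SUBS):
            x_new, _, done, _ = env.step(u)
            if done:
                break
        X.append(np.hstack((x, u)))
        Y.append(x_new - x)   # the dynamics model is trained on state differences
        x = x_new
        if done:
            break
    return np.stack(X), np.stack(Y)

env = BottomStartPendulum(gym.make('Pendulum-v0'))
random_policy = lambda x: env.action_space.sample()   # random actions for the initial rollouts
X, Y = rollout(env, random_policy, timesteps=40)
print(X.shape, Y.shape)   # (40, 4) and (40, 3): 3D observation + 1D action
From there, the collected (X, Y) data goes into the PILCO constructor, where, as mentioned above, the num_induced_points argument can cap the number of inducing points used for the GP inference if memory is still tight.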
Hi, Kyriakos.
First, I want to thank you for sharing your experience.
I would certainly like to take a look before you complete the work, if you don't mind.
What I want to do is just understand how exactly PILCO works at the implementation level using OpenAI Gym, not MuJoCo, because I don't have a license...
And for the memory problem, is there any way to allocate memory in advance with GPflow, as in TensorFlow? For example, in TensorFlow I can manage the memory using 'config'.
Thank you.
Wonchul Kim.
|
Sure, it makes sense to want to use it without MuJoCo. For the memory problem, yes, you can do it the standard TensorFlow way. What we use when running on a GPU is something like:
config = tf.ConfigProto()
gpu_id = kwargs.get('gpu_id', "1")
config.gpu_options.visible_device_list = gpu_id
config.gpu_options.per_process_gpu_memory_fraction = 0.80
sess = tf.Session(graph=tf.Graph(), config=config)
with sess:
before making the environment. I am not sure this is gonna help, since it restricts the memory tf is taking, but if you are running out of it, it probably means that tf used up all available memory and it still wasn't enough. If you have many data points and/or long planning horizons, especially in higher-dimensional problems, think about subsampling or sparse GPs. |
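A slightly fuller, self-contained version of that snippet might look as follows; note the allow_growth flag, the hard-coded GPU index and the gym.make call are additions for illustration, not part of the original comment.
import tensorflow as tf
import gym

# Restrict TensorFlow (1.x) to a single GPU and ~80% of its memory,
# set up before building the environment and the PILCO graph.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0"                # which GPU to use, by index
config.gpu_options.per_process_gpu_memory_fraction = 0.80   # cap the memory TF may grab
config.gpu_options.allow_growth = True                       # allocate lazily rather than all at once

sess = tf.Session(graph=tf.Graph(), config=config)
with sess:
    env = gym.make('Pendulum-v0')
    # ... collect rollouts, build the PILCO model and optimise it here ...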
Thanks for your advice!
I will try it!!
If you have tried PILCO on other environments, including Pendulum-v0, could I see the code?
|
Hey @wonchul-kim, you might wanna check the pull request I added here; it should be clearer and easier to make sense of than the more_envs branch I mentioned above. It includes 3 extra environments, including Pendulum-v0. |
I am closing this for now; if there are more questions, feel free to re-open it. |
Hi!
I tried to apply it to the Pendulum-v0 environment.
However, I don't think it worked well at all.
Could you give me some advice?
And when I ran it, I got an error because of a memory shortage. Is there some way I can manage the memory?