Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made continuous RL environments' API consistent #1941

Merged
merged 18 commits into from Jul 3, 2019

Conversation

@favre49
Copy link
Member

commented Jun 25, 2019

I also added an isTerminal() method for Pendulum environment

*/
Pendulum(const double maxAngularVelocity = 8,
const double maxTorque = 2.0,
const double dt = 0.05) :
const double dt = 0.05,
const double angleThreshold = M_PI/6) :

This comment has been minimized.

Copy link
@zoq

zoq Jun 25, 2019

Member

Is that some default value you selected? Also there is a missing space between the / operator.

This comment has been minimized.

Copy link
@favre49

favre49 Jun 25, 2019

Author Member

Yes, is this too lenient? perhaps M_PI/12 is better?

This comment has been minimized.

Copy link
@zoq

zoq Jun 25, 2019

Member

I was mainly just wondering where this value comes from, but I agree M_PI/12 might be better.

This comment has been minimized.

Copy link
@favre49

favre49 Jun 25, 2019

Author Member

Also, should I update HISTORY.MD? We didn't do it for PR #1931 so I'm not sure when we do this?

This comment has been minimized.

Copy link
@zoq

zoq Jun 25, 2019

Member

Good point, I forgot about the HISTORY document, would be great to update the file as well.

This comment has been minimized.

Copy link
@manish7294

manish7294 Jun 26, 2019

Member

I'm not sure I understand, have I made a mistake in my terminal condition?
No, there is notthing wrong there. As per my understanding Marcus wants to highlight the point that, as of now we don't terminate even when we reaches the terminal state, we just now have a function to lookup whether we arein terminal state or not. So, It could be a good to introduce maximum number of steps to act as a factor for deciding termination point.

I agree. I think if we do that for the Pendulum environment, we should do this for all environments.
Sounds good. If we all agree on this, than maybe the change can just be done within this PR.

This comment has been minimized.

Copy link
@zoq

zoq Jun 26, 2019

Member

the issue I see here is that we don't terminate if we are above the threshold

I'm not sure I understand, have I made a mistake in my terminal condition?

No issue with the. terminal condition, just an issue at which point we stop, but that applies for all tasks.

I agree. I think if we do that for the Pendulum environment, we should do this for all environments.

Sounds good. If we all agree on this, than maybe the change can just be done within this PR.

Fine with me, but I think the default should either be no limit or a pretty high number. What do you think?

This comment has been minimized.

Copy link
@favre49

favre49 Jun 26, 2019

Author Member

I think we could make the default value 0, in which case we won't make a check for termination based on number of time steps.

This comment has been minimized.

Copy link
@favre49

favre49 Jun 27, 2019

Author Member

Do you think we should maybe add a Log::Info stating the reason for termination of the episode, since now there are two possibilities?

This comment has been minimized.

Copy link
@zoq

zoq Jun 27, 2019

Member

Good idea.

@zoq zoq removed the s: unanswered label Jun 25, 2019
favre49 added 5 commits Jun 25, 2019
… number of time steps.
@favre49

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2019

I fixed the multiple pole cart environments, they were meant to have one column more in data. I've changed the for loops appropriately as well.

I also made the changes we talked about in the Pendulum environment. I also noticed the environment does not include a doneReward, so I included that as well. If how I have implemented this in the Pendulum environment is fine, I will extend it to the other environments as well.

Also, since the change in the Action struct would qualify as non-backwards compatible, should I make a change in HISTORY.md to include these in 4.0.0?

const double dt = 0.05,
const double angleThreshold = M_PI / 12,
const double doneReward = 0.0,
const size_t maxTimeSteps = 0) :

This comment has been minimized.

Copy link
@zoq

zoq Jun 27, 2019

Member

Not sure about the name, do you think maxSteps or maxIterations could work as well?

@zoq

This comment has been minimized.

Copy link
Member

commented Jun 27, 2019

About the backwards compatible, I think that is reasonable, don't see an easy workaround to support both.

bool done = IsTerminal(nextState);

// Do not reward the agent if time ran out.
if (done && maxTimeSteps != 0 && timeStepsPerformed >= maxTimeSteps)

This comment has been minimized.

Copy link
@manish7294

manish7294 Jun 27, 2019

Member

Should we terminate the moment we reached maximum iterations allowed? Let me know your thoughts on this?

This comment has been minimized.

Copy link
@manish7294

manish7294 Jun 27, 2019

Member

Oh, I see you are doing the same. I thought done is being taken from angle terminal condition. Ignore the above comment.

@@ -37,7 +37,7 @@ BOOST_AUTO_TEST_SUITE(RLComponentsTest)
*/
BOOST_AUTO_TEST_CASE(SimplePendulumTest)
{
const Pendulum task = Pendulum();
Pendulum task = Pendulum();

This comment has been minimized.

Copy link
@zoq

zoq Jun 27, 2019

Member

Might be a good idea to test terminates after the maximal steps as well, we could use a super low value, to keep the runtime low.

@favre49

This comment has been minimized.

Copy link
Member Author

commented Jun 28, 2019

I extended what I did with Pendulum to other environments, and added tests as well. Besides that, I noticed some environments don't have doneReward, so I added that. Finally, I removed the typedef from Acrobot seeing as this PR will remove backwards compatibility anyway.

@zoq

This comment has been minimized.

Copy link
Member

commented Jun 28, 2019

Great, thanks for looking into it.

return State((arma::randu<arma::colvec>(4) - 0.5) / 5.0);
}

/**
* This function checks if the acrobot has reached the terminal state.
*
* @param state The current State.
* @return true if state is a terminal state, otherwise false. *

This comment has been minimized.

Copy link
@zoq

zoq Jun 28, 2019

Member

We can remove the * at the end.

* @param angleThreshold The region about the upright position where the
* state is considered terminal.
* @param doneReward The reward recieved by the agent on success.
* @param maxTimeSteps The number of time steps after which the episode

This comment has been minimized.

Copy link
@zoq

zoq Jun 28, 2019

Member

Can you change every occurrence of maxTimeSteps with maxSteps?

bool done = IsTerminal(nextState);

// Do not reward the agent if time ran out.
if (done && maxSteps != 0 && timeStepsPerformed >= maxSteps)

This comment has been minimized.

Copy link
@zoq

zoq Jun 28, 2019

Member

Since we are going with maxSteps I think it makes sense to call this stepsPerformed�?

{
if (maxSteps != 0 && timeStepsPerformed >= maxSteps)
{
Log::Info << "Episode terminated due to the maximum number of time steps"

This comment has been minimized.

Copy link
@zoq

zoq Jun 28, 2019

Member

Should we go with number of steps?

}

//! Get the number of time steps performed
size_t TimeStepsPerformed() const { return timeStepsPerformed; }

This comment has been minimized.

Copy link
@zoq

zoq Jun 28, 2019

Member

See comment above.

@@ -37,7 +37,7 @@ BOOST_AUTO_TEST_SUITE(RLComponentsTest)
*/
BOOST_AUTO_TEST_CASE(SimplePendulumTest)
{
const Pendulum task = Pendulum();
Pendulum task = Pendulum(8, 2, 0.05, M_PI/12, 0, 5);

This comment has been minimized.

Copy link
@zoq

zoq Jun 28, 2019

Member

Picky style comment, missing space between /.

@@ -57,15 +65,24 @@ BOOST_AUTO_TEST_CASE(SimplePendulumTest)
*/
BOOST_AUTO_TEST_CASE(SimpleContinuousMountainCarTest)
{
const ContinuousMountainCar task = ContinuousMountainCar();
ContinuousMountainCar task = ContinuousMountainCar(-1.2, 0.6, 0.45, -0.07,

This comment has been minimized.

Copy link
@zoq

zoq Jun 28, 2019

Member

Might be a good idea to introduce a method to set the maximal number of steps? That way we could use the default constructor.

favre49 added 4 commits Jun 30, 2019
Copy link
Member

left a comment

Great work adapting the environments! I think it is ready to merge.

@@ -104,7 +104,9 @@ class CartPole
* @param tau The time interval.
* @param thetaThresholdRadians The maximum angle.
* @param xThreshold The maximum position.
* @param doneReward Reward recieved on termination.
* @param doneReward Reward recieved by agent on success.

This comment has been minimized.

Copy link
@manish7294

manish7294 Jul 1, 2019

Member

picky comment!, I think we should keep the description of doneReward as "Reward recieved by agent on termination"

This comment has been minimized.

Copy link
@favre49

favre49 Jul 1, 2019

Author Member

I made this change since now the termination of an episode can happen because the maximum number of steps have been taken (which could be a success in cases like cart pole but a failure in other cases) or any other termination case we have included. I changed the description to make it more clear, since it may not be evident to the user.

This comment has been minimized.

Copy link
@manish7294

manish7294 Jul 1, 2019

Member

Still I think termination word deals with both the cases unbiasedly. Though, no worries, you can keep it that way. :)

@mlpack-bot
mlpack-bot bot approved these changes Jul 2, 2019
Copy link

left a comment

Second approval provided automatically after 24 hours. 👍

@manish7294 manish7294 merged commit c62088a into mlpack:master Jul 3, 2019
4 of 6 checks passed
4 of 6 checks passed
Static Code Analysis Checks Build finished.
Details
continuous-integration/travis-ci/pr The Travis CI build failed
Details
LaTeX Documentation Checks Build finished.
Details
Memory Checks Build finished.
Details
Style Checks Build finished.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@manish7294

This comment has been minimized.

Copy link
Member

commented Jul 3, 2019

Great work, glad to merge this in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.