New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acrobat game for Reinforcement learning Environments #1329

Merged
merged 33 commits into from Apr 16, 2018

Conversation

Projects
None yet
3 participants
@mirraaj
Contributor

mirraaj commented Mar 23, 2018

https://gym.openai.com/envs/Acrobot-v1/
Implementation of Acrobat using Runge-Kutta method for computing new states.
@zoq , @rcurtin you might want to have a look

mirraaj added some commits Mar 22, 2018

@zoq

This comment has been minimized.

Member

zoq commented Mar 23, 2018

Can you test the new environment against an existing implementation?

@zoq

Just made some higher level comment at this point, will look into the actual implementation once the environment is tested.

#ifndef MLPACK_METHODS_RL_ENVIRONMENT_ACROBAT_HPP
#define MLPACK_METHODS_RL_ENVIRONMENT_ACROBAT_HPP
#define PIE 3.14159265358979323846

This comment has been minimized.

@zoq

zoq Mar 23, 2018

Member

You can use M_PI.

#define MLPACK_METHODS_RL_ENVIRONMENT_ACROBAT_HPP
#define PIE 3.14159265358979323846
#include <math.h>

This comment has been minimized.

@zoq

zoq Mar 23, 2018

Member

cmath is already include if you include prereqs.hpp.

#define PIE 3.14159265358979323846
#include <math.h>
#include <random>

This comment has been minimized.

@zoq

zoq Mar 23, 2018

Member

Maybe we should include core.hpp instead of prereqs.hpp which includes random?

This comment has been minimized.

@mirraaj

mirraaj Mar 23, 2018

Contributor

I missed out core.hpp . I have added the changes

static constexpr size_t dimension = 4;
private :
arma::colvec data;

This comment has been minimized.

@zoq

zoq Mar 23, 2018

Member

Would be great if you could add a comment here.

* @param max_vel2 max angular velocity of link2.
*/
Acrobat(const double gravity = 9.81,
const double link_length1 = 1.0,

This comment has been minimized.

@zoq
// Add noise to the Torque
std::default_random_engine generator;
std::uniform_real_distribution<double> distribution(-0.1, +0.1);
double torque = double(Action - 1) + distribution(generator);

This comment has been minimized.

@zoq

zoq Mar 23, 2018

Member

Do you mind to use

inline double Random(const double lo, const double hi)
here. I think that would make the expression cleaner.

* @param state Current State
* @param torque Torque Applied
*/
arma::colvec dsdt(arma::colvec state,

This comment has been minimized.

@zoq

zoq Mar 23, 2018

Member

Please use upper camel casing for method names.

{
double theta1 = state.Theta1();
double theta2 = state.Theta2();
return bool ((-cos(theta1)-cos(theta1+theta2)) > 1.0);

This comment has been minimized.

@zoq

zoq Mar 23, 2018

Member

Should we directly use state.Theta1() instead of make a copy first?

{
double theta1 = state.Theta1();
double theta2 = state.Theta2();
return bool ((-cos(theta1)-cos(theta1+theta2)) > 1.0);

This comment has been minimized.

@zoq

zoq Mar 23, 2018

Member

No need to make an explicit cast here.

mirraaj added some commits Mar 23, 2018

@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Mar 23, 2018

I tried to test the environment with existing q_learning_test.hpp file in mlpack/tests. It seems working fine. Should I write some specific code to test it ?

@zoq

This comment has been minimized.

Member

zoq commented Mar 23, 2018

If you could write a test that uses the new environment, that would be great, if you like you can add a new test to q_learning_test.hpp.

@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Mar 23, 2018

@zoq I have wriiten a sample test in q_learning_test.cpp

@mirraaj mirraaj closed this Mar 23, 2018

@mirraaj mirraaj reopened this Mar 23, 2018

@zoq

This comment has been minimized.

Member

zoq commented Mar 24, 2018

Great, can you make the test part of the PR?

mirraaj added some commits Mar 24, 2018

Added Acrobat test
Convergence
@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Mar 24, 2018

@zoq done :)

mirraaj added some commits Mar 24, 2018

mirraaj added some commits Mar 29, 2018

Revert Changes in Acrobat Game
Removed 50 trials
@zoq

This comment has been minimized.

Member

zoq commented Apr 2, 2018

I'll have to check the internal logic, but in the meantime it would be great if you could take a look at the style guidelines, e.g. https://github.com/mlpack/mlpack/wiki/DesignGuidelines#line-length-and-wrapping.

@@ -1,6 +1,6 @@
/**
* @file q_learning_test.hpp
* @author Shangtong Zhang
* @author Shangtong Zhang and Rohan Raj

This comment has been minimized.

@rcurtin

rcurtin Apr 2, 2018

Member

You can just add a second line @author Rohan Raj here; I think Doxygen will not parse multiple authors on one line correctly.

This comment has been minimized.

@mirraaj

mirraaj Apr 2, 2018

Contributor

Should it be like?
@author Shangtong Zhang
@author Rohan Raj

This comment has been minimized.

@rcurtin

rcurtin Apr 2, 2018

Member

Right, exactly.

This comment has been minimized.

@mirraaj

mirraaj Apr 2, 2018

Contributor

I am waiting for this file to be reviewed . Once it is done, I will add all the changes accordingly. Thanks

@zoq

The logic looks good to me.

* Torque is action number - 1.
* {0,1,2} -> {-1,0,1}
*/
double torque = double(action - 1) + mlpack::math::Random(-0.1, 0.1);

This comment has been minimized.

@zoq

zoq Apr 3, 2018

Member

We could directly return the torque. This applies for the other functions as well.

This comment has been minimized.

@mirraaj

mirraaj Apr 3, 2018

Contributor

Ideally I was hoping this review. I have a habit of naming the variable and then returning it. It looks clean and understandable to me.

I have visited mlpack structure and it seems values are returned directly. I will add these changes .

return returnValues;
};
/**
* @param value scalar value to wrap

This comment has been minimized.

@zoq

zoq Apr 3, 2018

Member

Would be great if you could add a comment what this function does.

This comment has been minimized.

@mirraaj

mirraaj Apr 3, 2018

Contributor

Sure I will add it once the code is reviewed.

This comment has been minimized.

@zoq

zoq Apr 3, 2018

Member

The internal logic looks correct to me.

This comment has been minimized.

@mirraaj

mirraaj Apr 3, 2018

Contributor

Do you think this is ready to be merged after adding all the reviews?

* @param Action Action Taken
* @param nextState nextState
*/
arma::colvec Rk4(const arma::colvec state,

This comment has been minimized.

@zoq

zoq Apr 3, 2018

Member

Do you think we could use boost odeint here?

This comment has been minimized.

@mirraaj

mirraaj Apr 3, 2018

Contributor

Do you mind giving me an example. I am not sure how you see this implementation . @zoq

This comment has been minimized.

@zoq

zoq Apr 5, 2018

Member

I haven't really looked into it yet, but https://www.boost.org/doc/libs/master/libs/numeric/odeint/doc/html/index.html might be helpful, this is just an idea, I don't mind to go with the current implementation.

* @param minimum minimum range of bound
* @param maximum maximum range of bound
*/
double Bound(double value,

This comment has been minimized.

@zoq

zoq Apr 3, 2018

Member

Not sure, an extra function is necessary here, since the actual implementation is quite simple and it's only used twice. Let me know what you think.

This comment has been minimized.

@mirraaj

mirraaj Apr 3, 2018

Contributor

I think it is needed here. There are chances that angle may go out of bound. To bring back we need a wrap function. Like 5pi/2 is ideally pi/2. If we not use the wrap feature state space will unnecessarily increase?

How do you see this?

This comment has been minimized.

@zoq

zoq Apr 5, 2018

Member

Agreed, but do you think we need an extra function, or could we directly use std::min(std::max(value, minimum), maximum);.

sin(theta2) - phi2) / (m2 * pow(lc2, 2) + I2 - pow(d2, 2) / d1);
double ddtheta1 = -(d2 * ddtheta2 + phi1) / d1;
arma::colvec returnValues = {dtheta1, dtheta2, ddtheta1, ddtheta2};

This comment has been minimized.

@zoq

zoq Apr 3, 2018

Member

It might be faster, to initalize returnValues once, and use the parameter directly.

This comment has been minimized.

@mirraaj

mirraaj Apr 3, 2018

Contributor

I will add this in my final commit. Please convey me if the code looks correct to you.

This comment has been minimized.

@mirraaj

mirraaj Apr 4, 2018

Contributor

@zoq I do not get how you want this initialisation to take place.

This comment has been minimized.

@zoq

zoq Apr 5, 2018

Member

I was thinking about:

arma::colvec values(4);
values(0) = ...;
values(1) = ...;

return values;
arma::colvec Dsdt(arma::colvec state,
const double torque) const
{
double m1 = linkMass1;

This comment has been minimized.

@zoq

zoq Apr 3, 2018

Member

Not sure creating an alias for each parameter helps to improve the readability. I guess, we could also just use linkLength1 instead of lc1. Let me know what you think.

This comment has been minimized.

@mirraaj

mirraaj Apr 3, 2018

Contributor

I wanted the code to be readable and hence I used the name. I feel the code is more readable to me. However I will alter it as per your suggesstion

@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Apr 5, 2018

@zoq, @rcurtin I think this commit is ready to be merged.

@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Apr 6, 2018

Dear @rcurtin , @zoq . I have been working on using boost odient library to integrate RK4 method in Acrobat system. To use such implementation, I would have to include boost odient library (https://www.boost.org/doc/libs/1_55_0/boost/numeric/odeint.hpp) . Now this can be done in two ways. I can directly include this header file in my Acrobat file and do away with it.But there might be some version issues that might come because of this (https://github.com/mlpack/mlpack/blob/master/src/mlpack/core/boost_backport/boost_backport_serialization.hpp) . The second option is to add support for boost odient on mlpack.

Please guide me how should I proceed. Is it required to add support for odient in mlpack to continue with runge kutta's method. If odient is already included in mlpack, please provide me sources of the file. Thanks

@zoq

This comment has been minimized.

Member

zoq commented Apr 6, 2018

Hello, can you clarify what you mean with "But there might be some version issues that might come because of this[...] ", I'm not sure I see the problem, I guess including boost/numeric/odeint.hpp inside the env is just fine.

@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Apr 6, 2018

Dear @zoq , I think problem might be here. I am using boost for the first time.
If a users boost_version < 105600, will it cause any problem to them?

https://github.com/mlpack/mlpack/blob/master/src/mlpack/core/boost_backport/boost_backport_serialization.hpp
#if BOOST_VERSION < 105600
// Backported unordered_map.
#include "mlpack/core/boost_backport/unordered_map.hpp"
#else
// Boost's version.
#include <boost/serialization/unordered_map.hpp>
#endif

#if BOOST_VERSION == 105800
/**

  • Boost versions 1.58.0 and earlier have a different vector serialization
  • behaivor as compared to later versions. Notably, loading a
  • std::vectorarma::mat does not clear the vector before the load
  • in v1.58 and earlier; while in the later versions, the vector is cleared
  • before loading. This causes some tests related to serialization to fail
  • with versions 1.58. This backport solves the issue.
    */
    #ifdef BOOST_SERIALIZATION_VECTOR_HPP
    #pragma message "Detected Boost version is 1.58. Including
    boost/serialization/vector.hpp before mlpack/core.hpp can cause problems. It
    should only be necessary to include mlpack/core.hpp and not
    boost/serialization/vector.hpp."
    #endif
    #include "mlpack/core/boost_backport/collections_load_imp.hpp"
    #include "mlpack/core/boost_backport/collections_save_imp.hpp"
    #include "mlpack/core/boost_backport/vector.hpp"
    #else
    #include <boost/serialization/vector.hpp>
    #endif
@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Apr 6, 2018

Directly importing the library is the best option. But I am confused with the boost_backport implementation on mlpack.

@rcurtin

This comment has been minimized.

Member

rcurtin commented Apr 6, 2018

I'd like to avoid another Boost dependency if possible, so if you can check to make sure that there is not another way to solve the problem that would be great.

If we do have to use boost's odeint implementation, it is not a problem to simply #include <...> the file from your code. It's a header-only library so we don't need to change any link options. Be careful to make sure that you only include the library where you need it, since including boost headers can make compilation take a long time.

As long as the support you are using from the Boost library is available in the oldest version of Boost that we support (I believe 1.49, you can check the readme to see), then there is no need to backport anything.

@zoq

This comment has been minimized.

Member

zoq commented Apr 6, 2018

The extra dependency is a good point, let's go with the current implementation.

@zoq

This comment has been minimized.

Member

zoq commented Apr 6, 2018

If you can handle the style and comment issues, I think this is ready to be merged.

Style Changes
Added reviews
@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Apr 6, 2018

@zoq I think now it is ready to be merged.

@zoq

This comment has been minimized.

Member

zoq commented Apr 7, 2018

Hello @luffy1996, can you make sure each function has a description.

Comments Added
File is reviewed
@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Apr 7, 2018

Dear @zoq, @rcurtin All the reviews have been added. I have tried to explain all the functions briefly. I'd like this game to be merged. Thanks

@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Apr 10, 2018

Dear @zoq , @rcurtin Do you have any other suggestions ? I have added all reviews from my side. Thanks

@zoq

zoq approved these changes Apr 10, 2018

Thanks for the contribution; I'll go ahead and merge this in 3 days to leave time for any other comments. I'll fix the remaining style issue once this is merged.

@mirraaj

This comment has been minimized.

Contributor

mirraaj commented Apr 16, 2018

@zoq , Do I need to add something more here. I guess this is set to be merged. Thanks

@zoq zoq merged commit e44e13f into mlpack:master Apr 16, 2018

5 checks passed

Memory Checks
Details
Static Code Analysis Checks Build finished.
Details
Style Checks Build finished.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@zoq

This comment has been minimized.

Member

zoq commented Apr 16, 2018

Thanks for the contribution; made some style fixes in 6df8d63.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment