
RLSimion

BorjaFG edited this page Feb 26, 2019 · 28 revisions

RLSimion is a C++ app for applying Reinforcement Learning algorithms to continuous control problems. It features several built-in simulated environments:

  • Mountain car: classical problem in the control literature. The car must be driven to the top of a hill
  • Swing-up pendulum: another classical control problem in which the goal is to swing the pendulum until it stands up
  • Double pendulum: similar problem with two pendulums
  • Balancing pole: a car beneath a pole must be driven without allowing the pole to fall
  • Underwater vehicle: an underwater vehicle must be controlled to follow a setpoint
  • Pitch control: it simulates the pitch control of an airplane that must follow a setpoint
  • Robot control: a robot must be driven toward the goal position
  • Push-box 1: one robot pushes a box toward the goal position
  • Push-box 2: two robots push a box toward the goal position
  • Pull-box 1: one robot linked to a box pulls the box toward the goal position
  • Pull-box 2: two robots linked to a box by a rope pull the box toward the goal position
  • Rain car: simple test environment in which the goal is to drive a car underneath a tree as fast as possible
  • Wind-turbine: a 2-mass model of a Variable-Speed Wind Turbine in which the agent controls the blade pitch and the generator's torque to keep the generated electrical power as close as possible to the nominal value
  • FAST Wind-turbine: realistic model of a Variable-Speed Wind Turbine based on OpenFASTv2

Arguments

Usage: RLSimion.exe <configFile> [-pipe=<pipename>] [-requirements] [-local]

  • configFile: The name of the parameter configuration file (the extension should be .simion.exp, that is, an experimental unit with no forks).
  • pipe: If we are monitoring the experiment, RLSimion will try to connect with the monitoring process via a pipe named <pipename>.
  • requirements: If set, instead of running the experiment, the process will output the requirements of the experiment via the standard output. Badger uses this before sending experiments to remote machines to determine which files are required.
  • local: If set, a graphical window will be created to visualize the experiment in real-time.
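The argument format above can be illustrated with a small parser. This is a minimal sketch, not RLSimion's actual code: `CommandLineOptions` and `parseArguments()` are hypothetical names, and the sketch assumes the flags may appear in any order after the config file name.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the documented options; RLSimion's real argument
// handling may differ.
struct CommandLineOptions {
    std::string configFile;     // <configFile>, normally a .simion.exp file
    std::string pipeName;       // <pipename> from -pipe=<pipename>; empty if absent
    bool requirements = false;  // -requirements: print the required files instead of running
    bool local = false;         // -local: open a real-time visualization window
};

// Parses: RLSimion.exe <configFile> [-pipe=<pipename>] [-requirements] [-local]
// Returns false if the config file is missing or an argument is not recognized.
bool parseArguments(int argc, const char* const argv[], CommandLineOptions& out) {
    assert(argc >= 1);  // argv[0] is the executable name
    const std::string pipePrefix = "-pipe=";
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg.compare(0, pipePrefix.size(), pipePrefix) == 0)
            out.pipeName = arg.substr(pipePrefix.size());
        else if (arg == "-requirements")
            out.requirements = true;
        else if (arg == "-local")
            out.local = true;
        else if (!arg.empty() && arg[0] != '-' && out.configFile.empty())
            out.configFile = arg;  // first positional argument
        else
            return false;  // unknown flag or a second positional argument
    }
    return !out.configFile.empty();
}
```

Called from `main(argc, argv)`, a command line such as `RLSimion.exe exp.simion.exp -pipe=monitor -local` would set `configFile`, `pipeName` and `local`.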

Basic algorithm

Reinforcement Learning (RL) agents learn from interaction with the environment (world), following this basic algorithm:

foreach(episode)
  reset world
  foreach(step)
    s= world.state()
    a= agent.selectAction(s)
    r= world.executeAction(a)
    s_p= world.state()
    agent.update(s,a,s_p,r)
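The same loop can be written as compilable C++. This is a sketch under assumptions: `State`, `Action`, `ToyWorld`, `CountingAgent` and `runExperiment()` are hypothetical stand-ins rather than RLSimion classes, and the agent only records the transitions it receives instead of learning from them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical types; real state and action representations are richer.
struct State { double x = 0.0; };
struct Action { double u = 0.0; };

// Toy world: the state is a 1-D position and the reward is minus the
// distance to a goal located at x = 1.
class ToyWorld {
public:
    void reset() { m_state.x = 0.0; }
    State state() const { return m_state; }
    double executeAction(const Action& a) {
        m_state.x += a.u;  // apply the action
        return -std::fabs(m_state.x - 1.0);
    }
private:
    State m_state;
};

// Toy agent: moves right until the goal is reached and only counts updates.
class CountingAgent {
public:
    Action selectAction(const State& s) const {
        Action a;
        a.u = (s.x < 1.0) ? 0.1 : 0.0;
        return a;
    }
    void update(const State& /*s*/, const Action& /*a*/, const State& /*s_p*/, double r) {
        ++m_numUpdates;
        m_totalReward += r;
    }
    std::size_t numUpdates() const { return m_numUpdates; }
    double totalReward() const { return m_totalReward; }
private:
    std::size_t m_numUpdates = 0;
    double m_totalReward = 0.0;
};

// The basic RL loop: one agent update per time step of every episode.
template <typename World, typename Agent>
void runExperiment(World& world, Agent& agent, std::size_t numEpisodes, std::size_t numSteps) {
    assert(numSteps > 0);
    for (std::size_t episode = 0; episode < numEpisodes; ++episode) {
        world.reset();
        for (std::size_t step = 0; step < numSteps; ++step) {
            const State s = world.state();
            const Action a = agent.selectAction(s);
            const double r = world.executeAction(a);
            const State s_p = world.state();
            agent.update(s, a, s_p, r);
        }
    }
}
```

Making `runExperiment()` a template keeps the loop independent of any particular world or agent type, which mirrors how the pseudocode treats them as interchangeable.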

Experiment definition

An experiment consists of a series of episodes, which may be used for training or evaluation. Usually, agents are trained for a number of episodes and then the policy learned so far is evaluated (no learning occurs during evaluation). Once the number of training episodes is set (it may be 0 if we only want to evaluate an agent/controller), we must decide how often evaluations will be done (after how many training episodes) and, finally, how many episodes each evaluation will use. Certain worlds may require more complex evaluations: for example, when evaluating a wind-turbine controller we may want to measure performance under different wind speeds. In such cases, the following method must be called from the constructor of the world to override the default number of episodes per evaluation (1):

SimionApp::get()->pEpisode->setNumEpisodesPerEvaluation(nEpisodesPerEvaluation)

Evaluation will always take place at the very beginning of the learning process, at the end (unless the number of training episodes is 0) and every n training episodes.

Consider the following example: the number of training episodes is 5, evaluation is done every 2 training episodes, and each evaluation consists of a single episode. The total number of episodes would be 5 + ((5/2)+1)*1 = 8, where 5/2 is an integer division (5/2 = 2). The values returned by the following CExperiment methods would be:

Episode numbering example 1

On the other hand, if we use 2 episodes per evaluation, the number of episodes would be 5 + ((5/2)+1)*2 = 11. The following values would be obtained:

Episode numbering example 2
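The arithmetic in both examples can be checked with a small helper. `totalEpisodes()` is a hypothetical function, not part of RLSimion; it only transcribes the formula used above (with `/` as integer division) and does not reproduce the ordering of training and evaluation episodes.

```cpp
#include <cassert>

// Total number of episodes, following the formula in the examples above:
//   numTraining + (numTraining / evalFrequency + 1) * episodesPerEvaluation
// evalFrequency = training episodes between evaluations; assumed to be > 0 here.
int totalEpisodes(int numTraining, int evalFrequency, int episodesPerEvaluation) {
    assert(evalFrequency > 0);
    const int numEvaluations = numTraining / evalFrequency + 1;  // integer division
    return numTraining + numEvaluations * episodesPerEvaluation;
}
```

With the values from the examples, `totalEpisodes(5, 2, 1)` returns 8 and `totalEpisodes(5, 2, 2)` returns 11.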

Binding parameters with the GUI (Badger)

Parameters that are to be set from the GUI application (Badger) are embedded within the source code. They get their value from an XML node (ConfigNode) on construction, and they are defined using one of the following classes in parameters.h:

  • DOUBLE_PARAM : real number. For example:
    m_startOffset = DOUBLE_PARAM(pConfigNode, "Start-Offset", "Normalized time from which the schedule will begin [0...1]", 0.0);
    The first parameter is the configuration node passed to the constructor (a node in the xml hierarchy), the second is the name given to the parameter (the one shown in Badger), the third is an explanation of the parameter to be shown as a tooltip, and the last one is the default value (the one given if this parameter is not present in the configuration file).
  • INT_PARAM : integer number.
  • BOOL_PARAM : boolean value.
  • STRING_PARAM : a string.
  • DIR_PATH_PARAM : a directory.
  • FILE_PATH_PARAM : a file.
  • ENUM_PARAM : a value from an enumerated type. The enumerated types (e.g., Interpolation) are declared in parameters.h, and the variable of the enumerated type is then declared in whichever class instantiates it, using the template:
    m_interpolation = ENUM_PARAM<Interpolation>(pConfigNode, "Interpolation", "Interpolation type", Interpolation::linear);
  • STATE_VARIABLE : a reference to a state variable. Using it instead of a string allows us to validate the variable names in Badger.
    m_hVariable = STATE_VARIABLE(pConfigNode,"Variable", "The state variable");
  • ACTION_VARIABLE : a reference to an action variable.
  • CHILD_OBJECT : an instance of any class that can be created using a regular constructor with only one argument of type CConfigNode:
    m_pVFunction = CHILD_OBJECT<LinearStateVFA>(pConfigNode, "VFunction", "The Value-function");
  • CHILD_OBJECT_FACTORY : an instance of a class that inherits from a base class implementing a getInstance() function of the form:
    static std::shared_ptr<T> getInstance(ConfigNode* pConfigNode);
    These objects are instantiated as follows:
    m_pAlphaV = CHILD_OBJECT_FACTORY<NumericValue>(pConfigNode, "Alpha-v", "Learning gain used by the critic");
  • MULTI_VALUE / MULTI_VALUE_FACTORY / MULTI_VALUE_SIMPLE_PARAM : These three types of parameters allow us to instantiate a set of objects of a given class.
  • CHOICE : a utility object type meant to simplify the parsing of the getInstance() function required by ..._FACTORY objects. To be parsed properly, the function must have the following form:
    std::shared_ptr<Controller> Controller::getInstance(CConfigNode* pConfigNode)
    {
      return CHOICE<Controller>(pConfigNode, "Controller", "The specific controller to be used",
        {
          {"PID", CHOICE_ELEMENT_NEW<PIDController>},
          {"LQR", CHOICE_ELEMENT_NEW<LQRController>},
          {"Jonkman", CHOICE_ELEMENT_NEW<WindTurbineJonkmanController>},
          {"Vidal", CHOICE_ELEMENT_NEW<WindTurbineVidalController>},
          {"Boukhezzar", CHOICE_ELEMENT_NEW<WindTurbineBoukhezzarController>},
          {"Extended-Vidal", CHOICE_ELEMENT_NEW<ExtendedWindTurbineVidalController>},
          {"Extended-Boukhezzar", CHOICE_ELEMENT_NEW<ExtendedWindTurbineBoukhezzarController>}
        });
    }
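The mechanism behind DOUBLE_PARAM and CHOICE (read a value from a configuration node on construction, fall back to a default, pick a concrete class by name) can be sketched in isolation. Everything below is hypothetical and simplified: `ConfigNodeSketch` replaces the XML node with a name-to-text map, the selected CHOICE option is stored as a plain value rather than as XML structure, and none of these classes exist in parameters.h.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an XML configuration node: parameter name -> text value.
struct ConfigNodeSketch {
    std::map<std::string, std::string> values;
    const std::string* find(const std::string& name) const {
        auto it = values.find(name);
        return it == values.end() ? nullptr : &it->second;
    }
};

// Hypothetical DOUBLE_PARAM-like binding: the value is read on construction,
// falling back to the default when the parameter is absent.
class DoubleParamSketch {
public:
    DoubleParamSketch(const ConfigNodeSketch& node, const std::string& name,
                      const std::string& comment, double defaultValue)
        : m_name(name), m_comment(comment), m_value(defaultValue) {
        assert(!name.empty());
        if (const std::string* text = node.find(name))
            m_value = std::stod(*text);
    }
    double get() const { return m_value; }
    const std::string& name() const { return m_name; }        // shown in the GUI
    const std::string& comment() const { return m_comment; }  // shown as a tooltip
private:
    std::string m_name;
    std::string m_comment;
    double m_value;
};

// Hypothetical CHOICE-like factory: a table maps each option name to a
// function that creates the corresponding derived class.
struct ControllerSketch {
    virtual ~ControllerSketch() = default;
    virtual std::string type() const = 0;
};
struct PIDSketch : ControllerSketch { std::string type() const override { return "PID"; } };
struct LQRSketch : ControllerSketch { std::string type() const override { return "LQR"; } };

using ControllerMaker = std::function<std::shared_ptr<ControllerSketch>()>;

std::shared_ptr<ControllerSketch> chooseController(const ConfigNodeSketch& node, const std::string& name,
                                                   const std::map<std::string, ControllerMaker>& choices) {
    const std::string* selected = node.find(name);
    if (selected == nullptr) return nullptr;  // nothing selected in the configuration
    auto it = choices.find(*selected);
    if (it == choices.end()) return nullptr;  // unknown option name
    return it->second();
}
```

Keeping the option table next to the base class, as CHOICE does, means adding a new controller only requires one more entry in the table.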

Statistics

We can tell the logger to save the value of a given variable (it will be saved each timestep) using:
template <typename T> void Logger::addVarToStats(const char* key, const char* subkey, T& variable);

For example:
SimionApp::get()->pLogger->addVarToStats<double>("TD-error", "TD-error", m_td);

Because the logger receives references to the variables, they must remain alive for the whole experiment (i.e., they cannot be local variables).
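Why the lifetime matters can be seen in a simplified logger. This is a hypothetical sketch (`StatsLoggerSketch` is not RLSimion's Logger, and joining key and subkey with "/" is an arbitrary choice): it stores the address of each registered variable and reads it every time step, so a variable that went out of scope would leave a dangling pointer behind.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified stats logger: it keeps the address of each registered
// variable and samples its current value on every time step.
class StatsLoggerSketch {
public:
    template <typename T>
    void addVarToStats(const std::string& key, const std::string& subkey, T& variable) {
        T* ptr = &variable;  // the address, not a copy: the variable must outlive the logger's use
        m_sources.emplace_back(key + "/" + subkey, [ptr] { return static_cast<double>(*ptr); });
    }
    // Called once per time step: reads the current value of every registered variable.
    void timestep() {
        for (const auto& source : m_sources)
            m_samples[source.first].push_back(source.second());
    }
    const std::vector<double>& samples(const std::string& name) const {
        assert(m_samples.count(name) == 1);
        return m_samples.at(name);
    }
private:
    std::vector<std::pair<std::string, std::function<double()>>> m_sources;
    std::map<std::string, std::vector<double>> m_samples;
};
```

Registering a member variable such as `m_td` is safe because the object that owns it lives as long as the experiment; registering a local variable of some function would not be.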
