Commit 864ebe5
Merge pull request #136 from pockerman/refactor_to_v1_api
Add new C++ example documentation
pockerman committed Mar 26, 2022
2 parents e7c94bc + d858f13 commit 864ebe5
Showing 2 changed files with 88 additions and 1 deletion.
86 changes: 86 additions & 0 deletions docs/source/ExamplesCpp/rl/rl_example_7.rst
@@ -0,0 +1,86 @@
Policy iteration on ``FrozenLake-v0`` (C++)
=========================================================


Code
----

.. code-block:: cpp

   #include "cubeai/base/cubeai_types.h"
   #include "cubeai/rl/algorithms/dp/policy_iteration.h"
   #include "cubeai/rl/trainers/rl_serial_agent_trainer.h"
   #include "cubeai/rl/policies/uniform_discrete_policy.h"
   #include "cubeai/rl/policies/stochastic_adaptor_policy.h"

   #include "gymfcpp/gymfcpp_types.h"
   #include "gymfcpp/frozen_lake_env.h"
   #include "gymfcpp/time_step.h"

   #include <boost/python.hpp>
   #include <iostream>
.. code-block:: cpp

   namespace rl_example_7
   {

   using cubeai::real_t;
   using cubeai::uint_t;
   using cubeai::rl::policies::UniformDiscretePolicy;
   using cubeai::rl::policies::StochasticAdaptorPolicy;
   using cubeai::rl::algos::dp::PolicyIteration;
   using cubeai::rl::algos::dp::PolicyIterationConfig;
   using cubeai::rl::RLSerialAgentTrainer;
   using cubeai::rl::RLSerialTrainerConfig;

   typedef gymfcpp::TimeStep<uint_t> time_step_type;

   }
.. code-block:: cpp

   int main() {

       using namespace rl_example_7;

       // start the Python interpreter and expose the Gym environment
       Py_Initialize();
       auto gym_module = boost::python::import("__main__");
       auto gym_namespace = gym_module.attr("__dict__");

       // create and initialize the FrozenLake-v0 environment
       gymfcpp::FrozenLake<4> env("v0", gym_namespace);
       env.make();

       // start from a uniform policy over the discrete action space
       UniformDiscretePolicy policy(env.n_states(), env.n_actions());
       StochasticAdaptorPolicy policy_adaptor(env.n_states(), env.n_actions(), policy);

       PolicyIterationConfig config;
       config.gamma = 1.0;
       config.n_policy_eval_steps = 100;
       config.tolerance = 1.0e-8;

       PolicyIteration<gymfcpp::FrozenLake<4>,
                       UniformDiscretePolicy,
                       StochasticAdaptorPolicy> algorithm(config, policy, policy_adaptor);

       RLSerialTrainerConfig trainer_config = {10, 10000, 1.0e-8};

       RLSerialAgentTrainer<gymfcpp::FrozenLake<4>,
                            PolicyIteration<gymfcpp::FrozenLake<4>,
                                            UniformDiscretePolicy,
                                            StochasticAdaptorPolicy>> trainer(trainer_config, algorithm);

       auto info = trainer.train(env);
       std::cout << info << std::endl;

       // save the value function into a CSV file
       algorithm.save("policy_iteration_frozen_lake_v0.csv");

       return 0;
   }
Results
-------

3 changes: 2 additions & 1 deletion docs/source/rl_examples.rst
@@ -6,11 +6,12 @@ The following is a list of reinforcement learning examples a user can go through
 .. toctree::
    :maxdepth: 2

-   Examples/Dummy/dummy_gent_example
    ExamplesCpp/rl/rl_example_0
+   Examples/Dummy/dummy_gent_example
    ExamplesCpp/rl/rl_example_6
    Examples/Dp/iterative_policy_evaluation_frozen_lake_v0
    Examples/Dp/policy_iteration_frozen_lake_v0
+   ExamplesCpp/rl/rl_example_7
    Examples/Dp/value_iteration_frozen_lake_v0
    Examples/Td/q_learning_frozen_lake_v0
    Examples/Td/q_learning_cliff_walking_v0.rst
