Implements a policy optimization technique via Markovian score climbing
Create and activate a conda environment (replace NAME with an environment name of your choice)
conda create -n NAME python=3.10
conda activate NAME
Then change into the cloned repository and install the package in editable mode
pip install -e .
To run a policy learning example on a simple pendulum environment
python examples/feedback/rb_csmc_pendulum.py
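
For intuition about the Markovian score climbing idea, here is a minimal, self-contained sketch on a toy problem. It is not this repository's implementation or API: the target (a one-dimensional Gaussian with mean 2), the function names (log_target, log_q, cis_kernel, markovian_score_climbing), the step-size schedule, and the use of a plain conditional importance sampling kernel are all assumptions made for illustration. The loop alternates between moving a sample with a Markov kernel that leaves the target invariant and taking a gradient step on the log-density of the approximation at that sample.

```python
import numpy as np


def log_target(z):
    # Unnormalized log-density of the toy target N(2, 1) (an assumption for this sketch).
    return -0.5 * (z - 2.0) ** 2


def log_q(z, mu, sigma):
    # Log-density of the Gaussian approximation N(mu, sigma^2), up to an
    # additive constant that cancels once the weights are normalized.
    return -0.5 * ((z - mu) / sigma) ** 2


def cis_kernel(z_prev, mu, sigma, num_particles, rng):
    # Conditional importance sampling: retain the previous sample as one
    # particle, draw the others from q, then pick one particle by weight.
    particles = np.empty(num_particles)
    particles[0] = z_prev
    particles[1:] = mu + sigma * rng.standard_normal(num_particles - 1)
    log_w = log_target(particles) - log_q(particles, mu, sigma)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    idx = rng.choice(num_particles, p=w / w.sum())
    return particles[idx]


def markovian_score_climbing(num_iters=5000, num_particles=10, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = -3.0, 1.0  # deliberately poor initial mean; sigma is kept fixed
    z = mu                 # initial state of the Markov chain
    for k in range(num_iters):
        # 1) Move the chain with a kernel that targets the posterior.
        z = cis_kernel(z, mu, sigma, num_particles, rng)
        # 2) Ascend the score of q at the new sample with a decaying step size.
        score = (z - mu) / sigma**2
        mu += score / (k + 1)
    return mu


mu_hat = markovian_score_climbing()
print(f"learned mean: {mu_hat:.3f}")
```

With these settings the learned mean should settle close to the target mean of 2. The pendulum example above applies the same principle to policy parameters rather than to a single Gaussian mean.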