This repository contains a reinforcement learning implementation based on:

  • Proximal Policy Optimization (PPO), with a critic network as the baseline and Generalized Advantage Estimation (GAE)

The agent being trained is a creature with 4 arms and 4 forearms. It has 20 double-jointed arms. The agent rewards are:

  • +0.03 times the body velocity in the goal direction
  • +0.01 times the alignment of the body direction with the goal direction

This environment is similar to the Crawler environment of Unity.
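
The two reward terms above can be sketched as a single function. This is only an illustration: the function name `step_reward`, its arguments, and the choice to measure "alignment" as the dot product of unit vectors are assumptions, not the environment's actual code.

```python
import math


def _unit(vector):
    """Return the vector scaled to length 1."""
    norm = math.sqrt(sum(c * c for c in vector))
    return [c / norm for c in vector]


def step_reward(velocity, body_forward, goal_direction):
    """Hypothetical per-step reward built from the two terms in the README.

    velocity        -- body velocity vector
    body_forward    -- direction the body is facing
    goal_direction  -- direction of the goal
    """
    goal = _unit(goal_direction)
    forward = _unit(body_forward)
    # Component of the velocity that points toward the goal.
    speed_toward_goal = sum(v * g for v, g in zip(velocity, goal))
    # Assumed alignment measure: cosine between facing and goal direction.
    alignment = sum(f * g for f, g in zip(forward, goal))
    return 0.03 * speed_toward_goal + 0.01 * alignment
```

Under these assumptions, a body moving at speed 2 straight toward the goal while facing it would receive 0.03 * 2 + 0.01 * 1 per step.
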
The action space is continuous: each action consists of 20 values in [-1.0, +1.0], corresponding to target rotations for joints.
The environment is considered solved when the average score of the 20 agents is at least +30 over 100 consecutive episodes.
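
One way to read the solving criterion is as a rolling check: average the scores of all agents in each episode, then average those values over the last 100 episodes. The sketch below follows that reading. The function name `first_solved_episode` and the input layout are hypothetical, and the notebook may track the score differently.

```python
from collections import deque


def first_solved_episode(episode_scores, target=30.0, window=100):
    """Return the 1-based episode at which the rolling mean first reaches
    the target, or None if it never does.

    episode_scores -- one list per episode, holding one score per agent
    """
    recent = deque(maxlen=window)  # keeps only the last `window` episode means
    for episode, agent_scores in enumerate(episode_scores, start=1):
        recent.append(sum(agent_scores) / len(agent_scores))
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```
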
A video of a trained agent can be viewed by clicking the image below.

  • TO UPDATE --> PPO: Video

Content of this repository

  • folder agents: contains the implementation of
    • a Gaussian Actor Critic network for the PPO
    • an implementation of a Proximal Policy Optimization
  • folder weights:
    • weights of the Gaussian Actor Critic Network that solved this environment with PPO
  • Notebooks
    • jupyter notebook Continuous_Control-PPO-LeakyReLU.ipynb: run this notebook to train the agents using PPO
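
For readers unfamiliar with the two techniques the agents folder implements, here is a minimal, framework-free sketch of Generalized Advantage Estimation and the PPO clipped surrogate objective. It is not the code in this repository: the function names and the default values of `gamma`, `lam`, and `clip_eps` are assumptions chosen for illustration.

```python
import math


def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards, values, dones -- per-step lists of equal length
    last_value             -- critic estimate for the state after the last step
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error, using the critic as the baseline.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a real training loop the same computations would run on tensors so that the objective can be differentiated with respect to the policy parameters. Plain Python lists are used here only to keep the example self-contained.
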


To run the code, follow these steps:

  • Create a new environment:
    • Linux or Mac:
     conda create --name drlnd python=3.6
     source activate drlnd
    • Windows:
     conda create --name drlnd python=3.6 
     activate drlnd
  • Perform a minimal install of OpenAI gym
    • If using Windows,
      • download swig for Windows and add it to the Windows PATH
      • install Microsoft Visual C++ Build Tools
    • then run these commands
     pip install gym
     pip install gym[classic_control]
     pip install gym[box2d]
  • Install the dependencies under the folder python/
	cd python
	pip install .
  • Fix an issue in PyTorch 0.4.1 so that gradients can backpropagate through torch.distributions.normal to its standard deviation parameter
    • change line 69 of Anaconda3\envs\drlnd\Lib\site-packages\torch\distributions\
# old line
# tensor_idxs = [i for i in range(len(values)) if values[i].__class__.__name__ == 'Tensor']
# new line
tensor_idxs = [i for i in range(len(values)) if isinstance(values[i], torch.Tensor)]
  • Create an IPython kernel for the drlnd environment
	python -m ipykernel install --user --name drlnd --display-name "drlnd"
	jupyter notebook
  • Once started, change the kernel through the menu Kernel>Change kernel>drlnd
  • If necessary, inside the ipynb files, change the path to the unity environment appropriately
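
The one-line PyTorch patch in the steps above replaces a class-name comparison with an `isinstance` check. The sketch below shows the difference using stand-in classes rather than PyTorch itself: `Tensor` and `Parameter` here are hypothetical placeholders, and the idea that the standard deviation is stored in a `Tensor` subclass (such as `torch.nn.Parameter`) is an assumption about why the patch is needed, not something this repository states.

```python
class Tensor:
    """Stand-in for torch.Tensor (illustration only, not the real class)."""


class Parameter(Tensor):
    """Stand-in for a Tensor subclass, e.g. a learnable parameter."""


def tensor_idxs_by_name(values):
    # Old check: matches only objects whose class is named exactly 'Tensor'.
    return [i for i in range(len(values))
            if values[i].__class__.__name__ == 'Tensor']


def tensor_idxs_by_isinstance(values):
    # New check: also matches instances of any Tensor subclass.
    return [i for i in range(len(values)) if isinstance(values[i], Tensor)]


values = [Tensor(), Parameter(), 0.5]
old_result = tensor_idxs_by_name(values)
new_result = tensor_idxs_by_isinstance(values)
```

With these stand-ins, the name-based check skips the `Parameter` entry while the `isinstance` check includes it.
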








