unity-ml-reacher

This repository contains an implementation of reinforcement learning based on:

  • DDPG, but using parallel agents, to solve the Unity Reacher environment
  • Proximal Policy Optimization with a Critic network as a baseline and with Generalized Advantage Estimation

The environment has 20 double-jointed arms. Each arm has to reach a target. Whenever an arm reaches its target, a reward of up to +0.1 is received. This environment is similar to the Reacher environment of Unity.
The action space is continuous in [-1.0, +1.0] and consists of 4 values, corresponding to the torques applied to the two joints.
The environment is considered solved when the average score of the 20 agents is at least +30 over 100 consecutive episodes.
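As background on the PPO variant named above, here is a minimal sketch of Generalized Advantage Estimation with a critic baseline (a generic illustration of the technique, not code taken from this repository; names and default values are assumptions):

import numpy as np

def compute_gae(rewards, values, next_value, gamma=0.99, lam=0.95):
    # rewards: r_t collected along one trajectory
    # values: critic estimates V(s_t) for the same trajectory
    # next_value: critic estimate for the state following the last step
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    # critic targets: R_t = A_t + V(s_t)
    returns = advantages + np.asarray(values)
    return advantages, returns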
Videos of the trained agents can be found via the links below:

  • DDPG: Video
  • PPO: Video

Content of this repository

  • report.pdf: a document that describes the implementation details of the DDPG agent, along with ideas for future work
  • report-ppo.pdf: a document that describes the implementation details of the PPO agent
  • folder agents: contains the implementation of
    • a parallel DDPG using one network shared by all agents
    • a parallel DDPG with multiple networks
    • an Actor-Critic network model using tanh as activation
    • a ReplayBuffer
    • an ActionNoise that disturbs the output of the actor network to promote exploration
    • a ParameterNoise that disturbs the weights of the actor network to promote exploration
    • an Ornstein-Uhlenbeck noise generator (a sketch of this process is shown after this list)
    • an implementation of Proximal Policy Optimization
    • a Gaussian Actor-Critic network for the PPO
  • folder started_to_converge: weights of a network that started to converge, but only slowly
  • folder weights:
    • weights of the network trained with DDPG that solved this environment, together with the history of the weights during training
    • weights of the Gaussian Actor Critic Network that solved this environment with PPO
  • folder research:
    • Cozmo25 customized the source code of ShangTong to solve the reacher using PPO
    • this folder contains a single file all.py holding only the code needed by that PPO implementation
    • compare.ipynb: compares the performance of that implementation with the ppo.py of this repository
  • Notebooks
    • jupyter notebook Continuous_Control.ipynb: run this notebook to train the agents using DDPG
    • jupyter notebook noise.ipynb: use this notebook to tune the hyperparameters of the noise generator and check that its output does not limit exploration
    • jupyter notebook view.ipynb: a notebook that can load the different saved network weights trained with DDPG and visualize the agents
    • jupyter notebook Continuous_Control-PPO.ipynb: a notebook to train an agent using PPO and then view the trained agent
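As referenced in the agents list above, the Ornstein-Uhlenbeck process generates temporally correlated noise around a mean, which is commonly added to actions in DDPG. A minimal sketch of such a generator is shown here (a generic illustration; the class name and parameter values are assumptions, not the ones used in this repository):

import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise around mu."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.reset()

    def reset(self):
        # restart the process from its mean
        self.state = self.mu.copy()

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1)
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

# illustrative usage: perturb a 4-dimensional action and keep it within [-1, 1]
noise = OUNoise(size=4)
action = np.clip(np.zeros(4) + noise.sample(), -1.0, 1.0)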

Requirements

To run the code, follow these steps:

  • Create a new environment:
    • Linux or Mac:
     conda create --name ddpg python=3.6
     source activate ddpg
    • Windows:
     conda create --name ddpg python=3.6 
     activate ddpg
  • Perform a minimal install of OpenAI gym
    • If using Windows,
      • download swig for Windows and add it to the Windows PATH
      • install Microsoft Visual C++ Build Tools
    • then run these commands
     pip install gym
     pip install gym[classic_control]
     pip install gym[box2d]
  • Install the dependencies under the folder python/
	cd python
	pip install .
  • Fix an issue in pytorch 0.4.1 so that torch.distributions.Normal can be backpropagated through up to its standard deviation parameter
    • change line 69 of Anaconda3\envs\ddpg\Lib\site-packages\torch\distributions\utils.py (adjust the path to your Anaconda installation and environment name)
# old line
# tensor_idxs = [i for i in range(len(values)) if values[i].__class__.__name__ == 'Tensor']
# new line
tensor_idxs = [i for i in range(len(values)) if isinstance(values[i], torch.Tensor)]
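To verify the fix, a minimal check such as the following can be used (this snippet is an illustrative assumption and not part of the repository): after patching utils.py, the gradient of a log-probability should reach the standard deviation parameter.

import torch
from torch.distributions import Normal

mu = torch.zeros(4, requires_grad=True)
std = torch.ones(4, requires_grad=True)

# log-probability of a sample action under the Gaussian policy
log_prob = Normal(mu, std).log_prob(torch.zeros(4)).sum()
log_prob.backward()

# with the fix applied, std.grad is populated instead of being None
print(std.grad)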
  • Create an IPython kernel for the ddpg environment
	python -m ipykernel install --user --name ddpg --display-name "ddpg"
  • Start jupyter notebook
	jupyter notebook
  • Once started, change the kernel through the menu Kernel>Change kernel>ddpg
  • If necessary, change the path to the Unity environment inside the ipynb files
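That path is typically the argument used when the environment is created at the top of the notebooks. A minimal sketch, assuming the unityagents package used by the Udacity version of this environment; the file name below is a placeholder to be replaced with the location of your local 20-agent Reacher build:

from unityagents import UnityEnvironment

# placeholder path: point file_name to your local 20-agent Reacher build
env = UnityEnvironment(file_name="./Reacher_Windows_x86_64/Reacher.exe")
brain_name = env.brain_names[0]

# reset in training mode and check the number of parallel agents
env_info = env.reset(train_mode=True)[brain_name]
print(len(env_info.agents), "agents")
env.close()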
