unity-ml-reacher

This repository contains an implementation of reinforcement learning based on:

  • DDPG, but using parallel agents, to solve the Unity Reacher environment
  • Proximal Policy Optimization with a Critic network as a baseline and with Generalized Advantage Estimation

The environment has 20 double-jointed arms. Each arm has to reach a target. Whenever an arm reaches its target, a reward of up to +0.1 is received. This environment is similar to the Reacher environment of Unity.
The action space is continuous in [-1.0, +1.0] and consists of 4 values, corresponding to the torques applied to the two joints.
The environment is considered solved when the average score of the 20 agents is at least +30 over 100 consecutive episodes.
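As background on the PPO variant named above, here is a minimal sketch of Generalized Advantage Estimation with a critic baseline (a generic illustration of the technique, not code taken from this repository; names and default values are assumptions):

import numpy as np

def compute_gae(rewards, values, next_value, gamma=0.99, lam=0.95):
    # rewards: r_t collected along one trajectory
    # values: critic estimates V(s_t) for the same trajectory
    # next_value: critic estimate for the state following the last step
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    # critic targets: R_t = A_t + V(s_t)
    returns = advantages + np.asarray(values)
    return advantages, returns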
Videos of the trained agents can be found via the links below:

  • DDPG: Video
  • PPO: Video

Content of this repository

  • report.pdf: a document that describes the implementation details of the DDPG agent, along with ideas for future work
  • report-ppo.pdf: a document that describes the implementation details of the PPO agent
  • folder agents: contains the implementation of
    • a parallel DDPG using one network shared by all agents
    • a parallel DDPG with multiple networks
    • an Actor-Critic network model using tanh as activation
    • a ReplayBuffer
    • an ActionNoise that disturbs the output of the actor network to promote exploration
    • a ParameterNoise that disturbs the weights of the actor network to promote exploration
    • an Ornstein-Uhlenbeck noise generator (a sketch of this process is shown after this list)
    • an implementation of Proximal Policy Optimization
    • a Gaussian Actor-Critic network for the PPO
  • folder started_to_converge: weights of a network that started to converge, but only slowly
  • folder weights:
    • weights of the network trained with DDPG that solved this environment, together with the history of the weights during training
    • weights of the Gaussian Actor Critic Network that solved this environment with PPO
  • folder research:
    • Cozmo25 customized the source code of ShangTong to solve the reacher using PPO
    • this folder contains a single file all.py holding only the code needed by that PPO implementation
    • compare.ipynb: compares the performance of that implementation with the ppo.py of this repository
  • Notebooks
    • jupyter notebook Continuous_Control.ipynb: run this notebook to train the agents using DDPG
    • jupyter notebook noise.ipynb: use this notebook to tune the hyperparameters of the noise generator and check that its output does not limit exploration
    • jupyter notebook view.ipynb: a notebook that can load the different saved network weights trained with DDPG and visualize the agents
    • jupyter notebook Continuous_Control-PPO.ipynb: a notebook to train an agent using PPO and then view the trained agent
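As referenced in the agents list above, the Ornstein-Uhlenbeck process generates temporally correlated noise around a mean, which is commonly added to actions in DDPG. A minimal sketch of such a generator is shown here (a generic illustration; the class name and parameter values are assumptions, not the ones used in this repository):

import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise around mu."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.reset()

    def reset(self):
        # restart the process from its mean
        self.state = self.mu.copy()

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1)
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

# illustrative usage: perturb a 4-dimensional action and keep it within [-1, 1]
noise = OUNoise(size=4)
action = np.clip(np.zeros(4) + noise.sample(), -1.0, 1.0)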

Requirements

To run the code, follow these steps:

  • Create a new environment:
    • Linux or Mac:
     conda create --name ddpg python=3.6
     source activate ddpg
    • Windows:
     conda create --name ddpg python=3.6 
     activate ddpg
  • Perform a minimal install of OpenAI gym
    • If using Windows,
      • download swig for Windows and add it to the Windows PATH
      • install Microsoft Visual C++ Build Tools
    • then run these commands
     pip install gym
     pip install gym[classic_control]
     pip install gym[box2d]
  • Install the dependencies under the folder python/
	cd python
	pip install .
  • Fix an issue in pytorch 0.4.1 so that torch.distributions.Normal can be backpropagated through up to its standard deviation parameter
    • change line 69 of Anaconda3\envs\ddpg\Lib\site-packages\torch\distributions\utils.py (adjust the path to your Anaconda installation and environment name)
# old line
# tensor_idxs = [i for i in range(len(values)) if values[i].__class__.__name__ == 'Tensor']
# new line
tensor_idxs = [i for i in range(len(values)) if isinstance(values[i], torch.Tensor)]
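To verify the fix, a minimal check such as the following can be used (this snippet is an illustrative assumption and not part of the repository): after patching utils.py, the gradient of a log-probability should reach the standard deviation parameter.

import torch
from torch.distributions import Normal

mu = torch.zeros(4, requires_grad=True)
std = torch.ones(4, requires_grad=True)

# log-probability of a sample action under the Gaussian policy
log_prob = Normal(mu, std).log_prob(torch.zeros(4)).sum()
log_prob.backward()

# with the fix applied, std.grad is populated instead of being None
print(std.grad)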
  • Create an IPython kernel for the ddpg environment
	python -m ipykernel install --user --name ddpg --display-name "ddpg"
  • Start jupyter notebook
	jupyter notebook
  • Once started, change the kernel through the menu Kernel>Change kernel>ddpg
  • If necessary, change the path to the Unity environment inside the ipynb files
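That path is typically the argument used when the environment is created at the top of the notebooks. A minimal sketch, assuming the unityagents package used by the Udacity version of this environment; the file name below is a placeholder to be replaced with the location of your local 20-agent Reacher build:

from unityagents import UnityEnvironment

# placeholder path: point file_name to your local 20-agent Reacher build
env = UnityEnvironment(file_name="./Reacher_Windows_x86_64/Reacher.exe")
brain_name = env.brain_names[0]

# reset in training mode and check the number of parallel agents
env_info = env.reset(train_mode=True)[brain_name]
print(len(env_info.agents), "agents")
env.close()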
