Continuous Control DRL Project

This repository documents my solution to project #2 in the Udacity Deep Reinforcement Learning Nanodegree. The goal of the project is to write a deep reinforcement learning algorithm that solves the Reacher environment.

Overview

In this environment, a double-jointed arm is controlled to move its hand (blue ball at the tip of the arm) to a moving target location (green moving ball). A reward is provided for each step that the agent's hand is in the goal location. Thus, the goal of the agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocities of the arm, as well as the position of the target. Each action is a vector of 4 numbers, corresponding to the torques applied to the two joints. Every entry in the action vector must be a number between -1 and 1.
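
As a minimal illustration (assuming NumPy and the 20-agent version described below), raw action values can be clipped into the required range before they are sent to the environment:

import numpy as np

num_agents, action_size = 20, 4                           # 20-agent version, 4 torques per arm
raw_actions = np.random.randn(num_agents, action_size)    # e.g. sampled from a Gaussian policy
actions = np.clip(raw_actions, -1.0, 1.0)                 # every entry must lie in [-1, 1]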

In the animation above you can see a similar environment with 10 trained agents (arms) tracking the green target spheres. An environment with multiple agents is desirable for training because experience is collected in parallel.

The chosen algorithm

According to Proximal Policy Optimization Algorithms, the PPO algorithm studied in the course is a strong choice for solving this environment, as can be seen in the following figure:

Picture taken from Proximal Policy Optimization Algorithms
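
For reference, here is a minimal PyTorch sketch of PPO's clipped surrogate objective (the tensor names are illustrative and not taken from this repository's code):

import torch

def clipped_surrogate(new_log_probs, old_log_probs, advantages, eps=0.2):
    # probability ratio pi_new(a|s) / pi_old(a|s)
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # take the pessimistic bound, negated so it can be minimized with gradient descent
    return -torch.min(unclipped, clipped).mean()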

General Approach

Since finding code that solves the above environment is not a huge challenge and not the most educational, I chose to take advantage of this opportunity to study the effect of neural network size on performance and learning rate. The code in this repository is based on the git repositories of Jeremi Kaczmarczyk and Shangtong Zhang, and most of the hyper-parameters were left untouched, following the ranges explained in PPO Hyperparameters and Ranges.
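
To make the "variable network size" idea concrete, here is a minimal PyTorch sketch of a Gaussian actor whose hidden layer width is a constructor argument (the actual network in reacher-ppo may be structured differently):

import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    def __init__(self, state_size=33, action_size=4, hidden_size=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden_size, action_size)
        self.log_std = nn.Parameter(torch.zeros(action_size))

    def forward(self, state):
        mean = torch.tanh(self.mu(self.body(state)))     # means stay in [-1, 1]
        return torch.distributions.Normal(mean, self.log_std.exp())

Sweeping hidden_size over 32, 64, ..., 288 corresponds to the grid used in the experiment below.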

The details of the implementation and the tests I made are summarized in report.md (in this git repository). The main outcome is the following graph:

The graph shows the learning progression of different agents that differ from each other only in the size of their neural network's hidden layers.

One can see the effect of hidden layer size on learning rate. What intrigued me the most is that light (small) neural networks reach roughly the same final performance as much heavier ones, but require many more episodes to get there.

If you would like to reproduce the results above and/or study the effects of other parameters (such as learning rate, network architecture, or other hyper-parameters), follow the steps below.

Step 1: Activate the Environment

Follow the instructions in the DRLND GitHub repository to set up your Python environment. These instructions can be found in README.md at the root of the repository. By following these instructions, you will install PyTorch, the ML-Agents toolkit, and a few more Python packages required to complete the project.

(For Windows users) The ML-Agents toolkit supports Windows 10. While it might be possible to run the ML-Agents toolkit using other versions of Windows, it has not been tested on other versions. Furthermore, the ML-Agents toolkit has not been tested on a Windows VM such as Bootcamp or Parallels.

Install Additional packages:

$ pip install tqdm
$ pip install scipy

Step 2: Download the Unity Environment

For this project, you will not need to install Unity - this is because we have already built the environment for you, and you can download it from one of the links below. You need only select the environment that matches your operating system:

Version 1: One (1) Agent

Version 2: Twenty (20) Agents

Then unzip (or decompress) the file and rename the resulting directory to Reacher_Linux_1agent or Reacher_Linux_multAgents.

(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.
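
Once the directory is in place, you can sanity-check the build with the unityagents package installed in Step 1. This is a sketch assuming the renamed 20-agent Linux build; adjust the file name to your platform:

from unityagents import UnityEnvironment

env = UnityEnvironment(file_name='Reacher_Linux_multAgents/Reacher.x86_64')
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
print('Number of agents:', len(env_info.agents))             # expected: 20
print('State size:', env_info.vector_observations.shape[1])  # expected: 33
print('Action size:', brain.vector_action_space_size)        # expected: 4
env.close()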

Step 3: Clone and Run

Clone this repository and run the following.

To reproduce the results:

$ python reacher-ppo/Solution.py --numEpisodes 300 --gridSrchHidden 32 64 96 128 160 192 224 256 288

For plotting former results:

$ python reacher-ppo/Solution.py --pltLrn
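
If you want to add your own flags to the grid search, a minimal argparse sketch compatible with the calls above could look like this (the actual argument handling in Solution.py may differ):

import argparse

parser = argparse.ArgumentParser(description='Reacher PPO grid search')
parser.add_argument('--numEpisodes', type=int, default=300, help='episodes to train each agent')
parser.add_argument('--gridSrchHidden', type=int, nargs='+', help='hidden layer sizes to sweep')
parser.add_argument('--pltLrn', action='store_true', help='plot previously saved learning curves')
args = parser.parse_args()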

For a deeper understanding of this project, refer to the report.md file.
