sea-bass/drlnd-control-project

Deep Reinforcement Learning - Continuous Control Project

Implementation of a continuous action-space Proximal Policy Optimization (PPO) agent for the "Continuous Control" project in Udacity's Deep Reinforcement Learning Nanodegree.

By Sebastian Castro, 2020


Project Introduction

This project uses the Reacher environment from Unity ML-Agents.

This environment consists of 20 identical simulated robot arms, each of which must keep its end effector inside a sphere that moves around it. The spheres, which are normally blue, turn green while an arm's end effector is inside them. Each arm has two joints with 2 degrees of freedom each, which are actuated with torques.

Environment animation

The specifics of the environment are:

  • State: 33 variables corresponding to the position, rotation, velocity, and angular velocity of the arm.
  • Actions: A vector with 4 elements, each a joint torque that can take any continuous value between -1.0 and 1.0.
  • Reward: The agent receives a reward of +0.1 for each time step in which the arm's end effector is inside the target location defined by the sphere around it.
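The specifics above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the notebook: the helper names (`clip_action`, `step_reward`, `sample_gaussian_action`) are hypothetical, and the Gaussian sampling step is only an assumption about how a continuous-action policy might produce torques before clipping them to the valid range.

```python
import random

STATE_SIZE = 33        # observation variables per arm (from the list above)
ACTION_SIZE = 4        # joint torques per arm (from the list above)
REWARD_PER_STEP = 0.1  # reward for each step spent inside the target


def clip_action(action, low=-1.0, high=1.0):
    """Clamp every torque to the valid range [low, high]."""
    if len(action) != ACTION_SIZE:
        raise ValueError(f"expected {ACTION_SIZE} torques, got {len(action)}")
    return [max(low, min(high, a)) for a in action]


def step_reward(in_target):
    """Reward for one time step, given whether the end effector is in the sphere."""
    return REWARD_PER_STEP if in_target else 0.0


def sample_gaussian_action(mean, std, rng):
    """Hypothetical policy step: sample torques around `mean`, then clip them."""
    raw = [rng.gauss(m, std) for m in mean]
    return clip_action(raw)


if __name__ == "__main__":
    rng = random.Random(0)
    action = sample_gaussian_action([0.0] * ACTION_SIZE, std=1.0, rng=rng)
    print(action, step_reward(True))
```

Clipping after sampling keeps every torque inside the range the environment accepts, whatever the policy outputs.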

As per the project specification, an agent is considered to have "solved" the problem if the average reward over all the agents exceeds 30 by the end of an episode.
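As a rough illustration of this criterion, the sketch below accumulates per-agent rewards over an episode and compares the mean score with the threshold of 30. The function names and the nested-list reward layout are assumptions made for this example, not taken from the notebook. At +0.1 per time step, an arm must spend more than 300 time steps inside its target to score above 30.

```python
def episode_scores(rewards_per_step):
    """Sum rewards per agent.

    `rewards_per_step` is a list with one entry per time step; each entry is
    a list holding one reward per agent (an assumed layout for this sketch).
    """
    num_agents = len(rewards_per_step[0])
    scores = [0.0] * num_agents
    for step_rewards in rewards_per_step:
        for i, reward in enumerate(step_rewards):
            scores[i] += reward
    return scores


def is_solved(scores, threshold=30.0):
    """True if the score averaged over all agents exceeds the threshold."""
    return sum(scores) / len(scores) > threshold


if __name__ == "__main__":
    # Two time steps, three agents: a toy episode far from solved.
    scores = episode_scores([[0.1, 0.0, 0.0], [0.1, 0.1, 0.0]])
    print(scores, is_solved(scores))
```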

To see more details about the PPO agent implementation, and training results, refer to the Report included in this repository.
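For readers new to PPO, the standard clipped surrogate objective can be sketched as follows. This is a generic illustration of the technique and not necessarily how the notebook implements it: the function name, the plain-list inputs, and the default `epsilon=0.2` are all assumptions for this example.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    For each sample, the probability ratio new/old is clipped to
    [1 - epsilon, 1 + epsilon], and the smaller of the clipped and
    unclipped terms is kept.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio of 2 with a positive advantage is capped at 1 + epsilon.
    print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum removes any incentive to move the new policy far from the old one in a single update, which is what makes the objective conservative.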


Getting Started

To get started with this project, first perform the setup steps in the Udacity Deep Reinforcement Learning Nanodegree Program GitHub repository. Namely, you should:

  1. Install Conda and create a Python 3.6 virtual environment
  2. Install OpenAI Gym
  3. Clone the Udacity repo and install the Python requirements included
  4. Download the Reacher Unity files appropriate for your operating system and architecture (Linux, Mac OSX, Win32, Win64)

Once you have performed this setup, you should be ready to run the reacher_ppo.ipynb Jupyter Notebook in this repo. This notebook contains all the steps needed to define and train a PPO agent to solve this environment.
