Deep Reinforcement Learning - Continuous Control Project

Implementation of a continuous action-space Proximal Policy Optimization (PPO) agent for the "Continuous Control" project in Udacity's Deep Reinforcement Learning Nanodegree.

By Sebastian Castro, 2020

Project Introduction

This project uses the Reacher environment from Unity ML-Agents.

This environment consists of 20 identical simulated robot arms, each of which must keep its end effector inside a sphere that moves around it. The spheres, which are normally blue, turn green when an arm's end effector is inside them. Each arm has two joints with 2 degrees of freedom each, which are actuated with torques.

The specifics of the environment are:

State: 33 variables corresponding to the position, rotation, velocity, and angular velocity of the arm.
Actions: A vector with 4 elements, each a joint torque that can take any continuous value between -1.0 and 1.0.
Reward: The agent receives a reward of +0.1 for each time step in which the arm's end effector is inside the target goal location defined by the sphere around it.
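
As a rough illustration of these dimensions, the sketch below samples one clipped action vector per arm and computes per-step rewards. It is a standalone toy, not the code in reacher_ppo.ipynb: the function names, the Gaussian sampling, and the std value are assumptions made for this example, and it does not talk to the Unity environment.

```python
import random

NUM_AGENTS = 20    # arms simulated in parallel
STATE_SIZE = 33    # observation variables per arm
ACTION_SIZE = 4    # joint torques per arm
STEP_REWARD = 0.1  # reward per time step spent inside the target


def sample_actions(means, std=0.5, rng=random):
    """Sample a Gaussian action per torque and clip it to [-1, 1].

    `means` holds one list of ACTION_SIZE mean torques per arm.
    The std value is an arbitrary choice for illustration.
    """
    return [
        [max(-1.0, min(1.0, rng.gauss(mu, std))) for mu in agent_means]
        for agent_means in means
    ]


def step_rewards(on_target):
    """Map per-arm 'end effector inside sphere' flags to step rewards."""
    return [STEP_REWARD if flag else 0.0 for flag in on_target]


# Example: zero-mean actions for all 20 arms.
actions = sample_actions([[0.0] * ACTION_SIZE for _ in range(NUM_AGENTS)])
```

Clipping keeps every sampled torque inside the valid range, which matters for a continuous policy whose raw samples are unbounded.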

As per the project specification, an agent is considered to have "solved" the problem if the average reward over all the agents exceeds 30 by the end of an episode.
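
A minimal sketch of this check, assuming the per-arm episode scores have already been collected into a list. The helper names and the threshold parameter are hypothetical and are not taken from the notebook.

```python
def mean_episode_score(agent_scores):
    """Average the total episode reward over all arms."""
    return sum(agent_scores) / len(agent_scores)


def is_solved(agent_scores, threshold=30.0):
    """Return True when the average score over all arms exceeds the threshold."""
    return mean_episode_score(agent_scores) > threshold
```

With a reward of +0.1 per time step, an arm has to spend more than 300 time steps on target within an episode for its own score to exceed 30.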

To see more details about the PPO agent implementation, and training results, refer to the Report included in this repository.

Getting Started

To get started with this project, first perform the setup steps in the Udacity Deep Reinforcement Learning Nanodegree Program GitHub repository. Namely, you should:

Install Conda and create a Python 3.6 virtual environment
Install OpenAI Gym
Clone the Udacity repo and install the Python requirements included
Download the Reacher Unity files appropriate for your operating system and architecture (Linux, Mac OSX, Win32, Win64)

Once you have performed this setup, you should be ready to run the reacher_ppo.ipynb Jupyter Notebook in this repo. This notebook contains all the steps needed to define and train a PPO agent to solve this environment.

