
Deep Reinforcement Learning - Collaboration and Competition Project

This notebook implements the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm for the "Collaboration and Competition" project of the Udacity Deep Reinforcement Learning Nanodegree program.

By Sebastian Castro, 2020


Project Introduction

This project uses a version of the Tennis environment in Unity ML-Agents.

This environment consists of two tennis players, or agents, each of which has its own local set of observations, actions, and rewards. The specifics are discussed below, but the environment is structured such that a "good" game consists of an indefinite volley in which both players keep hitting the ball back to each other without letting it drop.

*Trained agents playing tennis (animation)*

The reinforcement learning specifics for each agent are:

  • State: 24 variables (8 observations stacked over 3 consecutive time steps) corresponding to the position and velocity of the ball and racket.
  • Actions: A vector with 2 elements -- one for moving towards/away from the net and another for jumping. Both are continuous variables between -1.0 and 1.0.
  • Reward: The agent receives +0.1 each time it hits the ball over the net, and -0.01 if it lets the ball hit the ground or hits it out of bounds. This incentivizes the agents to keep the ball in play indefinitely rather than score points, unlike a typical game of tennis.
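In the actual project, these observations and actions flow through the Unity ML-Agents Python API; the shapes implied by the list above can be sketched with plain NumPy (all names here are illustrative, not the environment's API):

```python
import numpy as np

NUM_AGENTS = 2       # two tennis players
OBS_PER_STEP = 8     # raw observations per time step
STACK = 3            # consecutive time steps stacked together
ACT_SIZE = 2         # move towards/away from net, jump

# Each agent's state is the 8 raw observations stacked over 3 steps -> 24 values.
state_size = OBS_PER_STEP * STACK

# A joint action for both agents: 2 agents x 2 continuous values, clipped to [-1, 1]
# as the environment expects.
actions = np.clip(np.random.randn(NUM_AGENTS, ACT_SIZE), -1.0, 1.0)

# A joint observation returned to the agents each step: shape (2, 24).
states = np.zeros((NUM_AGENTS, state_size))
```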

As per the project specification, the agents are considered to have "solved" the environment when the per-episode maximum of the 2 agents' returns, averaged over a 100-episode window, exceeds +0.5.
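The solve criterion above can be written out directly: take the maximum of the two agents' returns each episode, then check a 100-episode moving average against the target. A small sketch (function name and signature are illustrative):

```python
from collections import deque

import numpy as np


def is_solved(episode_scores, window=100, target=0.5):
    """Check the Tennis solve criterion.

    episode_scores: sequence of (agent_0_return, agent_1_return) pairs,
    one pair per episode. The environment counts as solved once the
    moving average (over `window` episodes) of the per-episode maximum
    exceeds `target`.
    """
    recent = deque(maxlen=window)
    for scores in episode_scores:
        recent.append(max(scores))  # per-episode score = max over the 2 agents
        if len(recent) == window and np.mean(recent) > target:
            return True
    return False
```

For example, 100 straight episodes with returns (0.6, 0.7) would satisfy the criterion, while 100 episodes of (0.1, 0.2) would not.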

To see more details about the MADDPG agent implementation, network and training hyperparameters, and results, refer to the Report included in this repository.
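The core idea of MADDPG is that each agent keeps a decentralized actor that acts on its own 24-value local observation, while training uses a centralized critic that sees both agents' observations and actions. A minimal PyTorch sketch of those two networks, with illustrative layer sizes (the layer widths and other hyperparameters used in this project are in the Report, not reproduced here):

```python
import torch
import torch.nn as nn

OBS_SIZE, ACT_SIZE, NUM_AGENTS = 24, 2, 2


class Actor(nn.Module):
    """Decentralized actor: maps one agent's local observation to its action."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_SIZE, 128), nn.ReLU(),
            nn.Linear(128, ACT_SIZE), nn.Tanh(),  # both actions lie in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)


class Critic(nn.Module):
    """Centralized critic: scores the joint observations and actions of all agents."""

    def __init__(self):
        super().__init__()
        joint_size = NUM_AGENTS * (OBS_SIZE + ACT_SIZE)  # 2 * (24 + 2) = 52
        self.net = nn.Sequential(
            nn.Linear(joint_size, 128), nn.ReLU(),
            nn.Linear(128, 1),  # scalar Q-value for the joint state-action
        )

    def forward(self, all_obs, all_actions):
        return self.net(torch.cat([all_obs, all_actions], dim=-1))


actor, critic = Actor(), Critic()
obs = torch.randn(1, OBS_SIZE)                        # one agent's observation
action = actor(obs)                                   # shape (1, 2)
all_obs = torch.randn(1, NUM_AGENTS * OBS_SIZE)       # both agents' observations
all_actions = torch.randn(1, NUM_AGENTS * ACT_SIZE)   # both agents' actions
q_value = critic(all_obs, all_actions)                # shape (1, 1)
```

At execution time only the actors are needed, which is what lets the trained agents act on purely local observations.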


Getting Started

To get started with this project, first perform the setup steps in the Udacity Deep Reinforcement Learning Nanodegree Program GitHub repository. Namely, you should:

  1. Install Conda and create a Python 3.6 virtual environment
  2. Install OpenAI Gym
  3. Clone the Udacity repo and install the Python requirements included
  4. Download the Tennis Unity files appropriate for your operating system and architecture (Linux, Mac OSX, Win32, Win64)

Once you have performed this setup, you should be ready to run the tennis_maddpg.ipynb Jupyter Notebook in this repo. This notebook contains all the steps needed to define and train MADDPG agents to solve this environment.

