Deep Reinforcement Learning

This repository contains solutions to deep reinforcement learning problems. Different types of agents are used according to the properties of each problem.

Detailed READMEs can be found in each project folder.

Environments

Agents were developed on three environments. Each increases in difficulty, from a discrete action space, to continuous actions, and finally to a multi-agent problem.

Banana Environment

In this challenge a single agent has to collect yellow bananas, while avoiding purple ones.

More information in this folder


Environment Screenshot


Continuous Reacher Environment

In this challenge a single agent has to maintain its end effector on a moving target. Each step the end effector spends in the target location yields a positive reward.

More information in this folder


Trained Agent


Multi-Agent Continuous Tennis Environment

In this environment two agents play tennis. Each agent receives a positive reward for hitting the ball over the net, and a smaller negative reward if the ball falls on its side.

More information in this folder


Trained Agent

Agents

Deep Q-Learning Agent

This repo contains implementations of two DQN Agents in PyTorch:

  • a base Agent with a Replay Buffer, a separate target Q-Network, and a network with 2 hidden layers
  • an Agent built on top of the base Agent, which utilizes Prioritized Replay (see the sampling sketch after this list).
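
To illustrate the prioritized replay idea, here is a minimal sketch of proportional prioritization with importance-sampling weights. The class and parameter names (`PrioritizedReplayBuffer`, `alpha`, `beta`, `eps`) are illustrative, not necessarily those used in this repo.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized replay (sketch).

    Priorities p_i = |TD error| + eps, sampling probability
    P(i) = p_i^alpha / sum_k p_k^alpha, and importance-sampling
    weights w_i = (N * P(i))^-beta, normalized by the maximum weight.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.data = []                                      # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()                            # normalize for stability
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priorities track the magnitude of the latest TD errors.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```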

Deep Deterministic Policy Gradient

This repo contains an implementation of a DDPG Agent in PyTorch.

The DDPG architecture is considered by many to be an Actor-Critic method. In the learning step, the agent uses the policy network (whose value estimates have large variance with respect to the actual value) to select the next action, and evaluates it with a Temporal Difference target (which is biased with respect to the actual value). Combining the two reduces both bias and variance.
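
As a rough illustration of that update, the sketch below shows one standard DDPG learning step in PyTorch. The function name, batch layout, and optimizer arguments are assumptions for the sketch, not this repo's actual interface.

```python
import torch
import torch.nn.functional as F

def ddpg_learn_step(batch, actor, critic, target_actor, target_critic,
                    actor_opt, critic_opt, gamma=0.99):
    """One DDPG learning step (sketch): the target actor picks the next action,
    the target critic evaluates it, the critic is regressed onto the resulting
    TD target, and the actor is updated to maximize the critic."""
    # dones is expected to be a float tensor of 0s and 1s.
    states, actions, rewards, next_states, dones = batch

    # TD target: r + gamma * Q'(s', mu'(s')), using the slowly-updated target networks.
    with torch.no_grad():
        next_actions = target_actor(next_states)
        q_next = target_critic(next_states, next_actions)
        q_target = rewards + gamma * (1.0 - dones) * q_next

    # Critic update: minimize the TD error.
    q_expected = critic(states, actions)
    critic_loss = F.mse_loss(q_expected, q_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: follow the gradient of Q with respect to the policy's actions.
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    # (A soft update of the target networks with a small factor tau typically follows.)
```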

The DDPG agent was able to solve the environment in under 250 episodes.


Multi-Agent Deep Deterministic Policy Gradient

The DDPG Agent has been extended to support multi-agent environments.
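
One way such an extension can look is a thin wrapper that holds one DDPG agent per player. This is purely illustrative: the wrapper, the `make_agent` factory, and the `act`/`step` methods are hypothetical names, and the repo may share critics or replay buffers differently.

```python
class MultiAgentDDPG:
    """Hypothetical multi-agent wrapper: one DDPG agent per player."""

    def __init__(self, num_agents, make_agent):
        # make_agent(i) is assumed to return a single-agent DDPG instance.
        self.agents = [make_agent(i) for i in range(num_agents)]

    def act(self, observations):
        # Each agent selects an action from its own observation.
        return [agent.act(obs) for agent, obs in zip(self.agents, observations)]

    def step(self, observations, actions, rewards, next_observations, dones):
        # Route each agent's transition to its own learning routine.
        for i, agent in enumerate(self.agents):
            agent.step(observations[i], actions[i], rewards[i],
                       next_observations[i], dones[i])
```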

The DDPG agent was able to solve the environment in under 2000 episodes and reached a maximum overall score, averaged over 100 episodes, of +2.05 by episode 2655. To speed up the hyperparameter tuning phase, an abstract training loop was used that could run permutations of hyperparameters during evening hours, when electricity costs are lower.
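
Such a loop can be as simple as iterating over the Cartesian product of a hyperparameter grid. The sketch below is a generic version; the grid entries and the `train_fn` callback are illustrative assumptions, not the repo's actual tuning code.

```python
from itertools import product

# Hypothetical hyperparameter grid; names and values are illustrative only.
grid = {
    "lr_actor": [1e-4, 1e-3],
    "lr_critic": [1e-4, 1e-3],
    "batch_size": [128, 256],
    "noise_sigma": [0.1, 0.2],
}

def run_sweep(train_fn):
    """Run one training session per permutation of the grid and return the best result."""
    results = []
    keys = list(grid.keys())
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_fn(**params)      # train_fn is assumed to return a final score
        results.append((params, score))
    # Best configuration by final score.
    return max(results, key=lambda r: r[1])
```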


References

Environments and agents were both based on starter code from the Udacity Deep Reinforcement Learning Nanodegree. The GitHub repo can be found here.

Special thanks to Miguel Morales for writing such a comprehensive book on Deep Reinforcement Learning. I took many clarifications on theory and practice from it when developing the agents; his book can be found here.
