
Deep-Q-Network-and-Deep-Deterministic-Policy-Gradient (Deep Learning and Practice homework 6)

This project uses the PyTorch package to implement DQN and DDPG models that play the LunarLander-v2 and LunarLanderContinuous-v2 games.

The demo video can be seen at this link.

You can find a detailed introduction and the experimental results at this link.

This task is to implement two deep reinforcement learning algorithms by completing the following two subtasks (a short environment sketch follows the list):

(1) solve LunarLander-v2 using a deep Q-network (DQN);

(2) solve LunarLanderContinuous-v2 using deep deterministic policy gradient (DDPG).
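
For reference, the snippet below is a hypothetical sanity check (not part of this repository, and it assumes Box2D is available in your gym installation). It only shows the state and action spaces that the two agents have to handle.

import gym

# Discrete control task for DQN: 8-dimensional state, 4 discrete actions.
env_dqn = gym.make('LunarLander-v2')
print(env_dqn.observation_space.shape, env_dqn.action_space.n)        # (8,) 4

# Continuous control task for DDPG: 8-dimensional state, 2 continuous actions in [-1, 1].
env_ddpg = gym.make('LunarLanderContinuous-v2')
print(env_ddpg.observation_space.shape, env_ddpg.action_space.shape)  # (8,) (2,)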


Hardware

Operating System: Windows 10

CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

GPU: NVIDIA GeForce GTX TITAN X

Requirement

In this work, you can use either of the following two options to build the environment.

First option (recommended)

$ conda env create -f environment.yml

Second option

$ conda create --name Summer python=3.8 -y
$ conda activate Summer
$ conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch
$ conda install numpy
$ conda install matplotlib -y 
$ conda install pandas -y
$ pip install torchsummary
$ pip install gym
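
After either option, a quick check such as the following (a hypothetical snippet, not included in this repository) confirms that PyTorch, CUDA, and gym are importable:

import torch
import gym

print(torch.__version__)           # 1.7.1 with the second option above
print(torch.cuda.is_available())   # True if cudatoolkit 10.2 matches your GPU driver
print(gym.__version__)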

System Architecture

You can see the detailed algorithm descriptions of DQN and DDPG.

Directory Tree

In this project, all you need to do is clone this repository with git.

You don't need to download any other files.

├─ dqn-example.py
├─ ddpg-example.py
├─ dqn.pth
├─ ddpg.pth
├─ environment.yml
├─ report.pdf
└─ README.md

Training

In the training step, you can train two different models: DQN and DDPG.

DQN

There are two steps to train the DQN model.

The first step is to configure the DQN training parameters through the following argparse arguments.

## arguments ##
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('-d', '--device', default='cuda')
parser.add_argument('-m', '--model', default='dqn.pth')
parser.add_argument('--logdir', default='log/dqn')
# train
parser.add_argument('--warmup', default=10000, type=int)
parser.add_argument('--episode', default=2000, type=int)
parser.add_argument('--capacity', default=10000, type=int)
parser.add_argument('--batch_size', default=128, type=int)
parser.add_argument('--lr', default=.0005, type=float)
parser.add_argument('--eps_decay', default=.995, type=float)
parser.add_argument('--eps_min', default=.01, type=float)
parser.add_argument('--gamma', default=.99, type=float)
parser.add_argument('--freq', default=4, type=int)
parser.add_argument('--target_freq', default=1000, type=int)
# test
parser.add_argument('--test_only', action='store_true')
parser.add_argument('--render', action='store_true')
parser.add_argument('--seed', default=20200519, type=int)
parser.add_argument('--test_epsilon', default=.001, type=float)
args = parser.parse_args()

The second step is to run the command below.

python dqn-example.py
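
For reference, the sketch below is a simplified, hypothetical illustration of the DQN update that these hyperparameters drive (the actual implementation lives in dqn-example.py). It only shows how gamma, the epsilon-greedy policy, and the target network interact.

import random
import torch
import torch.nn as nn

def select_action(net, state, epsilon, n_actions):
    # Epsilon-greedy exploration: with probability epsilon take a random action,
    # otherwise take the greedy action from the online Q-network.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(net(state.unsqueeze(0)).argmax(dim=1).item())

def dqn_update(net, target_net, optimizer, batch, gamma=.99):
    # batch is a tuple of tensors sampled from the replay buffer.
    state, action, reward, next_state, done = batch
    # Q(s, a) for the actions that were actually taken.
    q = net(state).gather(1, action.long().unsqueeze(1)).squeeze(1)
    # TD target bootstrapped from the frozen target network.
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
        target = reward + gamma * next_q * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()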

DDPG

There are two steps to train the DDPG model.

The first step is to configure the DDPG training parameters through the following argparse arguments.

## arguments ##
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('-d', '--device', default='cuda')
parser.add_argument('-m', '--model', default='ddpg.pth')
parser.add_argument('--logdir', default='log/ddpg')
# train
parser.add_argument('--warmup', default=50000, type=int)
parser.add_argument('--episode', default=2800, type=int)
parser.add_argument('--batch_size', default=128, type=int)
parser.add_argument('--capacity', default=500000, type=int)
parser.add_argument('--lra', default=1e-3, type=float)
parser.add_argument('--lrc', default=1e-3, type=float)
parser.add_argument('--gamma', default=.99, type=float)
parser.add_argument('--tau', default=.005, type=float)
# test
parser.add_argument('--test_only', action='store_true')
parser.add_argument('--render', action='store_true')
parser.add_argument('--seed', default=20200519, type=int)
args = parser.parse_args()

The second step is to run the command below.

python ddpg-example.py
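
Similarly, the sketch below is a simplified, hypothetical illustration of the DDPG updates that gamma, tau, lra, and lrc control (the actual implementation lives in ddpg-example.py).

import torch
import torch.nn as nn

def soft_update(target, source, tau=.005):
    # Polyak averaging: target <- tau * source + (1 - tau) * target.
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_(tau * s_param.data + (1 - tau) * t_param.data)

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=.99, tau=.005):
    # The critic is assumed to return shape (batch, 1); this is only a sketch.
    state, action, reward, next_state, done = batch
    # Critic: regress Q(s, a) toward the bootstrapped target.
    with torch.no_grad():
        next_action = target_actor(next_state)
        target_q = reward + gamma * (1 - done) * target_critic(next_state, next_action).squeeze(1)
    critic_loss = nn.functional.mse_loss(critic(state, action).squeeze(1), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Actor: maximize Q(s, actor(s)) by minimizing its negation.
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    # Let the target networks slowly track the online networks.
    soft_update(target_actor, actor, tau)
    soft_update(target_critic, critic, tau)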

Testing

You can find a detailed introduction and the experimental results at this link.

In the testing step, you can also evaluate the two different models, DQN and DDPG.

DQN

There are two steps to evaluate the DQN model.

The first step is to configure the DQN testing parameters (the same as for training) through the following argparse arguments. In particular, make sure the model to be evaluated is dqn.pth.

## arguments ##
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('-d', '--device', default='cuda')
parser.add_argument('-m', '--model', default='dqn.pth')
parser.add_argument('--logdir', default='log/dqn')
# train
parser.add_argument('--warmup', default=10000, type=int)
parser.add_argument('--episode', default=2000, type=int)
parser.add_argument('--capacity', default=10000, type=int)
parser.add_argument('--batch_size', default=128, type=int)
parser.add_argument('--lr', default=.0005, type=float)
parser.add_argument('--eps_decay', default=.995, type=float)
parser.add_argument('--eps_min', default=.01, type=float)
parser.add_argument('--gamma', default=.99, type=float)
parser.add_argument('--freq', default=4, type=int)
parser.add_argument('--target_freq', default=1000, type=int)
# test
parser.add_argument('--test_only', action='store_true')
parser.add_argument('--render', action='store_true')
parser.add_argument('--seed', default=20200519, type=int)
parser.add_argument('--test_epsilon', default=.001, type=float)
args = parser.parse_args()

The second step is to run the command below.

python dqn-example.py --test_only
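
If the checkpoint is stored somewhere else, or if you want to watch the agent during evaluation, the -m/--model and --render arguments defined above can be passed explicitly, for example:

python dqn-example.py --test_only -m dqn.pth --render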

DDPG

There are two steps to evaluate the DDPG model.

The first step is to configure the DDPG testing parameters (the same as for training) through the following argparse arguments. In particular, make sure the model to be evaluated is ddpg.pth.

## arguments ##
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('-d', '--device', default='cuda')
parser.add_argument('-m', '--model', default='ddpg.pth')
parser.add_argument('--logdir', default='log/ddpg')
# train
parser.add_argument('--warmup', default=50000, type=int)
parser.add_argument('--episode', default=2800, type=int)
parser.add_argument('--batch_size', default=128, type=int)
parser.add_argument('--capacity', default=500000, type=int)
parser.add_argument('--lra', default=1e-3, type=float)
parser.add_argument('--lrc', default=1e-3, type=float)
parser.add_argument('--gamma', default=.99, type=float)
parser.add_argument('--tau', default=.005, type=float)
# test
parser.add_argument('--test_only', action='store_true')
parser.add_argument('--render', action='store_true')
parser.add_argument('--seed', default=20200519, type=int)
args = parser.parse_args()

The second step is to run the command below.

python ddpg-example.py --test_only

Evaluate Result

Then you will get the best results shown below; each value is the average reward over ten runs.

                 DQN      DDPG
average reward   269.35   285.51
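
For reference, the snippet below is a hypothetical sketch of how an average reward over ten runs could be computed with the old gym API this project targets; the real evaluation logic is inside dqn-example.py and ddpg-example.py, and agent.act is an assumed interface, not one defined in this repository.

import gym

def evaluate(agent, env_name='LunarLander-v2', episodes=10, seed=20200519):
    # Average episodic reward over a fixed number of evaluation episodes.
    env = gym.make(env_name)
    env.seed(seed)
    rewards = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            action = agent.act(state)                  # assumed greedy (or near-greedy) policy
            state, reward, done, _ = env.step(action)
            total += reward
        rewards.append(total)
    return sum(rewards) / len(rewards)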
