orilinial/RL-DQN

Deep RL - DQN and Policy Gradient

This repository holds deep RL solutions for solving OpenAI's gym environments: TAXI and ACROBOT.

Scripts Usage:

All the scripts below accept command-line arguments (each defaults to our chosen parameters).
To see all arguments for a script, run: python <SCRIPT NAME>.py --help
Example of running a script: python train_taxi_dqn.py
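For orientation, this is the usual argparse pattern such scripts follow; a minimal sketch, assuming illustrative flag names and defaults (`--lr`, `--gamma`, `--episodes` are not necessarily the repo's actual arguments):

```python
import argparse

def make_parser():
    """Build a parser in the style the training scripts presumably use.
    Flag names and default values here are illustrative only."""
    parser = argparse.ArgumentParser(description="Train DQN on the Taxi environment")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
    parser.add_argument("--episodes", type=int, default=5000, help="number of training episodes")
    return parser

# Running with no flags (e.g. `python train_taxi_dqn.py`) falls back to the defaults:
args = make_parser().parse_args([])
```

Passing `--help` prints the generated usage text, which is why every script documents itself.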

Taxi environment:

Train scripts:

To train a model that solves the TAXI environment, run one of the following scripts:
Using the DQN method: train_taxi_dqn.py
Using the Vanilla Policy Gradient method: train_taxi_pg.py
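For the Vanilla Policy Gradient script, the key quantity is the discounted return-to-go that weights each step's log-probability gradient in REINFORCE. A small numpy sketch of that computation (a generic illustration, not code from the repo):

```python
import numpy as np

def returns_to_go(rewards, gamma=0.99):
    """Discounted returns G_t = r_t + gamma * G_{t+1}, computed backwards.
    REINFORCE scales the gradient of log pi(a_t|s_t) by G_t."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

# Example episode with three rewards and gamma = 0.5 for easy arithmetic:
G = returns_to_go([1.0, 0.0, 2.0], gamma=0.5)  # -> [1.5, 1.0, 2.0]
```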

The models' architectures are specified in: model_taxi.py
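For the DQN side, the update both trainers' networks regress toward is the Bellman target. A toy numpy sketch, using a plain Q-table instead of the network in model_taxi.py (Taxi's 500 discrete states make this feasible; the sizes and learning rate here are illustrative):

```python
import numpy as np

n_states, n_actions = 500, 6   # Taxi-v3 state/action space sizes
gamma, lr = 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def dqn_style_update(s, a, r, s_next, done):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    DQN performs the same regression with a neural network, a replay
    buffer, and a target network instead of a table."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += lr * (target - Q[s, a])

# A single illustrative transition (Taxi gives -1 per step):
dqn_style_update(s=0, a=1, r=-1.0, s_next=42, done=False)
```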

Evaluation scripts:

To test the models that solve the TAXI environment, run the following scripts:
DQN: eval_taxi_dqn.py
Vanilla Policy Gradient: eval_taxi_pg.py

These scripts use the saved models: dqn_taxi_model.pkl and pg_taxi_model.pkl.
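The .pkl extension suggests the models are serialized with pickle; a minimal sketch of the save/load round-trip an eval script would perform, assuming a plain pickle of the model object (the dict below is a stand-in for the real network):

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; the real .pkl files hold the repo's network objects.
model = {"weights": [0.1, 0.2], "env": "Taxi-v3"}

path = os.path.join(tempfile.gettempdir(), "dqn_taxi_model_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)      # what a train script would do at the end

with open(path, "rb") as f:
    restored = pickle.load(f)  # what an eval script would do at startup
```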

To see the accumulated reward, as a function of training episode, use the data in:
DQN: eval_reward_dqn_taxi.npy
Policy Gradients: eval_reward_pg_taxi.npy
These files hold the accumulated reward achieved during evaluation, which we ran every 10 training episodes. Plots can be generated using: python plot.py --path <PATH TO NPY FILE>
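Since these are plain numpy arrays, they can also be inspected without plot.py; a sketch assuming one entry per evaluation round (the values below are synthetic, and the x-axis is recovered from the every-10-episodes schedule):

```python
import numpy as np

# Synthetic stand-in for eval_reward_dqn_taxi.npy: one value per evaluation round.
rewards = np.array([-200.0, -150.0, -90.0, -40.0, 8.0])
np.save("eval_reward_demo.npy", rewards)

data = np.load("eval_reward_demo.npy")
# Each entry corresponds to an evaluation run every 10 training episodes,
# so the matching x-axis (training episode at eval time) is:
episodes = np.arange(1, len(data) + 1) * 10
# plot.py would then draw something like plt.plot(episodes, data).
```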

Acrobot:

As with Taxi, we provide a train script (using DQN) and a test script:
Train script: train_acrobot.py
Test script: eval_acrobot.py

The trained model is saved as acrobot_model.pkl, and its architecture is in model_acrobot.py.

To see the accumulated reward graphs, use the data in:
Test: acrobot_reward_eval.npy
Train: acrobot_reward_train.npy
The test file holds the accumulated reward achieved during evaluation, which we ran every 20 training episodes.
The train file holds the accumulated reward achieved for every training episode.
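The train/eval bookkeeping described above can be sketched as follows; the episode runner here is a dummy standing in for gym's Acrobot and the DQN agent, and the file names are demo variants:

```python
import numpy as np

EVAL_EVERY = 20  # evaluation runs every 20 training episodes, as above

def run_episode(train):
    """Dummy stand-in for one Acrobot episode; returns its accumulated reward.
    The real loop would step the gym environment with the DQN agent."""
    return -500.0 if train else -100.0

train_rewards, eval_rewards = [], []
for episode in range(1, 101):
    train_rewards.append(run_episode(train=True))      # logged every train episode
    if episode % EVAL_EVERY == 0:
        eval_rewards.append(run_episode(train=False))  # periodic evaluation

np.save("acrobot_reward_train_demo.npy", np.array(train_rewards))
np.save("acrobot_reward_eval_demo.npy", np.array(eval_rewards))
```

This yields one train entry per episode and one eval entry per 20 episodes, matching the shapes of acrobot_reward_train.npy and acrobot_reward_eval.npy.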


Example of Acrobot results: (reward plot image in the repository)
