orilinial/RL-DQN

Deep RL - DQN and Policy Gradient

This repository holds deep RL solutions for solving OpenAI's gym environments: TAXI and ACROBOT.

Scripts Usage:

All the scripts below accept command-line arguments (each defaults to our chosen parameters).
To see all arguments for a script, run: python <SCRIPT NAME>.py --help
Example of running a script: python train_taxi_dqn.py
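For orientation, this is the usual argparse pattern such scripts follow; a minimal sketch, assuming illustrative flag names and defaults (`--lr`, `--gamma`, `--episodes` are not necessarily the repo's actual arguments):

```python
import argparse

def make_parser():
    """Build a parser in the style the training scripts presumably use.
    Flag names and default values here are illustrative only."""
    parser = argparse.ArgumentParser(description="Train DQN on the Taxi environment")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
    parser.add_argument("--episodes", type=int, default=5000, help="number of training episodes")
    return parser

# Running with no flags (e.g. `python train_taxi_dqn.py`) falls back to the defaults:
args = make_parser().parse_args([])
```

Passing `--help` prints the generated usage text, which is why every script documents itself.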

Taxi environment:

Train scripts:

To train a model that solves the TAXI environment, run one of the following scripts:
Using the DQN method: train_taxi_dqn.py
Using the Vanilla Policy Gradient method: train_taxi_pg.py
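For the Vanilla Policy Gradient script, the key quantity is the discounted return-to-go that weights each step's log-probability gradient in REINFORCE. A small numpy sketch of that computation (a generic illustration, not code from the repo):

```python
import numpy as np

def returns_to_go(rewards, gamma=0.99):
    """Discounted returns G_t = r_t + gamma * G_{t+1}, computed backwards.
    REINFORCE scales the gradient of log pi(a_t|s_t) by G_t."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

# Example episode with three rewards and gamma = 0.5 for easy arithmetic:
G = returns_to_go([1.0, 0.0, 2.0], gamma=0.5)  # -> [1.5, 1.0, 2.0]
```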

The models' architectures are specified in: model_taxi.py
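For the DQN side, the update both trainers' networks regress toward is the Bellman target. A toy numpy sketch, using a plain Q-table instead of the network in model_taxi.py (Taxi's 500 discrete states make this feasible; the sizes and learning rate here are illustrative):

```python
import numpy as np

n_states, n_actions = 500, 6   # Taxi-v3 state/action space sizes
gamma, lr = 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def dqn_style_update(s, a, r, s_next, done):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    DQN performs the same regression with a neural network, a replay
    buffer, and a target network instead of a table."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += lr * (target - Q[s, a])

# A single illustrative transition (Taxi gives -1 per step):
dqn_style_update(s=0, a=1, r=-1.0, s_next=42, done=False)
```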

Evaluation scripts:

To test the models that solve the TAXI environment, run the following scripts:
DQN: eval_taxi_dqn.py
Vanilla Policy Gradient: eval_taxi_pg.py

These scripts use the saved models: dqn_taxi_model.pkl and pg_taxi_model.pkl.
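The .pkl extension suggests the models are serialized with pickle; a minimal sketch of the save/load round-trip an eval script would perform, assuming a plain pickle of the model object (the dict below is a stand-in for the real network):

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; the real .pkl files hold the repo's network objects.
model = {"weights": [0.1, 0.2], "env": "Taxi-v3"}

path = os.path.join(tempfile.gettempdir(), "dqn_taxi_model_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)      # what a train script would do at the end

with open(path, "rb") as f:
    restored = pickle.load(f)  # what an eval script would do at startup
```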

To see the accumulated reward, as a function of training episode, use the data in:
DQN: eval_reward_dqn_taxi.npy
Policy Gradients: eval_reward_pg_taxi.npy
These files hold the accumulated reward achieved during evaluation, which we ran every 10 training episodes. Plots can be generated using: python plot.py --path <PATH TO NPY FILE>
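Since these are plain numpy arrays, they can also be inspected without plot.py; a sketch assuming one entry per evaluation round (the values below are synthetic, and the x-axis is recovered from the every-10-episodes schedule):

```python
import numpy as np

# Synthetic stand-in for eval_reward_dqn_taxi.npy: one value per evaluation round.
rewards = np.array([-200.0, -150.0, -90.0, -40.0, 8.0])
np.save("eval_reward_demo.npy", rewards)

data = np.load("eval_reward_demo.npy")
# Each entry corresponds to an evaluation run every 10 training episodes,
# so the matching x-axis (training episode at eval time) is:
episodes = np.arange(1, len(data) + 1) * 10
# plot.py would then draw something like plt.plot(episodes, data).
```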

Acrobot:

As with Taxi, we provide a train script (using DQN) and a test script:
Train script: train_acrobot.py
Test script: eval_acrobot.py

The trained model is saved as acrobot_model.pkl, and its architecture is in model_acrobot.py.

To see the accumulated reward graphs, use the data in:
Test: acrobot_reward_eval.npy
Train: acrobot_reward_train.npy
The test file holds the accumulated reward achieved during evaluation, which we ran every 20 training episodes.
The train file holds the accumulated reward achieved for every training episode.
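The train/eval bookkeeping described above can be sketched as follows; the episode runner here is a dummy standing in for gym's Acrobot and the DQN agent, and the file names are demo variants:

```python
import numpy as np

EVAL_EVERY = 20  # evaluation runs every 20 training episodes, as above

def run_episode(train):
    """Dummy stand-in for one Acrobot episode; returns its accumulated reward.
    The real loop would step the gym environment with the DQN agent."""
    return -500.0 if train else -100.0

train_rewards, eval_rewards = [], []
for episode in range(1, 101):
    train_rewards.append(run_episode(train=True))      # logged every train episode
    if episode % EVAL_EVERY == 0:
        eval_rewards.append(run_episode(train=False))  # periodic evaluation

np.save("acrobot_reward_train_demo.npy", np.array(train_rewards))
np.save("acrobot_reward_eval_demo.npy", np.array(eval_rewards))
```

This yields one train entry per episode and one eval entry per 20 episodes, matching the shapes of acrobot_reward_train.npy and acrobot_reward_eval.npy.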


Example of Acrobot results: (reward plot image in the repository)
