This repository holds deep RL solutions for OpenAI Gym's TAXI and ACROBOT environments.
Every script below accepts command-line arguments (all set by default to our chosen parameters).
To see all arguments for a script, run: python <SCRIPT NAME>.py --help
Example of running a script: python train_taxi_dqn.py
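The argument handling described above can be sketched with Python's standard argparse module. The flag names and default values here are illustrative assumptions, not the repository's actual ones:

```python
import argparse

def build_parser():
    # Hypothetical flags and defaults; the real scripts define their own.
    parser = argparse.ArgumentParser(description="Train an agent on a gym environment.")
    parser.add_argument("--episodes", type=int, default=2000,
                        help="number of training episodes")
    parser.add_argument("--lr", type=float, default=1e-3,
                        help="optimizer learning rate")
    parser.add_argument("--gamma", type=float, default=0.99,
                        help="discount factor")
    return parser

args = build_parser().parse_args([])  # empty list -> use the defaults
print(args.episodes, args.lr, args.gamma)
```

Running the real scripts with --help prints the equivalent auto-generated usage message.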
To train the model that solves the TAXI environment, run the following scripts:
Using the DQN method: train_taxi_dqn.py
Using the Vanilla Policy Gradient method: train_taxi_pg.py
The models' architectures are specified in: model_taxi.py
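We do not reproduce model_taxi.py here, but a common architecture for Taxi (500 discrete states, 6 actions in Taxi-v3) is a small MLP over a one-hot state encoding. A minimal numpy sketch of such a forward pass, with randomly initialized weights standing in for trained parameters and an assumed hidden size:

```python
import numpy as np

N_STATES, N_ACTIONS, HIDDEN = 500, 6, 64  # Taxi-v3 sizes; HIDDEN is an assumption

rng = np.random.default_rng(0)
# Random weights stand in for the trained parameters of the saved model.
W1 = rng.normal(0.0, 0.1, (N_STATES, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(state: int) -> np.ndarray:
    """One-hot encode the discrete state and run a two-layer MLP."""
    x = np.zeros(N_STATES)
    x[state] = 1.0
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                # one Q-value per action

action = int(np.argmax(q_values(42)))  # greedy action for state 42
```

For DQN the output is interpreted as Q-values; for the policy-gradient variant the same shape of network would instead feed a softmax over actions.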
To test the models that solve the TAXI environment, run the following scripts:
DQN: eval_taxi_dqn.py
Vanilla Policy Gradient: eval_taxi_pg.py
These scripts use the saved models dqn_taxi_model.pkl and pg_taxi_model.pkl.
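Assuming the .pkl files are standard pickle files (the exact serialization format is not documented here), loading them in an eval script follows the usual pattern; a plain dict stands in for the model object:

```python
import os
import pickle
import tempfile

# A stand-in object; the eval scripts would load dqn_taxi_model.pkl
# or pg_taxi_model.pkl instead.
model = {"W": [0.1, 0.2], "b": [0.0]}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)
```

If the models were saved with a framework such as PyTorch, that framework's own save/load utilities would be used instead of raw pickle.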
To see the accumulated reward graphs, as a function of training episode, use the data in:
DQN: eval_reward_dqn_taxi.npy
Policy Gradients: eval_reward_pg_taxi.npy
These files hold the accumulated reward achieved during evaluation, which we ran every 10 training episodes.
Plots can be seen using: plot.py --path <PATH TO NPY FILE>
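Because evaluation ran every 10 training episodes, entry i of the saved array corresponds to roughly training episode 10 * (i + 1) (whether the first point is taken at episode 0 or 10 depends on the scripts). A sketch of reconstructing the episode axis, with a synthetic array standing in for np.load on the .npy file:

```python
import numpy as np

EVAL_EVERY = 10  # evaluation interval used for the Taxi runs

# Synthetic stand-in for np.load("eval_reward_dqn_taxi.npy")
rewards = np.array([-200.0, -150.0, -80.0, -20.0, 5.0])

# Map array index -> training episode at which the evaluation happened.
episodes = (np.arange(len(rewards)) + 1) * EVAL_EVERY

# plot.py presumably does something along the lines of:
#   import matplotlib.pyplot as plt
#   plt.plot(episodes, rewards)
#   plt.xlabel("training episode"); plt.ylabel("accumulated reward")
```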
As with TAXI, we provide a train script (using DQN) and a test script for ACROBOT:
Train script: train_acrobot.py
Test script: eval_acrobot.py
The trained model is acrobot_model.pkl, and the architecture is in model_acrobot.py.
To see the accumulated reward graphs, use the data in:
Test: acrobot_reward_eval.npy
Train: acrobot_reward_train.npy
The test file holds the accumulated reward achieved during evaluation, which we ran every 20 training episodes.
The train file holds the accumulated reward achieved in every training episode.
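Since the train file has one entry per episode while the test file has one entry per 20 episodes, comparing the two curves requires putting them on a common episode axis; smoothing the noisy per-episode train curve with a moving average is also common. A numpy sketch under these assumptions (the arrays below stand in for np.load on the two .npy files):

```python
import numpy as np

EVAL_EVERY = 20  # Acrobot evaluation interval

# Stand-ins for np.load("acrobot_reward_train.npy") and
# np.load("acrobot_reward_eval.npy") — here eval is derived from train
# only so the sketch is self-contained.
train_rewards = -500.0 + 4.0 * np.arange(100)             # one value per episode
eval_rewards = train_rewards[EVAL_EVERY - 1::EVAL_EVERY]  # one value per 20 episodes

# Common episode axis for both curves.
train_x = np.arange(1, len(train_rewards) + 1)
eval_x = (np.arange(len(eval_rewards)) + 1) * EVAL_EVERY

# Moving average over a 10-episode window to smooth the train curve.
window = 10
smoothed = np.convolve(train_rewards, np.ones(window) / window, mode="valid")
```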