Implementation of Double DQN (D2QN) with PyTorch.
If you understand Persian and want to find out more, check out my Virgool post :D
Clone the project with:

```bash
git clone git@github.com:mhyrzt/D2QN.git
```

To train D2QN, run the following command in a terminal:

```bash
python trainer.py
```

To run a simulation with the trained ANN:

```bash
python play.py
```
A class for storing, plotting & logging the history of rewards and epsilon:

- `add` → add new values for the reward and epsilon arrays.
- `log` → log the last reward and episode from the arrays.
- `plot` → plot the epsilon and reward arrays.
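A minimal sketch of what such a class might look like (the class name `History` and the plotting details are assumptions for illustration, not necessarily the repo's exact code):

```python
import matplotlib.pyplot as plt


class History:  # hypothetical name for the tracker described above
    def __init__(self):
        self.rewards = []
        self.epsilons = []

    def add(self, reward, epsilon):
        # append the latest episode reward and epsilon value
        self.rewards.append(reward)
        self.epsilons.append(epsilon)

    def log(self):
        # print the last recorded reward and the current episode index
        print(f"episode {len(self.rewards)} | reward: {self.rewards[-1]}")

    def plot(self):
        # plot the reward and epsilon arrays side by side
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
        ax1.plot(self.rewards)
        ax1.set_title("reward")
        ax2.plot(self.epsilons)
        ax2.set_title("epsilon")
        plt.show()
```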
The main purpose of this class is to implement epsilon-greedy for balancing exploration and exploitation. It takes two arguments:

- gym environment: for taking random actions.
- torch ANN model: for predicting the best action.

```python
epsilon = Epsilon(env, model)
```

- `_rand` → generate a random floating-point number in the range of 0 to 1.
- `get_action` → predict the best action based on the ANN model.
- `take_action` → based on the random number, return either a random action or the best action from the model.
- `decrease` → decrease epsilon by multiplying it with a constant.
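A minimal epsilon-greedy sketch with this interface (the decay constant and minimum value are assumptions; the repo's implementation may differ):

```python
import random
import torch


class Epsilon:
    def __init__(self, env, model, value=1.0, decay=0.995, minimum=0.01):
        self.env = env          # gym environment (source of random actions)
        self.model = model      # torch ANN (source of greedy actions)
        self.value = value      # current epsilon
        self.decay = decay      # multiplicative decay constant (assumed)
        self.minimum = minimum  # lower bound on epsilon (assumed)

    def _rand(self):
        # random float in [0, 1)
        return random.random()

    def get_action(self, state):
        # greedy action: argmax over the model's predicted Q-values
        with torch.no_grad():
            q_values = self.model(torch.as_tensor(state, dtype=torch.float32))
        return int(q_values.argmax().item())

    def take_action(self, state):
        # explore with probability epsilon, otherwise exploit
        if self._rand() < self.value:
            return self.env.action_space.sample()
        return self.get_action(state)

    def decrease(self):
        # multiplicative decay, clipped at the minimum
        self.value = max(self.minimum, self.value * self.decay)
```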
A replay buffer for storing the info and stats of each step, also known as an experience:

- Current State
- Action
- Reward
- Next State
- Is Terminal (done)

As arguments it takes two numbers:

- `max_len` → maximum number of experiences to store.
- `batch_size` → number of experiences for random sampling.

```python
buffer = ReplyBuffer(5_000, 128)
```

- `add` → store a new experience.
- `sample` → draw a random sample of `self.batch_size` experiences.
- `can_sample` → check whether sampling is possible yet.
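A minimal sketch of such a buffer built on `collections.deque` (illustrative; the repo's `ReplyBuffer` may store and batch experiences differently):

```python
import random
from collections import deque


class ReplyBuffer:
    def __init__(self, max_len, batch_size):
        self.buffer = deque(maxlen=max_len)  # oldest experiences fall off
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        # store one (s, a, r, s', done) transition
        self.buffer.append((state, action, reward, next_state, done))

    def can_sample(self):
        # sampling is possible once we have at least one full batch
        return len(self.buffer) >= self.batch_size

    def sample(self):
        # uniform random mini-batch of stored experiences
        return random.sample(self.buffer, self.batch_size)
```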
Our ANN model for predicting Q-values. As arguments it takes 3 parameters:

- the shape of the state, which represents the input dimension;
- the number of possible actions, which represents the output dimension;
- an array of numbers which represents the hidden layers and their sizes.

```python
model = Model(4, 2, (32, 32, 32))
```

- `copy` → create a copy of the model and return it.
- `save` → save the model to a file.
- `load` → load the model from a file.
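A minimal MLP sketch matching that constructor (the activation choice and save/load details are assumptions):

```python
import copy as _copy
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, state_shape, n_actions, hidden=(32, 32, 32)):
        super().__init__()
        layers = []
        in_dim = state_shape
        for size in hidden:
            layers += [nn.Linear(in_dim, size), nn.ReLU()]
            in_dim = size
        layers.append(nn.Linear(in_dim, n_actions))  # one Q-value per action
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

    def copy(self):
        # deep copy, e.g. for the target network in Double DQN
        return _copy.deepcopy(self)

    def save(self, path):
        torch.save(self.state_dict(), path)

    def load(self, path):
        self.load_state_dict(torch.load(path))
        return self
```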
The main implementation of D2QN, which ties all of the classes above together.
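At its core, Double DQN differs from vanilla DQN in one step of the target computation: the online network selects the next action, while the target network evaluates it. A sketch of that update (the function name, batch layout, and loss choice are assumptions, not the repo's exact code):

```python
import torch
import torch.nn.functional as F


def double_dqn_update(model, target, optimizer, batch, gamma=0.99):
    # batch holds tensors: states (B, state_dim), actions (B,) long,
    # rewards (B,), next_states (B, state_dim), dones (B,) float
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) from the online network for the actions actually taken
    q_values = model(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double DQN: the online network SELECTS the next action...
        next_actions = model(next_states).argmax(dim=1, keepdim=True)
        # ...while the target network EVALUATES it
        next_q = target(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * next_q * (1 - dones)

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

This decoupling of selection and evaluation is what reduces the Q-value overestimation that plain DQN suffers from.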