Deep RL - Hindsight Experience Replay (HER)

This repository holds deep RL solutions for the bit-flipping environment using Hindsight Experience Replay.
View the original paper here.

Bit-Flip Environment

In this environment, we are given a starting state, which is a binary vector of size n, and a goal state of size n.
At each step, the agent can flip one of the bits in the current state. Every step yields a reward of -1,
and a step that makes the current state equal to the goal yields a reward of 0.
The environment is implemented in the bit_flip_env.py file.
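For illustration, here is a minimal sketch of such an environment, with hypothetical class and method names (the actual implementation lives in bit_flip_env.py):

```python
import numpy as np

class BitFlipEnvSketch:
    """Minimal sketch of a bit-flip environment (hypothetical API)."""

    def __init__(self, n=10):
        self.n = n
        self.reset()

    def reset(self):
        # Random binary start state and goal, both of size n.
        self.state = np.random.randint(0, 2, size=self.n)
        self.goal = np.random.randint(0, 2, size=self.n)
        return self.state.copy(), self.goal.copy()

    def step(self, action):
        # The action is the index of the bit to flip.
        self.state[action] ^= 1
        done = np.array_equal(self.state, self.goal)
        reward = 0.0 if done else -1.0
        return self.state.copy(), reward, done
```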

Bit-Flip Dynamic Environment

To check the ability of HER to deal with dynamic environments, we added this option to the bit-flipping domain.
With every step the agent takes, one of the goal's bits is flipped with probability 0.3,
making the goal harder to predict. The flipped bit is chosen uniformly at random.
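A minimal sketch of this goal perturbation, applied after every agent step (the function name is hypothetical; only the 0.3 flip probability and the uniform bit choice come from the description above):

```python
import numpy as np

def maybe_perturb_goal(goal, flip_prob=0.3):
    """With probability flip_prob, flip one uniformly chosen bit of the goal."""
    goal = goal.copy()
    if np.random.rand() < flip_prob:
        idx = np.random.randint(len(goal))
        goal[idx] ^= 1
    return goal
```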

Hindsight Experience Replay (HER)

The algorithm, described in detail here by Andrychowicz et al., can deal with sparse binary rewards (as in the bit-flipping domain).
The problem with sparse rewards is that for very large state spaces we might never see a successful episode, which makes learning very hard.
The algorithm creates new "fake" episodes from unsuccessful ones by changing their original goal to one of the states that was actually reached.
This way, successes are added to the experience replay buffer and can be learned from; in essence, the agent learns from its own mistakes.
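A minimal sketch of the relabeling idea, assuming transitions are stored as (state, action, next_state, goal) tuples and using the episode's final state as the substituted goal (the paper also proposes other goal-sampling strategies; function names here are hypothetical):

```python
import numpy as np

def her_relabel_episode(episode, replay_buffer):
    """Add both the original and a goal-relabeled copy of an episode to the buffer.

    `episode` is assumed to be a list of (state, action, next_state, goal) tuples.
    """
    final_state = episode[-1][2]  # the last reached state becomes the "fake" goal
    for state, action, next_state, goal in episode:
        # Original transition with the true (possibly never reached) goal.
        reward = 0.0 if np.array_equal(next_state, goal) else -1.0
        replay_buffer.append((state, action, next_state, goal, reward))
        # Relabeled transition: pretend the episode's final state was the goal.
        new_reward = 0.0 if np.array_equal(next_state, final_state) else -1.0
        replay_buffer.append((state, action, next_state, final_state, new_reward))
```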

Hindsight Experience Replay with Dynamical goals (DHER)

The concept here is very similar to HER, and is described here by Fang et al.
This algorithm also takes into account that the goal makes transitions over time, and uses the goal's trajectory to learn how to reach it.
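A rough, heavily simplified sketch of the episode-matching idea behind DHER, assuming failed episodes are stored as lists of (state, action, next_state, achieved_goal, desired_goal) tuples; this is only a conceptual illustration, not the repository's implementation:

```python
import numpy as np

def find_match(ep_i, ep_j):
    """Return (p, q) where ep_i's achieved goal at p equals ep_j's desired goal at q."""
    for p, tr_i in enumerate(ep_i):
        for q, tr_j in enumerate(ep_j):
            if np.array_equal(tr_i[3], tr_j[4]):  # achieved_goal == desired_goal
                return p, q
    return None

def dher_relabel(failed_episodes, replay_buffer):
    """Build 'imagined' successful episodes by splicing two failed ones."""
    for ep_i in failed_episodes:           # provides states and actions
        for ep_j in failed_episodes:       # provides the (moving) goal trajectory
            match = find_match(ep_i, ep_j)
            if match is None:
                continue
            p, q = match
            length = min(p, q) + 1          # align the two trajectories so they meet at (p, q)
            for k in range(length):
                state, action, next_state, achieved, _ = ep_i[p - length + 1 + k]
                new_goal = ep_j[q - length + 1 + k][4]
                reward = 0.0 if np.array_equal(achieved, new_goal) else -1.0
                replay_buffer.append((state, action, next_state, new_goal, reward))
```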

Scripts Usage:

All the scripts below accept command-line arguments (all set by default to our choice of parameters).
To see all the arguments of a script, run: <SCRIPT NAME>.py --help
Example of running a script: python main.py

Train scripts:

To train a model that solves the bit-flipping environment, run main.py.
Note that the --state-size <NUMBER> argument is necessary in order to see the effect of different state sizes on the model.
Adding the --HER or --DHER argument uses the respective algorithm.
Adding the --dynamic argument uses the dynamic mode of the environment. The model architecture is specified in dqn.py; an illustrative sketch follows below.
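As an illustration only (the actual architecture is defined in dqn.py), a DQN for this task could take the concatenated state and goal as input and output one Q-value per bit that can be flipped; this sketch assumes PyTorch and hypothetical layer sizes:

```python
import torch
import torch.nn as nn

class BitFlipDQNSketch(nn.Module):
    """Illustrative DQN: input is state||goal, output is one Q-value per bit."""

    def __init__(self, state_size, hidden_size=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, state_size),  # one action (bit flip) per bit
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))
```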

Evaluation scripts:

To test the models, run evaluate_model.py with the relevant --state-size argument.
We include a trained model in the bit_flip_model.pkl file, trained with size n=10.

Results

The figure below shows how the state size affects the success rate of the different algorithms.
As can be seen, using HER allows us to overcome the sparse binary reward problem and maintain a high success rate even for very large state spaces.
This holds even when compared to a standard DQN with added reward shaping.

Figure: Results of using HER

Example

In the following example, we can observe how the domain is solved step by step using the HER algorithm.

Figure: Example of using HER to solve the domain with size 10
