This project applies Synaptic Intelligence (https://arxiv.org/abs/1703.04200) to batch reinforcement learning (batch-RL). A neural network approximates the Q-value function. The network is trained using batch-RL with experience replay and is regularized with Synaptic Intelligence.
Synaptic Intelligence reduces the amount of experience required per batch and improves the convergence rate of batch-RL.
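
To make the regularization step concrete, here is a minimal NumPy sketch of the Synaptic Intelligence bookkeeping described in the linked paper. It is an illustration under stated assumptions, not this project's implementation: the names `SIRegularizer` and `train_on_batch` and the defaults for `c`, `xi`, `lr`, and `steps` are hypothetical, each batch is treated as one "task", and a linear least-squares model stands in for the Q-network so the example stays self-contained.

```python
import numpy as np


class SIRegularizer:
    """Per-parameter importance tracking in the style of Synaptic Intelligence.

    Hypothetical helper for illustration; not taken from this project.
    """

    def __init__(self, theta, c=0.1, xi=1e-3):
        self.c = c                          # penalty strength (assumed default)
        self.xi = xi                        # damping term, avoids division by zero
        self.omega = np.zeros_like(theta)   # consolidated importance per parameter
        self.path = np.zeros_like(theta)    # running path integral for the current batch
        self.anchor = theta.copy()          # parameters at the end of the previous batch
        self.start = theta.copy()           # parameters at the start of the current batch

    def penalty(self, theta):
        # Quadratic surrogate loss: c * sum_k omega_k * (theta_k - anchor_k)^2
        return self.c * np.sum(self.omega * (theta - self.anchor) ** 2)

    def penalty_grad(self, theta):
        return 2.0 * self.c * self.omega * (theta - self.anchor)

    def accumulate(self, task_grad, delta_theta):
        # Path integral of the unregularized loss gradient along the update.
        self.path += -task_grad * delta_theta

    def consolidate(self, theta):
        # Fold the finished batch into omega, then reset the running state.
        delta = theta - self.start
        self.omega += self.path / (delta ** 2 + self.xi)
        self.path = np.zeros_like(theta)
        self.anchor = theta.copy()
        self.start = theta.copy()


def train_on_batch(theta, X, y, si, lr=0.05, steps=200):
    """Gradient descent on one batch; a linear model stands in for the Q-network."""
    for _ in range(steps):
        task_grad = 2.0 * X.T @ (X @ theta - y) / len(y)   # gradient of mean squared error
        step = -lr * (task_grad + si.penalty_grad(theta))  # regularized update
        si.accumulate(task_grad, step)
        theta = theta + step
    si.consolidate(theta)
    return theta


# Toy usage: synthetic features and regression targets stand in for replayed transitions.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
si = SIRegularizer(theta)
theta = train_on_batch(theta, X, y, si)
```

After the first batch, `si.omega` holds a non-negative importance value for each parameter, and later batches are penalized for moving important parameters away from `si.anchor`. In a real Q-learning setup the regression targets would be bootstrapped Q-value targets computed from the replay buffer, which this sketch leaves out.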