Fixes two bugs in the replay buffer:
- If the batch size did not evenly divide the buffer size, the buffer was not regarded as full, so sampling ignored many valid transitions
- Sampling is now done without replacement, matching the behavior documented in the docstring
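
For reviewers, here is a minimal sketch of the intended behavior. It is not the code from this change: the class `ReplayBuffer` and its `add_batch` and `sample` methods are hypothetical names, and it assumes a ring buffer that is filled in batches. It shows the full flag being set on wrap-around even when the batch size does not divide the capacity, and sampling that draws distinct indices.

```python
import random


class ReplayBuffer:
    """Illustrative ring buffer (hypothetical; not this repository's class)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = [None] * capacity
        self.pos = 0        # next write index
        self.full = False   # becomes True once the write index wraps around

    def add_batch(self, transitions):
        # Write one transition at a time so a batch that straddles the end
        # of the buffer still wraps and marks the buffer as full.
        for transition in transitions:
            self.storage[self.pos] = transition
            self.pos += 1
            if self.pos == self.capacity:
                self.pos = 0
                self.full = True

    def __len__(self):
        # Number of valid transitions currently stored.
        return self.capacity if self.full else self.pos

    def sample(self, batch_size):
        valid = len(self)
        if batch_size > valid:
            raise ValueError("batch_size exceeds the number of stored transitions")
        # random.sample draws distinct indices, i.e. without replacement.
        indices = random.sample(range(valid), batch_size)
        return [self.storage[i] for i in indices]
```

With a capacity of 10 and batches of 3, the fourth batch wraps the write index, so all 10 slots count as valid, and `sample(10)` returns each stored transition exactly once.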