
Why custom ringbuffer used instead of deque? #165

Closed
kirkscheper opened this issue Jan 31, 2018 · 16 comments

@kirkscheper

kirkscheper commented Jan 31, 2018

https://github.com/matthiasplappert/keras-rl/blob/d9e3b64a20f056030c02bfe217085b7e54098e48/rl/memory.py#L42 states:

> Do not use deque to implement the memory. This data structure may seem convenient but it is way too slow on random access. Instead, we use our own ring buffer implementation.

Not sure why this is stated. I ran this quick test (imports added for completeness):

```python
import time
from collections import deque

from rl.memory import RingBuffer  # keras-rl's custom ring buffer

ring = RingBuffer(maxlen=1000)

start_time = time.time()
for i in range(100000):
    ring.append(i)
print("--- %s seconds ---" % (time.time() - start_time))

d = deque(maxlen=1000)
start_time = time.time()
for i in range(100000):
    d.append(i)
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()
for i in range(100000):
    l = ring[i % 1000 - 1]
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()
for i in range(100000):
    l = d[i % 1000]
print("--- %s seconds ---" % (time.time() - start_time))
```

and got this result:

```
--- 0.0608789920807 seconds ---
--- 0.00712609291077 seconds ---
--- 0.0430929660797 seconds ---
--- 0.00617289543152 seconds ---
```

You can see that the custom ring buffer is significantly slower than the deque at both appending and indexed access. Can someone explain to me what the comment is referring to exactly and why the ring buffer is used?
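For context (not part of the original comment): CPython's `deque` is a doubly linked list of fixed-size blocks, so indexing near either end is O(1) while indexing into the middle is O(n). That middle-access cost is presumably the slowness the code comment warns about. A minimal sketch, with illustrative sizes:

```python
import timeit
from collections import deque

d = deque(range(1_000_000))

# Access near an end is O(1); access in the middle must walk the blocks.
end_time = timeit.timeit(lambda: d[0], number=1_000)
mid_time = timeit.timeit(lambda: d[500_000], number=1_000)
print("end: %.6fs, middle: %.6fs" % (end_time, mid_time))
```

On a long deque the middle access is typically orders of magnitude slower, even though both look like simple indexing. With a small `maxlen=1000` buffer like the one benchmarked above, this cost is negligible.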

@matthiasplappert
Collaborator

Was this benchmark done using Python 3 or Python 2.7? I remember that I had trouble with deque and Python 2.7 in the past, but I may be wrong.

@kirkscheper
Author

It was with Python 2.7. With Python 3 I get this:

```
--- 0.07599091529846191 seconds ---
--- 0.01183772087097168 seconds ---
--- 0.060945987701416016 seconds ---
--- 0.014749526977539062 seconds ---
```

Here the deque is a bit slower but still much faster than the ring buffer.

@matthiasplappert
Collaborator

Could you do a benchmark with truly random access? It seems like you go through the buffer in sequential order. Instead, try something where you randomly draw indices.

If this test turns out to show the same trend, feel free to replace the custom RingBuffer with deque.

@kirkscheper
Author

kirkscheper commented Feb 5, 2018

With this code (imports added for completeness):

```python
import random
import time
from collections import deque

from rl.memory import RingBuffer  # keras-rl's custom ring buffer

idxs = []
r = xrange(0, 1000)  # range(0, 1000) on Python 3
for i in range(100000):
    idxs.append(random.sample(r, 1))

ring = RingBuffer(maxlen=1000)
start_time = time.time()
for i in range(100000):
    ring.append(i)
print("--- %s seconds ---" % (time.time() - start_time))

d = deque(maxlen=1000)
start_time = time.time()
for i in range(100000):
    d.append(i)
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()
for i in range(100000):
    l = ring[idxs[i][0]]
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()
for i in range(100000):
    l = d[idxs[i][0]]
print("--- %s seconds ---" % (time.time() - start_time))
```

I get this for Python 2.7:

```
--- 0.0610370635986 seconds ---
--- 0.00898694992065 seconds ---
--- 0.0499200820923 seconds ---
--- 0.0144419670105 seconds ---
```

and for Python 3:

```
--- 0.07166552543640137 seconds ---
--- 0.012346744537353516 seconds ---
--- 0.06459164619445801 seconds ---
--- 0.015827417373657227 seconds ---
```

Indeed, random access in Python 2.7 is about half the speed of sequential access, but still faster than the ring buffer.

@matthiasplappert
Collaborator

Cool, feel free to update the code to use deque. I'm not sure anymore why I thought deque was too slow but it seems like this was clearly not true.

@random-user-x
Contributor

@matthiasplappert, @kirkscheper Please let me know if anyone is working on this. I will be glad to finish this as soon as possible.

@RaphaelMeudec
Contributor

@Luffy1996 Feel free to work on this, I don't think anyone is tackling this issue at the moment!

@random-user-x
Contributor

@RaphaelMeudec Give me a week. My semester exams are going on. I would love to work on this from next week.

@RaphaelMeudec
Contributor

No pressure, good luck on your exams!

@buckleytoby

hey @kirkscheper ! I'm interested in the deque implementation, do you have any progress from above?

@kirkscheper
Author

All my changes are here: https://github.com/kirkscheper/keras-rl/tree/recurrent-ddpg

I have been playing around with lots of different things that are probably not directly ready for a pull request.

@stale

stale bot commented Jan 5, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 5, 2019
@stale stale bot closed this as completed Jan 12, 2019
@sperazza

The performance tests in the comments above are correct: deque is faster for appends. But the slowdown comes with `random.sample(dq, batch_size)` when you are sampling a large buffer. The RingBuffer implementation (via SequentialMemory), in both Python 2.7 and 3+, is about 10x faster for sampling.

My performance runs for sampling (size 1,000,000) were:

```
random sampling of deque: 5.98 secs
random sampling of RingBuffer: 0.574 secs
```

So... use the RingBuffer (SequentialMemory class): it samples way faster, and is about the same magnitude for append().
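This observation can be reproduced with a small standalone sketch (sizes here are illustrative, not the ones used above): `random.sample` over a deque has to index into it repeatedly, and each indexed read into the middle of a deque is O(n), whereas drawing index positions from a `range` and then reading a list costs O(1) per element.

```python
import random
import time
from collections import deque

N, K, ROUNDS = 100_000, 32, 100

dq = deque(range(N), maxlen=N)
lst = list(range(N))

start = time.time()
for _ in range(ROUNDS):
    random.sample(dq, K)               # each indexed access inside sample is O(n)
dq_time = time.time() - start

start = time.time()
for _ in range(ROUNDS):
    idxs = random.sample(range(N), K)  # cheap: draw positions first...
    batch = [lst[i] for i in idxs]     # ...then O(1) reads from the list
lst_time = time.time() - start

print("deque sampling: %.4fs, index-then-read: %.4fs" % (dq_time, lst_time))
```

The gap grows with the buffer size, which is why it only shows up when sampling a large replay memory.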

@matthiasplappert
Collaborator

Ah yes, I remember that this was the reason for creating this custom implementation since fast random sampling is very important.

Thanks for looking into it.

@UPUPGOO

UPUPGOO commented Mar 17, 2019

@sperazza I did an experiment as you said, but I got the opposite result.
With this code (imports added for completeness):

```python
import time

import numpy as np
from rl.memory import SequentialMemory

obs = np.resize(np.arange(9), (3, 3)).astype(np.uint8)
a = 0
r = 1.0
done = False

ring = SequentialMemory(limit=10000, window_length=4)
start_time = time.time()
for i in range(10000):
    ring.append(obs, a, r, done)
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()
for i in range(10000):
    ring.sample(batch_size=32)
print("--- %s seconds ---" % (time.time() - start_time))
```

I got this for RingBuffer:

```
--- 0.039894819259643555 seconds ---
--- 15.629229307174683 seconds ---
```

When I replace RingBuffer with deque in the SequentialMemory class like this:

```python
self.actions = deque(maxlen=limit)
self.rewards = deque(maxlen=limit)
self.terminals = deque(maxlen=limit)
self.observations = deque(maxlen=limit)
```

I got this:

```
--- 0.031911611557006836 seconds ---
--- 9.875584602355957 seconds ---
```

I ran the code with Python 3.5, so I am confused. Is there a problem with my code?
Thanks.

@clarisli

clarisli commented Jun 5, 2019

@UPUPGOO if you read the RingBuffer class

keras-rl/rl/memory.py

Lines 45 to 83 in e6efb0d

```python
class RingBuffer(object):
    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.data = deque(maxlen=maxlen)

    def __len__(self):
        return self.length()

    def __getitem__(self, idx):
        """Return element of buffer at specific index

        # Argument
            idx (int): Index wanted

        # Returns
            The element of buffer at given index
        """
        if idx < 0 or idx >= self.length():
            raise KeyError()
        return self.data[idx]

    def append(self, v):
        """Append an element to the buffer

        # Argument
            v (object): Element to append
        """
        self.data.append(v)

    def length(self):
        """Return the length of Deque

        # Argument
            None

        # Returns
            The length of deque element
        """
        return len(self.data)
```

and the SequentialMemory class

keras-rl/rl/memory.py

Lines 158 to 195 in e6efb0d

```python
class SequentialMemory(Memory):
    def __init__(self, limit, **kwargs):
        super(SequentialMemory, self).__init__(**kwargs)

        self.limit = limit

        # Do not use deque to implement the memory. This data structure may seem convenient but
        # it is way too slow on random access. Instead, we use our own ring buffer implementation.
        self.actions = RingBuffer(limit)
        self.rewards = RingBuffer(limit)
        self.terminals = RingBuffer(limit)
        self.observations = RingBuffer(limit)

    def sample(self, batch_size, batch_idxs=None):
        """Return a randomized batch of experiences

        # Argument
            batch_size (int): Size of the all batch
            batch_idxs (int): Indexes to extract

        # Returns
            A list of experiences randomly selected
        """
        # It is not possible to tell whether the first state in the memory is terminal, because it
        # would require access to the "terminal" flag associated to the previous state. As a result
        # we will never return this first state (only using `self.terminals[0]` to know whether the
        # second state is terminal).
        # In addition we need enough entries to fill the desired window length.
        assert self.nb_entries >= self.window_length + 2, 'not enough entries in the memory'
        if batch_idxs is None:
            # Draw random indexes such that we have enough entries before each index to fill the
            # desired window length.
            batch_idxs = sample_batch_indexes(
                self.window_length, self.nb_entries - 1, size=batch_size)
            batch_idxs = np.array(batch_idxs) + 1
        assert np.min(batch_idxs) >= self.window_length + 1
        assert np.max(batch_idxs) < self.nb_entries
        assert len(batch_idxs) == batch_size
```

Note that RingBuffer has since been updated to use a deque internally, and SequentialMemory uses RingBuffer, which means both variants in your comparison are backed by deques.

Now if you read the code of the sample_batch_indexes() function

keras-rl/rl/memory.py

Lines 14 to 42 in e6efb0d

```python
def sample_batch_indexes(low, high, size):
    """Return a sample of (size) unique elements between low and high

    # Argument
        low (int): The minimum value for our samples
        high (int): The maximum value for our samples
        size (int): The number of samples to pick

    # Returns
        A list of samples of length size, with values between low and high
    """
    if high - low >= size:
        # We have enough data. Draw without replacement, that is each index is unique in the
        # batch. We cannot use `np.random.choice` here because it is horribly inefficient as
        # the memory grows. See https://github.com/numpy/numpy/issues/2764 for a discussion.
        # `random.sample` does the same thing (drawing without replacement) and is way faster.
        try:
            r = xrange(low, high)
        except NameError:
            r = range(low, high)
        batch_idxs = random.sample(r, size)
    else:
        # Not enough data. Help ourselves with sampling from the range, but the same index
        # can occur multiple times. This is not good and should be avoided by picking a
        # large enough warm-up phase.
        warnings.warn('Not enough entries to sample without replacement. Consider increasing your warm-up phase to avoid oversampling!')
        batch_idxs = np.random.random_integers(low, high - 1, size=size)
    assert len(batch_idxs) == size
    return batch_idxs
```

You will notice that the code has been updated to always sample indices from a range object rather than from the buffer itself, so you were not doing the same deque-vs-list sampling comparison that @sperazza did up there.
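To make the distinction concrete, here is a minimal sketch (the helper name is illustrative, not from keras-rl) of the pattern described above: positions are drawn from a cheap `range`, and only afterwards is the storage indexed, so the container type affects only the per-read cost, not the sampling step itself.

```python
import random
from collections import deque

def draw_batch(storage, size):
    # Draw unique positions from a range (cheap regardless of storage type),
    # then index the storage itself; this mirrors what sample_batch_indexes
    # plus the subsequent lookups do.
    idxs = random.sample(range(len(storage)), size)
    return [storage[i] for i in idxs]

storage = deque(range(1000), maxlen=1000)  # stands in for RingBuffer.data
batch = draw_batch(storage, 32)
print(len(batch), len(set(batch)))  # prints "32 32": without replacement
```

Sampling the container directly, as in `random.sample(dq, 32)`, is the variant @sperazza benchmarked and the one that pays O(n) per indexed access on a deque.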
