
What's the best way to deal with variable episode length? #49

Closed
uduse opened this issue Jun 16, 2021 · 1 comment

Comments

@uduse
Contributor

uduse commented Jun 16, 2021

A solution provided in Acme pads all episodes to the maximum episode length, but I'm afraid this will drastically reduce performance when the episode length has high variance. For example, if the maximum episode length is 300 while most episodes end at 15 steps, this could be a 20-fold decrease in data throughput. For my tasks, it's OK to always have a batch size of one (sampling one episode at a time); would that remove the requirement that all items have the same shape?

What are the best ways to deal with this situation?

Here are some ideas that came to my mind but I'm not sure if they can solve the problem:

  • create multiple tables corresponding to different padding lengths, and insert into different tables based on the actual episode length
  • use delta-encoding

Related: #19 #47

@ebrevdo
Collaborator

ebrevdo commented Jun 23, 2021

You can create a single table that contains variable-length episodes. Both the old Writer/Sampler and the new TrajectoryWriter/Sampler allow it (if you provide a signature to the dataset, simply set the outer/time dimension to None in Python).
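As a minimal sketch of what that looks like, here is a table whose signature leaves the time dimension unset. The table name and the observation/action/reward specs are hypothetical placeholders; substitute the specs of your own environment.

  import reverb
  import tensorflow as tf

  table = reverb.Table(
      name='variable_length_episodes',
      sampler=reverb.selectors.Uniform(),
      remover=reverb.selectors.Fifo(),
      max_size=10_000,
      rate_limiter=reverb.rate_limiters.MinSize(1),
      signature={
          # The leading (time) dimension is None, so episodes of any
          # length can be inserted into and sampled from the same table.
          'observation': tf.TensorSpec([None, 84, 84, 3], tf.uint8),
          'action': tf.TensorSpec([None], tf.int32),
          'reward': tf.TensorSpec([None], tf.float32),
      },
  )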

But now you additionally have the problem of batching and training from a stream of variable episode lengths.

You have roughly 3 options from a tf.data perspective:

  • padded-batch (what you're using; see the short sketch right after this list)
  • bucket-by-sequence-length (what you're suggesting, except the bucketing moves into the tf.data stream instead of into multiple tables; using multiple tables works too!)
  • chunk-and-shuffle (chop all episodes into fixed-size pieces and shuffle those, without bothering to train on full episodes)
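For reference, padded-batch is essentially a one-liner. In this sketch, `episodes` stands for a hypothetical dataset of variable-length episodes and `batch_size` is a placeholder:

  # Each batch is padded up to the longest episode it contains, so short
  # episodes spend compute on padding steps.
  padded = episodes.padded_batch(batch_size, drop_remainder=True)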

Chunk-and-shuffle often works well in dense reward scenarios. Here's an example:

  import tensorflow as tf

  def chunk_and_shuffle(dataset, chunk_size, batch_size,
                        shuffle_buffer_size, seed=None):
    stream = dataset.unbatch()
    # Since we're generally reading from a never-ending replay buffer, we
    # can drop the remainder here and get the benefit of usually having
    # known chunk lengths and batch sizes.
    chunks = stream.batch(chunk_size, drop_remainder=True)
    # reshuffle_each_iteration will probably never be used since replay
    # buffers are based on never-ending datasets, but we add it just in case.
    shuffled = chunks.shuffle(
        shuffle_buffer_size, reshuffle_each_iteration=True, seed=seed)
    batched = shuffled.batch(batch_size, drop_remainder=True)
    return batched

The bucket-by-sequence-length approach can be combined with padding up to a fixed length per bucket (allowing fixed-shape graphs, which is useful if you're training with TPUs). The tf.data function that helps you do this is bucket_by_sequence_length.
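A rough sketch of that option follows. The bucket boundaries and batch sizes are made up, `episodes` again stands for a dataset of variable-length episode dicts, and in recent TF versions bucket_by_sequence_length is a Dataset method (older versions have it under tf.data.experimental):

  # Group episodes of similar length and pad each batch only up to its
  # bucket boundary, which keeps per-bucket shapes fixed (TPU-friendly).
  # The last boundary is 301 so that full-length (300-step) episodes
  # still fit when padding to the bucket boundary.
  bucketed = episodes.bucket_by_sequence_length(
      element_length_func=lambda ep: tf.shape(ep['observation'])[0],
      bucket_boundaries=[16, 32, 64, 128, 301],
      bucket_batch_sizes=[64, 32, 16, 8, 4, 2],
      pad_to_bucket_boundary=True,
      drop_remainder=True)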

In all cases, you can speed things up by creating multiple reverb datasets (flexible_batch_size=1 in each) and combining them with tf.data's interleave (Acme's reverb dataset code does this).
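A sketch of that pattern, where the server address, table name, and num_parallel_datasets are placeholders and the exact reverb dataset constructor/arguments depend on your reverb version:

  def make_reverb_dataset(_):
    # One pipeline per call; interleave runs several of these in parallel.
    # Depending on your reverb version you may also be able to pass
    # flexible_batch_size=1 here, as suggested above.
    return reverb.TrajectoryDataset.from_table_signature(
        server_address='localhost:8000',
        table='variable_length_episodes',
        max_in_flight_samples_per_worker=2)

  num_parallel_datasets = 4
  interleaved = tf.data.Dataset.range(num_parallel_datasets).interleave(
      make_reverb_dataset,
      cycle_length=num_parallel_datasets,
      num_parallel_calls=tf.data.AUTOTUNE,
      deterministic=False)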

ebrevdo closed this as completed Jun 23, 2021