
What's the best way to deal with variable episode length? #49

Closed
uduse opened this issue Jun 16, 2021 · 1 comment

Comments

@uduse
Contributor

uduse commented Jun 16, 2021

A solution provided in Acme pads all episodes to the maximum episode length, but I'm afraid this will drastically reduce performance when the episode length has high variance. For example, if the maximum episode length is 300 while most episodes end at 15 steps, this could be a 20-fold decrease in data throughput. For my tasks, it's OK to always have a batch size of one (sampling one episode at a time); would that remove the requirement that all items have the same shape?

What are the best ways to deal with this situation?

Here are some ideas that came to my mind but I'm not sure if they can solve the problem:

  • create multiple tables corresponding to different padding lengths, and insert into different tables based on the actual episode length
  • use delta-encoding

Related: #19 #47

@ebrevdo
Collaborator

ebrevdo commented Jun 23, 2021

You can create a single table that contains variable-length episodes. Both the old Writer/Sampler and the new TrajectoryWriter/Sampler allow it (if you provide a signature to the dataset, simply set the outer/time dimension to None in Python).
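As a minimal sketch of what that looks like, here is a table whose signature leaves the time dimension unset. The table name and the observation/action/reward specs are hypothetical placeholders; substitute the specs of your own environment.

  import reverb
  import tensorflow as tf

  table = reverb.Table(
      name='variable_length_episodes',
      sampler=reverb.selectors.Uniform(),
      remover=reverb.selectors.Fifo(),
      max_size=10_000,
      rate_limiter=reverb.rate_limiters.MinSize(1),
      signature={
          # The leading (time) dimension is None, so episodes of any
          # length can be inserted into and sampled from the same table.
          'observation': tf.TensorSpec([None, 84, 84, 3], tf.uint8),
          'action': tf.TensorSpec([None], tf.int32),
          'reward': tf.TensorSpec([None], tf.float32),
      },
  )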

But now you additionally have the problem of batching and training from a stream of variable episode lengths.

You have roughly 3 options from a tf.data perspective:

  • padded-batch (what you're using; see the short sketch right after this list)
  • bucket-by-sequence-length (what you're suggesting, except the bucketing moves into the tf.data stream instead of into multiple tables; using multiple tables works too!)
  • chunk-and-shuffle (chop all episodes into fixed-size pieces and shuffle those, without bothering to train on full episodes)
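For reference, padded-batch is essentially a one-liner. In this sketch, `episodes` stands for a hypothetical dataset of variable-length episodes and `batch_size` is a placeholder:

  # Each batch is padded up to the longest episode it contains, so short
  # episodes spend compute on padding steps.
  padded = episodes.padded_batch(batch_size, drop_remainder=True)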

Chunk-and-shuffle often works well in dense reward scenarios. Here's an example:

  import tensorflow as tf

  def chunk_and_shuffle(dataset, chunk_size, batch_size,
                        shuffle_buffer_size, seed=None):
    stream = dataset.unbatch()
    # Since we're generally reading from a never-ending replay buffer, we
    # can drop the remainder here and get the benefit of usually having
    # known chunk lengths and batch sizes.
    chunks = stream.batch(chunk_size, drop_remainder=True)
    # reshuffle_each_iteration will probably never be used since replay
    # buffers are based on never-ending datasets, but we add it just in case.
    shuffled = chunks.shuffle(
        shuffle_buffer_size, reshuffle_each_iteration=True, seed=seed)
    batched = shuffled.batch(batch_size, drop_remainder=True)
    return batched

The bucket-by-sequence-length approach can be combined with padding up to a fixed length per bucket (allowing fixed-shape graphs, which is useful if you're training with TPUs). The tf.data function that helps you do this is bucket_by_sequence_length.
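A rough sketch of that option follows. The bucket boundaries and batch sizes are made up, `episodes` again stands for a dataset of variable-length episode dicts, and in recent TF versions bucket_by_sequence_length is a Dataset method (older versions have it under tf.data.experimental):

  # Group episodes of similar length and pad each batch only up to its
  # bucket boundary, which keeps per-bucket shapes fixed (TPU-friendly).
  # The last boundary is 301 so that full-length (300-step) episodes
  # still fit when padding to the bucket boundary.
  bucketed = episodes.bucket_by_sequence_length(
      element_length_func=lambda ep: tf.shape(ep['observation'])[0],
      bucket_boundaries=[16, 32, 64, 128, 301],
      bucket_batch_sizes=[64, 32, 16, 8, 4, 2],
      pad_to_bucket_boundary=True,
      drop_remainder=True)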

In all cases, you can speed things up by creating multiple reverb datasets (flexible_batch_size=1 in each) and combining them with tf.data's interleave (Acme's reverb dataset code does this).
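A sketch of that pattern, where the server address, table name, and num_parallel_datasets are placeholders and the exact reverb dataset constructor/arguments depend on your reverb version:

  def make_reverb_dataset(_):
    # One pipeline per call; interleave runs several of these in parallel.
    # Depending on your reverb version you may also be able to pass
    # flexible_batch_size=1 here, as suggested above.
    return reverb.TrajectoryDataset.from_table_signature(
        server_address='localhost:8000',
        table='variable_length_episodes',
        max_in_flight_samples_per_worker=2)

  num_parallel_datasets = 4
  interleaved = tf.data.Dataset.range(num_parallel_datasets).interleave(
      make_reverb_dataset,
      cycle_length=num_parallel_datasets,
      num_parallel_calls=tf.data.AUTOTUNE,
      deterministic=False)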

ebrevdo closed this as completed Jun 23, 2021