
Eliminate dummy batches and init_cuda_buffer. #3732

Merged
1 commit merged on Jun 21, 2021

Conversation

stephenroller
Contributor

Patch description
Eliminate the dreaded _dummy_batch and _init_cuda_buffer.

Users have complained about the presence of these methods. When debugging, the first batch being a dummy causes confusion (why am I getting gibberish data?), and having to implement _dummy_batch manually is annoying (why is my model breaking on the first pass?).

Adopt Fairseq's newer approach (sketched below):

  • Cache the very first batch we ever see. If it's invalid, we should be raising an exception anyway; if it's valid, it will be good enough.
  • Do NOT initialize the CUDA buffer. Just keep that batch around.
  • If we need to recover from an OOM, use the cached dummy batch as the data.
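A minimal sketch of that flow, assuming hypothetical method names (_cache_dummy_batch, _real_train_step, and the signature of _fake_forward_pass are illustrative, not necessarily ParlAI's exact API):

    import torch

    class CachedDummyBatchSketch:
        def _cache_dummy_batch(self, batch):
            # Remember the very first batch we ever see; it is known-valid,
            # so it can be replayed later as a stand-in batch.
            if not hasattr(self, '_dummy_batch'):
                self._dummy_batch = batch

        def train_step(self, batch):
            self._cache_dummy_batch(batch)
            try:
                return self._real_train_step(batch)
            except RuntimeError as e:
                if 'out of memory' not in str(e):
                    raise
                # OOM recovery: free cached blocks, then run a throwaway
                # forward/backward on the cached batch so distributed
                # gradient sync stays in lockstep with other workers.
                torch.cuda.empty_cache()
                return self._fake_forward_pass(self._dummy_batch)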

Testing steps
CI.

klshuster (Contributor) left a comment

Thanks for making this change. I'm approving but did have one perhaps pertinent question.

    Cache a batch to be used as a dummy during _fake_forward_pass.
    """
    if not hasattr(self, '_dummy_batch'):
        self._dummy_batch = batch
klshuster (Contributor)

Does this unnecessarily use up memory? What if the first batch is huge, for example?

stephenroller (Contributor, Author)

Could be a worse problem for images. For text, imagine it's a 2x2048x1024 LongTensor (2x for content and label; batch size 2048; string length 1024). A long is 8 bytes, so we'd be wasting 32 MiB of memory.
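For reference, the back-of-envelope arithmetic behind that estimate:

    # Rough estimate for the cached text batch described above (illustrative only).
    elements = 2 * 2048 * 1024                   # content + label tensors, batch size 2048, length 1024
    bytes_per_long = 8                           # torch.long is 64-bit
    print(elements * bytes_per_long / 2 ** 20)   # 32.0 (MiB)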
