
How to set batch size of training? #21

Closed
desperadoola opened this issue Dec 8, 2019 · 5 comments

Comments

@desperadoola

When I try to change the batch size using --gin_param="sequences_per_batch=128" or --gin_param="tokens_per_batch=65536", the batch size always seems to stay at 32:

INFO:tensorflow:serialize_num_microbatches: tokens_per_microbatch_per_replica=2048 batch_dim=Dimension(name='batch', size=32) sequence_length={'inputs': 512, 'targets': 114} batch_per_replica=4 num_microbatches=1

@desperadoola
Author

I successfully set it using --gin_param="utils.run.batch_size=('tokens_per_batch', 65536)"
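For anyone puzzled by what the ('tokens_per_batch', 65536) form works out to: here is a minimal sketch of the arithmetic, assuming the batch size in sequences is just the token budget divided by the longest packed feature length. The helper name below is made up for illustration and is not part of the T5 or Mesh TensorFlow API.

```python
# Hypothetical helper illustrating how a ('tokens_per_batch', N) batch_size
# spec can translate into a batch size measured in sequences.
def sequences_per_batch(tokens_per_batch, sequence_length):
    # Assume the longest feature determines how many tokens one packed
    # sequence occupies.
    longest_feature = max(sequence_length.values())
    return tokens_per_batch // longest_feature

# With the sequence lengths from the log above:
print(sequences_per_batch(65536, {"inputs": 512, "targets": 114}))  # -> 128
```

That lines up with the 128 sequences per batch the original --gin_param was trying to request.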

@craffel
Collaborator

craffel commented Dec 8, 2019

Good work!

@craffel craffel closed this as completed Dec 8, 2019
@desperadoola
Author

Is there any instruction on how to set tokens_per_microbatch_per_replica?

@craffel
Collaborator

craffel commented Dec 25, 2019

You should only need to set tokens_per_microbatch_per_replica to something other than None if you want to use a batch size that is too large to fit in memory. Our training code will automatically split too-large batches into microbatches and accumulate gradients so that the full batch size is still computed.

@nshazeer FYI
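As an illustration of the mechanism described above (not the Mesh TensorFlow implementation itself), here is a minimal gradient-accumulation sketch in plain TensorFlow: each microbatch gets its own forward/backward pass, the gradients are summed, and a single optimizer step is applied, so the effective batch size stays the same as the full batch.

```python
import tensorflow as tf

def accumulated_train_step(model, loss_fn, optimizer, batch, num_microbatches):
    """One optimizer step over `batch`, computed as several smaller passes."""
    # Split every tensor in the batch into `num_microbatches` slices.
    microbatches = [
        tf.nest.map_structure(lambda t: t[i::num_microbatches], batch)
        for i in range(num_microbatches)
    ]
    accumulated = [tf.zeros_like(v) for v in model.trainable_variables]
    for micro in microbatches:
        with tf.GradientTape() as tape:
            predictions = model(micro["inputs"], training=True)
            # Scale so the summed gradients match a single full-batch step.
            loss = loss_fn(micro["targets"], predictions) / num_microbatches
        grads = tape.gradient(loss, model.trainable_variables)
        accumulated = [a + g for a, g in zip(accumulated, grads)]
    optimizer.apply_gradients(zip(accumulated, model.trainable_variables))
```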

@desperadoola
Author

Thanks, but I still can't figure out serialize_num_microbatches and mtf.tensor_dim_to_size_per_split in the Mesh TensorFlow transformer.

Is the default value tokens_per_microbatch_per_replica=2048 OK for different settings, for example, when we change the model size or use a different TPU?
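Not an authoritative answer, but the numbers in the log at the top of this thread are consistent with roughly the following arithmetic (a sketch of what serialize_num_microbatches appears to compute, not a copy of the Mesh TensorFlow source):

```python
def estimate_num_microbatches(batch_per_replica, sequence_length,
                              tokens_per_microbatch_per_replica=2048):
    # Assume the longest feature sets the tokens occupied by one sequence.
    tokens_per_sequence = max(sequence_length.values())
    # Sequences that fit into one microbatch on a single replica.
    microbatch_size = max(1, tokens_per_microbatch_per_replica // tokens_per_sequence)
    # Microbatches needed to cover the per-replica batch.
    return max(1, batch_per_replica // microbatch_size)

# Reproduces the log above: 2048 // 512 = 4 sequences per microbatch,
# and batch_per_replica=4, so a single microbatch is enough.
print(estimate_num_microbatches(4, {"inputs": 512, "targets": 114}))  # -> 1
```

On that reading, the 2048 default only starts to matter once the per-replica batch no longer fits in a single microbatch, which depends on the batch size, sequence length, and TPU topology in use.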

rodrigonogueira4 added a commit to castorini/pygaggle that referenced this issue Feb 4, 2021
The newer versions of the T5 library simply ignore `--gin_param="tokens_per_batch = 65536" \`:
google-research/text-to-text-transfer-transformer#21