Fix block_size picking in megatron_lm_gpt_pretraining.py (#2342)
Only cap `block_size` to 1024 if `tokenizer.model_max_length` is actually greater than 1024.
nilq committed Jan 18, 2024
1 parent c7d11d7 commit 14d7c3f
Showing 1 changed file with 1 addition and 1 deletion.
examples/by_feature/megatron_lm_gpt_pretraining.py (2 changes: 1 addition, 1 deletion)
@@ -405,7 +405,7 @@ def tokenize_function(examples):
                 f"The tokenizer picked seems to have a very large `model_max_length` ({tokenizer.model_max_length}). "
                 "Picking 1024 instead. You can change that default value by passing --block_size xxx."
             )
-        block_size = 1024
+            block_size = 1024
     else:
         if args.block_size > tokenizer.model_max_length:
             logger.warning(
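
For context, here is a minimal sketch of the block-size selection after this fix. It assumes `args`, `tokenizer`, and `logger` are already set up as in the example script, and the `else` branch is abridged since the diff does not show it in full; the commit only changes the indentation of the final assignment in the `if` branch.

```python
# Minimal sketch of the corrected block_size selection (assumes `args`,
# `tokenizer`, and `logger` exist as in megatron_lm_gpt_pretraining.py).
if args.block_size is None:
    block_size = tokenizer.model_max_length
    if block_size > 1024:
        logger.warning(
            f"The tokenizer picked seems to have a very large `model_max_length` ({tokenizer.model_max_length}). "
            "Picking 1024 instead. You can change that default value by passing --block_size xxx."
        )
        # Before this commit, the next line sat one indentation level out, so
        # block_size was unconditionally reset to 1024 even when
        # model_max_length was 1024 or smaller.
        block_size = 1024
else:
    # Abridged (assumption): the script clamps a user-provided block size to
    # the tokenizer's limit; the full else branch is not shown in this diff.
    block_size = min(args.block_size, tokenizer.model_max_length)
```

The change matters when `tokenizer.model_max_length` is smaller than 1024: previously `block_size` was still forced up to 1024, exceeding the model's limit, whereas now the 1024 cap is applied only when the tokenizer's limit actually exceeds it.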
