Open
Description
https://github.com/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb
I think calling tokenizer.pad inside the collator is a slow operation: there is a warning suggesting that we pad in the tokenizer( ) call itself instead, for example by passing padding=True there.
Padding inside the collator slows down training. Is there any way to use the tokenizer's padding option directly?
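# Keep only the fields the collator needs, for the first 10 training examples.
accepted_keys = ["input_ids", "attention_mask", "label"]
features = [{k: v for k, v in encoded_datasets["train"][i].items() if k in accepted_keys} for i in range(10)]
# The collator pads the batch via tokenizer.pad.
batch = DataCollatorForMultipleChoice(tokenizer)(features)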
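To make the question concrete, here is a minimal pure-Python sketch of what "padding up front" would mean: every choice is padded to a fixed length once, during preprocessing, so the collator only has to stack. It does not use the notebook's code. `PAD_TOKEN_ID`, `MAX_LENGTH`, `pad_choices` and `collate` are hypothetical names and values chosen for illustration. With transformers, the rough equivalent would presumably be passing `padding="max_length"`, `max_length=...` and `truncation=True` to the `tokenizer( )` call in the preprocessing function and then using `default_data_collator`, but I have not checked that against this notebook.

```python
# Hypothetical values for illustration; a real tokenizer supplies its own pad id.
PAD_TOKEN_ID = 0
MAX_LENGTH = 8


def pad_choices(example, max_length=MAX_LENGTH, pad_id=PAD_TOKEN_ID):
    """Pad (or truncate) every choice of one multiple-choice example to max_length."""
    padded = {"input_ids": [], "attention_mask": [], "label": example["label"]}
    for ids, mask in zip(example["input_ids"], example["attention_mask"]):
        ids, mask = ids[:max_length], mask[:max_length]
        gap = max_length - len(ids)
        padded["input_ids"].append(ids + [pad_id] * gap)
        padded["attention_mask"].append(mask + [0] * gap)
    return padded


def collate(features):
    """With fixed-length features, collation is plain stacking; no padding is needed."""
    return {
        "input_ids": [f["input_ids"] for f in features],
        "attention_mask": [f["attention_mask"] for f in features],
        "labels": [f["label"] for f in features],
    }


# Two toy examples with two choices each, kept short for brevity.
raw = [
    {"input_ids": [[101, 7, 102], [101, 7, 8, 9, 102]],
     "attention_mask": [[1, 1, 1], [1, 1, 1, 1, 1]], "label": 1},
    {"input_ids": [[101, 5, 6, 102], [101, 102]],
     "attention_mask": [[1, 1, 1, 1], [1, 1]], "label": 0},
]
batch = collate([pad_choices(ex) for ex in raw])
```

The trade-off is that every batch then carries padding up to the fixed length, whereas padding in the collator only pads to the longest sequence in each batch. Depending on the length distribution, the extra padding could cost more than the pad call saves.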