
Set pad_token_id to tokenizer.pad_token_id if not set on command line #118

Merged
merged 2 commits into mosaicml:main on May 16, 2023

Conversation

patrickhwood
Contributor

The hf_chat.py program emits this warning message before each chat response:

	The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
	Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.

Fixed by setting `pad_token_id` to `tokenizer.eos_token_id` when it is not set on the command line.
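A minimal sketch of the fallback, assuming an argparse-style CLI; `resolve_pad_token_id` and `StubTokenizer` are illustrative names, not the actual script's API:

```python
# Hypothetical sketch: if --pad_token_id is not supplied on the command line,
# fall back to the tokenizer's eos_token_id so that generate() stops warning
# about an unset pad token.
import argparse

class StubTokenizer:
    """Stand-in for a Hugging Face tokenizer; only the attribute we need here."""
    eos_token_id = 0

def resolve_pad_token_id(args, tokenizer):
    # Reuse the tokenizer's eos token id when no pad token id was given.
    if args.pad_token_id is None:
        args.pad_token_id = tokenizer.eos_token_id
    return args.pad_token_id

parser = argparse.ArgumentParser()
parser.add_argument("--pad_token_id", type=int, default=None)
args = parser.parse_args([])  # simulate launching without the flag
print(resolve_pad_token_id(args, StubTokenizer()))  # falls back to eos_token_id
```

An explicitly passed `--pad_token_id` still takes precedence; the fallback only fills the `None` default.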

efii added 2 commits May 12, 2023 16:30
… of the

	The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
	Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.

message that appears every time in the chat loop.

Tested in the mosaicml/pytorch docker container.
@patrickhwood changed the title from Br1 to Set pad_token_id to tokenizer.pad_token_id if not set on command line on May 12, 2023
@alextrott16 (Contributor) left a comment

Thanks @patrickhwood!

@samhavens I'll defer approval to you because you know the script better, but the change here is what we do in hf_generate.py and it doesn't seem to cause any trouble.

@vchiley vchiley requested a review from samhavens May 15, 2023 16:28
@samhavens (Contributor) left a comment

Thank you!

@samhavens samhavens merged commit b2450db into mosaicml:main May 16, 2023
6 checks passed
bmosaicml pushed a commit that referenced this pull request Jun 6, 2023
… of the (#118)


Co-authored-by: Pat Wood <Pat.Wood@efi.com>
4 participants