GPTNeo: importing model with padded vocab size should truncate wte #11078

leogao2 · 2021-04-06T04:56:48Z

Environment info

transformers version: latest from master

Who can help

Information

Model I am using (Bert, XLNet ...): GPTNeo

Script: https://github.com/huggingface/transformers/blob/master/src/transformers/models/gpt_neo/convert_gpt_neo_mesh_tf_to_pytorch.py

Some GPTNeo models are trained with a vocab size larger than the vocab size actually used (e.g. 50304 in the config when the actual vocab size is 50257), so all tokens after the first 50257 are unused. These models cannot currently be converted with the script because there is no way to cut the extra embeddings out of wte.
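To make the request concrete, here is a minimal sketch of the truncation I have in mind. The helper name `truncate_padded_wte` and the toy embedding matrix are hypothetical and only for illustration; this is not code from the conversion script. It assumes wte is laid out as (padded_vocab_size, hidden_size), with the unused rows at the end.

```python
def truncate_padded_wte(wte, vocab_size):
    """Keep only the first `vocab_size` rows of a token embedding matrix.

    Hypothetical helper for illustration only. `wte` is anything that
    supports len() and slicing along its first axis, with shape
    (padded_vocab_size, hidden_size): a list of rows, a NumPy array,
    or a torch tensor.
    """
    padded_vocab_size = len(wte)
    if vocab_size > padded_vocab_size:
        raise ValueError(
            f"vocab_size ({vocab_size}) exceeds the number of "
            f"embedding rows ({padded_vocab_size})"
        )
    # Rows at index >= vocab_size belong to padding tokens that are never used.
    return wte[:vocab_size]


# Toy stand-in for a checkpoint: 50304 padded rows, 50257 real tokens.
padded_vocab_size, vocab_size, hidden_size = 50304, 50257, 4
wte = [[float(i)] * hidden_size for i in range(padded_vocab_size)]

truncated = truncate_padded_wte(wte, vocab_size)
print(len(wte), len(truncated))  # 50304 50257
```

Because the helper only slices the first axis, the same call should work unchanged if wte is a tensor loaded from the checkpoint rather than a list of rows.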


leogao2 added a commit to leogao2/transformers that referenced this issue Apr 6, 2021

GPTNeo: handle padded wte (huggingface#11078)

666b527

leogao2 mentioned this issue Apr 6, 2021

GPTNeo: handle padded wte (#11078) #11079

Merged

LysandreJik assigned patil-suraj Apr 6, 2021

patil-suraj closed this as completed in #11079 Apr 7, 2021
