This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

Fix inability to load vocab.json when converting the 16B model due to the file encoding not being set #5

Merged
merged 1 commit into from
Apr 12, 2023

Conversation

swanserquack
Contributor

Today I was trying to convert the 16B model using the Python script, but kept running into the error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 926: character maps to <undefined>

Having a look into it, the error seems to occur because the file's encoding is not set to UTF-8 when the script opens it. I tested the change on my personal machine and it converted the model fine without any errors.

@ravenscroftj
Owner

Nice spot, thank you. The vocab.json files for all of the models should be UTF-8 encoded, I think, so I'm happy to accept this change. I definitely need to add some automated regression tests for PRs in the future :)

@ravenscroftj ravenscroftj merged commit 5d2ea0a into ravenscroftj:main Apr 12, 2023
1 of 2 checks passed
@swanserquack swanserquack deleted the fix_vocab_encoding branch April 12, 2023 19:50