This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

Fix inability to load vocab.json when converting the 16B model due to the file encoding not being set #5

Merged
merged 1 commit into from
Apr 12, 2023

Conversation

swanserquack
Contributor

Today I was trying to convert the 16B model using the Python script, but kept running into the error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 926: character maps to <undefined>

Having a look into it, the error seems to occur because the file's encoding is not set to UTF-8 when the script opens it. I tested the change on my personal machine and it converted the model fine without any errors.

@ravenscroftj
Owner

Nice spot, thank you. The vocab.json files for all of the models should be UTF-8 encoded, I think, so I'm happy to accept this change. I definitely need to add some automated regression tests for PRs in the future :)

@ravenscroftj ravenscroftj merged commit 5d2ea0a into ravenscroftj:main Apr 12, 2023
1 of 2 checks passed
@swanserquack swanserquack deleted the fix_vocab_encoding branch April 12, 2023 19:50