when I run main of privateGPT.py I get this error: gpt_tokenize: unknown token 'Γ' gpt_tokenize: unknown token 'Ç' gpt_tokenize: unknown token 'Ö' gpt_tokenize: unknown token 'Γ' gpt_tokenize: unknown token 'Ç' gpt_tokenize: unknown token 'Ö' gpt_tokenize: unknown token 'Γ' gpt_tokenize: unknown token 'Ç' gpt_tokenize: unknown token 'Ö' ... #77

Closed
Amarbo opened this issue May 12, 2023 · 8 comments

Comments

@Amarbo

Amarbo commented May 12, 2023

No description provided.

@mmike87

mmike87 commented May 12, 2023

I see those with some of my training files, too - I just ignore them for now and the model still seems to answer inquiries.

@Amarbo
Author

Amarbo commented May 12, 2023

I followed your instructions exactly, so what can I do? Should I install another model? If yes, which one, please?
I have a Windows 11 PC with 16 GB of RAM.

@imartinez
Collaborator

That's just a warning. There are some unsupported characters in your source files. But that won't prevent the model from working. You'll see the answer right after those warnings.
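If the warnings are a nuisance, one possible workaround is to normalize typographic punctuation in the source documents before ingesting them. The snippet below is only a sketch under that assumption: `asciify_punctuation` and `PUNCT_MAP` are hypothetical names, not part of privateGPT, and the character list is illustrative rather than complete.

```python
# Hypothetical pre-processing helper; NOT part of privateGPT.
# Maps common typographic punctuation to plain ASCII equivalents.
PUNCT_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def asciify_punctuation(text: str) -> str:
    """Return text with the characters in PUNCT_MAP replaced by ASCII."""
    return text.translate(PUNCT_MAP)


sample = "It\u2019s a \u201ctest\u201d\u2026"
cleaned = asciify_punctuation(sample)
```

This only sidesteps the warning for punctuation; it does nothing for text that genuinely needs non-ASCII characters.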

@bobhairgrove

In the sample text, it seems that everything is encoded properly as UTF-8 text. However, there are "fancy quotes" at several places in the document, and somewhere along the toolchain this isn't being parsed properly.

If I open the sample text in an editor such as Geany, I cannot convert it to ISO-8859-1 without errors; however, it is possible to convert it to Windows 1252 and back to Unicode (i.e. UTF-8).

So which element in the toolchain doesn't understand UTF-8?
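For what it's worth, the three "unknown tokens" in the issue title line up with a single fancy quote. The sketch below shows that the UTF-8 bytes of U+2019 (right single quotation mark), when displayed in code page 437, come out as exactly 'Γ', 'Ç', 'Ö'. That the log was printed in a CP437 console is an assumption on my part; the thread does not say, and this does not identify which component mishandles the text. `can_encode` is a helper name made up for this example.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the typical "fancy" apostrophe.
quote = "\u2019"

# Its UTF-8 encoding is three bytes: E2 80 99.
utf8_bytes = quote.encode("utf-8")

# Assumption: the console shows raw bytes using code page 437.
# Each byte then appears as a separate character.
mojibake = utf8_bytes.decode("cp437")


def can_encode(text: str, codec: str) -> bool:
    """Return True if text is representable in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# ISO-8859-1 has no slot for U+2019, but Windows-1252 does (byte 0x92),
# which matches the conversion behaviour seen in the editor.
latin1_ok = can_encode(quote, "iso-8859-1")
cp1252_ok = can_encode(quote, "cp1252")
```

If that reading is right, the tokenizer is reporting the quote one byte at a time, which would explain why the same three characters repeat in the same order.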

@bobhairgrove

BTW, after getting several of these errors, the script is killed on my system, as others have also reported.

@bobhairgrove

This needs to be reopened, IMHO -- how are we going to do other languages besides English if the program cannot handle Unicode?

@hodanli

hodanli commented May 14, 2023

I get the same error with the example text.

@thvi

thvi commented May 15, 2023

So, as of today, is English the only language supported by privateGPT?


6 participants