'cp950' codec error from load_dataset('xtreme', 'tydiqa') #347
Comments
It should be in:
    if self.config.name == "tydiqa" or self.config.name.startswith("MLQA") or self.config.name == "SQuAD":
        with open(filepath) as f:
            data = json.load(f)
Could you try adding the encoding parameter: open(filepath, encoding='utf-8')?
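For reference, here is a minimal sketch of the suggested change outside the library. The helper name load_json_utf8 and the sample data are hypothetical, made up for illustration; only open(..., encoding='utf-8') and json.load come from the suggestion above. It writes a small UTF-8 JSON file containing non-ASCII text and reads it back with the encoding set explicitly, so the result should not depend on the platform's default codec.

```python
import json
import os
import tempfile


def load_json_utf8(filepath):
    # Hypothetical helper: pass the encoding explicitly instead of
    # relying on the platform default (e.g. cp950 on some Windows setups).
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


# Made-up sample data with non-ASCII text, written as UTF-8 and read back.
sample = {"question": "台北在哪裡?", "lang": "zh"}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    loaded = load_json_utf8(path)

print(loaded == sample)
```

If the same file were opened without the encoding argument, Python would fall back to the locale's preferred encoding, which is where a codec such as cp950 can come into play.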
Hello @jerryIsHere :) Did it work?
@lhoestq Sorry for the late reply. I found 4 copies of xtreme.py and made the suggested change in all of them.
Could you provide a better error message so that we can make sure it comes from the opening of the
@lhoestq As I said, I found 4 copies of xtreme.py and added the ", encoding='utf-8'" parameter to the open() function
Hi there!
Hello!
Sorry for not responding for about a month. I think the encoding issue on Windows isn't limited to the open() calls specific to a few datasets; it affects the entire library, depending on the machine and OS you use.
Since #481 we shouldn't have other issues with encodings, as they need to be set to "utf-8" by default. Closing this one, but feel free to re-open if you have other questions.
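To check whether a given machine is affected, a quick sketch like the one below prints the encoding that open() falls back to when no encoding argument is given. This is only a diagnostic under the assumption that the error comes from the platform default; the variable names are illustrative.

```python
import codecs
import locale

# Encoding used by open() in text mode when no encoding is passed
# (outside of Python's UTF-8 mode).
default_text_encoding = locale.getpreferredencoding(False)

# Normalized codec name, so aliases compare consistently.
normalized_name = codecs.lookup(default_text_encoding).name

print("default text encoding:", default_text_encoding)
print("normalized codec name:", normalized_name)
print("is UTF-8:", normalized_name == "utf-8")
```

On a machine where this does not report UTF-8, any open() call without an explicit encoding may decode UTF-8 files with the wrong codec.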
I guess the error is related to a Python source encoding issue: my PC may be trying to decode the source code with the wrong codec. See:
https://www.python.org/dev/peps/pep-0263/
I guess the error was triggered by the code "module = importlib.import_module(module_path)" at line 57 of nlp/src/nlp/load.py (https://github.com/huggingface/nlp/blob/911d5596f9b500e39af8642fe3d1b891758999c7/src/nlp/load.py#L51)
Any ideas?
P.S. I tried the same code on Colab, and it runs perfectly.