fread() doesn't support unicode in file names on Windows #3400
Comments
Sorry, I forgot to mention that this error only occurs on Windows. The same Chinese path works fine on Linux.
On Windows we use |
Thank you very much for your reply. I made some attempts and found that the problem may lie in how the path is read. I tried it this way:
import pandas as pd
import datatable as dt
import sys
# Show the encodings Python itself is using
print('defaultencoding: ' + sys.getdefaultencoding())
print('stdout.encoding: ' + sys.stdout.encoding)
print('stdin.encoding: ' + sys.stdin.encoding)
test_file = 'D:/测试.csv'
# Read from an ASCII path, then write to the Chinese path through datatable
pd_df = pd.read_csv('D:/test.csv', encoding='utf-8', low_memory=False)
dt_df = dt.Frame(pd_df)
dt_df.to_csv(test_file)
output
Then the output file is
print('D:/测试.csv'.encode('utf-8').decode('gbk'))
output
This confirms that the encoding is being identified incorrectly. But I don't know how to configure the encoding that 'dt' uses. At present, I can only read and save files with paths of the form 'dt.fread('D:/test.csv')'. So I would like to know where 'dt' reads its encoding configuration from, and whether that configuration can be modified manually.
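For readers following along, the mismatch described above can be reproduced without datatable at all. This is only a sketch of the general effect (UTF-8 bytes being interpreted as GBK), not a statement about what datatable does internally; the helper name misread is made up for illustration.

```python
def misread(path, written_as="utf-8", read_as="gbk"):
    """Return `path` as it looks when its bytes are decoded with the wrong codec."""
    return path.encode(written_as).decode(read_as)


original = "D:/测试.csv"
garbled = misread(original)

# The garbled name is what shows up on disk when the path bytes are misinterpreted.
print(garbled)

# Reversing the two steps recovers the intended name, which is a quick way
# to check whether a strange file name is this particular kind of mix-up.
recovered = garbled.encode("gbk").decode("utf-8")
print(recovered == original)
```

If the recovered name matches the one you meant to write, the file name was most likely encoded as UTF-8 and then interpreted as GBK somewhere along the way.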
|
The simplest workaround is to rename your file to use only ASCII characters. To support Unicode file names on Windows we need to make changes to the datatable source code.
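If renaming the original file is not convenient, the same idea can be applied to a temporary copy. The following is a minimal sketch using only the standard library; ascii_copy is a hypothetical helper, not part of datatable, and it assumes the temporary directory itself has an ASCII-only path (pass tmp_dir otherwise).

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def ascii_copy(path, tmp_dir=None):
    """Yield the path of a temporary copy of `path` with an ASCII-only file name.

    The copy is deleted when the `with` block exits.
    """
    suffix = os.path.splitext(path)[1]
    if not suffix.isascii():
        suffix = ""  # drop a non-ASCII extension rather than carry it over
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=tmp_dir)
    os.close(fd)  # only the name is needed; copyfile reopens the file
    try:
        shutil.copyfile(path, tmp_path)
        yield tmp_path
    finally:
        os.remove(tmp_path)


# Usage (requires datatable, so it is left commented out here):
# import datatable as dt
# with ascii_copy('D:/测试.csv') as p:
#     DT = dt.fread(p)
```

The copy costs one extra pass over the file, so for very large files renaming in place is cheaper.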
try this:
|
@TimothyZero you can even try
with open('中文.csv', encoding='utf_8_sig', mode='r') as f:  # utf_8_sig for Excel on Windows; open for reading, since mode='w' would truncate the file
    DT = dt.fread(f)
I tried the with open approach to get around reading files whose paths contain Chinese characters, but it made reading the file significantly slower.
I have just started using datatable and found that if the file path contains Chinese characters, an IOError is raised.
The same file under an all-English path does not have this problem.
The error message is attached at the end.
I don't know whether a solution already exists. I searched, but did not find one.