Garbled text displayed when testing after training #2

Open
fire717 opened this issue Mar 19, 2018 · 4 comments

fire717 commented Mar 19, 2018

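I ran the training on Windows. After it finished, the sample sentences at test time were displayed as:
鐣 鍗 鍚 渚

I then encoded them as GBK, which gave [b'\xe7\x95', b'\xe5\x8d', b'\xe5\x90', b'\xe4\xbe']

Finally I tested on Linux and got the same output: 鐣 鍗 鍚 渚

Could the author share the training and test environments? (Surely the problem isn't that I shouldn't have trained on Windows...)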
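
For readers hitting the same symptom, here is a minimal Python sketch of what the output above suggests: each reported byte pair looks like the first two bytes of a three-byte UTF-8 character that was decoded as GBK instead. The variable and function names are illustrative, and this is a guess at the mechanism, not something taken from the project's code.

```python
# Byte pairs reported above after encoding the garbled characters as GBK.
reported = [b'\xe7\x95', b'\xe5\x8d', b'\xe5\x90', b'\xe4\xbe']

# Decoding them as GBK should give back the garbled characters from the report.
as_gbk = [pair.decode('gbk') for pair in reported]
print(as_gbk)


def is_complete_utf8(data):
    """Return True if the bytes form complete, valid UTF-8."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


# 0xE7, 0xE5 and 0xE4 each start a three-byte UTF-8 sequence,
# so none of these two-byte pairs is a complete UTF-8 character.
complete = [is_complete_utf8(pair) for pair in reported]
print(complete)

# The same effect on a whole string: UTF-8 bytes read back as GBK.
original = "你好"
misread = original.encode('utf-8').decode('gbk', errors='replace')
print(misread != original)
```

If this is what happened, the data on disk was fine and only the decoding step was wrong.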

Author

fire717 commented Mar 19, 2018

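I figured it out: the open() call in extract_conv.py needs encoding='utf-8'.
I don't know the author's environment. Mine are Win10 + Python 3 and Ubuntu + Python 3, and both work after the change.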
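
A minimal sketch of that kind of change, assuming the script reads a plain-text corpus with the built-in open(). The helper name read_lines_utf8 and the temporary sample file are hypothetical; the real extract_conv.py may be structured differently.

```python
import os
import tempfile


def read_lines_utf8(path):
    """Read a text file as UTF-8, ignoring the platform default encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Write a small UTF-8 sample file, then read it back with an explicit encoding.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("你好\n世界\n")
    lines = read_lines_utf8(sample_path)
finally:
    os.remove(sample_path)

print(lines)
```

Passing encoding explicitly on every open() call, for both reading and writing, makes the result independent of the machine's locale.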

Owner

qhduan commented Mar 19, 2018

That's because the default encoding on Windows is not UTF-8, while all the other files are.

So the Windows defaults cause some problems.
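
To check which encoding open() falls back to on a given machine when no encoding argument is passed, something like the following can be used. This is a small sketch with illustrative variable names; it assumes Python 3.7 or later for sys.flags.utf8_mode.

```python
import codecs
import locale
import sys

# Encoding that open() uses in text mode when encoding= is omitted.
default_text_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_text_encoding)

# Normalized codec name, e.g. to compare against "utf-8".
codec_name = codecs.lookup(default_text_encoding).name
print("is UTF-8:", codec_name == "utf-8")

# 1 when Python runs in UTF-8 mode (python -X utf8 or PYTHONUTF8=1).
print("UTF-8 mode:", sys.flags.utf8_mode)
```

On a Simplified Chinese Windows installation this typically reports cp936 (GBK), whereas most Linux systems report UTF-8.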

Author

fire717 commented Mar 19, 2018

OK. Maybe it also failed on Ubuntu because I downloaded the files on Windows first and then transferred them over.

@yaleimeng

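Maybe you opened and edited a file on Windows; saving it again can change its encoding. I also downloaded the files on Windows, then transferred them to Ubuntu and unpacked them there, and the demo runs without problems.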
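
One way to test that theory is to look at a file's raw bytes and see whether they are still valid UTF-8, or whether an editor added a byte order mark. The function name classify_bytes is hypothetical and the check is only a rough heuristic, not an encoding detector.

```python
import codecs


def classify_bytes(data):
    """Roughly classify raw file bytes as UTF-8, UTF-8 with BOM, or neither."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


# In-memory examples standing in for file contents read with open(path, "rb").
plain = classify_bytes("你好".encode("utf-8"))
with_bom = classify_bytes(codecs.BOM_UTF8 + "你好".encode("utf-8"))
resaved_gbk = classify_bytes("你好".encode("gbk"))
print(plain, with_bom, resaved_gbk, sep=" | ")
```

For a real file, pass the result of reading it in binary mode, so that no decoding happens before the check.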
