Question about the corpus #26

Closed
ZenXir opened this issue Apr 1, 2023 · 3 comments

ZenXir commented Apr 1, 2023

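I just re-downloaded the merge.json corpus from the network drive
and found that it went from 663M to 389M.
What caused the corpus to shrink, with only 700K+ entries kept?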

Owner

Facico commented Apr 1, 2023

@ZenXir It was converted to UTF-8.

Facico closed this as completed Apr 1, 2023
Owner

Facico commented Apr 1, 2023

The ASCII representation is longer.
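A minimal sketch of why this can happen, assuming the earlier file stored non-ASCII characters as `\uXXXX` escapes (as Python's `json.dumps` does by default with `ensure_ascii=True`). How merge.json was actually produced is not stated in this thread; the `encoded_sizes` helper and the sample record are hypothetical, for illustration only.

```python
import json


def encoded_sizes(records):
    """Return (ascii_escaped_bytes, utf8_bytes) for the same JSON data."""
    # ensure_ascii=True writes each non-ASCII character as a \uXXXX escape.
    escaped = json.dumps(records, ensure_ascii=True)
    # ensure_ascii=False keeps the characters as-is; they are encoded as UTF-8 on disk.
    raw = json.dumps(records, ensure_ascii=False)
    return len(escaped.encode("utf-8")), len(raw.encode("utf-8"))


# Hypothetical sample record, not taken from the real corpus.
sample = [{"instruction": "你好", "output": "世界"}]
ascii_size, utf8_size = encoded_sizes(sample)
print(ascii_size, utf8_size)

# Both forms decode to the same data; only the on-disk size differs.
assert json.loads(json.dumps(sample, ensure_ascii=True)) == sample
assert json.loads(json.dumps(sample, ensure_ascii=False)) == sample
```

A common Chinese character takes 6 bytes as a `\uXXXX` escape but 3 bytes in UTF-8, so a mostly Chinese corpus can shrink noticeably with no entries removed.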

Author

ZenXir commented Apr 1, 2023

Indeed. I just checked the copy I had converted to UTF-8 earlier, and it is also 389M, haha.
