This repository has been archived by the owner on Aug 3, 2022. It is now read-only.

What format is your Chinese input data in? Multiple words per line, separated by spaces? #13

Open
danglei912 opened this issue Apr 11, 2017 · 4 comments

Comments

@danglei912

No description provided.

@indiejoseph
Owner

@danglei912
Author

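Thanks, I was able to view the sample data, but I still have one small question that I hope you can answer.
I see that the sentences in your sample contain some punctuation, such as commas and spaces, but the load_data_and_labels() function does not clean punctuation. Once the (word, index) mapping is built, won't these tokens slightly hurt the model's training accuracy, since commas and spaces also take part in training as words?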

@indiejoseph
Owner

Because this uses a small dataset, I didn't do any cleaning, but you can refer to my other project:
https://github.com/indiejoseph/chinese-char-rnn/blob/master/utils.py
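For illustration only, the kind of cleaning discussed here could look roughly like the sketch below. It assumes each input line is a sequence of space-separated tokens, as the issue title suggests. The names `clean_tokens` and `build_vocab` are hypothetical; they are not taken from this repository or from the utils.py linked above, and this is not a description of what that file does.

```python
import unicodedata
from collections import Counter


def is_punctuation(token):
    """Return True if every character in the token is Unicode punctuation.

    Covers both ASCII marks (",", "!") and full-width CJK marks (",", "。").
    """
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def clean_tokens(line):
    """Hypothetical helper: split a line on whitespace and drop punctuation-only tokens.

    str.split() with no arguments also discards empty tokens and splits on
    the ideographic space (U+3000), so stray spaces never become "words".
    Punctuation attached to a word (e.g. "你好,") is left untouched.
    """
    return [tok for tok in line.split() if not is_punctuation(tok)]


def build_vocab(lines):
    """Hypothetical helper: map each remaining token to an index, most frequent first."""
    counts = Counter(tok for line in lines for tok in clean_tokens(line))
    return {tok: idx for idx, (tok, _) in enumerate(counts.most_common())}


if __name__ == "__main__":
    sample = ["我 喜欢 , 机器 学习 。", "我 喜欢 , 猫 !"]
    print(clean_tokens(sample[0]))
    print(build_vocab(sample))
```

Whether dropping punctuation actually helps depends on the task; for some classifiers, marks such as "!" or "?" carry signal, so it may be worth comparing both variants on a held-out set.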

@danglei912
Author

OK, thank you very much. I'll study it.
