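You can prepare your corpus by following the instructions in Knover/README.md ( https://github.com/PaddlePaddle/Knover/blob/master/README.md ). To build the vocabulary, you can use the sentencepiece tool ( https://github.com/google/sentencepiece ); for the expected format, see ./package/dialog_en/vocab.txt and ./package/dialog_en/spm.model. Alternatively, you can use an existing Chinese vocabulary. If you want a different tokenizer (not the sentencepiece tokenizer), modify ./utils/tokenization.py: implement your own tokenizer (for example, BasicTokenizer), using the implementation of SentencePieceTokenizer as a reference, then select it through train_args in the config by adding the line train_args="--tokenizer BasicTokenizer".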
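For illustration, here is a minimal sketch of what such a custom tokenizer could look like. This is not Knover code. The method names (tokenize, convert_tokens_to_ids, convert_ids_to_tokens), the constructor arguments, and the vocabulary layout (one token per line, optionally followed by a tab and an id) are all assumptions. Check SentencePieceTokenizer in ./utils/tokenization.py for the interface and constructor signature that Knover actually expects.

```python
import os
import tempfile


class BasicTokenizer:
    """Hypothetical whitespace tokenizer (illustrative only, not part of Knover)."""

    def __init__(self, vocab_path, unk_token="[UNK]"):
        # Assumed vocab format: "token" or "token<TAB>id", one entry per line.
        self.vocab = {}
        with open(vocab_path, encoding="utf-8") as f:
            for index, line in enumerate(f):
                fields = line.rstrip("\n").split("\t")
                token = fields[0]
                token_id = int(fields[1]) if len(fields) > 1 else index
                self.vocab[token] = token_id
        self.inv_vocab = {i: t for t, i in self.vocab.items()}
        self.unk_token = unk_token

    def tokenize(self, text):
        # Split on whitespace; the input is assumed to be pre-segmented.
        return text.strip().split()

    def convert_tokens_to_ids(self, tokens):
        # Map out-of-vocabulary tokens to the id of the unknown token.
        unk_id = self.vocab[self.unk_token]
        return [self.vocab.get(t, unk_id) for t in tokens]

    def convert_ids_to_tokens(self, ids):
        return [self.inv_vocab[i] for i in ids]


def demo():
    """Build a tiny temporary vocab file and round-trip one sentence."""
    with tempfile.TemporaryDirectory() as tmp:
        vocab_path = os.path.join(tmp, "vocab.txt")
        with open(vocab_path, "w", encoding="utf-8") as f:
            f.write("[UNK]\t0\n[PAD]\t1\nhello\t2\nworld\t3\n")
        tokenizer = BasicTokenizer(vocab_path)
        ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("hello world foo"))
        return ids, tokenizer.convert_ids_to_tokens(ids)


if __name__ == "__main__":
    print(demo())
```

Whitespace splitting only works if the Chinese corpus has already been word-segmented (or split into characters) with spaces between tokens; otherwise the tokenize method would need a real segmenter.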
Knover/utils/tokenization.py, line 124 in 15d5279