The properties file contains the following properties:
stopWordListName = data/CN.nw.wordlist.txt
endWordListName = data/CN.endlist.txt
forbiddenCharListName = data/CN.charlist.txt
stopThreshold = 50
forbiddenThreshold = 800
minAV = 5
minCount = 3
minDocumentCount = 5
terminologyThreshold = 0.6
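For what it's worth, a file in this `key = value` format can be parsed with the standard `java.util.Properties` class, assuming the tool reads ordinary Java properties files (a minimal sketch, not the project's actual loading code; note that numeric values come back as strings and must be parsed explicitly):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class LoadConfig {
    public static void main(String[] args) throws IOException {
        // Inline sample standing in for the real properties file on disk.
        String sample = "stopThreshold = 50\n"
                      + "minCount = 3\n"
                      + "terminologyThreshold = 0.6\n";

        Properties props = new Properties();
        props.load(new StringReader(sample));

        // Properties stores everything as strings; parse numbers explicitly,
        // with a fallback default if the key is absent.
        int stopThreshold =
            Integer.parseInt(props.getProperty("stopThreshold", "50"));
        double terminologyThreshold =
            Double.parseDouble(props.getProperty("terminologyThreshold", "0.6"));

        System.out.println(stopThreshold + " " + terminologyThreshold);
    }
}
```

Loading from a real file would use a `FileInputStream` in place of the `StringReader`.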
Could you document what each parameter is? I think the first 3 are obvious, but the others could use some explanation.
Thanks!
Also, if there are other properties squirreled away somewhere that can affect the results, that would be useful as well!
I will document the parameters, but what matters most is using a Chinese word segmenter that works well on your data.
Agreed. Bad segmentation is impossible to recover from.
Updated README.md. Thank you!