Document properties #2

craigpfeifer · 2015-10-14T16:48:07Z

The properties file contains the following settings:

stopWordListName = data/CN.nw.wordlist.txt
endWordListName = data/CN.endlist.txt
forbiddenCharListName = data/CN.charlist.txt
stopThreshold = 50
forbiddenThreshold = 800
minAV = 5
minCount = 3
minDocumentCount = 5
terminologyThreshold = 0.6
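
For readers who want to inspect or tweak these values outside the tool, a file in this `key = value` form can be read with a few lines of Python. This is only a minimal sketch: the names `load_properties` and `parse_value` are hypothetical and not part of the project, and the numeric typing is a guess based on how the values look, not on how the tool itself reads them.

```python
from typing import Dict, Union

Value = Union[int, float, str]

# The settings quoted in this issue, copied verbatim.
SAMPLE = """\
stopWordListName = data/CN.nw.wordlist.txt
endWordListName = data/CN.endlist.txt
forbiddenCharListName = data/CN.charlist.txt
stopThreshold = 50
forbiddenThreshold = 800
minAV = 5
minCount = 3
minDocumentCount = 5
terminologyThreshold = 0.6
"""


def parse_value(raw: str) -> Value:
    """Try int, then float; fall back to the raw string (assumed typing)."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def load_properties(text: str) -> Dict[str, Value]:
    """Parse simple `key = value` lines, skipping blanks and comments."""
    props: Dict[str, Value] = {}
    for line in text.splitlines():
        line = line.strip()
        # '#' and '!' start comment lines in the .properties convention.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line; ignore it
        props[key.strip()] = parse_value(value.strip())
    return props


props = load_properties(SAMPLE)
for name, value in props.items():
    print(f"{name}: {value!r}")
```

The sketch handles only the plain one-line form shown above, not escapes or line continuations.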

Could you document what each parameter is? I think the first 3 are obvious, but the others could use some explanation.

Thanks!

craigpfeifer · 2015-10-14T17:37:19Z

Also, if there are other properties squirreled away somewhere that can affect the results, that would be useful as well!

ivanhe · 2015-10-14T18:26:29Z

I will document the parameters, but what matters most is using a Chinese word segmenter that works well on your data.

craigpfeifer · 2015-10-14T18:32:24Z

Agreed. Bad segmentation is impossible to recover from.

ivanhe · 2015-10-14T21:09:04Z

Updated README.md. Thank you!

ivanhe added the enhancement label Oct 14, 2015

ivanhe self-assigned this Oct 14, 2015

ivanhe closed this as completed Oct 14, 2015
