
Certain province and city names are not split apart #30

Open
WenyuLei opened this issue Mar 14, 2018 · 1 comment

@WenyuLei

console.log(segment.doSegment('2015-2016学年新疆乌鲁木齐九十八中七年级(上)期中数学试卷', {stripStopword: true, stripPunctuation: true}));
[ { w: '2015', p: 4194304 },
{ w: '2016', p: [ 4194304 ] },
{ w: '学年', p: 1048576 },
{ w: '新疆乌鲁木齐', p: 1048576 },
{ w: '九十八', p: 6291456 },
{ w: '中', p: 33558528 },
{ w: '七年级', p: 2097152 },
{ w: '上', p: 33554432 },
{ w: '期中', p: 16384 },
{ w: '数学试卷', p: 1048576 } ]

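The segmenter does not split the province (新疆, Xinjiang) from the city (乌鲁木齐, Urumqi), even though other Xinjiang cities are split correctly (for example, 新疆克拉玛依, Karamay). The segmentation dictionary also contains 新疆 and 乌鲁木齐 as separate words, so I'm not sure whether this is a bug.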
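One plausible explanation is a longest-match preference: if the dictionary has a single entry for the compound 新疆乌鲁木齐, a longest-match segmenter picks it whole, while 新疆克拉玛依 (assumed to have no compound entry) falls back to its parts. The sketch below is a simplified forward maximum matching segmenter, not node-segment's actual algorithm; `toyDict` and `fmmSegment` are hypothetical names, and the dictionary contents are assumptions for illustration.

```javascript
// Simplified forward maximum matching (FMM), for illustration only.
// NOT node-segment's real algorithm; it only shows how a compound
// dictionary entry can block splitting into its parts.

// Hypothetical toy dictionary: assumed to contain the compound
// 新疆乌鲁木齐, but no compound entry for 新疆克拉玛依.
const toyDict = new Set(['新疆', '乌鲁木齐', '克拉玛依', '新疆乌鲁木齐']);

function fmmSegment(text, dict) {
  // The longest dictionary word bounds how far we look ahead.
  let maxLen = 0;
  for (const w of dict) maxLen = Math.max(maxLen, w.length);

  const words = [];
  let i = 0;
  while (i < text.length) {
    let matched = text[i]; // fall back to a single character
    // Try the longest candidate first, shrinking until a dictionary hit.
    for (let len = Math.min(maxLen, text.length - i); len > 1; len--) {
      const candidate = text.slice(i, i + len);
      if (dict.has(candidate)) {
        matched = candidate;
        break;
      }
    }
    words.push(matched);
    i += matched.length;
  }
  return words;
}

console.log(fmmSegment('新疆乌鲁木齐', toyDict)); // [ '新疆乌鲁木齐' ]
console.log(fmmSegment('新疆克拉玛依', toyDict)); // [ '新疆', '克拉玛依' ]
```

Because the longest hit wins, the presence of 新疆 and 乌鲁木齐 as separate entries does not matter once the compound itself is in the dictionary.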

@bluelovers

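If you want them split into two words, just delete the entry 新疆乌鲁木齐 from the dictionary. The dictionary contains it as a single word, which is why it is not split.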
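Deleting the entry can be scripted. The sketch below uses a hypothetical helper, `removeDictWords`, and assumes the dictionary is plain text with one entry per line and the word as the first `|`-separated field. The sample lines reuse the noun tag seen in the output above (1048576 = 0x00100000) with a made-up frequency; check the real dictionary file format before relying on this.

```javascript
// Hypothetical helper: drop whole entries from a dictionary text.
// ASSUMPTION: one entry per line, word as the first '|'-separated field,
// e.g. "新疆乌鲁木齐|0x00100000|100" (the frequency here is made up).
function removeDictWords(dictText, wordsToRemove) {
  const drop = new Set(wordsToRemove);
  return dictText
    .split(/\r?\n/) // tolerate both LF and CRLF line endings
    .filter((line) => !drop.has(line.split('|')[0].trim()))
    .join('\n');
}

// Hypothetical sample entries in the assumed format.
const sample = [
  '新疆|0x00100000|100',
  '乌鲁木齐|0x00100000|100',
  '新疆乌鲁木齐|0x00100000|100',
].join('\n');

console.log(removeDictWords(sample, ['新疆乌鲁木齐']));
// 新疆|0x00100000|100
// 乌鲁木齐|0x00100000|100
```

In practice you might read the dictionary with `fs.readFileSync(path, 'utf8')`, pass the text through `removeDictWords`, and write it back with `fs.writeFileSync` before loading the dictionary again.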

Also, here is my fork:
https://github.com/bluelovers/node-segment

Feel free to try it out if you're interested.
