Custom dictionary: bug when mixed Chinese/English entries are separated by a space #300
Comments
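Once the custom dictionary is allowed to contain entries with spaces, is it still impossible to segment out a word that contains a space? I get the four tokens listed above (one of them is the space itself) rather than the single word "Edu Trust认证". Is this a configuration problem? Thanks.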
fix: fxsjy#300. When calling jieba.cut(sentence, HMM=False), words that mix Chinese and English characters separated by whitespace can now also be output.
userdict: Edu Trust认证 2000
jieba.cut("我通过了Edu Trust认证", HMM=False)
output: 我, 通过, 了, Edu Trust认证
tags = jieba.analyse.extract_tags("我通过了Edu Trust认证")
print(", ".join(tags))
output: Edu Trust认证, 通过
A very good patch. At the very least it solves this long-standing problem.
Is there a patch that fixes this (for reading from a file together with a custom dictionary)? @fxsjy, @cavonchen, thanks, I need this urgently...
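One more question: can English compounds with a space in the middle also be segmented out as a single word? For example, given "這是 This is an 一個 Apple Macbook, 我在自定字典內定義Apple Macbook", with "Apple Macbook" defined in my custom dictionary, the result should contain: Apple Macbook. I would appreciate any help. Thanks, @fxsjy, @cavonchen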
Mixed Chinese/English text that contains punctuation cannot be segmented correctly either, for example: "小王 , 小白, 小張"
@summer1988 Could you fix this when you have time? Thank you.
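For example, take the dictionary entry: Edu Trust认证 2000
Loading it with jieba.load_userdict('xx.dict') fails with this traceback:
ValueError: invalid dictionary entry in htopics/summary/user.dict at Line 979: Edu Trust认证 2000
Is this because jieba, when reading the custom dictionary file, separates the fields of each line with split, working from the left?
I think it should perhaps split from the right and take a fixed number of fields, e.g. 'Edu Trust 2000 nv'.rsplit(' ', n)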
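The right-to-left split suggested above could look roughly like the sketch below. This is not jieba's actual loader; parse_userdict_line is a hypothetical helper, and it assumes each line has the form "word [freq] [tag]", where freq is a decimal integer and a tag only appears together with a freq.

```python
def parse_userdict_line(line):
    """Hypothetical parser: split 'word [freq] [tag]' from the right,
    so the word itself may contain spaces."""
    entry = line.strip()
    if not entry:
        raise ValueError("empty dictionary entry")

    # Split off at most two trailing fields; the leftmost piece keeps its spaces.
    parts = entry.rsplit(None, 2)

    # Case 1: the last field is the frequency, everything before it is the word.
    if len(parts) >= 2 and parts[-1].isdecimal():
        word = entry.rsplit(None, 1)[0]
        return word, int(parts[-1]), None

    # Case 2: the last field is a tag and the one before it is the frequency.
    if len(parts) == 3 and parts[1].isdecimal():
        return parts[0], int(parts[1]), parts[2]

    # Case 3: no frequency found, so treat the whole line as the word.
    return entry, None, None


if __name__ == "__main__":
    for sample in ("Edu Trust认证 2000", "Edu Trust 2000 nv", "Apple Macbook"):
        print(parse_userdict_line(sample))
```

Under these assumptions a word whose last token is itself a number remains ambiguous (it would be read as a frequency), so a real fix would need a stricter file format or an explicit separator.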