Re-segment user-supplied pre-segmented data to improve accuracy #126

Merged
merged 1 commit into from Apr 21, 2018

3 participants
@mozillazg
Owner

mozillazg commented Apr 21, 2018

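PR description

Re-segment data that the user has already segmented, in order to improve accuracy. The tokens produced by the user's segmentation do not always have matching phrase data, whereas the tokens produced by a second segmentation pass may.

For example, take the sentence 你要重新考虑:

User segmentation: ['你', '要', '重新考虑']
After re-segmentation: ['你', '要', '重新', '考虑']

There is no phrase pinyin data for 重新考虑, but there is phrase pinyin data for 重新.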
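
The idea described above can be sketched as follows. This is only an illustration, not the code merged in this PR: the names `PHRASES`, `max_match`, and `resegment` are hypothetical, the phrase table is a toy stand-in for the library's phrase pinyin data, and forward maximum matching is assumed as the second-pass strategy. The toy table is also assumed to contain 考虑 so that the output matches the example.

```python
# Toy stand-in for the phrase pinyin data (hypothetical contents).
PHRASES = {"重新", "考虑"}


def max_match(text, phrases):
    """Split text by forward maximum matching against a phrase set."""
    max_len = max((len(p) for p in phrases), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in phrases:
                tokens.append(candidate)
                i += size
                break
        else:
            # No known phrase starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def resegment(user_tokens, phrases):
    """Re-segment only the user tokens that have no phrase data."""
    result = []
    for token in user_tokens:
        if token in phrases:
            result.append(token)  # already a known phrase, keep as-is
        else:
            result.extend(max_match(token, phrases))
    return result


print(resegment(["你", "要", "重新考虑"], PHRASES))
# ['你', '要', '重新', '考虑']
```

Tokens that already match a known phrase are left untouched, so the second pass only affects tokens for which no phrase data would otherwise be found.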

To-do

  • Follows the code style
  • Unit tests
  • Documentation

Owner

mozillazg commented Apr 21, 2018

Collaborator

bors-homu commented Apr 21, 2018

📌 Commit 717ce93 has been approved by mozillazg

Collaborator

bors-homu commented Apr 21, 2018

⌛️ Testing commit 717ce93 with merge e8fec9d...

bors-homu added a commit that referenced this pull request Apr 21, 2018

Auto merge of #126 - mozillazg:pre-seg-improve, r=mozillazg
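Re-segment user-supplied pre-segmented data to improve accuracy

## PR description

Re-segment data that the user has already segmented, in order to improve accuracy. The tokens produced by the user's segmentation do not always have matching phrase data, whereas the tokens produced by a second segmentation pass may.

For example, take the sentence `你要重新考虑`:

User segmentation: `['你', '要', '重新考虑']`
After re-segmentation: `['你', '要', '重新', '考虑']`

There is no phrase pinyin data for `重新考虑`, but there is phrase pinyin data for `重新`.

## To-do

* [x] Follows the code style
* [x] Unit tests
* [x] Documentation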

coveralls commented Apr 21, 2018

Coverage Status

Coverage decreased (-0.2%) to 99.058% when pulling 717ce93 on pre-seg-improve into 4072b88 on develop.

codecov bot commented Apr 21, 2018

Codecov Report

Merging #126 into develop will decrease coverage by 0.18%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           develop     #126      +/-   ##
===========================================
- Coverage    99.24%   99.05%   -0.19%     
===========================================
  Files           20       20              
  Lines          530      531       +1     
===========================================
  Hits           526      526              
- Misses           4        5       +1

| Impacted Files | Coverage Δ |
| --- | --- |
| pypinyin/utils.py | 100% <ø> (ø) ⬆️ |
| pypinyin/contrib/mmseg.py | 100% <ø> (ø) ⬆️ |
| pypinyin/core.py | 99.02% <100%> (-0.98%) ⬇️ |
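
The percentages in the report can be reproduced from the hit and line counts in the diff above. The snippet below is a small arithmetic check, assuming coverage is simply hits divided by lines; the helper name `coverage_percent` is made up for illustration and is not part of Codecov or Coveralls.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage: covered lines over total lines."""
    return 100.0 * hits / lines


before = coverage_percent(526, 530)  # develop
after = coverage_percent(526, 531)   # develop with #126 merged

print(f"{before:.3f}% -> {after:.3f}% (change {after - before:+.3f})")
# 99.245% -> 99.058% (change -0.187)
```

The change of about -0.187 points sits between the 0.18% quoted in the summary line and the -0.19% shown in the diff table, which would be consistent with one figure being truncated and the other rounded, though the report itself does not say so.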

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4072b88...717ce93.

Collaborator

bors-homu commented Apr 21, 2018

☀️ Test successful - status-travis
Approved by: mozillazg
Pushing e8fec9d to develop...

@bors-homu bors-homu merged commit 717ce93 into develop Apr 21, 2018

5 of 6 checks passed

* coverage/coveralls: Coverage decreased (-0.2%) to 99.058%
* codecov/patch: 100% of diff hit (target 99.24%)
* codecov/project: Absolute coverage decreased by -0.18% but relative coverage increased by +0.75% compared to 4072b88
* continuous-integration/travis-ci/pr: The Travis CI build passed
* continuous-integration/travis-ci/push: The Travis CI build passed
* homu: Test successful

@mozillazg mozillazg deleted the pre-seg-improve branch Apr 22, 2018

Repository owner deleted a comment from coveralls Apr 26, 2018
