Re-segment pre-segmented input passed in by the user to improve accuracy #126

Merged
merged 1 commit into from Apr 21, 2018


@mozillazg
Owner

@mozillazg mozillazg commented Apr 21, 2018

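PR description

Re-segment data that the user has already segmented before passing it in, to improve accuracy: the user's segments do not always have matching phrase data, whereas the segments produced by a second pass may.

Take the sentence 你要重新考虑 as an example:

User segmentation: ['你', '要', '重新考虑']
After re-segmentation: ['你', '要', '重新', '考虑']

There is no pinyin data for the phrase 重新考虑, but there is pinyin data for the phrase 重新.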
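
A minimal sketch of the idea, not the code in this PR: each user-supplied segment that has no entry in the phrase data is split again by forward maximum matching, so that shorter phrases inside it can be matched. `PHRASES`, `split_segment`, and `resegment` are hypothetical names used only for illustration, and the tiny phrase set stands in for the library's real phrase pinyin data.

```python
# Hypothetical phrase set standing in for the real phrase pinyin data.
PHRASES = {"重新", "考虑"}


def split_segment(segment, phrases=PHRASES):
    """Split one segment by forward maximum matching against known phrases."""
    if segment in phrases:
        # The user's segment already has phrase data; keep it as is.
        return [segment]
    max_len = max((len(p) for p in phrases), default=1)
    pieces = []
    i = 0
    while i < len(segment):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(segment) - i), 0, -1):
            piece = segment[i:i + size]
            if size == 1 or piece in phrases:
                pieces.append(piece)
                i += size
                break
    return pieces


def resegment(segments, phrases=PHRASES):
    """Apply split_segment to every user-supplied segment, preserving order."""
    result = []
    for segment in segments:
        result.extend(split_segment(segment, phrases))
    return result


print(resegment(["你", "要", "重新考虑"]))
# ['你', '要', '重新', '考虑']
```

With this stand-in data, 重新考虑 has no entry of its own, so it is split into 重新 and 考虑, both of which can then be looked up as phrases.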

To-do

  • Follows the code style
  • Unit tests
  • Documentation
@mozillazg
Owner Author

@mozillazg mozillazg commented Apr 21, 2018

@bors-homu
Collaborator

@bors-homu bors-homu commented Apr 21, 2018

📌 Commit 717ce93 has been approved by mozillazg

@bors-homu
Collaborator

@bors-homu bors-homu commented Apr 21, 2018

Testing commit 717ce93 with merge e8fec9d...

bors-homu added a commit that referenced this issue Apr 21, 2018
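Re-segment pre-segmented input passed in by the user to improve accuracy

## PR description

Re-segment data that the user has already segmented before passing it in, to improve accuracy: the user's segments do not always have matching phrase data, whereas the segments produced by a second pass may.

Take the sentence `你要重新考虑` as an example:

User segmentation: `['你', '要', '重新考虑']`
After re-segmentation: `['你', '要', '重新', '考虑']`

There is no pinyin data for the phrase `重新考虑`, but there is pinyin data for the phrase `重新`.

## To-do

* [x] Follows the code style
* [x] Unit tests
* [x] Documentation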
@coveralls

@coveralls coveralls commented Apr 21, 2018

Coverage Status

Coverage decreased (-0.2%) to 99.058% when pulling 717ce93 on pre-seg-improve into 4072b88 on develop.

@codecov

@codecov codecov bot commented Apr 21, 2018

Codecov Report

Merging #126 into develop will decrease coverage by 0.18%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           develop     #126      +/-   ##
===========================================
- Coverage    99.24%   99.05%   -0.19%     
===========================================
  Files           20       20              
  Lines          530      531       +1     
===========================================
  Hits           526      526              
- Misses           4        5       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| pypinyin/utils.py | 100% <ø> (ø) | ⬆️ |
| pypinyin/contrib/mmseg.py | 100% <ø> (ø) | ⬆️ |
| pypinyin/core.py | 99.02% <100%> (-0.98%) | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4072b88...717ce93. Read the comment docs.

@bors-homu
Collaborator

@bors-homu bors-homu commented Apr 21, 2018

☀️ Test successful - status-travis
Approved by: mozillazg
Pushing e8fec9d to develop...

@bors-homu bors-homu merged commit 717ce93 into develop Apr 21, 2018
5 of 6 checks passed
@mozillazg mozillazg deleted the pre-seg-improve branch Apr 22, 2018
Repository owner deleted a comment from coveralls Apr 26, 2018