fix(punctuator): space should not be translated #1167

Merged: ksqsf merged 1 commit into master from fix-punct-space on May 8, 2026

Conversation

ksqsf (Member) commented May 4, 2026

Follow-up to #980.

ctx->PushInput(' ') triggers an input update, which in turn triggers translation. During segmentation, '.' is tagged punct_number, but the space is tagged abc. Sometimes the translation result for the space is not empty, which can cause additional text to be committed.
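
To make the failure mode above concrete, here is a minimal toy sketch. It does not use librime's real classes: `ToySegment`, `ToySegmentize`, `ToyCommit`, the "number" tag, and the "<extra>" dictionary entry are all hypothetical names invented for illustration. It only models the idea that a space tagged abc gets looked up, and that a non-empty lookup result ends up in the committed text.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a tagged segment (not librime's Segment).
struct ToySegment {
  std::string text;
  std::string tag;
};

bool IsDigit(char c) { return c >= '0' && c <= '9'; }

// Toy segmentation: digits get the (hypothetical) tag "number", a '.' right
// after a digit gets "punct_number", and everything else, including ' ',
// falls through to "abc".
std::vector<ToySegment> ToySegmentize(const std::string& input) {
  std::vector<ToySegment> segments;
  for (std::size_t i = 0; i < input.size(); ++i) {
    const char c = input[i];
    std::string tag = "abc";
    if (IsDigit(c)) {
      tag = "number";
    } else if (c == '.' && i > 0 && IsDigit(input[i - 1])) {
      tag = "punct_number";
    }
    segments.push_back(ToySegment{std::string(1, c), tag});
  }
  return segments;
}

// Toy commit: only "abc" segments are "translated" through the dictionary.
// If the dictionary happens to contain an entry for " ", that entry is
// committed in place of the plain space.
std::string ToyCommit(const std::vector<ToySegment>& segments,
                      const std::map<std::string, std::string>& dict) {
  std::string committed;
  for (const ToySegment& seg : segments) {
    if (seg.tag == "abc") {
      std::map<std::string, std::string>::const_iterator it = dict.find(seg.text);
      committed += (it != dict.end()) ? it->second : seg.text;
    } else {
      committed += seg.text;
    }
  }
  return committed;
}
```

In this toy model, committing "1. " with an empty dictionary yields "1. ", while a dictionary that maps " " to "<extra>" yields "1.<extra>", which mirrors the "additional text" symptom described above.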

ksqsf requested a review from lotem on May 4, 2026 15:34
ksqsf force-pushed the fix-punct-space branch from e81252a to 2ebf182 on May 4, 2026 15:47
ksqsf changed the title from "fix(punct_segmentor): space should not be translated" to "fix(abc_segmentor): space should not be translated" on May 4, 2026
lotem (Member) commented May 6, 2026

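Teacher, I now understand what this is for.
I would not want to generalize about whether space should be translated; there may be schemas in which space does need to be translated.
For typing a space after a decimal point, it looks like ctx->PushInput(' ') will not do; this may need to be changed to add a designated Segment instead.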

ksqsf force-pushed the fix-punct-space branch from 2ebf182 to e81252a on May 6, 2026 08:36
ksqsf changed the title from "fix(abc_segmentor): space should not be translated" to "fix(punct_segmentor): space should not be translated" on May 6, 2026
lotem (Member) left a comment

Impressive.
Please test it more.

boomker commented May 7, 2026

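I can reproduce this bug as well: when a digit is immediately followed by a period and I press space to commit, an extra "/(" is output, which is clearly not the expected behavior.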

ksqsf (Member, Author) commented May 7, 2026

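I suspect there is also a problem with dictionary compilation; normally, looking up the dictionary with a space should not return any result.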

ksqsf force-pushed the fix-punct-space branch from e81252a to c480033 on May 8, 2026 10:18
ksqsf (Member, Author) commented May 8, 2026

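One problem with the previous approach was that punct_segmentor had to be placed before abc_segmentor for the fix to take effect (yet even Luna Pinyin (朙月拼音) places abc_segmentor before punct_segmentor). I switched to a different approach that also works when punct_segmentor comes later.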
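
As a rough illustration of why segmentor order mattered for the earlier approach, here is a toy sketch. It assumes a "first segmentor that claims the character decides its tag" rule, which is an assumption made for this example and not a description of librime's actual pipeline; `ToySegmentor`, `ToyAbc`, `ToyPunct`, and `ToyTag` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical toy segmentor: returns a tag if it claims the character,
// or an empty string if it does not.
using ToySegmentor = std::string (*)(char);

// Stand-in for an alphabet segmentor whose accepted characters include ' '.
std::string ToyAbc(char c) {
  if ((c >= 'a' && c <= 'z') || c == ' ') return "abc";
  return "";
}

// Stand-in for a punctuation segmentor that claims ' ' and '.'.
std::string ToyPunct(char c) {
  if (c == ' ' || c == '.') return "punct";
  return "";
}

// Toy rule (assumed for illustration): the first segmentor in the pipeline
// that claims the character decides its tag; "raw" if nobody claims it.
std::string ToyTag(char c, const std::vector<ToySegmentor>& pipeline) {
  for (ToySegmentor segmentor : pipeline) {
    const std::string tag = segmentor(c);
    if (!tag.empty()) return tag;
  }
  return "raw";
}
```

Under this toy rule, `ToyTag(' ', {ToyPunct, ToyAbc})` gives "punct" while `ToyTag(' ', {ToyAbc, ToyPunct})` gives "abc", so a fix that lives only in the punctuation segmentor has no effect in the second ordering. A fix that does not depend on which segmentor sees the space first avoids this.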

ksqsf changed the title from "fix(punct_segmentor): space should not be translated" to "fix(punctuator): space should not be translated" on May 8, 2026
ksqsf merged commit aa32d48 into master on May 8, 2026
20 checks passed
ksqsf deleted the fix-punct-space branch on May 8, 2026 16:30
3 participants