HanLP 1.x text suggestion logic error #1718

QAQ516284797 · 2022-04-09T17:12:34Z

Describe the bug
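The suggest method of the Suggester class (in package com.hankcs.hanlp.suggest) sums the scores of the different scorers incorrectly:
scoreMap.put(entry.getKey(), score / max + entry.getValue() * scorer.boost);
Here max is the best score given by the current scorer. It is mistakenly used to divide the sum of the previous scorers' scores, which makes the final suggestion inaccurate.
It should be changed to:
scoreMap.put(entry.getKey(), score + entry.getValue() * scorer.boost / max);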
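To make the difference between the two lines concrete, here is a minimal sketch that isolates the per-candidate update. It is not HanLP's actual Suggester code: the class name, method names, and sample numbers are hypothetical, and only the two formulas are taken from the report.

```java
// Sketch only: isolates the per-candidate update, not HanLP's actual Suggester code.
class ScoreUpdateSketch {
    // Update as quoted in the report: the running total is divided by the current scorer's max.
    static double buggyUpdate(double runningTotal, double value, double max, double boost) {
        return runningTotal / max + value * boost;
    }

    // Update proposed in the report: only the current scorer's contribution is normalized by max.
    static double fixedUpdate(double runningTotal, double value, double max, double boost) {
        return runningTotal + value * boost / max;
    }

    public static void main(String[] args) {
        double runningTotal = 1.0; // hypothetical total carried over from earlier scorers
        double value = 0.2;        // current scorer's raw score for this candidate
        double max = 0.5;          // best raw score the current scorer gave to any candidate
        double boost = 1.0;        // assumed weight of the current scorer
        System.out.println("buggy: " + buggyUpdate(runningTotal, value, max, boost));
        System.out.println("fixed: " + fixedUpdate(runningTotal, value, max, boost));
    }
}
```

With the buggy update, whatever earlier scorers contributed is rescaled by a later scorer's max, so the earlier contributions grow or shrink depending on an unrelated scale. With the fixed update, each scorer adds at most boost to the total.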
Code to reproduce the issue
Test code

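import com.hankcs.hanlp.suggest.Suggester;

// The class name is a placeholder; the original snippet omitted the class declaration.
public class SuggesterTest
{
    public static void main(String[] args)
    {
        Suggester suggester = new Suggester();
        // Two candidate sentences: a pinyin-like string and the expected match.
        String[] titleArray =
            (
                "wuqi\n" + "服务器"
            ).split("\\n");
        for (String title : titleArray)
        {
            suggester.addSentence(title);
        }
        // Ask for the single best suggestion for the query "务器".
        System.out.println(suggester.suggest("务器", 1));
    }
}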

The actual run suggests wuqi.

Describe the current behavior
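Because the scorers' scores are summed incorrectly, the final scores are skewed, and the suggester tends to favor sentences with a high pinyin score.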

Expected behavior
The expected output is shown below. The score added in each round should lie between 0 and 1, and the final suggestion should be 服务器.

---IdVectorScorer------
total score 1.0, scorer score 1.1457421786266213E-10, scorer max score 1.1457421786266213E-10, candidate sentence 服务器, score added this round 1.0
---EditDistanceScorer------
total score 2.0, scorer score 0.5, scorer max score 0.5, candidate sentence 服务器, score added this round 1.0
total score 0.4, scorer score 0.2, scorer max score 0.5, candidate sentence wuqi, score added this round 0.4
---PinyinScorer------
total score 2.7, scorer score 1.1666666666666665, scorer max score 1.6666666666666665, candidate sentence 服务器, score added this round 0.7000000000000002
total score 1.4, scorer score 1.6666666666666665, scorer max score 1.6666666666666665, candidate sentence wuqi, score added this round 0.9999999999999999

System information

Windows
jdk11.0.5
HanLP version:1.x

Other info / logs
Actual output: the score added in a round can exceed 1, and the scorers' scores are accumulated incorrectly.

---IdVectorScorer------
total score 1.1457421786266213E-10, scorer score 1.1457421786266213E-10, scorer max score 1.1457421786266213E-10, candidate sentence 服务器, score added this round 1.1457421786266213E-10
---EditDistanceScorer------
total score 0.5000000002291485, scorer score 0.5, scorer max score 0.5, candidate sentence 服务器, score added this round 0.5000000001145742
total score 0.2, scorer score 0.2, scorer max score 0.5, candidate sentence wuqi, score added this round 0.2
---PinyinScorer------
total score 1.4666666668041557, scorer score 1.1666666666666665, scorer max score 1.6666666666666665, candidate sentence 服务器, score added this round 0.9666666665750072
total score 1.7866666666666666, scorer score 1.6666666666666665, scorer max score 1.6666666666666665, candidate sentence wuqi, score added this round 1.5866666666666667
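The logged per-scorer scores can be replayed through both formulas to check which candidate ends up on top. The sketch below is an illustration, not HanLP code: the class and method names are hypothetical, the raw scores are copied from the logs in this report, and a boost of 1.0 for every scorer is an assumption inferred from those numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: replays the raw scores from the logs; not HanLP's actual Suggester code.
class ScoreReplaySketch {
    // Raw scores per scorer, copied from the logs (IdVectorScorer logged only one candidate).
    static Map<String, Map<String, Double>> loggedRounds() {
        Map<String, Map<String, Double>> rounds = new LinkedHashMap<>();
        Map<String, Double> idVector = new LinkedHashMap<>();
        idVector.put("服务器", 1.1457421786266213E-10);
        Map<String, Double> editDistance = new LinkedHashMap<>();
        editDistance.put("服务器", 0.5);
        editDistance.put("wuqi", 0.2);
        Map<String, Double> pinyin = new LinkedHashMap<>();
        pinyin.put("服务器", 1.1666666666666665);
        pinyin.put("wuqi", 1.6666666666666665);
        rounds.put("IdVectorScorer", idVector);
        rounds.put("EditDistanceScorer", editDistance);
        rounds.put("PinyinScorer", pinyin);
        return rounds;
    }

    // Accumulates totals round by round with either the reported or the proposed formula.
    static Map<String, Double> replay(Map<String, Map<String, Double>> rounds, boolean fixed) {
        Map<String, Double> totals = new LinkedHashMap<>();
        double boost = 1.0; // assumption: every scorer has boost 1.0
        for (Map<String, Double> round : rounds.values()) {
            double max = 0.0;
            for (double v : round.values()) max = Math.max(max, v);
            for (Map.Entry<String, Double> e : round.entrySet()) {
                double score = totals.getOrDefault(e.getKey(), 0.0);
                double next = fixed
                        ? score + e.getValue() * boost / max   // proposed formula
                        : score / max + e.getValue() * boost;  // formula quoted in the report
                totals.put(e.getKey(), next);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + replay(loggedRounds(), false));
        System.out.println("fixed: " + replay(loggedRounds(), true));
    }
}
```

Under these assumptions the buggy replay ranks wuqi above 服务器, and the fixed replay ranks 服务器 first, which matches the two logs up to floating-point rounding.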

I've completed this form and searched the web for solutions.


hankcs · 2022-04-09T17:41:40Z

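Thanks for the correction. I did make this mistake at the time.

Thanks for the feedback. This has been fixed; please see the commit above.
If there are still problems, feel free to reopen the issue.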

QAQ516284797 added the bug label Apr 9, 2022

QAQ516284797 assigned hankcs Apr 9, 2022

hankcs added a commit that referenced this issue Apr 9, 2022

Fix the scorer.boost bug in the text suggestion scorers' score computation fix: #1718

867cc8d

hankcs closed this as completed Apr 9, 2022
