CharTable normalization is incorrect for some characters #1615

Closed
1 task done
tiandiweizun opened this issue Feb 22, 2021 · 1 comment

tiandiweizun (Contributor) commented Feb 22, 2021

Describe the bug

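  1. An earlier issue reported that CharTable unreasonably converts “幺” to “么”. That was fixed in portable, but the downloaded 1.7.5 zip package still has the problem; I later found that the MD5 of its CharTable.txt.bin does not match.
  2. The following characters are normalized incorrectly. The first column is the original character and the second is the normalized output; a character in parentheses is a suggested replacement for the current normalized output.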
    猛 勐
    蜺 霓
    脊 嵴
    骼 胳
    拾 十
    劈 噼
    溜 熘
    呱 哌
    怵 憷
    糸 纟(丝)
    乾 干
    艸 艹(草)
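
The MD5 mismatch mentioned in item 1 can be confirmed by hashing the CharTable.txt.bin from the zip package and a known-good copy. The following is a minimal sketch, assuming both files are available locally; the class name `Md5Check` and the command-line paths are hypothetical and not part of HanLP.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing two copies of a data file by MD5.
final class Md5Check {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder(32);
            for (byte b : md.digest(data)) {
                // Mask to 0..255 so every byte prints as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Reads the whole file into memory and returns its MD5 digest. */
    static String md5Hex(Path file) {
        try {
            return md5Hex(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java Md5Check <fileA> <fileB>");
            return;
        }
        String a = md5Hex(Paths.get(args[0]));
        String b = md5Hex(Paths.get(args[1]));
        System.out.println(a + "  " + args[0]);
        System.out.println(b + "  " + args[1]);
        System.out.println(a.equals(b) ? "match" : "MISMATCH");
    }
}
```

Reading the whole file into memory is acceptable for a small dictionary file; a streaming digest would be preferable for large inputs.
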

Code to reproduce the issue
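import java.util.HashMap;
import java.util.Map;

import com.hankcs.hanlp.dictionary.other.CharTable;

// Each key is an input character; each value is the expected normalized form.
// Note: assert only fires when the JVM runs with assertions enabled (-ea).
public void testCharTable() {
    Map<String, String> normalizationBadCase = new HashMap<>();
    normalizationBadCase.put("猛", "猛");
    normalizationBadCase.put("蜺", "蜺");
    normalizationBadCase.put("脊", "脊");
    normalizationBadCase.put("骼", "骼");
    normalizationBadCase.put("拾", "拾");
    normalizationBadCase.put("劈", "劈");
    normalizationBadCase.put("溜", "溜");
    normalizationBadCase.put("呱", "呱");
    normalizationBadCase.put("怵", "怵");
    normalizationBadCase.put("糸", "丝");
    normalizationBadCase.put("乾", "乾");
    normalizationBadCase.put("艸", "草");
    for (Map.Entry<String, String> entry : normalizationBadCase.entrySet()) {
        // The message after the colon identifies which character failed.
        assert CharTable.convert(entry.getKey()).equals(entry.getValue()) : entry.getKey();
    }
}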
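
Because an assertion stops at the first failing character, it can be handy to collect every mismatch in one pass instead. The following is a minimal sketch of that idea; the class `NormalizationReport` and its method are hypothetical, and the normalizer is passed in as a function so the sketch does not depend on HanLP itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper: lists every input whose normalized form differs
// from the expected one, instead of failing on the first mismatch.
final class NormalizationReport {

    static List<String> findMismatches(Map<String, String> expected,
                                       UnaryOperator<String> normalizer) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            String actual = normalizer.apply(entry.getKey());
            if (!entry.getValue().equals(actual)) {
                mismatches.add(entry.getKey() + " -> " + actual
                        + " (expected " + entry.getValue() + ")");
            }
        }
        return mismatches;
    }
}
```

With HanLP on the classpath, the map from the test above could be checked with `NormalizationReport.findMismatches(normalizationBadCase, CharTable::convert)`, assuming `CharTable.convert(String)` returns a `String` as the reproduction code implies.
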

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): win10
  • Python version:
  • HanLP version: 1.8.0
  • I've completed this form and searched the web for solutions.
hankcs (Owner) commented Feb 22, 2021

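Thanks for the feedback. This has been fixed; please refer to the commit above.
If the problem persists, feel free to reopen the issue.

The data package has not been updated for a while, so you can update this file yourself.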
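
When swapping in an updated data file by hand, keeping a backup of the old copy makes it easy to roll back. The following is a minimal sketch under stated assumptions: the class `DataFileUpdater` is hypothetical, the caller supplies both paths, and which file to replace (presumably CharTable.txt.bin) and where it lives in a given installation are not specified in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: replaces a data file with an updated copy,
// saving the previous contents next to it with a ".bak" suffix.
final class DataFileUpdater {

    /**
     * Copies {@code updated} over {@code target}. Returns the backup path;
     * the backup file is only created if {@code target} already existed.
     */
    static Path replaceWithBackup(Path target, Path updated) {
        try {
            Path backup = target.resolveSibling(target.getFileName() + ".bak");
            if (Files.exists(target)) {
                Files.copy(target, backup, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.copy(updated, target, StandardCopyOption.REPLACE_EXISTING);
            return backup;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```
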

hankcs closed this as completed Feb 22, 2021