AMR parsing gets some numbers wrong #1721

Open
1 task done
SoaringTiger opened this issue Apr 15, 2022 · 4 comments

SoaringTiger commented Apr 15, 2022

Describe the bug
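Example 1: 我给了他15万元。 ("I gave him 150,000 yuan.")
The AMR parse is shown in the screenshot below:
[screenshot: bug]
"15万" (150,000) was not parsed correctly.

Example 2: 我给了他十五点八万元。 ("I gave him 158,000 yuan.")
[screenshot: bug2]
"十五点八万" (158,000) was not parsed correctly.

Example 3: 我给了他十元三角八分钱。 ("I gave him 10 yuan, 3 jiao and 8 fen.")
[screenshot taken 2022-04-15, 5:54:34 PM]
"十元三角八分" (10.38 yuan) was not parsed correctly.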

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Describe the current behavior
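After changing "15万" to "十五万", the value is parsed as "150000".
The error therefore seems to come from the number conversion step. https://github.com/microsoft/Recognizers-Text may be a useful reference.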
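For illustration, here is a minimal sketch of how the three failing inputs could be normalized. It is not the conversion code used by HanLP or perin_parser; the names `parse_number` and `parse_money` are hypothetical. It assumes each big unit (亿, 万) appears at most once and in descending order, so compound forms such as 万亿 are not handled, and digit-by-digit readings such as 一零五 are not handled either.

```python
from decimal import Decimal

DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10 ** 4, "亿": 10 ** 8}
MONEY_UNITS = {"元": Decimal("1"), "角": Decimal("0.1"), "分": Decimal("0.01")}


def _parse_integer(text: str) -> int:
    """Parse an integer below 10,000 written in Arabic digits or Chinese numerals."""
    if not text:
        return 0
    if text.isascii() and text.isdigit():
        return int(text)
    total, digit = 0, 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as the leading 十 in 十五 counts as 1 x 10.
            total += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        else:
            raise ValueError(f"unexpected character {ch!r} in {text!r}")
    return total + digit


def _parse_fraction(text: str) -> Decimal:
    """Parse the digits after a decimal marker, e.g. '八' or '8' gives 0.8."""
    digits = "".join(str(DIGITS[ch]) if ch in DIGITS else ch for ch in text)
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"bad fraction {text!r}")
    return Decimal("0." + digits)


def _parse_section(text: str) -> Decimal:
    """Parse one section with an optional decimal part, e.g. '15' or '十五点八'."""
    integer_part, _, fraction_part = text.replace("点", ".").partition(".")
    value = Decimal(_parse_integer(integer_part))
    if fraction_part:
        value += _parse_fraction(fraction_part)
    return value


def parse_number(text: str) -> Decimal:
    """Parse strings such as '15万', '十五万' or '十五点八万'."""
    total, section = Decimal(0), ""
    for ch in text:
        if ch in BIG_UNITS:
            total += _parse_section(section) * BIG_UNITS[ch]
            section = ""
        else:
            section += ch
    if section:
        total += _parse_section(section)
    return total


def parse_money(text: str) -> Decimal:
    """Parse an amount in yuan, e.g. '十元三角八分' or '15万元'."""
    total, section = Decimal(0), ""
    for ch in text:
        if ch in MONEY_UNITS:
            total += parse_number(section) * MONEY_UNITS[ch]
            section = ""
        else:
            section += ch
    if section:
        raise ValueError(f"trailing text without a currency unit: {section!r}")
    return total
```

With this sketch, `parse_number("15万")` and `parse_number("十五万")` both equal 150000, `parse_number("十五点八万")` equals 158000, and `parse_money("十元三角八分")` equals `Decimal("10.38")`. `Decimal` is used instead of `float` so that amounts such as 0.38 are represented exactly.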

Expected behavior
The label should be displayed correctly.
That said, after looking at the output data, the anchors preserve the position in the original text, so the problem is not that serious 😄
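As a workaround sketch, the original text behind a node can be recovered from its anchors. The example below assumes MRP-style anchors, that is, a list of `{"from": ..., "to": ...}` character offsets into the input sentence; the node shown is made up for illustration and is not actual parser output, and `surface_text` is a hypothetical helper.

```python
from typing import Dict, List


def surface_text(sentence: str, node: Dict) -> str:
    """Return the original text covered by a node's anchors (empty if it has none)."""
    spans: List[str] = [sentence[a["from"]:a["to"]] for a in node.get("anchors", [])]
    return "".join(spans)


sentence = "我给了他15万元。"
# Hypothetical node: the label value is invented, but the anchors cover "15万".
node = {"id": 3, "label": "15", "anchors": [{"from": 4, "to": 7}]}
recovered = surface_text(sentence, node)
```

Here `recovered` is the string "15万", which can then be passed to any number normalizer of your choice.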

System information

  • OS Platform and Distribution (Linux Ubuntu 16.04):
  • Python version: 3.9
  • HanLP version: 2.1b23

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

  • I've completed this form and searched the web for solutions.

hankcs (Owner) commented Apr 15, 2022

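Thanks for the feedback. There is indeed a problem with parsing Chinese numerals. I tried Microsoft's library, but it also fails on some cases that mix decimals with units, so in the end I had to fix it myself. Please apply the patch: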

 pip3 install perin_parser -U

hankcs (Owner) commented Apr 15, 2022

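As for the values that are missing entirely, that happens because the model did not predict them, not because they were predicted and then converted incorrectly. There is no good fix for now; it may require joint learning with NER.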

cmdares commented May 8, 2022

Looking forward to it.

tangYiQun commented

[screenshot: 1652064727(1)]

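I copied the demo from the official website, but testutility keeps raising an error. What is the cause? I tried several versions (1.8.3, 1.7.7, 1.7.6, 1.5.4) and they all fail.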
