Hotwords interfere with each other #1727

kli017 · 2024-05-14T02:00:56Z

🐛 Bug

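I am using the speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx model in the runtime environment. With the hotword list below, the hotwords seem to interfere with each other. For example: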
针灸铜人 80
久通 80

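In testing, results such as 针灸通人 and 久铜 can appear. Does adding a hotword boost the probability of each token individually? If it were whole-word matching, the WFST should not have this much effect. Is there any way to fix this?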

R1ckShi · 2024-05-28T02:58:48Z

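Hotwords in the runtime work in two parts. The first is the CLAS-based NN hotword stage, which matches hotwords against decoder information through attention.
Conflicting hotwords cause the attention mechanism to produce spurious correlations, and there is no good fix for that.
A possible workaround is to split long hotwords or pad short hotwords to make them longer.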
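
Before splitting or padding entries, it may help to find which hotword pairs are likely to collide. The following is a rough sketch, not part of FunASR: it assumes the conflict in this example comes from shared syllables (灸/久 and 铜/通), and it uses a tiny hand-written toneless pinyin table that covers only these characters. The names `PINYIN`, `parse_hotwords`, and `find_conflicts` are hypothetical.

```python
from itertools import combinations

# Hypothetical toneless-pinyin table covering only the characters in this example.
# A real check would need a full pronunciation lexicon.
PINYIN = {"针": "zhen", "灸": "jiu", "铜": "tong", "人": "ren", "久": "jiu", "通": "tong"}


def parse_hotwords(text):
    """Parse lines of the form '<hotword> <weight>' into (word, weight) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            entries.append((parts[0], int(parts[1])))
    return entries


def syllables(word):
    """Map each character to its syllable; unknown characters map to themselves."""
    return [PINYIN.get(ch, ch) for ch in word]


def find_conflicts(entries):
    """Return (word_a, word_b, shared_syllables) for pairs that share any syllable."""
    conflicts = []
    for (a, _), (b, _) in combinations(entries, 2):
        shared = sorted(set(syllables(a)) & set(syllables(b)))
        if shared:
            conflicts.append((a, b, shared))
    return conflicts


hotwords = "针灸铜人 80\n久通 80"
for a, b, shared in find_conflicts(parse_hotwords(hotwords)):
    print(a, b, shared)
```

Pairs flagged this way are candidates for the workaround above, for example padding 久通 with surrounding context so it no longer looks like a fragment of the longer entry. Whether that removes the interference still has to be verified by testing.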

kli017 added the bug Something isn't working label May 14, 2024

kli017 changed the title ~~Hotwords directly interfere with each other~~ Hotwords interfere with each other May 14, 2024

R1ckShi closed this as completed May 28, 2024