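There is a problem with the ConfusionCorrector implementation in the new version of macbert4csc: it iterates over every entry in the confusion dictionary and runs an re regex search for each one, so it becomes very slow when the confusion dictionary is large. I suggest switching to a prefix tree (trie) or a similar structure.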
def correct(self, sentence: str):
    """
    Correct errors based on the confusion set
    :param sentence: str, the text to correct
    :return: dict, {'source': 'src', 'target': 'trg', 'errors': [(error_word, correct_word, position), ...]}
    """
    corrected_sentence = sentence
    details = []
    # Apply the custom confusion set as the dictionary of suspected errors
    # (one regex scan of the sentence per dictionary entry)
    for err, truth in self.custom_confusion.items():
        for i in re.finditer(err, sentence):
            start, end = i.span()
            corrected_sentence = corrected_sentence[:start] + truth + corrected_sentence[end:]
            details.append((err, truth, start))
    return {'source': sentence, 'target': corrected_sentence, 'errors': details}
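In my tests, with a confusion dictionary of 10,000 entries, ConfusionCorrector takes 200-300 ms per sentence, whereas macbert4csc inference on one sentence takes only a few milliseconds to a few tens of milliseconds.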
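To illustrate the prefix-tree suggestion, here is a minimal sketch, not the library's actual implementation. The names build_trie and correct_with_trie are hypothetical, and the sketch makes two assumptions that differ from the current code: confusion keys are treated as literal strings rather than regex patterns, and matches are resolved leftmost-longest without overlap. The trie is built once, so the per-sentence cost depends on the sentence length and the longest key, not on the number of dictionary entries.

```python
from typing import Dict, List, Tuple

# Hypothetical sentinel key marking "a confusion entry ends at this node".
# It is a unique object, so it can never collide with a character key.
_END = object()


def build_trie(confusion: Dict[str, str]) -> dict:
    """Build a nested-dict trie from {error_word: correct_word} (build once, reuse)."""
    root: dict = {}
    for err, truth in confusion.items():
        if not err:
            continue  # skip empty keys, which would otherwise match everywhere
        node = root
        for ch in err:
            node = node.setdefault(ch, {})
        node[_END] = truth
    return root


def correct_with_trie(sentence: str, trie: dict) -> dict:
    """Replace confusion-set entries using leftmost-longest trie matching."""
    pieces: List[str] = []
    details: List[Tuple[str, str, int]] = []
    i, n = 0, len(sentence)
    while i < n:
        node = trie
        match_end, match_truth = -1, None
        j = i
        # Walk the trie from position i, remembering the longest entry seen.
        while j < n and sentence[j] in node:
            node = node[sentence[j]]
            j += 1
            if _END in node:
                match_end, match_truth = j, node[_END]
        if match_truth is not None:
            pieces.append(match_truth)
            # Position is the start offset in the original sentence.
            details.append((sentence[i:match_end], match_truth, i))
            i = match_end
        else:
            pieces.append(sentence[i])
            i += 1
    return {'source': sentence, 'target': ''.join(pieces), 'errors': details}


if __name__ == "__main__":
    trie = build_trie({"莪": "我", "祢": "你"})
    print(correct_with_trie("莪爱祢", trie))
```

Because the output is assembled from pieces rather than by slicing corrected_sentence with offsets taken from the original sentence, replacements of a different length than the matched word do not shift later matches. If overlapping matches or very long keys matter, an Aho-Corasick automaton would be the natural next step.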