Simstring matcher could produce span with corrected term #38

nourG22 · 2024-04-15T16:07:10Z

Issue

When used for typo-tolerant matching, SimString correctly identifies misspelled words such as "mxnidipine", but it cannot return the corrected form of the word, here "manidipine".

Reproduction Steps

1. Provide input text containing a misspelled word, such as "mxnidipine".
2. Run the SimString matcher on that text; the word is detected despite the typo.

Current Behavior

SimString accurately identifies misspelled terms but does not provide the corrected version.
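
To make the behavior concrete, here is a minimal sketch. It does not use the real SimString matcher: the standard library's `difflib` stands in for the similarity measure, and `TERMS`, `find_matches` and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical dictionary of known terms (stand-in for the matcher's rules).
TERMS = ["manidipine"]


def find_matches(text, threshold=0.8):
    """Return (start, end, surface_text) for each token close to a known term."""
    matches = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        for term in TERMS:
            # difflib's ratio is used here only as a stand-in similarity measure.
            if SequenceMatcher(None, token.lower(), term).ratio() >= threshold:
                # Only the surface text is kept; the matched term is discarded.
                matches.append((start, end, token))
                break
    return matches


matches = find_matches("Patient takes mxnidipine daily")
```

The match is found at the right offsets, but the only text attached to it is the misspelled "mxnidipine"; nothing records that it matched "manidipine".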

Cause

The SimString matching logic currently lives in a parent class, and the matcher has no specialized run() method of its own, which prevents it from producing corrected terms.

Suggested Solution

Implement a run() specialization within the SimString matcher class so that it can produce corrected terms. The same specialization should be available for both the SimString matcher and the regular expression matcher.
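
A rough sketch of what such a specialization could look like, under heavy assumptions: `ApproxMatcher`, `CorrectingMatcher`, `Span` and `_find` are hypothetical names, not the library's actual classes, and `difflib` again stands in for SimString.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Span:
    start: int
    end: int
    text: str


class ApproxMatcher:
    """Hypothetical parent class holding the shared matching logic."""

    def __init__(self, terms, threshold=0.8):
        self.terms = terms
        self.threshold = threshold

    def _find(self, text):
        # Yield (start, end, surface, term) for each token close to a known term.
        pos = 0
        for token in text.split():
            start = text.index(token, pos)
            pos = start + len(token)
            for term in self.terms:
                if SequenceMatcher(None, token.lower(), term).ratio() >= self.threshold:
                    yield start, pos, token, term
                    break

    def run(self, text):
        # Generic behavior: the span keeps the surface text, typo included.
        return [Span(s, e, surface) for s, e, surface, _ in self._find(text)]


class CorrectingMatcher(ApproxMatcher):
    """Hypothetical subclass whose run() emits the dictionary term instead."""

    def run(self, text):
        return [Span(s, e, term) for s, e, _, term in self._find(text)]


text = "Patient takes mxnidipine daily"
plain = ApproxMatcher(["manidipine"]).run(text)
fixed = CorrectingMatcher(["manidipine"]).run(text)
```

Note that in this sketch the offsets still point into the original text, so a span's text and its offsets can disagree in length whenever the correction is longer or shorter than the typo.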

ghisvail · 2024-04-18T12:29:09Z

Thanks @nourG22 for the very detailed report.

I have discussed this issue with the rest of the team, and we were thinking of an alternative solution which would provide more flexibility. Your report was very useful to kickstart the discussion with a realistic use case.

The proposed alternative is the following: instead of replacing the text within the span produced by the SimstringMatcher, the span could be enriched with an additional normalization. Its name is up for discussion; let's call it "typo correction" for the sake of argument. That normalization could then be applied to the text in a follow-up operation (also up for discussion) or simply carried along through the rest of the processing.

This way, users can still choose to keep or correct the typo, or use an alternative normalization (e.g. UMLS), while the information about the match is carried along.
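
To illustrate the idea, here is a minimal sketch under stated assumptions. `Span`, `Normalization`, `apply_corrections` and the `"typo_correction"` kind are hypothetical names for discussion purposes, not an existing API, and the UMLS value is a placeholder rather than a real identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Normalization:
    kind: str   # e.g. "typo_correction" or "umls" (placeholder names)
    value: str


@dataclass
class Span:
    start: int
    end: int
    text: str   # surface text, left untouched
    normalizations: list = field(default_factory=list)


def apply_corrections(text, spans, kind="typo_correction"):
    """Optional follow-up step: rewrite the text using one kind of normalization."""
    # Process right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        for norm in span.normalizations:
            if norm.kind == kind:
                text = text[:span.start] + norm.value + text[span.end:]
                break
    return text


text = "Patient takes mxnidipine daily"
span = Span(14, 24, "mxnidipine")
span.normalizations.append(Normalization("typo_correction", "manidipine"))
span.normalizations.append(Normalization("umls", "CUI-PLACEHOLDER"))  # not a real CUI

corrected = apply_corrections(text, [span])
```

The span keeps its original text and offsets, so downstream steps can ignore the correction, apply it, or use the other normalization instead.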
