In a few cases, the algorithm for suggesting corrections does not come up with the best suggestion, or with any suggestion at all, usually because the word the user meant to write and what they actually typed are too far apart in terms of Levenshtein distance.
German examples include the word 'analpherbet' (as reported in the forum recently) and the weird but common misspelling 'legendlich' for 'lediglich' (almost 40,000 hits on Google).
I wonder if we could have a manually maintained, semicolon- or tab-separated list for the cases that are brought to our attention. It could live either in a separate file, suggestions.txt, or perhaps in the existing file prohibit.txt.
Example:
LanguageTool could look up a misspelled word in the first column and display the words in the remainder of the line as suggestions, ideally in the order they are given in the file. IMO the suggestions that are generated programmatically should be ignored if a misspelled word is found in the list. This would allow us to not only add missing suggestions, but also overrule misleading suggestions the software produces sometimes (mostly weird compounds that aren't really used, such as 'Brustwalze').
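To make the proposal concrete, here is a minimal sketch of the lookup described above. It is not LanguageTool code: the class `ManualSuggestions`, its methods, and the `#` comment convention are hypothetical names and assumptions made only for illustration. The sample lines use 'legendlich' from this issue, plus 'analpherbet' with its presumed intended word 'Analphabet'.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of LanguageTool: holds a manually
// maintained list of misspellings and their preferred suggestions.
class ManualSuggestions {

  // misspelled word -> suggestions, in the order given in the file
  private final Map<String, List<String>> table = new LinkedHashMap<>();

  // Each line is assumed to look like: misspelling;suggestion1;suggestion2
  // (semicolons or tabs as separators). Blank lines and lines starting
  // with '#' are skipped; the '#' convention is an assumption.
  ManualSuggestions(List<String> lines) {
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      String[] parts = trimmed.split("[;\t]");
      List<String> suggestions = new ArrayList<>();
      for (int i = 1; i < parts.length; i++) {
        String suggestion = parts[i].trim();
        if (!suggestion.isEmpty()) {
          suggestions.add(suggestion);
        }
      }
      // Ignore lines that carry no usable suggestion.
      if (!suggestions.isEmpty()) {
        table.put(parts[0].trim(), suggestions);
      }
    }
  }

  // If the misspelled word is listed, the manual suggestions replace the
  // programmatically generated ones; otherwise the generated ones are kept.
  List<String> suggest(String misspelled, List<String> generated) {
    List<String> manual = table.get(misspelled);
    return manual != null ? manual : generated;
  }
}
```

With the lines `legendlich;lediglich` and `analpherbet;Analphabet` loaded, `suggest("legendlich", generated)` would return only `lediglich`, whatever the speller generated, while a word that is not in the list keeps its generated suggestions. Replacing rather than merging is what lets the list overrule misleading suggestions; merging the manual entries in front of the generated ones would be the gentler alternative.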
This issue also shows our spell checker (often that's just hunspell) is far from perfect. A native speaker can see what "analpherbet" is meant to mean, and LT should be able to get that, too.
There is already a solution for that: getAdditionalTopSuggestions() in
languagetool/languagetool-language-modules/de/src/main/java/org/languagetool/rules/de/GermanSpellerRule.java