List with hard-coded spelling suggestions for some cases? #679

janschreiber · 2017-02-28T22:35:28Z

In a few cases, the algorithm for suggesting corrections does not come up with the best suggestion, or even with any suggestions at all, usually because the word the users meant to write and what they actually typed are too far away from each other in terms of Levenshtein distance.
German examples include the word 'analpherbet' (as reported in the forum recently) and the weird but common misspelling 'legendlich' for 'lediglich' (almost 40,000 hits on Google).
I wonder if we could have a manually maintained, semicolon- or tab-separated list for those cases that are brought to our attention. That could either be a separate file, suggestions.txt, or maybe we could use the existing file prohibit.txt.
Example:

analpherbet; Analphabet
analpherbeten; Analphabeten; Analphabetin
legendlich; lediglich; leg endlich
mistreiter; Mitstreiter

LanguageTool could look up a misspelled word in the first column and display the words in the remainder of the line as suggestions, ideally in the order they are given in the file. IMO the suggestions that are generated programmatically should be ignored if a misspelled word is found in the list. This would allow us to not only add missing suggestions, but also overrule misleading suggestions the software produces sometimes (mostly weird compounds that aren't really used, such as 'Brustwalze').
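
The lookup described above could be sketched roughly as follows. This is only an illustration of the idea, not LanguageTool code: the class name `HardCodedSuggestions`, its `suggest` method, and the `#` comment convention are all made up for this example, and matching is assumed to be exact and case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of LanguageTool.
final class HardCodedSuggestions {

  // misspelling -> suggestions, in the order given in the file
  private final Map<String, List<String>> table = new LinkedHashMap<>();

  // Each line is expected to look like: misspelling; suggestion1; suggestion2
  // Fields may be separated by semicolons or tabs.
  HardCodedSuggestions(List<String> lines) {
    for (String line : lines) {
      String trimmed = line.trim();
      // Assumption: blank lines and '#' comment lines are skipped.
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      String[] parts = trimmed.split("\\s*[;\\t]\\s*");
      if (parts.length < 2) {
        continue; // no suggestions given, nothing to store
      }
      List<String> suggestions =
          new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
      table.put(parts[0], suggestions);
    }
  }

  // If the word is listed, the hard-coded suggestions replace the generated
  // ones entirely; otherwise the generated suggestions are returned unchanged.
  List<String> suggest(String word, List<String> generated) {
    List<String> fixed = table.get(word);
    return fixed != null ? fixed : generated;
  }
}
```

With the example list above, looking up 'legendlich' would return 'lediglich' and 'leg endlich' in that order, no matter what the algorithm generated, while a word that is not in the list keeps its generated suggestions.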


milekpl · 2017-04-25T11:04:20Z

There is a very simple solution for Polish already in place:

Use SimpleReplaceRule to create replacements for popular spelling mistakes in rules/replace.txt.
Use ignore.txt in hunspell folder to list these popular spelling mistakes.

You could write a script to populate both files from a list, and a JUnit test to check that you don't get additional replacements.
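
A rough sketch of such a script, under assumptions that should be checked against the actual files in the repository: it assumes replace.txt takes lines of the form `wrong=right1|right2` and ignore.txt takes one word per line. The class `ReplacementFileGenerator` and its methods are hypothetical names for this example; writing the returned lines to disk is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical generator, not part of LanguageTool.
final class ReplacementFileGenerator {

  // Splits "misspelling; suggestion1; suggestion2" into its fields.
  // Returns an empty array for lines without at least one suggestion.
  private static String[] parse(String line) {
    String[] parts = line.trim().split("\\s*;\\s*");
    return parts.length < 2 ? new String[0] : parts;
  }

  // Assumed replace.txt format: wrong=right1|right2
  static List<String> replaceLines(List<String> source) {
    List<String> out = new ArrayList<>();
    for (String line : source) {
      String[] parts = parse(line);
      if (parts.length == 0) {
        continue;
      }
      List<String> suggestions = Arrays.asList(parts).subList(1, parts.length);
      out.add(parts[0] + "=" + String.join("|", suggestions));
    }
    return out;
  }

  // Assumed ignore.txt format: one misspelled word per line
  static List<String> ignoreLines(List<String> source) {
    List<String> out = new ArrayList<>();
    for (String line : source) {
      String[] parts = parse(line);
      if (parts.length > 0) {
        out.add(parts[0]);
      }
    }
    return out;
  }
}
```

Generating both outputs from one source list keeps the two files in sync, so a word cannot end up in one file without the other.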

danielnaber · 2017-06-01T18:51:11Z

This issue also shows that our spell checker (often that's just hunspell) is far from perfect. A native speaker can see what "analpherbet" is supposed to mean, and LT should be able to get that, too.

janschreiber · 2017-07-15T11:49:06Z

There is already a solution for that: getAdditionalTopSuggestions() in
languagetool/languagetool-language-modules/de/src/main/java/org/languagetool/rules/de/GermanSpellerRule.java

janschreiber added the enhancement label Feb 28, 2017

janschreiber closed this as completed Jul 15, 2017
