Unexpected capital letters returned for certain capitalized misspellings #20
Comments
Along a similar vein, these misspellings of
On the other hand, these misspellings of
It seems this problem is broader than initially recognized. I've discovered 190 additional 5-letter words suggested by I've included a JSON file in the gist with one key per result returned by I've counted 39 unexpected uppercase "R"'s, 36 unexpected "C"'s, 34 unexpected "B"'s, 27 unexpected "T"'s, 15 unexpected "E"'s, and ten or fewer unexpected "M"'s, "V"'s, "O"'s, "W"'s, "Y"'s, "N"'s, "P"'s or "U"'s.
TLDR: see nspell issue 37 and nspell issue 41.
Closing as the nspell PRs are released, and I’m assuming they fixed this!
TLDR: see PR 38 and PR 39 that I've opened against `nspell`.
Subject of the issue
Note: I've changed the name of this issue from "Mysterious capital E returned for misspelled 5-letter nouns with a single capital T" to "Unexpected capital letters returned for certain capitalized misspellings," and I've edited this post slightly to reflect the broader scope.
Background
I had originally found this error for capitalized variants of 16 dictionary-en words: "tepee", "thane", "thole", "three", "throe", "tilde", "tinge", "tonne", "toque", "tribe", "trike", "trope", "trove", "truce", "tuque", and "twine". To give one notable example, any misspelling matching this RegEx
/^Thre[f-ln-racuvxyz]$/
is corrected to "ThreE" instead of "Three."
Edit: The algorithm below produces misspellings of the original 16 dictionary words, but I have since found 190 additional 5-letter words that occasionally occur in `retext-spell` `vfile` messages with extraneous capital letters. I have saved these new words and the list of misspellings needed to generate them in a JSON file bundled with the gist for this issue.
Generating examples
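As a small illustration of the "Thre" example from the Background section, the misspellings matched by that regular expression can be listed with a few lines of JavaScript. This is only a sketch: the variable names are illustrative, and it does not call `nspell` or `retext-spell`; it just shows which inputs the pattern covers.

```javascript
// Regular expression quoted in the Background section of this issue.
const pattern = /^Thre[f-ln-racuvxyz]$/

// Try "Thre" followed by each lowercase letter and keep the ones that match.
const alphabet = 'abcdefghijklmnopqrstuvwxyz'.split('')
const misspellings = alphabet
  .map((letter) => 'Thre' + letter)
  .filter((candidate) => pattern.test(candidate))

// 19 candidates; "Threb", "Thred", "Three", "Threm", "Thres", "Thret"
// and "Threw" are excluded by the character class.
console.log(misspellings.length, misspellings.join(' '))
```

Each of the listed strings is an input that, per this report, gets "ThreE" as its top suggestion.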
The gist to reproduce this issue tests misspellings generated as such:
`dictionary-en` word starting with "T" and ending in "e" `/MS` in `index.dic`
`Torte` (due to the second "t")
If the misspellings do not match a different dictionary word more closely than the originally selected 5-letter word, then the first "expected" value in the `vfile` message emitted by `retext-spell` will be the originally selected 5-letter word with final "e" mistakenly capitalized as "E".
Edit: Without getting into the details of `nspell`'s keyboard groups, there is no easy way to generate the 190 newly discovered 5-letter words that do not match the misspellings generated with the above method.
Your environment
Steps to reproduce
I've created a gist.
Execute the following commands to download the gist and install dependencies:
Run one of the following commands to test with various suffixes:
`npm run test "*"` (for no suffix)
`npm run test "*s"` (for the plural)
`npm run test "*'s"` (for the possessive)
Side note: In contrast to the examples that produce the bug defined in this issue, you can run `npm run test "t"`, `npm run test "ts"`, and `npm run test "t's"` to see the results of misspellings that fail to produce the bug due to the presence of a lowercase "t" in the misspelling.
Expected behavior
All the logged `vfile` message reasons should show suggested values without unusual capitalization. The hundreds of misspellings tested with `npm run test "*"`, `npm run test "*s"`, and `npm run test "*'s"` should generate suggested values with lowercase "e" characters. For example, the first tested misspelling `Tepea` should generate a top suggested value of "Tepee". The plural `Tepeas` should generate a top suggested value of "Tepees". The possessive `Tepea's` should generate a top suggested value of "Tepee's".
Actual behavior
The hundreds of misspellings tested with `npm run test "*"`, `npm run test "*s"`, and `npm run test "*'s"` all generate suggested values with uppercase "E" characters. For example, the first tested misspelling `Tepea` generates a top suggested value of "TepeE". The plural `Tepeas` generates a top suggested value of "TepeEs". The possessive `Tepea's` generates a top suggested value of "TepeE's".
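To make the difference between expected and actual behavior easy to check mechanically, a helper along these lines could flag suggestions that contain a capital letter after the first character. This is a sketch under assumptions: `unexpectedCapitals` is a hypothetical name, not part of `retext-spell` or `nspell`, and the sample data is limited to the three "Tepea" cases quoted in this issue. It assumes the misspelling itself is capitalized only in its first letter.

```javascript
// Hypothetical helper (not a retext-spell or nspell API): returns every
// uppercase letter that appears after the first character of a suggestion.
function unexpectedCapitals(suggestion) {
  return suggestion
    .slice(1)
    .split('')
    .filter((ch) => ch !== ch.toLowerCase())
}

// The three cases described under "Expected behavior" and "Actual behavior".
const reported = [
  { misspelling: 'Tepea', actual: 'TepeE', expected: 'Tepee' },
  { misspelling: 'Tepeas', actual: 'TepeEs', expected: 'Tepees' },
  { misspelling: "Tepea's", actual: "TepeE's", expected: "Tepee's" }
]

// Tally the unexpected capitals per letter across the actual suggestions.
const tally = {}
for (const { actual } of reported) {
  for (const letter of unexpectedCapitals(actual)) {
    tally[letter] = (tally[letter] || 0) + 1
  }
}

console.log(tally) // { E: 3 }
```

The same tally, run over all suggestions, is one way to produce per-letter counts of unexpected capitals. Suggestions that are legitimately uppercase throughout (acronyms, for instance) would need to be excluded first.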