Unexpected capital letters returned for certain capitalized misspellings #20

Closed
tvquizphd opened this issue Jan 2, 2021 · 4 comments
Labels
👀 no/external This makes more sense somewhere else

Comments


tvquizphd commented Jan 2, 2021

TLDR: see PR 38 and PR 39 that I've opened against nspell.

Subject of the issue

Note: I've changed the name of this issue from "Mysterious capital E returned for misspelled 5-letter nouns with a single capital T" to "Unexpected capital letters returned for certain capitalized misspellings," and I've edited this post slightly to reflect the broader scope.

Background

I had originally found this error for capitalized variants of 16 dictionary-en words: "tepee", "thane", "thole", "three", "throe", "tilde", "tinge", "tonne", "toque", "tribe", "trike", "trope", "trove", "truce", "tuque", and "twine". To give one notable example, any misspelling matching this RegEx /^Thre[f-ln-racuvxyz]$/ is corrected to "ThreE" instead of "Three."
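As a quick illustration of which misspellings the quoted pattern covers, the sketch below tests every lowercase letter as the final character after "Thre". Only the regular expression comes from this issue; the variable names are illustrative.

```javascript
// The pattern quoted in the issue; only the final character varies.
const pattern = /^Thre[f-ln-racuvxyz]$/

const alphabet = 'abcdefghijklmnopqrstuvwxyz'.split('')

// Final letters that the pattern accepts after "Thre".
const accepted = alphabet.filter((letter) => pattern.test('Thre' + letter))

// Final letters that the pattern rejects.
const rejected = alphabet.filter((letter) => !pattern.test('Thre' + letter))

console.log(accepted.map((letter) => 'Thre' + letter).join(' '))
console.log('rejected endings:', rejected.join(' '))
```

The character class accepts 19 endings and rejects "b", "d", "e", "m", "s", "t", and "w".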

Edit: The algorithm below produces misspellings of the original 16 dictionary words, but I have since found 190 additional 5-letter words that occasionally occur in retext-spell vfile messages with extraneous capital letters. I have saved these new words and the list of misspellings needed to generate them in a JSON file bundled with the gist for this issue.

Generating examples

The gist to reproduce this issue tests misspellings generated as follows:

  • Capitalize any 5-letter dictionary-en word starting with "T" and ending in "e"
  • Ensure that the word has affix code /MS in index.dic
  • Ensure that the word is not Torte (due to the second "t")
  • Replace the final "e" with a single letter (except "t")
  • Optionally add the plural "s" or the possessive "'s"
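The steps above can be sketched roughly as follows. This is not the gist's code: the helper names `selectWords` and `generateMisspellings` are hypothetical, the sample lines are illustrative rather than copied from index.dic, and two details are assumptions (that the affix flags must equal exactly "MS", and that the unchanged letter "e" is skipped along with "t").

```javascript
// Hypothetical helpers; names and the exact "MS" match are assumptions.

// Keep 5-letter entries that start with "t", end in "e", carry the
// affix flags "MS", and contain no second "t" (which excludes "torte").
function selectWords(dicLines) {
  return dicLines
    .map((line) => line.trim().split('/'))
    .filter(([word, flags]) => flags === 'MS' && /^t[a-su-z]{3}e$/.test(word))
    .map(([word]) => word)
}

// Capitalize the word, then swap the final "e" for every other letter
// except "t"; optionally append a suffix such as "s" or "'s".
function generateMisspellings(word, suffix = '') {
  const stem = word[0].toUpperCase() + word.slice(1, -1)
  return 'abcdefghijklmnopqrstuvwxyz'
    .split('')
    .filter((letter) => letter !== 'e' && letter !== 't')
    .map((letter) => stem + letter + suffix)
}

// Illustrative sample lines in the "word/FLAGS" style of a .dic file.
const sampleDicLines = ['tepee/MS', 'three/MS', 'torte/MS', 'there']
const words = selectWords(sampleDicLines)

console.log(words)
console.log(generateMisspellings(words[0], "'s").slice(0, 3))
```

With this sample, "torte" is dropped because of its second "t" and "there" because it has no flags; each remaining word yields 24 misspellings per suffix, starting with "Tepea" for "tepee".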

If a misspelling does not match a different dictionary word more closely than the originally selected 5-letter word, then the first "expected" value in the vfile message emitted by retext-spell is the originally selected word with its final "e" mistakenly capitalized as "E".

Edit: Without getting into the details of nspell's keyboard groups, there is no easy way to generate the 190 newly discovered 5-letter words that do not match the misspellings generated with the above method.

Your environment

  • OS: macOS Sierra 10.12.6
  • Packages: dictionary-en==3.0.1, retext==7.0.1, retext-spell==4.0.0
  • Env: node==15.3.0, npm==7.0.14

Steps to reproduce

I've created a gist.

Execute the following commands to download the gist and install dependencies:

git clone https://gist.github.com/a4e2ff11cd868a5b40a65b3c53c8574a.git
cd a4e2ff11cd868a5b40a65b3c53c8574a
npm install

Run one of the following commands to test with various suffixes:

  • npm run test "*" (for no suffix)
  • npm run test "*s" (for the plural)
  • npm run test "*'s" (for the possessive)

Side note: In contrast to the examples that produce the bug described in this issue, you can run npm run test "t", npm run test "ts", and npm run test "t's" to see misspellings that fail to produce the bug because they contain a lowercase "t".

Expected behavior

All the logged vfile message reasons should show suggested values without unusual capitalization. The hundreds of misspellings tested with npm run test "*", npm run test "*s", and npm run test "*'s" should generate suggested values with lowercase "e" characters. For example, the first tested misspelling Tepea should generate a top suggested value of "Tepee". The plural Tepeas should generate a top suggested value of "Tepees". The possessive Tepea's should generate a top suggested value of "Tepee's".

Actual behavior

The hundreds of misspellings tested with npm run test "*", npm run test "*s", and npm run test "*'s" all generate suggested values with uppercase "E" characters. For example, the first tested misspelling Tepea generates a top suggested value of "TepeE". The plural Tepeas generates a top suggested value of "TepeEs". The possessive Tepea's generates a top suggested value of "TepeE's".

@tvquizphd tvquizphd added 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels Jan 2, 2021

tvquizphd commented Jan 2, 2021

In a similar vein, these misspellings of Tinpot result in an unexpected capitalized "O":

  1:1-1:6  warning  `Tinpb` is misspelt; did you mean `TinpOt` ... tinpb  retext-spell
  1:1-1:6  warning  `Tinpc` is misspelt; did you mean `TinpOt` ... tinpc  retext-spell
  1:1-1:6  warning  `Tinpd` is misspelt; did you mean `TinpOt` ... tinpd  retext-spell
  1:1-1:6  warning  `Tinpf` is misspelt; did you mean `TinpOt` ... tinpf  retext-spell
  1:1-1:6  warning  `Tinph` is misspelt; did you mean `TinpOt` ... tinph  retext-spell
  1:1-1:6  warning  `Tinpj` is misspelt; did you mean `TinpOt` ... tinpj  retext-spell
  1:1-1:6  warning  `Tinpl` is misspelt; did you mean `TinpOt` ... tinpl  retext-spell
  1:1-1:6  warning  `Tinpm` is misspelt; did you mean `TinpOt` ... tinpm  retext-spell
  1:1-1:6  warning  `Tinpq` is misspelt; did you mean `TinpOt` ... tinpq  retext-spell
  1:1-1:6  warning  `Tinpv` is misspelt; did you mean `TinpOt` ... tinpv  retext-spell
  1:1-1:6  warning  `Tinpx` is misspelt; did you mean `TinpOt` ... tinpx  retext-spell

On the other hand, these misspellings of Tinpot suggest the correct capitalization:

  1:1-1:6  warning  `Tinpo` is misspelt; did you mean `Tinpot`?  tinpo  retext-spell
  1:1-1:6  warning  `Tinpp` is misspelt; did you mean `Tinpot` ... tinpp  retext-spell
  1:1-1:6  warning  `Tinpu` is misspelt; did you mean `Tinpot` ... tinpu  retext-spell
  1:1-1:6  warning  `Tinpz` is misspelt; did you mean `Tinpot` ... tinpz  retext-spell
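For sorting through output like the two lists above, a small parser can pull the misspelling and the first suggestion out of each warning line and flag suggestions with a capital letter after the first character. This is a minimal sketch that assumes the line format shown in this comment; `parseWarning` is a hypothetical helper, and the capital-letter check is a heuristic that would also flag legitimately mixed-case words.

```javascript
// Matches the two backtick-quoted words in a retext-spell warning line.
const warningPattern = /`([^`]+)` is misspelt; did you mean `([^`]+)`/

// Returns { misspelling, suggestion, suspicious }, or null if the line
// does not look like a warning. "suspicious" means an uppercase ASCII
// letter appears after the first character of the suggestion.
function parseWarning(line) {
  const match = warningPattern.exec(line)
  if (!match) return null
  const [, misspelling, suggestion] = match
  const suspicious = /[A-Z]/.test(suggestion.slice(1))
  return {misspelling, suggestion, suspicious}
}

// Two lines copied from the output above.
const logLines = [
  '  1:1-1:6  warning  `Tinpb` is misspelt; did you mean `TinpOt` ... tinpb  retext-spell',
  '  1:1-1:6  warning  `Tinpo` is misspelt; did you mean `Tinpot`?  tinpo  retext-spell'
]

for (const line of logLines) {
  console.log(parseWarning(line))
}
```

Here the first line is flagged because of the "O" in "TinpOt", while the second is not.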


tvquizphd commented Jan 2, 2021

It seems this problem is broader than initially recognized. I've discovered 190 additional 5-letter words suggested by retext-spell that include a single unexpected capital letter.

I've included a JSON file in the gist with one key per result returned by retext-spell that contains a single unexpected capital letter. Each key maps to the list of misspellings that produce it. Each misspelling is derived by replacing the middle character of a 5-letter dictionary word.

I've counted 39 unexpected uppercase "R"s, 36 unexpected "C"s, 34 unexpected "B"s, 27 unexpected "T"s, 15 unexpected "E"s, and ten or fewer each of unexpected "M"s, "V"s, "O"s, "W"s, "Y"s, "N"s, "P"s, and "U"s.
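A per-letter tally like the one above could be computed along these lines. The sketch assumes the JSON file has the shape `{ suggestion: [misspelling, ...] }`; the function name `tallyUnexpectedCapitals` is hypothetical, and the sample object uses strings quoted elsewhere in this thread rather than the contents of the actual file.

```javascript
// Assumed shape: { suggestion: [misspelling, ...] }.
function tallyUnexpectedCapitals(results) {
  const counts = {}
  for (const suggestion of Object.keys(results)) {
    // Skip the first character, which is legitimately capitalized.
    for (const letter of suggestion.slice(1)) {
      if (/[A-Z]/.test(letter)) counts[letter] = (counts[letter] || 0) + 1
    }
  }
  // Return [letter, count] pairs, most frequent first.
  return Object.entries(counts).sort((a, b) => b[1] - a[1])
}

// Sample built from strings quoted in this thread, not from the JSON file.
const sampleResults = {
  TepeE: ['Tepea'],
  ThreE: ['Threa'],
  TinpOt: ['Tinpb', 'Tinpc']
}

console.log(tallyUnexpectedCapitals(sampleResults))
```

For this sample the tally is two for "E" and one for "O".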

@tvquizphd tvquizphd changed the title Mysterious capital E returned for misspelled 5-letter nouns with a single capital T Unexpected capital letters returned for certain capitalized misspellings Jan 3, 2021

tvquizphd commented Jan 3, 2021

TLDR: see nspell issue 37 and nspell issue 41.

Member

wooorm commented Feb 15, 2021

Closing as the nspell PRs are released, and I’m assuming they fixed this!

@wooorm wooorm closed this as completed Feb 15, 2021
@wooorm wooorm added 👀 no/external This makes more sense somewhere else and removed 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels Feb 15, 2021