Improve suggestions when short compounds are available #2092

Open
Jason3S opened this issue Dec 12, 2021 · 2 comments

Comments

@Jason3S
Collaborator

Jason3S commented Dec 12, 2021

Info

When making suggestions for languages like Estonian, many improbable words are suggested because very short compounds are possible. These suggestions can even prevent the correct suggestion from being shown.

There are two issues here:

  1. The number of compounds allowed is not currently limited.
  2. Short compound segments are too cheap (longer words should be preferred).
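
One way to picture both points is a cost function that drops candidates with too many segments and charges extra for short ones. This is a minimal sketch, not the spell checker's actual scoring: the names `Candidate`, `CompoundOptions`, `totalCost`, and `rank`, and all of the weights, are assumptions made up for illustration.

```typescript
// Hypothetical types and weights, for illustration only.
interface Candidate {
  segments: string[]; // the words that make up the suggestion
  editCost: number;   // cost of the edits needed to reach it
}

interface CompoundOptions {
  maxSegments: number;  // issue 1: cap on the number of compound segments
  joinCost: number;     // cost charged for every join between segments
  shortLength: number;  // segments shorter than this count as "short"
  shortPenalty: number; // issue 2: extra cost per short segment
}

function totalCost(c: Candidate, o: CompoundOptions): number {
  if (c.segments.length > o.maxSegments) return Infinity;
  const joins = c.segments.length - 1;
  // Only penalize short segments when the candidate really is a compound.
  const shortCount =
    joins > 0 ? c.segments.filter((s) => s.length < o.shortLength).length : 0;
  return c.editCost + joins * o.joinCost + shortCount * o.shortPenalty;
}

// Drop candidates over the segment limit, then sort cheapest first.
function rank(candidates: Candidate[], o: CompoundOptions): Candidate[] {
  return candidates
    .filter((c) => totalCost(c, o) !== Infinity)
    .sort((a, b) => totalCost(a, o) - totalCost(b, o));
}

const options: CompoundOptions = {
  maxSegments: 3,
  joinCost: 50,
  shortLength: 4,
  shortPenalty: 100,
};

const ranked = rank(
  [
    { segments: ["pub", "lis", "hin"], editCost: 0 },
    { segments: ["publishing"], editCost: 100 },
    { segments: ["pu", "bl", "is", "hin"], editCost: 0 },
  ],
  options
);

console.log(ranked.map((c) => c.segments.join("+")));
```

With these made-up weights the single long word ranks ahead of the three-segment compound, and the four-segment candidate is dropped entirely.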
@ssbarnea
Contributor

ssbarnea commented Jul 28, 2022

I was about to file a bug about `publishin` being accepted as correct when `allowCompoundWords: true` is enabled. The trace output gives a hint as to why:

pub+lis+hin * en_us*               ../../../.nvm/versions/node/v17.6.0/lib/n...e_modules/@cspell/dict-en_us/en_US.trie.gz

Missing a letter from a word is a very common typo, but in that case enabling compound words seems to work against us.

I have other examples but I considered this relevant. Can we do something to avoid this category of problems?

I was thinking that an extra setting that allows compound words only if they contain at least one word of 4 or more characters would considerably reduce the number of errors.
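
The proposed rule could look roughly like the sketch below. No such setting exists in the thread; the function name `passesMinSegmentRule` and its `minLength` parameter are hypothetical, and the input is assumed to be a `+`-separated segmentation like the one in the trace output above.

```typescript
// Hypothetical rule: accept a compound only if at least one of its
// segments has `minLength` or more characters.
function passesMinSegmentRule(trace: string, minLength = 4): boolean {
  const segments = trace.split("+");
  // A single segment is not a compound, so the rule does not apply.
  if (segments.length === 1) return true;
  return segments.some((s) => s.length >= minLength);
}

console.log(passesMinSegmentRule("pub+lis+hin")); // false: every segment has 3 letters
console.log(passesMinSegmentRule("error+code"));  // true: both segments have 4+ letters
```

Under this rule `publishin` would be flagged again, because its only segmentation in the trace consists of three-letter words.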

@Jason3S
Collaborator Author

Jason3S commented Jul 28, 2022

@ssbarnea,

Sorry for not being clear. This issue stems from Hunspell dictionaries that have flags for marking words as compoundable (begin, end, middle). The spell checker supports them, but if the dictionary writer does not take care, unusual and unhelpful suggestions can result.

`allowCompoundWords` is something else: it allows any combination of words found in the dictionary to be joined together. It was designed to reduce the number of false positives caused by the common practice among programmers of gluing words together when naming variables and functions.
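
To see why joining arbitrary dictionary words accepts a typo like `publishin`, here is a minimal sketch of that kind of check. It is not the spell checker's implementation: `splitIntoWords` is a hypothetical helper, and the word list is a toy stand-in for the real en_US dictionary.

```typescript
// Toy word list (an assumption for illustration, not the real en_US dictionary).
const words = new Set(["pub", "lis", "hin", "publish", "publishing"]);

// Return one way to split `text` into dictionary words,
// or undefined if no such split exists.
function splitIntoWords(
  text: string,
  dict: Set<string>,
  start = 0
): string[] | undefined {
  if (start === text.length) return [];
  // Try every prefix starting at `start`, shortest first.
  for (let end = start + 1; end <= text.length; end++) {
    const head = text.slice(start, end);
    if (!dict.has(head)) continue;
    const rest = splitIntoWords(text, dict, end);
    if (rest) return [head, ...rest];
  }
  return undefined;
}

console.log(splitIntoWords("publishin", words)?.join("+")); // pub+lis+hin
console.log(splitIntoWords("publishxx", words));            // undefined
```

Any string that can be covered by short dictionary words passes, which is why a missing letter can go unnoticed.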

`allowCompoundWords` has been a never-ending source of issues. As a result, it has been removed from all standard dictionary definitions and its use is discouraged. It is better to create a "common word compounds for programmers" dictionary that has an explicit list of allowed compounds.

Please note, custom dictionaries support compound annotations.

compound-word-list.txt

*error*
*code*

This allows things like `errorcode` and `codeerror`, but also `errorerrorerror`.
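
The behavior of those two entries can be modeled with the sketch below. It rests on an assumed reading of the annotation: a leading `*` lets the word attach to a preceding word, a trailing `*` lets it attach to a following word, and the word may still stand alone. `parseEntries` and `isAccepted` are hypothetical helpers, and the unannotated `trace` entry is added only for contrast.

```typescript
interface Entry {
  joinsLeft: boolean;  // entry started with "*"
  joinsRight: boolean; // entry ended with "*"
}

// Parse lines such as "*error*" into the bare word plus its join flags.
function parseEntries(lines: string[]): Map<string, Entry> {
  const entries = new Map<string, Entry>();
  for (const line of lines) {
    const joinsLeft = line.startsWith("*");
    const joinsRight = line.endsWith("*");
    const word = line.slice(
      joinsLeft ? 1 : 0,
      joinsRight ? line.length - 1 : line.length
    );
    entries.set(word, { joinsLeft, joinsRight });
  }
  return entries;
}

// True if `text` is a listed word or a chain of words whose flags allow the joins.
function isAccepted(
  text: string,
  entries: Map<string, Entry>,
  start = 0
): boolean {
  for (let end = start + 1; end <= text.length; end++) {
    const entry = entries.get(text.slice(start, end));
    if (!entry) continue;
    // A non-initial segment must be allowed to attach on its left side.
    if (start > 0 && !entry.joinsLeft) continue;
    if (end === text.length) return true;
    // A non-final segment must be allowed to attach on its right side.
    if (entry.joinsRight && isAccepted(text, entries, end)) return true;
  }
  return false;
}

// "trace" is a hypothetical entry without annotations, added for contrast.
const entries = parseEntries(["*error*", "*code*", "trace"]);

for (const w of ["errorcode", "codeerror", "errorerrorerror", "errortrace"]) {
  console.log(w, isAccepted(w, entries));
}
```

In this model the first three words are accepted and `errortrace` is rejected, because `trace` carries no annotation. Nothing limits how many annotated words can be chained, which matches the `errorerrorerror` case.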
