C/C++: Some misspelled words are not detected #345

Open
ambrop72 opened this issue May 21, 2019 · 18 comments

@ambrop72

Misspelled words which are not detected: avalible, handeled, evalulated, deciced, pressent, senting.

ambrop72 changed the title "Some misspelled are not detected" to "Some misspelled words are not detected" on May 21, 2019
@Jason3S
Collaborator

Jason3S commented May 22, 2019

It detected those words for me.

I searched all the dictionaries and the words were not found. What programming language are you using?

@ambrop72
Author

Hi, thanks for looking into this. I'm using C++ (with the official C++ extension). I didn't do any special setup of the spell checker other than choosing which file types are checked and adding some words to the workspace dictionary (definitely not these ones, but I will check).

@Jason3S
Collaborator

Jason3S commented May 23, 2019

This is because most people glue words together when programming in C++ (e.g. errorhandler). To account for that, the spell checker allows compound words. Each of your examples can be split into valid words: dec°iced, press°ent, han°deled.
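To illustrate why these slip through, here is a minimal Python sketch (not the extension's actual code) of compound matching: a word is accepted if it can be split entirely into dictionary words. The tiny `DICTIONARY` and the function name `split_compound` are hypothetical, chosen only to contain the parts named above.

```python
from functools import lru_cache

# Hypothetical mini-dictionary containing only the parts from the examples above.
DICTIONARY = {"dec", "iced", "press", "ent", "han", "deled", "error", "handler"}


def split_compound(word, dictionary=DICTIONARY):
    """Return one split of `word` into dictionary words, or None if none exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Returns a tuple of parts covering word[start:], or None.
        if start == len(word):
            return ()
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part in dictionary:
                rest = solve(end)
                if rest is not None:
                    return (part,) + rest
        return None

    parts = solve(0)
    return list(parts) if parts is not None else None


for w in ["deciced", "pressent", "handeled", "avalible"]:
    print(w, "->", split_compound(w))
```

With compounding on, `deciced` is accepted as `dec` + `iced`, while `avalible` is still flagged because no split into known words exists.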

Jason3S changed the title "Some misspelled words are not detected" to "C++: Some misspelled words are not detected" on May 24, 2019
Jason3S changed the title "C++: Some misspelled words are not detected" to "C/C++: Some misspelled words are not detected" on May 24, 2019
@Jason3S
Collaborator

Jason3S commented Jun 4, 2019

I do not think the way it currently works is ideal.

The plan is to change C/C++ compound matching to match against noun compounds instead of compounds made up of any words. errorcode, returncode, htmlelement, messagehandler, and errormessage are the kinds of words it would accept. This would help with suggestions as well. The patterns would be things like noun{1,3} or (verb)(noun){,3}.
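A rough Python sketch of what such a rule could look like, assuming a hypothetical part-of-speech lexicon (`LEXICON`, tags N, V, A) that the current dictionaries do not provide: enumerate every way to split the word into known words, then accept it only if some tag sequence matches noun{1,3} or a verb followed by up to three nouns. The names here are illustrative, not a planned API.

```python
import re
from functools import lru_cache

# Hypothetical lexicon of words with part-of-speech tags:
# N = noun, V = verb, A = adjective.
LEXICON = {
    "error": {"N"}, "code": {"N", "V"}, "return": {"N", "V"},
    "html": {"N"}, "element": {"N"}, "message": {"N"}, "handler": {"N"},
    "dec": {"N"}, "iced": {"A", "V"},
}

# noun{1,3} or (verb)(noun){0,3}, matched against a string of tags.
ALLOWED = re.compile(r"N{1,3}|VN{0,3}")


def is_allowed_compound(word):
    word = word.lower()

    @lru_cache(maxsize=None)
    def tag_sequences(start):
        # Every tag string for the ways word[start:] splits into lexicon words.
        if start == len(word):
            return frozenset({""})
        found = set()
        for end in range(start + 1, len(word) + 1):
            for tag in LEXICON.get(word[start:end], ()):
                for rest in tag_sequences(end):
                    found.add(tag + rest)
        return frozenset(found)

    return any(ALLOWED.fullmatch(seq) for seq in tag_sequences(0))
```

Under this toy lexicon, `errorcode` and `returncode` pass, while `deciced` (noun + adjective) is rejected even though both of its parts are words.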

@kkaja123

I am running into this issue during code reviews and it causes quite a bit of grief. Specifically, words like evalute and GetMsgSrollTime (should be GetMsgScrollTime) are not being detected.

Many developers use a naming convention to separate the individual words in an identifier (e.g. camelCase, PascalCase, and snake_case). It would be great if the extension could take advantage of this to check for misspelled words. Unfortunately, it is difficult to predict which naming convention a developer is using, so an option to control how the extension parses compound words could work well for this issue. The option would be a checklist of common naming conventions for compound words (including compounds written in all lowercase letters) that the extension would know to treat as compound words.

If camelCase is enabled, evalUte would be okay. If camelCase is disabled, evalUte would be treated as one word, "evalute", and would be incorrect. If alllowercase is enabled, evalute would be okay. If alllowercase is disabled, evalute would be incorrect. I think you get the picture.

There's also the case where any naming convention could be used, including uncommon ones. Consider something like eVaLUtE: if the "any naming convention" option is enabled, the extension would behave as it does today (it does not detect individual words based on case changes) and would interpret that word as eVaL + UtE. I hope it's obvious that nobody would think eVaLUtE is spelled correctly, since humans follow patterns instead of chaos.
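As an illustration of the splitting step this proposal relies on, here is a hypothetical Python sketch (the regex, `DICTIONARY`, and function names are assumptions, not the extension's implementation) that breaks an identifier at camelCase, PascalCase, and snake_case boundaries and checks each part on its own, with no compounding.

```python
import re

# Hypothetical dictionary for this example only.
DICTIONARY = {"get", "msg", "scroll", "time", "evaluate", "error", "handler"}

# One word per match: an all-caps run not followed by a lowercase letter
# ("HTML" in "HTMLParser"), an optionally capitalized lowercase run, or digits.
# Underscores and other separators never match, so they act as boundaries.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    return WORD_RE.findall(identifier)


def misspelled_parts(identifier, dictionary=DICTIONARY):
    # Each part is checked on its own; no compounding is attempted.
    return [part for part in split_identifier(identifier)
            if part.isalpha() and part.lower() not in dictionary]


print(misspelled_parts("GetMsgSrollTime"))
print(misspelled_parts("evalute"))
```

Here `GetMsgSrollTime` yields `Sroll` as the only unknown part, and an all-lowercase `evalute` is flagged as a single word, which corresponds to the "alllowercase disabled" case.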

@kit1980

kit1980 commented Dec 17, 2019

I see the same with Python. For example, "singal" is not detected, presumably because it's "sing" + "al".

I thought the cSpell.allowCompoundWords setting would control this behavior, but apparently not...

@Jason3S
Collaborator

Jason3S commented Dec 23, 2019

@kit1980 you are right, it is because allowCompoundWords is turned on for Python and C/C++.

To turn off allowCompoundWords for a language, you need to override it at the language level. The following will turn off compound word matching for C/C++ and Python:

    "cSpell.languageSettings": [
        {
            "languageId": "c,cpp,python",
            "allowCompoundWords": false
        }
    ]

@jharrang

Thanks for the fix! This should really be the default behavior IMHO (or at least the default behavior needs some case-matching refinement). Add me to the list of people who pushed code with typos because of this.

@Jason3S
Collaborator

Jason3S commented Jan 22, 2020

My plan is to turn allowCompoundWords off by default.
To do that, I have been working on a way to define compoundable words.
It is a simple syntax:

    error*
    *code
    +infix+
    +msg

* - optional compound
+ - required compound

With this definition valid words are:

error, code, errorcode, errormsg, errorinfixmsg

The following are some of the words that are not allowed:

codemsg, msg
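A minimal Python sketch of how these rules could be checked. The semantics are inferred from the examples above and are an assumption: `*` on a side allows a neighbouring part on that side, `+` requires one, and no marker forbids one. `parse_rules` and `is_valid` are hypothetical names, not cspell's API.

```python
from functools import lru_cache


def parse_rules(lines):
    """Map each base word to its (left, right) markers: '*', '+', or ''."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        left = line[0] if line[0] in "*+" else ""
        right = line[-1] if len(line) > 1 and line[-1] in "*+" else ""
        rules[line.strip("*+")] = (left, right)
    return rules


def is_valid(word, rules):
    """True if `word` is a standalone word or a compound permitted by `rules`."""

    @lru_cache(maxsize=None)
    def fits(start, preceded):
        # Can word[start:] be built from parts, given whether a part comes before it?
        for end in range(start + 1, len(word) + 1):
            markers = rules.get(word[start:end])
            if markers is None:
                continue
            left, right = markers
            if preceded and not left:
                continue  # no left marker: this part cannot follow another part
            if not preceded and left == "+":
                continue  # '+' on the left: this part must follow another part
            if end == len(word):
                if right != "+":  # last part must not require a successor
                    return True
            elif right and fits(end, True):  # a following part needs '*' or '+'
                return True
        return False

    return fits(0, False)


# The rules from the comment above.
RULES = parse_rules(["error*", "*code", "+infix+", "+msg"])
```

With these rules, `errorinfixmsg` is accepted as error + infix + msg, while `codemsg` fails because `*code` has no right-hand marker and so cannot be followed by another part.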

@PEZ

PEZ commented Mar 2, 2022

Is this the reason why servie isn't correctly checked? In a plain text file:

(Video attachment: spellcheck-servie.mp4)

@PEZ

PEZ commented Mar 2, 2022

Yes, it was. Sorry for the noise.

@Jason3S
Collaborator

Jason3S commented Mar 2, 2022

@PEZ,

You can use the cspell trace command to check.

    npx cspell trace --language-id=cpp servie

@PEZ

PEZ commented Mar 2, 2022

Ah, sweet!

@mwermelinger

The description of the compound words setting tells us that it might make misspelled words look correct. It would be nice if it also mentioned that the setting can be disabled per language. I was getting frustrated with all the undetected typos in Markdown, like insructions, but disabling compound words for Markdown helps a lot. I'd rather have false positives (correct words flagged) than false negatives (misspelled words not flagged). Is there also a way to disable compound words in code comments, i.e. so that compound words are only allowed in code and not in natural-language text?

@Jason3S
Collaborator

Jason3S commented Feb 6, 2023

@mwermelinger,

allowCompoundWords is now off by default. It has been the cause of many complaints.

I continue to strongly urge not setting allowCompoundWords to true.

I think a better practice is to just add the common compound words to a custom dictionary.

It is possible to define a custom compound dictionary:

cspell.config.yaml:

    dictionaryDefinitions:
      - name: code-compounds
        description: Custom Dictionary for compound words
        path: ./compound-words.txt
        addWords: true

    languageSettings:
      - caseSensitive: false
        languageId: cpp,c,python,javascript
        dictionaries:
          - code-compounds

compound-words.txt:

    *code*
    *error*
    *errors*
    *help*
    +end
    begin+
    +middle+
    array

Only words with * or + will be combined.

  • * - optional compound
  • + - only part of a compound

@mwermelinger

Jason, thanks for the reply, but I'm afraid I don't understand the approach of having to explicitly list the compound words. How would cSpell accept identifiers like dayTimeUserMessage unless we add all those (and many other) words to the dictionary? Adding words to the dictionary as needed seems a very labour-intensive approach, unless I'm missing some point. Thanks in advance for any clarification.

@mwermelinger

Forget it. Senior moment: snake and camel case are not considered compound words.

@Jason3S
Collaborator

Jason3S commented Feb 6, 2023

Exactly. The spell checker is able to split snake and camel case. It will even handle ERRORcode and ERRORCode. With an identifier like ERRORCode, it will try both (ERRORC, ode) and (ERROR, code). It will handle IFrame, but not iframe.
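A hypothetical Python sketch of the ambiguity described here (not cspell's implementation): for a token made of an uppercase run followed by lowercase letters, try both splits and accept the token if either one consists entirely of known words. The toy `DICTIONARY` and the helper names are assumptions for illustration.

```python
import re

# Hypothetical dictionary for illustration only.
DICTIONARY = {"error", "code", "i", "frame"}


def candidate_splits(token):
    """Candidate word splits for a token like 'ERRORCode' or 'IFrame'.

    The last capital of an uppercase run may end the uppercase word
    or start the following lowercase word, so both splits are returned.
    """
    m = re.fullmatch(r"([A-Z]{2,})([a-z]+)", token)
    if not m:
        return [[token]]
    upper, lower = m.groups()
    return [[upper, lower], [upper[:-1], upper[-1] + lower]]


def is_known(token, dictionary=DICTIONARY):
    # Accept the token if any candidate split uses only dictionary words.
    return any(all(part.lower() in dictionary for part in split)
               for split in candidate_splits(token))


print(candidate_splits("ERRORCode"))
```

In this sketch `IFrame` is accepted via (I, Frame), while `iframe` has no case boundary to split on and is rejected unless it is in the dictionary itself.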

Using the compound syntax above, the following are considered correct:

  • errorcode, codeerrors, errorserrorerrors, beginend, beginmiddleend, begincode

Not accepted:

  • codebegin, endcode, enderror
