Example of --regex flag in use #391

Open
luzpaz opened this issue Mar 10, 2018 · 5 comments · May be fixed by #1482

Comments

@luzpaz
Collaborator

luzpaz commented Mar 10, 2018

Can someone demonstrate how to use the codespell --regex= flag?

@peternewman
Collaborator

See for example here:
https://github.com/OpenLightingProject/ola/blob/master/.travis-ci.sh#L147
codespell --check-filenames --quiet 2 --regex "[a-zA-Z0-9][\\-'a-zA-Z0-9]+[a-zA-Z0-9]"

This will check words within underscore-separated variable names. It's not perfect, but it's the best I've got so far, I think due to some Python regex limitations.
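To see what this pattern picks out, here is a minimal Python sketch that applies it with `re.findall` to a made-up snake_case identifier. It assumes codespell extracts words by matching the `--regex` pattern against each line; the helper name and the sample line are illustrative only. Note that the doubled backslash in the shell command becomes a single backslash once the shell has processed the double-quoted string.

```python
import re

# The pattern from the command above, as the regex engine receives it
# (the shell turns "\\-" inside double quotes into "\-").
WORD_REGEX = re.compile(r"[a-zA-Z0-9][\-'a-zA-Z0-9]+[a-zA-Z0-9]")


def extract_words(text):
    """Return every substring the pattern treats as a word (illustrative helper)."""
    return WORD_REGEX.findall(text)


# Hypothetical line of source code containing a misspelled identifier.
sample = "recieve_buffer_size = 0"
words = extract_words(sample)
print(words)
```

Because `_` is not in any of the character classes, each part of the identifier comes out as its own word, so a misspelling such as `recieve` can be looked up on its own.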

@peternewman
Collaborator

Thinking about it, this will only match words of three or more characters. But I have a feeling I couldn't use +? for non-greedy matching in Python.
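A quick check of both points, as a hedged sketch rather than a recommendation. Python's `re` module does accept the lazy quantifier `+?`, but making the middle part lazy only splits longer words into three-character pieces. The `SHORT_OK` pattern below is a hypothetical variant, not something taken from codespell or ola, that makes the middle and final parts optional so one- and two-character words match too.

```python
import re

# Pattern from the earlier comment: first char, 1+ middle chars, last char.
GREEDY = re.compile(r"[a-zA-Z0-9][\-'a-zA-Z0-9]+[a-zA-Z0-9]")

# Same pattern with a lazy middle; this compiles fine in Python.
LAZY = re.compile(r"[a-zA-Z0-9][\-'a-zA-Z0-9]+?[a-zA-Z0-9]")

# Hypothetical variant: the middle-plus-last part is optional,
# so words shorter than three characters are matched as well.
SHORT_OK = re.compile(r"[a-zA-Z0-9](?:[\-'a-zA-Z0-9]*[a-zA-Z0-9])?")

short_greedy = GREEDY.findall("is_ok")    # two-letter words are skipped
lazy_split = LAZY.findall("buffer")       # lazy matching chops the word up
short_variant = SHORT_OK.findall("is_ok")

print(short_greedy, lazy_split, short_variant)
```

Whether matching very short words is desirable depends on the dictionary in use, since short tokens may produce more false positives.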

@dwo
Contributor

dwo commented Apr 14, 2020

I was surprised that the default regex includes _ as part of words (via \w which is [a-zA-Z0-9_]):

word_regex_def = u"[\\w\\-'’`]+"

It's a surprising default since so much code uses snake_case to separate words in variable names.

It seems like using -r "[a-zA-Z0-9\-'’\`]+" (the default with an unrolled \w and dropping the underscore) would also achieve checking words separated by underscores?

I'll see if I can get a moment to write this up into the README.
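The difference between the two patterns can be sketched in a few lines of Python. This assumes the patterns are applied with `re.findall` to each line; the sample inputs are made up. One caveat worth checking before switching: for `str` patterns in Python 3, `\w` also matches non-ASCII letters unless `re.ASCII` is set, so the unrolled ASCII-only class splits words at accented characters.

```python
import re

# Default quoted above, written as a raw string.
DEFAULT = re.compile(r"[\w\-'’`]+")

# Proposed replacement: \w unrolled to ASCII letters and digits, no underscore.
UNROLLED = re.compile(r"[a-zA-Z0-9\-'’`]+")

line = "recieve_buffer = 1"      # hypothetical source line
default_words = DEFAULT.findall(line)
unrolled_words = UNROLLED.findall(line)
print(default_words)             # the whole identifier is one "word"
print(unrolled_words)            # each part is checked separately

accented = "na\u00efve"          # "naive" spelled with U+00EF
print(DEFAULT.findall(accented), UNROLLED.findall(accented))
```

With the default, a misspelled part of a snake_case name is hidden inside a longer token that is unlikely to appear in the dictionary, which is why it goes unreported.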

@larsoner
Member

It seems reasonable to me to change the default, actually, and to document how to get the old behavior back for people who end up needing it.

dwo added a commit to dwo/codespell that referenced this issue Apr 14, 2020
The `_` character was included via the `\w` character class. The default
regular expression now unrolls the `\w` and drops `_` from it.

This caused words in `snake_case` variables to be missed by codespell.

Resolves codespell-project#391
@dwo dwo linked a pull request Apr 14, 2020 that will close this issue
dwo added a commit to dwo/codespell that referenced this issue Apr 14, 2020
The `_` (underscore) character was included in the default regular
expression for words via the `\w` character class. The default now
unrolls `\w` and drops `_` from it.

This caused words in `snake_case` variables to be missed by codespell.

Resolves codespell-project#391
@Gabrielcarvfer

Gabrielcarvfer commented Feb 23, 2023

> See for example here: https://github.com/OpenLightingProject/ola/blob/master/.travis-ci.sh#L147
> codespell --check-filenames --quiet 2 --regex "[a-zA-Z0-9][\\-'a-zA-Z0-9]+[a-zA-Z0-9]"

Thank you for the example. Ended up extending it a bit to catch subwords for [c|C]amelCase and snake_case. 😄
(?<![a-z])[a-z'`]+|[A-Z][a-z'`]*|[a-z]+'[a-z]*|[a-z]+(?=[_-])|[a-z]+(?=[A-Z])|\d+

https://regex101.com/r/0LFI8a/2
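For anyone who wants to try the extended pattern locally, here is a small Python sketch that runs it through `re.findall`. The sample identifiers are made up, and it assumes codespell would apply the pattern the same way. One behavior to be aware of: runs of capitals are split letter by letter, because the `[A-Z][a-z'`]*` alternative matches a single uppercase letter followed by zero or more lowercase ones.

```python
import re

# The extended pattern from the comment above, unchanged.
SUBWORD_REGEX = re.compile(
    r"(?<![a-z])[a-z'`]+|[A-Z][a-z'`]*|[a-z]+'[a-z]*"
    r"|[a-z]+(?=[_-])|[a-z]+(?=[A-Z])|\d+"
)


def subwords(text):
    """Split identifiers into the pieces the pattern would hand to the checker."""
    return SUBWORD_REGEX.findall(text)


mixed = subwords("camelCase snake_case")      # hypothetical identifiers
combined = subwords("recieveBuffer_size")
acronym = subwords("HTTPServer")

print(mixed)
print(combined)
print(acronym)
```

The single-letter pieces produced from acronyms are probably harmless for spell checking, but they do mean the acronym itself is never looked up as a whole word.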
