-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: a simplified and generalized invalid-name config #7305
Comments
Thanks for the clear proposal @bukzor! I'm not sure though whether I agree with the changes you propose though. I understand that regular expressions can be hard to grasp for (new) developers, but adding to the complexity by adding 4 new subsettings for each type of "nameable" doesn't seem like it solves the issue of being "overly rigid and overly flexible". |
Thank you for the recap @bukzor. I think building on what exists right now and fixing the current issues is safer in this case. Redesigning something entirely is a very risky endeavor. We have insight on what is wrong with what we currently have, but not what will be wrong on something we design from scratch. Seeing how much pylint is used there will be something wrong. Every-time we add a new checker we can expect issues to pop up. This is the case for checker affecting a particular statement like I think the major pain point of So in my opinion what we need is fix #2018 the simplest way we can : For that we need to separate short name and a regex that does not match logically in the code. This is already a lot of work to get it right, as we need to not break any existing disable/enable, refactor a lot of code and functional tests and upgrade the regex for all the name style. I started to work on it for maybe 10 hours before my disk crashed and loosing the code and it was half done at best. Mostly copy pasting my comment from the previous discussion:
|
Thanks for consideration. You can close this if you like. I may reopen this discussion when/if other changes have already landed. |
Perhaps what I failed to explain is that these four "subsettings" of each named style already exist, in the current
I believe you underestimate this issue dramatically. I estimate more than 75% of self-described "experienced developers" would fail to write a bug free regex for one of these, given three attempts. While regular expressions are very compact, each token (very nearly each character) doubles the state space, which gets beyond human capability very quickly. Consider the number of responsibilities and edge cases handled by the default name regular expression: I've probably overstated this point dramatically, but I'm only trying to be clear :)
I believe the total complexity remains the same. It's only that complexity has been shifted away from the user and into the system (where it ideally belongs). Each of the four new subsettings is entirely simple (has a singular, obvious purpose, as opposed to complex).
By boiling down the naming schemes to their essential, atomic parameters and leaving all other considerations to the framework, the system is quite flexible (all current naming schemes and millions more can be represented) but not overly so: it's no longer possible to accidentally disallow
That's only an optional part of this proposal. It's not necessary to remove the -rgx configuration. I only meant to say that I believe it would serve no purpose and so could be removed. That said, the user-facing gains are relatively minor and the implementation cost is not. |
I think you've neglected to consider that the four proposed options have just four degrees of freedom, while the current regular expressions have (much more than) thousands of degrees of freedom. This means that in the proposed scheme it's much easier to consider (and test) the various possiblities, because there are so many orders of magnitude fewer.
I didn't add this because I thought it was incidental, and the proposal already got long, but this proposal fixes #2018 very neatly. Because allowed-letters and allowed-first-letter are parameters (e.g. allowed-letters=
I could work on that before the rest, then, if you like. |
I think your reference to degrees of freedom is very helpful for the discussion. What I'm not particularly keen on in your current proposal is that we remove the option with the unlimited degrees of freedom. I think that customisability is a crucial part of Do you think there would be a way to have |
Yeah, that and the degree of freedom (that we were dumping on the user so it's not "our" problem) convinced me. Also two small regexes still give a as much freedom, unless I'm mistaken. Nothing prevent someone from just dumping their arbitrarily complex regex in the allowed characters option, they'll just have to migrate the first letter handling in the first letter option and the length handling in the length options.
It would be great for sure.
As long as there are default values for PEP8 and most users do not use those options, I don't think it's an issue. Maybe at first we could start by:
Then later (next minor?):
Then in 3.0:
I think we need to maintain the old regex for some time but maintaining 36 regexes forever is not something I look forward to 😄 |
I just though that the regex allowed to have different length criteria for test functions (i.e. allow super verbose function name if it starts by I'm probably missing other use cases. Maybe an argument to keep the old regex and not deprecate. |
Could pylint adapt naming conventions of mainstream packages and add those as defaults to an exception list whenever these are imported? For example, consider the purpose of data science and machine learning, as in this issue. A typical data science project will use some of the following projects, In case a user wants scipy defaults, except for example "f", there should be a possibility for that. |
Current problem
The current configuration system for naming style in pylint can be seen as simultaneously overly rigid and overly flexible. There are two distinct ways to configure naming schemes. Primarily there is a named-scheme selection where users choose among lower_case UPPER_CASE PascalCase, etc., and there is a regex-based (
-rgx
) configuration where the user meant to write a regular expression that matches all (and only) permissible variable names.As a concrete example:
The named-style is convenient for users, but gives no clarity as to what exactly will match, and admits no ability to make small tweaks to its definition. The regex configuration is its near opposite, being completely unambiguous and easy to modify but painful-to-impossible to use for the average human.
Desired solution
This is my (tentative!) proposal: (keep in mind that all details will/would be modified to reflect maintainers' requirements)
Let's move this configuration out of "basic" to a new, separate "naming" namespace.
Or, alternatively: (I prefer the above)
Each naming style is be defined by just four characteristics: allowed characters, allowed first-letter character, minimum and maximum length. If we can expose those quadruplets to the configuration, then the configuration is easy-to-use (users can largely ignore these fine details), unambiguous, easy to modify, and there's no (obvious) need to expose users to the full footgun of regular expressions.
Note: The "character" configs are required to be a single character-class (i.e. a single-char regex beginning and ending with square brackets). This limitation allows excellent error messaging in the case of a invalid-name error, but also enables programmatic use.
This obviously enables the potential (but not requirement) to allow users to define custom named naming-schemes.
All naming schemes would then be defined as (approximately)
f'_{{0,2}}{allowed-first-character}{allowed-characters}{{{min-length},{max-length}}}'
I believe the regular-expression-based configuration could then be phased out entirely, but could of course be retained if wanted as a fifth, optional
regex
attribute of the naming scheme with defaultNone
.Additional context
This idea was split out of #3704, which considers revamping the invalid-name checker.
I (@bukzor) am volunteering to implement this change, if ratified.
The text was updated successfully, but these errors were encountered: