Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Regex should be more complete/selective #1

Closed
larsoner opened this issue Jul 9, 2020 · 2 comments · Fixed by #4
Closed

BUG: Regex should be more complete/selective #1

larsoner opened this issue Jul 9, 2020 · 2 comments · Fixed by #4

Comments

@larsoner
Copy link
Owner

larsoner commented Jul 9, 2020

It should be more selective (matching [ and maybe ensuring any i/j are immediately preceded by numbers, currently it's pretty basic:

ARRAY_LIKE_REGEX = re.compile(
    r'[\[(][0-9 ij,+\-*^\/&|\[\]]*[\]),;]'
)

@jnothman I know you work on sklearn, @agramfort mentioned this might be useful for you all, too. Any interest in improving the regex here, and maybe adopting this for sklearn?

Basically this plugin will replace E error variants:

    extraneous_whitespace,  # 201, 202
    whitespace_around_operator,  # 221, 222
    whitespace_around_comma,  # 241

With A2XX versions, where the ones above are ignored if they occur within an array-like list of (list of) numerical values. So you basically run an equivalent of flake8 --ignore E201,E202,E203,E221,E222,E241 --select=A and it should work. This came up originally for scipy (WIP PR to add this plugin here) but I imagine you all might have been annoyed about this at some point, too, so wanted to loop you in.

@jnothman
Copy link

I'd start with readability...

ARRAY_LIKE_REGEX = re.compile(r'''(?x)
    [\[(]    
    [0-9 ij,+\-*^\/&|\[\]]*
    [\]),;]
''')

I'm surprised . is not in the above character class, nor e for exponentiation or L for long ints (not sure this is needed anymore). I don't know the purpose of some of the others, such as ;.

Does this need to operate line-by-line, or can we match a multi-line expression? That way you could better require being in [ ].

While I do think that these formats can be aesthetically better in tests etc, we are likely moving to black at Scikit-learn, so neither my nor flake8's opinion on style matters any longer!

@larsoner
Copy link
Owner Author

I'm surprised . is not in the above character class, nor e for exponentiation or L for long ints (not sure this is needed anymore). I don't know the purpose of some of the others, such as ;.

Just didn't think of these...

Does this need to operate line-by-line, or can we match a multi-line expression? That way you could better require being in [ ].

It operates on a "logical line", so yes it's a single complete expression with no newlines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants