BUG: Regex should be more complete/selective #1

larsoner · 2020-07-09T17:08:04Z

It should be more selective (matching [ and maybe ensuring any i/j are immediately preceded by numbers). Currently it's pretty basic:

ARRAY_LIKE_REGEX = re.compile(
    r'[\[(][0-9 ij,+\-*^\/&|\[\]]*[\]),;]'
)
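To make the problem concrete, here is a minimal sketch that runs the pattern above against a few sample strings. The helper name `is_array_like` and the samples are illustrative assumptions, not part of the plugin.

```python
import re

# Pattern copied verbatim from the comment above.
ARRAY_LIKE_REGEX = re.compile(
    r'[\[(][0-9 ij,+\-*^\/&|\[\]]*[\]),;]'
)


def is_array_like(text):
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return ARRAY_LIKE_REGEX.search(text) is not None


samples = [
    "x = [1,  2, 30]",  # plain integers: matched
    "x = [1.5, 2.5]",   # floats: missed, '.' is not in the character class
    "f(i, j)",          # bare names i and j: matched, although not numbers
]
for sample in samples:
    print(repr(sample), is_array_like(sample))
```

The second sample shows the "more complete" half of the title (valid numeric arrays that are missed), and the third shows the "more selective" half (non-numeric code that is matched).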

@jnothman I know you work on sklearn, @agramfort mentioned this might be useful for you all, too. Any interest in improving the regex here, and maybe adopting this for sklearn?

Basically this plugin will replace E error variants:

    extraneous_whitespace,  # 201, 202
    whitespace_around_operator,  # 221, 222
    whitespace_around_comma,  # 241

with A2XX versions that are not raised when the offending whitespace occurs within an array-like list of (list of) numerical values. So you basically run the equivalent of flake8 --ignore E201,E202,E203,E221,E222,E241 --select=A and it should work. This came up originally for scipy (WIP PR to add this plugin here), but I imagine you all might have been annoyed by this at some point, too, so I wanted to loop you in.
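A rough sketch of the idea, assuming a whitespace check reports character offsets within a logical line: offsets that fall inside an array-like match are dropped and the rest are kept. The helpers `inside_array_like` and `filter_offsets` are hypothetical names for illustration and are not the plugin's actual implementation.

```python
import re

# Same pattern as quoted earlier in this issue.
ARRAY_LIKE_REGEX = re.compile(
    r'[\[(][0-9 ij,+\-*^\/&|\[\]]*[\]),;]'
)


def inside_array_like(logical_line, offset):
    """Hypothetical helper: is offset inside any array-like match?"""
    return any(
        match.start() <= offset < match.end()
        for match in ARRAY_LIKE_REGEX.finditer(logical_line)
    )


def filter_offsets(logical_line, offsets):
    """Keep only the offsets that fall outside array-like spans."""
    return [o for o in offsets if not inside_array_like(logical_line, o)]


line = "x = [1,  2] + f(a,  b)"
# Offsets 7 and 18 are the extra spaces after each comma.
print(filter_offsets(line, [7, 18]))
```

Here the extra space inside `[1,  2]` would be tolerated, while the one inside `f(a,  b)` would still be reported.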

jnothman · 2020-07-10T01:47:50Z

I'd start with readability...

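ARRAY_LIKE_REGEX = re.compile(r'''(?x)
    [\[(]                    # opening bracket or parenthesis
    [0-9 ij,+\-*^\/&|\[\]]*  # digits, spaces, i/j, separators, operators
    [\]),;]                  # closing bracket, parenthesis, comma, or semicolon
''')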

I'm surprised . is not in the above character class, nor e for exponentiation or L for long ints (not sure this is needed anymore). I don't know the purpose of some of the others, such as ;.
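As a sketch of what these suggestions could look like combined with the i/j idea from the first comment, the variant below adds `.` to the character class and accepts `i`, `j`, and `e` only when they sit next to numeric characters. The name `STRICTER_REGEX` and the exact lookarounds are assumptions for illustration, not the pattern the project ended up adopting; `L` is left out since it is unclear whether it is still needed.

```python
import re

# Hypothetical variant for illustration only.
STRICTER_REGEX = re.compile(r'''(?x)
    [\[(]                          # opening bracket or parenthesis
    (?:
        [0-9 .,+\-*^/&|\[\]]       # digits, decimal point, separators, operators
      | (?<=[0-9.])[ij]            # i or j only directly after a digit or point
      | (?<=[0-9.])e(?=[0-9+\-])   # exponent marker between numeric characters
    )*
    [\]),;]                        # closing character, as in the original
''')

for sample in ["x = [1.5, 2e-3, 3j]", "x = [1,  2]", "f(i, j)"]:
    print(repr(sample), STRICTER_REGEX.search(sample) is not None)
```

Floats, exponents, and imaginary literals are matched here, while a call such as `f(i, j)` no longer is.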

Does this need to operate line-by-line, or can we match a multi-line expression? That way you could better require being in [ ].

While I do think that these formats can be aesthetically better in tests etc, we are likely moving to black at Scikit-learn, so neither my nor flake8's opinion on style matters any longer!

larsoner · 2020-07-10T02:15:37Z

> I'm surprised . is not in the above character class, nor e for exponentiation or L for long ints (not sure this is needed anymore). I don't know the purpose of some of the others, such as ;.

Just didn't think of these...

> Does this need to operate line-by-line, or can we match a multi-line expression? That way you could better require being in [ ].

It operates on a "logical line", so yes, it sees a single complete expression with no newlines.

larsoner mentioned this issue Jul 9, 2020: MRG, MAINT: Allow spaces around array-like scipy/scipy#12516 (Closed)

larsoner mentioned this issue Jul 10, 2020: ENH: Better selectivity and binary ops #4 (Merged)

larsoner closed this as completed in #4 Jul 10, 2020
