
DOC: re discourage single char class #24123


tylerjereddy

  • this seemed simple enough not to justify a cognate issue,
    but it is a regex matter, so maybe it will need one...

  • there are a few cases in the re module documentation
    that suggest "indirectly" escaping single metacharacters
    by placing them in a character class; i.e., [|] as a viable
    alternative to \|

  • in general, this is likely best to avoid because you can
    incur the overhead of a character class with none of the
    benefits of a multi-character character class

  • this is specifically discussed in the more detailed
    reference cited by the re module docs; in particular,
    see Chapter 6 of:
    Friedl, Jeffrey. Mastering Regular Expressions. 3rd ed.,
    O’Reilly Media, 2009.

  • on the topic of using classes to escape metacharacters, which is
    slightly more justified than single (non-meta)character classes,
    the author notes "...it's probably because the author didn't know about escaping..."
    While that may be a little harsh,
    avoiding the overhead still makes sense to me.

  • the docs already cover the fact that most metacharacters
    are deactivated inside character classes, so that general information
    is still presented to the user clearly, but now without explicitly
    suggesting that the mechanism be used for single (meta)characters
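
For readers comparing the two spellings discussed above, here is a minimal sketch showing that `\|` and `[|]` match the same literal pipe character. The sample string and variable names are illustrative assumptions, not taken from the `re` docs.

```python
import re

# Illustrative input (an assumption for this sketch).
text = "a|b|c"

# Two ways to match a literal pipe character.
escaped = re.compile(r"\|")    # backslash escape
in_class = re.compile(r"[|]")  # single-character class

# Both patterns split the string at the same positions.
escaped_parts = escaped.split(text)
in_class_parts = in_class.split(text)
print(escaped_parts)
print(in_class_parts)

# re.escape() produces the backslash form for a metacharacter.
escaped_pipe = re.escape("|")
print(escaped_pipe)
```

The two forms behave identically here, so the choice between them in documentation is about readability rather than matching behavior.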

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@tylerjereddy

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@github-actions

github-actions bot commented Feb 7, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Feb 7, 2021
@serhiy-storchaka
Member

There is no overhead, because \| and [|] look identical to the RE engine:

>>> re.compile(r'\|[|]', re.DEBUG)
LITERAL 124
LITERAL 124

 0. INFO 10 0b11 2 2 (to 11)
      prefix_skip 2
      prefix [0x7c, 0x7c] ('||')
      overlap [0, 1]
11: LITERAL 0x7c ('|')
13. LITERAL 0x7c ('|')
15. SUCCESS
re.compile('\\|[|]', re.DEBUG)
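
The same comparison can be sketched programmatically by capturing what `re.DEBUG` prints for each pattern separately. The helper name `debug_dump` is hypothetical, and the exact dump format is a CPython implementation detail that may differ between versions, so this only compares dumps produced by the same interpreter.

```python
import contextlib
import io
import re


def debug_dump(pattern):
    """Return the text that re.DEBUG prints while compiling `pattern`.

    Hypothetical helper for this sketch; relies on re.DEBUG writing to stdout.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


escaped_dump = debug_dump(r"\|")   # backslash escape
class_dump = debug_dump(r"[|]")    # single-character class
multi_dump = debug_dump(r"[|&]")   # a genuine multi-character class, for contrast

# True when the engine sees both single-pipe patterns the same way.
print(escaped_dump == class_dump)
# The two-character class compiles to something different.
print(escaped_dump == multi_dump)
```

If the two single-pipe dumps compare equal, the single-character class has been reduced to a plain literal before matching, which is the point made above.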

Labels
awaiting review, docs, stale, topic-regex

5 participants