Unexpected behavior of re module when VERBOSE flag is set #76037

Closed
bkline mannequin opened this issue Oct 23, 2017 · 4 comments
Labels
stdlib (Python modules in the Lib dir), topic-regex, type-bug (An unexpected behavior, bug, or error)

Comments

bkline mannequin commented Oct 23, 2017

BPO 31856
Nosy @bkline, @ezio-melotti
Files
  • regex-repro.py: Repro case for issue

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-10-23.23:55:55.109>
    created_at = <Date 2017-10-23.23:25:05.230>
    labels = ['expert-regex', 'invalid', 'type-bug', 'library']
    title = 'Unexpected behavior of re module when VERBOSE flag is set'
    updated_at = <Date 2017-10-24.03:36:50.013>
    user = 'https://github.com/bkline'

    bugs.python.org fields:

    activity = <Date 2017-10-24.03:36:50.013>
    actor = 'bkline'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-10-23.23:55:55.109>
    closer = 'mrabarnett'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2017-10-23.23:25:05.230>
    creator = 'bkline'
    dependencies = []
    files = ['47232']
    hgrepos = []
    issue_num = 31856
    keywords = []
    message_count = 4.0
    messages = ['304849', '304852', '304853', '304856']
    nosy_count = 3.0
    nosy_names = ['bkline', 'ezio.melotti', 'mrabarnett']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue31856'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

    bkline mannequin commented Oct 23, 2017

    According to the documentation of the re module, "When this flag [re.VERBOSE] has been specified, whitespace within the RE string is ignored, except when the whitespace is in a character class or preceded by an unescaped backslash; this lets you organize and indent the RE more clearly. This flag also lets you put comments within a RE that will be ignored by the engine; comments are marked by a '#' that’s neither in a character class [n]or preceded by an unescaped backslash." (I'm quoting from the 3.6.3 documentation, but I've tested with several versions of Python, as indicated in the issue's Versions field, all with the same results.)

    Given this description, I would have expected the output for each of the pairs of calls to findall() in the attached repro code to be the same, but that is not what's happening. In the case of the first pair of calls, for example, the non-verbose version finds two more matches than the verbose version, even though the regular expression is identical for the two calls, ignoring whitespace and comments in the expression string. Similar problems appear with the other two pairs of calls.

    Here's the output from the attached code:

    ['&', '(', '/Term/SemanticType/@cdr:ref', '==']
    ['/Term/SemanticType/@cdr:ref', '==']
    [' XXX ']
    []
    [' XXX ']
    []

    It would seem that at least one of the following is true:

    1. the module is not behaving as it should
    2. the documentation is wrong
    3. I have not understood the documentation correctly

    I'm happy for it to be #3, as long as someone can explain what I have not understood.
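For reference, the documented rules quoted above can be checked with a few one-line patterns. This is a minimal sketch using made-up patterns and text (the contents of regex-repro.py are not shown in this thread), so it illustrates the rules rather than reproducing the reported output.

```python
import re

# Hypothetical sample text, chosen only to exercise each rule.
text = "a b ab a#b"

# Unescaped whitespace outside a character class is dropped: this is "ab".
dropped = re.findall(r"a b", text, re.VERBOSE)
print(dropped)      # ['ab']

# A backslash-escaped space is kept as a literal space.
escaped = re.findall(r"a\ b", text, re.VERBOSE)
print(escaped)      # ['a b']

# A space inside a character class is kept as well.
in_class = re.findall(r"a[ ]b", text, re.VERBOSE)
print(in_class)     # ['a b']

# '#' starts a comment unless it is escaped or inside a character class.
hash_class = re.findall(r"a[#]b  # trailing comment", text, re.VERBOSE)
print(hash_class)   # ['a#b']
```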

    @bkline bkline mannequin added the stdlib, topic-regex, and type-bug labels Oct 23, 2017
    mrabarnett mannequin commented Oct 23, 2017

    Your verbose examples put the pattern into raw triple-quoted strings, which is OK, but their first character is a backslash, which makes the next character (a newline) an escaped literal whitespace character. Escaped whitespace is significant in a verbose pattern.
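The effect described here can be seen in a small sketch. The pattern and text below are hypothetical stand-ins, not the ones from regex-repro.py; the only thing carried over from the report is the shape of the verbose pattern, a raw triple-quoted string whose first character is a backslash.

```python
import re

# Hypothetical compact pattern: one or more digits.
plain = r"\d+"

# In a raw string, the backslash right after the opening quotes is NOT a
# line continuation. The string value begins with a backslash followed by
# a newline, which the regex parser reads as an escaped (literal) newline.
verbose = r"""\
    \d+   # one or more digits
"""

print(repr(verbose[:2]))  # '\\\n'

text = "a 42 b\n7"
plain_hits = re.findall(plain, text)
verbose_hits = re.findall(verbose, text, re.VERBOSE)

print(plain_hits)    # ['42', '7']
print(verbose_hits)  # ['\n7']
```

The verbose pattern effectively becomes `\n\d+`, so it only matches digits that directly follow a newline, which is why the verbose call returns fewer matches.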

    @mrabarnett mrabarnett mannequin closed this as completed Oct 23, 2017
    @mrabarnett mrabarnett mannequin added the invalid label Oct 23, 2017
    bkline mannequin commented Oct 24, 2017

    I had been under the impression that "escaped" in this context meant that an escape character (the backslash) was part of the string value for the regular expression (there's a little bit of overloading going on with that word). Thanks for setting me straight.

    bkline mannequin commented Oct 24, 2017

    The light finally comes on. I actually *was* putting a backslash into the string value, with the raw flag (which is, of course, what you were trying to tell me). Thanks for your patience. :-)
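The difference the raw prefix makes can be sketched as follows. The pattern is a hypothetical one (`[0-9]+`, chosen so that it contains no backslashes of its own), and the last variant is one possible way to write the verbose pattern so that it matches the same text as the compact form; it is not taken from the attached file.

```python
import re

# Without the r prefix, backslash-newline is a line continuation handled by
# Python itself, so no backslash reaches the regex engine.
non_raw = """\
    [0-9]+   # digits
"""

# With the r prefix, the backslash and the newline both stay in the string.
raw = r"""\
    [0-9]+   # digits
"""

# One possible fix: drop the leading backslash. The newline after the
# opening quotes is plain whitespace, which verbose mode ignores.
fixed = r"""
    [0-9]+   # digits
"""

print(non_raw.startswith("\\\n"))  # False
print(raw.startswith("\\\n"))      # True

text = "a 42 b\n7"
non_raw_hits = re.findall(non_raw, text, re.VERBOSE)
raw_hits = re.findall(raw, text, re.VERBOSE)
fixed_hits = re.findall(fixed, text, re.VERBOSE)

print(non_raw_hits)  # ['42', '7']
print(raw_hits)      # ['\n7']
print(fixed_hits)    # ['42', '7']
```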

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022