Unexpected behavior of re module when VERBOSE flag is set #76037
According to the documentation of the re module, "When this flag [re.VERBOSE] has been specified, whitespace within the RE string is ignored, except when the whitespace is in a character class or preceded by an unescaped backslash; this lets you organize and indent the RE more clearly. This flag also lets you put comments within a RE that will be ignored by the engine; comments are marked by a '#' that’s neither in a character class [n]or preceded by an unescaped backslash." (I'm quoting from the 3.6.3 documentation, but I've tested with several versions of Python, as indicated in the issue's fields.)

Given this description, I would have expected the output for each of the pairs of calls to findall() in the attached repro code to be the same, but that is not what's happening. In the case of the first pair of calls, for example, the non-verbose version finds two more matches than the verbose version, even though the regular expression is identical for the two calls, ignoring whitespace and comments in the expression string. Similar problems appear with the other two pairs of calls.

Here's the output from the attached code:

['&', '(', '/Term/SemanticType/@cdr:ref', '==']

It would seem that at least one of the following is true:
I'm happy for it to be #3, as long as someone can explain what I have not understood.
Your verbose examples put the pattern into raw triple-quoted strings, which is OK, but their first character is a backslash, which makes the next character (a newline) an escaped literal whitespace character. Escaped whitespace is significant in a verbose pattern.
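A minimal sketch of the effect described above. The attached repro code is not reproduced here, so the pattern (`\d+`) and the sample text are assumptions chosen only for illustration; the point is what a leading backslash-newline does inside a raw triple-quoted verbose pattern.

```python
import re

# Non-verbose reference pattern: one or more digits.
plain = re.compile(r"\d+")

# Raw triple-quoted pattern that starts with a backslash followed by a newline.
# In a raw string the backslash is kept, so the pattern begins with an
# escaped newline, which VERBOSE mode treats as a literal newline to match.
with_backslash = re.compile(r"""\
    \d+   # digits
""", re.VERBOSE)

# Same pattern without the leading backslash: the newline is now ordinary
# whitespace and is ignored in VERBOSE mode.
without_backslash = re.compile(r"""
    \d+   # digits
""", re.VERBOSE)

text = "12 34\n56"  # hypothetical sample input

print(plain.findall(text))              # ['12', '34', '56']
print(with_backslash.findall(text))     # ['\n56']
print(without_backslash.findall(text))  # ['12', '34', '56']
```

With the leading backslash, only digits that directly follow a newline can match, which is how a verbose pattern can end up finding fewer matches than its one-line equivalent.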
I had been under the impression that "escaped" in this context meant that an escape character (the backslash) was part of the string value for the regular expression (there's a little bit of overloading going on with that word). Thanks for setting me straight.
The light finally comes on. I actually *was* putting a backslash into the string value, with the raw flag (which is, of course, what you were trying to tell me). Thanks for your patience. :-)
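For readers who hit the same confusion, a short sketch of the difference the raw prefix makes. The string contents (`abc`) are an arbitrary placeholder, not taken from the original repro code.

```python
# With the r prefix, a backslash at the end of the first line stays in the
# string, followed by the newline itself.
raw = r"""\
abc"""

# Without the r prefix, backslash-newline is a line continuation and both
# characters are dropped from the string value.
cooked = """\
abc"""

print(repr(raw))     # '\\\nabc'
print(repr(cooked))  # 'abc'
```

So in the raw case the regular expression engine really does receive a backslash followed by a newline, which it reads as an escaped (and therefore significant) whitespace character.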