incorrect pattern in the re module docs for conditional regex #55492
Comments
In the re docs, it states the following for the conditional regular expression syntax:

(?(id/name)yes-pattern|no-pattern)

this regex is incomplete as it allows for 'user@host.com>':

>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<user@host.com>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', 'user@host.com'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<user@host.com'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', 'user@host.com>'))
True

This error has existed since this feature was added in 2.4... ... through the 3.3. docs...

The fix is to add the end char '$' to the regex to get all 4 working:

>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<user@host.com>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', 'user@host.com'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<user@host.com'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', 'user@host.com>'))
False

If accepted, I propose this patch (also attached):

$ svn diff re.rst
Index: re.rst
===================================================================
--- re.rst (revision 88499)
+++ re.rst (working copy)
@@ -297,9 +297,9 @@
``(?(id/name)yes-pattern|no-pattern)``
Will try to match with ``yes-pattern`` if the group with given *id* or *name*
exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is optional and
- can be omitted. For example, ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)`` is a poor email
+ can be omitted. For example, ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)`` is a poor email
matching pattern, which will match with ``'<user@host.com>'`` as well as
- ``'user@host.com'``, but not with ``'<user@host.com'``.
+ ``'user@host.com'``, but not with ``'<user@host.com'`` nor ``'user@host.com>'``.
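
For a side-by-side check, the documented pattern and the patched one can be run over the same four inputs. This is only a small sketch: the names `OLD`, `NEW`, `SAMPLES`, and `compare` are made up for illustration and are not part of the patch.

```python
import re

# Pattern as currently documented, and the pattern proposed in the patch.
OLD = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'
NEW = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'

# The four inputs discussed in this issue.
SAMPLES = ['<user@host.com>', 'user@host.com', '<user@host.com', 'user@host.com>']


def compare(samples=SAMPLES):
    """Return {sample: (old_matches, new_matches)} using re.match."""
    return {s: (bool(re.match(OLD, s)), bool(re.match(NEW, s))) for s in samples}


for sample, (old, new) in compare().items():
    print(f'{sample!r:20} old={old!s:5} new={new}')
```

The two patterns should only disagree on `'user@host.com>'`, which is the case the patch is meant to reject.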
i wanted to add one additional comment that it would be nice to have a the fix of using a '$' prevents this from happening, so i'm not 100%
On Tue, Feb 22, 2011 at 08:48:20AM +0000, wesley chun wrote:
Better would be a regex for white-space '\s' which would achieve the
Thinking about the regex pattern again: the example given is not really wrong. It does what it claims, that is, it matches '<user@example.com>' and 'user@example.com' and rejects strings like '<user@example.com'. Nothing is said about strings like 'user@example.com>'. Also, this is not an example of validating an email address or of finding an email address pattern in text data. A good regex for those purposes would be more complex[1][2].

Having said that, if an example of a conditional regex is to be given, either the current one is sufficient (in which case no change is required) or a simpler one can be presented, one that does not look like it matches an email address and so carries no expectations about valid patterns.

Also, if we 'really' think that rejecting 'user@example>' is a good idea in the example documentation, then having '$' in the no-pattern of the regex is good enough. There is no need to consider the regex search cases for the explanation given above.

1: http://www.regular-expressions.info/email.html
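
To make the point about what the example does and does not claim more concrete, here is a hedged sketch of where each pattern stops matching. The variable names are illustrative only, and `re.fullmatch` is an assumption about the reader's environment: it exists only in Python 3.4 and later, so it was not an option for the branches discussed in this issue.

```python
import re

OLD = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'
NEW = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'

# re.match anchors only at the start of the string, so the documented
# pattern succeeds on a prefix and simply leaves the '>' unconsumed.
prefix = re.match(OLD, 'user@host.com>')
print(prefix.group(0))

# The '$' sits only in the no-pattern branch. When group 1 did match,
# the yes-pattern '>' is all that is required, so trailing text is
# still accepted by re.match.
trailing = re.match(NEW, '<user@host.com>trailing')
print(trailing.group(0))

# Requiring the whole string to match (Python 3.4+) rejects the stray
# '>' even with the unpatched pattern.
whole = re.fullmatch(OLD, 'user@host.com>')
print(whole)
```

So the `'$'` in the no-pattern handles the unbracketed case only; whether the bracketed case should also be anchored is a separate choice that the example does not address.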
New changeset 06cca90ff105 by orsenthil in branch 'default':
New changeset d676601fee6f by Senthil Kumaran in branch '3.1':
Okay, fixed in all relevant branches.