Misleading documentations and comments in regular expression HOWTO #62979
Comments
According to http://oald8.oxfordlearnersdictionaries.com/dictionary/alphanumeric, alphanumeric is defined as [A-Za-z0-9]. The underscore (_) is not one of them. One of the Python documents (Doc/tutorial/stdlib2.rst) differentiates them very clearly: "The format uses placeholder names formed by ..." Yet in the documentation as well as the comments for regex, we implicitly assume the underscore belongs to the alphanumerics. Explicit is better than implicit! Attached is a patch to differentiate alphanumeric characters and the underscore in the regex documentation and comments. This is important in case someone is confused by this code:
>>> import re
>>> re.split(r'\W', 'haha$hihi*huhu_hehe hoho')
['haha', 'hihi', 'huhu_hehe', 'hoho']
On the side note:
I was wondering which doc you were alluding to, before I noticed your patch is against the regex HOWTO.
In Lib/re.py, starting from line 77 (Python 3.4):
The prelude is "Matches any alphanumeric character;". Yet in every case (bytes, string patterns with the ASCII flag, string patterns without the ASCII flag, strings with locale), the underscore is always included. Then why don't we change the prelude to "Matches any alphanumeric character and underscore character;"? The description explains that, depending on whether the pattern is Unicode or not, alphanumeric can mean [A-Za-z0-9] or something wider than that. The description is already okay, but the prelude misleads readers.
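For anyone who wants to check the claim, here is a minimal sketch (not part of the attached patch). It assumes a Python 3 interpreter. The names `strict_alnum`, `words`, and `parts` are only illustrative, and the locale case is left out because it depends on the environment.

```python
import re

# \w accepts the underscore for str patterns, for str patterns with
# re.ASCII, and for bytes patterns alike.
assert re.fullmatch(r'\w', '_') is not None
assert re.fullmatch(r'\w', '_', re.ASCII) is not None
assert re.fullmatch(rb'\w', b'_') is not None

# [^\W_] reads as "a word character that is not the underscore",
# which is alphanumeric in the strict dictionary sense.
strict_alnum = re.compile(r'[^\W_]+')
words = strict_alnum.findall('haha$hihi*huhu_hehe hoho')

# Adding the underscore to the separator class makes split() break on it too.
parts = re.split(r'[\W_]', 'haha$hihi*huhu_hehe hoho')

print(words)
print(parts)
```

Both `words` and `parts` separate `huhu` from `hehe`, which a plain `\W` split does not do.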
The answer to the question about "alphanumerics" versus "alphanumeric characters" is that it is most likely context-dependent, so I'd have to see particular examples to say which I thought read better. So, there is no One True Answer for this question, I think.
Unfortunately making the sentences pedantically correct also makes them ungainly, and I think people generally assume that underscores are treated as a letter.