Improve the usability of the match object named group API #68642
Comments
The usability, learnability, and readability of match object code would be improved by giving it a more Pythonic API (inspired by ElementTree). Given a search like:

    data = 'Answer found on row 8 column 12.'
    mo = re.search(r'row (?P<row>\d+) column (?P<col>\d+)', data)

We currently access results with:

    print(mo.group('col'))
    print(len(mo.groups()))

A better way would look like this:

    print(mo['col'])
    print(len(mo))

This would work nicely with string formatting:

    print('Located coordinate at (%(row)s, %(col)s)' % mo) |
You can use mo.groupdict():

    print('Located coordinate at (%(row)s, %(col)s)' % mo.groupdict())

As for len(mo), it is ambiguous, as is indexing with integer indices. You suggest that len(mo) equal len(mo.groups()), and this looks reasonable to me, but in the regex module len(mo) equals len(mo.groups())+1 (because mo[1] equals mo.group(1) or mo.groups()[0]). If indexing worked only with named groups, one would expect len(mo) to equal len(mo.groupdict()). |
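The ambiguity described above can be made concrete with a short sketch. It uses only the standard `re` module and the pattern from the opening comment; the variable names (`count_groups`, `count_with_whole`, `count_named`) are illustrative and do not come from the thread.

```python
import re

data = 'Answer found on row 8 column 12.'
mo = re.search(r'row (?P<row>\d+) column (?P<col>\d+)', data)

# groupdict() returns a plain dict, so %-formatting by name works with it.
message = 'Located coordinate at (%(row)s, %(col)s)' % mo.groupdict()

# Three plausible meanings for a hypothetical len(mo):
count_groups = len(mo.groups())          # capturing groups only
count_with_whole = len(mo.groups()) + 1  # also counting group 0, the whole match
count_named = len(mo.groupdict())        # named groups only

# Group 0 is the whole match, so integer index 1 is the first capturing group.
whole = mo.group(0)
first = mo.group(1)

print(message)
print(count_groups, count_with_whole, count_named)
```

In this pattern every group is named, so the first and third counts happen to agree; they would differ as soon as an unnamed group were added.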
This has already been discussed in another issue. As Serhiy mentioned, len(mo) and mo[num] would be ambiguous because of the group 0, but mo[name] might be ok. |
I'd definitely be for mo['col']. I can't say I've ever used len(mo.groups()). I do have lots of code like: Using groupdict there is doable but not great. But: |
I agree that it would be nice if len(mo) == len(mo.groups()), but Serhiy has explained why that's not the case in the regex module. The regex module does support mo[name], so:

    print('Located coordinate at (%(row)s, %(col)s)' % mo)

already works. |
The disadvantage of supporting len() is its ambiguity. Supporting indexing by group name also has disadvantages (the benefits were already mentioned above).
This feature would improve access to named groups (6 fewer characters to type in every case, and better readability), but maybe implementing access via attributes would be even better? mo.groupnamespace().col or mo.ns.col? |
The whole point is to eliminate the unnecessary extra level: mo['name'], mo['rank'], mo['serialnumber']. There are several problems with trying to turn this into attribute access. One of the usual ones is the conflict between the user's field names and the actual methods and attributes of the object (that is why named tuples have the irritating leading underscore on their own attributes and methods). The other problem is that it interferes with usability when the field name is stored in a variable. Contrast "fieldname='rank'; print(mo[fieldname])" with "fieldname='rank'; print(getattr(mo, fieldname))". I'm happy to abandon the "len(mo)" suggestion, but mo[groupname] would be really nice. |
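A small sketch of the two objections to attribute access. The pattern and the group name `end` are made up for illustration; `end` was picked because match objects already have an `end()` method. Subscripting the match object assumes Python 3.6 or later, where this proposal was eventually merged.

```python
import re

# Hypothetical pattern; 'end' deliberately collides with the Match.end() method.
mo = re.search(r'(?P<rank>\w+) (?P<end>\d+)', 'captain 42')

# Field name held in a variable: subscripting reads naturally.
fieldname = 'rank'
by_subscript = mo[fieldname]      # Python 3.6+
by_group = mo.group(fieldname)    # works on older versions too

# Attribute lookup for 'end' finds the existing method, not the captured text.
end_attr = getattr(mo, 'end')
end_is_method = callable(end_attr)

# Subscripting is unambiguous: it always refers to the group.
end_value = mo['end']

print(by_subscript, end_is_method, end_value)
```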
I've been playing around with this. My implementation is basically the naive:

    def __getitem__(self, value):
        return self.group(value)

I have the following tests passing:

    def test_match_getitem(self):
        pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')

        m = pat.match('a')
        self.assertEqual(m['a1'], 'a')
        self.assertEqual(m['b2'], None)
        self.assertEqual(m['c3'], None)
        self.assertEqual(m[0], 'a')
        self.assertEqual(m[1], 'a')
        self.assertEqual(m[2], None)
        self.assertEqual(m[3], None)
        with self.assertRaises(IndexError):
            m['X']

        m = pat.match('ac')
        self.assertEqual(m['a1'], 'a')
        self.assertEqual(m['b2'], None)
        self.assertEqual(m['c3'], 'c')
        self.assertEqual(m[0], 'ac')
        self.assertEqual(m[1], 'a')
        self.assertEqual(m[2], None)
        self.assertEqual(m[3], 'c')
        with self.assertRaises(IndexError):
            m['X']

        # Cannot assign.
        with self.assertRaises(TypeError):
            m[0] = 1

        # No len().
        self.assertRaises(TypeError, len, m)

But because I'm just calling group(), you'll notice a few oddities. Namely:
I can't decide if these are good (because they're consistent with group()), or bad (because they're surprising). I'm interested in your opinions. I'll attach the patch when I'm at a better computer. |
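The naive delegation described above can be tried without patching the C module. This is only a pure-Python sketch: `MatchView` is a hypothetical wrapper class invented here, not part of the patch or of the `re` module. It shows the behaviours that follow directly from calling `group()`.

```python
import re


class MatchView:
    """Hypothetical wrapper that forwards subscripting to Match.group()."""

    def __init__(self, match):
        self._match = match

    def __getitem__(self, key):
        # Same one-line delegation as the naive implementation in the comment.
        return self._match.group(key)


pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
view = MatchView(pat.match('ac'))

# Names and integer indices both work, and unmatched groups come back as None.
results = (view['a1'], view['b2'], view['c3'], view[0], view[3])

# An unknown group name raises IndexError, exactly as group() does.
try:
    view['X']
except IndexError:
    unknown_name_raises = True
else:
    unknown_name_raises = False

# No __len__ is defined, so len() fails with TypeError.
try:
    len(view)
except TypeError:
    len_raises = True
else:
    len_raises = False

print(results, unknown_name_raises, len_raises)
```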
Here's the patch. I added some more tests, including tests for ''.format_map(). I think the format_map() tests convince me that keeping None for non-matched groups makes sense. |
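A sketch of the format_map() behaviour mentioned above, reusing the pattern from the tests. Passing the match object directly to `format_map()` assumes Python 3.6 or later; the `groupdict()` variant is shown alongside for comparison. The template string is an assumption chosen for illustration.

```python
import re

pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
m = pat.match('a')

template = 'a1={a1} b2={b2} c3={c3}'

# Going through groupdict() first.
text_from_dict = template.format_map(m.groupdict())

# Python 3.6+: the match object supports subscripting, so it can be passed directly.
text_from_match = template.format_map(m)

# Non-matched groups are rendered as the string 'None' in both cases.
print(text_from_dict)
print(text_from_match)
```

Because non-matched groups yield None rather than raising, a template that names every group still formats cleanly when only some of them took part in the match.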
Updated patch, with docs. I'd like to get this in to 3.6. Can anyone take a look? |
New changeset ac0643314d12 by Eric V. Smith in branch 'default': |
Please document this feature in What's New, Eric. There is no need to add the __getitem__ method explicitly to the list of methods in match_methods. It would be added automatically. |
I added a note in Misc/NEWS. Is that not what you mean? I'll look at match_methods. |
I meant Doc/whatsnew/3.6.rst. |
New changeset 3265247e08f0 by Eric V. Smith in branch 'default': |
Fixed. Thanks, Serhiy. |
./Modules/_sre.c:2425:14: warning: ‘match_getitem_doc’ defined but not used [- |
On 9/11/2016 10:05 AM, Serhiy Storchaka wrote:
Not in Visual Studio! Standby. |
New changeset 9eb38e0f1cad by Eric V. Smith in branch 'default': |