
bpo-28638: Optimize namedtuple() creation time by minimizing use of exec() (#3454)

* Working draft without _source

* Re-use itemgetter() instances

* Speed-up calls to __new__() with a pre-bound tuple.__new__()

* Add note regarding string interning

* Remove unnecessary create function wrappers

* Minor sync-ups with PR-2736.  Mostly formatting and f-strings

* Bring-in qualname/__module fix-ups from PR-2736

* Formally remove the verbose flag and _source attribute

* Restore a test of potentially problematic field names

* Restore kwonly_args test but without the verbose option

* Adopt Inada's idea to reuse the docstrings for the itemgetters

* Neaten-up a bit

* Add news blurb

* Serhiy pointed-out the need for interning

* Jelle noticed a missing f on an f-string

* Add whatsnew entry for feature removal

* Accede to request for dict literals instead of keyword arguments

* Leave the method.__module__ attribute pointing to the actual location of the code

* Improve variable names and add a micro-optimization for a non-public helper function

* Simplify by in-lining reuse_itemgetter()

* Arrange steps in more logical order

* Save docstring in local cache instead of interning
rhettinger committed Sep 10, 2017
1 parent 3cedf46 commit 8b57d7363916869357848e666d03fa7614c47897
@@ -763,7 +763,7 @@ Named tuples assign meaning to each position in a tuple and allow for more reada
 self-documenting code. They can be used wherever regular tuples are used, and
 they add the ability to access fields by name instead of position index.

-.. function:: namedtuple(typename, field_names, *, verbose=False, rename=False, module=None)
+.. function:: namedtuple(typename, field_names, *, rename=False, module=None)

     Returns a new tuple subclass named *typename*. The new subclass is used to
     create tuple-like objects that have fields accessible by attribute lookup as
@@ -786,10 +786,6 @@ they add the ability to access fields by name instead of position index.
     converted to ``['abc', '_1', 'ghi', '_3']``, eliminating the keyword
     ``def`` and the duplicate fieldname ``abc``.

-    If *verbose* is true, the class definition is printed after it is
-    built. This option is outdated; instead, it is simpler to print the
-    :attr:`_source` attribute.
-
     If *module* is defined, the ``__module__`` attribute of the named tuple is
     set to that value.

@@ -806,6 +802,9 @@ they add the ability to access fields by name instead of position index.
     .. versionchanged:: 3.6
        Added the *module* parameter.

+    .. versionchanged:: 3.7
+       Remove the *verbose* parameter and the :attr:`_source` attribute.
+
 .. doctest::
     :options: +NORMALIZE_WHITESPACE

@@ -878,15 +877,6 @@ field names, the method and attribute names start with an underscore.
         >>> for partnum, record in inventory.items():
         ...     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now())

-.. attribute:: somenamedtuple._source
-
-    A string with the pure Python source code used to create the named
-    tuple class. The source makes the named tuple self-documenting.
-    It can be printed, executed using :func:`exec`, or saved to a file
-    and imported.
-
-    .. versionadded:: 3.3
-
 .. attribute:: somenamedtuple._fields

     Tuple of strings listing the field names. Useful for introspection
@@ -435,6 +435,12 @@ API and Feature Removals
   Python 3.1, and has now been removed. Use the :func:`~os.path.splitdrive`
   function instead.

+* :func:`collections.namedtuple` no longer supports the *verbose* parameter
+  or ``_source`` attribute which showed the generated source code for the
+  named tuple class. This was part of an optimization designed to speed-up
+  class creation. (Contributed by Jelle Zijlstra with further improvements
+  by INADA Naoki, Serhiy Storchaka, and Raymond Hettinger in :issue:`28638`.)
+
 * Functions :func:`bool`, :func:`float`, :func:`list` and :func:`tuple` no
   longer take keyword arguments. The first argument of :func:`int` can now
   be passed only as positional argument.
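For callers that still pass *verbose* or read `_source`, one possible way to cope with the removal described above is a thin wrapper. This is only a sketch: `make_namedtuple` is a hypothetical helper, not part of `collections`, and the fallback output it prints is an assumption about what a caller might want to see.

```python
import sys
from collections import namedtuple

def make_namedtuple(typename, field_names, *, verbose=False, **kwargs):
    """Hypothetical wrapper: accept the old *verbose* flag without passing it on."""
    cls = namedtuple(typename, field_names, **kwargs)
    if verbose:
        # _source exists only on interpreters that predate this change.
        source = getattr(cls, '_source', None)
        if source is None:
            # Fall back to a short summary built from the public _fields tuple.
            source = f'{typename}{cls._fields}'
        print(source)
    return cls

Point = make_namedtuple('Point', 'x y', verbose=True)
pt = Point(1, 2)
```

The wrapper relies only on `_fields`, which remains documented, so it should behave the same before and after the change apart from what gets printed.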
@@ -301,59 +301,9 @@ def __eq__(self, other):
 ### namedtuple
 ################################################################################

-_class_template = """\
-from builtins import property as _property, tuple as _tuple
-from operator import itemgetter as _itemgetter
-from collections import OrderedDict
+_nt_itemgetters = {}

-class {typename}(tuple):
-    '{typename}({arg_list})'
-
-    __slots__ = ()
-
-    _fields = {field_names!r}
-
-    def __new__(_cls, {arg_list}):
-        'Create new instance of {typename}({arg_list})'
-        return _tuple.__new__(_cls, ({arg_list}))
-
-    @classmethod
-    def _make(cls, iterable, new=tuple.__new__, len=len):
-        'Make a new {typename} object from a sequence or iterable'
-        result = new(cls, iterable)
-        if len(result) != {num_fields:d}:
-            raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
-        return result
-
-    def _replace(_self, **kwds):
-        'Return a new {typename} object replacing specified fields with new values'
-        result = _self._make(map(kwds.pop, {field_names!r}, _self))
-        if kwds:
-            raise ValueError('Got unexpected field names: %r' % list(kwds))
-        return result
-
-    def __repr__(self):
-        'Return a nicely formatted representation string'
-        return self.__class__.__name__ + '({repr_fmt})' % self
-
-    def _asdict(self):
-        'Return a new OrderedDict which maps field names to their values.'
-        return OrderedDict(zip(self._fields, self))
-
-    def __getnewargs__(self):
-        'Return self as a plain tuple. Used by copy and pickle.'
-        return tuple(self)
-
-{field_defs}
-"""
-
-_repr_template = '{name}=%r'
-
-_field_template = '''\
-    {name} = _property(_itemgetter({index:d}), doc='Alias for field number {index:d}')
-'''
-
-def namedtuple(typename, field_names, *, verbose=False, rename=False, module=None):
+def namedtuple(typename, field_names, *, rename=False, module=None):
     """Returns a new subclass of tuple with named fields.

     >>> Point = namedtuple('Point', ['x', 'y'])
@@ -390,46 +340,104 @@ def namedtuple(typename, field_names, *, verbose=False, rename=False, module=Non
                 or _iskeyword(name)
                 or name.startswith('_')
                 or name in seen):
-                field_names[index] = '_%d' % index
+                field_names[index] = f'_{index}'
             seen.add(name)
     for name in [typename] + field_names:
         if type(name) is not str:
             raise TypeError('Type names and field names must be strings')
         if not name.isidentifier():
             raise ValueError('Type names and field names must be valid '
-                             'identifiers: %r' % name)
+                             f'identifiers: {name!r}')
         if _iskeyword(name):
             raise ValueError('Type names and field names cannot be a '
-                             'keyword: %r' % name)
+                             f'keyword: {name!r}')
     seen = set()
     for name in field_names:
         if name.startswith('_') and not rename:
             raise ValueError('Field names cannot start with an underscore: '
-                             '%r' % name)
+                             f'{name!r}')
         if name in seen:
-            raise ValueError('Encountered duplicate field name: %r' % name)
+            raise ValueError(f'Encountered duplicate field name: {name!r}')
         seen.add(name)

-    # Fill-in the class template
-    class_definition = _class_template.format(
-        typename = typename,
-        field_names = tuple(field_names),
-        num_fields = len(field_names),
-        arg_list = repr(tuple(field_names)).replace("'", "")[1:-1],
-        repr_fmt = ', '.join(_repr_template.format(name=name)
-                             for name in field_names),
-        field_defs = '\n'.join(_field_template.format(index=index, name=name)
-                               for index, name in enumerate(field_names))
-    )
-
-    # Execute the template string in a temporary namespace and support
-    # tracing utilities by setting a value for frame.f_globals['__name__']
-    namespace = dict(__name__='namedtuple_%s' % typename)
-    exec(class_definition, namespace)
-    result = namespace[typename]
-    result._source = class_definition
-    if verbose:
-        print(result._source)
+    # Variables used in the methods and docstrings
+    field_names = tuple(map(_sys.intern, field_names))
+    num_fields = len(field_names)
+    arg_list = repr(field_names).replace("'", "")[1:-1]
+    repr_fmt = '(' + ', '.join(f'{name}=%r' for name in field_names) + ')'
+    tuple_new = tuple.__new__
+    _len = len
+
+    # Create all the named tuple methods to be added to the class namespace
+
+    s = f'def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list}))'
+    namespace = {'_tuple_new': tuple_new, '__name__': f'namedtuple_{typename}'}
+    # Note: exec() has the side-effect of interning the typename and field names
+    exec(s, namespace)
+    __new__ = namespace['__new__']
+    __new__.__doc__ = f'Create new instance of {typename}({arg_list})'
+
+    @classmethod
+    def _make(cls, iterable):
+        result = tuple_new(cls, iterable)
+        if _len(result) != num_fields:
+            raise TypeError(f'Expected {num_fields} arguments, got {len(result)}')
+        return result
+
+    _make.__func__.__doc__ = (f'Make a new {typename} object from a sequence '
+                              'or iterable')
+
+    def _replace(_self, **kwds):
+        result = _self._make(map(kwds.pop, field_names, _self))
+        if kwds:
+            raise ValueError(f'Got unexpected field names: {list(kwds)!r}')
+        return result
+
+    _replace.__doc__ = (f'Return a new {typename} object replacing specified '
+                        'fields with new values')
+
+    def __repr__(self):
+        'Return a nicely formatted representation string'
+        return self.__class__.__name__ + repr_fmt % self
+
+    def _asdict(self):
+        'Return a new OrderedDict which maps field names to their values.'
+        return OrderedDict(zip(self._fields, self))
+
+    def __getnewargs__(self):
+        'Return self as a plain tuple. Used by copy and pickle.'
+        return tuple(self)
+
+    # Modify function metadata to help with introspection and debugging
+
+    for method in (__new__, _make.__func__, _replace,
+                   __repr__, _asdict, __getnewargs__):
+        method.__qualname__ = f'{typename}.{method.__name__}'
+
+    # Build-up the class namespace dictionary
+    # and use type() to build the result class
+    class_namespace = {
+        '__doc__': f'{typename}({arg_list})',
+        '__slots__': (),
+        '_fields': field_names,
+        '__new__': __new__,
+        '_make': _make,
+        '_replace': _replace,
+        '__repr__': __repr__,
+        '_asdict': _asdict,
+        '__getnewargs__': __getnewargs__,
+    }
+    cache = _nt_itemgetters
+    for index, name in enumerate(field_names):
+        try:
+            itemgetter_object, doc = cache[index]
+        except KeyError:
+            itemgetter_object = _itemgetter(index)
+            doc = f'Alias for field number {index}'
+            cache[index] = itemgetter_object, doc
+        class_namespace[name] = property(itemgetter_object, doc=doc)
+
+    result = type(typename, (tuple,), class_namespace)

     # For pickling to work, the __module__ variable needs to be set to the frame
     # where the named tuple is created. Bypass this step in environments where
@@ -194,7 +194,6 @@ def test_factory(self):
         self.assertEqual(Point.__module__, __name__)
         self.assertEqual(Point.__getitem__, tuple.__getitem__)
         self.assertEqual(Point._fields, ('x', 'y'))
-        self.assertIn('class Point(tuple)', Point._source)

         self.assertRaises(ValueError, namedtuple, 'abc%', 'efg ghi')  # type has non-alpha char
         self.assertRaises(ValueError, namedtuple, 'class', 'efg ghi')  # type has keyword
@@ -366,11 +365,37 @@ def test_name_conflicts(self):
         newt = t._replace(itemgetter=10, property=20, self=30, cls=40, tuple=50)
         self.assertEqual(newt, (10,20,30,40,50))

-        # Broader test of all interesting names in a template
-        with support.captured_stdout() as template:
-            T = namedtuple('T', 'x', verbose=True)
-        words = set(re.findall('[A-Za-z]+', template.getvalue()))
-        words -= set(keyword.kwlist)
+        # Broader test of all interesting names taken from the code, old
+        # template, and an example
+        words = {'Alias', 'At', 'AttributeError', 'Build', 'Bypass', 'Create',
+                 'Encountered', 'Expected', 'Field', 'For', 'Got', 'Helper',
+                 'IronPython', 'Jython', 'KeyError', 'Make', 'Modify', 'Note',
+                 'OrderedDict', 'Point', 'Return', 'Returns', 'Type', 'TypeError',
+                 'Used', 'Validate', 'ValueError', 'Variables', 'a', 'accessible', 'add',
+                 'added', 'all', 'also', 'an', 'arg_list', 'args', 'arguments',
+                 'automatically', 'be', 'build', 'builtins', 'but', 'by', 'cannot',
+                 'class_namespace', 'classmethod', 'cls', 'collections', 'convert',
+                 'copy', 'created', 'creation', 'd', 'debugging', 'defined', 'dict',
+                 'dictionary', 'doc', 'docstring', 'docstrings', 'duplicate', 'effect',
+                 'either', 'enumerate', 'environments', 'error', 'example', 'exec', 'f',
+                 'f_globals', 'field', 'field_names', 'fields', 'formatted', 'frame',
+                 'function', 'functions', 'generate', 'get', 'getter', 'got', 'greater',
+                 'has', 'help', 'identifiers', 'index', 'indexable', 'instance',
+                 'instantiate', 'interning', 'introspection', 'isidentifier',
+                 'isinstance', 'itemgetter', 'iterable', 'join', 'keyword', 'keywords',
+                 'kwds', 'len', 'like', 'list', 'map', 'maps', 'message', 'metadata',
+                 'method', 'methods', 'module', 'module_name', 'must', 'name', 'named',
+                 'namedtuple', 'namedtuple_', 'names', 'namespace', 'needs', 'new',
+                 'nicely', 'num_fields', 'number', 'object', 'of', 'operator', 'option',
+                 'p', 'particular', 'pickle', 'pickling', 'plain', 'pop', 'positional',
+                 'property', 'r', 'regular', 'rename', 'replace', 'replacing', 'repr',
+                 'repr_fmt', 'representation', 'result', 'reuse_itemgetter', 's', 'seen',
+                 'self', 'sequence', 'set', 'side', 'specified', 'split', 'start',
+                 'startswith', 'step', 'str', 'string', 'strings', 'subclass', 'sys',
+                 'targets', 'than', 'the', 'their', 'this', 'to', 'tuple', 'tuple_new',
+                 'type', 'typename', 'underscore', 'unexpected', 'unpack', 'up', 'use',
+                 'used', 'user', 'valid', 'values', 'variable', 'verbose', 'where',
+                 'which', 'work', 'x', 'y', 'z', 'zip'}
         T = namedtuple('T', words)
         # test __new__
         values = tuple(range(len(words)))
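The point of the name-conflict test above is that field names may coincide with names the implementation uses internally. A smaller, standalone version of the same check might look like the following; the particular list of names is chosen here for illustration and is only a subset of the words in the test.

```python
from collections import namedtuple

# Field names that collide with names used inside the implementation.
names = ['itemgetter', 'property', 'self', 'cls', 'tuple', 'len', 'result']
T = namedtuple('T', names)

values = tuple(range(len(names)))
t = T(*values)                    # exercises __new__
made = T._make(values)            # exercises _make
replaced = t._replace(tuple=50)   # exercises _replace
```

If any helper looked up `tuple` or `len` by name at call time instead of through a pre-bound reference, a field with that name would be a likely way to expose the problem.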
@@ -396,30 +421,15 @@ def test_name_conflicts(self):
         self.assertEqual(t.__getnewargs__(), values)

     def test_repr(self):
-        with support.captured_stdout() as template:
-            A = namedtuple('A', 'x', verbose=True)
+        A = namedtuple('A', 'x')
         self.assertEqual(repr(A(1)), 'A(x=1)')
         # repr should show the name of the subclass
         class B(A):
             pass
         self.assertEqual(repr(B(1)), 'B(x=1)')

-    def test_source(self):
-        # verify that _source can be run through exec()
-        tmp = namedtuple('NTColor', 'red green blue')
-        globals().pop('NTColor', None)  # remove artifacts from other tests
-        exec(tmp._source, globals())
-        self.assertIn('NTColor', globals())
-        c = NTColor(10, 20, 30)
-        self.assertEqual((c.red, c.green, c.blue), (10, 20, 30))
-        self.assertEqual(NTColor._fields, ('red', 'green', 'blue'))
-        globals().pop('NTColor', None)  # clean-up after this test
-
     def test_keyword_only_arguments(self):
         # See issue 25628
-        with support.captured_stdout() as template:
-            NT = namedtuple('NT', ['x', 'y'], verbose=True)
-        self.assertIn('class NT', NT._source)
         with self.assertRaises(TypeError):
             NT = namedtuple('NT', ['x', 'y'], True)
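The keyword-only behaviour that the remaining test still covers can be tried outside the test suite with a short script. The variable `positional_rejected` is introduced here only for the example.

```python
from collections import namedtuple

# Options after field_names are keyword-only, so a third positional
# argument is rejected with TypeError.
try:
    namedtuple('NT', ['x', 'y'], True)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False

# Passing the option by keyword still works; the keyword 'def' becomes '_1'.
NT = namedtuple('NT', ['x', 'def'], rename=True)
```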
@@ -0,0 +1,9 @@
+Changed the implementation strategy for collections.namedtuple() to
+substantially reduce the use of exec() in favor of precomputed methods. As a
+result, the *verbose* parameter and *_source* attribute are no longer
+supported. The benefits include 1) having a smaller memory footprint for
+applications using multiple named tuples, 2) faster creation of the named
+tuple class (approx 4x to 6x depending on how it is measured), and 3) minor
+speed-ups for instance creation using __new__, _make, and _replace. (The
+primary patch contributor is Jelle Zijlstra with further improvements by
+INADA Naoki, Serhiy Storchaka, and Raymond Hettinger.)
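The strategy summarized in the news entry above (one small `exec()` for `__new__`, ordinary closures for the other methods, and `type()` to assemble the class) can be sketched in a few lines. `tiny_namedtuple` is a hypothetical, heavily simplified stand-in written for this note: it skips validation, renaming, `_replace`, `_asdict`, pickling support, and the shared itemgetter cache that the real change includes.

```python
from operator import itemgetter

def tiny_namedtuple(typename, field_names):
    """Simplified sketch of the exec-light approach; not the real implementation."""
    field_names = tuple(field_names)
    num_fields = len(field_names)
    # Same trick as the diff: "('x', 'y')" -> "x, y" (and "('x',)" -> "x,").
    arg_list = repr(field_names).replace("'", "")[1:-1]
    repr_fmt = '(' + ', '.join(f'{name}=%r' for name in field_names) + ')'
    tuple_new = tuple.__new__

    # Only __new__ goes through exec(), because its signature depends on
    # the field names; everything else is a plain closure.
    namespace = {'_tuple_new': tuple_new}
    exec(f'def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list}))',
         namespace)

    def _make(cls, iterable):
        result = tuple_new(cls, iterable)
        if len(result) != num_fields:
            raise TypeError(f'Expected {num_fields} arguments, got {len(result)}')
        return result

    def __repr__(self):
        return self.__class__.__name__ + repr_fmt % self

    class_namespace = {
        '__slots__': (),
        '_fields': field_names,
        '__new__': namespace['__new__'],
        '_make': classmethod(_make),
        '__repr__': __repr__,
    }
    for index, name in enumerate(field_names):
        class_namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), class_namespace)

Point = tiny_namedtuple('Point', ['x', 'y'])
p = Point(3, y=4)
```

Because the methods are closures over `field_names`, `num_fields`, and `repr_fmt`, no per-class source text has to be generated and compiled, which is presumably why a `_source` attribute no longer had anything to hold.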
