PEP: 455 Title: Adding a key-transforming dictionary to collections Version:
This PEP proposes a new data structure for the collections
module, called "TransformDict" in this PEP. This structure is a mutable mapping which transforms the key using a given function when doing a lookup, but retains the original key when reading.
See the rationale at https://mail.python.org/pipermail/python-dev/2015-May/140003.html and for an earlier partial review, see https://mail.python.org/pipermail/python-dev/2013-October/129937.html .
Numerous specialized versions of this pattern exist. The most common is a case-insensitive case-preserving dict, i.e. a dict-like container which matches keys in a case-insensitive fashion but retains the original casing. It is a very common need in network programming, as many protocols feature some arrays of "key / value" properties in their messages, where the keys are textual strings whose case is specified to be ignored on receipt but by either specification or custom is to be preserved or non-trivially canonicalized when retransmitted.
Another common request is an identity dict, where keys are matched according to their respective id()s instead of normal matching.
Both are instances of a more general pattern, where a given transformation function is applied to keys when looking them up: that function being str.lower
or str.casefold
in the former example and the built-in id
function in the latter.
(It could be said that the pattern projects keys from the user-visible set onto the internal lookup set.)
TransformDict is a MutableMapping
implementation: it faithfully implements the well-known API of mutable mappings, like dict
itself and other dict-like classes in the standard library. Therefore, this PEP won't rehash the semantics of most TransformDict methods.
The transformation function needn't be bijective, it can be strictly surjective as in the case-insensitive example (in other words, different keys can lookup the same value):
>>> d = TransformDict(str.casefold)
>>> d['SomeKey'] = 5
>>> d['somekey']
5
>>> d['SOMEKEY']
5
TransformDict retains the first key used when creating an entry:
>>> d = TransformDict(str.casefold)
>>> d['SomeKey'] = 1
>>> d['somekey'] = 2
>>> list(d.items())
[('SomeKey', 2)]
The original keys needn't be hashable, as long as the transformation function returns a hashable one:
>>> d = TransformDict(id)
>>> l = [None]
>>> d[l] = 5
>>> l in d
True
As shown in the examples above, creating a TransformDict requires passing the key transformation function as the first argument (much like creating a defaultdict
requires passing the factory function as first argument).
The constructor also takes other optional arguments which can be used to initialize the TransformDict with certain key-value pairs. Those optional arguments are the same as in the dict
and defaultdict
constructors:
>>> d = TransformDict(str.casefold, [('Foo', 1)], Bar=2)
>>> sorted(d.items())
[('Bar', 2), ('Foo', 1)]
TransformDict also features a lookup method returning the stored key together with the corresponding value:
>>> d = TransformDict(str.casefold, {'Foo': 1})
>>> d.getitem('FOO')
('Foo', 1)
>>> d.getitem('bar')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'bar'
The method name getitem()
follows the standard popitem()
method on mutable mappings.
TransformDict has a simple read-only property transform_func
which gives back the transformation function.
Most python-dev respondents found retaining the first user-supplied key more intuitive than retaining the last. Also, it matches the dict object's own behaviour when using different but equal keys:
>>> d = {}
>>> d[1] = 'hello'
>>> d[1.0] = 'world'
>>> d
{1: 'world'}
Furthermore, explicitly retaining the last key in a first-key-retaining scheme is still possible using the following approach:
d.pop(key, None)
d[key] = value
while the converse (retaining the first key in a last-key-retaining scheme) doesn't look possible without rewriting part of the container's code.
Using a function pair isn't necessary, since the original key is retained by the container. Moreover, an encoder / decoder pair would require the transformation to be bijective, which prevents important use cases like case-insensitive matching.
Dictionary values are not used for lookup, their semantics are totally irrelevant to the container's operation. Therefore, there is no point in having both an "original" and a "transformed" value: the transformed value wouldn't be used for anything.
It was asked why we would provide the generic TransformDict construct rather than a specialized case-insensitive dict variant. The answer is that it's nearly as cheap (code-wise and performance-wise) to provide the generic construct, and it can fill more use cases.
Even case-insensitive dicts can actually elicit different transformation functions: str.lower
, str.casefold
or in some cases bytes.lower
when working with text encoded in an ASCII-compatible encoding.
Two other constructor patterns were proposed by Serhiy Storchaka:
A type factory scheme:
d = TransformDict(str.casefold)(Foo=1)
A subclassing scheme:
class CaseInsensitiveDict(TransformDict): __transform__ = str.casefold d = CaseInsensitiveDict(Foo=1)
While both approaches can be defended, they don't follow established practices in the standard library, and therefore were rejected.
A patch for the collections module is tracked on the bug tracker at http://bugs.python.org/issue18986.
Case-insensitive dicts are a popular request:
- http://twistedmatrix.com/documents/current/api/twisted.python.util.InsensitiveDict.html
- https://mail.python.org/pipermail/python-list/2013-May/647243.html
- https://mail.python.org/pipermail/python-list/2005-April/296208.html
- https://mail.python.org/pipermail/python-list/2004-June/241748.html
- http://bugs.python.org/msg197376
- http://stackoverflow.com/a/2082169
- http://stackoverflow.com/a/3296782
- http://code.activestate.com/recipes/66315-case-insensitive-dictionary/
- https://gist.github.com/babakness/3901174
- http://www.wikier.org/blog/key-insensitive-dictionary-in-python
- http://en.sharejs.com/python/14534
- http://www.voidspace.org.uk/python/archive.shtml#caseless
Identity dicts have been requested too:
- https://mail.python.org/pipermail/python-ideas/2010-May/007235.html
- http://www.gossamer-threads.com/lists/python/python/209527
Several modules in the standard library use identity lookups for object memoization, for example pickle
, json
, copy
, cProfile
, doctest
and _threading_local
.
.Net has a generic Dictionary
class where you can specify a custom IEqualityComparer
: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
Using it is the recommended way to write case-insensitive dictionaries: http://stackoverflow.com/questions/13230414/case-insensitive-access-for-generic-dictionary
Java has a specialized CaseInsensitiveMap
: http://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/CaseInsensitiveMap.html
It also has a separate IdentityHashMap
: http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html
The C++ Standard Template Library features an unordered_map
with customizable hash and equality functions: http://www.cplusplus.com/reference/unordered_map/unordered_map/
This document has been placed in the public domain.
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: