Add support for OrderedDicts #232
Things like using
|
Also thanks, this will be cool if we can make it work. I often bemoan the lack of universal support for OrderedDicts in the ecosystem. Never occurred to me to improve toolz in this way. |
Great! I'll run some benchmarks tomorrow. I can't think of anything that I changed which would negatively affect performance too much. I guess there is a strange corner case which could be slow: If the first inputs to I haven't looked at |
Not very scientific or exhaustive benchmarks, but gives an idea. Things seem to be okay when using
I made a small change, and now small number of dictionaries the overhead of converting the intermediary result from
The corner case I mentioned has a large effect on performance, but it seems pretty artificial:
|
Interesting. Regarding impact to |
if return_ordered_dict and not isinstance(d, OrderedDict):
    result = dict(result)
    dict_ = dict
    return_ordered_dict = False
I wonder if this logic can be pulled out of the loop into a separate check:
    if all(isinstance(d, OrderedDict) for d in dicts):
        result = OrderedDict()
    else:
        result = dict()
This does walk through the dicts twice, but I suspect that this is cheap. It also requires that we make dicts concrete and not lazy, but this was already the case due to how we implement merge_with, which brings everything into memory anyway.
If this doesn't significantly impact performance, then it might be preferred for code simplicity's sake.
The reason I didn't do that is that if dicts is an iterator, it ends up being exhausted before the loop ever starts.
My suggestion in that case is to actually make dicts concrete by calling list on it. This would be bad if we benefited from laziness but, due to the current implementation of merge_with, we're not lazy anyway.
Ah, read that too quickly, sorry! Good point, I'll change it.
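The approach agreed on in this thread can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that was merged: the name merge_sketch is hypothetical, later dictionaries are assumed to override earlier ones, and falling back to a plain dict for empty input is a choice made only for this sketch.

```python
from collections import OrderedDict


def merge_sketch(*dicts):
    """Merge dictionaries; return an OrderedDict only if every input is one.

    Hypothetical helper for illustration, not the toolz implementation.
    """
    # Accept a single iterable of dicts as well as separate arguments.
    if len(dicts) == 1 and not isinstance(dicts[0], dict):
        dicts = dicts[0]
    # Make the input concrete so that an iterator is not exhausted
    # by the type check before the merge loop starts.
    dicts = list(dicts)
    # Empty input falls back to a plain dict (a choice made for this sketch).
    if dicts and all(isinstance(d, OrderedDict) for d in dicts):
        result = OrderedDict()
    else:
        result = dict()
    for d in dicts:
        result.update(d)
    return result
```

Calling list on the input costs one extra pass over the dictionaries, which matches the reviewer's point that the function was never lazy to begin with.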
This should probably have some tests to ensure that OrderedDicts emerge in the appropriate cases. |
As pointed out by @mrocklin, the function already loaded all dictionaries into memory anyway. Timing before:
    d = [{i: i + 1 for i in range(10)} for j in range(10000)]
    %timeit merge_with(d)
    100000 loops, best of 3: 11.2 µs per loop
And after:
    %timeit merge_with(d)
    100000 loops, best of 3: 11.7 µs per loop
@@ -175,7 +190,7 @@ def assoc(d, key, value):
     >>> assoc({'x': 1}, 'y', 3)  # doctest: +SKIP
     {'x': 1, 'y': 3}
     """
-    return merge(d, {key: value})
+    return merge(d, type(d)([(key, value)]))
I suspect that the following will be faster.
d = d.copy()
d[key] = value
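A minimal sketch of that suggestion, assuming the input mapping provides a copy method that returns the same type (true for dict and OrderedDict). The name assoc_sketch is hypothetical and is not the function as it exists in toolz.

```python
from collections import OrderedDict


def assoc_sketch(d, key, value):
    """Return a copy of ``d`` with ``key`` set to ``value``; ``d`` is untouched."""
    # copy() returns the same mapping type for dict and OrderedDict,
    # so the result's type follows the input without a merge step.
    new = d.copy()
    new[key] = value
    return new


original = OrderedDict([("x", 1)])
updated = assoc_sketch(original, "y", 3)
```

Because the copy is shallow, the original mapping keeps its keys, and insertion order is preserved when the input is an OrderedDict.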
I think we should think about supporting (for input and output) any Previously, functions in One option to support generic mappings is for functions in |
Something along those lines sounds good to me. Another approach, that wouldn't require the passing of a
def factory(d):
    if type(d) is dict:
        return {}
    try:
        r = d.__reduce__()
    except TypeError:
        raise TypeError
    if len(r) == 2:
        callable_, args = r
        return callable_()
    elif len(r) == 5:
        callable_, args, state, list_items, dict_items = r
        if state is not None or list_items is not None:
            raise TypeError
        return callable_(*args)
    raise TypeError
That's pretty clever! Relying on the pickle protocol may not be particularly robust though. A factory keyword will let you do this: |
I agree it's more explicit and robust. How would you want to deal with mappings that don't support
In [135]: def update(d1, d2):
              for k, v in d2.iteritems():
                  d1[k] = v
In [137]: d1 = {i: i +1 for i in range(1000)}
In [138]: d2 = {i: i +1 for i in range(1000)}
In [139]: %timeit update(d1, d2)
10000 loops, best of 3: 63.9 µs per loop
In [140]: %timeit d1.update(d2)
10000 loops, best of 3: 20.6 µs per loop
Maybe something like this?
def update(d1, d2):
    if callable(getattr(d1, 'update', None)):
        d1.update(d2)
    else:
        for k, v in d2.iteritems():
            d1[k] = v
New version with
In [1]: def update(d1, d2):
            if callable(getattr(d1, 'update', None)):
                d1.update(d2)
            else:
                for k, v in d2.iteritems():
                    d1[k] = v
In [2]: d1 = {i: i +1 for i in range(1000)}
In [3]: d2 = {i: i +1 for i in range(1000)}
In [4]: %timeit update(d1, d2)
10000 loops, best of 3: 20.3 µs per loop
In [5]: %timeit d1.update(d2)
10000 loops, best of 3: 19.9 µs per loop
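For readers on Python 3, where iteritems no longer exists, the helper above together with the factory keyword discussed earlier might look roughly like this. It is a sketch under assumptions: SetItemOnly and merge_with_factory are hypothetical names introduced only for illustration, and the keyword handling is not taken from the merged code.

```python
from collections import OrderedDict


def update(d1, d2):
    """Copy the items of ``d2`` into ``d1``, preferring a native ``update``."""
    if callable(getattr(d1, "update", None)):
        d1.update(d2)
    else:
        # Fallback for containers that only implement item assignment.
        for k, v in d2.items():
            d1[k] = v


class SetItemOnly:
    """Hypothetical container with item assignment but no ``update`` method."""

    def __init__(self):
        self.data = {}

    def __setitem__(self, key, value):
        self.data[key] = value


def merge_with_factory(*dicts, factory=dict):
    """Merge ``dicts`` into a fresh container built by ``factory`` (a sketch)."""
    result = factory()
    for d in dicts:
        update(result, d)
    return result


# The caller states the output type explicitly through the factory keyword.
merged = merge_with_factory({"a": 1}, {"a": 2, "b": 3}, factory=OrderedDict)
```

Passing the factory explicitly avoids guessing the output type from the inputs, while the getattr check keeps the fast path for containers that already provide update.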
Very cool. |
After having a closer look I'd actually suggest switching back to the I made that change and all tests should pass now. Let me know if you'd like to see any other changes. |
Oh, you're right about that. Thanks for being pedantic. I was looking at Mapping, not MutableMapping. We should check to see if anything needs done for We should probably mention this in the documentation somewhere too. This looks pretty good to me. @mrocklin, thoughts? |
Would you like to add yourself to AUTHORS.md, @bartvm? Welcome to PyToolz! |
Cool, I will, thanks! Currying is problematic for You could force |
Seems good to me. It's a nice approach. Explicitness as release valve. |
Okay, so |
Updated I tested with Two solutions:
|
This is excellent. I hope you find it convenient enough for your original use case with +1 to merge (which I'll do soon if no comment). It'll be interesting to see how easily and how well |
This is in! |
Great, thanks! |
I like the Dicttoolz package, but for many of my use cases I need the deterministic behaviour of OrderedDict. Adapting Dicttoolz to return OrderedDict if all of its inputs are one is relatively straightforward. Is this something you would consider merging?