Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert documents to dicts during insertion #256

Merged
merged 2 commits into from Dec 4, 2018

Conversation

@msiemens
Copy link
Owner

msiemens commented Nov 17, 2018

@caichinger

This comment has been minimized.

Copy link
Contributor

caichinger commented Nov 22, 2018

I have not yet taken a look at the changes but since converting to dict or performing isinstance checks incurs an overhead (and was discussed in #218 ), I thought I provide a few primitive timings using IPython's %timeit

In [1]: from collections.abc import Mapping
   ...: 
   ...: class CustomDocument(Mapping):
   ...:     def __init__(self, data):
   ...:         self.data = data
   ...:     def __repr__(self):
   ...:         return f'{self.__class__.__name__}({self.data})'
   ...:     def __getitem__(self, key):
   ...:         return self.data[key]
   ...:     def __iter__(self):
   ...:         return iter(self.data)
   ...:     def __len__(self):
   ...:         return len(self.data)
   ...: 
   ...: 
   ...: minimal_dict = {'int': 1}
   ...: minimal_doc = CustomDocument(minimal_dict)
   ...: another_dict = {key: list(range(1_000)) for key in range(10)}
   ...: another_doc = CustomDocument(another_dict)

In [2]: %timeit "{'int': 1}"
7.78 ns ± 0.154 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

In [3]: %timeit isinstance(minimal_dict, dict)  # of course similar for another_dict
66.1 ns ± 1.16 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: %timeit dict(minimal_dict)
181 ns ± 4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit dict(minimal_doc)
1.63 µs ± 7.32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit dict(another_dict)
360 ns ± 3.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit dict(another_doc)
3.24 µs ± 23 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

To me it seems that converting a dict into a dict is quite fast - and this is the most likely scenario.
Changing the API to avoid a check/conversion (suggested in #218) is not really a good idea in my opinion; converting the Mapping takes longer but I too think that and not providing a custom converter is a too easy mistake to make (as addressed in #245 ).

From my point of view the performance overhead seems okay given that the library's selling point is simplicity and ease of use, not performance. However, I acknowledge that I do not know how it is used by others.

If desired/required, I can perform additional benchmarks.

@msiemens

This comment has been minimized.

Copy link
Owner Author

msiemens commented Dec 4, 2018

Thanks @caichinger for the microbenchmarks! I agree with your conclusion. Using dict(document) simply looks like the most clean (i.e. pythonic) solution. We'll have to convert Mappings to dicts anyway so I think this is the way to go.

@msiemens msiemens merged commit 43a0ab7 into master Dec 4, 2018
4 checks passed
4 checks passed
License Compliance All checks passed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 100.0%
Details
@msiemens msiemens deleted the feature-dictify-docs-on-insert branch Jan 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.