Refactor ._iter() method, 10x speed boost for dict(model) #1017
- Moved all keys-related stuff (include, exclude, etc.) to ._iter()
- Removed redundant iteration through default values
- Almost all argument checks moved out of loops, so checks happen once
- Fast yield from .__dict__ on plain .iter() (x10 boost)
- Removed redundant set(dict.keys()) in ._calculate_keys()
|
Oh, no.
pydantic/main.py:669: in _iter
    yield from iteritems
pydantic/main.py:666: in genexpr
    for k, v in iteritems
E   ValueError: generator already executing
Seems like it's an old Cython bug that won't be fixed until … Or we can get back to old-style iteration. @samuelcolvin, @dmontagu what do you think?
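For context on the error message itself, here is a pure-Python sketch of how CPython raises `ValueError: generator already executing`: a generator gets resumed while its own frame is still running. This only illustrates the message. It is an assumption that the Cython failure above has a similar shape, and the names `reentrant` and `gen` are made up for the example.

```python
def reentrant():
    yield 1
    # Resuming the generator from inside its own running frame is not allowed,
    # so this call raises ValueError.
    next(gen)
    yield 2


gen = reentrant()
first = next(gen)  # runs up to the first yield

try:
    next(gen)  # resumes the generator, which then tries to resume itself
except ValueError as exc:
    message = str(exc)
```

In plain CPython the chained generator expressions from the traceback do not re-enter themselves, which is consistent with the failure showing up only in the compiled build.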
|
Ok, looks like we can't fix it, so I see these options:
I have an idea how we can achieve the 3rd option:
if exclude_none:
    keys -= self.__null_fields__
if exclude_defaults:
    keys -= self.__default_fields__
Upd. Just did that, and all seems to work. But of course I needed to change
Union[
    Tuple['DictStrAny', 'SetStr', Optional[ValidationError]],  # for backward compatibility
    Tuple['DictStrAny', 'SetStr', 'SetStr', 'SetStr', Optional[ValidationError]],
]
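A minimal sketch of that idea, assuming the model keeps two precomputed sets next to its values. `__null_fields__` and `__default_fields__` are the names proposed above; the `Record` class, its constructor, and `export()` are hypothetical stand-ins and not pydantic's implementation.

```python
from typing import Any, Dict, Set


class Record:
    """Hypothetical stand-in for a model that tracks null and default fields."""

    def __init__(self, values: Dict[str, Any], defaults: Dict[str, Any]) -> None:
        self.values: Dict[str, Any] = {**defaults, **values}
        # Both sets are computed once, at construction time.
        self.__null_fields__: Set[str] = {
            k for k, v in self.values.items() if v is None
        }
        self.__default_fields__: Set[str] = {
            k for k, d in defaults.items() if self.values[k] == d
        }

    def export(self, exclude_none: bool = False, exclude_defaults: bool = False) -> Dict[str, Any]:
        keys = set(self.values)
        # Set subtraction replaces per-item checks inside the loop.
        if exclude_none:
            keys -= self.__null_fields__
        if exclude_defaults:
            keys -= self.__default_fields__
        return {k: v for k, v in self.values.items() if k in keys}


record = Record({"a": 1, "b": None}, {"b": 0, "c": 3})
```

The trade-off is that the extra sets must be produced during validation, which is why the validation return type has to grow.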
|
I'm not sure:
Surely to work around the Cython issue, the simplest solution would be to use a list for the intermediate values. E.g. something like: iteritems = [(k, v) for k, v in iteritems if k in allowed_keys]? Or would this lose all the performance improvement?
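The two variants under discussion can be put side by side. This is a sketch in plain CPython, so it does not reproduce the Cython error; `filter_lazy` and `filter_eager` are hypothetical names, and only `iteritems`, `allowed_keys`, and `exclude_none` come from the diff.

```python
from typing import Any, Dict, Iterator, List, Optional, Set, Tuple

Item = Tuple[str, Any]


def filter_lazy(data: Dict[str, Any], allowed_keys: Optional[Set[str]], exclude_none: bool) -> Iterator[Item]:
    """Chained generator expressions: nothing is materialized until consumed."""
    iteritems: Iterator[Item] = iter(data.items())
    if allowed_keys is not None:
        iteritems = ((k, v) for k, v in iteritems if k in allowed_keys)
    if exclude_none:
        iteritems = ((k, v) for k, v in iteritems if v is not None)
    return iteritems


def filter_eager(data: Dict[str, Any], allowed_keys: Optional[Set[str]], exclude_none: bool) -> List[Item]:
    """List comprehensions: each enabled condition builds a full intermediate list."""
    items: List[Item] = list(data.items())
    if allowed_keys is not None:
        items = [(k, v) for k, v in items if k in allowed_keys]
    if exclude_none:
        items = [(k, v) for k, v in items if v is not None]
    return items


sample = {"a": 1, "b": None, "c": 3}
```

Both return the same items; the difference is only in how many intermediate containers get allocated along the way.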
The other approach (though ugly) would be to have a choice of loops:
- one loop if both allowed_keys and exclude_none are None
- different loops for the cases where one is None, the other is None, or neither is None
This would be ugly but not actually that long to write?
Then maybe we could change __iter__ to just return self.__dict__.items()?
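A rough sketch of what the "choice of loops" could look like, assuming `exclude_none` is treated as a boolean flag. `iter_items` and `Plain` are hypothetical names for illustration. Note that `__iter__` has to return an iterator, so the sketch uses `yield from` rather than returning the `items()` view directly.

```python
from typing import Any, Dict, Iterator, Optional, Set, Tuple


def iter_items(
    data: Dict[str, Any],
    allowed_keys: Optional[Set[str]] = None,
    exclude_none: bool = False,
) -> Iterator[Tuple[str, Any]]:
    # The flags are inspected once; each branch runs a loop with only the
    # checks it actually needs.
    if allowed_keys is None and not exclude_none:
        yield from data.items()
    elif allowed_keys is None:
        for k, v in data.items():
            if v is not None:
                yield k, v
    elif not exclude_none:
        for k, v in data.items():
            if k in allowed_keys:
                yield k, v
    else:
        for k, v in data.items():
            if k in allowed_keys and v is not None:
                yield k, v


class Plain:
    """Hypothetical model whose plain iteration is just its __dict__."""

    def __init__(self, **values: Any) -> None:
        self.__dict__.update(values)

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        yield from self.__dict__.items()


data = {"a": 1, "b": None, "c": 3}
```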
pydantic/main.py (outdated)
if allowed_keys is not None:
    iteritems = ((k, v) for k, v in iteritems if k in allowed_keys)
if exclude_none:
    iteritems = ((k, v) for k, v in iteritems if v is not None)
I guess even if we have one loop we could move this check to before _get_value.
Yeah, that's surely a good idea.
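A small sketch of why the order matters: if the None check runs before the conversion, excluded values never reach the expensive call. Here `convert` is a hypothetical stand-in for `_get_value`, and `calls` exists only to make the number of conversions observable.

```python
from typing import Any, Dict, Iterator, List, Tuple

calls: List[Any] = []


def convert(value: Any) -> Any:
    """Hypothetical stand-in for a costly per-value conversion."""
    calls.append(value)
    return {"wrapped": value}


def export(data: Dict[str, Any], exclude_none: bool) -> Iterator[Tuple[str, Any]]:
    for k, v in data.items():
        # Cheap check first: skipped values are never converted.
        if exclude_none and v is None:
            continue
        yield k, convert(v)


result = dict(export({"a": 1, "b": None, "c": 3}, exclude_none=True))
```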
pydantic/main.py (outdated)
if by_alias:
    fields = self.__fields__
    iteritems = ((fields[k].alias if k in fields else k, v) for k, v in iteritems)
if to_dict or allowed_keys is not None:
I guess even if we have one loop we could avoid calling _get_value in many cases?
Yep, that's my point too 👍
iteritems = [(k, v) for k, v in iteritems if k in allowed_keys]
It definitely will kill the performance, as it will actually construct a new list for every condition. So it's not an option.
Seems like the right thing to do 👍 About
|
|
For what it's worth, in general, I would be in favor of special-casing the "default" cases for some of these recursive methods in order to achieve better performance. I think currently we are paying an unreasonably high price for checks that are nearly always unused. (Though, to be fair, pydantic is still obviously very fast.)
But I think we need to be aggressive about making sure it's done in a way that is maintainable. Given the complexities noted by @samuelcolvin and in your immediately previous comment @MrMrRobat, I worry that adding …
To the extent that it addresses the same performance issues, I would be much more in favor of just declaring a separate module-level function (e.g., called something like …
The approach described above seems to imply that you'd need to be thinking about the effects of many distantly-related functions to ensure correctness (
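One possible reading of the "separate module-level function" suggestion, sketched under assumptions: the function name `plain_items`, the `Model` class, and the `export()` signature are all hypothetical, since the names in the comment above did not survive. The point is only that the default case gets its own small function and the general path stays in one place.

```python
from typing import Any, Dict, Iterable, Optional, Set, Tuple


def plain_items(obj: Any) -> Iterable[Tuple[str, Any]]:
    """Hypothetical module-level fast path: no filtering, no per-item checks."""
    return obj.__dict__.items()


class Model:
    def __init__(self, **values: Any) -> None:
        self.__dict__.update(values)

    def export(self, include: Optional[Set[str]] = None, exclude_none: bool = False) -> Dict[str, Any]:
        # Default arguments: delegate to the fast path and skip all checks.
        if include is None and not exclude_none:
            return dict(plain_items(self))
        # General path: all filtering logic lives here.
        return {
            k: v
            for k, v in self.__dict__.items()
            if (include is None or k in include) and not (exclude_none and v is None)
        }


m = Model(a=1, b=None, c=3)
```

Keeping the fast path separate means a reader can verify it in isolation, without tracing how it interacts with the filtering options.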
Codecov Report
@@ Coverage Diff @@
## master #1017 +/- ##
========================================
Coverage ? 100%
========================================
Files ? 21
Lines ? 3669
Branches ? 719
========================================
Hits ? 3669
Misses ? 0
Partials ? 0
Continue to review full report at Codecov.
|
Bumps [pytest-mock](https://github.com/pytest-dev/pytest-mock) from 1.12.0 to 1.12.1. - [Release notes](https://github.com/pytest-dev/pytest-mock/releases) - [Changelog](https://github.com/pytest-dev/pytest-mock/blob/master/CHANGELOG.rst) - [Commits](pytest-dev/pytest-mock@v1.12.0...v1.12.1) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
…st episode (pydantic#1025) * add testimonials section with reference to python bytes podcast episode * added description to changes directory
Bumps [twine](https://github.com/pypa/twine) from 3.0.0 to 3.1.0. - [Release notes](https://github.com/pypa/twine/releases) - [Changelog](https://github.com/pypa/twine/blob/master/docs/changelog.rst) - [Commits](pypa/twine@3.0.0...3.1.0) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
* Support typing.Literal in python 3.8 * Improve import pattern for Literal * Update references to in docs * Try to get build to pass
* Add support for mapping types as custom root * Incorporate feedback * Add changes * Incorporate feedback * Add docs and tests * Fix linting issue * Incorporate more feedback * Add more specific match
* Add parse_as_type function * Add changes * Incorporate feedback * Add naming tests * Fix double quotes * Fix docs example * Reorder parameters; add dataclass and mapping tests * Rename parse_as_type to parse_obj, and add parse_file * Incorporate feedback * Incorporate feedback * use custom root types
* Add better support for validator reuse * Clean up classmethod unpacking * Add changes * Fix coverage check * Make 3.8 compatible * Update changes/940-dmontagu.md Co-Authored-By: Samuel Colvin <s@muelcolvin.com> * Make allow_reuse discoverable by adding to error message * switch _check_validator_name to _prepare_validator
|
Your rebase seems to have gone wrong: you have lots of files here which don't relate to this PR. You might need to rebase to get tests working correctly.
# Conflicts:
#   docs/install.md
#   docs/usage/models.md
#   docs/usage/types.md
#   pydantic/class_validators.py
#   pydantic/tools.py
#   pydantic/utils.py
#   tests/requirements.txt
#   tests/test_edge_cases.py
#   tests/test_tools.py
#   tests/test_validators.py
|
Hope the last commits fixed the issue. Sorry for the inconvenience!
looking good, just need to work out how this should work with #1139
Co-Authored-By: Samuel Colvin <samcolvin@gmail.com>
Co-Authored-By: Samuel Colvin <samcolvin@gmail.com>
|
I guess it's all ready 🎉
sorry for the slow response, I've been travelling.
I think this is good except the one line I noticed might not be required?
Co-Authored-By: Samuel Colvin <samcolvin@gmail.com>
|
awesome, thank you. |
Changes
- Fast yield from .__dict__ on plain .iter() (x10 boost)
- Moved all keys-related stuff (include, exclude, etc.) to ._iter()
- … .iter()
- Removed redundant set(dict.keys()) in ._calculate_keys()
- … ._keys_factory() method
For tracking performance I used the dataset from benchmarks. Seems like the performance of BaseModel.dict() didn't change much.
dict(m) on master:
dict(m) on _iter-refactor:
Checklist
- changes/<pull request or issue id>-<github username>.md file added describing change (see changes/README.md for details)