Skip to content

Conversation

Bobronium
Copy link
Contributor

@Bobronium Bobronium commented Nov 21, 2019

Changes

  • Fast yield from .__dict__ on plain .iter() (x10 boost)
  • As per Add public BaseModel.iter() method #997, moved all keys-related stuff (include, exclude, etc.) to ._iter()
  • Removed iteration through default values in .iter()
  • Almost all arguments checks moved out of loops, so checks happen once
  • Removed redundant set(dict.keys()) in ._calculate_keys()
  • Removed redundant ._keys_factory() method

For tracking performance I used dataset from benchmarks. Seems like performance of BaseModel.dict() didn't change much.

# benchmarks.run.generate_case

CASES = [generate_case() for _ in range(10)]


def test():
    for case in CASES:
        try:
            # benchmarks.test_pydantic.TestPydantic().model
            m = Model(**case)
        except ValidationError:
            pass
        else:
            print(timeit(partial(dict, m), number=100000))

dict(m) on master:

    2.8283871970000005
    3.3839481399999993
    3.3260206760000006
    3.629596341000001
    4.001494594
    2.797958307000002
    2.6821828930000002
    3.092909658
    3.724842543000001
    3.0450761390000025

dict(m) on _iter-refactor:

    0.2933802639999996
    0.28522739900000005
    0.28913957199999984
    0.29106095399999976
    0.2844331260000006
    0.289195844
    0.2902462040000007
    0.2863789990000001
    0.29688859900000075
    0.30014120099999886

Checklist

  • Unit tests for the changes exist | do we need any tests for that?
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.md file added describing change
    (see changes/README.md for details)

Moved all keys-related stuff (include, exclude, etc.) to ._iter()
Removed redundant iteration through default values
Almost all arguments checks moved out of loops, so checks happen once
Fast yield from .__dict__ on plain .iter() (x10 boost)
Removed redundant set(dict.keys()) in ._calculate_keys()
@Bobronium
Copy link
Contributor Author

Bobronium commented Nov 21, 2019

Oh, no. cython CI just failed with

pydantic/main.py:669: in _iter
    yield from iteritems
pydantic/main.py:666: in genexpr
    for k, v in iteritems
E   ValueError: generator already executing

Seems like it's an old cython bug, that won't be fixed until cython 3.0:
Original issue: cython/cython#1159
Possible workaround: cython/cython#1479

Or we can bet back to old-style iteration.

@samuelcolvin, @dmontagu what do you think?

@Bobronium
Copy link
Contributor Author

Bobronium commented Nov 21, 2019

Ok, looks like we can't fix it, so I see these options:

  1. Check all conditions on each iteration like it was before
  2. Have to versions for cython and python (not preferred I guess)
  3. Make things faster some other way

I have an idea how we can achieve 3rd option:
Instead of checking each value on iteration for every condition, we can perform it once on model creation, store it in special fields and then simply get allowed_keys in ._calculate_keys:
exclude_defaultsBaseModel().__default_fields__
exclude_noneBaseModel().__null_fields__
and then:

if exclude_none:
    keys -= self.__null_fields__

if exclude_defaults:
    keys -= self.__default_fields__

Upd. just done that, all seems to work. But of course needed to change validate_model signature to:

Union[
    Tuple['DictStrAny', 'SetStr', Optional[ValidationError]],  # for backward compatibility
    Tuple['DictStrAny', 'SetStr', 'SetStr', 'SetStr', Optional[ValidationError]],
]

@samuelcolvin
Copy link
Member

I'm not sure:

  1. this would slow down validation even though very often __null_fields__ and __default_fields__ won't be used
  2. They would also need to be kept up to date in __setattr__

Surely to work around the cython issue, the simplest solution would be to use a list for intermediate values of iteritems?

E.g. something like:

    iteritems = [(k, v) for k, v in iteritems if k in allowed_keys]

?

Or would this loose all the performance improvement?

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other approach (though ugly) would be to have a choice of loops:

  • loop if both allowed_keys and exclude_none are None
  • different loops for the case one is None, the other is None, neither are None

This would be ugly but not actually that long to write?

Then maybe we could change __iter__ to just return self.__dict__.items()?

pydantic/main.py Outdated
if allowed_keys is not None:
iteritems = ((k, v) for k, v in iteritems if k in allowed_keys)
if exclude_none:
iteritems = ((k, v) for k, v in iteritems if v is not None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess even if we have one loop we could move this check to before _get_value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that surely good idea

pydantic/main.py Outdated
if by_alias:
fields = self.__fields__
iteritems = ((fields[k].alias if k in fields else k, v) for k, v in iteritems)
if to_dict or allowed_keys is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess even if we have one loop we could avoid calling _get_value in many cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. that's my point too 👍

@Bobronium
Copy link
Contributor Author

Bobronium commented Nov 21, 2019

Surely to work around the cython issue, the simplest solution would be to use a list for intermediate values of iteritems?
E.g. something like:

    iteritems = [(k, v) for k, v in iteritems if k in allowed_keys]

It definitely will kill the performance, as it will actually construct new list in place for every condition. So not an option

Then maybe we could change iter to just return self.dict.items()?

Seems like the right thing to do 👍

About __null_fields__ and __default_fields__:

this would slow down validation even though very often __null_fields__ and __default_fields__ won't be used

It will add only this to validate_model:

if v_ is None:
    null_fields.add(name)
if v_ == field.default:
    default_fields.add(name)

I don't think it'll notably hit the performance. On the other hand __null_fields__ and __default_fields__ will be used at every .dict(), .json(), .copy() to improve the performance

They would also need to be kept up to date in setattr

Good point. I guess, it's true only if validate_assignments enabled and won't be so hard to do.

How about I commit changes to a new branch, so you'll be able to see it and run needed benchmarks? Or keep it in this one?

Upd.
Aw... Ok, when we dealing with .copy() things could get messy because of update arg. So we'll need to sync __null_fields__ and __default_fields__ with its values, or... just ignore it, as data in update not validated anyway.
Upd. 2
Also we have .construct() method, which is not validate any data

Maybe it really doesn't worth it. Anyway, waiting for feedback.

@dmontagu
Copy link
Contributor

dmontagu commented Nov 22, 2019

For what it's worth, in general, I would be in favor of special casing the "default" cases for some of these recursive methods in order to achieve better performance. I think currently we are paying an unreasonably high price for checks that are nearly always unused. (Though, to be fair, pydantic is still obviously very fast.)

But I think we need to be aggressive about making sure it's done in a way that is maintainable.


Given the complexities noted by @samuelcolvin and in your immediately previous comment @MrMrRobat, I worry that adding __null_fields__ and modifying various (mostly-unrelated) functions to keep things in sync is likely to be rather painful to maintain over time.

To the extent that it addresses the same performance issues, I would be much more in favor of just declaring a separate module-level function (e.g., called something like _default_iter()) which is used when the various kwargs are such that we don't need to perform runtime checks in each recursive call, and delegating to that function when possible. That way, maintenance just amounts to ensuring this function is equivalent to BaseModel._iter when called with no arguments, so correctness can be verified in a single place (and it would be easy to write many test cases).

The approach described above seems to imply that you'd need to be thinking about the effects of many distantly-related functions to ensure correctness (copy, validate_model, __setattr__, maybe others?). This adds both to the cognitive burden of maintenance, and to the difficulty of writing a comprehensive test suite since the logic becomes much more path-dependent (i.e., "unit" tests no longer cover most cases).

@codecov
Copy link

codecov bot commented Nov 25, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@ba53c63). Click here to learn what that means.
The diff coverage is 100%.

@@           Coverage Diff            @@
##             master   #1017   +/-   ##
========================================
  Coverage          ?    100%           
========================================
  Files             ?      21           
  Lines             ?    3669           
  Branches          ?     719           
========================================
  Hits              ?    3669           
  Misses            ?       0           
  Partials          ?       0
Impacted Files Coverage Δ
pydantic/datetime_parse.py 100% <ø> (ø)
pydantic/errors.py 100% <100%> (ø)
pydantic/class_validators.py 100% <100%> (ø)
pydantic/version.py 100% <100%> (ø)
pydantic/typing.py 100% <100%> (ø)
pydantic/tools.py 100% <100%> (ø)
pydantic/dataclasses.py 100% <100%> (ø)
pydantic/networks.py 100% <100%> (ø)
pydantic/validators.py 100% <100%> (ø)
pydantic/schema.py 100% <100%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba53c63...7c73905. Read the comment docs.

dependabot-preview bot and others added 9 commits November 29, 2019 02:32
…st episode (pydantic#1025)

* add testimonials section with reference to python bytes podcast episode

* added description to changes directory
Bumps [twine](https://github.com/pypa/twine) from 3.0.0 to 3.1.0.
- [Release notes](https://github.com/pypa/twine/releases)
- [Changelog](https://github.com/pypa/twine/blob/master/docs/changelog.rst)
- [Commits](pypa/twine@3.0.0...3.1.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
* Support typing.Literal in python 3.8

* Improve import pattern for Literal

* Update references to  in docs

* Try to get build to pass
* Add support for mapping types as custom root

* Incorporate feedback

* Add changes

* Incorporate feedback

* Add docs and tests

* Fix linting issue

* Incorporate more feedback

* Add more specific match
* Add parse_as_type function

* Add changes

* Incorporate feedback

* Add naming tests

* Fix double quotes

* Fix docs example

* Reorder parameters; add dataclass and mapping tests

* Rename parse_as_type to parse_obj, and add parse_file

* Incorporate feedback

* Incorporate feedback

* use custom root types
* Add better support for validator reuse

* Clean up classmethod unpacking

* Add changes

* Fix coverage check

* Make 3.8 compatible

* Update changes/940-dmontagu.md

Co-Authored-By: Samuel Colvin <s@muelcolvin.com>

* Make allow_reuse discoverable by adding to error message

* switch _check_validator_name to _prepare_validator
@samuelcolvin
Copy link
Member

samuelcolvin commented Dec 2, 2019

Your rebase seems to have gone wrong, you have lots of files here which don't relate to this PR.

You might need to rebase to get tests working correctly

@samuelcolvin samuelcolvin mentioned this pull request Jan 10, 2020
9 tasks
# Conflicts:
#	docs/install.md
#	docs/usage/models.md
#	docs/usage/types.md
#	pydantic/class_validators.py
#	pydantic/tools.py
#	pydantic/utils.py
#	tests/requirements.txt
#	tests/test_edge_cases.py
#	tests/test_tools.py
#	tests/test_validators.py
@Bobronium
Copy link
Contributor Author

Hope last commits fixed the issue. Sorry for inconvenience!

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, just need to work out how this should work with #1139

Bobronium and others added 3 commits February 11, 2020 21:51
Co-Authored-By: Samuel Colvin <samcolvin@gmail.com>
Co-Authored-By: Samuel Colvin <samcolvin@gmail.com>
@Bobronium
Copy link
Contributor Author

I guess, all ready 🎉

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the slow response, I've been travelling.

I think this is good except the one line I noticed might not be required?

Co-Authored-By: Samuel Colvin <samcolvin@gmail.com>
@samuelcolvin samuelcolvin merged commit 7d9614e into pydantic:master Feb 27, 2020
@samuelcolvin
Copy link
Member

awesome, thank you.

alexdrydew pushed a commit to alexdrydew/pydantic that referenced this pull request Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants