Implement support for declaring infinite generators #1152
Codecov Report
@@          Coverage Diff          @@
##          master  #1152   +/-   ##
=====================================
  Coverage    100%   100%
=====================================
  Files         20     20
  Lines       3445   3479   +34
  Branches     665    674    +9
=====================================
+ Hits        3445   3479   +34
@honnibal had the idea that we could do some validation for the first item without consuming the iterator, with something like
I could document how to do that in a validator if it sounds okay.
I went ahead and added docs for validating the first value: https://5e18740d4edea1000b2bd21d--pydantic-docs.netlify.com/usage/types/#infinite-generators-with-validation-for-first-value
That made me discover that there was an extra detail needed for it to work. So I'm now storing the param type as a sub-field, with its automatic type and shape, to allow using it for normal validation. It's documented and there's an extra test for that.
otherwise this is looking great.
for i in m.infinite:
    print(i)
    if i == 10:
        break
I don't think you need this again, it was useful in the first example, but hopefully they've got the idea now.
docs/usage/types.md
Outdated
@@ -157,6 +160,36 @@ with custom properties and validation.
```
_(This script is complete, it should run "as is")_

### Infinite Generators

If you have a generator you can use `Sequence` as described above. In that case, the generator will be consumed and its values will be validated with the sub-type of `Sequence` (e.g. `int` in `Sequence[int]`).
- If you have a generator you can use `Sequence` as described above. In that case, the generator will be consumed and its values will be validated with the sub-type of `Sequence` (e.g. `int` in `Sequence[int]`).
+ If you have a generator you can use `Sequence` as described above. In that case, the generator will be consumed and stored on the model as a list and its values will be validated with the sub-type of `Sequence` (e.g. `int` in `Sequence[int]`).
(I think?)
Also, can you manually wrap lines in this file? (I think all other lines are.)
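To illustrate the behaviour discussed in this thread, here is a minimal, stdlib-only sketch of what consuming a generator for `Sequence`-style validation amounts to. The helper `validate_as_sequence` is hypothetical and is not pydantic's implementation; it only shows why the result ends up as a list and why the source generator is exhausted afterwards.

```python
from typing import Iterable, List


def validate_as_sequence(values: Iterable[object]) -> List[int]:
    """Hypothetical helper: coerce every item to int, consuming the input."""
    return [int(v) for v in values]


def squares(limit: int):
    """A finite generator, so that consuming it terminates."""
    for n in range(limit):
        yield n * n


gen = squares(4)
validated = validate_as_sequence(gen)

# The result is a concrete list, and the generator has nothing left to yield.
leftover = list(gen)
```

With an infinite generator the list comprehension would never return, which is why such generators cannot be handled this way.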
docs/usage/types.md
Outdated
pydantic can't validate the values automatically for you because it would require consuming the infinite generator.

#### Infinite Generators with Validation for First Value
- #### Infinite Generators with Validation for First Value
+ #### Validating the first value
docs/usage/types.md
Outdated
#### Infinite Generators with Validation for First Value

You can create a [Validator](validators.md) to validate the first value in an infinite generator and still not consume it entirely.
- You can create a [Validator](validators.md) to validate the first value in an infinite generator and still not consume it entirely.
+ You can create a [validator](validators.md) to validate the first value in an infinite generator and still not consume it entirely.
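The idea of checking the first value without consuming the rest can be sketched with the standard library alone. This is an illustrative sketch, not the example from the pydantic docs: `check_first_is_int` is a hypothetical function, and the `int` check stands in for whatever validation the field's sub-type would apply. It peeks at one item with `next()` and then uses `itertools.chain` to put that item back in front of the untouched remainder.

```python
import itertools
from typing import Iterable, Iterator


def check_first_is_int(values: Iterable[object]) -> Iterator[object]:
    """Hypothetical check: validate only the first item, then hand back an
    iterator that still yields every item, including the first."""
    iterator = iter(values)
    try:
        first = next(iterator)
    except StopIteration:
        # An empty iterable has no first value to check.
        return iterator
    if not isinstance(first, int):
        raise ValueError(f"first value is not an int: {first!r}")
    # Put the peeked item back in front of the untouched remainder.
    return itertools.chain([first], iterator)


# itertools.count never ends, so only a slice of it is materialised here.
checked = check_first_is_int(itertools.count(start=1))
head = list(itertools.islice(checked, 3))
```

Inside a validator, the returned iterator would replace the original field value, so downstream code still sees the first item.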
pydantic/fields.py
Outdated
@@ -416,6 +420,12 @@ def _type_analysis(self) -> None:  # noqa: C901 (ignore complexity)
            self.key_field = self._create_sub_type(self.type_.__args__[0], 'key_' + self.name, for_keys=True)
            self.type_ = self.type_.__args__[1]
            self.shape = SHAPE_MAPPING
        # Equality check as almost everything inherit form Iterable, including str
- # Equality check as almost everything inherit form Iterable, including str
+ # Equality check as almost everything inherits from Iterable, including str
pydantic/fields.py
Outdated
@@ -1,10 +1,12 @@
import warnings
from collections import abc
- from collections import abc
+ from collections.abc import Iterable as CollectionsIterable
?
I was confused for a minute, mistaking this for `Lib/abc.py`.
pydantic/fields.py
Outdated
@@ -416,6 +420,12 @@ def _type_analysis(self) -> None:  # noqa: C901 (ignore complexity)
            self.key_field = self._create_sub_type(self.type_.__args__[0], 'key_' + self.name, for_keys=True)
            self.type_ = self.type_.__args__[1]
            self.shape = SHAPE_MAPPING
        # Equality check as almost everything inherit form Iterable, including str
        # check for typing.Iterable and abc.Iterable, as it could receive one even when declared with the other
        elif origin == Iterable or origin == abc.Iterable:
- elif origin == Iterable or origin == abc.Iterable:
+ elif origin in {Iterable, CollectionsIterable}:
Or is this slower in Cython?
No idea but this looks cleaner. Let's try it.
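The reason for an equality (or set-membership) check rather than a subclass check can be sketched outside pydantic. The snippet below assumes Python 3.8+ for `typing.get_origin`, and `is_plain_iterable` is a hypothetical helper rather than the code in `fields.py`. It shows that `str` and `list` both pass a subclass test against `Iterable`, while the membership test only matches an annotation whose origin is `Iterable` itself.

```python
from collections.abc import Iterable as CollectionsIterable
from typing import Any, Iterable, List, get_origin


def is_plain_iterable(annotation: Any) -> bool:
    """Hypothetical helper: True only when the annotation's origin is Iterable itself."""
    origin = get_origin(annotation)
    return origin in {Iterable, CollectionsIterable}


# A subclass test would be far too permissive: str and list are both iterable.
str_is_iterable = issubclass(str, CollectionsIterable)
list_is_iterable = issubclass(list, CollectionsIterable)

# The membership test singles out Iterable[...] and ignores other generics.
iterable_match = is_plain_iterable(Iterable[int])
list_match = is_plain_iterable(List[int])
```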
pydantic/fields.py
Outdated
    This intentionally doesn't validate values to allow infinite generators.
    """

    e: Optional[Exception] = None
Do we need to define this?
pydantic/fields.py
Outdated
    iterable = iter(v)
except TypeError:
    e = errors_.IterableError()
    return v, ErrorWrapper(e, loc)
- return v, ErrorWrapper(e, loc)
+ return v, ErrorWrapper(errors_.IterableError(), loc)
surely?
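The shape of the check under review can be sketched with plain Python, assuming a simplified `(value, error)` return convention. `check_iterable` is a hypothetical stand-in: it uses a built-in `TypeError` where the reviewed code uses `errors_.IterableError` wrapped in `ErrorWrapper`. The point it illustrates is that `iter()` is enough to confirm iterability without pulling a single item, and that the error can be built directly in the `return`, so no pre-declared variable is needed.

```python
import itertools
from typing import Any, Optional, Tuple


def check_iterable(value: Any) -> Tuple[Any, Optional[Exception]]:
    """Hypothetical sketch: accept anything iter() accepts, without consuming it."""
    try:
        iterator = iter(value)
    except TypeError:
        # Construct the error inline; nothing needs to be defined beforehand.
        return value, TypeError("value is not a valid iterable")
    return iterator, None


# An infinite iterator passes the check and is left untouched.
result, error = check_iterable(itertools.count())
first_item = next(result)

# A non-iterable value comes back unchanged together with an error.
bad_value, bad_error = check_iterable(42)
```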
Thanks for the thorough code review @samuelcolvin! I just implemented the changes.
* ✨ Implement support for infinite generators with Iterable
* ✅ Add tests for infinite generators
* 🎨 Fix format
* 📝 Add docs for infinite generators
* 📝 Add changes file
* ✨ Store sub_field with original type parameter to allow custom validation
* 📝 Add example for validating first value in infinite generators
* 🔥 Remove unused import in example
* ✅ Add test for infinite generator with first-value validation
* ♻️ Update fields with code review
* 📝 Update example from code review
* 📝 Update docs and format from code review
Change Summary
This implements support for adding typing declarations for infinite generators.

The current implementation allows annotating a generator with `Sequence[subType]`, but it consumes the generator to validate the internal data, so an infinite generator would just hang on model creation, while the model consumes the generator.

The recommended way to annotate a generator that doesn't support `len` nor `__getitem__` (e.g. an infinite generator) is with `Iterable[subType]`: https://mypy.readthedocs.io/en/latest/cheat_sheet_py3.html#standard-duck-types

This implementation intentionally doesn't consume the generator, assuming it could be infinite or expensive, so it doesn't validate the iterated values. But it allows supporting these infinite generators in pydantic models.
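The duck-typing distinction behind this recommendation can be checked directly against the standard abstract base classes. The sketch below is independent of pydantic, and `naturals` is a hypothetical infinite generator used only for illustration.

```python
import itertools
from collections.abc import Iterable, Sequence


def naturals():
    """Hypothetical infinite generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


gen = naturals()

# A generator satisfies Iterable but not Sequence: it has no length
# and cannot be indexed.
is_iterable = isinstance(gen, Iterable)
is_sequence = isinstance(gen, Sequence)
has_len = hasattr(gen, "__len__")
has_getitem = hasattr(gen, "__getitem__")

# It can still be read lazily, a few items at a time.
first_three = list(itertools.islice(gen, 3))
```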
The added docs explain and warn that these generators are not validated and that `Iterable` is there mainly to support infinite generators, with `Sequence` remaining the (probably) preferred way to annotate consumable generators.

The use case for this is that pydantic will be a requirement for https://github.com/explosion/spaCy and https://github.com/explosion/thinc (already on the `develop` branches). pydantic is used throughout the libraries, and for some configurations the libraries take possibly infinite or expensive generators, for things like machine learning dataset loading. So it's not viable to have the generators consumed at model creation.

Related issue number
Checklist
* `changes/<pull request or issue id>-<github username>.md` file added describing change (see changes/README.md for details)