Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for declaring infinite generators #1152

Merged
merged 12 commits into from Jan 13, 2020

Conversation

tiangolo
Copy link
Member

@tiangolo tiangolo commented Jan 6, 2020

Change Summary

This implements support for adding typing declarations for infinite generators.

The current implementation allows annotating a generator with Sequence[subType], but it consumes the generator to validate the internal data, so an infinite generator would just hang on model creation, while the model consumes the generator.

The recommended way to annotate a generator that doesn't support len nor __getitem__ (e.g. an infinite generator) is with Iterable[subType]: https://mypy.readthedocs.io/en/latest/cheat_sheet_py3.html#standard-duck-types

This implementation intentionally doesn't consume the generator, assuming it could be infinite or expensive, so it doesn't validate the iterated values. But it allows supporting these infinite generators in pydantic models.

The added docs give an explanation/warning about these generators not being validated and how they are there mainly to support infinite generators, but having Sequence as the (probably) preferred way to annotate consumable generators.

The use case for this is that pydantic will be a requirement for https://github.com/explosion/spaCy and https://github.com/explosion/thinc (already on the develop branches). pydantic is used throughout the libraries, and for some configurations, the libraries take possibly infinite or expensive generators. For things like Machine Learning dataset loading, etc. So it's not viable to have the generators consumed at model creation.

Related issue number

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.md file added describing change
    (see changes/README.md for details)

@codecov
Copy link

codecov bot commented Jan 6, 2020

Codecov Report

Merging #1152 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1152   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          20      20           
  Lines        3445    3479   +34     
  Branches      665     674    +9     
======================================
+ Hits         3445    3479   +34
Impacted Files Coverage Δ
pydantic/errors.py 100% <100%> (ø) ⬆️
pydantic/fields.py 100% <100%> (ø) ⬆️
pydantic/schema.py 100% <100%> (ø) ⬆️
pydantic/generics.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e169bd6...dbe25cc. Read the comment docs.

@tiangolo
Copy link
Member Author

@honnibal had the idea that we could do some validation for the first item without consuming the iterator, with something like value = next(iterator); return (value, itertools.chain([value], iterator).

I could document how to do that in a validator if it sounds okay.

@tiangolo
Copy link
Member Author

I went ahead and added docs for validating the first value: https://5e18740d4edea1000b2bd21d--pydantic-docs.netlify.com/usage/types/#infinite-generators-with-validation-for-first-value

That made me discover that there was an extra detail needed for it to work.

So I'm now storing the param type as a sub-field, with its automatic type and shape, to allow using it for normal validation.

It's documented and there's an extra test for that.

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise this is looking great.

Comment on lines 34 to 38
for i in m.infinite:
print(i)
if i == 10:
break

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this again, it was useful in the first example, but hopefully they've got the idea now.

@@ -157,6 +160,36 @@ with custom properties and validation.
```
_(This script is complete, it should run "as is")_

### Infinite Generators

If you have a generator you can use `Sequence` as described above. In that case, the generator will be consumed and its values will be validated with the sub-type of `Sequence` (e.g. `int` in `Sequence[int]`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If you have a generator you can use `Sequence` as described above. In that case, the generator will be consumed and its values will be validated with the sub-type of `Sequence` (e.g. `int` in `Sequence[int]`).
If you have a generator you can use `Sequence` as described above. In that case, the generator will be consumed and stored on the model as a list and its values will be validated with the sub-type of `Sequence` (e.g. `int` in `Sequence[int]`).

(I think?)

Also can you manually wrap lines in this file. (I think all other line are?)


pydantic can't validate the values automatically for you because it would require consuming the infinite generator.

#### Infinite Generators with Validation for First Value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### Infinite Generators with Validation for First Value
#### Validating the for first value


#### Infinite Generators with Validation for First Value

You can create a [Validator](validators.md) to validate the first value in an infinite generator and still not consume it entirely.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can create a [Validator](validators.md) to validate the first value in an infinite generator and still not consume it entirely.
You can create a [validator](validators.md) to validate the first value in an infinite generator and still not consume it entirely.

@@ -416,6 +420,12 @@ def _type_analysis(self) -> None: # noqa: C901 (ignore complexity)
self.key_field = self._create_sub_type(self.type_.__args__[0], 'key_' + self.name, for_keys=True)
self.type_ = self.type_.__args__[1]
self.shape = SHAPE_MAPPING
# Equality check as almost everything inherit form Iterable, including str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Equality check as almost everything inherit form Iterable, including str
# Equality check as almost everything inherits form Iterable, including str

@@ -1,10 +1,12 @@
import warnings
from collections import abc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from collections import abc
from collections.abc import Iterable as CollectionsIterable

?

I was confused for a minute, confusing this with Lib/abc.py.

@@ -416,6 +420,12 @@ def _type_analysis(self) -> None: # noqa: C901 (ignore complexity)
self.key_field = self._create_sub_type(self.type_.__args__[0], 'key_' + self.name, for_keys=True)
self.type_ = self.type_.__args__[1]
self.shape = SHAPE_MAPPING
# Equality check as almost everything inherit form Iterable, including str
# check for typing.Iterable and abc.Iterable, as it could receive one even when declared with the other
elif origin == Iterable or origin == abc.Iterable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif origin == Iterable or origin == abc.Iterable:
elif origin in {Iterable, CollectionsIterable}:

or is this slower in cython?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea but this looks cleaner. Let's try it.

This intentionally doesn't validate values to allow infinite generators.
"""

e: Optional[Exception] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to define this?

iterable = iter(v)
except TypeError:
e = errors_.IterableError()
return v, ErrorWrapper(e, loc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return v, ErrorWrapper(e, loc)
return v, ErrorWrapper(errors_.IterableError(), loc)

surely?

@tiangolo
Copy link
Member Author

Thanks for the thorough code review @samuelcolvin ! I just implemented the changes.

@samuelcolvin samuelcolvin merged commit 496551c into pydantic:master Jan 13, 2020
@tiangolo tiangolo deleted the infinite-generator branch January 13, 2020 16:15
andreshndz pushed a commit to cuenca-mx/pydantic that referenced this pull request Jan 17, 2020
* ✨ Implement support for infinite generators with Iterable

* ✅ Add tests for infinite generators

* 🎨 Fix format

* 📝 Add docs for infinite generators

* 📝 Add changes file

* ✨ Store sub_field with original type parameter to allow custom validation

* 📝 Add example for validating first value in infinite generators

* 🔥 Remove unused import in example

* ✅ Add test for infinite generator with first-value validation

* ♻️ Update fields with code review

* 📝 Update example from code review

* 📝 Update docs and format from code review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants