V2: hypothesis plugin rewrite #4682

samuelcolvin · 2022-10-28T10:19:26Z

As per #4516, pydantic/_hypothesis.py has been ravaged. I'm sorry about that: it seemed just about worthwhile keeping it in place while preventing mypy/flake8 errors, but I couldn't stretch to keeping it working with the new code in that PR.

Once #4516 is merged, we therefore need to rewrite it and move it into _internal to respect the new use of Annotated for most constraints. I don't know how helpful the new #4680 conversion table might be too?

@Zac-HD as the expert on hypothesis and original developer of the module, it would of course be amazing if you could work on this. I understand if that's not possible; please let me know either way here. Any extra insights or suggestions you have about how to go about this would also be much appreciated.


Zac-HD · 2022-10-30T22:47:22Z

I can definitely offer high-intensity code review and design advice, but unfortunately I'm already well over capacity for OSS projects at the moment¹. I'd expect this to fall into three main pieces:

1. Supporting annotated-types constraints in HypothesisWorks/hypothesis#3356 (Check for annotated-types constraints in st.from_type(Annotated[T, ...])). If it's not Pydantic-specific metadata, the relevant support code should all live in Hypothesis itself rather than Pydantic 😁
2. Make any Pydantic-specific constraints also implement the annotated_types.GroupedMetadata protocol, wherever that would be enough for Hypothesis to generate valid instances. Where it wouldn't be enough, we'll probably need to register strategies or something, details TBD when I can consider some examples in detail (this may require new hooks in Hypothesis, though I really hope not)
3. Plugging everything together so that we can resolve models as well as fields; I expect this to look similar to the current WeakSet/register_type_strategy() code - hopefully a little cleaner - and it's pretty easy unless there are dependencies between fields (e.g. "instance.a must be less than instance.b")
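The last piece, including the harder case of a dependency between fields, might look roughly like the sketch below. It is only an illustration: `Interval` is a hypothetical plain dataclass standing in for a Pydantic model, and how a real plugin would discover and register models is not settled in this thread. `st.builds`, `st.register_type_strategy`, and `st.from_type` are existing Hypothesis functions.

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass
class Interval:
    """Hypothetical stand-in for a model where `a` must be less than `b`."""

    a: int
    b: int


# Satisfy the cross-field constraint by construction: draw `a`, then a
# strictly positive offset, instead of filtering independent draws.
interval_strategy = st.builds(
    lambda a, delta: Interval(a=a, b=a + delta),
    st.integers(),
    st.integers(min_value=1),
)

# After registration, from_type(Interval) resolves to the strategy above.
st.register_type_strategy(Interval, interval_strategy)


@given(st.from_type(Interval))
def check_interval(interval):
    assert interval.a < interval.b


check_interval()  # raises if any generated Interval violates a < b
```

Building the dependent field from the first one avoids rejection sampling, which is usually the cheaper choice when one field bounds another.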

Happy to have a call sometime if that's useful, probably once (1) above is well under way.

¹ Open-sourcing https://github.com/Zac-HD/hypofuzz has kept me busy, and there's still a lot of low-hanging fruit there.

smhavens · 2022-11-18T17:54:42Z

Hello, I was wondering how difficult this issue would be to complete. My partner @marcoo47 and I are interested in it, and it would be our first contribution.

Zac-HD · 2022-11-18T19:54:08Z

I think it's going to be a fairly large and intricate challenge, and recommend getting solid experience contributing to Hypothesis and Pydantic before taking it on 🙂

samuelcolvin · 2023-05-05T09:43:41Z

see annotated-types/annotated-types#37

tonibofarull · 2024-04-19T09:57:27Z

Any news on this? 😢

Viicos · 2024-04-19T11:51:33Z

I started looking into this a while ago by first fixing HypothesisWorks/hypothesis#3356. However, the implementation is not perfect and some annotated types are not yet supported. If I find some time, I can take a look at this, although I expect to face challenges during the implementation.

Zac-HD · 2024-04-19T18:18:17Z

Hypothesis also handles collection-length bounds and some regex filters: HypothesisWorks/hypothesis#3795

This issue might be a nice project for people to work on at the PyCon sprints in a few weeks 🙂

butterlyn · 2024-05-01T00:40:04Z

Are there any recommended workarounds while this issue is still open, aside from reverting to Pydantic V1? Perhaps limiting which Pydantic V2 features are used or how they're used in a project, in combination with hypothesis-jsonschema?

iurisilvio · 2024-05-01T04:41:30Z

@butterlyn I generated the model's JSON schema and used hypothesis-jsonschema to generate data. It is very straightforward in most cases, with a few caveats:

- Types with too many restrictions are very slow to generate data for. I bypassed this restriction using custom formatter rules.
- Recursive models are not supported.

butterlyn · 2024-05-01T05:27:23Z

> I bypassed this restriction using custom formatter rules

Thanks @iurisilvio, that makes a lot of sense. Could you please elaborate on this trick you mention about custom formatter rules? Is this a Pydantic or a Hypothesis feature?

iurisilvio · 2024-05-01T05:51:00Z

The hypothesis-jsonschema from_schema function accepts custom_formats. You can pass any strategy that generates strings there, e.g. {"foo": st.text()}.
Then you add json_schema_extra={"format": "foo"} to your pydantic field.

Your model.model_json_schema() will be generated with the extra format defined, and hypothesis-jsonschema will use the custom rule to generate data instead of building a rule at runtime.

It is useful for strings with patterns and limited sizes, because hypothesis first generates a string matching the pattern, then checks the size constraints. That check fails most of the time for high minimum lengths, for example, making data generation very slow.

If you publish your JSON schema, you may not want to add the extra format. You can patch the schema manually instead, but it is difficult to turn that into a generic solution.

iurisilvio · 2024-05-01T06:52:15Z

A working example:

from typing import Annotated

from hypothesis import given, strategies as st
from hypothesis_jsonschema import from_schema
from pydantic import BaseModel, Field, StringConstraints

class MyModel(BaseModel):
    foo: Annotated[
        str, StringConstraints(min_length=36, pattern=r"^[abc]+$")
    ] = Field(json_schema_extra={"format": "foo"})


schema = MyModel.model_json_schema()

# or hack without `json_schema_extra` defined
# schema["properties"]["foo"]["format"] = "foo"

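# Slow (and may trip a Hypothesis health check): strings matching the
# pattern are generated first, then filtered by the length constraint.
@given(from_schema(schema))
def test_foo_fail(data):
    pass


# Fast: the custom "foo" format supplies a strategy that satisfies both
# the pattern and the minimum length by construction.
@given(
    from_schema(
        schema,
        custom_formats={
            "foo": st.text(
                alphabet=st.sampled_from(["a", "b", "c"]),
                min_size=36,
            ),
        },
    )
)
def test_foo_pass(data):
    pass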

Viicos · 2024-05-10T15:47:36Z

@Zac-HD (sorry for the ping),

When we added annotated-types support to Hypothesis, we roughly went for a mapping of at.BaseMetadata -> BaseStrategy.filter condition.

While Pydantic mostly uses the BaseMetadata subclasses from annotated-types, it still uses a couple of custom subclasses (one of them being dynamically created, probably for a good reason).

Would there be a way to add a registration mechanism for annotated constraints? If so, should it take the same form as the current one (i.e. my custom BaseMetadata subclass -> filter condition)?

Thanks in advance

Edit: Pydantic also uses arbitrary classes meant to be used in Annotated metadata (e.g. UuidVersion). For them, we can either:

- Tweak them to be BaseMetadata subclasses (not sure how feasible this is).
- Allow registering arbitrary Annotated metadata classes with Hypothesis, not only annotated_types.BaseMetadata subclasses as suggested above.
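To make the question concrete, here is a rough sketch of what a "metadata class -> filter condition" registry could look like. Everything named here is hypothetical: Hypothesis does not expose `FILTERS` or `strategy_for`, and `EvenOnly` is an invented constraint used purely for illustration. Only `at.Gt`, `at.Lt`, `at.BaseMetadata`, `st.from_type`, and `.filter` are real.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Callable, Dict, get_args

import annotated_types as at
from hypothesis import given, strategies as st

# Hypothetical registry: metadata class -> function that turns a metadata
# instance into a predicate usable with SearchStrategy.filter().
PredicateFactory = Callable[[Any], Callable[[Any], bool]]
FILTERS: Dict[type, PredicateFactory] = {
    at.Gt: lambda meta: lambda value: value > meta.gt,
    at.Lt: lambda meta: lambda value: value < meta.lt,
}


@dataclass(frozen=True)
class EvenOnly(at.BaseMetadata):
    """Hypothetical custom constraint: accept only even integers."""


# A third party "registers" its own constraint by adding an entry.
FILTERS[EvenOnly] = lambda meta: lambda value: value % 2 == 0


def strategy_for(annotated_type):
    """Resolve Annotated[T, ...] by filtering from_type(T) once per constraint."""
    base, *metadata = get_args(annotated_type)
    strategy = st.from_type(base)
    for item in metadata:
        factory = FILTERS.get(type(item))
        if factory is None:
            raise TypeError(f"no filter registered for {type(item).__name__}")
        strategy = strategy.filter(factory(item))
    return strategy


@given(strategy_for(Annotated[int, at.Gt(0), EvenOnly()]))
def check_even_positive(value):
    assert value > 0 and value % 2 == 0


check_even_positive()
```

A pure filter registry like this cannot express constraints such as a UUID version, where a dedicated strategy is needed rather than a predicate over `from_type(UUID)`.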

Zac-HD · 2024-05-10T22:08:25Z

To provide filters, it'd be best for Hypothesis if you could make your annotations into either subclasses of at.Predicate, or of at.GroupedMetadata which returns a Predicate.

For arbitrary other annotations, I think there are basically two options Hypothesis could support:

1. Insist that they should be GroupedMetadata instances, and yield either other known constraints (which wouldn't work for UUIDs), or a SearchStrategy object (which would need some one-off work in Hypothesis, but fits well)
2. Somehow hook into the jsonschema (via hypothesis-jsonschema, which would need a lot of work) or core schema (whole new project), and generate from those. We don't have capacity to maintain that on the Hypothesis side in either case, unfortunately.

Viicos · 2024-05-13T21:02:47Z

(plan still WIP).

Hypothesis gained the ability to use strategies yielded from the GroupedMetadata.__iter__ method (as per the documentation, it can yield any object).

This means it is possible to do:

from typing import Annotated
from uuid import UUID

from annotated_types import GroupedMetadata
from hypothesis.strategies import from_type, uuids

class MyMetadata(GroupedMetadata):
    def __iter__(self):
        yield uuids(version=1)

st = from_type(Annotated[UUID, MyMetadata()])
#> uuids(version=1)

As Pydantic implements constraints in different ways, I'll list the available types and how they are going to be supported in hypothesis.

Types implemented using `annotated-types`

For example, PositiveInt = Annotated[int, annotated_types.Gt(0)]. These are already supported by hypothesis and do not require extra work.

List

- PositiveInt
- NegativeInt
- NonPositiveInt
- NonNegativeInt
- PositiveFloat
- NegativeFloat
- NonPositiveFloat
- NonNegativeFloat
- conint
- confloat
- conbytes
- conset
- confrozenset
- conlist
- condate

Arbitrary classes used as annotations

For example, SecretStr. These can easily be registered using register_type_strategy.

- ImportString (if used as is, without being subscripted; if used like ImportString[T], see below): As done in Pydantic V1, the math module can probably be used.
- Json (if used as is, without being subscripted; if used like Json[T], see below).
- _SecretBase: Should look at #8519 (Add Secret base type) to understand the class hierarchy.
- PaymentCardNumber: Deprecated, probably the type from pydantic-extra-types should be supported instead.
- ByteSize
- PastDate, FutureDate, AwareDatetime, NaiveDatetime, PastDatetime, FutureDatetime: When used as a type: foo: PastDate (if used with Annotated, see below).
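For instance, registering SecretStr could be as small as the sketch below. It assumes Pydantic V2's `SecretStr` constructor and `get_secret_value()`, and the choice of `st.text()` as the source of secrets is only an assumption for illustration; a real plugin may want something different.

```python
from hypothesis import given, strategies as st
from pydantic import SecretStr

# Wrap arbitrary text in SecretStr; from_type(SecretStr) then resolves here.
st.register_type_strategy(SecretStr, st.text().map(SecretStr))


@given(st.from_type(SecretStr))
def check_secret(secret):
    assert isinstance(secret, SecretStr)
    assert isinstance(secret.get_secret_value(), str)


check_secret()
```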

Classes used with `Annotated`

This gets tricky. For some, it can be implemented without too many issues, but others need to know the annotated type. For instance, Json:

from typing import Any

from pydantic import BaseModel, Json

class Model(BaseModel):
    json_obj: Json[Any]
    # Which translates to:
    # json_obj: Annotated[Any, Json()]
    # While `Json` could subclass `at.GroupedMetadata` and implement `__iter__` to yield a strategy,
    # it needs to know about the annotated type (in this case `Any`), which is not the case currently.

- AllowInfNan: same issue as example above, needs to know about the annotated type.
- StringConstraints: already subclasses annotated_types.GroupedMetadata but yields pydantic_general_metadata(), which accepts any metadata. Probably some refactoring should happen here, still need to investigate.
- UuidVersion: should subclass annotated_types.GroupedMetadata and implement __iter__ to yield hypothesis.strategies.uuids(...).
- ImportString[T]: translates to Annotated[T, ImportString()]. For example, ImportString[Annotated[float, Ge(3)]] will not accept math.e. While we could (with some significant work) categorize members of the math module to have a fixed set of objects available that could match the constraints, we need to decide what to do if no known member exists (e.g. ImportString[str] will not be resolvable, because there's no str member in the math module).
- Json[T]: as said in the example above, needs to be changed to know about the annotated type.
- PastDate, FutureDate, AwareDatetime, NaiveDatetime, PastDatetime, FutureDatetime: When used with Annotated: foo: Annotated[date, PastDate()].
- EncodedBytes, EncodedStr: TODO
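To illustrate why Json needs the annotated type, here is a small sketch of a strategy for JSON strings. `json_strings` is a hypothetical helper, not part of Hypothesis or Pydantic; the point is only that it cannot be written without being handed the inner type.

```python
import json

from hypothesis import given, strategies as st


def json_strings(inner_type):
    """Hypothetical helper: JSON text that decodes to an `inner_type` value.

    The inner type has to be passed in explicitly, which is exactly the
    information a bare Json() annotation does not currently carry.
    """
    return st.from_type(inner_type).map(json.dumps)


@given(json_strings(int))
def check_json_int(raw):
    assert isinstance(raw, str)
    assert isinstance(json.loads(raw), int)


check_json_int()
```

This only works for inner types whose generated values are JSON-serializable; anything else would need its own encoding step.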

Classes that should have no effect

No changes needed (e.g. only affects validation):

List

- Strict
- PathType: This class enforces that a path is either a directory, a file, or does not exist yet. I think requiring real files to exist on the file system is out of scope for hypothesis.

Needs investigation

JsonValue

Notes for myself:

EncodedStr docstring uses bytes in some places, should be fixed

Zac-HD · 2024-05-13T21:47:51Z

- condate doesn't yet do the filter-rewriting trick for efficiency, but it could. I'll open an issue for the PyCon sprints.
- {Past,Future}{Date,Datetime} is somewhat tricky, because the reference point changes between runs. In particular, these shrink towards the first day/microsecond of 2000, so the past is probably fine but a minimal example in the future is likely to be invalid when rerun. I don't have a good solution to this but will keep thinking about it.
  - Otherwise, you could use at.Lt(now)/at.Gt(now) once the condate support is in.
  - Uh, will it be a problem if we generate a datetime between the-moment-the-strategy-was-created and the-moment-of-validation? Because I don't see a way to avoid that; you can bet on fast-tests-without-clock-changes if you want and just say that the future is not-the-near-future, but then your tests can't detect bugs which only occur close to the present.
- {Aware,Naive}Datetime - can we use at.Timezone(...)/at.Timezone(None) for these?
  - Unfortunately Hypothesis is in trouble here: our default from_type(datetime) strategy has .tzinfo=None, and you can't change that by filtering. Non-filter-based approaches would violate an otherwise-consistent mental model. Changing the default to timezones=st.none()|st.timezones() would be a pretty big deprecation, and also have pretty nasty ergonomics.
    - so probably you'll need to yield a strategy for aware datetimes too, sorry.
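The two points above can be sketched as follows, under some assumptions. The strategies are hypothetical candidates that a Pydantic annotation might yield, not anything either library ships. Fixed UTC offsets are used so the example needs no time zone database; `st.timezones()` is the broader option mentioned above. `REFERENCE` is pinned when the strategy is created, which is exactly the limitation discussed: it changes between runs.

```python
from datetime import date, timedelta, timezone

from hypothesis import given, strategies as st

# Aware datetimes: filtering naive values can never add a tzinfo, so the
# time zone has to be supplied when the strategy is built.
fixed_offsets = st.integers(min_value=-12, max_value=14).map(
    lambda hours: timezone(timedelta(hours=hours))
)
aware_datetimes = st.datetimes(timezones=fixed_offsets)

# Past dates: bounded by a reference captured once, at strategy creation.
REFERENCE = date.today()
past_dates = st.dates(max_value=REFERENCE - timedelta(days=1))


@given(aware_datetimes, past_dates)
def check_datetimes(moment, day):
    assert moment.tzinfo is not None
    assert moment.utcoffset() is not None
    assert day < REFERENCE


check_datetimes()
```

The mirror-image strategy for future dates would use `min_value` instead, with the caveat from above that a shrunk example near the reference may no longer be in the future when it is replayed later.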

samuelcolvin added the "help wanted" (Pull Request welcome) and "Change" (Suggested alteration to pydantic, not a new feature nor a bug) labels Oct 28, 2022

samuelcolvin assigned dmontagu Feb 20, 2023

samuelcolvin unassigned dmontagu Apr 17, 2023

Kludex added the "hypothesis" (related to Hypothesis testing library) label Apr 25, 2023

dmontagu self-assigned this Apr 28, 2023

samuelcolvin unassigned dmontagu May 4, 2023

hirotasoshu mentioned this issue May 4, 2023: Add hypothesis plugin pydantic/pydantic-extra-types#34 (Closed)

JacobCoffee mentioned this issue May 23, 2023: Enhancement: Pydantic V2 Support litestar-org/polyfactory#218 (Closed)

lig self-assigned this May 24, 2023

lig mentioned this issue May 25, 2023: ✨ Implement Hypothesis integration #5871 (Closed, 5 tasks)

lig removed their assignment Dec 14, 2023

Zac-HD mentioned this issue May 13, 2024: Improve handling of GroupedMetadata HypothesisWorks/hypothesis#3986 (Merged)
