V2: hypothesis plugin rewrite #4682

Open
samuelcolvin opened this issue Oct 28, 2022 · 16 comments
Labels
Change Suggested alteration to pydantic, not a new feature nor a bug help wanted Pull Request welcome hypothesis related to Hypothesis testing library

Comments

@samuelcolvin
Member

As per #4516, pydantic/_hypothesis.py has been ravaged; I'm sorry about that. It seemed just about worthwhile keeping it in place while preventing mypy/flake8 errors, but I couldn't stretch to keeping it working with the new code in that PR.

Once #4516 is merged, we therefore need to rewrite it and move it into _internal to respect the new use of Annotated for most constraints. I don't know how helpful the new #4680 conversion table might be too?

@Zac-HD as the expert on hypothesis and the original developer of the module, it would of course be amazing if you could work on this. But I understand if that's not possible; please let me know either way here. Also, any extra insights or suggestions you have about how to go about this would be much appreciated.

@samuelcolvin samuelcolvin added help wanted Pull Request welcome Change Suggested alteration to pydantic, not a new feature nor a bug labels Oct 28, 2022
@Zac-HD
Contributor

Zac-HD commented Oct 30, 2022

I can definitely offer high-intensity code review and design advice, but unfortunately I'm already well over capacity for OSS projects at the moment [1]. I'd expect this to fall into three main pieces:

  1. Supporting annotated-types constraints in Hypothesis, tracked in HypothesisWorks/hypothesis#3356 ("Check for annotated-types constraints in st.from_type(Annotated[T, ...])"). If it's not Pydantic-specific metadata, the relevant support code should all live in Hypothesis itself rather than Pydantic 😁
  2. Make any Pydantic-specific constraints also implement the annotated_types.GroupedMetadata protocol, wherever that would be enough for Hypothesis to generate valid instances. Where it wouldn't be enough, we'll probably need to register strategies or something, details TBD when I can consider some examples in detail (this may require new hooks in Hypothesis, though I really hope not)
  3. Plugging everything together so that we can resolve models as well as fields; I expect this to look similar to the current WeakSet/register_type_strategy() code - hopefully a little cleaner - and it's pretty easy unless there are dependencies between fields (e.g. "instance.a must be less than instance.b")
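
The three pieces above can be pictured with a dependency-free toy. This is only a sketch: `Gt`, `Lt`, `draw_int`, and `draw_model` are hypothetical stand-ins rather than the annotated-types or Hypothesis API, and plain random sampling is far simpler than what Hypothesis actually does.

```python
import random
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Gt:
    """Hypothetical stand-in for an exclusive lower-bound constraint."""
    gt: int


@dataclass(frozen=True)
class Lt:
    """Hypothetical stand-in for an exclusive upper-bound constraint."""
    lt: int


def draw_int(metadata, rng):
    """Turn known constraints into bounds instead of filtering afterwards."""
    low, high = -1000, 1000  # arbitrary default range for this toy
    for item in metadata:
        if isinstance(item, Gt):
            low = max(low, item.gt + 1)
        elif isinstance(item, Lt):
            high = min(high, item.lt - 1)
        # Unknown metadata is ignored here; a real resolver would need a hook.
    return rng.randint(low, high)


def draw_model(cls, rng):
    """Resolve every annotated field of `cls` and build a dict of values."""
    values = {}
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
        else:
            base, metadata = hint, []
        if base is not int:
            raise NotImplementedError(f"toy resolver only handles int, got {base!r}")
        values[name] = draw_int(metadata, rng)
    return values


class Order:
    quantity: Annotated[int, Gt(0)]
    discount: Annotated[int, Gt(-1), Lt(101)]


sample = draw_model(Order, random.Random(0))
```

Each field is resolved independently here, which is why cross-field rules such as "instance.a must be less than instance.b" are the hard case: they do not fit a per-field loop.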

Happy to have a call sometime if that's useful, probably once (1) above is well under way.

Footnotes

  1. open-sourcing https://github.com/Zac-HD/hypofuzz has kept me busy, and there's still a lot of low-hanging fruit there.

@smhavens

smhavens commented Nov 18, 2022

Hello, I was wondering how difficult this issue would be to complete. My partner @marcoo47 and I are interested in it, and it would be our first contribution.

@Zac-HD
Contributor

Zac-HD commented Nov 18, 2022

I think it's going to be a fairly large and intricate challenge, and recommend getting solid experience contributing to Hypothesis and Pydantic before taking it on 🙂

@samuelcolvin
Member Author

@tonibofarull

Any news on this? 😢

@Viicos
Contributor

Viicos commented Apr 19, 2024

I started looking into this a while ago by first fixing HypothesisWorks/hypothesis#3356. However, the implementation is not perfect and some annotated types are not yet supported. If I find some time, I can take a look at this, although I expect the implementation to be challenging.

@Zac-HD
Contributor

Zac-HD commented Apr 19, 2024

Hypothesis also handles collection-length bounds and some regex filters: HypothesisWorks/hypothesis#3795

This issue might be a nice project for people to work on at the PyCon sprints in a few weeks 🙂

@butterlyn

butterlyn commented May 1, 2024

Are there any recommended workarounds while this issue is still open, aside from reverting to Pydantic V1? Perhaps limiting which Pydantic V2 features are used or how they're used in a project, in combination with hypothesis-jsonschema?

@iurisilvio

@butterlyn I generated the model's JSON schema and used hypothesis-jsonschema to generate data. It is very straightforward in most cases, but there are a few caveats:

  • Types with too many restrictions are very slow to generate data. I bypassed this restriction using custom formatter rules.
  • Recursive models are not supported.

@butterlyn

I bypassed this restriction using custom formatter rules

Thanks @iurisilvio, that makes a lot of sense. Could you please elaborate on this trick you mention about custom formatter rules? Is this a Pydantic or a Hypothesis feature?

@iurisilvio

iurisilvio commented May 1, 2024

The hypothesis-jsonschema from_schema function accepts a custom_formats argument. You can pass any string strategy there, e.g. {"foo": st.text()}.
Then you add json_schema_extra={"format": "foo"} to your pydantic field.

Your model.model_json_schema() output will then include the extra format, and hypothesis-jsonschema will use the custom strategy to generate data instead of building one at runtime.

This is useful for strings with patterns and length limits, because Hypothesis first generates a string matching the pattern and then checks the size constraints. With a high min length, for example, that check fails most of the time, making data generation very slow.

If you publish your JSON schema, you may not want to add the extra format. You can patch the schema manually instead, but it is difficult to turn that into a generic solution.
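
To see why generate-then-filter struggles in this situation, the following stdlib-only toy compares the two approaches. It is not how Hypothesis draws strings internally: the length distribution (uniform between 1 and 40) and the helper names are assumptions chosen purely for illustration.

```python
import random
import re

PATTERN = re.compile(r"^[abc]+$")
MIN_LENGTH = 36


def draw_matching_pattern(rng):
    """Draw a string matching PATTERN with a length the pattern does not constrain."""
    length = rng.randint(1, 40)  # assumed distribution, for illustration only
    return "".join(rng.choice("abc") for _ in range(length))


def draw_direct(rng):
    """Build the length constraint into generation, as a custom format strategy would."""
    length = rng.randint(MIN_LENGTH, MIN_LENGTH + 10)
    return "".join(rng.choice("abc") for _ in range(length))


rng = random.Random(0)
attempts = 2000

# Approach 1: satisfy the pattern first, then reject strings that are too short.
filtered = [
    s
    for s in (draw_matching_pattern(rng) for _ in range(attempts))
    if len(s) >= MIN_LENGTH
]

# Approach 2: every draw is valid by construction, so nothing is rejected.
direct = [draw_direct(rng) for _ in range(attempts)]

acceptance_rate = len(filtered) / attempts
```

Under these assumptions only about one draw in eight survives the length check, while the direct generator never wastes a draw. That is the effect the custom format works around.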

@iurisilvio

iurisilvio commented May 1, 2024

A working example:

from typing import Annotated

from hypothesis import given, strategies as st
from hypothesis_jsonschema import from_schema
from pydantic import BaseModel, Field, StringConstraints

class MyModel(BaseModel):
    foo: Annotated[
        str, StringConstraints(min_length=36, pattern=r"^[abc]+$")
    ] = Field(json_schema_extra={"format": "foo"})


schema = MyModel.model_json_schema()

# or hack without `json_schema_extra` defined
# schema["properties"]["foo"]["format"] = "foo"

@given(from_schema(schema))
def test_foo_fail(data):
    pass


@given(
    from_schema(
        schema,
        custom_formats={
            "foo": st.text(
                alphabet=st.sampled_from(["a", "b", "c"]),
                min_size=36,
            )}
    )
)
def test_foo_pass(data):
    pass

@Viicos
Contributor

Viicos commented May 10, 2024

@Zac-HD (sorry for the ping),

when we added annotated-types support to Hypothesis, we roughly went for a mapping of at.BaseMetadata -> SearchStrategy.filter condition.

While Pydantic mostly uses the BaseMetadata subclasses from annotated-types, it still uses a couple of custom subclasses (one of them being dynamically created, probably for a good reason).

Would there be a way to add a register mechanism for annotated constraints? If so, should it be in the same form as it is currently? (i.e. my custom base metadata subclass -> filter condition).

Thanks in advance

Edit: Pydantic also uses arbitrary classes meant to be used in Annotated metadata (e.g. UuidVersion). For them, we can either:

  • tweak them to be a BaseMetadata subclass (not sure how feasible this is).
  • Allow registering arbitrary hypothesis Annotated metadata classes, not only annotated_types.BaseMetadata subclasses as suggested above.
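
One possible shape for such a register mechanism is sketched below. Everything in it is hypothetical: `register_constraint`, `filters_for`, and the local `UuidVersion` stand-in are illustrative names, not existing Hypothesis or Pydantic APIs. It only shows the "metadata class -> filter condition" mapping described above, using the standard library.

```python
import uuid
from dataclasses import dataclass
from typing import Annotated, Any, Callable, get_args

# Hypothetical registry: metadata class -> factory that builds a filter predicate.
_CONSTRAINT_FILTERS: dict[type, Callable[[Any], Callable[[Any], bool]]] = {}


def register_constraint(metadata_cls):
    """Decorator registering how to turn a metadata instance into a predicate."""
    def decorator(factory):
        _CONSTRAINT_FILTERS[metadata_cls] = factory
        return factory
    return decorator


@dataclass(frozen=True)
class UuidVersion:
    """Stand-in for an arbitrary annotation class that is not a BaseMetadata subclass."""
    version: int


@register_constraint(UuidVersion)
def _uuid_version_filter(meta):
    return lambda value: value.version == meta.version


def filters_for(annotated_type):
    """Collect predicates for every registered metadata item, skipping unknown ones."""
    _base, *metadata = get_args(annotated_type)
    return [
        _CONSTRAINT_FILTERS[type(item)](item)
        for item in metadata
        if type(item) in _CONSTRAINT_FILTERS
    ]


predicates = filters_for(Annotated[uuid.UUID, UuidVersion(4)])
candidate = uuid.uuid4()
accepted = all(check(candidate) for check in predicates)
```

A filter like this only rejects values after they are generated, so it is only as efficient as the underlying generator allows.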

@Zac-HD
Contributor

Zac-HD commented May 10, 2024

To provide filters, it'd be best for Hypothesis if you could make your annotations into either subclasses of at.Predicate, or of at.GroupedMetadata which returns a Predicate.

For arbitrary other annotations, I think there are basically two options Hypothesis could support.

  1. Insist that they should be GroupedMetadata instances, and yield either other known constraints (which wouldn't work for UUIDs), or a SearchStrategy object (which would need some one-off work in Hypothesis, but fits well)
  2. Somehow hook into the jsonschema (via hypothesis-jsonschema, which would need a lot of work) or core schema (whole new project), and generate from those. We don't have capacity to maintain that on the Hypothesis side in either case, unfortunately.

@Viicos
Contributor

Viicos commented May 13, 2024

(plan still WIP).

Hypothesis gained the ability to use strategies yielded from the GroupedMetadata.__iter__ method (as per the documentation, any object can be yielded).

Meaning it is possible to do:

from typing import Annotated
from uuid import UUID

from annotated_types import GroupedMetadata
from hypothesis.strategies import from_type, uuids

class MyMetadata(GroupedMetadata):
    def __iter__(self):
        yield uuids(version=1)

st = from_type(Annotated[UUID, MyMetadata()])
#> uuids(version=1)

As Pydantic implements constraints in different ways, I'll list the available types and how they are going to be supported in hypothesis.

Types implemented using annotated-types

For example, PositiveInt = Annotated[int, annotated_types.Gt(0)]. These are already supported by Hypothesis and do not require extra work.

List
  • PositiveInt
  • NegativeInt
  • NonPositiveInt
  • NonNegativeInt
  • PositiveFloat
  • NegativeFloat
  • NonPositiveFloat
  • NonNegativeFloat
  • conint
  • confloat
  • conbytes
  • conset
  • confrozenset
  • conlist
  • condate

Arbitrary classes used as annotations

For example, SecretStr. These can easily be registered using register_type_strategy.

  • ImportString (if used as is, without being subscripted; if used like ImportString[T], see below): As done in Pydantic V1, the math module can probably be used.
  • Json (if used as is, without being subscripted; if used like Json[T], see below).
  • _SecretBase: Should look at Add Secret base type #8519 to understand the class hierarchy.
  • PaymentCardNumber: Deprecated, probably the type from pydantic-extra-types should be supported instead.
  • ByteSize
  • PastDate, FutureDate, AwareDatetime, NaiveDatetime, PastDatetime, FutureDatetime: When used as a type: foo: PastDate (if used with Annotated, see below).

Classes used with Annotated

This gets tricky. For some, it can be implemented without too many issues, but others need to know the annotated type. For instance, Json:

from typing import Any

from pydantic import BaseModel, Json

class Model(BaseModel):
    json_obj: Json[Any]
    # Which translates to:
    # json_obj: Annotated[Any, Json()]
    # While `Json` could subclass `at.GroupedMetadata` and implement `__iter__` to yield a strategy,
    # it needs to know about the annotated type (in this case `Any`), which is not the case currently.
  • AllowInfNan: same issue as example above, needs to know about the annotated type.
  • StringConstraints: already subclasses annotated_types.GroupedMetadata but yields pydantic_general_metadata(), which accepts any metadata. Probably some refactoring should happen here, still need to investigate.
  • UuidVersion: should subclass annotated_types.GroupedMetadata and implement __iter__ to yield hypothesis.strategies.uuids(...).
  • ImportString[T]: translates to Annotated[T, ImportString()]. For example, ImportString[Annotated[float, Ge(3)]] will not accept math.e. While we could (with some significant work) categorize members of the math module to have a fixed set of objects available that could match the constraints, we need to decide what to do if no known member exists (e.g. ImportString[str] will not be resolvable, there's no str member in the math module).
  • Json[T]: as said in the example above, needs to be changed to know about the annotated type.
  • PastDate, FutureDate, AwareDatetime, NaiveDatetime, PastDatetime, FutureDatetime: When used with Annotated: foo: Annotated[date, PastDate()].
  • EncodedBytes, EncodedStr: TODO

Classes that should have no effect

No changes needed (e.g. only affects validation):

List
  • Strict
  • PathType: This class will enforce a path to either be a dir, a file or to not exist yet. I think having real files to exist on the file system is out of scope for hypothesis.

Needs investigation

  • JsonValue

Notes for myself:

  • EncodedStr docstring uses bytes in some places, should be fixed

@Zac-HD
Contributor

Zac-HD commented May 13, 2024

  • condate doesn't yet do the filter-rewriting trick for efficiency, but it could. I'll open an issue for the PyCon sprints.

  • {Past,Future}{Date,Datetime} is somewhat tricky, because the reference point changes between runs. In particular, these shrink towards the first day/microsecond of 2000, so the past is probably fine but a minimal example in the future is likely to be invalid when rerun. I don't have a good solution to this but will keep thinking about it.

    • Otherwise, you could use at.Lt(now)/at.Gt(now) once the condate support is in.
    • Uh, will it be a problem if we generate a datetime between the-moment-the-strategy-was-created and the-moment-of-validation? Because I don't see a way to avoid that; you can bet on fast-tests-without-clock-changes if you want and just say that the future is not-the-near-future, but then your tests can't detect bugs which only occur close to the present.
  • {Aware,Naive}Datetime - can we use at.Timezone(...)/at.Timezone(None) for these?

    • Unfortunately Hypothesis is in trouble here: our default from_type(datetime) strategy has .tzinfo=None, and you can't change that by filtering. Non-filter-based approaches would violate an otherwise-consistent mental model. Changing the default to timezones=st.none()|st.timezones() would be a pretty big deprecation, and also have pretty nasty ergonomics.
      • so probably you'll need to yield a strategy for aware datetimes too, sorry.
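
Both problems can be reproduced with the standard library alone. The sketch below uses fixed timestamps in place of a real clock, and `created_at` / `validated_at` are hypothetical names for the two moments discussed above; it does not use Hypothesis or Pydantic.

```python
from datetime import datetime, timedelta, timezone

# 1. "Future" is relative to a moving reference point.
created_at = datetime(2024, 5, 13, 12, 0, 0)        # when the strategy was built
validated_at = created_at + timedelta(seconds=1)    # when the value is validated
candidate = created_at + timedelta(microseconds=1)  # smallest "future" value

future_when_generated = candidate > created_at
future_when_validated = candidate > validated_at

# 2. Filtering cannot turn naive datetimes into aware ones.
naive_values = [datetime(2000, 1, 1) + timedelta(days=n) for n in range(100)]
aware_by_filter = [d for d in naive_values if d.tzinfo is not None]

# Aware values have to come from a different source, not from a predicate.
aware_by_construction = [d.replace(tzinfo=timezone.utc) for d in naive_values]
```

The candidate counts as "future" when it is generated but no longer does one second later, and the filter over naive values keeps nothing at all. This is why a dedicated strategy for aware datetimes looks unavoidable.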

Status: In Progress
10 participants