Skip to content

Conversation

skewty
Copy link
Contributor

@skewty skewty commented Dec 2, 2019

Change Summary

Tricks Cython into allowing str subclassing

Related issue number

Fixes #1060

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.md file added describing change
    (see changes/README.md for details)

@codecov
Copy link

codecov bot commented Dec 2, 2019

Codecov Report

Merging #1061 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1061   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          20      20           
  Lines        3332    3349   +17     
  Branches      659     661    +2     
======================================
+ Hits         3332    3349   +17
Impacted Files Coverage Δ
pydantic/validators.py 100% <100%> (ø) ⬆️
pydantic/types.py 100% <100%> (ø) ⬆️
pydantic/mypy.py 100% <0%> (ø) ⬆️
pydantic/utils.py 100% <0%> (ø) ⬆️
pydantic/schema.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 151143d...f9bbbcb. Read the comment docs.

@skewty skewty marked this pull request as ready for review December 2, 2019 00:33
@dmontagu
Copy link
Contributor

dmontagu commented Dec 2, 2019

@skewty would you mind augmenting the test you've added to check for subclasses of each of the strict types?

Based on the response in cython/cython#3256 it sounds like there is a chance a similar issue could exist for the other strict type validators (or really anything in pydantic annotated as returning a builtin when it might actually return a subclass).

I don't think we need to go crazy, and there might not actually be anything to change here, but I think it's worth checking for issues with StrictInt, StrictBool, etc. while we are fixing this.

pass

class Model(BaseModel):
v: MyStrictStr
Copy link
Contributor

@dmontagu dmontagu Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, could you add class MyStrictInt(StrictInt): pass (and similar for StrictFloat and StrictBool), add them as fields on Model, and pass appropriate values in the m = Model(... line? Basically just paralleling what you've done here for MyStrictStr.

No need to add any more with pytest.raises(ValidationError) checks, since those shouldn't be affected by this issue.


If the tests pass after this change, we can have confidence the other types aren't broken; if they don't, fixing them should be as easy as changing int to Union[int] in the appropriate validators.

@samuelcolvin
Copy link
Member

Humm, I've checked and this seems to have a small but measurable detrimental effect on performance, perhaps ~5%.

Is it something that really useful enough to make the performance decrease a price worth paying?

@samuelcolvin
Copy link
Member

humm, maybe I'm wrong about the performance impact. I can't get it to be repeatable.

@skewty
Copy link
Contributor Author

skewty commented Dec 2, 2019

Subclassing StrictBool doesn't quite work so nicely because class StrictBool(int) The value a subclass gets for False is therefore just 0 instead of False. You cannot set bool as a base type. You get "TypeError: type 'bool' is not an acceptable base type". And, obviously, returning True or False would return the "wrong type". Should a warning be generated if somebody tries to subclass Constrained/StrictBool or would you prefer an attempt at making it subclassable?

@dmontagu
Copy link
Contributor

dmontagu commented Dec 2, 2019

I’m fine with not addressing StrictBool in this PR.

I think it probably makes sense to add a validator that checks for a subclass of StrictBool and returns the actual bool, or something along those lines, but I think that’s less likely to matter in practice and can be addressed later.

If you can get this to work for int str and float following this pattern let’s just go with that (assuming no substantial performance degradations).

@samuelcolvin
Copy link
Member

@dmontagu would it be possible for you to try this too and see if you can identify a performance impact?

From you comment #799 (comment) I can imagine it makes sense for there to be no performance impact, but I'm still hesitant.

@dmontagu
Copy link
Contributor

dmontagu commented Dec 3, 2019

@samuelcolvin I did a lot of runs using the following script to compare master vs a rebased version of skewty/master:

import numpy as np
import time


from pydantic import StrictStr, BaseModel, ValidationError


class Model(BaseModel):
    a: StrictStr


def benchmark_str():
    n_rounds = 15
    n_iterations_per_round = 100000
    success_times = []
    fail_times = []
    for _ in range(n_rounds):
        t0 = time.time()
        for _ in range(n_iterations_per_round):
            Model(a="hello")

        t1 = time.time()
        for _ in range(n_iterations_per_round):
            try:
                Model(a=123)
            except ValidationError:
                pass
        t2 = time.time()
        success_times.append(t1 - t0)
        fail_times.append(t2 - t1)
        print(f"StrictStr: {t1 - t0:.3f}s success, {t2 - t1:.3f}s fail")
    print(np.mean(success_times[5:]))
    print(np.mean(fail_times[5:]))

benchmark_str()

There is a lot of fluctuation from run to run, but after doing many runs back and forth, it seems to me that there is not a meaningful difference between the "success" times, but the "fail" times average about 1.5% higher.

Given the rarity of use of this type, and the relatively small increase in validation times, I'm personally okay with it.


After thinking about this more, if we wanted to keep all of the performance benefits but still allow arbitrary strict types, I think there is another fundamentally different route we could go: replacing the StrictX types with NewType-generated types, and adding a flag to the NewType that pydantic could detect during model creation. This could look something like:

StrictStr = NewType("StrictStr", str)
StrictStr.__pydantic_strict__ = True

This would mean that at runtime there is no difference between a str and a StrictStr, but we could still make use of the __pydantic_strict__ flag to modify the validator-generation logic during initialization.

You wouldn't be able to subclass, but if you are only doing it for typing-related reasons, you can create nested NewTypes, so you could do:

MyStrictStr = NewType("MyStrictStr", StrictStr)

and it would work like a subclass for type-checking purposes.

Given it would break subclasses I guess this would need to be a v2 thing, but I think it's worth thinking about. It would make it much easier to make arbitrary StrictX types -- we could just do isinstance checks for the NewType's parent type.

@skewty
Copy link
Contributor Author

skewty commented Dec 4, 2019

As being discussed in #1065, it may look more like this:

StrictStr = NewType("StrictStr", str)
StrictStr.__schema_attributes__ = SchemaAttributes(x, y, z)  # could just be a dict

I would like the define more complex validation inside the class declaration if possible.
Having the validation always included for free when I use the constrained / strict variant of a variable type is the big win for me.

typing.NewType doesn't play nice with the class keyword.

from typing import NewType
Str = NewType("Str", str)
class MyStr(Str):
...    __foo__ = "bar"
...
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: function() argument 1 must be code, not str

But this works:

MyStr = NewType("MyStr", NewType("Str", str))

so it follows that:

from typing import NewType
from pydantic import BaseModel, SchemaAttributes

StrictStr = NewType("StrictStr", str)
StrictStr.__schema_attributes__ = SchemaAttributes(min_length=15)

class MyStrictStr(StrictStr):

    @classmethod
    def __get_validators__(cls):
        yield cls.validate_prefix

    @classmethod
    def validate_prefix(cls, value):
        if str(value).startswith("foo:"):
            return value
        raise ValueError('Nope!')

would also fail with TypeError: function() argument 1 must be code, not str.

What are you proposing for custom validation? Re-usable validator could be manually added to the class where the StrictX is being included.

@dmontagu
Copy link
Contributor

dmontagu commented Dec 4, 2019

I was imagining that StrictX didn’t get any custom validation beyond ensuring it was the right type. If you wanted custom validation, you could use your own custom type not derived from StrictX, and manually add the isinstance check. If that’s less useful or otherwise not what we want, I’m also happy to drop the NewType idea entirely.

@skewty
Copy link
Contributor Author

skewty commented Dec 4, 2019

That would work for me. I would like to make use of SchemaAttributes though. I like the idea of some free JSON schema and validation.

So maybe something like this?

from pydantic import BaseModel, SchemaAttributes

class FooStr(str):
    __schema_attributes__ = SchemaAttributes(min_length=15)

    @classmethod
    def __get_validators__(cls):
        yield cls.validate_prefix

    @classmethod
    def validate_prefix(cls, value):
        if str(value).startswith("foo:"):
            return value
        raise ValueError('Nope!')

@samuelcolvin
Copy link
Member

This discussion went slightly off-topic.

Is this ready to merge? Are we sure it doesn't effect performance?

@dmontagu
Copy link
Contributor

@samuelcolvin

There is a lot of fluctuation from run to run, but after doing many runs back and forth, it seems to me that there is not a meaningful difference between the "success" times, but the "fail" times average about 1.5% higher.

By fail I mean when validation fails. This is specific to the StrictStr.

@samuelcolvin samuelcolvin merged commit 3804998 into pydantic:master Dec 10, 2019
@samuelcolvin
Copy link
Member

Sorry @dmontagu, I got confused.

@skewty thank you very much.

andreshndz pushed a commit to cuenca-mx/pydantic that referenced this pull request Jan 17, 2020
* Trick Cython into allowing str subclassing

* Added a changes file

* Removed 2nd paragraph from changes file so it passes tests

* Augmented with tests for StrictInt/Float

* Removed commented out StrictBool subclass testing
alexdrydew pushed a commit to alexdrydew/pydantic that referenced this pull request Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subclassing StrictStr / ConstrainedStr causes ValidationError
3 participants