-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Trick Cython into allowing str subclassing #1061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1061 +/- ##
======================================
Coverage 100% 100%
======================================
Files 20 20
Lines 3332 3349 +17
Branches 659 661 +2
======================================
+ Hits 3332 3349 +17
Continue to review full report at Codecov.
|
|
@skewty would you mind augmenting the test you've added to check for subclasses of each of the strict types? Based on the response in cython/cython#3256 it sounds like there is a chance a similar issue could exist for the other strict type validators (or really anything in pydantic annotated as returning a builtin when it might actually return a subclass). I don't think we need to go crazy, and there might not actually be anything to change here, but I think it's worth checking for issues with |
| pass | ||
|
|
||
| class Model(BaseModel): | ||
| v: MyStrictStr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, could you add class MyStrictInt(StrictInt): pass (and similar for StrictFloat and StrictBool), add them as fields on Model, and pass appropriate values in the m = Model(... line? Basically just paralleling what you've done here for MyStrictStr.
No need to add any more with pytest.raises(ValidationError) checks, since those shouldn't be affected by this issue.
If the tests pass after this change, we can have confidence the other types aren't broken; if they don't, fixing them should be as easy as changing int to Union[int] in the appropriate validators.
|
Humm, I've checked and this seems to have a small but measurable detrimental effect on performance, perhaps ~5%. Is it something that really useful enough to make the performance decrease a price worth paying? |
|
humm, maybe I'm wrong about the performance impact. I can't get it to be repeatable. |
|
Subclassing |
|
I’m fine with not addressing StrictBool in this PR. I think it probably makes sense to add a validator that checks for a subclass of StrictBool and returns the actual bool, or something along those lines, but I think that’s less likely to matter in practice and can be addressed later. If you can get this to work for int str and float following this pattern let’s just go with that (assuming no substantial performance degradations). |
|
@dmontagu would it be possible for you to try this too and see if you can identify a performance impact? From you comment #799 (comment) I can imagine it makes sense for there to be no performance impact, but I'm still hesitant. |
|
@samuelcolvin I did a lot of runs using the following script to compare master vs a rebased version of skewty/master: import numpy as np
import time
from pydantic import StrictStr, BaseModel, ValidationError
class Model(BaseModel):
a: StrictStr
def benchmark_str():
n_rounds = 15
n_iterations_per_round = 100000
success_times = []
fail_times = []
for _ in range(n_rounds):
t0 = time.time()
for _ in range(n_iterations_per_round):
Model(a="hello")
t1 = time.time()
for _ in range(n_iterations_per_round):
try:
Model(a=123)
except ValidationError:
pass
t2 = time.time()
success_times.append(t1 - t0)
fail_times.append(t2 - t1)
print(f"StrictStr: {t1 - t0:.3f}s success, {t2 - t1:.3f}s fail")
print(np.mean(success_times[5:]))
print(np.mean(fail_times[5:]))
benchmark_str()There is a lot of fluctuation from run to run, but after doing many runs back and forth, it seems to me that there is not a meaningful difference between the "success" times, but the "fail" times average about 1.5% higher. Given the rarity of use of this type, and the relatively small increase in validation times, I'm personally okay with it. After thinking about this more, if we wanted to keep all of the performance benefits but still allow arbitrary strict types, I think there is another fundamentally different route we could go: replacing the StrictStr = NewType("StrictStr", str)
StrictStr.__pydantic_strict__ = TrueThis would mean that at runtime there is no difference between a str and a You wouldn't be able to subclass, but if you are only doing it for typing-related reasons, you can create nested NewTypes, so you could do: MyStrictStr = NewType("MyStrictStr", StrictStr)and it would work like a subclass for type-checking purposes. Given it would break subclasses I guess this would need to be a v2 thing, but I think it's worth thinking about. It would make it much easier to make arbitrary |
|
As being discussed in #1065, it may look more like this: StrictStr = NewType("StrictStr", str)
StrictStr.__schema_attributes__ = SchemaAttributes(x, y, z) # could just be a dictI would like the define more complex validation inside the class declaration if possible.
from typing import NewType
Str = NewType("Str", str)
class MyStr(Str):
... __foo__ = "bar"
...
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: function() argument 1 must be code, not strBut this works: MyStr = NewType("MyStr", NewType("Str", str))so it follows that: from typing import NewType
from pydantic import BaseModel, SchemaAttributes
StrictStr = NewType("StrictStr", str)
StrictStr.__schema_attributes__ = SchemaAttributes(min_length=15)
class MyStrictStr(StrictStr):
@classmethod
def __get_validators__(cls):
yield cls.validate_prefix
@classmethod
def validate_prefix(cls, value):
if str(value).startswith("foo:"):
return value
raise ValueError('Nope!')would also fail with What are you proposing for custom validation? Re-usable validator could be manually added to the class where the StrictX is being included. |
|
I was imagining that |
|
That would work for me. I would like to make use of So maybe something like this? from pydantic import BaseModel, SchemaAttributes
class FooStr(str):
__schema_attributes__ = SchemaAttributes(min_length=15)
@classmethod
def __get_validators__(cls):
yield cls.validate_prefix
@classmethod
def validate_prefix(cls, value):
if str(value).startswith("foo:"):
return value
raise ValueError('Nope!') |
|
This discussion went slightly off-topic. Is this ready to merge? Are we sure it doesn't effect performance? |
By fail I mean when validation fails. This is specific to the StrictStr. |
* Trick Cython into allowing str subclassing * Added a changes file * Removed 2nd paragraph from changes file so it passes tests * Augmented with tests for StrictInt/Float * Removed commented out StrictBool subclass testing
Change Summary
Tricks Cython into allowing str subclassing
Related issue number
Fixes #1060
Checklist
changes/<pull request or issue id>-<github username>.mdfile added describing change(see changes/README.md for details)