-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bytes
type validator
#80
Conversation
Codecov Report
@@ Coverage Diff @@
## main #80 +/- ##
==========================================
- Coverage 93.53% 92.15% -1.39%
==========================================
Files 37 35 -2
Lines 3157 2779 -378
Branches 23 21 -2
==========================================
- Hits 2953 2561 -392
- Misses 204 218 +14
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking amazing, thank you so much.
Let me know if you have any questions.
src/input/input_abstract.rs
Outdated
@@ -44,4 +44,10 @@ pub trait Input: fmt::Debug + ToPy + ToLocItem { | |||
fn lax_set<'data>(&'data self) -> ValResult<GenericSequence<'data>> { | |||
self.strict_set() | |||
} | |||
|
|||
fn strict_bytes(&self) -> ValResult<Vec<u8>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use a slice here with a lifetime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can use a slice here. We're creating a new value when we parse the input, and we need an owner for the new value.
Creating a slice here will be temporary and then it will be dropped immediately.
#[strum(message = "Value must be a valid bytes")] | ||
BytesType, | ||
#[strum(message = "Bytes must have at least {min_length} characters")] | ||
BytesTooShort, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per #76, I think we should have one TooShort
error which we us for strings and bytes.
You can either leave this and we'll fix them all in one PR, or do it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to choose a generic word like "Input" instead of "Bytes/Stirng"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per #76, I think we should have one
TooShort
error which we us for strings and bytes.You can either leave this and we'll fix them all in one PR, or do it here.
Let's fix them all in one PR. I can get to it after this one.
src/input/input_json.rs
Outdated
fn strict_bytes(&self) -> ValResult<Vec<u8>> { | ||
match self { | ||
JsonInput::String(s) => Ok(s.clone().into_bytes()), | ||
JsonInput::Int(int) => Ok(int.to_ne_bytes().to_vec()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about this a lot.
As per this, I think that when to coerce and when to raise an error should follow the following rule:
if there's a (single) obvious representation in the field type AND converting to that looses no information, coerce, otherwise error.
"Single" added now while thinking about this.
I think what you have here is not the same as what we have in pydantic v1. In pydantic v1 12.5
would be converted to b'12.5'
, while with what you have here I think it would be converted to b'@)\x00\x00\x00\x00\x00\x00'
.
This highlights that there's no single obvious representation of an int or float in bytes.
I therefore think we should remove int
and float
automatic conversion.
My plan for next week is to write a long form blog post on my plans for pydantic v2 which should cover all this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stumbled on this issue too while writing this. After giving this a thought, there's really no way to know exactly what value the user wants. I'd imagine arbitrarily choosing coercion will add frustration.
if there's a (single) obvious representation in the field type AND converting to that looses no information, coerce, otherwise error.
This sounds like a good rule of thumb. If there's no single obvious representation, letting the user work around it will remove ambiguity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed 👍.
The benchmark doesn't look too good if I understand it correctly:
|
looks like something must be wrong, I'll have a look. |
Ok, solved. You had two problems: First, you weren't using an optimised build of pydantic-core (
But we're still paying a penalty to convert As per my recent work on dates and times see #82 , we can improve performance further by avoiding the conversion completely by using an enum - that's what I've added here. With that the performance becomes:
I've added the I've also cleaned up your benchmarks somewhat, you had a benchmark with just a list with a single element which wasn't doing anything. |
Sounds great. Is there anything else missing for the bytes validator? |
Conflicts, otherwise LGTM. |
thanks so much for this. |
Part of #9
Add
bytes
type validator