Fix field validation on Base64Bytes and Base64Str #9263
base: main
please review
CodSpeed Performance Report: Merging #9263 will not alter performance.
Left some informationally driven Q's for you.
base_schema: Any = handler(source)
del base_schema['type']
Why is it necessary to remove the type from the base schema?
Great question!
  return core_schema.with_info_after_validator_function(
      function=self.decode,
-     schema=core_schema.bytes_schema(),
+     schema=core_schema.bytes_schema(**base_schema),
Wouldn't _convert_schema be more appropriate here? Instead of unpacking and recreating the schema, perform the union.
def test_base64_with_valid_min_length() -> None:
    class Model(BaseModel):
        base64_value: Base64Bytes = Field(min_length=3)

    value = b'Zm9v'
    m = Model.model_construct(base64_value=value)
    assert m.base64_value == value
    assert Model.model_json_schema() == {
        'properties': {
            'base64_value': {
                'format': 'base64',
                'minLength': 3,
                'title': 'Base64 Value',
                'type': 'string',
            }
        },
        'required': ['base64_value'],
        'title': 'Model',
        'type': 'object',
    }
Consider adding a test for non url safe base64 encoding too
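For context on the suggestion above, the standard and URL-safe base64 alphabets differ only in the characters used for values 62 and 63. The sketch below uses only the standard library `base64` module, not pydantic's types, and the input bytes are chosen purely for illustration because they encode to exactly those two characters.

```python
import base64
import binascii

# Two bytes whose encoding uses alphabet positions 62 and 63.
raw = b'\xfb\xff'

standard = base64.b64encode(raw)          # standard alphabet uses '+' and '/'
url_safe = base64.urlsafe_b64encode(raw)  # URL-safe alphabet uses '-' and '_'

# Each variant round-trips with its own decoder.
assert base64.b64decode(standard) == raw
assert base64.urlsafe_b64decode(url_safe) == raw

# A strict standard decode rejects the URL-safe characters.
try:
    base64.b64decode(url_safe, validate=True)
    strict_error = None
except binascii.Error as exc:
    strict_error = exc
```

A test covering both encodings would therefore need inputs like these, since a value such as b'Zm9v' is identical in both alphabets and cannot tell them apart.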
@ybressler left some good questions / feedback here.
I'd be inclined to think that this might have more to do with applying known metadata - you might be able to get a cleaner fix there (just a hunch, haven't confirmed).
I removed the type because it should always be bytes; otherwise the data (Base64Str) won't be converted to bytes before decoding, which raises an error. The type comes from the core_schema, and it has to be removed before the unpack because the bytes_schema function does not have a type argument, so passing it raises a TypeError. By forcing the value into bytes this way, we ensure that the data must and will be converted to bytes in the first place. I'm not sure whether the best solution is to use a different encoder for the EncodedStr class (which avoids the conversion to bytes) or another approach.
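The TypeError described in this comment can be reproduced without pydantic. This is a minimal sketch, assuming a hypothetical stand-in named `bytes_schema` that accepts constraint keywords but no `type` keyword; the schema dict contents are illustrative and not taken from the PR.

```python
from typing import Any, Dict, Optional


def bytes_schema(*, min_length: Optional[int] = None,
                 max_length: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical stand-in: builds a schema dict and always sets type='bytes'."""
    schema: Dict[str, Any] = {'type': 'bytes'}
    if min_length is not None:
        schema['min_length'] = min_length
    if max_length is not None:
        schema['max_length'] = max_length
    return schema


# Illustrative schema, shaped like what a handler might return for a str field.
base_schema: Dict[str, Any] = {'type': 'str', 'min_length': 3}

try:
    bytes_schema(**base_schema)  # 'type' is not an accepted keyword
    unpack_error = None
except TypeError as exc:
    unpack_error = exc

# Dropping 'type' keeps the constraints and lets the builder force bytes.
constraints = {k: v for k, v in base_schema.items() if k != 'type'}
rebuilt = bytes_schema(**constraints)
```

Building a filtered copy, as here, avoids mutating the dict in place; whether that matters for the real schema object is not something this sketch can confirm.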
Thanks for sharing your thought process. It sounds like the existing encoder is not correct and that we should be using a different encoder, is that right? If so, why not make a new class and manipulate the properties there? That would create a much more extensible solution.
@ybressler Yes, that's a good idea. But the problem will still persist. Here's my analysis:
To have the complete metadata (min, max length, ...), we must necessarily call the
For example, using just
MyEncodedStr = Annotated[bytes, EncodedStr(encoder=MyEncoder)]
class Model(BaseModel):
    my_encoded_str: MyEncodedStr
>> Value error, Cannot decode data [type=value_error, input_value='**undecodable**', input_type=str]
This will use a type="str" from source instead of bytes.
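One plausible way a str input trips a bytes-oriented decoder can be shown with the standard library alone. This sketch is an assumption about the general mechanism, not a reproduction of the `MyEncoder` error above: `base64.decodebytes` accepts only bytes-like input, so a value that arrives as str has to be converted first.

```python
import base64

encoded_text = 'Zm9v'  # base64 for b'foo', held as a str

try:
    base64.decodebytes(encoded_text)  # str is not bytes-like
    decode_error = None
except TypeError as exc:
    decode_error = exc

# Converting to bytes first lets the same decoder succeed.
decoded = base64.decodebytes(encoded_text.encode('ascii'))
```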
Hmph, if the encoder is behaving incorrectly, perhaps we need to pivot to fixing the bug in
Edit: perhaps I misunderstood, after taking a closer look. I'll consult with the team about where a fix for this issue would make the most sense.
Change Summary
Fix field validation on Base64Bytes and Base64Str.
Related issue number
fix #9251
Checklist
Selected Reviewer: @Kludex