Fix field validation on Base64Bytes and Base64Str #9263

Open
wants to merge 1 commit into
base: main
from

LouisGobert
Contributor

@LouisGobert LouisGobert commented Apr 16, 2024

Change Summary

Fix field validation on Base64Bytes and Base64Str.

Related issue number

fix #9251

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @Kludex

@LouisGobert
Contributor Author

please review

codspeed-hq bot commented Apr 16, 2024

CodSpeed Performance Report

Merging #9263 will not alter performance

Comparing LouisGobert:fix-field-option-on-base64 (d3795cf) with main (ae71183)

Summary

✅ 13 untouched benchmarks

Contributor

@ybressler ybressler left a comment

Left some informational questions for you.

Comment on lines +2347 to +2348
base_schema: Any = handler(source)
del base_schema['type']
Contributor

Why is it necessary to remove the type from the base schema?

Member

Great question!

  return core_schema.with_info_after_validator_function(
      function=self.decode,
-     schema=core_schema.bytes_schema(),
+     schema=core_schema.bytes_schema(**base_schema),
Contributor

Wouldn't _convert_schema be more appropriate here? Instead of unpacking and recreating the schema, it could perform the union.

Comment on lines +5511 to +5530
def test_base64_with_valid_min_length() -> None:
    class Model(BaseModel):
        base64_value: Base64Bytes = Field(min_length=3)

    value = b'Zm9v'
    m = Model.model_construct(base64_value=value)
    assert m.base64_value == value
    assert Model.model_json_schema() == {
        'properties': {
            'base64_value': {
                'format': 'base64',
                'minLength': 3,
                'title': 'Base64 Value',
                'type': 'string',
            }
        },
        'required': ['base64_value'],
        'title': 'Model',
        'type': 'object',
    }
Contributor

Consider adding a test for non-URL-safe base64 encoding too.
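
For context on that suggestion, the standard and URL-safe base64 alphabets only differ on inputs whose encoding uses the characters at alphabet index 62 or 63. A minimal sketch using only the standard library `base64` module; the sample bytes are an arbitrary choice for illustration and are not taken from the PR:

```python
import base64

# Arbitrary bytes whose encoding uses alphabet indices 62 and 63.
raw = b'\xfb\xff'

standard = base64.b64encode(raw)          # standard alphabet: '+' and '/'
url_safe = base64.urlsafe_b64encode(raw)  # URL-safe alphabet: '-' and '_'

# Each encoding round-trips through its matching decoder.
assert base64.b64decode(standard) == raw
assert base64.urlsafe_b64decode(url_safe) == raw
```

A value such as b'Zm9v' (the encoding of b'foo') is identical in both alphabets, so a test built on it cannot tell the two encoders apart.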

@sydney-runkle sydney-runkle added the relnotes-fix Used for bugfixes. label Apr 19, 2024
Member

@sydney-runkle sydney-runkle left a comment

@ybressler left some good questions / feedback here.

I'd be inclined to think that this might have more to do with applying known metadata - you might be able to get a cleaner fix there (just a hunch, haven't confirmed).

@LouisGobert
Contributor Author

@sydney-runkle @ybressler Hi,

I removed the type because it should always be bytes; otherwise the data (Base64Str) won't be converted to bytes before decoding, which raises an error.

I'm not sure whether the best solution is to use a different encoder for the EncodedStr class (which would avoid the conversion to bytes) or another approach. My current solution forces the value into bytes by removing the type that comes from the core schema. The type also has to be removed because bytes_schema has no type argument, so unpacking it would raise a TypeError. This ensures that the data must and will be converted to bytes in the first place.
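
The TypeError described above can be reproduced without pydantic. This is a minimal sketch under stated assumptions: `bytes_schema_stub` is a hypothetical stand-in for a schema builder that accepts no `type` keyword, and the `base_schema` dict is an assumed shape for what handler(source) returns, not output captured from the library:

```python
from typing import Optional


def bytes_schema_stub(*, min_length: Optional[int] = None, max_length: Optional[int] = None) -> dict:
    """Hypothetical stand-in for a schema builder with no `type` keyword."""
    schema: dict = {'type': 'bytes'}
    if min_length is not None:
        schema['min_length'] = min_length
    if max_length is not None:
        schema['max_length'] = max_length
    return schema


# Assumed shape of the schema returned by handler(source) for a str field.
base_schema: dict = {'type': 'str', 'min_length': 3}

try:
    bytes_schema_stub(**base_schema)  # 'type' is not an accepted keyword
    unpack_failed = False
except TypeError:
    unpack_failed = True

# Dropping 'type' keeps the constraints and lets the builder set the bytes type.
constraints = {k: v for k, v in base_schema.items() if k != 'type'}
rebuilt = bytes_schema_stub(**constraints)
```

The sketch filters into a new dict rather than using `del`, which leaves the original schema untouched; the PR itself deletes the key in place.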

@ybressler
Contributor

Thanks for sharing your thought process. It sounds like the existing encoder is not correct and that we should be using a different encoder, is that right? If so, why not make a new class and manipulate the properties there? That would be a much more extensible solution.

@LouisGobert
Contributor Author

@ybressler Yes, that's a good idea, but the problem would still persist.

Here's my analysis:

To have the complete metadata (min length, max length, ...), we must call handler(source) from EncodedBytes.__get_pydantic_core_schema__. If we just use core_schema.bytes_schema(), the information will be incomplete (as is currently the case).
However, if we call handler(source) alone, the resulting type may not match the type the field actually expects. There will be no conversion to the expected type, which leads to errors.

For example, using just schema=handler(source) in EncodedBytes.__get_pydantic_core_schema__ and running the example test from the docstring:

MyEncodedStr = Annotated[bytes, EncodedStr(encoder=MyEncoder)]

class Model(BaseModel):
    my_encoded_str: MyEncodedStr

>> Value error, Cannot decode data [type=value_error, input_value='**undecodable**', input_type=str]

This will use type="str" from source instead of bytes.

@sydney-runkle
Member

sydney-runkle commented Apr 30, 2024

@LouisGobert,

Hmph, if the encoder is behaving incorrectly, perhaps we need to pivot to fixing the bug in pydantic-core?

Edit: perhaps I misunderstood, after taking a closer look. I'll consult with the team about this issue regarding where a fix would make the most sense.

Development

Successfully merging this pull request may close these issues.

Base64Bytes field doesn't raises validation error for min_length constraint
4 participants