Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add slots to dataclass schema #617

Merged
merged 14 commits into from
May 25, 2023
Merged

Add slots to dataclass schema #617

merged 14 commits into from
May 25, 2023

Conversation

hramezani
Copy link
Member

@hramezani hramezani commented May 19, 2023

This is for fixing the dataclass slots problem reported in pydantic/pydantic#5797

I've created this PR based on @samuelcolvin suggestion

Selected Reviewer: @dmontagu

@codspeed-hq
Copy link

codspeed-hq bot commented May 19, 2023

CodSpeed Performance Report

Merging #617 dataclass_slots (559e0c6) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 120 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

@adriangb adriangb force-pushed the dataclass_slots branch 2 times, most recently from 7e79af7 to cfd877f Compare May 19, 2023 17:47
Comment on lines 1196 to 1198
@pytest.mark.skipif(sys.version_info < (3, 10), reason='slots are only supported for dataclasses in Python > 3.10')
def test_slots() -> None:
kwargs = {'slots': True}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hramezani please make this test pass. We should also add a test for a function validator stuck in between the DataclassSchema and DataclassArgsSchema for both before and after variants. See for example test_dataclass_field_before_validator.

@adriangb
Copy link
Member

Approving as long as you get those tests passing 😀

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM in principle. Let me know if you need help with failing test.

src/validators/dataclass.rs Outdated Show resolved Hide resolved
src/validators/dataclass.rs Outdated Show resolved Hide resolved
src/validators/dataclass.rs Outdated Show resolved Hide resolved

fn is_exact_instance(&self, class: &PyType) -> bool {
self.get_type().is(class)
fn input_is_instance(&self, class: &PyType) -> Option<&PyAny> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but given this returns an Option<&PyAny> rather than a bool the name seems a bit weird to me (I would expect it to return a bool since the name sounds like an assertion). I'd suggest downcast or similar, but also fine keeping as is if you prefer.

src/serializers/shared.rs Outdated Show resolved Hide resolved
@@ -441,6 +442,17 @@ impl BuildValidator for DataclassValidator {
None
};

let slots = match schema.get_as::<&PyList>(intern!(py, "slots"))? {
Copy link
Collaborator

@dmontagu dmontagu May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we implicitly making the assumption that dataclasses are all-slots or no-slots-at-all? (Just want to make sure things work correctly since we just confirmed that's not the case, since you can add non-slots fields in subclasses of slots dataclasses.)

Seems to me if you always take the __dict__ (if present) and grab the __slots__, things should work, but of course the devil is in the implementation details.

let new_dict = dict.copy()?;
let new_dict = if let Some(ref slots) = self.slots {
let slots_dict = PyDict::new(py);
for slot in slots {
Copy link
Collaborator

@dmontagu dmontagu May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it looks like this implicitly assumes everything is a slot, I think if you start by trying to grab the __dict__ and ignoring if it fails, then grabbing the slots one by one, it should be good. Alternatively you might store a bool on the struct (or in a once_cell or whatever) indicating whether there are any non-slots fields in the schema; I guess that's probably the most performant way to modify this. (Short of trying to do something fancy based on class layout when slots exist..)

@@ -595,7 +624,14 @@ impl DataclassValidator {
input: &'data impl Input<'data>,
) -> ValResult<'data, ()> {
let (dc_dict, post_init_kwargs): (&PyAny, &PyAny) = val_output.extract(py)?;
force_setattr(py, dc, intern!(py, "__dict__"), dc_dict)?;
if self.slots.is_some() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar issue here where you probably want to know which fields correspond to slots and which don't and handle appropriately.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a reasonable world where we tell people "if you use slots in your pydantic dataclasses, then all subclasses must be slots dataclasses, and all fields must have slots". But if we do that, then I think we need to tell people up front that their type has a problem, rather than delaying the issue to runtime validation, or worse, silently not fully initializing the objects.

But I think it would probably be preferable if we just handled mixed slots/__dict__ dataclasses properly automatically and didn't force handling on users

@@ -3132,6 +3134,7 @@ def dataclass_schema(
metadata: Any other information you want to include with the schema, not used by pydantic-core
serialization: Custom serialization schema
frozen: Whether the dataclass is frozen
slots: The slots to use for the dataclass, set only if `slots=True` on the dataclass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
slots: The slots to use for the dataclass, set only if `slots=True` on the dataclass
slots: The slots to use for the dataclass, set only if `slots=True` on the dataclass or one of its bases

Copy link
Collaborator

@dmontagu dmontagu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed what I think is one more issue, related to validate_assignment when the dataclass has mixed slots and non-slots fields.

Also, sanity check — are we still using the slots field of the schema in a meaningful way? It seems that, other than the validate_assignment thing, we're just doing getattrs on all the fields. So maybe slots doesn't need to be tracked now? I'm not sure if that's right though. (Probably a good idea to track it for the sake of future performance improvements though.)

dmontagu

This comment was marked as duplicate.

@samuelcolvin
Copy link
Member

I think this is ready to please review.

@samuelcolvin
Copy link
Member

I'm going to merge this as I need it to work on something else. But please free free to give more feedback.

@samuelcolvin samuelcolvin merged commit 26fa27d into main May 25, 2023
@samuelcolvin samuelcolvin deleted the dataclass_slots branch May 25, 2023 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants