-
-
Notifications
You must be signed in to change notification settings - Fork 32.6k
bpo-33129: Add kwarg support for dataclass' generated init method #19206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA MissingOur records indicate the following people have not signed the CLA: For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
Update! While revising tests, I noticed that there might be an existing contradiction with regard to In the test, it says that attempting to use |
Is the |
The |
This functionality seems to be different from the original discussion and PR at #6238 for the issue. The issue is to mark an attribute as keyword only in |
Hi there @tirkarthi, yes! I could be misunderstanding it as well. The pain point I'm addressing is very close to the pain point that the original Python issue describes, so I felt it made sense to use it here rather than add yet another overlapping issue onto the pile. I also suggested, in the "Alternatives" part of my original post, an implementation that would be almost the same as what the original issue specifically called for. If accepted, I think that this PR will resolve the issue indirectly. Also, as a side note: I signed the CLA form twice and changed my username on the Issues website for the second attempt, but the bot unfortunately hasn't recognized it yet. I'm sure I must have overlooked something. |
So you want to unilaterally discard unknown kwargs? |
Sure! Though I would say "irrelevant" rather than "unknown". We can be more explicit, simple, readable, sparse, and practical with our Dataclasses. We can also have the added benefit of not being affected by unnecessary runtime exceptions when the underlying data model changes unexpectedly (e.g. if another team deploys a new, unrelated feature that adds a new field to the Mongo collection our service interacts with). This is incredibly helpful for large-scale production software, where data collections can contain very many fields, whereas our microservice or app may only be concerned with a small subset of those columns. If unilaterally doing this is a blocker, this could just as well live behind a flag in the |
Unilaterally doing it is absolutely a blocker. And I think I'd be opposed to adding a flag. It just doesn't seem like something we want to support. Maybe if you called it "ignore_unexpected_init_params_and_increase_the_likelihood_of_undetectable_bugs_and_typos", I might agree to it. But seriously, it just seems like a bad design. |
Understood, thank you for your review! |
You could easily write a helper (perhaps as another decorator) that does this for you. It might have to use the |
I'm confused: I got here following: https://bugs.python.org/issue33493 But this seems to be a different thing altogether. I was looking for keyword-only parameters: Is that feature discussed elsewhere? I think it would be really handy. For my use case, I'd like a dataclass global flag, so all of the fields would be keyword only, rather than having to specify it for each field. But others might have other use-cases. As for this PR: I've needed a similar thing, I handled it by having a custom "from_py_json" method: (py_json is what I call python that is json-compatible, i.e. what you get from json.load) In that method It looks only for the fields in dataclass_fields, and then ignores the rest. I've also got other special code for serializing / deserializing dataclasses, I may publish it as a package some day. |
Hey @ChrisBarker-NOAA, there was another PR from 2018 by another person that Eric was probably referring to. This PR uses the same ticket number but addresses the underlying issue indirectly, so it's understandable to be confused! If you search the other Closed PRs in this repo, you should be able to see what he was probably referring to. |
Thanks: this looks like it: #6238 Which seems to have stalled out. Oh well. |
Hello there! Thanks for reviewing my PR.
Context
The
@dataclass
feature shines brightly in the context of web development because we need to have a consistent expectation from our many APIs about the shape of their response JSONs for things like DTOs. In a microservice or monorepo architecture, consistency with our DTOs is a must.For example, consider an API endpoint that returns a user object such as:
This now allows us to interact with an API response object as though it were any other class!
This feature is incredibly useful in the context of front-end and back-end web development because we can know what exactly we're working with when we interact with other services and apps, such as a microservice in your company's Kubernetes cluster or perhaps a third party API that handles user authentication and feature permissions.
Use Case
However, there is currently a limitation to this wonderful feature: what if we are only concerned with some subset of the returned data from that API? Or, alternatively, what if the API we are consuming suddenly includes new members in its JSON object (e.g.
last_name
)?If we attempt to run our hypothetical software again, we find that because the
@dataclass
members do not perfectly align with what was promised... suddenly our program crashes, integration tests fail, acceptance tests fail, the microservice is unable to perform in the production environment, people get paged, et cetera.At time of writing, one workaround we could do is to leverage the
field(init=False)
feature, like so:However, this does not address the pain point because we still have to go back and account for changes in the response object preemptively every time something changes in how microservices or APIs interact with our software, in order to prevent future runtime crashes! It's not realistic to predict these breaking external changes, even if those changes come from another team within the same company.
Wouldn't it be great if a
@dataclass
could just work with only the information we want, without having to specify every single object member in our response body?Implementation
This pain point can be healed with just a one line change in
dataclasses.py
by adding**kwargs
to allow for unaccounted data members!After incorporating this one-line change, we can now use
@dataclass
in a much more robust fashion. In the above example, we are now able to create aGetUserResponse
object that does not save thelast_name
member. We can still include it in our class specification, but our program will no longer crash at runtime just because of the sudden change in our signature!Alternatives
If writing this directly into the generated
__init__
method is a blocker, we could instead add a layer of indirection with a new@dataclass
signature flag (e.g.@dataclass(strict=False)
) that will only add**kwargs
to the generated__init__
method if the flag is set appropriately. I'll be happy to accomodate that if you guys want to go in that direction instead.Well, that's my PR. Thanks for reading!
https://bugs.python.org/issue33129