Skip to content

bpo-33129: Add kwarg support for dataclass' generated init method #19206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

wanderrful
Copy link

@wanderrful wanderrful commented Mar 28, 2020

Hello there! Thanks for reviewing my PR.

Context

The @dataclass feature shines brightly in the context of web development because we need to have a consistent expectation from our many APIs about the shape of their response JSONs for things like DTOs. In a microservice or monorepo architecture, consistency with our DTOs is a must.

For example, consider an API endpoint that returns a user object such as:

@dataclass
class GetUserResponse:
  id: int
  first_name: str

This now allows us to interact with an API response object as though it were any other class!

def get_user(id: int) -> GetUserResponse:
  response_body: dict = request('get', f'http://foo.bar/api/user/{id}').json()
  return GetUserResponse(**response_body)

This feature is incredibly useful in the context of front-end and back-end web development because we can know what exactly we're working with when we interact with other services and apps, such as a microservice in your company's Kubernetes cluster or perhaps a third party API that handles user authentication and feature permissions.

Use Case

However, there is currently a limitation to this wonderful feature: what if we are only concerned with some subset of the returned data from that API? Or, alternatively, what if the API we are consuming suddenly includes new members in its JSON object (e.g. last_name)?

TypeError: __init__() got an unexpected keyword argument 'last_name'

If we attempt to run our hypothetical software again, we find that because the @dataclass members do not perfectly align with what was promised... suddenly our program crashes, integration tests fail, acceptance tests fail, the microservice is unable to perform in the production environment, people get paged, et cetera.

At time of writing, one workaround we could do is to leverage the field(init=False) feature, like so:

@dataclass
class GetUserResponse:
  ...
  last_name: str = field(init=False)

However, this does not address the pain point because we still have to go back and account for changes in the response object preemptively every time something changes in how microservices or APIs interact with our software, in order to prevent future runtime crashes! It's not realistic to predict these breaking external changes, even if those changes come from another team within the same company.

Wouldn't it be great if a @dataclass could just work with only the information we want, without having to specify every single object member in our response body?

Implementation

This pain point can be healed with just a one line change in dataclasses.py by adding **kwargs to allow for unaccounted data members!

After incorporating this one-line change, we can now use @dataclass in a much more robust fashion. In the above example, we are now able to create a GetUserResponse object that does not save the last_name member. We can still include it in our class specification, but our program will no longer crash at runtime just because of the sudden change in our signature!

Alternatives

If writing this directly into the generated __init__ method is a blocker, we could instead add a layer of indirection with a new @dataclass signature flag (e.g. @dataclass(strict=False)) that will only add **kwargs to the generated __init__ method if the flag is set appropriately. I'll be happy to accomodate that if you guys want to go in that direction instead.

Well, that's my PR. Thanks for reading!

https://bugs.python.org/issue33129

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@wanderrful

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@wanderrful
Copy link
Author

Update! While revising tests, I noticed that there might be an existing contradiction with regard to ClassVar and the replace() method.

In the test, it says that attempting to use replace() on a ClassVar member should itself raise an exception... but in the source code for dataclasses.py we just say continue instead of raising a ValueError. I chose to leave the comment there and just revise the assertion in the test, but I'll be happy to go back and make a different adjustment if you guys need that.

@ericvsmith
Copy link
Member

Is the replace test related to your change?

@wanderrful
Copy link
Author

wanderrful commented Mar 28, 2020

The replace test is one of those unit tests affected by this change, and I noticed it while adjusting the test to accommodate it. Just wanted to call it out while I work on revising the other affected unit tests.

@tirkarthi
Copy link
Member

This functionality seems to be different from the original discussion and PR at #6238 for the issue. The issue is to mark an attribute as keyword only in __init__ but this PR just adds **kwargs to __init__ if I understand this right.

@wanderrful
Copy link
Author

wanderrful commented Mar 29, 2020

Hi there @tirkarthi, yes! I could be misunderstanding it as well.

The pain point I'm addressing is very close to the pain point that the original Python issue describes, so I felt it made sense to use it here rather than add yet another overlapping issue onto the pile. I also suggested, in the "Alternatives" part of my original post, an implementation that would be almost the same as what the original issue specifically called for. If accepted, I think that this PR will resolve the issue indirectly.

Also, as a side note: I signed the CLA form twice and changed my username on the Issues website for the second attempt, but the bot unfortunately hasn't recognized it yet. I'm sure I must have overlooked something.

@ericvsmith
Copy link
Member

So you want to unilaterally discard unknown kwargs?

@wanderrful
Copy link
Author

Sure! Though I would say "irrelevant" rather than "unknown".

We can be more explicit, simple, readable, sparse, and practical with our Dataclasses. We can also have the added benefit of not being affected by unnecessary runtime exceptions when the underlying data model changes unexpectedly (e.g. if another team deploys a new, unrelated feature that adds a new field to the Mongo collection our service interacts with). This is incredibly helpful for large-scale production software, where data collections can contain very many fields, whereas our microservice or app may only be concerned with a small subset of those columns.

If unilaterally doing this is a blocker, this could just as well live behind a flag in the dataclass signature so that developers can leverage this only if they want to, in which case I'd be happy to add and adjust unit tests for it.

@ericvsmith
Copy link
Member

Unilaterally doing it is absolutely a blocker. And I think I'd be opposed to adding a flag. It just doesn't seem like something we want to support. Maybe if you called it "ignore_unexpected_init_params_and_increase_the_likelihood_of_undetectable_bugs_and_typos", I might agree to it. But seriously, it just seems like a bad design.

@wanderrful
Copy link
Author

Understood, thank you for your review!

@wanderrful wanderrful closed this Mar 30, 2020
@ericvsmith
Copy link
Member

You could easily write a helper (perhaps as another decorator) that does this for you. It might have to use the inspect module or dataclasses.fields() to see what the required parameters are and ignore the rest.

@ChrisBarker-NOAA
Copy link
Contributor

I'm confused: I got here following:

https://bugs.python.org/issue33493

But this seems to be a different thing altogether. I was looking for keyword-only parameters:
I would like my users to not have the option of being confused and putting in arguments in the wrong order.

Is that feature discussed elsewhere? I think it would be really handy.

For my use case, I'd like a dataclass global flag, so all of the fields would be keyword only, rather than having to specify it for each field. But others might have other use-cases.

As for this PR:

I've needed a similar thing, I handled it by having a custom "from_py_json" method: (py_json is what I call python that is json-compatible, i.e. what you get from json.load)

In that method It looks only for the fields in dataclass_fields, and then ignores the rest.

I've also got other special code for serializing / deserializing dataclasses, I may publish it as a package some day.

@wanderrful
Copy link
Author

wanderrful commented Apr 2, 2020

Hey @ChrisBarker-NOAA, there was another PR from 2018 by another person that Eric was probably referring to. This PR uses the same ticket number but addresses the underlying issue indirectly, so it's understandable to be confused! If you search the other Closed PRs in this repo, you should be able to see what he was probably referring to.

@ChrisBarker-NOAA
Copy link
Contributor

Thanks: this looks like it: #6238

Which seems to have stalled out.

Oh well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants