Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added initvars support to post_init_post_parse #748

Merged
merged 4 commits into from Aug 16, 2019

Conversation

@Raphael-C-Almeida
Copy link
Contributor

Raphael-C-Almeida commented Aug 15, 2019

Change Summary

Currently only __post_init__ has initvars support which is lacking on __post_init_post_parse__, that's seams inconsistent considering the use case of both methods.
My PR only add this functionality along with it's test case.

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.rst file added describing change
    (see changes/README.md for details)
@codecov

This comment has been minimized.

Copy link

codecov bot commented Aug 15, 2019

Codecov Report

Merging #748 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #748   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          15     15           
  Lines        2723   2723           
  Branches      535    536    +1     
=====================================
  Hits         2723   2723
@dmontagu

This comment has been minimized.

Copy link
Collaborator

dmontagu commented Aug 15, 2019

Is there an application you have in mind where you would benefit from having access to the initvars in __post_init_post_parse__? I would think that you typically would not need access to the initvars after parsing, and the __post_init_post_parse__ would just be used to do something with the parsed values.

If that is typical, it seems like requiring the initvars be accepted as arguments to the __post_init_post_parse__ is more likely to pollute the class definition than be useful (it would typically have unused variables, in particular).

But I think this could still make sense as a capability if there is a practical application that is not well-served by alternatives?

@HeavenVolkoff

This comment has been minimized.

Copy link
Contributor

HeavenVolkoff commented Aug 15, 2019

@dmontagu IMHO, I think this behaviour is befitting of pydantic's pre-v0.27 behaviour, on which __post_init__ would only run after validation and you would still have access to InitVars. Considering changes after v0.27 - on which __post_init_post_parse__ was introduced to reestablish a hook for after a Dataclass' initialization and pydantic's validation - I'd argue that adding support for InitVars in __post_init_post_parse__ follows naturally.

Nevertheless, I do agree that this is a breaking change for people already using __post_init_post_parse__ and that it would be a hassle adding unused arguments to it if you only want to use InitVars in __post_init__.

With that in mind, I propose the introduction of a flag in pydantic's Config to enable __post_init_post_parse__ InitVars support. What do you think?

@Raphael-C-Almeida

This comment has been minimized.

Copy link
Contributor Author

Raphael-C-Almeida commented Aug 15, 2019

A valid use case would be to make some data transformations that depend on the data type and the InitVars. Currently, you can do that on post_init, but you would have to manually check the type before making the operations/transformations using the InitVars. This is not optimal, since you would have to manually check something that pydantic will check anyway.

If it were possible to access InitVars on _post_init_post_parse, this manual check would not be required.

It seems fair to me that if there is a need for post_init to receive arguments in order to make some operation/transformation on the data before validation, then post_init_post_parse would also need to receive them for the same purpose.

@dmontagu

This comment has been minimized.

Copy link
Collaborator

dmontagu commented Aug 15, 2019

I personally don’t have a very strong opinion here as I generally preferBaseModel over dataclasses (pydantic or otherwise).

That said, I would find the change a lot more compelling if I saw a realistic-feeling example where I felt like there was a real benefit to having the initvars in the __post_init_post_parse__. The test in this pull request, which admittedly does test the function is working as intended, doesn’t really convey usefulness to me. More broadly, off the top of my head I can’t think of anything you’d want to do with the initvars in the __post_init_post_parse__ that wouldn’t be better handled by a validator.

It wouldn’t take much to convince me (please share the use case if you have one!), but so far it just feels very abstract; I’m not sure it’s worth changing if no one is going to use it anyway.

@dmontagu

This comment has been minimized.

Copy link
Collaborator

dmontagu commented Aug 15, 2019

A valid use case would be to make some data transformations that depend on the data type and the InitVars.

This makes sense to me in theory, but still feels very abstract. Do you have a real example? That might shut me up 😅

@samuelcolvin

This comment has been minimized.

Copy link
Owner

samuelcolvin commented Aug 15, 2019

I definitely don't think the Config option would be a good idea, switches should be avoided and only be added if there's no other solution possible, here it's simple just to ignore init vars if you don't want to use them.

Regarding the feature: I'm in the same boat as @dmontagu, one decent concrete example and I'll accept this.

@Raphael-C-Almeida

This comment has been minimized.

Copy link
Contributor Author

Raphael-C-Almeida commented Aug 15, 2019

Here is what an example of a use case would look like if I had to implement it on the current version of pydantic:

import typing as T 
from pathlib import Path 
from dataclasses import InitVar 
 
from pydantic import ValidationError, validator 
from pydantic.dataclasses import dataclass 
 
@dataclass 
class Archive: 
    path: T.Union[Path, str] 
    base_path: InitVar[T.Optional[T.Union[Path, str]]] = None 
 
    @validator("path") 
    def validate_path(cls, v: T.Any) -> Path: 
        return Path(v) 
 
    def __post_init__(self, base_path): 
        if isinstance(self.path, Path) or isinstance(self.path, str): 
            base_path = Path(base_path) 
            self.path = base_path.joinpath(self.path) 
        else: 
            raise ValidationError( 
                f"unsupported type for path component: {type(self.path)}" 
            ) 
 
post_init = Archive("test.pdf", base_path="/usr/local") 
print(post_init) 

I would have to manually validate the type of the property before doing the necessary operation...

If the PR was accepted, the code would be the following:

import typing as T 
from pathlib import Path 
from dataclasses import InitVar 
 
from pydantic import validator 
from pydantic.dataclasses import dataclass 
 
@dataclass 
class ArchivePostParse: 
    path: T.Union[Path, str] 
    base_path: InitVar[T.Optional[T.Union[Path, str]]] = None 
 
    @validator("path") 
    def validate_path(cls, v: T.Any) -> Path: 
        return Path(v) 
 
    def __post_init_post_parse__(self, base_path): 
        base_path = Path(base_path) 
        self.path = base_path.joinpath(self.path) 
         

post_init_post_parse = ArchivePostParse("test.pdf", base_path="/usr/local") 
print(post_init_post_parse) 
@dmontagu

This comment has been minimized.

Copy link
Collaborator

dmontagu commented Aug 16, 2019

@Raphael-C-Almeida Your example has completely convinced me -- I now fully believe this should be supported in 1.0.

I hadn't actually seen InitVar used before; your example makes it much clearer how they would be used, and also why you would want to use them with the parsed results.

I'm going to add this example as a test to make it clear why this is useful.

@HeavenVolkoff

This comment has been minimized.

Copy link
Contributor

HeavenVolkoff commented Aug 16, 2019

@dmontagu @samuelcolvin Considering @Raphael-C-Almeida 's example (a good one by the way), I think adding validation to InitVars should also be included in pydantic. This would render extra validations for InitVars in __post_init_post_parse__ unnecessary.

David Montague
@dmontagu

This comment has been minimized.

Copy link
Collaborator

dmontagu commented Aug 16, 2019

@HeavenVolkoff I think that would involve a relatively large refactor of pydantic.dataclasses._process_class since:

  1. You'd have to use one model to perform the parsing of all data (including the InitVars), and a separate model as the __pydantic_model__ underlying the dataclass (since you presumably wouldn't want the InitVar-fields to be fields on the underlying model).
    • This would also likely add an ~2x overhead to parsing for models with InitVars.
  2. You'd have to write something new to read the fields off the model, since that is currently being done by dataclasses.fields.
    • I read through the dataclasses module, and it looks like the process of determining which fields are InitVar fields is pretty hairy, and involves a lot of private APIs. (But maybe there is a public API for reading these off the dataclass?)
  3. Users may want to be able to annotate the InitVar type, but not parse it before it is passed to __post_init_post_parse__ (e.g. if it is not a type pydantic knows how to parse, which requires a config setting or will cause errors). I think supporting both ways would be problematic for a variety of reasons (too many levers), but given the potential limitations it would impose, it might be better to just leave it unparsed.

I don't think this is an unreasonable request, but this seems like a rather niche use case, could require a substantial amount of effort, and I'd guess the result would be substantially more fragile than the current implementation.

I'd review a PR for it, but either way it should probably be separate from this one.

@HeavenVolkoff

This comment has been minimized.

Copy link
Contributor

HeavenVolkoff commented Aug 16, 2019

@dmontagu I agree, this is a more complex request and you raised some valid points. I'll do a more in depth research on this matter to see if I can come up with a valid implementation to solve this in pydantic then open a new PR.

@samuelcolvin samuelcolvin merged commit 5b5451c into samuelcolvin:master Aug 16, 2019
10 checks passed
10 checks passed
Header rules No header rules processed
Details
Pages changed 3 new files uploaded
Details
Redirect rules No redirect rules processed
Details
Mixed content No mixed content detected
Details
codecov/project 100% (+0%) compared to 321cde0
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
deploy/netlify Deploy preview ready!
Details
samuelcolvin.pydantic Build #20190816.2 succeeded
Details
samuelcolvin.pydantic (Job Python36) Job Python36 succeeded
Details
samuelcolvin.pydantic (Job Python37) Job Python37 succeeded
Details
@samuelcolvin

This comment has been minimized.

Copy link
Owner

samuelcolvin commented Aug 16, 2019

great example, makes lots of sense, I've accepted the change.

Regarding validating InitVars, I think it's unnecessarily and as @dmontagu says, lot's of work.

If you really wanted to validate InitVars, you could build another model/dataclass of those attributes, validate the data then pass that to your main model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.