Titles, aliases, and description for SchemaModels #331
Comments
thanks @jdb78! I think this would be a fantastic addition to pandera! I think before delving more into implementation details, I'd like to understand a little bit better what additional context would be helpful. I understand the reasoning behind
Implies something more that I'm not sure how adding a Currently, pandera reports the failure cases that caused the check to fail (when the check returns a boolean vector), e.g. here, and also gives the user access to a dataframe of all the failure cases across all the checks via lazy validation, see here. P.S. #329 adds support for field aliases via the class-based API
I am also interested in this enhancement. In my case, I'm looking to pair it with pandera's to-YAML and from-YAML capabilities. Extra metadata supporting a title and/or a description would turn the produced YAML files into a full-fledged data dictionary. I would like to read something like the below yaml file (note that I also think a title/description for the DataFrame schema itself is a good idea).
I will have capacity to submit a pull request, but before I go about generating one, I wanted to pick your brain on what you were thinking.
As field attribute -
As standalone attributes -
Also, thank you for the time you've put toward pandera! Dustin
Awesome, thanks @dustindall!
I think all schemas and schema components should have a title and description, so that would be
The implementation should be fairly straightforward since these attributes shouldn't really affect any part of the validation process. The three parts I can think of are:
That would be awesome! It might make sense if you added the Also feel free to ping me here if you need any help re: setting up your dev environment.
Thanks @dustindall for being willing to submit a PR :) Some pointers for the
Maybe it's just me, but I feel like the distinction between
Agreed. Is there a good reason (besides consistency) to have a AFAIK, a dataframe only has a One solution would be to deprecate
Hey @tfwillems I saw you submitted #440, let me know if you're willing to make a PR for this :) It should be pretty straightforward to add these properties to all the relevant objects, since they don't touch the validation parts of the library.
Let me first say how much I love pandera and appreciate the effort that goes into writing and maintaining such a great package!
Is your feature request related to a problem? Please describe.
Errors can be difficult to debug without extra context. Say a check fails: the only information the user gets is that the column failed that particular check.
Now, would it not be better for the user to understand why the column should pass this check? A very simple way to do this is to give more context, i.e. more information about what this column actually contains, where it comes from, etc. This can also help to actually understand why the issue appears and how it can be debugged.
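To make the idea concrete, here is a minimal stdlib-only sketch of how a description could travel with a check and end up in the failure message. `ColumnCheck`, `validate`, and the example column are all hypothetical names made up for illustration; this is not pandera's API, just the shape of the behaviour being asked for.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ColumnCheck:
    """Hypothetical stand-in for a column check (not pandera's API)."""
    column: str
    predicate: Callable[[float], bool]
    error: str
    description: Optional[str] = None  # the proposed extra context


def validate(values: List[float], check: ColumnCheck) -> List[str]:
    """Return one message per failing value, appending the description if set."""
    messages = []
    for index, value in enumerate(values):
        if not check.predicate(value):
            message = (
                f"column '{check.column}' failed check '{check.error}' "
                f"at row {index}: {value!r}"
            )
            if check.description:
                message += f" ({check.description})"
            messages.append(message)
    return messages


check = ColumnCheck(
    column="age",
    predicate=lambda v: v >= 0,
    error="greater_than_or_equal_to(0)",
    description="Age in years at sign-up, taken from the registration form",
)
messages = validate([31, -4, 52], check)
print(messages[0])
```

With the description attached, the person reading the error learns what the column is supposed to hold and where it comes from, not only which check failed.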
Describe the solution you'd like
pydantic allows titles, descriptions, aliases, etc. in its Field object. I wonder if this could be added to pandera. Together with some smart sphinx plugin, you could even turn this information into a full and automatic data dictionary (see https://sphinx-pydantic.readthedocs.io/). Maybe it is possible to make Fields subclasses of the pydantic fields and schemas subclasses of BaseModel (not sure I am considering all challenges)?
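As a rough illustration of the data-dictionary idea, here is a sketch that uses only the standard library's `dataclasses` metadata as a stand-in for a field object carrying a title and description. The `described` helper, `SalesSchema`, and `data_dictionary` are hypothetical names for this example; neither pydantic's nor pandera's actual API is shown.

```python
from dataclasses import dataclass, field, fields


def described(title: str, description: str):
    """Hypothetical helper: attach a title and description to a dataclass field."""
    return field(metadata={"title": title, "description": description})


@dataclass
class SalesSchema:
    # Assumed example columns, for illustration only.
    store_id: int = described("Store ID", "Internal identifier of the store")
    revenue: float = described("Revenue", "Daily revenue in USD")


def data_dictionary(schema_cls):
    """Collect name, type, title, and description for every field."""
    return [
        {
            "name": f.name,
            "type": getattr(f.type, "__name__", str(f.type)),
            "title": f.metadata.get("title"),
            "description": f.metadata.get("description"),
        }
        for f in fields(schema_cls)
    ]


for entry in data_dictionary(SalesSchema):
    print(f"{entry['name']} ({entry['type']}): {entry['title']} - {entry['description']}")
```

Because the metadata never takes part in validation, a documentation tool could walk the schema in the same way and render the entries as a table.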