Distinct across multiple columns #38

Closed
Maarten-vd-Sande opened this issue Sep 21, 2020 · 7 comments

Comments

@Maarten-vd-Sande

Maarten-vd-Sande commented Sep 21, 2020

I am not sure if this is currently supported, or how to implement it as a custom validator.

I want the combination of values across two columns to be distinct, so that this is okay:

sample    value
1         2
2         2

But this is not:

sample    value
1         2
1         2
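
For reference, the rule itself can be checked outside pandas_schema with plain pandas. A minimal sketch, assuming the two tables above are loaded as DataFrames with the columns `sample` and `value` (the helper name `rows_are_distinct` is hypothetical):

```python
import pandas as pd

# The two example tables from the issue
ok = pd.DataFrame({"sample": [1, 2], "value": [2, 2]})
bad = pd.DataFrame({"sample": [1, 1], "value": [2, 2]})


def rows_are_distinct(df: pd.DataFrame) -> bool:
    # duplicated() compares whole rows; keep=False marks every copy of a
    # repeated row, not just the second and later occurrences
    return not df.duplicated(keep=False).any()


print(rows_are_distinct(ok))   # True
print(rows_are_distinct(bad))  # False
# Select the offending rows (both copies of the repeated pair)
print(bad[bad.duplicated(keep=False)])
```
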
@multimeric
Owner

multimeric commented Sep 22, 2020

Unfortunately not. The design of pandas_schema 0.X.X is such that every validation runs on a per-column basis. This will be fixed in 1.X.X, and indeed I have a demonstration of this behaviour here: https://github.com/TMiguelT/PandasSchema/blob/9452513fbd2f58acc6ca8c3ff94062b07f3f7ffd/test/test_df_validations.py#L50-L61.
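
Within a strictly per-column design, one possible workaround is to collapse the columns of interest into a single derived key column and run an ordinary distinctness check on that one series. The sketch below uses plain pandas only. The helper names `combined_key` and `series_is_distinct` are hypothetical, and feeding such a key column to pandas_schema's per-column `IsDistinctValidation` is an untested assumption, not something confirmed in this thread.

```python
import pandas as pd


def combined_key(df: pd.DataFrame, columns: list) -> pd.Series:
    # One plain tuple per row, e.g. (1, 2). Tuples avoid the separator
    # collisions that string concatenation such as "1_2" can produce.
    rows = df[columns].itertuples(index=False, name=None)
    return pd.Series(list(rows), index=df.index)


def series_is_distinct(series: pd.Series) -> bool:
    # An ordinary single-column distinctness check, applied to the key
    return not series.duplicated(keep=False).any()


df = pd.DataFrame({"sample": [1, 1], "value": [2, 2]})
key = combined_key(df, ["sample", "value"])
print(key.tolist())
print(series_is_distinct(key))  # False
```
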

But who knows when that will be released, because it's been hard to find the time to finish it.

@Maarten-vd-Sande
Author

Maarten-vd-Sande commented Sep 23, 2020

Great, this is already in the works! But then I already have a feature request: for it to work on a subset of columns 😇. It seems like the current implementation does not support this, right?

If you want (and if I have time, though not too soon), I could start a PR for this.

@multimeric
Owner

The current (and future) releases support this: https://tmiguelt.github.io/PandasSchema/#pandas_schema.schema.Schema.validate

@Maarten-vd-Sande
Author

Ah, I see. Just to be sure I understand: I would then use DistinctRowValidation and call validate on a list of columns?

Feel free to close the issue (either now or with the new release). Thanks for all your help 👍

@multimeric
Owner

Right, so you want unique rows, but only across a subset of the columns. I don't think that is currently possible in the upcoming release, but I'll look into it.
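
As a stopgap for the subset case, pandas' own `DataFrame.duplicated` accepts a `subset` argument. A minimal sketch: `rows_violating_distinct` is a hypothetical helper, not part of pandas_schema, and the extra `note` column is made up to show that columns outside the subset are ignored.

```python
import pandas as pd


def rows_violating_distinct(df: pd.DataFrame, columns: list) -> list:
    """Hypothetical helper: labels of rows whose values over `columns`
    occur more than once. Columns outside `columns` are ignored."""
    mask = df.duplicated(subset=columns, keep=False)
    return mask[mask].index.tolist()


df = pd.DataFrame({
    "sample": [1, 1, 2],
    "value": [2, 2, 2],
    "note": ["a", "b", "c"],  # differs per row, but is not in the subset
})

print(rows_violating_distinct(df, ["sample", "value"]))          # [0, 1]
print(rows_violating_distinct(df, ["sample", "value", "note"]))  # []
```

Over all three columns no row repeats, because `note` differs on every row, so the same check returns an empty list.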

@multimeric
Owner

I'll keep the issue open since it's still not solved in a release.

@multimeric
Owner

Closing in favour of the more general #57 that I just opened.
