allow InListValidation to ignore missings #19

Open
wants to merge 2 commits into
base: master


RoyalTS commented Nov 25, 2018

resolves #13

@multimeric
Owner

Waiting to see if this is solved by allow_empty, as discussed in #13. Even if you need slightly different behaviour from this, I think this PR should not involve tweaking the behaviour of the Column object, just the InListValidation.

multimeric added the "waiting for reply" label (Needs more feedback from someone in the issue before action can be taken) on Oct 3, 2019

deponovo commented Aug 4, 2020

Hi, is there a chance these modifications are going to be merged? I am also interested in skipping NaNs in my application.

@multimeric
Owner

As I said to the original author of this PR, please try allow_empty on the parent Column first, and if that doesn't work, explain why this feature is needed and is different from allow_empty.


deponovo commented Aug 4, 2020

You are right. I actually tried allow_empty, but used it incorrectly, which is why I posted the previous question. I am documenting it here in case somebody comes across the same issue. Here is my wrong configuration:

# Wrong: allow_empty is passed to the validation, not to the Column
my_schema = Schema([
    Column("a", [InRangeValidation(min=-1, max=10, allow_empty=True)]),
])

Here is the correct one:

# Correct: allow_empty is an argument of the parent Column
my_schema = Schema([
    Column("a", [InRangeValidation(min=-1, max=10)], allow_empty=True),
])

Now it works as I need: missing values in the column are skipped.
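For readers who want to see what "skipping empties" amounts to, here is a minimal pandas-only sketch of the idea. It is not pandas_schema's implementation; the helper name failed_rows and the sample data are assumptions made up for illustration.

```python
import pandas as pd


def failed_rows(series: pd.Series, lo: float, hi: float) -> pd.Series:
    """Hypothetical helper: True where a value is present AND outside [lo, hi]."""
    # between() yields False for NaN, so NaN would count as a failure on its own
    in_range = series.between(lo, hi)
    # Only rows that actually hold a value can fail; missing rows are skipped
    return series.notna() & ~in_range


ages = pd.Series([5.0, float("nan"), 42.0])
print(failed_rows(ages, -1, 10).tolist())  # [False, False, True]
```

The missing row is not reported, while 42.0 still fails the range check, which is the behaviour described above for allow_empty=True on the Column.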


Natalie-Caruana commented Nov 3, 2020

Hi, I'm also having some issues with missing values when implementing InListValidation. Test example below:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import LeadingWhitespaceValidation, TrailingWhitespaceValidation, CanConvertValidation, MatchesPatternValidation, InRangeValidation, InListValidation
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('TestAllowEmpty', [InListValidation([0, 1, 2])], allow_empty=True),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,TestAllowEmpty,Customer ID
Gerald ,Hampton,82,0,2582GABK
Yuuwa,Miyake,270,1,7951WVLW
Edyta,Majewska ,50,2,775ANSID
'''))
test_data.at[0, 'TestAllowEmpty'] = pd.NA

errors = schema.validate(test_data)
for error in errors:
    print(error)

I get the following error:
AttributeError: Can only use .str accessor with string values!

Because the "TestAllowEmpty" column contains a missing value, test_data["TestAllowEmpty"].dtype is dtype('O'), which is neither a categorical nor a numeric dtype. The validation source code therefore evaluates validated = (series.str.len() > 0) & simple_validation, which raises the error because the entries are not strings.
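The failure mode can be reproduced with pandas alone. The sketch below builds an object-dtype column of integers plus pd.NA, shows that the .str accessor rejects it, and then computes the failing rows with notna() instead. The helper failed_in_list is hypothetical and only illustrates one possible NaN-safe check; it is not a proposed patch to pandas_schema.

```python
import pandas as pd

# Object-dtype column holding integers and one missing value
column = pd.Series([pd.NA, 1, 5], dtype=object)

# The .str accessor refuses non-string data, as in the traceback above
try:
    column.str.len()
    str_accessor_error = None
except AttributeError as exc:
    str_accessor_error = exc


def failed_in_list(series: pd.Series, allowed: list) -> pd.Series:
    """Hypothetical helper: True where a value is present and not in `allowed`."""
    present = series.notna()  # works for any dtype, including object
    return present & ~series.isin(allowed)


failed = failed_in_list(column, [0, 1, 2])
print(type(str_accessor_error).__name__)  # AttributeError
print(failed.tolist())                    # [False, False, True]
```

Using notna() avoids any assumption that a non-numeric, non-categorical column must contain strings.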

Successfully merging this pull request may close these issues.

Ignore NaN values in validation