NaT issue #55

Open
Belightar opened this issue Mar 25, 2021 · 5 comments

@Belightar

Greetings, @shawnbrown

To be brief:

My pd.Series looks like this:
Date
0 NaT
1 NaT
2 NaT
3 2010-12-31
4 2010-12-31
Name: Date, dtype: datetime64[ns]
The type of NaT is:
<class 'pandas._libs.tslibs.nattype.NaTType'>
When I use the following code:

with accepted(Extra(pd.NaT)):
    validate(data, requirement)

I found that the NaTs are not recognized. I tried many kinds of Extra and also tried using a function, but everything failed.

I need your help here. Thanks for your work.

@shawnbrown
Owner

Hello--thanks for filing this issue. I'd like to replicate your problem as accurately as I can before I start addressing the issue.

I have some sample code below but I'm not sure what you're using as the requirement:

from datetime import datetime
import pandas as pd
from datatest import validate

data = pd.Series([
    None,
    None,
    None,
    datetime(2010, 12, 31),
    datetime(2010, 12, 31),
])

requirement = ???  # <- What is this?
validate(data, requirement)

Can you tell me what your requirement value is?

@Belightar
Author

Belightar commented Mar 29, 2021

Thanks for your reply.

from datetime import datetime, timedelta
import pandas as pd
from datatest import validate, accepted, Extra

data = pd.Series([
    None,
    None,
    None,
    datetime(2010, 12, 31),
    datetime(2010, 12, 31),
])

Today = datetime.today()
Tomorrow = Today + timedelta(days=1)

def date_requirement(var_datetime):
    return pd.Timestamp(year=2000, month=1, day=1) < var_datetime < \
            pd.Timestamp(year=Tomorrow.year, month=Tomorrow.month, day=Tomorrow.day)

with accepted(Extra(pd.NaT)):
    validate(data, date_requirement)

Here I want to accept the NaT values. I tried pd.NaT, np.datetime64('NaT'), and the NanToken approach mentioned in the documentation, and the results are all the same:

datatest.ValidationError: does not satisfy date_requirement() (3 differences): [
    Invalid(numpy.datetime64('NaT')),
    Invalid(numpy.datetime64('NaT')),
    Invalid(numpy.datetime64('NaT')),
]
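
For readers following along, here is a minimal sketch of why the three missing dates fail the check. It assumes the requirement function receives a NaT-like value for each missing entry; pd.NaT stands in for that value here, and the names lower and upper are introduced only for this illustration. Ordering comparisons against pd.NaT evaluate to False, so the chained comparison in date_requirement() returns False for them.

```python
from datetime import datetime, timedelta

import pandas as pd

# Same interval as in the post: 2000-01-01 up to tomorrow (exclusive).
tomorrow = datetime.today() + timedelta(days=1)
lower = pd.Timestamp(year=2000, month=1, day=1)
upper = pd.Timestamp(year=tomorrow.year, month=tomorrow.month, day=tomorrow.day)


def date_requirement(var_datetime):
    # Ordering comparisons with pd.NaT are False, so NaT never passes.
    return lower < var_datetime < upper


# pd.NaT is used as a stand-in for the missing entries (an assumption).
nat_result = bool(date_requirement(pd.NaT))
valid_result = bool(date_requirement(pd.Timestamp(year=2010, month=12, day=31)))

print("NaT passes:", nat_result)
print("2010-12-31 passes:", valid_result)
```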

@shawnbrown
Owner

Ah, OK. As a stopgap, you can use the accepted.args() method together with the pd.isna() function:

...

with accepted.args(pd.isna):
    validate(data, date_requirement)

The accepted.args() method accepts differences whose args satisfy a given predicate. And by using pd.isna() as the predicate, you can accept differences that contain NaT, NaN, or other "missing value" objects.
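
As a quick illustration of the predicate on its own (this sketch calls only pandas and NumPy, not datatest), pd.isna() reports True for each of the usual "missing value" objects and False for a real timestamp. The variable names are chosen just for this example.

```python
import numpy as np
import pandas as pd

# Objects that commonly represent a missing value in pandas data.
missing_values = [pd.NaT, np.datetime64("NaT"), np.nan, None]

# A real timestamp, for contrast.
valid_value = pd.Timestamp(year=2010, month=12, day=31)

# pd.isna() on a scalar returns a single boolean.
missing_flags = [bool(pd.isna(value)) for value in missing_values]
valid_flag = bool(pd.isna(valid_value))

print("missing:", missing_flags)
print("valid timestamp flagged as missing:", valid_flag)
```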

For a longer-term solution, I want to bring the handling of these NaT values in line with how datatest handles other NaN values (as documented here). I will follow up when I have addressed this issue more thoroughly.
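
For context, NaT shares the defining quirk of NaN: it does not compare equal to itself. The sketch below shows only this general pandas/NumPy behavior; that it is the reason an equality-based acceptance such as Extra(pd.NaT) found no match is an assumption, not a statement about datatest's internals.

```python
import numpy as np
import pandas as pd

numpy_nat = np.datetime64("NaT")

# Like NaN, NaT is never equal to itself, so equality cannot find it.
pandas_nat_equal = bool(pd.NaT == pd.NaT)
numpy_nat_equal = bool(numpy_nat == numpy_nat)
nan_equal = bool(np.nan == np.nan)

# A predicate such as pd.isna() does not rely on equality.
both_missing = bool(pd.isna(pd.NaT) and pd.isna(numpy_nat))

print("pd.NaT == pd.NaT:", pandas_nat_equal)
print("np NaT == np NaT:", numpy_nat_equal)
print("np.nan == np.nan:", nan_equal)
print("pd.isna() detects both NaT objects:", both_missing)
```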

@Belightar
Author

Belightar commented Mar 29, 2021

Thank you so much.
Your code works well in my project.
And yes, I also used pd.isna to check whether a value is pd.NaT. (Is this the only way?) I simply dropped those rows and then ran datatest.
I've been programming in Python for 3 years and hadn't realized before that there are differences among bool, np.bool_ or pd.NaT, pd.Nan, np.nan, nan.
I've learned a lot from your work, and thanks again for your patience.

@shawnbrown
Owner

shawnbrown commented Mar 29, 2021

I'm glad you found it helpful. I noticed that your date_requirement() function is checking for an interval. If it suits your needs, you could also use the validate.interval() method:

...

begin_date = pd.Timestamp(year=2000, month=1, day=1)
tomorrow = pd.Timestamp(datetime.today() + timedelta(days=1))

with accepted.args(pd.isna):
    validate.interval(data, begin_date, tomorrow)

One difference with this approach is that time differences trigger Deviation objects that contain a timedelta. There are some how-to documents for date handling that you might find helpful as well:
