ENH: Introduce type-safe constructors for Timestamp and Timedelta. #58475

Open
1 of 3 tasks
randolf-scholz opened this issue Apr 29, 2024 · 2 comments
Labels
Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@randolf-scholz
Contributor

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The default constructors pd.Timestamp.__new__ and pd.Timedelta.__new__ can return NaT, which is an instance of NaTType rather than of Timestamp or Timedelta. Depending on the type checker used, this can lead to silent type errors. Consider the following example:

import numpy as np
from pandas import Timedelta, Timestamp

t: Timestamp = Timestamp("2024-04-29T18:00:00")
t2: Timestamp = Timestamp(np.datetime64("nat"))  # actually NaTType!
dt: Timedelta = Timedelta(1, "h")
dt2: Timedelta = Timedelta(np.timedelta64("nat"))  # actually NaTType!

Type-checking results:

  • mypy --strict: no errors (with and without pandas-stubs)
  • pyright with useLibraryCodeForTypes = true: 4 errors (the only formally correct result)
  • pyright with useLibraryCodeForTypes = false: no errors
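
The mismatch can also be confirmed at runtime. The following is a minimal sketch, assuming pandas and NumPy are installed; the variable names same_singleton, is_timestamp and is_timedelta are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

# Same NaT inputs as in the example above.
t2 = pd.Timestamp(np.datetime64("nat"))
dt2 = pd.Timedelta(np.timedelta64("nat"))

# Both constructors hand back the shared pd.NaT singleton.
same_singleton = t2 is pd.NaT and dt2 is pd.NaT

# The returned object is neither a Timestamp nor a Timedelta.
is_timestamp = isinstance(t2, pd.Timestamp)
is_timedelta = isinstance(dt2, pd.Timedelta)

print(type(t2).__name__, same_singleton, is_timestamp, is_timedelta)
```

So the annotations t2: Timestamp and dt2: Timedelta are wrong at runtime, whether or not a given type checker reports it.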

Feature Description

Introduce new constructors timestamp and timedelta (analogous to pyarrow's constructor functions), which are guaranteed to return a pd.Timestamp or pd.Timedelta, respectively, and raise an exception if the result would be NaT.

Alternative Solutions

Split pd.NaT into two different types, Timestamp("NaT") and Timedelta("NaT") (as is the case in numpy), which are instances of the respective types. (#24983)
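
For comparison, the NumPy behaviour referred to above can be checked directly. This is a small sketch, assuming only that NumPy is installed; the names nat_datetime and nat_timedelta are chosen here for illustration:

```python
import numpy as np

nat_datetime = np.datetime64("NaT")
nat_timedelta = np.timedelta64("NaT")

# Each NaT is an instance of its own scalar type ...
assert isinstance(nat_datetime, np.datetime64)
assert isinstance(nat_timedelta, np.timedelta64)

# ... and not of the other one, so annotations stay truthful.
assert not isinstance(nat_datetime, np.timedelta64)
assert not isinstance(nat_timedelta, np.datetime64)

# np.isnat detects the missing value for both scalar types.
print(np.isnat(nat_datetime), np.isnat(nat_timedelta))
```

Because each NaT keeps the type of its constructor, an annotation such as x: np.datetime64 remains correct even for missing values.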

Additional Context

@randolf-scholz added the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels on Apr 29, 2024
@randolf-scholz
Contributor Author

These constructors can be very simple wrappers. A rough sketch:

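from typing import Any, Optional

from pandas import Timedelta, Timestamp
from pandas._libs.tslibs.nattype import NaTType  # private module path; may change between versions


def timedelta(value: Any = ..., unit: Optional[str] = None, **kwargs: Any) -> Timedelta: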
    """Utility function that ensures that the constructor does not return NaT."""
    td = (
        Timedelta(unit=unit, **kwargs)
        if value is Ellipsis
        else Timedelta(value, unit=unit, **kwargs)
    )
    if isinstance(td, NaTType):
        raise ValueError("Constructor returned NaT")
    return td


def timestamp(value: Any = ..., **kwargs: Any) -> Timestamp:
    """Utility function that ensures that the constructor does not return NaT."""
    ts = Timestamp(**kwargs) if value is Ellipsis else Timestamp(value, **kwargs)
    if isinstance(ts, NaTType):
        raise ValueError("Constructor returned NaT")
    return ts
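
The same guard could also be factored into a single generic helper that wraps an already constructed value. The sketch below is only an illustration, not part of the proposal: ensure_not_nat is a hypothetical name, and the NaTType import uses a private pandas module path that may change between versions.

```python
from typing import TypeVar, Union

import pandas as pd
from pandas._libs.tslibs.nattype import NaTType  # private path; an assumption

# Constrained type variable: the helper returns exactly the type it was given.
T = TypeVar("T", pd.Timestamp, pd.Timedelta)


def ensure_not_nat(value: Union[T, NaTType]) -> T:
    """Hypothetical helper: return value unchanged, or raise if it is NaT."""
    if isinstance(value, NaTType):
        raise ValueError("Constructor returned NaT")
    return value


# A valid input passes through unchanged.
ts = ensure_not_nat(pd.Timestamp("2024-04-29T18:00:00"))

# A NaT input is rejected instead of silently leaking into typed code.
try:
    ensure_not_nat(pd.Timedelta("NaT"))
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

print(ts, error_message)
```

Whether a given type checker accepts the call sites without complaint depends on how the constructor return types are declared in the stubs it uses, so this only guarantees the runtime behaviour.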

@jbrockmendel
Member

There’s an issue about introducing a separate NaTD specific to Timedelta. If you did that (and the same for Period), then NaT could become a Timestamp, and you would get type safety in the existing constructors without adding new ones.
