ENH: Compatibility between isin() methods and python-holidays #46123

@alexmalins

Is your feature request related to a problem?

The pandas DataFrame.isin() and Series.isin() methods are currently incompatible with python-holidays objects. Holidays objects are dict-like objects, keyed by date, for determining whether a specific date is a holiday. I've found two issues:

  • pd.DataFrame.isin() replaces dict-like input with a collections.defaultdict(), which discards the designed behaviour of objects that inherit from dict, such as holidays objects.

  • pd.Series.isin() ends up casting holidays-like objects to a list. Holidays objects can build holidays dynamically on the fly, but the cast only captures the dates populated at that moment, so the check fails.
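To illustrate both points without depending on pandas or python-holidays, here is a minimal sketch. `LazyHolidays` is a hypothetical stand-in I made up for this example, not the real holidays class; it assumes only that the object is a dict subclass that populates a year's dates when membership is tested.

```python
import collections
import datetime as dt


class LazyHolidays(dict):
    """Hypothetical stand-in for a holidays object: keys are dates, and a
    year's holidays are only generated when a date in that year is looked up."""

    def _populate(self, year):
        # Toy rule for illustration: only Christmas Day is a holiday.
        self[dt.date(year, 12, 25)] = "Christmas Day"

    def __contains__(self, key):
        self._populate(key.year)  # build the year on demand
        return super().__contains__(key)


xmas = dt.date(2022, 12, 25)

lazy = LazyHolidays()
as_list = list(lazy)                                  # nothing populated yet
as_defaultdict = collections.defaultdict(list, lazy)  # plain copy of current keys

print(xmas in as_list)         # False: the list is a static snapshot
print(xmas in as_defaultdict)  # False: the copy lost the custom __contains__
print(xmas in lazy)            # True: the lookup populates 2022 first
```

Both conversions look at the keys stored at that moment and never call the subclass's own membership logic, which is the failure mode described in the two bullets above.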

See code examples at bottom showing the failures.

Describe the solution you'd like

Ideally the isin() methods would support holidays objects. This means pd.DataFrame.isin() should treat a holidays object as if a list (iterable) had been passed, because the keys of a holidays object store the actual holiday dates. This differs from the current behaviour, which treats it like a normal dict and tries to match its key:value pairs against column:field value pairs in the DataFrame.

The pd.Series.isin() method should support the ability of holidays objects to dynamically check if a date is a holiday on the fly.

API breaking implications

If the solution adds isinstance(values, holidayobject) checks to the isin() methods, I guess this won't change the existing API for non-holidays objects. However, this might require holidays to become a new dependency of pandas, which may be a problem.
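One way to avoid the dependency might be a duck-typed check instead of an isinstance() check against the holidays class. The sketch below is only an idea for discussion, not how pandas works today: `overrides_contains` and `elementwise_isin` are hypothetical helper names, and `EvenNumbers` is a toy class used purely to exercise them.

```python
def overrides_contains(obj):
    """True if obj is a dict subclass that customises membership tests."""
    return isinstance(obj, dict) and type(obj).__contains__ is not dict.__contains__


def elementwise_isin(values, container):
    """Hypothetical fallback: ask the container about each value in turn."""
    return [value in container for value in values]


class EvenNumbers(dict):
    """Toy dict subclass whose membership is computed rather than stored."""

    def __contains__(self, key):
        return key % 2 == 0


print(overrides_contains({}))             # False
print(overrides_contains(EvenNumbers()))  # True
print(elementwise_isin([1, 2, 3, 4], EvenNumbers()))  # [False, True, False, True]
```

Plain dicts would keep their current handling, and only objects that override `__contains__` would take the element-wise path. That path is slower than a hashed lookup, so it would be a trade-off rather than a free fix.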

Describe alternatives you've considered

A hacky fix is to always pre-populate the holidays object with the relevant dates via the years=... parameter in its constructor, then supply list(holiday_object) as the parameter to the DataFrame.isin() method. This makes things work.

However, it is not at all obvious to users that this is necessary, and on-the-fly checking of holiday dates won't work this way. More importantly, if users don't realize this, pandas will silently give incorrect results when a holidays object is supplied to isin() directly.
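For the pre-population workaround, the years to request can be derived from the data itself. This is a minimal sketch using only the standard library; `years_spanned` is a hypothetical helper name, and it assumes every value exposes a `.year` attribute (as datetime.date and pd.Timestamp both do).

```python
import datetime as dt


def years_spanned(dates):
    """Return the sorted, de-duplicated years covered by an iterable of dates."""
    return sorted({d.year for d in dates})


dates = [dt.date(2021, 12, 27), dt.date(2022, 12, 25), dt.date(2022, 1, 1)]
print(years_spanned(dates))  # [2021, 2022]
```

The result could then be passed as the years=... argument when constructing the holidays object, before handing list(holiday_object) to isin(). It still does not address the silent-failure problem for users who are unaware of the workaround.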

Additional context

Incorrect behaviour:

>>> import holidays
>>> import pandas as pd

# Series.isin() example failure - on-the-fly checking if a date is a holiday won't work
>>> s = pd.Series([pd.Timestamp("2022-12-25")])
>>> uk = holidays.UK()
>>> s.isin(uk)
0    False
dtype: bool

# DataFrame.isin() always fails because it treats the holidays object like a dict
# when it should be treated like a list (iterable)
>>> uk = holidays.UK(years=2022)
>>> df = pd.DataFrame([pd.Timestamp("2022-12-25")])
>>> df.isin(uk)
       0
0  False

Hacky fixes:

>>> uk = holidays.UK(years=2022)
>>> s.isin(uk)
0    True
dtype: bool

>>> df.isin(list(uk))
      0
0  True

Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
