# 1-Column Filters

In this notebook, we'll show how we can use ``helicast`` to filter columns in a pandas
DataFrame. Everything is implemented in the ``helicast.column_filters`` subpackage.

The classes inherit from both Pydantic (for type checking and safety) and scikit-learn
(for maximum compatibility with ML frameworks)!

In [None]:
import pandas as pd

# All of these objects can be used as scikit-learn transformers on pandas DataFrames.
# Their main job is to select or remove columns based on some rule :)
from helicast.column_filters import (
    AllSelector,
    DTypeRemover,
    DTypeSelector,
    NameRemover,
    NameSelector,
    RegexRemover,
    RegexSelector,
)

In [None]:
def read_data() -> pd.DataFrame:
    df = pd.read_csv("../data/victoria-daily-electricity.csv")
    df["school_day"] = df["school_day"].astype("category")
    df["holiday"] = df["holiday"].astype("category")
    df["date"] = pd.to_datetime(df["date"])
    df = df.ffill()
    df = df.convert_dtypes(dtype_backend="pyarrow")
    # df["date"] = pd.DatetimeIndex(df["date"])
    return df


df = read_data()
display(df.dtypes)

In [None]:
# Let's select all the columns whose names start with "demand".
# The regex for that is "^demand"
transform = RegexSelector(patterns="^demand")

# Calling `fit` doesn't do anything; it is only there for API compatibility
# --> fit_transform and transform are equivalent methods :)
transform.fit_transform(df)

In [None]:
# Because the column filters are sklearn transformers, they can be visualized as such!
display(transform)

# And all the sklearn magic can happen :)
print(transform.get_params())


# Parameters can also be updated with set_params, which returns the transformer
display(transform.set_params(patterns="demand$"))
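The two patterns used above differ only in their anchor: `^demand` matches names that start with "demand", while `demand$` matches names that end with it. Here is a minimal plain-Python sketch of that difference. It assumes `re.search` semantics and uses a made-up list of column names; the `select` helper is hypothetical and is not how ``helicast`` is implemented.

```python
import re

# Illustrative column names (an assumption, not read from the CSV file).
columns = ["date", "demand", "demand_pos_RRP", "RRP", "school_day", "holiday"]


def select(pattern: str, names: list[str]) -> list[str]:
    """Hypothetical helper: keep the names in which `pattern` is found."""
    return [name for name in names if re.search(pattern, name)]


# "^" anchors the match at the start of the name.
starts_with = select("^demand", columns)

# "$" anchors the match at the end of the name.
ends_with = select("demand$", columns)

print("starts with 'demand':", starts_with)
print("ends with 'demand':  ", ends_with)
```

With these names, the start anchor keeps both demand columns, while the end anchor keeps only the column named exactly `demand`.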

In [None]:
# You can combine rules!
# Here is an example with the bitwise AND operator &
transform = RegexSelector(patterns="day$") & DTypeSelector(dtypes="category")

transform.fit_transform(df)

In [None]:
# You can combine rules!
# Here is an example with the bitwise OR operator |
transform = RegexSelector(patterns="day$") | DTypeSelector(dtypes="number")

transform.fit_transform(df)

In [None]:
# The combination of column filters is itself a column filter object
(RegexSelector(patterns="day$") | DTypeSelector(dtypes="number"))

In [None]:
# There is some smart logic handling behind the scenes (using De Morgan's laws)
# Here we have "not (A or B)", which becomes "not A and not B" :)
~(RegexSelector(patterns="day$") | DTypeSelector(dtypes="number"))
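To see why that rewrite is safe, here is a small plain-Python check of De Morgan's law on column names. It is only a sketch: the column names and the set of numeric columns are assumptions for illustration, and the two predicates are hypothetical stand-ins for the regex and dtype rules.

```python
import re

# Illustrative column names and an assumed set of numeric columns.
columns = ["date", "demand", "RRP", "school_day", "holiday"]
numeric_columns = {"demand", "RRP"}


def ends_with_day(name: str) -> bool:
    """Rule A: the name ends with "day"."""
    return re.search("day$", name) is not None


def is_number(name: str) -> bool:
    """Rule B: the column is numeric (looked up in the assumed set)."""
    return name in numeric_columns


# not (A or B)
not_a_or_b = [c for c in columns if not (ends_with_day(c) or is_number(c))]

# (not A) and (not B)
not_a_and_not_b = [c for c in columns if not ends_with_day(c) and not is_number(c)]

print(not_a_or_b)
print(not_a_or_b == not_a_and_not_b)
```

Both expressions keep exactly the same columns, so a negated OR can always be rewritten as an AND of two negated rules.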

## Conclusion and outlook

You can do lots of stuff with those column filters! You can select/remove by
* dtype:
    - ``DTypeSelector``
    - ``DTypeRemover``
* regex:
    - ``RegexSelector``
    - ``RegexRemover``
* name:
    - ``NameSelector``
    - ``NameRemover``

For completeness, there is also a "dummy" filter that selects everything, the
``AllSelector``.


All those classes inherit from the ``ColumnFilter`` class, which is the abstract base
class.
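
As a rough illustration of how an abstract base class can make `&`, `|` and `~` return new filter objects, here is a self-contained sketch. All the names in it (`SketchFilter`, `SketchRegex`, `_Combined`, `_Inverted`) are hypothetical and work on plain lists of column names; this is not the actual ``ColumnFilter`` implementation.

```python
import re
from abc import ABC, abstractmethod


class SketchFilter(ABC):
    """Hypothetical base class: a rule that picks column names."""

    @abstractmethod
    def select(self, names: list[str]) -> list[str]:
        """Return the selected names, in their original order."""

    # Each operator returns a new filter, so combinations stay composable.
    def __and__(self, other: "SketchFilter") -> "SketchFilter":
        return _Combined(self, other, keep_if=lambda a, b: a and b)

    def __or__(self, other: "SketchFilter") -> "SketchFilter":
        return _Combined(self, other, keep_if=lambda a, b: a or b)

    def __invert__(self) -> "SketchFilter":
        return _Inverted(self)


class SketchRegex(SketchFilter):
    """Select the names in which a regex pattern is found."""

    def __init__(self, pattern: str):
        self.pattern = pattern

    def select(self, names):
        return [n for n in names if re.search(self.pattern, n)]


class _Combined(SketchFilter):
    """Combine two filters with a boolean rule (AND or OR)."""

    def __init__(self, left, right, keep_if):
        self.left, self.right, self.keep_if = left, right, keep_if

    def select(self, names):
        left = set(self.left.select(names))
        right = set(self.right.select(names))
        return [n for n in names if self.keep_if(n in left, n in right)]


class _Inverted(SketchFilter):
    """Select everything the wrapped filter does not select."""

    def __init__(self, inner):
        self.inner = inner

    def select(self, names):
        selected = set(self.inner.select(names))
        return [n for n in names if n not in selected]


# Illustrative column names (an assumption, not read from the CSV file).
columns = ["date", "demand", "school_day", "holiday"]

# Names ending in "day" that do not start with "school".
rule = SketchRegex("day$") & ~SketchRegex("^school")
result = rule.select(columns)
print(result)
```

Because every operator returns another `SketchFilter`, a combined rule can be negated or combined again, which mirrors the behaviour shown in the cells above.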