The databackend package allows you to register a subclass, without
needing to import the subclass itself. This is useful for implementing
actions over optional dependencies.
For this example, we’ll implement a function, fill_na(), that fills in
missing values in a DataFrame. It works with DataFrame objects from two
popular libraries: pandas and polars. Importantly, neither library
needs to be installed.
The code below defines “abstract” parent classes for each of the DataFrame classes in the two libraries.
from databackend import AbstractBackend


class AbstractPandasFrame(AbstractBackend):
    _backends = [("pandas", "DataFrame")]


class AbstractPolarsFrame(AbstractBackend):
    _backends = [("polars", "DataFrame")]
Note that the abstract classes can be used as stand-ins for the real
thing in issubclass() and isinstance().
from pandas import DataFrame
issubclass(DataFrame, AbstractPandasFrame)
isinstance(DataFrame(), AbstractPandasFrame)
True
📝 Note that you can use AbstractPandasFrame.register_backend("pandas", "DataFrame") as an alternative way to register backends.
The fill_na() function below uses custom handling for pandas and
polars.
def fill_na(data, x):
    if isinstance(data, AbstractPolarsFrame):
        # polars stores missing values as null; fill_nan() only touches float NaN
        return data.fill_null(x)
    elif isinstance(data, AbstractPandasFrame):
        return data.fillna(x)
    else:
        raise NotImplementedError()
Notice that neither pandas nor polars needs to be imported when
defining fill_na().
Here is an example of calling fill_na() on both kinds of DataFrames.
# test polars ----
import polars as pl
df = pl.DataFrame({"x": [1, 2, None]})
fill_na(df, 3)
# test pandas ----
import pandas as pd
df = pd.DataFrame({"x": [1, 2, None]})
fill_na(df, 3)
     x
0  1.0
1  2.0
2  3.0
The key here is that a user could have only pandas, or only polars, installed. Importantly, running the isinstance() checks does not import any libraries!
databackend shines when combined with generic function dispatch.
This is a programming approach where you declare a function
(e.g. fill_na()), and then register each backend-specific
implementation on the function.
Python's standard library implements this as functools.singledispatch.
Here is the previous fill_na() function rewritten to use it.
from functools import singledispatch


@singledispatch
def fill_na2(data, x):
    raise NotImplementedError(f"No support for class: {type(data)}")


# handle polars ----
@fill_na2.register
def _(data: AbstractPolarsFrame, x):
    # polars stores missing values as null; fill_nan() only touches float NaN
    return data.fill_null(x)


# handle pandas ----
@fill_na2.register
def _(data: AbstractPandasFrame, x):
    return data.fillna(x)
Note two important decorators:
- @singledispatch defines a default function. This gets called if no specific implementations are found.
- @fill_na2.register defines specific versions of the function.
Here’s an example of it in action.
# example ----
import pandas as pd
import polars as pl
df = pl.DataFrame({"x": [1, 2, None]})
fill_na2(df, 3)
df = pd.DataFrame({"x": [1, 2, None]})
fill_na2(df, 3)
     x
0  1.0
1  2.0
2  3.0
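Dispatch works here because singledispatch respects abstract-class subclass checks when it picks an implementation. The sketch below shows that behavior using only the standard library. ExactNumber and describe() are hypothetical names invented for this illustration, and explicit register() calls stand in for databackend's lazy lookup.

```python
from abc import ABC
from decimal import Decimal
from fractions import Fraction
from functools import singledispatch


class ExactNumber(ABC):
    """Hypothetical abstract class; not part of databackend."""


# Explicit registration stands in for databackend's lazy lookup.
ExactNumber.register(Decimal)
ExactNumber.register(Fraction)


@singledispatch
def describe(value):
    # Default implementation: called when no registered class matches.
    raise NotImplementedError(f"No support for class: {type(value)}")


@describe.register
def _(value: ExactNumber):
    # Chosen for any virtual subclass of ExactNumber.
    return f"exact value: {value}"


print(describe(Fraction(1, 2)))  # exact value: 1/2
print(describe(Decimal("0.1")))  # exact value: 0.1
```

Neither Fraction nor Decimal inherits from ExactNumber, yet both reach the registered implementation. A float would fall through to the default and raise NotImplementedError.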
Under the hood, AbstractBackend behaves similarly to Python’s built-in
abc.ABC class.
from abc import ABC


class MyABC(ABC):
    pass


from io import StringIO

MyABC.register(StringIO)

# StringIO is a "virtual subclass" of MyABC
isinstance(StringIO("abc"), MyABC)
True
The key difference is that you can specify the virtual subclass using
the tuple ("<mod_name>", "<class_name>").
When issubclass(SomeClass, AbstractBackend) runs, then…
- The standard ABC caching mechanism is checked, and potentially returns the answer immediately.
- Otherwise, a subclass hook cycles through registered backends.
- The hook runs the subclass check only for backends whose module is already imported (i.e. is in sys.modules).
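The steps above can be sketched with the standard library alone. This is a simplified illustration of the idea, not databackend's actual source. LazyBackend, AbstractFraction, AbstractMissing, and the module name not_a_real_module_xyz are all hypothetical, and fractions.Fraction plays the role of an optional dependency.

```python
import sys
from abc import ABC


class LazyBackend(ABC):
    """Hypothetical stand-in for AbstractBackend (illustration only)."""

    # Each entry is a ("<mod_name>", "<class_name>") tuple.
    _backends = []

    @classmethod
    def __subclasshook__(cls, subclass):
        for mod_name, cls_name in cls._backends:
            # Only consider modules that something else already imported.
            module = sys.modules.get(mod_name)
            if module is None:
                continue
            parent = getattr(module, cls_name, None)
            if isinstance(parent, type) and issubclass(subclass, parent):
                return True
        # Defer to the normal ABC machinery (registry, real subclasses).
        return NotImplemented


class AbstractFraction(LazyBackend):
    _backends = [("fractions", "Fraction")]


class AbstractMissing(LazyBackend):
    # Made-up module name that is never imported.
    _backends = [("not_a_real_module_xyz", "Thing")]


from fractions import Fraction

half = Fraction(1, 2)
print(isinstance(half, AbstractFraction))      # True
print(isinstance(half, AbstractMissing))       # False
print("not_a_real_module_xyz" in sys.modules)  # False
```

The hook never calls import itself. A backend whose module is absent from sys.modules is skipped, so the check stays cheap and has no side effects.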
Technically, AbstractBackend inherits the usual metaclass behavior
from abc.ABCMeta, so those features can be used as well.