## Dispatching

In its simplest form, dispatching involves overloading methods based on their arguments. Implementations that do this based on argument types, like [functools.singledispatch](https://docs.python.org/3/library/functools.html) and [multipledispatch](https://github.com/mrocklin/multipledispatch), make this straightforward.
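
As a minimal illustration of type-based overloading, the sketch below uses `functools.singledispatch` from the standard library. The `describe` function and its overloads are made up for this example. Note that `singledispatch` only inspects the type of the first argument, which is part of why the multi-argument case discussed next is harder.

```python
from functools import singledispatch

@singledispatch
def describe(x):
    # Fallback used when no more specific overload is registered
    return f"object of type {type(x).__name__}"

@describe.register
def _(x: int):
    # Chosen when the first argument is an int
    return f"integer {x}"

@describe.register
def _(x: list):
    # Chosen when the first argument is a list
    return f"list of length {len(x)}"

print(describe(3))
print(describe([1, 2]))
print(describe(2.5))
```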

This approach becomes problematic when duck array types are passed to functions that take multiple arguments. The simple example below shows that `__array_ufunc__` and `__array_function__` make interoperability between backends possible to an extent, but the backend that ultimately handles an operation is arbitrary (it is the first one that does not return `NotImplemented`), so both the result type and the success of the operation often depend on the argument order:

```python
import dask.array as da
import numpy as np
import sparse
a1 = da.array([1, 0])
a2 = sparse.COO.from_numpy(np.array([1, 0]))

# dask is better at handling arrays from other backends dispatched to ufunc 
# while sparse only supports sparse or numpy arrays (the first case here
# works because dask is applying the function on numpy array chunks)
type(a1 + a2) # -> dask.array.core.Array
type(a2 + a1) # -> operand type(s) all returned NotImplemented

# similarly:
a3 = a1.map_blocks(sparse.COO.from_numpy)
type(np.stack([a1, a3])) # -> dask.array.core.Array
type(np.stack([a3, a1])) # -> All arrays must be instances of SparseArray.
type(np.stack([a1, a2])) # -> All arrays must be instances of SparseArray.
```

This means that code using multipledispatch would need overloads that match each combination of backend types and perform coercion where necessary, likely targeting one of the backends present in the arguments. Two drawbacks are that defining overloads for more than two arguments, or for lists of arrays, is hard, and that the "target" backend is implicit in the implementation rather than controlled by the user.
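
To make the combinatorial problem concrete, here is a small sketch of a hand-rolled registry keyed on tuples of argument types. It does not use multipledispatch itself, and `DenseArray`, `SparseArray`, `ChunkedArray`, `overload`, and `add` are hypothetical stand-ins rather than real library names. It only aims to show how the number of required signatures grows and how the coercion target ends up buried in each overload.

```python
from itertools import product

# Hypothetical stand-ins for real array backends (not actual library classes)
class DenseArray: ...
class SparseArray: ...
class ChunkedArray: ...

BACKENDS = [DenseArray, SparseArray, ChunkedArray]
OVERLOADS = {}

def overload(*types):
    """Register an implementation for one exact combination of argument types."""
    def decorator(fn):
        OVERLOADS[types] = fn
        return fn
    return decorator

@overload(ChunkedArray, SparseArray)
def _add_chunked_sparse(x, y):
    # The "target" backend (chunked) is an implementation detail hidden from the caller
    return "coerce y to chunked, then add"

@overload(SparseArray, ChunkedArray)
def _add_sparse_chunked(x, y):
    return "coerce x to chunked, then add"

def add(x, y):
    # Look up the overload registered for this exact pair of types
    key = (type(x), type(y))
    if key not in OVERLOADS:
        raise TypeError(f"No overload for ({key[0].__name__}, {key[1].__name__})")
    return OVERLOADS[key](x, y)

def n_signatures(n_args):
    # Number of distinct type combinations that would need an overload
    return len(list(product(BACKENDS, repeat=n_args)))

print(n_signatures(2), n_signatures(3))
```

With three backends, two arguments already imply nine signatures and three arguments imply twenty-seven, and any combination without an overload fails at call time.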

A solution to this proposed in [unumpy](https://github.com/Quansight-Labs/unumpy) (via [uarray](https://github.com/Quansight-Labs/uarray)) is to make the target backend for any numpy functions explicit, as well as provide hooks for coercion of arguments to that backend. This is a good solution for much of the simpler functionality in a genetics toolkit but poses an issue for any more complex domain-specific algorithms in that it would be perfectly reasonable to expect that more than one array backend will be useful for doing things efficiently.  From this perspective, the target implementations become something more like "algorithms" than "backends" and they should be free to make use of whatever array backends are most beneficial for a particular step (and coercion of arguments is simple when the target backend is clear).  An example would be kinship estimation via CuPy followed by maximal independent set selection using a sparse array backend for relatedness pruning.

A further consideration is that dispatch may be something we want to eventually automate, rather than forcing our users to always think about it (e.g. an array with 1% sparsity should not go to an implementation that relies on a sparse backend, tiny arrays should not go to a chunked backend, big dask arrays should not be forced into a numpy backend, etc.).
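
A rough sketch of what such automated rules might look like is shown here. The `ArraySummary` class, the `suggest_backend` function, and every threshold are assumptions chosen only for illustration; real heuristics would need to be tuned and would inspect actual arrays rather than a hand-filled summary.

```python
from dataclasses import dataclass

@dataclass
class ArraySummary:
    # Hypothetical summary of the properties a dispatcher might inspect
    n_elements: int
    sparsity: float   # fraction of zero-valued elements
    is_chunked: bool  # e.g. True for a dask array

def suggest_backend(a, tiny=10_000, huge=100_000_000, min_sparsity=0.9):
    # Tiny arrays are not worth the overhead of chunking or sparse formats
    if a.n_elements <= tiny:
        return "numpy"
    # Large or already-chunked arrays should stay chunked rather than be loaded into memory
    if a.is_chunked or a.n_elements >= huge:
        return "dask"
    # Only mostly-empty arrays benefit from a sparse representation
    if a.sparsity >= min_sparsity:
        return "sparse"
    return "numpy"

print(suggest_backend(ArraySummary(100, 0.99, False)))
print(suggest_backend(ArraySummary(10**9, 0.01, True)))
print(suggest_backend(ArraySummary(10**7, 0.95, False)))
```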

This prototype shows a small framework that is based to a degree on uarray and is centered around doing dispatching with all of the following in one place:

- **User preferences**: These always get highest priority in choosing an implementation
- **Configuration**: This can be scoped with a context manager to fix implementations for a small block of code
- **Arguments**: The arrays themselves for type, shape, and content analysis

Both uarray and multipledispatch are degenerate cases of this approach.
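
The precedence described above can be sketched as a single function. Everything here is hypothetical (`resolve`, `by_type`, and the backend ids are not part of any library), and the "uarray-like" and "multipledispatch-like" labels are deliberate simplifications of what those projects do.

```python
def resolve(user_choice=None, configured=None, infer=None, args=()):
    """Return a backend id using the precedence: user > configuration > arguments."""
    if user_choice is not None:
        return user_choice
    if configured is not None:
        return configured
    if infer is not None:
        return infer(*args)
    raise LookupError("No backend could be resolved")

# Hypothetical inference rule that only looks at argument types
def by_type(x):
    return "list-backend" if isinstance(x, list) else "default-backend"

# Full framework: the user preference wins over everything else
print(resolve(user_choice="simple", configured="advanced", infer=by_type, args=([1],)))

# uarray-like special case: only the explicitly configured backend is consulted
print(resolve(configured="advanced"))

# multipledispatch-like special case: only the argument types are consulted
print(resolve(infer=by_type, args=([1],)))
```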

In [138]:
import sys
import xarray as xr
sys.path.append(".")
from lib import api
xr.set_options(display_style='html');

The abstraction consists of a "Frontend", a "Backend" and an API of stub functions.

An example frontend would look like the following, where `Frontend` is an internal base implementation that the `MyFrontend` class extends. A class like `MyFrontend` would exist for every major piece of functionality in the library that benefits from moderately complex dispatching.

In [None]:
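import functools
from typing import Callable, Optional

# `Configuration`, `Backend`, and `is_compatible` are assumed to be defined elsewhere
# This base class is defined once somewhere
class Frontend:
    
    def __init__(self, config: Optional[Configuration] = None):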
        self.config = config
        self.backends = dict()
        
    def register(self, backend: Backend) -> None:
        # Frontends need to be aware of backends in order to choose intelligently between them
        if backend.domain != self.domain:
            raise ValueError(f'Backend with domain {backend.domain} not compatible with frontend domain {self.domain}')
        self.backends[backend.id] = backend

    def resolve(self, fn: Callable, *args, **kwargs) -> Backend:
        # Choose a backend to dispatch to based on as much information as possible:
        
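        # First look for overrides in arguments passed to the function
        backend_id = kwargs.get('backend')
        
        # Next look for overrides in configuration (e.g. the key 'genetics.method.backend')
        if backend_id is None and self.config is not None:
            backend_id = self.config.get(f'{self.domain}.backend')
        
        # An explicit choice (anything other than 'auto') takes priority
        if backend_id is not None and backend_id != 'auto':
            if backend_id not in self.backends:
                raise ValueError(f'Backend {backend_id!r} is not registered for domain {self.domain}')
            return self.backends[backend_id]
        
        # Otherwise check to see what if any backends have the required packages installed
        candidates = [be for be in self.backends.values() if is_compatible(be)]
        if not candidates:
            raise ValueError(f'No suitable backend found for function {fn.__name__} (domain = {self.domain})')
            
        # ** Analyze fn/args/kwargs here, in the future **
        
        # For now this is an arbitrary choice among the compatible backends
        return candidates[0]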

    def dispatch(self, fn: Callable, *args, **kwargs):
        return self.resolve(fn, *args, **kwargs).dispatch(fn, *args, **kwargs)
        
    def add(self, fn: Callable):
        # Wrap a function to be dispatched and preserve docs
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return self.dispatch(fn, *args, **kwargs)
        return wrapper
    
# This is defined somewhere close to where the API to dispatch over lives (at least within the same package)
class MyFrontend(Frontend):
    domain = 'genetics.method'
    
frontend = MyFrontend()

The actual API methods are simply stubs somewhere that have access to the frontend created above:

In [None]:
class API:
    
    @frontend.add
    def ld_prune(ds: Dataset) -> Dataset:
        """ All documentation goes here """
        pass

The backends should be separated from the frontend to make it as easy as possible to isolate imports for their optional dependencies.  They do however need to know which frontend they are associated with and register themselves to it:

In [None]:
class AdvancedBackend(Backend):
    
    domain = 'genetics.method'
    id = 'advanced'
    
    def ld_prune(self, ds: Dataset) -> Dataset:
        # A potential mixed-backend workload:
        # - Do pairwise calcs on GPU with CuPy
        # - Do variant selection for those in high LD based on sparse arrays
        # - Return selected indexes as Dask array with dense numpy chunks
        ...
        
    def requirements(self) -> Sequence[Requirement]:
        return [
            # Packages are obvious requirements but this could also eventually include system resource constraints
            Requirement('cupy', minimal_version='1.0'),
            Requirement('sparse', minimal_version='0.5'),
            Requirement('dask') # any version
        ]
    
class SimpleBackend(Backend):
    
    domain = 'genetics.method'
    id = 'simple'
    
    def ld_prune(self, ds: Dataset) -> Dataset:
        # Do everything assuming numpy
        ...

# As long as this step is isolated to the module the backend is defined in, it is 
# easy to make sure that nothing is imported that is not installed:
frontend.register(AdvancedBackend())
frontend.register(SimpleBackend())
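
The `Backend` base class, the `Requirement` type, and the `is_compatible` helper are referenced above but not shown. One possible shape for them, purely as an assumption, is sketched below: it checks only whether each required package can be found with `importlib.util.find_spec`, ignores `minimal_version` entirely, and dispatches by looking up a method with the same name as the API stub.

```python
import importlib.util
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass(frozen=True)
class Requirement:
    package: str
    minimal_version: Optional[str] = None  # recorded but not enforced in this sketch

class Backend:
    domain: str
    id: str

    def requirements(self) -> Sequence[Requirement]:
        # Backends without optional dependencies need not override this
        return []

    def dispatch(self, fn: Callable, *args, **kwargs):
        # Drop the routing-only keyword before calling the implementation
        kwargs.pop('backend', None)
        # Call the method that shares its name with the API stub
        return getattr(self, fn.__name__)(*args, **kwargs)

def is_compatible(backend: Backend) -> bool:
    # A backend is usable if every required top-level package can be located
    return all(
        importlib.util.find_spec(req.package) is not None
        for req in backend.requirements()
    )

# Small demonstration with a made-up backend that only needs the standard library
class DemoBackend(Backend):
    domain = 'genetics.method'
    id = 'demo'

    def ld_prune(self, ds):
        return f'pruned {ds}'

    def requirements(self):
        return [Requirement('json')]

def ld_prune(ds):
    """Stub standing in for the API function."""

print(is_compatible(DemoBackend()))
print(DemoBackend().dispatch(ld_prune, 'dataset', backend='demo'))
```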

Usage then looks like this:

In [None]:
import api

ds: Dataset = ...
    
# Choose the best backend automatically based on environment, config, and arguments
# * To start, this is just an arbitrary choice based on which backends have installed dependencies
api.ld_prune(ds, backend='auto') 


# Choose the backend explicitly
api.ld_prune(ds, backend='simple')

# OR explicitly within a block
with api.config.context('genetics.method.backend', 'simple'):    
    api.ld_prune(ds)
    
# OR explicitly globally
api.config.set('genetics.method.backend', 'simple')
api.ld_prune(ds)
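
The `config` object used above is not defined in this section. A minimal sketch of a configuration store that would support the `get`, `set`, and `context` calls is given below. The class body is an assumption, not the prototype's implementation, and it is not thread-safe.

```python
from contextlib import contextmanager

class Configuration:
    def __init__(self):
        self._values = {}

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        # Global override that stays in effect until changed again
        self._values[key] = value

    @contextmanager
    def context(self, key, value):
        # Temporarily override a key and restore the previous state on exit
        missing = object()
        previous = self._values.get(key, missing)
        self._values[key] = value
        try:
            yield self
        finally:
            if previous is missing:
                self._values.pop(key, None)
            else:
                self._values[key] = previous

config = Configuration()
with config.context('genetics.method.backend', 'simple'):
    print(config.get('genetics.method.backend'))
print(config.get('genetics.method.backend'))
```

Restoring the prior value in a `finally` clause means the scoped override is undone even if the wrapped code raises.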