
Using soft dependencies


Often, developers are faced with tradeoffs between wanting to use the best new code in the various packages in the scientific python ecosystem and asking their users to install those packages. Many packages square this apparent circle by using implicit or soft dependencies:

A soft dependency exists when a piece of code

  1. depends on a package that is not in a project's dependency description and
  2. does not fail on ImportError when the dependency is not in the environment.
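
In its simplest form, this is just a guarded import. A minimal sketch of the idea (the package name fancy_stats is hypothetical):

try:
    import fancy_stats  # soft dependency: not in the dependency description
    HAS_FANCY_STATS = True
except ImportError:
    HAS_FANCY_STATS = False

Code elsewhere in the project can then check the flag before using the package.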

In what follows, I'll show requires, a function I've already written for PySAL that can be used to enable safe soft-dependency programming at multiple levels, and attempt to illustrate best practices for soft dependencies in the PySAL project.

Two explicit implementations, no decorator

First, let's consider the case where we have one reference implementation and one enhanced implementation of some basic unit of code. The enhanced implementation should be totally separate from the reference implementation. Think of this as useful when you have a working standalone cython or numba implementation of all or part of some code, or if you're using a scikit for something.

Here, I'll start talking about functions, but similar logic can be used with classes.

We can characterize code as more or less soft depending on how the function avoids failing on the import. For instance, assume the project expects numpy and the standard library module warnings, but does not expect scipy. Further, let's assume that we can implement the function we want without our soft dependency, but would prefer that the soft dependency be used if available.

import numpy as np
import warnings as wn

np.random.seed(12345)
M = np.random.normal(0, 1, size=(100, 100))
M = M.T @ M  # M.T @ M is symmetric positive semidefinite

First, let's consider a reasonably simple log determinant for the positive semidefinite matrix, M:

def log_determinant(a):
    sgn, ld = np.linalg.slogdet(a)
    return sgn * ld

"Conjoined" implementation

If we wanted to use LU decomposition from scipy (which may be faster than slogdet for some matrices), we could write the following function:

def log_determinant(a):
    """ compute log determinant using either scipy LU or numpy slogdet"""
    try:
        import scipy.linalg as s  # scipy.linalg must be imported explicitly
        LU, P = s.lu_factor(a)
        answer = np.log(np.abs(LU.diagonal())).sum()
    except ImportError:
        wn.warn('No scipy, using numpy instead.')
        sgn, ld = np.linalg.slogdet(a)
        answer = sgn * ld
    return answer

This conjoined implementation embeds the exception-handling logic of the import alongside the function itself. This pattern:

  • Pays the import cost of scipy exactly once, when the function is first run
  • Pays a try/except cost on every call if scipy is unavailable
  • Warns the user what they're getting, if it's not what we might hope for.

I think this pattern is exceptionally risky for a few reasons.

  • If the return type of one implementation changes, the results can be unpredictable and hard to troubleshoot.
  • At run time, there's no way to introspect the function and determine which implementation is being run.

"Contained" implementation

We can do better.

To do so, let's "fuse" the binding at its initial import. This way, we pick the best possible implementation at the outset. We can do this by making the try/except block contain the two implementations:

try:
    import scipy.linalg as s
    def log_determinant(a):
        """ compute log determinant using scipy LU"""
        LU, P = s.lu_factor(a)
        return np.log(np.abs(LU.diagonal())).sum()
except ImportError:
    def log_determinant(a):
        """ compute log determinant using numpy slogdet"""
        sgn, ld = np.linalg.slogdet(a)
        return sgn*ld

In this case, we:

  • pay the import cost of scipy exactly once, when the function is first defined
  • pay the try/except cost only at definition time
  • do not explicitly warn the user what they're getting, but do document, on inspection, which implementation was bound.

I think this pattern should be preferred to the conjoined pattern because the implementation can be better introspected and the import cost is only paid at definition, not on each run.
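
For instance, since the docstrings differ between the two contained definitions, a quick inspection shows which implementation was bound at import time (assuming the docstrings are kept accurate):

# inspect the bound implementation's docstring
print(log_determinant.__doc__)
# prints " compute log determinant using scipy LU" when scipy imported cleanly,
# and " compute log determinant using numpy slogdet" otherwise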

"Conjoined-None" implementation

Another way that we can do this is to import the module at the top of the file and set some sentinel value if the module is not available.

try:
    import scipy.linalg as s
except ImportError:
    s = None

def log_determinant(a):
    if s is not None:
        LU, P = s.lu_factor(a)
        return np.log(np.abs(LU.diagonal())).sum()
    else:
        sgn, ld = np.linalg.slogdet(a)
        return sgn*ld

This is somewhat more desirable than the directly conjoined implementation, since it pays the import cost exactly once. But, if the scipy-enhanced implementation is relatively long, it's easier to accidentally call scipy elsewhere in the function: the try/except is isolated from the rest of the function, and the if/else check looks like any other check. So, while this may be more efficient when a package takes a while to import, consider using the contained implementation instead.
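
To illustrate the pitfall, here is a hypothetical longer function that accidentally uses s outside the guarded branch; it works fine with scipy installed, but raises an AttributeError on None only in environments where scipy is missing:

def log_determinant_and_trace(a):
    if s is not None:
        LU, P = s.lu_factor(a)
        ld = np.log(np.abs(LU.diagonal())).sum()
    else:
        sgn, slogdet = np.linalg.slogdet(a)
        ld = sgn * slogdet
    # oops: this unguarded use of the soft dependency slips past review easily
    LU, P = s.lu_factor(a)
    return ld, LU.trace()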

Many Explicit Implementations

I think the case of multiple "enhanced" implementations is superfluous for this use case. If the user should ever want to choose between implementations, you should write an explicit loader/dispatcher pattern, where you pick which driver to use for the function.

The multi-enhancement case is decidedly more complex than the one-enhancement case and, where possible, we should either pick the best enhanced implementation available within the extended environment or explicitly dispatch.

In the case of many explicit implementations, the best pattern seems to me to be successive contained blocks, with less-desirable implementations sitting lower in the hierarchy, as sketched below.
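
For instance, assume a hypothetical fastlib package is the most-preferred backend, followed by scipy, with numpy as the fallback:

try:
    import fastlib  # hypothetical most-preferred backend
    def log_determinant(a):
        """ compute log determinant using fastlib"""
        return fastlib.log_determinant(a)
except ImportError:
    try:
        import scipy.linalg as s
        def log_determinant(a):
            """ compute log determinant using scipy LU"""
            LU, P = s.lu_factor(a)
            return np.log(np.abs(LU.diagonal())).sum()
    except ImportError:
        def log_determinant(a):
            """ compute log determinant using numpy slogdet"""
            sgn, ld = np.linalg.slogdet(a)
            return sgn * ld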

Explicit Dispatcher implementation

If the author prefers not to use multiple contained blocks, or if it is important to preserve the separate implementations at run time, use a single dispatcher function with "private" implementations:

import scipy.linalg as s  # the scipy backend assumes scipy is installed

def log_determinant(a, using='numpy'):
    return _dispatch_map[using.lower()](a)

def _log_determinant_scipy(a):
    """ compute log determinant using scipy LU"""
    LU, P = s.lu_factor(a)
    return np.log(np.abs(LU.diagonal())).sum()

def _log_determinant_numpy(a):
    """ compute log determinant using numpy.linalg.slogdet"""
    sgn, ld = np.linalg.slogdet(a)
    return sgn*ld

_dispatch_map = {'numpy': _log_determinant_numpy,
                 'scipy': _log_determinant_scipy}
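
Assuming scipy imported successfully, the user then picks the backend by name:

log_determinant(M)                 # numpy slogdet, the default
log_determinant(M, using='scipy')  # scipy LU factorization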

Indeed, collecting/extracting the keys could be automated within a module with a little metaprogramming, if the naming convention on the functions were enforced:

def make_dispatch_map(name_stem):
    # collect every global function named _<name_stem>_<key> into {key: function}
    dispatch_map = dict()
    for k, v in globals().items():
        if k.startswith('_' + name_stem):
            clean_k = k.split(name_stem)[-1].strip('_')
            dispatch_map[clean_k] = v
    return dispatch_map
make_dispatch_map('log_determinant')
{'numpy': <function __main__._log_determinant_numpy>,
 'scipy': <function __main__._log_determinant_scipy>}

When the make_dispatch_map function is executed within a module, it considers only the variables global to that module's scope. But, this is not recommended in cases where the dispatch map is relatively easy to build by hand.

Decorators that autoconvert code

A decorator is a function that takes a function and returns a function. Its special syntax is immediately recognizable:

@decorator
def function(a,b,c):
    return a + b + c
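
The @ syntax is just sugar for calling the decorator on the function and rebinding its name:

def function(a, b, c):
    return a + b + c
function = decorator(function)  # exactly equivalent to the @decorator form above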

I think this may be most useful for decorators like numba's @autojit, which will jit-compile a function. It may also be useful for something like @cythonize. Here, let us consider the case where the only difference between the reference implementation and the enhanced implementation is that the enhanced one is wrapped in some speedup decorator.

"No-Op" implementation

In these cases, the best way to implement this behavior is to write a decorator that does nothing if the speedup decorator is unavailable:

try:
    from numba import autojit
except ImportError:
    def autojit(func):
        return func
    
@autojit
def log_determinant(a):
    sgn, ld = np.linalg.slogdet(a)
    return sgn * ld

Enhancements with no reference implementations

If we want to put an enhancement in core that soft-depends on something outside of the minimal set, it can easily be supported by doing the following:

  1. Either use explicit try/except blocks or use the @requires decorator
  2. Use the unittest.skipIf decorator in the unit tests

Optional Core Modules

If you'd like something to appear in PySAL core but the module contains more than the minimal dependency set, simply wrap the module import in a try/except statement in the appropriate __init__.py.

For example, if we wrote a new fast_gamma module in esda, we would add the following lines to esda/__init__.py:

import warnings as _warnings
import mapclassify
import moran
import smoothing
import getisord
import geary
import join_counts
import gamma
try:
    import fast_gamma
except ImportError:
    _warnings.warn('cannot load fast gamma implementation')

Then, if you want to make it available at higher levels, you must repeat the try/except at each level.
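
For instance, a sketch of the repeated guard in a hypothetical top-level pysal/__init__.py (exact import paths depend on the package layout):

import warnings as _warnings
import esda
try:
    from esda import fast_gamma
except ImportError:
    _warnings.warn('cannot load fast gamma implementation')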

However, if you wanted fast_gamma to be a drop-in replacement, you could overwrite its intended target:

import warnings as _warnings
import mapclassify
import moran
import smoothing
import getisord
import geary
import join_counts
try:
    import fast_gamma as gamma
except ImportError:
    _warnings.warn('cannot load fast gamma implementation')
    import gamma

and then all higher levels would correctly import either the improved gamma or the default gamma.

Using @requires

The @requires decorator has been available in pysal.spreg since the last release, and will be moved to common in the upcoming release.

from pysal.spreg.opt import requires

requires makes an enhanced function safe by ensuring that, when a dependency is missing, the function is replaced by a stub that PySAL can still run, one that prints or warns. Alternatively, the stub can fail silently if requested:

@requires('dont_have_this')
def log_determinant(a):
    return dont_have_this.fastest_log_determinant(a)

log_determinant(M)

which prints:

missing dependencies: ['dont_have_this']
not running log_determinant

With verbose=False, the stub runs silently instead:

@requires('dont_have_this', verbose=False)
def log_determinant(a):
    return dont_have_this.fastest_log_determinant(a)

log_determinant(M)

At worst, this would cause the user to get a warning and a None when a function is called without sufficient dependencies. You can use @requires on class definitions as well. It's essentially a more efficient wrapper for the following contained implementation:

try:
    import dont_have_this
    def log_determinant(a):
        return dont_have_this.fastest_log_determinant(a)
except ImportError:
    from warnings import warn
    def log_determinant(a):
        warn("Don't have the dependencies required to use this")

As the author of a module or a contributor to an existing module, you and the tagged reviewers for the contribution need to decide whether the function with the soft dependency should be part of the intended user experience.

For example, a class that wraps statsmodels statistical models for use with PySAL analytics might not be shown to users without statsmodels. If we want to suggest that the function is part of the expected user experience, use @requires. Otherwise, use a contained implementation like the ones above, but don't provide a reference implementation:

try:
    import dont_have_this
    def log_determinant(a):
        return dont_have_this.fastest_log_determinant(a)
except ImportError:
    pass

This will cause any code later on that expects to find log_determinant to fail immediately, since the containing module will not contain a log_determinant function. In a way, this makes the error more obvious if, later on, log_determinant is expected in the wrong environment.

But, at some point, you and the reviewers need to decide whether it makes sense to expose the function, even if the function will always warn and return None.

Using @skipIf

PySAL requires that unit tests are written for provided functionality. To properly turn unit tests on and off depending on whether a package is available, decorate the relevant test methods (or classes) with unittest.skipIf:

import unittest as ut

try:
    import pandas as pd
except ImportError:
    pd = None

try:
    import theano as th
except ImportError:
    th = None

@ut.skipIf(pd is None, 'pandas is missing')
class Tester(ut.TestCase):
    def setUp(self):
        pass
    
    def test_pandas(self):
        pass
    
    @ut.skipIf(th is None, 'theano is missing')
    def test_theano(self):
        pass

This example will skip the whole test case if pandas is missing, run only test_pandas if pandas is available but theano is not, and run all tests if both pandas and theano are available.
