### Defining Custom Optimizers

Currently, apricot implements a range of different optimization approaches. Any optimizer in a neural network library can be applied to any model. In the same way, any optimizer in apricot can be applied to any submodular function, because the optimizer is agnostic to the details of the function. This property is convenient when defining custom optimizers. One can focus entirely on the optimization algorithm and then use it with any built-in (or even custom) function in apricot.
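To make the "agnostic to the function" idea concrete, here is a minimal pure-Python sketch that does not use apricot. It is a greedy optimizer that only ever queries a set function by calling it, so the same code runs unchanged on two unrelated functions. The names `greedy`, `coverage`, `sqrt_of_sum`, `DOCS` and `WEIGHTS` are hypothetical illustrations, not part of apricot's API.

```python
import math
from typing import Callable, FrozenSet, List

# A set function maps a subset of item indices to a score.
SetFunction = Callable[[FrozenSet[int]], float]


def greedy(f: SetFunction, n: int, k: int) -> List[int]:
    """Pick k of n items greedily, knowing nothing about f except how to call it."""
    chosen: FrozenSet[int] = frozenset()
    selected: List[int] = []
    for _ in range(k):
        base = f(chosen)
        # Marginal gain of every remaining item with respect to the current subset.
        gains = {i: f(chosen | {i}) - base for i in range(n) if i not in chosen}
        best = max(gains, key=gains.get)
        selected.append(best)
        chosen = chosen | {best}
    return selected


# Hypothetical function 1: how many distinct words the chosen documents cover.
DOCS = [{"a", "b", "c"}, {"a"}, {"d"}, {"c", "d", "e", "f"}]

def coverage(S: FrozenSet[int]) -> float:
    return float(len(set().union(*(DOCS[i] for i in S))))


# Hypothetical function 2: a concave (sqrt) function of a summed weight.
WEIGHTS = [4.0, 1.0, 9.0, 0.0]

def sqrt_of_sum(S: FrozenSet[int]) -> float:
    return math.sqrt(sum(WEIGHTS[i] for i in S))


print(greedy(coverage, len(DOCS), 2))        # [3, 0]
print(greedy(sqrt_of_sum, len(WEIGHTS), 2))  # [2, 0]
```

The optimizer never inspects `DOCS` or `WEIGHTS`. That separation is what lets a single optimizer serve every function.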

In [1]:
%pylab inline
numpy.random.seed(0)

Populating the interactive namespace from numpy and matplotlib


The skeleton of an optimizer object is simple and has the following form.

In [2]:
from apricot.optimizers import BaseOptimizer

class SkeletonOptimizer(BaseOptimizer):
    # The optimizer object should always inherit from BaseOptimizer
    # or an object that inherits from BaseOptimizer.

    def __init__(self, function=None, random_state=None, n_jobs=None, 
        verbose=False):
        # The optimizer object should take in the above parameters
        # and pass them to the parent constructor via super(). Any
        # optimizer-specific hyperparameters should also be passed in here.
        
        super(SkeletonOptimizer, self).__init__(function=function, 
            random_state=random_state, n_jobs=n_jobs, verbose=verbose)

    def select(self, X, k, sample_cost=None):
        # This is the key method. `select` is called a single time and
        # should select either a subset of size k or, when `sample_cost`
        # is given, a subset whose total cost is less than or equal to k.
        
        raise NotImplementedError

The `select` method is where the optimization algorithm is implemented. It relies on two methods of the selection object (stored as `self.function`). The first, `_calculate_gains`, computes the gain from adding each remaining element to the subset. The second, `_select_next`, tells the selection object which element to add next.
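The division of labour between the optimizer and the selection object can be sketched with a toy stand-in. `ToyFeatureFunction` and `naive_greedy_select` below are hypothetical. The toy class mimics only the three members the optimizer touches: `idxs`, `_calculate_gains` and `_select_next`. It assumes that the gains come back aligned with `idxs`, as the `best_idx = self.function.idxs[idx]` lookup in this tutorial implies. The gain is computed with a sqrt feature-based function. This is not apricot's implementation; the real selection objects do considerably more.

```python
import numpy


class ToyFeatureFunction:
    """Hypothetical stand-in for a selection object (feature-based, sqrt)."""

    def __init__(self, X):
        self.idxs = numpy.arange(X.shape[0])    # indices of unselected candidates
        self.current = numpy.zeros(X.shape[1])  # summed features of selected items
        self.ranking = []
        self.gains = []

    def _calculate_gains(self, X):
        # One gain per entry of self.idxs: sum over features of
        # sqrt(current + x) - sqrt(current).
        rows = X[self.idxs]
        return (numpy.sqrt(self.current + rows) - numpy.sqrt(self.current)).sum(axis=1)

    def _select_next(self, x, gain, idx):
        # Record the choice and remove it from the candidate pool.
        self.current += x
        self.ranking.append(int(idx))
        self.gains.append(float(gain))
        self.idxs = self.idxs[self.idxs != idx]


def naive_greedy_select(function, X, k):
    # Same loop as SimpleNaiveGreedy.select: ask for gains, report the best item.
    for _ in range(k):
        gains = function._calculate_gains(X)
        idx = numpy.argmax(gains)
        best_idx = function.idxs[idx]
        function._select_next(X[best_idx], gains[idx], best_idx)


X = numpy.array([[9.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
f = ToyFeatureFunction(X)
naive_greedy_select(f, X, 2)
print(f.ranking)  # [2, 0]
```

Row 0 offers a gain of 3 in the first iteration. After row 2 is selected, row 0 offers only about 1.61, which illustrates diminishing returns.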

Let's implement a new version of the naive greedy algorithm. At each iteration, this algorithm calculates the gain that each element would provide if added to the subset, and then selects the item with the largest gain. Because of the diminishing returns property of submodular functions, gains shrink as the subset grows, and they shrink by different amounts for different items. As a result, the second-best item in one iteration is not guaranteed to be the best item in the next, so the naive algorithm recomputes every gain at every iteration.
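A small worked example shows why every gain must be recomputed. It uses a hypothetical coverage function in which each item covers a set of elements and the score is the number of distinct elements covered. The item with the second-largest gain in the first iteration becomes the worst choice once the best item has been selected. The names `ITEMS` and `gains` are illustrative only.

```python
# Hypothetical ground set: f(S) = number of distinct elements covered by S.
ITEMS = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {5, 6}}


def gains(selected):
    """Marginal gain of each unselected item given the already-selected items."""
    covered = set().union(*(ITEMS[name] for name in selected))
    return {name: len(ITEMS[name] - covered) for name in ITEMS if name not in selected}


first = gains([])       # {'A': 4, 'B': 3, 'C': 2}: B is second best
second = gains(["A"])   # {'B': 0, 'C': 2}: B now adds nothing
next_best = max(second, key=second.get)  # 'C'
print(first, second, next_best)
```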

In [3]:
import numpy

class SimpleNaiveGreedy(BaseOptimizer):
    def __init__(self, function=None, random_state=None, n_jobs=None, 
        verbose=False):
        # The naive greedy algorithm has no hyperparameters.
        super(SimpleNaiveGreedy, self).__init__(function=function, 
            random_state=random_state, n_jobs=n_jobs, verbose=verbose)

    def select(self, X, k, sample_cost=None):
        # This is a version of the naive greedy algorithm that ignores
        # sample costs, for demonstration purposes.
        
        for i in range(k):
            # Gain of adding each remaining candidate, in the order of
            # self.function.idxs.
            gains = self.function._calculate_gains(X)

            # Position of the largest gain, mapped back to a row of X.
            idx = numpy.argmax(gains)
            best_idx = self.function.idxs[idx]

            # Tell the selection object which item to add next.
            self.function._select_next(X[best_idx], gains[idx], best_idx)

            if self.verbose:
                self.function.pbar.update(1)

Simple! All we had to do was compute the gain of each element, find the element with the largest gain (mapping its position back to an index into `X`), and tell the selection object which item was chosen. Not every optimization strategy is this straightforward, though. For instance, at each iteration the stochastic greedy algorithm evaluates only a random subset of the remaining elements, drawing a new subset each time, and picks the best item from that subset. Controlling exactly which elements are evaluated is essential for that optimizer.
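The stochastic greedy idea can be sketched in pure Python, independent of apricot. This is a sketch of the technique under stated assumptions, not apricot's built-in stochastic optimizer. The function name `stochastic_greedy` and its `epsilon` and `seed` parameters are hypothetical. The sample size `(n / k) * ln(1 / epsilon)` follows the usual stochastic greedy analysis.

```python
import math
import random
from typing import Callable, FrozenSet, List


def stochastic_greedy(f: Callable[[FrozenSet[int]], float], n: int, k: int,
                      epsilon: float = 0.1, seed: int = 0) -> List[int]:
    """Greedy selection that evaluates only a random sample of candidates per iteration."""
    rng = random.Random(seed)
    # Sample size (n / k) * ln(1 / epsilon), rounded up and at least 1.
    sample_size = max(1, math.ceil(n / k * math.log(1 / epsilon)))
    chosen: FrozenSet[int] = frozenset()
    selected: List[int] = []
    for _ in range(k):
        remaining = [i for i in range(n) if i not in chosen]
        # Draw a fresh random subset each iteration; only these items are evaluated.
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        base = f(chosen)
        best = max(candidates, key=lambda i: f(chosen | {i}) - base)
        selected.append(best)
        chosen = chosen | {best}
    return selected


weights = [1.0, 5.0, 3.0, 4.0, 2.0]
print(stochastic_greedy(lambda S: sum(weights[i] for i in S), len(weights), 2))  # [1, 3]
```

In this tiny example the sample size (6) exceeds the number of items, so every candidate is evaluated and the result matches exact greedy. With larger `n`, each iteration touches only a fraction of the candidates. This is why the optimizer, not the selection object, must decide which elements get evaluated.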

How do we use this custom optimizer, though? We simply pass an instance of it to the `optimizer` parameter that every selection object accepts.

In [4]:
from apricot import FeatureBasedSelection

model1 = FeatureBasedSelection(100, 'sqrt', optimizer='naive')
model2 = FeatureBasedSelection(100, 'sqrt', optimizer=SimpleNaiveGreedy())

Okay, now let's fit both models and make sure we get the same results.

In [5]:
X = numpy.exp(numpy.random.randn(10000, 100))

model1.fit(X)
model2.fit(X)

numpy.all(model1.ranking == model2.ranking), (model1.gains - model2.gains).sum()

(True, 0.0)

Great! Do they take the same amount of time?

In [6]:
%timeit model1.fit(X)
%timeit model2.fit(X)

1.27 s ± 9.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.21 s ± 9.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Looks like they do, roughly (1.27 s versus 1.21 s per fit). That makes sense because most of the expensive computation happens inside the selection object's `_calculate_gains` method. The optimizer itself only takes the argmax of the gain vector, which is essentially what the built-in naive optimizer does anyway.