# Sensitivity Analysis



## Computing Variable Importance using Sensitivity Analysis

### Algorithm

1. The Jacobian ($\pmb{J}$) contains the partial derivatives of the outcome variable ($y$) with respect to each input variable ($x_i$, where $i=1,\cdots, n$). Each entry is made adimensional by multiplying it by the predictor value and dividing by the output value.

$$
\pmb{J}_{\vec{x}} = \left[\frac{\partial f(\vec{x})}{\partial x_{1}}\cdot\frac{x_{1}}{y}\;\;\;\;\;\; \frac{\partial f(\vec{x})}{\partial x_{2}}\cdot\frac{x_{2}}{y}\;\; \cdots\;\; \frac{\partial f(\vec{x})}{\partial x_{n}}\cdot\frac{x_{n}}{y}\right]
$$

2. Compute the absolute value of the adimensional Jacobian for every instance in the training dataset ($D^{train}$) and average over the instances. The average is the variable importance of each predictor variable according to the model induced from the training dataset.

$$
VarImp = \frac{1}{|D^{train}|}\sum_{\vec{x} \in D^{train}} |\pmb{J}_{\vec{x}}|
$$
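The two steps above can be checked on a function whose answer is known in advance. The sketch below is not the notebook's method: it swaps the network for a hypothetical `toy_model`, $f(x) = x_1^2 x_2^3$, and swaps automatic differentiation for central finite differences. For a power law the adimensional Jacobian is constant and equal to the exponents, so the averaged importance should come out close to `[2, 3]`. The names `adimensional_jacobian_fd`, `variable_importance_fd`, `toy_model` and `toy_data` are illustrative assumptions.

```python
import numpy as np

def adimensional_jacobian_fd(f, x, eps=1e-6):
    """Central finite-difference estimate of (df/dx_i) * x_i / f(x)."""
    x = np.asarray(x, dtype=float)
    y = f(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    # Multiply by the input and divide by the output to remove the units
    return grad * x / y

def variable_importance_fd(f, data):
    """Mean absolute adimensional Jacobian over the rows of data."""
    per_row = [np.abs(adimensional_jacobian_fd(f, row)) for row in data]
    return np.mean(per_row, axis=0)

# Hypothetical stand-in for the trained model: f(x) = x1^2 * x2^3
def toy_model(x):
    return x[0] ** 2 * x[1] ** 3

toy_data = np.array([[1.0, 2.0], [3.0, 0.5], [2.0, 4.0]])
importance = variable_importance_fd(toy_model, toy_data)
print(importance)  # approximately [2. 3.]
```

Note that the division by $y$ is undefined wherever the model output is exactly zero, so the toy data deliberately avoids such points.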

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import model_from_json

import numpy as np
import pandas as pd

In [2]:
train_data = pd.read_csv("./data/iris_train.csv")
train_data_np = train_data.to_numpy()

In [3]:
# Loading the ANN model
model = model_from_json(open('./model/model_architecture.json').read())
model.load_weights('./model/model_weights.h5')
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

In [4]:
def abs_adimensional_jacobian_1output(x, model):
    """
    Computes the adimensional Jacobian of an input vector and 
    a Keras model with a single output node.
    
    Input—
    x: Floating numpy vector of inputs.
    model: Keras model.
    
    Output—
    jacobian: Absolute adimensional numpy Jacobian vector for 
        scalar output wrt each input.
    """
    x_tensor = tf.convert_to_tensor(x.reshape(1,-1), dtype=tf.float32)
    with tf.GradientTape() as g:
        g.watch(x_tensor)
        y_tensor = model(x_tensor)
    jacobian = g.jacobian(y_tensor, x_tensor)
    jacobian = jacobian.numpy()[0][0][0]
    
    # Hadamard product between the input gradient and
    # input-value/output to remove the dimensions
    # (undefined when the model output is exactly 0)
    input_by_output = x/y_tensor.numpy()[0][0]
    adim_jacobian = np.multiply(jacobian, input_by_output)
    
    return np.absolute(adim_jacobian)

In [5]:
abs_adimensional_jacobian_1output(np.array([5,2,1,0.2]), model)

array([0.0222397 , 0.06126562, 0.03610831, 0.00580287])

In [6]:
def variable_importance_sensitivity_analysis(data, model):
    """
    Computes variable importance of each input variable using sensitivity analysis.
    
    Input—
    data: Data as a numpy matrix.
    model: Keras model.
    
    Output—
    variable_importance: average absolute adimensional gradient wrt each input
    over all instances of 'data'.
    """
    abs_jacobian = np.apply_along_axis(abs_adimensional_jacobian_1output,
                       1,
                       data, model=model)
    sum_jacobian = np.sum(abs_jacobian, axis=0)
    
    return sum_jacobian/data.shape[0]

In [7]:
variable_importance_sensitivity_analysis(train_data_np, model)

array([1.12177903, 1.97653917, 6.43293472, 1.40579731])
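The raw importances are easier to compare once they are expressed as shares of the total. The snippet below is a small convenience sketch rather than part of the original analysis: it copies the four values printed above into a hypothetical `var_imp` array, normalizes them to sum to 1, and ranks the input columns. The column names are not given in this notebook, so only column indices are used.

```python
import numpy as np

# Values copied from the output of variable_importance_sensitivity_analysis above
var_imp = np.array([1.12177903, 1.97653917, 6.43293472, 1.40579731])

# Share of the total importance carried by each input column
relative_importance = var_imp / var_imp.sum()

# Column indices ordered from most to least important
ranking = np.argsort(var_imp)[::-1]

print(np.round(relative_importance, 3))
print(ranking)  # [2 1 3 0]
```

With these numbers the third input column carries roughly 59% of the total importance.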

## Novelty Index using Sensitivity Analysis

### Novelty Index Algorithm

1. Get weights from **sensitivity analysis**. The relative importance of each input variable $x_{i}$ is proportional to its non-dimensional gradient in the neural network. The gradient is the partial derivative of the network output ($y$) with respect to the input variable $x_{i}$.

$$
\vec{S}_{\vec{x}} = \left[S_{x_{1}}\;\; \cdots\;\; S_{x_{n}}\right],\;\;\;\;\;\; S_{x_{i}} = \frac{\partial f(\vec{x})}{\partial x_{i}}\cdot \frac{x_{i}}{y};\;\;\;\;\;\; i=1,\cdots,n
$$

2. Weight instances by their sensitivity values. Multiply each training instance ($\vec{x}^{\,j}; j=1,\cdots,m$) and the test instance ($\vec{x}^{\,t}$) element-wise by its own *absolute* sensitivity weights ($|S_{x_{i}}|; i=1,\cdots,n$).

$$
\vec{x}^{\,j} := |\vec{S}_{\vec{x}^{\,j}}| \circ \vec{x}^{\,j};\;\;\;\;\;\; \text{ where }j=1,\cdots,m,t.
$$

3. Compute the smallest Euclidean distance between the weighted test instance and any weighted train instance.

$$
d_{min}(\vec{x}^{\,t},D^{train}) = \min_{j} \sqrt{\sum_{i=1}^{n} \left(x_{i}^{\,j} - x_{i}^{\,t}\right)^2};\;\;\;\;\;\; \forall \vec{x}^{\,j} \in D^{train}
$$

4. For each weighted training instance, compute the smallest Euclidean distance to any other weighted training instance, then take the median of these nearest-neighbor distances over the training set.

$$
d_{median}(D^{train}) = \operatorname{median} \left( d_{min}\left(\vec{x}^{\,j}, D^{train}\setminus\vec{x}^{\,j}\right) \right);\;\;\;\;\;\; \forall \vec{x}^{\,j} \in D^{train}
$$

5. Calculate the Novelty Index ($\eta$): the ratio between the smallest Euclidean distance from the test instance to any training instance ($d_{min}$) and the median nearest-neighbor distance within the training set ($d_{median}$). A value of $\eta$ well above 1 means the test instance lies farther from the training data than training instances typically lie from each other.

$$
\eta(\vec{x}^{\,t}, D^{train}) = \frac{d_{min}(\vec{x}^{\,t},D^{train})}{d_{median}(D^{train})}
$$
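The five steps can be traced by hand on a tiny example. The sketch below is illustrative only: instead of a neural network it assumes the model is $f(x) = x_1^2 x_2$, whose absolute adimensional sensitivities are the constant vector `[2, 1]`, so the weighting step reduces to a fixed rescaling. The names `toy_sensitivity`, `weigh`, `d_min`, `train` and `novelty_index` are hypothetical and do not appear elsewhere in this notebook.

```python
import numpy as np

def toy_sensitivity(x):
    """Step 1 (stand-in): |S_x| for f(x) = x1^2 * x2 is constant."""
    return np.array([2.0, 1.0])

def weigh(x):
    # Step 2: Hadamard product of absolute sensitivities and the instance
    x = np.asarray(x, dtype=float)
    return toy_sensitivity(x) * x

def d_min(weighted_x, weighted_others):
    # Step 3: smallest Euclidean distance to any row of weighted_others
    return np.linalg.norm(weighted_others - weighted_x, axis=1).min()

train = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 3.0], [2.0, 3.0]])
weighted_train = np.array([weigh(row) for row in train])

# Step 4: median nearest-neighbor distance, leaving each instance out in turn
d_median = np.median([
    d_min(weighted_train[j], np.delete(weighted_train, j, axis=0))
    for j in range(len(weighted_train))
])

def novelty_index(x_test):
    # Step 5: test distance relative to the typical training spacing
    return d_min(weigh(x_test), weighted_train) / d_median

print(novelty_index([2.0, 3.0]))  # 0.0, identical to a training instance
print(novelty_index([4.0, 3.0]))  # 2.0, twice the typical spacing away
```

In this toy setting every weighted training point has its nearest neighbor at distance 2, so `d_median` is 2 and the index reads directly as "how many typical spacings away" the query is.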



In [8]:
def weigh_instance_by_sensitivity(x, model):
    """
    Weighs an input numpy vector for the Keras model by its sensitivity.
    
    Input—
    x: Floating numpy vector of inputs.
    model: Keras model.
    
    Output—
    weighted_x: Input x weighted by its sensitivity weight.
    """
    sens_wts = abs_adimensional_jacobian_1output(x, model)
    # Hadamard product between absolute sensitivity-weights and query
    weighted_x = np.multiply(sens_wts, x)
    
    return weighted_x

In [9]:
weigh_instance_by_sensitivity(np.array([1,2,3,4]), model)

array([ 0.39747321,  2.36322359, 15.75715986,  3.9102108 ])

In [10]:
def weigh_matrix_by_sensitivity(reference, model):
    """
    Weighs all rows of the reference data matrix by its sensitivity in the 
    input Keras model.
    
    Input—
    reference: Input data matrix to be weighted by sensitivity.
    
    Output— 
    Input data matrix weighed by sensitivity.
    """
    return np.apply_along_axis(weigh_instance_by_sensitivity, 
                               1, 
                               reference, model=model)

In [11]:
def min_weighted_euclidean_dist(query, reference, model):
    """
    Compute the euclidean distance between the query (weighted by the sensitivity)
    and each data point in reference (weighed by their sensitivities), then return
    the minimum of the distances.
    
    Input—
    query: Data point queried to the Keras model.
    reference: Training data or any representation of training data
        used to build the input model.
    model: Keras model.
    
    Output—
    min_dist: minimum Euclidean distance between weighted query and weighted
        reference data points.
    """
    weighted_query = weigh_instance_by_sensitivity(query, model)
    weighted_ref = weigh_matrix_by_sensitivity(reference, model)
    dists = np.apply_along_axis(np.linalg.norm, 
                                1, 
                                (weighted_ref-weighted_query))
    min_dist = min(dists)
    return min_dist

In [12]:
def second_min_weighted_euclidean_dist(query, reference, model):
    """
    Compute the minimum Euclidean distance between weighted query and weighted
    reference datapoints excluding the distance comparing query to itself.
    
    Input—
    query: Data point from within the reference set.
    reference: Training data or any representation of training data
        used to build the input model.
    model: Keras model.
    
    Output—
    second_min_dist: Min euclidean distance between query and all reference
        data points excluding the query.
    """
    weighted_query = weigh_instance_by_sensitivity(query, model)
    weighted_ref = weigh_matrix_by_sensitivity(reference, model)
    dists = np.apply_along_axis(np.linalg.norm, 
                                1, 
                                (weighted_ref-weighted_query))
    
    # because the min would be the instance compared to itself
    second_min_dist = np.sort(dists, 
                              kind="mergesort")[1]
    return second_min_dist

In [13]:
def median_ref_wt_euclidean_dist(reference, model):
    """
    Median, over all data points in the reference, of the Euclidean
    distance from each data point to its nearest other data point, with
    every data point weighted by its own sensitivities.
    
    Input—
    reference: data representing the training data that model was trained on.
    model: Keras model.
    
    Output—
    Median of the nearest-neighbor Euclidean distances within the
        weighted reference.
    """
    min_dists = np.apply_along_axis(second_min_weighted_euclidean_dist, 
                        1, reference, 
                        reference=reference,
                        model = model)
    return np.median(min_dists)
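As written, `second_min_weighted_euclidean_dist` re-weights the whole reference matrix for every query row, so the model is evaluated on the order of $m^2$ times. A possible alternative, sketched below under the assumption that the weighted matrix fits in memory as an $m \times m \times n$ array, is to weight the reference once and compute all pairwise distances in one step. The function name `median_nearest_neighbor_dist` is hypothetical; it expects the output of `weigh_matrix_by_sensitivity(reference, model)` and should give the same value as `median_ref_wt_euclidean_dist`.

```python
import numpy as np

def median_nearest_neighbor_dist(weighted_ref):
    """
    Median over rows of the distance to the nearest *other* row.

    weighted_ref: matrix whose rows are already weighted by sensitivity.
    """
    weighted_ref = np.asarray(weighted_ref, dtype=float)
    # Pairwise differences via broadcasting: shape (m, m, n)
    diffs = weighted_ref[:, None, :] - weighted_ref[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=-1)  # shape (m, m)
    # Mask self-distances so that each row's minimum is over other rows only
    np.fill_diagonal(pairwise, np.inf)
    return np.median(pairwise.min(axis=1))

# Small check on hand-made, already-weighted rows
example = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 5.0]])
print(median_nearest_neighbor_dist(example))  # 1.0
```

Masking the diagonal plays the same role as taking the second-smallest sorted distance in the original function, including when the reference contains duplicate rows.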

In [14]:
def in_forbidden_zone(query):
    """
    Tells if the query is unintelligible for the Keras model, i.e. if it
    contains any negative value.
    
    Input—
    query: Query to be used for the Keras model
    
    Output—
    Boolean True if the query is illegal for the Keras model (at least
        one negative value), False otherwise.
    
    """
    return np.any(query<0)

In [32]:
def compute_novelty_index(query, nov_deno, reference, model):
    """
    Computes novelty index.
    
    Input—
    query: Query to be used for the Keras model as a numpy row vector.
    nov_deno: Median of the nearest-neighbor Euclidean distances within the
        weighted reference, as returned by median_ref_wt_euclidean_dist.
    reference: Representation of training data.
    model: Keras model.
    
    Output—
    Novelty index
    
    """
    if(in_forbidden_zone(query)):
        return np.inf
    
    nov_nume = min_weighted_euclidean_dist(query, reference, model)
    
    return nov_nume/nov_deno

In [17]:
deno = median_ref_wt_euclidean_dist(train_data_np, model)

In [36]:
deno

1.7835123384693827

In [38]:
query = np.array([5, 2, 10.2, 0.2]).reshape(1,-1)
compute_novelty_index(query, deno, train_data_np, model)

15.412492114194928

## Limitations of Sensitivity Analysis
1. The model $f(x)$ is typically non-linear, so the sensitivity depends on the input: the same variable may have a large sensitivity in some regions of the input space and a small one in others.
    1. Averaging the sensitivity over the training instances gives one possible approximation of the variable importance.
2. Input variables have different scales.
    1. Standardize them or make them adimensional.
3. Sensitivity describes the model, not the underlying data. If two input variables are highly correlated, the model may rely on one and largely ignore the other.
    1. One possible mitigation is to use dropout while training the model, so that the model generalizes better.
4. Partial derivatives have no meaning for categorical variables; they apply only to continuous variables. Continuity is itself a theoretical idealization, since in practice all variable measurements are discrete.
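
The first limitation is easy to see on a one-dimensional example. The sketch below assumes a hypothetical single-input model $y = \sigma(w x)$ with a sigmoid output, for which the adimensional sensitivity can be derived analytically as $w\,x\,(1 - \sigma(w x))$. The names `sigmoid`, `adimensional_sensitivity` and `xs` are illustrative and not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adimensional_sensitivity(x, w=1.0):
    """(dy/dx) * x / y for y = sigmoid(w * x), derived analytically."""
    y = sigmoid(w * x)
    dy_dx = w * y * (1.0 - y)
    return dy_dx * x / y

# The same variable evaluated at four different input locations
xs = np.array([0.1, 1.0, 5.0, 10.0])
sens = np.abs(adimensional_sensitivity(xs))

print(sens)         # one sensitivity value per input location
print(sens.mean())  # the single averaged value used as variable importance
```

The sensitivity is small near zero, largest around $x = 1$, and vanishes again once the sigmoid saturates, so the averaged importance depends on where the training instances happen to lie.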
