# Summary

This notebook explores explainability as it relates to deep learning models, particularly in classification tasks.

### Update

This is currently a work in progress, but it looks promising.
It can identify active neurons and show that different neurons are used for different output classes.
It can also look at the weights of each of these neurons to see which input features are relevant to which active neurons.

## Justification

A neuron in a neural network is composed of a linear function followed by a non-linear activation function.  The purpose of this notebook is to explore which input features/values lead to specific classifications.  I've never had great intuition for how different neurons support classification in a multi-class setting, so I wanted to use this notebook to explore this a little further and improve my intuition.
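
As a minimal sketch of the neuron described above (a linear function followed by a non-linear activation), the snippet below computes ReLU(w·x + b) with NumPy. The function name `neuron` and all input, weight, and bias values are made up for illustration; they are not taken from the trained model.

```python
import numpy as np

def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """A single neuron: linear function w.x + b followed by ReLU."""
    z = float(np.dot(w, x) + b)  # linear part
    return max(z, 0.0)           # ReLU activation

# Hypothetical input with 4 features and two hypothetical weight vectors.
x = np.array([1.0, -2.0, 0.5, 0.0])
w_on = np.array([0.5, -0.25, 1.0, 2.0])
w_off = -w_on

active = neuron(x, w_on, b=0.1)     # linear part is positive, so the neuron fires
inactive = neuron(x, w_off, b=0.1)  # linear part is negative, so ReLU clamps it to 0
print(active, inactive)
```

A neuron is called "active" for an input when its output is greater than zero; this is the property inspected later in the notebook.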

## Approach

The way to do this is to inspect the neurons in the neural network: look at the feature weights and the activated nodes for different output classes.  The intuition is that certain nodes learn to detect features that indicate a specific output class, and that these nodes are activated for that class.

In [1]:
# imports
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
import torch
from torch import nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from typing import Tuple

# Dataset and Dataloader

Load the Iris dataset, make a simple train/test split, and then create a Dataset and a DataLoader.

In [2]:
iris = datasets.load_iris()

In [3]:

x_train, x_test, y_train, y_test = train_test_split(iris.data,
                                                    iris.target,
                                                    test_size=0.1)

In [4]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

Standardizing the train and test data is important: it helps the model converge faster and find a lower minimum for the loss.

In [5]:
# standardize x values
# Compute mean and standard deviation
mean = np.mean(x_train, axis=0)
std = np.std(x_train, axis=0)

# Perform standardization
# When standardizing the test set it is important to use the training set mean/std  
#  otherwise information about your test data will bleed into your evaluation.
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
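
As a cross-check on the manual standardization above, the sketch below compares it with scikit-learn's StandardScaler on synthetic data. The `demo_*` arrays are random placeholders, not the Iris split. StandardScaler uses the population standard deviation, the same as np.std with its default arguments, so the two approaches should agree.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for a train/test split (not the Iris data).
rng = np.random.default_rng(0)
demo_train = rng.normal(loc=5.0, scale=2.0, size=(20, 4))
demo_test = rng.normal(loc=5.0, scale=2.0, size=(5, 4))

# Manual standardization using training statistics only.
demo_mean = demo_train.mean(axis=0)
demo_std = demo_train.std(axis=0)
manual_train = (demo_train - demo_mean) / demo_std
manual_test = (demo_test - demo_mean) / demo_std

# The same thing with StandardScaler: fit on train, transform both.
scaler = StandardScaler().fit(demo_train)
sk_train = scaler.transform(demo_train)
sk_test = scaler.transform(demo_test)

print(np.allclose(manual_train, sk_train), np.allclose(manual_test, sk_test))
```

Fitting the scaler on the training data only mirrors the point made in the comments above: the test set is transformed with the training mean and standard deviation.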

To create a custom Dataset class, implement the interface provided by torch.utils.data.Dataset.  This includes the methods __init__, __len__, and __getitem__.

In [6]:
class IrisDataset(Dataset):

    def __init__(self, x, y):
        self.x = torch.tensor(data=x,
                              dtype=torch.float32)
        self.y = torch.tensor(data=y,
                              dtype=torch.long)

    def __len__(self) -> int:
        return len(self.x)
    
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor]:
        return self.x[index], self.y[index]


Create instances of the IrisDataset class.  
One for training data and one for test data.

In [7]:
train_data = IrisDataset(x=x_train, y=y_train)
test_data = IrisDataset(x=x_test, y=y_test)

In [8]:
print(f'Train: {len(train_data)}')
print(f'Test: {len(test_data)}')

Train: 135
Test: 15


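The DataLoader class is used to build training batches and optionally shuffle the data at each epoch.  
Use a large batch size for this trivial problem.  Because the batch size (256) is larger than the training set (135 samples), each epoch consists of a single batch.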

In [9]:
train_dataloader = DataLoader(train_data, batch_size=256, shuffle=True)
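
To illustrate how the batch size affects the number of batches per epoch, here is a small sketch using a TensorDataset of random values with the same number of samples as the training set (135). The `demo_*` names and the comparison batch size of 32 are assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data with the same size as the training set.
demo_x = torch.randn(135, 4)
demo_y = torch.randint(0, 3, (135,))
demo_data = TensorDataset(demo_x, demo_y)

# batch_size larger than the dataset: one batch per epoch.
full_batch_loader = DataLoader(demo_data, batch_size=256, shuffle=True)
# A smaller batch size, for comparison: several batches per epoch.
mini_batch_loader = DataLoader(demo_data, batch_size=32, shuffle=True)

full_batch_sizes = [len(xb) for xb, _ in full_batch_loader]
mini_batch_sizes = [len(xb) for xb, _ in mini_batch_loader]
print(full_batch_sizes)  # [135]
print(mini_batch_sizes)  # [32, 32, 32, 32, 7]
```

With a single batch per epoch, each optimizer step uses the gradient of the whole training set, so shuffling has no effect on the updates.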

# Model

The neural network model class is defined as a subclass of torch.nn.Module.  
The __init__ function defines the layers of the network: the input and output shapes, the hidden layers, and the final layer.
Each node of a hidden layer is a linear function followed by a non-linear activation function.  The network definition below includes each linear function and each activation function.

The code below creates a simple network with 1 hidden layer.  The inputs have 4 dimensions, the hidden layer expands this to 12 dimensions (nodes), and the output layer collapses down to 3 dimensions (the number of output classes).

The forward function is modified to return the hidden-layer activations alongside the output, so they can be inspected later.

In [10]:
class IrisNetwork(nn.Module):
    def __init__(self):
        super().__init__()

        self.hidden1 = nn.Linear(4, 12)
        self.activation1 = nn.ReLU()
        self.output = nn.Linear(12, 3)


    def forward(self, x):

        hidden1 = self.hidden1(x)
        activation1 = self.activation1(hidden1)
        output = self.output(activation1)

        # Log the explainability aspects when not in training mode (i.e. in eval mode).
        #if not self.training:
        #    print(self.hidden1.weight)
        #    print(self.hidden1.bias)
        #     print(f'hidden1: {hidden1.shape}')
        #     print(hidden1)
        #     print(f'activation1: {activation1.shape}')
        #     print(activation1)
        #     print(f'output: {output.shape}')
        #     print(output)
            
        return (output, activation1)

Create an instance of IrisNetwork and print its structure.

In [11]:
model = IrisNetwork().to("cpu")
print(model)

IrisNetwork(
  (hidden1): Linear(in_features=4, out_features=12, bias=True)
  (activation1): ReLU()
  (output): Linear(in_features=12, out_features=3, bias=True)
)


# Optimization

A neural network needs a loss function and an optimizer in order to be trained.

The Iris classifier uses CrossEntropyLoss, which expects raw logits with one value per output class.

In [12]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1.5e-1)
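
To make the "expects logits" point concrete, the sketch below checks that nn.CrossEntropyLoss applied to raw logits matches a log-softmax followed by the mean negative log-likelihood of the true class. The logits and targets are made-up values, not model outputs.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Hypothetical logits for 2 samples and 3 classes, with their true classes.
demo_logits = torch.tensor([[2.0, 0.5, -1.0],
                            [0.1, 0.2, 3.0]])
demo_targets = torch.tensor([0, 2])

# CrossEntropyLoss takes raw logits; it applies log-softmax internally.
ce = nn.CrossEntropyLoss()(demo_logits, demo_targets)

# The same quantity computed by hand.
log_probs = F.log_softmax(demo_logits, dim=1)
manual = -log_probs[torch.arange(2), demo_targets].mean()

print(ce.item(), manual.item())
```

This is why the network's forward function returns logits rather than probabilities: applying a softmax before CrossEntropyLoss would normalize twice.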

# Train

Training requires iterating over the batches of data made available via the DataLoader.  
Each pass over the entire dataset is considered an epoch; the training loop will usually be run for multiple epochs.  

The example below trains for a large number of epochs (125).  This is more than enough to fit this trivial problem well.


In [13]:
def train_loop(dataloader, model, loss_fn, optimizer):
    model.train() # Puts model into train mode.
    epoch_loss = 0.0
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to("cpu"), y.to("cpu")

        optimizer.zero_grad()   # Zero out gradients before the next computation.

        # forward propagation
        pred = model(X)         # Get predicted values for inputs
        loss = loss_fn(pred[0], y) # Compute loss

        # Backpropagation
        loss.backward()         # Compute gradients
        optimizer.step()        # Update trainable parameters.

        # print/track loss
        epoch_loss += loss.item()

    return epoch_loss

In [14]:
epochs = 125
for t in range(epochs):
    epoch_loss = train_loop(train_dataloader, model, loss_fn, optimizer)
    print(f'Epoch {t+1} loss: {epoch_loss}')


Epoch 1 loss: 1.04132878780365
Epoch 2 loss: 0.6209415793418884
Epoch 3 loss: 0.4804547429084778
Epoch 4 loss: 0.40877512097358704
Epoch 5 loss: 0.352464884519577
Epoch 6 loss: 0.3091789186000824
Epoch 7 loss: 0.28199881315231323
Epoch 8 loss: 0.26609018445014954
Epoch 9 loss: 0.24823133647441864
Epoch 10 loss: 0.22484511137008667
Epoch 11 loss: 0.19808275997638702
Epoch 12 loss: 0.17278091609477997
Epoch 13 loss: 0.15202809870243073
Epoch 14 loss: 0.13115404546260834
Epoch 15 loss: 0.11192416399717331
Epoch 16 loss: 0.09525609761476517
Epoch 17 loss: 0.0824543684720993
Epoch 18 loss: 0.07284078747034073
Epoch 19 loss: 0.06442566961050034
Epoch 20 loss: 0.05775027349591255
Epoch 21 loss: 0.051503829658031464
Epoch 22 loss: 0.04912145063281059
Epoch 23 loss: 0.046925172209739685
Epoch 24 loss: 0.04560451582074165
Epoch 25 loss: 0.04464007541537285
Epoch 26 loss: 0.04315115138888359
Epoch 27 loss: 0.0414748340845108
Epoch 28 loss: 0.04051163047552109
Epoch 29 loss: 0.040024664252996445
E

# Test

The test dataset does not require a loop or mini-batches.
Simply compute the test predictions in one call to the trained model.
The loss is computed on the raw logits; test_predictions is then passed through a softmax so that the predictions are probabilities that sum to 1.  This is common in classification tasks.




In [15]:
model.eval() # Put model in evaluation mode.

with torch.no_grad():
    test_predictions = model(test_data.x)[0]
    test_loss = loss_fn(test_predictions, test_data.y).item()
    test_predictions = torch.softmax(test_predictions, dim=1)

print(f'Loss (cross-entropy) on test dataset {test_loss}')


Loss (cross-entropy) on test dataset 0.09417455643415451
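
A short sketch of what the softmax does to the logits: each row becomes a set of probabilities that sum to 1, and because softmax preserves the ordering of the values, the predicted class (the argmax) is the same before and after. The logits below are made-up values for illustration.

```python
import torch

# Hypothetical logits for 2 samples and 3 classes.
demo_logits = torch.tensor([[2.0, 0.5, -1.0],
                            [-0.3, 1.2, 1.1]])

demo_probs = torch.softmax(demo_logits, dim=1)
row_sums = demo_probs.sum(dim=1)

# The predicted class is unchanged by the softmax.
same_argmax = torch.equal(torch.argmax(demo_logits, dim=1),
                          torch.argmax(demo_probs, dim=1))
print(row_sums, same_argmax)
```

So the softmax is only needed when the probabilities themselves are of interest; for picking the predicted class, argmax on the logits gives the same answer.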


In [16]:
model.hidden1.bias

Parameter containing:
tensor([-0.1890, -1.3447,  3.0912, -1.0852,  0.4250,  0.4312, -0.3905, -1.4740,
        -0.9926, -0.5294, -1.8060,  0.7147], requires_grad=True)

In [17]:
model.hidden1.weight.shape

torch.Size([12, 4])

In [18]:
np.where(model.hidden1.weight[0] > 0)

(array([0, 1, 2, 3]),)

In [19]:
model.hidden1.weight

Parameter containing:
tensor([[ 1.7588,  0.6556,  2.2778,  0.5642],
        [-0.5034,  0.4696, -0.9210, -0.8904],
        [ 1.4754, -0.2611, -0.5847, -1.3288],
        [-1.2014,  1.1262, -0.8873, -0.7997],
        [-1.2014,  1.4520, -1.4586, -1.6997],
        [-0.6982,  1.2390, -1.5964, -1.3437],
        [-1.4773,  1.0019, -1.7186, -1.9799],
        [-0.3087, -0.6222,  2.8651,  1.9852],
        [-0.5809,  1.2950, -0.8598, -0.4670],
        [-1.1990,  1.1826, -1.4767, -1.4573],
        [-0.9695, -1.3006,  2.2729,  2.6452],
        [-0.8673,  1.2865, -1.6217, -1.7191]], requires_grad=True)

Compute precision, recall, and F1 score for the classifier.  Passing average=None returns one score per output class.

In [20]:
precision = precision_score(test_data.y.numpy(), torch.argmax(test_predictions, dim=1), average=None)
print(f"test set precision score: {precision}")

recall = recall_score(test_data.y.numpy(), torch.argmax(test_predictions, dim=1), average=None)
print(f"test set recall score: {recall}")

f1 = f1_score(test_data.y.numpy(), torch.argmax(test_predictions, dim=1), average=None)
print(f"test set f1 score: {f1}")


test set precision score: [1.   1.   0.75]
test set recall score: [1.         0.88888889 1.        ]
test set f1 score: [1.         0.94117647 0.85714286]
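
To show how the per-class scores are derived, the sketch below computes them from a confusion matrix. The labels are hypothetical: they were chosen to be consistent with the scores printed above (15 test samples, one class-1 sample predicted as class 2), but the actual test split is not shown in this notebook.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 3 samples of class 0, 9 of class 1, 3 of class 2.
demo_true = np.array([0] * 3 + [1] * 9 + [2] * 3)
# One class-1 sample is misclassified as class 2.
demo_pred = np.array([0] * 3 + [1] * 8 + [2] + [2] * 3)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(demo_true, demo_pred)

manual_precision = np.diag(cm) / cm.sum(axis=0)  # correct / predicted as class
manual_recall = np.diag(cm) / cm.sum(axis=1)     # correct / actually in class
manual_f1 = (2 * manual_precision * manual_recall
             / (manual_precision + manual_recall))

print(manual_precision)  # [1.   1.   0.75]
print(manual_recall)
print(manual_f1)
```

A single misclassified sample lowers the recall of its true class and the precision of the class it was assigned to, which is the pattern seen in the scores above.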


## Explainability

In [21]:
def test_predictions_and_explainability(model, test_x, test_y):
    
    activations_by_output_class = {
        0: [],
        1: [],
        2: []
    }

    model.eval() # Put model in evaluation mode.

    with torch.no_grad():
        predictions, activations = model(test_x)

    # modify the predictions to get the softmax and the argmax
    predictions = torch.argmax(torch.softmax(predictions, dim=1), dim=1)

    for prediction, truth, activation in zip(predictions, test_y, activations):
        print(prediction)
        print(truth)
        print(activation)
        print('')

        # Indices of the hidden neurons with a non-zero (post-ReLU) activation.
        active_neurons = np.where(activation > 0)[0]
        # For every active neuron, the input features with a positive weight.
        # (Previously only the last active neuron's features were kept.)
        relevant_input_features = {
            int(neuron): np.where(model.hidden1.weight[neuron] > 0)[0]
            for neuron in active_neurons
        }

        activations_by_output_class[int(prediction)].append(
            (active_neurons, relevant_input_features))

    #for predicted_class, activation_list in activations_by_output_class.items():
    #    for activation in activation_list:
    #        relevant_input_features = np.where(model.hidden1.weight[0] > 0)

    print(activations_by_output_class)

    # The weights and biases are accessible for each layer;
    # use them to add some intuition about which input features matter for which active neurons.
    # Weights of hidden layer 1 have shape 12x4 (output_size, input_size).
    



In [22]:
test_predictions_and_explainability(model, test_data.x, test_data.y)

tensor(1)
tensor(1)
tensor([2.3114, 0.0000, 4.0551, 0.0000, 0.0000, 0.0000, 0.0000, 0.1194, 0.0000,
        0.0000, 0.0000, 0.0000])

tensor(0)
tensor(0)
tensor([0.0000, 2.5618, 3.9603, 4.7513, 9.0245, 7.8111, 8.0532, 0.0000, 4.2812,
        7.1017, 0.0000, 8.8966])

tensor(1)
tensor(1)
tensor([0.0000, 0.0000, 2.5706, 0.0000, 0.0000, 0.0000, 0.0000, 0.1333, 0.0000,
        0.0000, 0.4885, 0.0000])

tensor(2)
tensor(2)
tensor([4.5970, 0.0000, 2.6815, 0.0000, 0.0000, 0.0000, 0.0000, 3.3796, 0.0000,
        0.0000, 2.5830, 0.0000])

tensor(1)
tensor(1)
tensor([1.4763, 0.0000, 2.9840, 0.0000, 0.0000, 0.0000, 0.0000, 0.7156, 0.0000,
        0.0000, 0.2888, 0.0000])

tensor(0)
tensor(0)
tensor([0.0000, 1.3395, 4.2172, 2.3369, 5.7351, 5.1224, 5.3196, 0.0000, 1.8677,
        4.3426, 0.0000, 5.9462])

tensor(1)
tensor(1)
tensor([2.2824, 0.0000, 3.8887, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000])

tensor(1)
tensor(1)
tensor([0.3658, 0.0000, 3.4428, 0.0000, 0.
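
One possible way to summarize output like the above is to compute, for each predicted class, the fraction of samples in which each hidden neuron is active. The sketch below does this on a small hypothetical activation matrix (4 samples, 5 neurons); the function name `activation_frequency` and all values are illustrative assumptions, not results from the trained model.

```python
import numpy as np

# Hypothetical post-ReLU activations: 4 samples x 5 hidden neurons.
demo_activations = np.array([
    [0.0, 2.5, 3.9, 0.0, 0.0],
    [0.0, 1.3, 4.2, 0.0, 0.0],
    [2.3, 0.0, 4.0, 0.1, 0.0],
    [4.6, 0.0, 2.7, 3.4, 2.6],
])
# Hypothetical predicted class for each sample.
demo_predictions = np.array([0, 0, 1, 2])

def activation_frequency(activations, predictions, n_classes):
    """Fraction of samples per predicted class in which each neuron is active."""
    freq = np.zeros((n_classes, activations.shape[1]))
    for c in range(n_classes):
        rows = activations[predictions == c]
        if len(rows) > 0:  # leave zeros for classes that were never predicted
            freq[c] = (rows > 0).mean(axis=0)
    return freq

freq = activation_frequency(demo_activations, demo_predictions, n_classes=3)
print(freq)
```

A neuron with frequency 1.0 for one class and 0.0 for the others is a candidate class-specific detector, while a neuron that is active for every class (the third neuron in this toy example) carries less class-specific information on its own.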

# Conclusion

The purpose of this notebook is to use a network trained on a classification task and see if we can look into what neurons are active and what features are being used to activate these neurons.

This is certainly possible and this notebook is a start in that direction.  With a lot more polish, one could imagine a visual representation of a neural network showing the path that a given input would follow to arrive at a certain output.  This could go a long way towards making the neural network black box more explainable and intuitive.
