---
execute:
  cache: false
  eval: true
  echo: true
  warning: false
jupyter: python3
title: Explainable AI with SpotPython and PyTorch
---

In [1]:
#| echo: false
#| label: imports
import warnings
warnings.filterwarnings("ignore")

In [2]:
#| label: configure_spot
from spotpython.data.diabetes import Diabetes
from spotpython.hyperdict.light_hyper_dict import LightHyperDict
from spotpython.fun.hyperlight import HyperLight
from spotpython.utils.init import (fun_control_init, surrogate_control_init, design_control_init)
from spotpython.utils.eda import gen_design_table
from spotpython.spot import spot
from spotpython.utils.file import get_experiment_filename
from spotpython.hyperparameters.values import set_hyperparameter
from math import inf

PREFIX="602_2"

data_set = Diabetes()

fun_control = fun_control_init(
    save_experiment=True,
    PREFIX=PREFIX,
    fun_evals=inf,
    max_time=60,
    data_set = data_set,
    core_model_name="light.regression.NNLinearRegressor",
    hyperdict=LightHyperDict,
    _L_in=10,
    _L_out=1)

fun = HyperLight().fun


set_hyperparameter(fun_control, "optimizer", [ "Adadelta", "Adam", "Adamax"])
set_hyperparameter(fun_control, "l1", [5,7])
set_hyperparameter(fun_control, "epochs", [10,12])
set_hyperparameter(fun_control, "batch_size", [4,11])
set_hyperparameter(fun_control, "dropout_prob", [0.0, 0.025])
set_hyperparameter(fun_control, "patience", [2,9])

design_control = design_control_init(init_size=20)

spot_tuner = spot.Spot(fun=fun,fun_control=fun_control, design_control=design_control)

Seed set to 123


Seed set to 123


module_name: light
submodule_name: regression
model_name: NNLinearRegressor


## Running the Hyperparameter Tuning or Loading the Existing Model


In [3]:
#| label: run_experiment
from spotpython.utils.file import get_experiment_filename, load_experiment
import os
overwrite = False
filename = get_experiment_filename(PREFIX)
if os.path.exists(filename) and not overwrite:
    (spot_tuner, fun_control, design_control,
    surrogate_control, optimizer_control) = load_experiment(filename)
else:
    print("File does not exist or overwrite is True. Starting new experiment.")
    res = spot_tuner.run()
# only needed for spotpython version < 0.16.0
fun_control.update({"_L_cond": None})

Loaded experiment from spot_602_2_experiment.pickle


## Results from the Hyperparameter Tuning Experiment

* After the hyperparameter tuning is finished, the following objects are available:
    * the `spot_tuner` object, and
    * the associated `fun_control` dictionary.


In [4]:
#| label: print_results
res = spot_tuner.print_results(print_screen=True)

min y: 2870.854248046875
l1: 5.0
epochs: 12.0
batch_size: 10.0
act_fn: 2.0
optimizer: 1.0
dropout_prob: 0.025
lr_mult: 2.553553499475551
patience: 6.0
batch_norm: 0.0
initialization: 2.0


In [5]:
#| label: 602_plot_progress_xai
spot_tuner.plot_progress()

<Figure size 2700x1800 with 1 Axes>

### Getting the Best Model, i.e., the Tuned Architecture

* The method `get_tuned_architecture` [[DOC]](https://sequential-parameter-optimization.github.io/spotPython/reference/spotpython/hyperparameters/values/#spotpython.hyperparameters.values.get_tuned_architecture) returns the best model architecture found during the hyperparameter tuning.
* It returns the transformed values, i.e., `batch_size = 2^x` if the hyperparameter `batch_size` was transformed with the `transform_power_2_int` function.
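
The relation between the raw tuner values and the transformed configuration can be checked by hand. The following sketch assumes that `l1`, `epochs`, `batch_size`, and `patience` are all power-of-two transformed, which is consistent with the values printed in this document. The helper `power_2_int` is a stand-in written for illustration; it is not `spotpython`'s `transform_power_2_int` itself.

```python
def power_2_int(x: float) -> int:
    """Illustrative stand-in: map a tuner value x to the integer 2**x."""
    return int(2 ** int(x))

# Raw values as reported by the tuner (see the output of `print_results`).
raw = {"l1": 5.0, "epochs": 12.0, "batch_size": 10.0, "patience": 6.0}

transformed = {name: power_2_int(value) for name, value in raw.items()}
print(transformed)
# {'l1': 32, 'epochs': 4096, 'batch_size': 1024, 'patience': 64}
```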


In [6]:
#| label: get_tuned_architecture
from spotpython.hyperparameters.values import get_tuned_architecture
import pprint
config = get_tuned_architecture(spot_tuner, fun_control)
pprint.pprint(config)

{'act_fn': ReLU(),
 'batch_norm': False,
 'batch_size': 1024,
 'dropout_prob': 0.025,
 'epochs': 4096,
 'initialization': 'kaiming_normal',
 'l1': 32,
 'lr_mult': 2.553553499475551,
 'optimizer': 'Adam',
 'patience': 64}


* Note: `get_tuned_architecture` also accepts the option `force_minX`. It has no effect in this case: the configuration returned below is identical to the one above.


In [7]:
#| label: get_tuned_architecture_force_minX
from spotpython.hyperparameters.values import get_tuned_architecture
config = get_tuned_architecture(spot_tuner, fun_control, force_minX=True)
pprint.pprint(config)

{'act_fn': ReLU(),
 'batch_norm': False,
 'batch_size': 1024,
 'dropout_prob': 0.025,
 'epochs': 4096,
 'initialization': 'kaiming_normal',
 'l1': 32,
 'lr_mult': 2.553553499475551,
 'optimizer': 'Adam',
 'patience': 64}


## Training the Tuned Architecture on the Test Data

* Since we are interested in the explainability of the model, we will train the tuned architecture on the test data.
* `spotpython`'s `test_model` function [[DOC]](https://sequential-parameter-optimization.github.io/spotPython/reference/spotpython/light/testmodel/) is used to train the model on the test data.
* Note: Until now, we have not used any information about the NN's weights and biases. Only the architecture, which is available as the `config`, is used.
* `spotpython` uses the TensorBoard logger to save the training process in the `./runs` directory. Therefore, we have to enable the TensorBoard logger in the `fun_control` dictionary. To get a clean start, we remove an existing `runs` folder.


In [8]:
#| label: test_model
from spotpython.light.testmodel import test_model
from spotpython.light.loadmodel import load_light_from_checkpoint
import os
import shutil
# if the directory "./runs" exists, delete it (portable alternative to "rm -r")
if os.path.exists("./runs"):
    shutil.rmtree("./runs")
fun_control.update({"tensorboard_log": True})
test_model(config, fun_control)

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 1.6 K  | train | [1024, 10] | [1024, 1]
-----------------------------------------------------------------------
1.6 K     Trainable params
0         Non-trainable params
1.6 K     Total params
0.006     Total estimated model params size (MB)
31        Modules in train mode
0         Modules in eval mode


Restoring states from the checkpoint path at /Users/bartz/workspace/Hyperparameter-Tuning-Cookbook/runs/saved_models/32_4096_1024_ReLU_Adam_0.025_2.5536_64_False_kaiming_normal_TEST/last.ckpt


Loaded model weights from the checkpoint at /Users/bartz/workspace/Hyperparameter-Tuning-Cookbook/runs/saved_models/32_4096_1024_ReLU_Adam_0.025_2.5536_64_False_kaiming_normal_TEST/last.ckpt


test_model result: {'val_loss': 6792.990234375, 'hp_metric': 6792.990234375}


(6792.990234375, 6792.990234375)

In [9]:
#| label: load_model_from_chkpt
model = load_light_from_checkpoint(config, fun_control)

config: {'l1': 32, 'epochs': 4096, 'batch_size': 1024, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.025, 'lr_mult': 2.553553499475551, 'patience': 64, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 32_4096_1024_ReLU_Adam_0.025_2.5536_64_False_kaiming_normal_TEST from runs/saved_models/32_4096_1024_ReLU_Adam_0.025_2.5536_64_False_kaiming_normal_TEST/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=32, out_features=16, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=16, out_features=16, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=16, out_features=8, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025, inplace=False)
    (12): Linear(in_features=8, out_features=8, bias=True)
    (13): ReLU()
   

#### Details of the Training Process on the Test Data

* The `test_model` method initializes the model with the tuned architecture as follows:

```python
model = fun_control["core_model"](**config, _L_in=_L_in, _L_out=_L_out, _torchmetric=_torchmetric)
```

* Then, the Lightning Trainer is initialized with the `fun_control` dictionary and the model as follows:
    
    ```python
    trainer = L.Trainer(
        default_root_dir=os.path.join(fun_control["CHECKPOINT_PATH"], config_id),
        max_epochs=model.hparams.epochs,
        accelerator=fun_control["accelerator"],
        devices=fun_control["devices"],
        logger=TensorBoardLogger(
            save_dir=fun_control["TENSORBOARD_PATH"],
            version=config_id,
            default_hp_metric=True,
            log_graph=fun_control["log_graph"],
        ),
        callbacks=[
            EarlyStopping(monitor="val_loss", patience=config["patience"], mode="min", strict=False, verbose=False),
            ModelCheckpoint(
                dirpath=os.path.join(fun_control["CHECKPOINT_PATH"], config_id), save_last=True
            ),
        ],
        enable_progress_bar=enable_progress_bar,
    )
    trainer.fit(model=model, datamodule=dm)
    test_result = trainer.test(datamodule=dm, ckpt_path="last")
    ```

* As shown in the code above, the last checkpoint is saved.
* `spotpython`'s method `load_light_from_checkpoint` is used to load the last checkpoint and to get the model's weights and biases. It is called with the `config` and the `fun_control` dictionary, which together identify the `config_id` and thus the correct checkpoint.
* Now, the model is trained and the weights and biases are available.
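
The checkpoint path printed in the log above follows a recognizable naming pattern: the configuration values are joined with underscores and a postfix such as `TEST` is appended. The sketch below reproduces that pattern for illustration only. The helper `make_config_id`, its rounding to four decimals, and the `ReLU` stand-in class are assumptions inferred from the printed path, not `spotpython`'s actual implementation.

```python
import os


class ReLU:
    """Tiny stand-in for torch.nn.ReLU so that the sketch needs no PyTorch."""

    def __repr__(self) -> str:
        return "ReLU()"


def make_config_id(config: dict, postfix: str) -> str:
    """Hypothetical helper: join the config values into an identifier."""
    parts = []
    for value in config.values():
        if isinstance(value, float):
            text = str(round(value, 4))  # e.g. 2.5535... becomes "2.5536"
        else:
            text = str(value)
            if text.endswith("()"):      # "ReLU()" becomes "ReLU"
                text = text[:-2]
        parts.append(text)
    return "_".join(parts) + "_" + postfix


config = {
    "l1": 32, "epochs": 4096, "batch_size": 1024, "act_fn": ReLU(),
    "optimizer": "Adam", "dropout_prob": 0.025, "lr_mult": 2.553553499475551,
    "patience": 64, "batch_norm": False, "initialization": "kaiming_normal",
}
config_id = make_config_id(config, postfix="TEST")
ckpt_path = os.path.join("runs", "saved_models", config_id, "last.ckpt")
print(ckpt_path)
```

With the tuned configuration from above, `config_id` equals the directory name that appears in the log output.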


In [10]:
#| label: model_viz
from torchviz import make_dot
import torch
x = torch.randn(1, 10).requires_grad_(True)
x = x.to("mps")
output = model(x)
dot = make_dot(output, params=dict(model.named_parameters()), show_attrs=True, show_saved=True)
dot.render("model_architecture", format="png")

'model_architecture.png'

![architecture](./model_architecture.png)

## XAI Methods

* `spotpython` provides methods to explain the model's predictions. The following neural network elements can be analyzed: 

### Weights

* Weights are the parameters of the neural network that are learned from the data during training. They connect neurons between layers and determine the strength and direction of the signal sent from one neuron to another. The network adjusts the weights during training to minimize the error between the predicted output and the actual output.
* Interpretation of the weights: A weight with a large absolute value indicates a strong influence of the sending neuron on the receiving neuron. A positive weight means that a larger input increases the receiving neuron's pre-activation, whereas a negative weight decreases it.
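
The effect of sign and magnitude can be illustrated with a single neuron in plain Python. This is a minimal sketch with made-up weights, not values taken from the tuned model: each input is increased by one unit in turn, and the resulting change of the weighted sum is recorded.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus bias (pre-activation of one neuron)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


weights = [2.0, -0.5, 0.0]  # illustrative values only
bias = 0.5
x = [1.0, 1.0, 1.0]

base = neuron(x, weights, bias)

# Increase each input by one unit in turn and record the change of the output.
effects = []
for i in range(len(x)):
    x_shifted = list(x)
    x_shifted[i] += 1.0
    effects.append(neuron(x_shifted, weights, bias) - base)

print(effects)
```

For this linear pre-activation, the recorded effects coincide with the weights: the first input raises the output strongly, the second lowers it slightly, and the third has no influence.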

### Activations

* Activations are the outputs produced by neurons after applying an activation function to the weighted sum of inputs. The activation function (e.g., ReLU, sigmoid, tanh) adds non-linearity to the model, allowing it to learn more complex relationships.
* Interpretation of the activations: The value of activations indicates the intensity of the signal passed to the next layer. Certain activation patterns can highlight which features or parts of the data the network is focusing on.
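
As a rough illustration of activations and their batch means, the following sketch evaluates one small ReLU layer in plain Python. All weights and inputs are made up for the example; the tuned network is not involved.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def layer_activations(inputs, weight_rows, biases):
    """ReLU activations of one dense layer for a single input vector."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, biases)
    ]


# Illustrative layer with 2 inputs and 3 neurons.
weight_rows = [[1.0, -1.0], [0.5, 0.5], [-2.0, 0.0]]
biases = [0.0, 0.0, 0.0]
batch = [[1.0, 2.0], [3.0, 1.0], [2.0, 2.0]]

activations = [layer_activations(x, weight_rows, biases) for x in batch]

# Mean activation per neuron, averaged over the batch.
mean_activations = [sum(col) / len(batch) for col in zip(*activations)]
print(mean_activations)
```

In this toy layer the third neuron has a mean activation of zero because its pre-activation is negative for every sample in the batch. Such a pattern would show up as an inactive cell in a plot of mean activations.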

### Gradients

* Gradients are the partial derivatives of the loss function with respect to the parameters (weights and biases) of the network. During backpropagation, gradients are used to update the parameters in the direction that reduces the loss, using methods such as gradient descent.
* Interpretation of the gradients: The magnitude of the gradient indicates how strongly the loss reacts to a change in a parameter and, therefore, how large the update will be. A large gradient implies a steeper slope and a bigger update, while a small gradient suggests that the parameter lies in a flat region of the loss surface, e.g., near a (local) optimum. If gradients are too small (vanishing gradient problem), the network may learn slowly or stop learning. If they are too large (exploding gradient problem), the updates may be unstable.
* `spotpython` provides the method `get_gradients` to get the gradients of the model. Here is a short example:

In [11]:
#| label: get_gradients_demo
from spotpython.utils.init import fun_control_init
from spotpython.data.diabetes import Diabetes
from spotpython.light.regression.nn_linear_regressor import NNLinearRegressor
from spotpython.hyperdict.light_hyper_dict import LightHyperDict
from spotpython.hyperparameters.values import (
        get_default_hyperparameters_as_array, get_one_config_from_X)
from spotpython.plot.xai import get_gradients
import numpy as np
fun_control = fun_control_init(
    _L_in=10, # 10: diabetes
    _L_out=1,
    _torchmetric="mean_squared_error",
    data_set=Diabetes(),
    core_model=NNLinearRegressor,
    hyperdict=LightHyperDict)
X = get_default_hyperparameters_as_array(fun_control)
config = get_one_config_from_X(X, fun_control)
_L_in = fun_control["_L_in"]
_L_out = fun_control["_L_out"]
_torchmetric = fun_control["_torchmetric"]
batch_size = 16
model = fun_control["core_model"](**config, _L_in=_L_in, _L_out=_L_out, _torchmetric=_torchmetric)
gradients = get_gradients(model, fun_control=fun_control, batch_size=batch_size, device = "cpu")
# assert that the gradients are a dictionary whose keys contain the string 'layers' and whose values are arrays
assert all("layers" in key for key in gradients.keys())
assert all([isinstance(value, np.ndarray) for value in gradients.values()])
print(gradients)

Seed set to 123


stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160
{'layers.0.weight': array([-18.91906  , -15.034285 ,  -9.014692 , -11.67453  , -17.93505  ,
       -18.900719 ,   3.181451 ,  -7.079934 ,  -8.781589 , -19.415773 ,
       -31.762537 , -25.240526 , -15.134445 , -19.59995  , -30.110514 ,
       -31.731745 ,   5.3412247, -11.88625  , -14.743096 , -32.596447 ,
       -16.250072 , -19.540495 , -12.840339 , -12.497604 , -24.44074  ,
       -26.738008 ,   7.0891356, -14.540221 , -12.63131  , -20.33385  ,
       -16.617418 , -19.537054 , -12.366335 , -11.95286  , -22.170914 ,
       -24.224556 ,   7.333409 , -13.811482 , -12.374348 , -19.54898  ,
       -12.489107 , -14.683411 ,  -9.294134 ,  -8.983377 , -16.662935 ,
       -18.20638  ,   5.5115504, -10.380258 ,  -9.300154 , -14.692373 ,
       -10.237142 , -12.0357
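
The vanishing and exploding gradient effects mentioned above result from multiplying many per-layer factors in the chain rule. The following plain-Python sketch makes this visible for a chain of scalar layers `h_k = act(w * h_{k-1})`. The function `chain_gradient` and all numbers are illustrative assumptions and are unrelated to `get_gradients`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def chain_gradient(w, depth, x0, use_sigmoid):
    """Derivative of the last activation with respect to x0 for a chain of
    scalar layers h_k = act(w * h_{k-1}), accumulated with the chain rule."""
    h, grad = x0, 1.0
    for _ in range(depth):
        z = w * h
        if use_sigmoid:
            h = sigmoid(z)
            local = h * (1.0 - h)  # derivative of the sigmoid at z
        else:
            h = z                  # identity activation
            local = 1.0
        grad *= w * local          # one chain-rule factor per layer
    return grad


vanishing = chain_gradient(w=1.0, depth=10, x0=0.5, use_sigmoid=True)
exploding = chain_gradient(w=1.5, depth=10, x0=0.5, use_sigmoid=False)
print(f"sigmoid chain:  {vanishing:.2e}")
print(f"identity chain: {exploding:.2e}")
```

Each sigmoid factor is at most 0.25, so ten layers shrink the gradient below `1e-6`, whereas ten identity layers with weight 1.5 amplify it by `1.5**10`.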

In [12]:
#| label: import_xai
from spotpython.plot.xai import (
    get_activations, get_gradients, get_weights,
    visualize_weights, visualize_gradients, visualize_mean_activations,
    visualize_gradient_distributions, visualize_weights_distributions,
    visualize_activations_distributions)
batch_size = config["batch_size"]

### Getting the Weights


In [13]:
#| label: get_weights
from spotpython.plot.xai import sort_layers
weights = get_weights(model)
sort_layers(weights)

{'Layer 0': array([0.87303966, 0.46751958, 0.11630094, 0.493778  , 0.59376913,
        0.15941548, 0.21318042, 0.02062678, 0.32472628, 0.93553424,
        0.58549803, 0.46945995, 0.520079  , 0.8117728 , 0.05846822,
        0.11420518, 0.3337648 , 0.21224093, 0.75789255, 0.85329175,
        0.01488709, 0.07566017, 0.01309109, 0.68862724, 0.90242726,
        0.11232996, 0.2685218 , 0.6591103 , 0.17350733, 0.9247398 ,
        0.6165806 , 0.3608082 , 0.5324753 , 0.65588427, 0.32319343,
        0.11257207, 0.5033608 , 0.5091075 , 0.5100826 , 0.42703485,
        0.82103676, 0.3604704 , 0.45160365, 0.7055938 , 0.18528134,
        0.6338793 , 0.38944215, 0.73983675, 0.22876781, 0.51847917,
        0.5489147 , 0.09770173, 0.1364289 , 0.69175667, 0.35447174,
        0.7969483 , 0.00606871, 0.2528357 , 0.08816904, 0.6997357 ,
        0.48550433, 0.4067393 , 0.41681433, 0.10917556, 0.6417975 ,
        0.5124629 , 0.15494353, 0.68814385, 0.48995048, 0.01642621,
        0.7689601 , 0.76744354, 0.405

In [14]:
#| label: visualize_weights
visualize_weights(model, absolute=True, cmap="GreenYellowRed", figsize=(6, 6))

80 values in Layer Layer 0.
1 padding values added.
81 values now in Layer Layer 0.


<Figure size 1800x1800 with 2 Axes>

32 values in Layer Layer 3.
4 padding values added.
36 values now in Layer Layer 3.


<Figure size 1800x1800 with 2 Axes>

16 values in Layer Layer 6.
16 values now in Layer Layer 6.


<Figure size 1800x1800 with 2 Axes>

8 values in Layer Layer 9.
1 padding values added.
9 values now in Layer Layer 9.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer Layer 12.
4 values now in Layer Layer 12.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer Layer 15.
4 values now in Layer Layer 15.


<Figure size 1800x1800 with 2 Axes>

2 values in Layer Layer 18.
2 padding values added.
4 values now in Layer Layer 18.


<Figure size 1800x1800 with 2 Axes>

In [15]:
#| label: visualize_weights_distributions
visualize_weights_distributions(model, color="C0", columns=4)

n:7


<Figure size 3240x1500 with 8 Axes>

### Getting the Activations


In [16]:
#| label: get_activations
from spotpython.plot.xai import get_activations
activations, mean_activations = get_activations(net=model, fun_control=fun_control, batch_size=batch_size, device="cpu")

stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160


In [17]:
#| label: visualize_mean_activations
visualize_mean_activations(mean_activations, absolute=True, cmap="GreenYellowRed", figsize=(6, 6))

8 values in Layer 0.
1 padding values added.
9 values now in Layer 0.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer 3.
4 values now in Layer 3.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer 6.
4 values now in Layer 6.


<Figure size 1800x1800 with 2 Axes>

2 values in Layer 9.
2 padding values added.
4 values now in Layer 9.


<Figure size 1800x1800 with 2 Axes>

2 values in Layer 12.
2 padding values added.
4 values now in Layer 12.


<Figure size 1800x1800 with 2 Axes>

2 values in Layer 15.
2 padding values added.
4 values now in Layer 15.


<Figure size 1800x1800 with 2 Axes>

In [18]:
#| label: visualize_activations_distributions
visualize_activations_distributions(activations=activations,
                                    net=model, color="C0", columns=4)

<Figure size 3240x1500 with 8 Axes>

### Getting the Gradients


In [19]:
#| label: get_gradients
gradients = get_gradients(net=model, fun_control=fun_control, batch_size=batch_size, device = "cpu")

stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160


In [20]:
#| label: visualize_gradients
visualize_gradients(model, fun_control, batch_size, absolute=True, cmap="GreenYellowRed", figsize=(6, 6), device="cpu")

stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160
80 values in Layer layers.0.weight.
1 padding values added.
81 values now in Layer layers.0.weight.


<Figure size 1800x1800 with 2 Axes>

32 values in Layer layers.3.weight.
4 padding values added.
36 values now in Layer layers.3.weight.


<Figure size 1800x1800 with 2 Axes>

16 values in Layer layers.6.weight.
16 values now in Layer layers.6.weight.


<Figure size 1800x1800 with 2 Axes>

8 values in Layer layers.9.weight.
1 padding values added.
9 values now in Layer layers.9.weight.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer layers.12.weight.
4 values now in Layer layers.12.weight.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer layers.15.weight.
4 values now in Layer layers.15.weight.


<Figure size 1800x1800 with 2 Axes>

2 values in Layer layers.18.weight.
2 padding values added.
4 values now in Layer layers.18.weight.


<Figure size 1800x1800 with 2 Axes>

In [21]:
#| label: visualize_gradient_distributions
visualize_gradient_distributions(model, fun_control, batch_size=batch_size, color="C0", device="cpu", columns=3)

stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160
n:7


<Figure size 2430x2250 with 9 Axes>

## Feature Attributions

### Integrated Gradients
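
For orientation, Integrated Gradients attributes a prediction to each input feature by averaging the gradient along a straight path from a baseline to the input and multiplying the result by the input difference. The sketch below is a minimal plain-Python illustration on a toy function with a known gradient. The function `f`, the zero baseline, and the midpoint Riemann sum are assumptions made for the example; this is not how `get_attributions` is implemented.

```python
def f(x):
    """Toy model with a known gradient (not the tuned network)."""
    return x[0] * x[1] + x[2] ** 2


def grad_f(x):
    """Analytic gradient of f."""
    return [x[1], x[0], 2.0 * x[2]]


def integrated_gradients(x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line between baseline and input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
print(attributions)
print(sum(attributions), f(x) - f(baseline))
```

The two numbers in the last line agree up to rounding: the attributions sum to the difference between the prediction at the input and the prediction at the baseline (the completeness property).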


In [22]:
#| label: get_attributions_xai
from spotpython.plot.xai import get_attributions, plot_attributions
df_att = get_attributions(spot_tuner, fun_control, attr_method="IntegratedGradients", n_rel=10)
plot_attributions(df_att, attr_method="IntegratedGradients")

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 1.6 K  | train | [1024, 10] | [1024, 1]
-----------------------------------------------------------------------
1.6 K     Trainable params
0         Non-trainable params
1.6 K     Total params
0.006     Total estimated model params size (MB)
31        Modules in train mode
0         Modules in eval mode


`Trainer.fit` stopped: `max_epochs=4096` reached.


train_model result: {'val_loss': 23605.609375, 'hp_metric': 23605.609375}
config: {'l1': 32, 'epochs': 4096, 'batch_size': 1024, 'act_fn': ReLU(), 'optimizer': 'Adagrad', 'dropout_prob': 0.025, 'lr_mult': 2.553553499475551, 'patience': 64, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN from runs/saved_models/32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=32, out_features=16, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=16, out_features=16, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=16, out_features=8, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025, inplace=F

<Figure size 3000x1800 with 1 Axes>

### DeepLift


In [23]:
#| label: get_attributions_deep_lift
df_lift = get_attributions(spot_tuner, fun_control, attr_method="DeepLift",n_rel=10)
print(df_lift)
plot_attributions(df_lift,  attr_method="DeepLift")

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 1.6 K  | train | [1024, 10] | [1024, 1]
-----------------------------------------------------------------------
1.6 K     Trainable params
0         Non-trainable params
1.6 K     Total params
0.006     Total estimated model params size (MB)
31        Modules in train mode
0         Modules in eval mode


`Trainer.fit` stopped: `max_epochs=4096` reached.


train_model result: {'val_loss': 23605.609375, 'hp_metric': 23605.609375}
config: {'l1': 32, 'epochs': 4096, 'batch_size': 1024, 'act_fn': ReLU(), 'optimizer': 'Adagrad', 'dropout_prob': 0.025, 'lr_mult': 2.553553499475551, 'patience': 64, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN from runs/saved_models/32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=32, out_features=16, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=16, out_features=16, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=16, out_features=8, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025, inplace=F

<Figure size 3000x1800 with 1 Axes>

### Feature Ablation
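
Feature ablation replaces one feature at a time by a baseline value and records how much the model output changes. The following plain-Python sketch shows the idea on a toy linear model. The weights, inputs, and the zero baseline are made up for illustration, and the helper `feature_ablation` is not the routine used by `get_attributions`.

```python
def model(x):
    """Toy linear model (weights are illustrative, not the tuned network)."""
    weights = [3.0, 0.0, -1.0]
    return sum(w * xi for w, xi in zip(weights, x))


def feature_ablation(model, x, baseline):
    """Attribution of feature i: model(x) minus model(x with feature i
    replaced by its baseline value)."""
    full = model(x)
    attributions = []
    for i in range(len(x)):
        ablated = list(x)
        ablated[i] = baseline[i]
        attributions.append(full - model(ablated))
    return attributions


x = [2.0, 5.0, 4.0]
baseline = [0.0, 0.0, 0.0]
print(feature_ablation(model, x, baseline))
# [6.0, 0.0, -4.0]
```

An attribution of exactly zero means that replacing the feature by its baseline did not change the model output at all.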


In [24]:
#| label: get_attributions_feature_ablation
df_fl = get_attributions(spot_tuner, fun_control, attr_method="FeatureAblation",n_rel=10)

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 1.6 K  | train | [1024, 10] | [1024, 1]
-----------------------------------------------------------------------
1.6 K     Trainable params
0         Non-trainable params
1.6 K     Total params
0.006     Total estimated model params size (MB)
31        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 2968.56298828125, 'hp_metric': 2968.56298828125}
config: {'l1': 32, 'epochs': 4096, 'batch_size': 1024, 'act_fn': ReLU(), 'optimizer': 'Adagrad', 'dropout_prob': 0.025, 'lr_mult': 2.553553499475551, 'patience': 64, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN from runs/saved_models/32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=32, out_features=16, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=16, out_features=16, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=16, out_features=8, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025, i

In [25]:
#| label: plot_attributions_feature_ablation
print(df_fl)
plot_attributions(df_fl, attr_method="FeatureAblation")

   Feature Index Feature  FeatureAblationAttribution
0              9  s6_glu                         0.0
1              8  s5_ltg                         0.0
2              7  s4_tch                         0.0
3              6  s3_hdl                         0.0
4              5  s2_ldl                         0.0
5              4   s1_tc                         0.0
6              3      bp                         0.0
7              2     bmi                         0.0
8              1     sex                         0.0
9              0     age                         0.0


<Figure size 3000x1800 with 1 Axes>

## Conductance


In [26]:
#| label: get_conductance
from spotpython.plot.xai import plot_conductance_last_layer, get_weights_conductance_last_layer
weights_last, layer_conductance_last = get_weights_conductance_last_layer(spot_tuner, fun_control)
plot_conductance_last_layer(weights_last, layer_conductance_last, figsize=(6, 6))

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 1.6 K  | train | [1024, 10] | [1024, 1]
-----------------------------------------------------------------------
1.6 K     Trainable params
0         Non-trainable params
1.6 K     Total params
0.006     Total estimated model params size (MB)
31        Modules in train mode
0         Modules in eval mode


GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 1.6 K  | train | [1024, 10] | [1024, 1]
-----------------------------------------------------------------------
1.6 K     Trainable params
0         Non-trainable params
1.6 K     Total params
0.006     Total estimated model params size (MB)
31        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 3035.412841796875, 'hp_metric': 3035.412841796875}
config: {'l1': 32, 'epochs': 4096, 'batch_size': 1024, 'act_fn': ReLU(), 'optimizer': 'Adagrad', 'dropout_prob': 0.025, 'lr_mult': 2.553553499475551, 'patience': 64, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN from runs/saved_models/32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=32, out_features=16, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=16, out_features=16, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=16, out_features=8, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025,

`Trainer.fit` stopped: `max_epochs=4096` reached.


train_model result: {'val_loss': 23605.609375, 'hp_metric': 23605.609375}
config: {'l1': 32, 'epochs': 4096, 'batch_size': 1024, 'act_fn': ReLU(), 'optimizer': 'Adagrad', 'dropout_prob': 0.025, 'lr_mult': 2.553553499475551, 'patience': 64, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN from runs/saved_models/32_4096_1024_ReLU_Adagrad_0.025_2.5536_64_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=32, out_features=16, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=16, out_features=16, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=16, out_features=8, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025, inplace=F

<Figure size 1800x1800 with 1 Axes>