---
execute:
  cache: false
  eval: true
  echo: true
  warning: false
jupyter: python3
title: Explainable AI with SpotPython and PyTorch
---

In [1]:
# from spotpython.utils.init import fun_control_init
# from spotpython.data.diabetes import Diabetes
# from spotpython.hyperdict.light_hyper_dict import LightHyperDict
# from spotpython.hyperparameters.values import (
#         get_default_hyperparameters_as_array, get_one_config_from_X)
# from spotpython.plot.xai import (get_activations, get_gradients, get_weights, plot_nn_values_hist, plot_nn_values_scatter, visualize_weights, visualize_gradients, visualize_activations, visualize_gradient_distributions, visualize_weights_distributions)

# data_set = Diabetes()
# fun_control = fun_control_init(
#     data_set=data_set,
#     _L_in=10, 
#     _L_out=1,
#     _torchmetric="mean_squared_error",
#     core_model_name="light.regression.NNLinearRegressor",
#     hyperdict=LightHyperDict,
#     )
# X = get_default_hyperparameters_as_array(fun_control)
# config = get_one_config_from_X(X, fun_control)
# _L_in = fun_control["_L_in"]
# _L_out = fun_control["_L_out"]
# _torchmetric = fun_control["_torchmetric"]
# model = fun_control["core_model"](**config, _L_in=_L_in, _L_out=_L_out, _torchmetric=_torchmetric)
# batch_size=config["batch_size"]

## Explainable AI with SpotPython and PyTorch {#sec-xai}


In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
from spotpython.data.diabetes import Diabetes
from spotpython.hyperdict.light_hyper_dict import LightHyperDict
from spotpython.fun.hyperlight import HyperLight
from spotpython.utils.init import (fun_control_init, surrogate_control_init, design_control_init)
from spotpython.utils.eda import gen_design_table
from spotpython.spot import spot
from spotpython.utils.file import get_experiment_filename
from spotpython.hyperparameters.values import set_hyperparameter
from math import inf

PREFIX="602"

data_set = Diabetes()

fun_control = fun_control_init(
    save_experiment=True,
    PREFIX=PREFIX,
    fun_evals=inf,
    max_time=60,
    data_set = data_set,
    core_model_name="light.regression.NNLinearRegressor",
    hyperdict=LightHyperDict,
    _L_in=10,
    _L_out=1)

fun = HyperLight().fun


set_hyperparameter(fun_control, "optimizer", [ "Adadelta", "Adam", "Adamax"])
set_hyperparameter(fun_control, "l1", [2,3])
set_hyperparameter(fun_control, "epochs", [10,12])
set_hyperparameter(fun_control, "batch_size", [4,11])
set_hyperparameter(fun_control, "dropout_prob", [0.0, 0.025])
set_hyperparameter(fun_control, "patience", [2,9])

design_control = design_control_init(init_size=20)

spot_tuner = spot.Spot(fun=fun,fun_control=fun_control, design_control=design_control)

Seed set to 123


Seed set to 123


module_name: light
submodule_name: regression
model_name: NNLinearRegressor


### Running the Hyperparameter Tuning or Loading the Existing Model


In [4]:
from spotpython.utils.file import get_experiment_filename, load_experiment
import os
overwrite = False
filename = get_experiment_filename(PREFIX)
if os.path.exists(filename) and not overwrite:
    (spot_tuner, fun_control, design_control,
    surrogate_control, optimizer_control) = load_experiment(filename)
else:
    print("File does not exist or overwrite is True. Starting new experiment.")
    res = spot_tuner.run()

Loaded experiment from spot_602_experiment.pickle


### Results from the Hyperparameter Tuning Experiment

* After the hyperparameter tuning is finished, the following objects are available:
    * the `spot_tuner` object and
    * the associated `fun_control` dictionary.


In [5]:
res = spot_tuner.print_results(print_screen=True)

min y: 2909.064208984375
l1: 3.0
epochs: 11.0
batch_size: 11.0
act_fn: 5.0
optimizer: 2.0
dropout_prob: 0.025
lr_mult: 10.0
patience: 7.0
batch_norm: 0.0
initialization: 2.0


#### Getting the Best Model, i.e., the Tuned Architecture

* The method `get_tuned_architecture` [[DOC]](https://sequential-parameter-optimization.github.io/spotPython/reference/spotpython/hyperparameters/values/#spotpython.hyperparameters.values.get_tuned_architecture) returns the best model architecture found during the hyperparameter tuning.
* It returns the transformed values, i.e., `batch_size = 2^x` if the hyperparameter `batch_size` was transformed with the `transform_power_2_int` function.
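* The power-of-two transformation can be checked by hand against the raw values printed by `print_results` above. The following sketch assumes that `l1`, `epochs`, `batch_size`, and `patience` are all transformed this way, an assumption that is consistent with the configuration printed in the next cell. `transform_power_2_int_sketch` is an illustrative stand-in, not the `spotpython` implementation:

```python
def transform_power_2_int_sketch(x: float) -> int:
    """Illustrative stand-in: map a tuner value x to the integer 2**x."""
    return 2 ** int(x)


# Raw values as reported by print_results above
raw = {"l1": 3.0, "epochs": 11.0, "batch_size": 11.0, "patience": 7.0}

# Apply the (assumed) power-of-two transformation to every entry
transformed = {name: transform_power_2_int_sketch(value) for name, value in raw.items()}
print(transformed)
```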


In [6]:
from spotpython.hyperparameters.values import get_tuned_architecture
import pprint
config = get_tuned_architecture(spot_tuner, fun_control)
pprint.pprint(config)

{'act_fn': Swish(),
 'batch_norm': False,
 'batch_size': 2048,
 'dropout_prob': 0.025,
 'epochs': 2048,
 'initialization': 'kaiming_normal',
 'l1': 8,
 'lr_mult': 10.0,
 'optimizer': 'Adamax',
 'patience': 128}


* Note: `get_tuned_architecture` has the option `force_minX`, which does not have any effect in this case.


In [7]:
from spotpython.hyperparameters.values import get_tuned_architecture
config = get_tuned_architecture(spot_tuner, fun_control, force_minX=True)
pprint.pprint(config)

{'act_fn': Swish(),
 'batch_norm': False,
 'batch_size': 2048,
 'dropout_prob': 0.025,
 'epochs': 2048,
 'initialization': 'kaiming_normal',
 'l1': 8,
 'lr_mult': 10.0,
 'optimizer': 'Adamax',
 'patience': 128}


### Training the Tuned Architecture on the Test Data

* Since we are interested in the explainability of the model, we will train the tuned architecture on the test data.
* `spotpython`'s `test_model` function [[DOC]](https://sequential-parameter-optimization.github.io/spotPython/reference/spotpython/light/testmodel/) is used to train the model on the test data.
* Note: Until now, we do not use any information about the NN's weights and biases. Only the architecture, which is available as the `config`, is used.
* `spotpython` uses the TensorBoard logger to save the training process in the `./runs` directory. Therefore, we have to enable the TensorBoard logger in the `fun_control` dictionary. To get a clean start, we also set the `TENSORBOARD_CLEAN` flag to `True`.


In [8]:
from spotpython.light.testmodel import test_model
from spotpython.light.loadmodel import load_light_from_checkpoint

fun_control.update({"TENSORBOARD_CLEAN": True})
fun_control.update({"tensorboard_log": True})
test_model(config, fun_control)

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 169    | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
169       Trainable params
0         Non-trainable params
169       Total params
0.001     Total estimated model params size (MB)
15        Modules in train mode
0         Modules in eval mode


Restoring states from the checkpoint path at /Users/bartz/workspace/Hyperparameter-Tuning-Cookbook/runs/saved_models/8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TEST/last-v1.ckpt


Loaded model weights from the checkpoint at /Users/bartz/workspace/Hyperparameter-Tuning-Cookbook/runs/saved_models/8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TEST/last-v1.ckpt


test_model result: {'val_loss': 3530.435791015625, 'hp_metric': 3530.435791015625}


(3530.435791015625, 3530.435791015625)

In [9]:
model = load_light_from_checkpoint(config, fun_control)

config: {'l1': 8, 'epochs': 2048, 'batch_size': 2048, 'act_fn': Swish(), 'optimizer': 'Adamax', 'dropout_prob': 0.025, 'lr_mult': 10.0, 'patience': 128, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TEST from runs/saved_models/8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TEST/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=8, bias=True)
    (1): Swish()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=8, out_features=4, bias=True)
    (4): Swish()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=4, out_features=4, bias=True)
    (7): Swish()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=4, out_features=2, bias=True)
    (10): Swish()
    (11): Dropout(p=0.025, inplace=False)
    (12): Linear(in_features=2, out_features=2, bias=True)
    (13): Swish()
    (14): Dr

#### Details of the Training Process on the Test Data

* The `test_model` method initializes the model with the tuned architecture as follows:

```python
model = fun_control["core_model"](**config, _L_in=_L_in, _L_out=_L_out, _torchmetric=_torchmetric)
```

* Then, the Lightning Trainer is initialized with the `fun_control` dictionary and the model as follows:
    
    ```python
    trainer = L.Trainer(
        default_root_dir=os.path.join(fun_control["CHECKPOINT_PATH"], config_id),
        max_epochs=model.hparams.epochs,
        accelerator=fun_control["accelerator"],
        devices=fun_control["devices"],
        logger=TensorBoardLogger(
            save_dir=fun_control["TENSORBOARD_PATH"],
            version=config_id,
            default_hp_metric=True,
            log_graph=fun_control["log_graph"],
        ),
        callbacks=[
            EarlyStopping(monitor="val_loss", patience=config["patience"], mode="min", strict=False, verbose=False),
            ModelCheckpoint(
                dirpath=os.path.join(fun_control["CHECKPOINT_PATH"], config_id), save_last=True
            ),
        ],
        enable_progress_bar=enable_progress_bar,
    )
    trainer.fit(model=model, datamodule=dm)
    test_result = trainer.test(datamodule=dm, ckpt_path="last")
    ```

* As shown in the code above, the last checkpoint is saved.
* `spotpython`'s method `load_light_from_checkpoint` is used to load the last checkpoint and to get the model's weights and biases. As in the call above, it takes the `config` and the `fun_control` dictionary as input; the `config_id` that identifies the checkpoint directory is derived from them.
* Now, the model is trained and the weights and biases are available.
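* The checkpoint directory names in the log output above (e.g., `8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TEST`) appear to be built from the configuration values joined by underscores, followed by a suffix. The following is a minimal sketch of that naming scheme, assuming this reading of the log is correct. `make_config_id_sketch` is a hypothetical helper, not a `spotpython` function:

```python
def make_config_id_sketch(config: dict, suffix: str = "") -> str:
    """Hypothetical helper: join the config values with underscores."""
    return "_".join(str(value) for value in config.values()) + suffix


config_sketch = {
    "l1": 8,
    "epochs": 2048,
    "batch_size": 2048,
    "act_fn": "Swish",  # the real config holds a Swish() module; its name is used here
    "optimizer": "Adamax",
    "dropout_prob": 0.025,
    "lr_mult": 10.0,
    "patience": 128,
    "batch_norm": False,
    "initialization": "kaiming_normal",
}

config_id = make_config_id_sketch(config_sketch, suffix="_TEST")
print(config_id)
```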


In [10]:
pprint.pprint(model)

NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=8, bias=True)
    (1): Swish()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=8, out_features=4, bias=True)
    (4): Swish()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=4, out_features=4, bias=True)
    (7): Swish()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=4, out_features=2, bias=True)
    (10): Swish()
    (11): Dropout(p=0.025, inplace=False)
    (12): Linear(in_features=2, out_features=2, bias=True)
    (13): Swish()
    (14): Dropout(p=0.025, inplace=False)
    (15): Linear(in_features=2, out_features=2, bias=True)
    (16): Swish()
    (17): Dropout(p=0.025, inplace=False)
    (18): Linear(in_features=2, out_features=1, bias=True)
  )
)
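
* As a plausibility check, the 169 trainable parameters reported in the model summary can be reproduced from the layer shapes printed above. The sketch assumes that only the `Linear` layers carry trainable parameters, which fits here because the total matches without any contribution from the `Swish` and `Dropout` modules:

```python
# (in_features, out_features) of each Linear layer in the printed model
linear_shapes = [(10, 8), (8, 4), (4, 4), (4, 2), (2, 2), (2, 2), (2, 1)]

# One weight per input-output connection, one bias per output neuron
n_weights = [n_in * n_out for n_in, n_out in linear_shapes]
n_biases = [n_out for _, n_out in linear_shapes]
total_params = sum(n_weights) + sum(n_biases)

print(n_weights)     # number of weights per Linear layer
print(total_params)  # weights plus biases
```

* The per-layer weight counts are the same numbers that `visualize_weights` reports for the layers 0, 3, 6, 9, 12, 15, and 18.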


In [11]:
from torchviz import make_dot
import torch

x = torch.randn(1, 10).requires_grad_(True)
x = x.to("mps")
output = model(x)
dot = make_dot(output, params=dict(model.named_parameters()), show_attrs=True, show_saved=True)
dot.render("model_architecture", format="png")

'model_architecture.png'

![architecture](./model_architecture.png)

## XAI Methods

* `spotpython` provides methods to explain the model's predictions. The following neural network elements can be analyzed: weights, activations, and gradients.

### Weights

* Weights are the parameters of the neural network that are learned from the data during training. They connect neurons between layers and determine the strength and direction of the signal sent from one neuron to another. The network adjusts the weights during training to minimize the error between the predicted output and the actual output.
* Interpretation of the weights: A weight with a large absolute value indicates a strong influence of the input neuron on the output. Positive weights suggest a positive relationship, whereas negative weights suggest an inverse relationship between the connected neurons.
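* The interpretation above can be illustrated with a single neuron. The weights, bias, and inputs below are made-up values chosen for illustration, not values from the tuned model:

```python
# Hypothetical weights, bias, and inputs of a single neuron (illustrative values)
weights = [0.8, -0.5, 0.01]
inputs = [1.0, 1.0, 1.0]
bias = 0.1

# Contribution of each input to the weighted sum
contributions = [w * x for w, x in zip(weights, inputs)]
pre_activation = sum(contributions) + bias

# Rank the inputs by the absolute value of their weight (strongest first)
ranking = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)

print(contributions)
print(ranking)
```

* With identical inputs, the first input pushes the weighted sum up, the second pulls it down, and the third has almost no effect. In a deep network, a single weight is only a rough indicator, because later layers can amplify or cancel its effect.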

### Activations

* Activations are the outputs produced by neurons after applying an activation function to the weighted sum of inputs. The activation function (e.g., ReLU, sigmoid, tanh) adds non-linearity to the model, allowing it to learn more complex relationships.
* Interpretation of the activations: The value of activations indicates the intensity of the signal passed to the next layer. Certain activation patterns can highlight which features or parts of the data the network is focusing on.
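* To make the role of the activation function concrete, the following sketch evaluates several activation functions on the same weighted sums. It assumes the common definition `swish(z) = z * sigmoid(z)` for the Swish activation used in the tuned model; the `spotpython` implementation may differ in details:

```python
import math


def relu(z: float) -> float:
    """Rectified linear unit: negative inputs are set to zero."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def swish(z: float) -> float:
    """Swish under the common definition assumed here: z * sigmoid(z)."""
    return z * sigmoid(z)


# Weighted sums (pre-activations) of a hypothetical neuron
for z in (-2.0, 0.0, 2.0):
    print(
        f"z={z:+.1f}  relu={relu(z):.3f}  sigmoid={sigmoid(z):.3f}  "
        f"tanh={math.tanh(z):.3f}  swish={swish(z):.3f}"
    )
```

* Unlike ReLU, Swish returns a small negative value for negative inputs, so negative activations can appear in the plots unless absolute values are shown.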

### Gradients

* Gradients are the partial derivatives of the loss function with respect to the parameters (weights and biases) of the network. During backpropagation, gradients are used to update the parameters in the direction that reduces the loss, using methods like gradient descent.
* Interpretation of the gradients: The magnitude of the gradient indicates how much a parameter should change to reduce the error. A large gradient implies a steeper slope and a bigger update, while a small gradient suggests that the parameter is near an optimal point. If gradients are too small (vanishing gradient problem), the network may learn slowly or stop learning. If they are too large (exploding gradient problem), the updates may be unstable.
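* The relation between gradient, update, and loss can be shown for a model with a single weight. This is a toy sketch with a squared-error loss and plain gradient descent; the values and the learning rate are illustrative assumptions and unrelated to the tuned network:

```python
def loss(w: float, x: float, y: float) -> float:
    """Squared error of the one-parameter model y_hat = w * x."""
    return (w * x - y) ** 2


def grad(w: float, x: float, y: float) -> float:
    """Analytical derivative of the loss with respect to w."""
    return 2.0 * (w * x - y) * x


w, x, y = 0.0, 2.0, 4.0  # illustrative values; the optimum is w = 2
learning_rate = 0.1

g = grad(w, x, y)

# Finite-difference check of the analytical gradient
eps = 1e-6
g_numeric = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)

# One gradient descent step: move against the gradient
w_new = w - learning_rate * g

print(g, g_numeric)
print(loss(w, x, y), loss(w_new, x, y))
```

* The negative gradient indicates that increasing `w` lowers the loss, and a single update step reduces the loss considerably in this example.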


In [12]:
from spotpython.plot.xai import (get_activations, get_gradients, get_weights, visualize_weights, visualize_gradients, visualize_activations, visualize_gradient_distributions, visualize_weights_distributions, visualize_activations_distributions)
batch_size = config["batch_size"]

### Getting the Weights


In [13]:
from spotpython.plot.xai import sort_layers
weights = get_weights(model)
sort_layers(weights)

{'Layer 0': array([ 0.07110421, -0.6451192 , -0.5278635 , -0.06179228,  0.22153616,
        -0.14233318,  0.19266607,  0.77980345,  0.47848746,  0.1484009 ,
         0.10708999, -0.545021  ,  0.2124888 ,  0.15804209, -0.10240846,
        -0.22105302, -0.9132397 , -0.5975416 ,  0.65280706,  0.36982608,
        -0.11283313, -0.52672344,  0.39792368,  0.47690582,  0.3719671 ,
         0.19760166, -0.01430609, -0.35328487,  0.24063474,  0.26064098,
         0.10604577,  0.76814866,  0.23175798,  0.35133767, -0.13959356,
         0.76123536, -0.00258299, -0.14246573,  0.27527282,  0.44271433,
        -0.64850646, -0.05273896,  0.5787495 ,  0.21478535,  0.97581714,
         0.09153489,  0.623456  , -0.3014133 ,  0.35171664,  0.33499268,
         0.23520835, -0.06839592, -0.1973295 ,  0.55869144,  0.55193007,
        -0.32832047, -0.373637  , -0.32992724,  0.44109768, -0.11034199,
         0.10730347, -0.16623916,  0.2829803 , -0.38360766,  0.46135926,
        -0.31776834,  0.13553633,  0.444

In [14]:
visualize_weights(model, absolute=True, cmap="BlueWhiteRed", figsize=(6, 6))

80 values in Layer Layer 0.
1 padding values added.
81 values now in Layer Layer 0.


<Figure size 1800x1800 with 2 Axes>

32 values in Layer Layer 3.
4 padding values added.
36 values now in Layer Layer 3.


<Figure size 1800x1800 with 2 Axes>

16 values in Layer Layer 6.
16 values now in Layer Layer 6.


<Figure size 1800x1800 with 2 Axes>

8 values in Layer Layer 9.
1 padding values added.
9 values now in Layer Layer 9.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer Layer 12.
4 values now in Layer Layer 12.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer Layer 15.
4 values now in Layer Layer 15.


<Figure size 1800x1800 with 2 Axes>

2 values in Layer Layer 18.
2 padding values added.
4 values now in Layer Layer 18.


<Figure size 1800x1800 with 2 Axes>
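
* The padding messages in the output above are consistent with each layer's values being padded up to the next perfect square, so that they can be drawn as a square heatmap. This is an inference from the logged counts, not a statement about the `spotpython` source. `padded_size` is a hypothetical helper:

```python
import math


def padded_size(n_values: int) -> int:
    """Smallest perfect square that is greater than or equal to n_values."""
    side = math.isqrt(n_values)
    if side * side < n_values:
        side += 1
    return side * side


# Number of weights per Linear layer, as reported in the output above
for n in [80, 32, 16, 8, 4, 4, 2]:
    target = padded_size(n)
    print(f"{n} values, {target - n} padding values, {target} values in total")
```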

In [15]:
visualize_weights_distributions(model, color="C0")

n:7


<Figure size 1620x3000 with 8 Axes>

### Getting the Activations


In [16]:
activations = get_activations(model, fun_control=fun_control, batch_size=batch_size, device="mps")

In [17]:
visualize_activations(model, fun_control=fun_control, batch_size=batch_size, device="mps", cmap="BlueWhiteRed", absolute=True)

3536 values in Layer 0.
64 padding values added.
3600 values now in Layer 0.


<Figure size 1800x1800 with 2 Axes>

1768 values in Layer 3.
81 padding values added.
1849 values now in Layer 3.


<Figure size 1800x1800 with 2 Axes>

1768 values in Layer 6.
81 padding values added.
1849 values now in Layer 6.


<Figure size 1800x1800 with 2 Axes>

884 values in Layer 9.
16 padding values added.
900 values now in Layer 9.


<Figure size 1800x1800 with 2 Axes>

884 values in Layer 12.
16 padding values added.
900 values now in Layer 12.


<Figure size 1800x1800 with 2 Axes>

884 values in Layer 15.
16 padding values added.
900 values now in Layer 15.


<Figure size 1800x1800 with 2 Axes>

442 values in Layer 18.
42 padding values added.
484 values now in Layer 18.


<Figure size 1800x1800 with 2 Axes>
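
* The value counts reported for the activations can be explained by the size of the data set. Assuming the activations are collected for all 442 samples of the Diabetes data set (plausible here, since the batch size of 2048 exceeds the number of samples), each `Linear` layer produces 442 times its number of output features:

```python
n_samples = 442  # number of samples in the Diabetes data set

# Index of each Linear layer in the Sequential container and its out_features
out_features = {0: 8, 3: 4, 6: 4, 9: 2, 12: 2, 15: 2, 18: 1}

# One activation value per sample and output neuron
n_activations = {layer: n_samples * width for layer, width in out_features.items()}
print(n_activations)
```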

In [18]:
visualize_activations_distributions(net=model, fun_control=fun_control, batch_size=batch_size, device="mps", color="C0", columns=2)

n:7


<Figure size 1620x3000 with 8 Axes>

### Getting the Gradients


In [19]:
gradients = get_gradients(model, fun_control=fun_control, batch_size=batch_size, device = "mps")

In [20]:
visualize_gradients(model, fun_control, batch_size, absolute=True, cmap="BlueWhiteRed", figsize=(6, 6), device="mps")

80 values in Layer layers.0.weight.
1 padding values added.
81 values now in Layer layers.0.weight.


<Figure size 1800x1800 with 2 Axes>

32 values in Layer layers.3.weight.
4 padding values added.
36 values now in Layer layers.3.weight.


<Figure size 1800x1800 with 2 Axes>

16 values in Layer layers.6.weight.
16 values now in Layer layers.6.weight.


<Figure size 1800x1800 with 2 Axes>

8 values in Layer layers.9.weight.
1 padding values added.
9 values now in Layer layers.9.weight.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer layers.12.weight.
4 values now in Layer layers.12.weight.


<Figure size 1800x1800 with 2 Axes>

4 values in Layer layers.15.weight.
4 values now in Layer layers.15.weight.


<Figure size 1800x1800 with 2 Axes>

2 values in Layer layers.18.weight.
2 padding values added.
4 values now in Layer layers.18.weight.


<Figure size 1800x1800 with 2 Axes>

In [21]:
visualize_gradient_distributions(model, fun_control, batch_size=batch_size, color="C0", device="mps")

n:7


<Figure size 1620x3000 with 8 Axes>

## Feature Attributions

### Integrated Gradients


In [22]:
from spotpython.plot.xai import get_attributions, plot_attributions
df_att = get_attributions(spot_tuner, fun_control, attr_method="IntegratedGradients", n_rel=10)
plot_attributions(df_att, attr_method="IntegratedGradients")

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 169    | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
169       Trainable params
0         Non-trainable params
169       Total params
0.001     Total estimated model params size (MB)
15        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 2918.2333984375, 'hp_metric': 2918.2333984375}
config: {'l1': 8, 'epochs': 2048, 'batch_size': 2048, 'act_fn': Swish(), 'optimizer': 'Adamax', 'dropout_prob': 0.025, 'lr_mult': 10.0, 'patience': 128, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN from runs/saved_models/8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=8, bias=True)
    (1): Swish()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=8, out_features=4, bias=True)
    (4): Swish()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=4, out_features=4, bias=True)
    (7): Swish()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=4, out_features=2, bias=True)
    (10): Swish()
    (11): Dropout(p=0.025, inplace=False)
    (12

<Figure size 3000x1800 with 1 Axes>
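
* Integrated Gradients attributes a prediction to the input features by accumulating gradients along a straight path from a baseline input to the actual input. The following toy sketch illustrates the idea in plain Python with finite-difference gradients. It does not reproduce what `get_attributions` does internally; `integrated_gradients_sketch`, `toy_model`, and the all-zero baseline are assumptions chosen for illustration:

```python
def integrated_gradients_sketch(f, x, baseline, n_steps=50, eps=1e-6):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    n = len(x)
    attributions = [0.0] * n
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps  # midpoint of the current sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = list(point)
            down = list(point)
            up[i] += eps
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2 * eps)  # central difference
            attributions[i] += grad_i * (x[i] - baseline[i]) / n_steps
    return attributions


def toy_model(v):
    """Hypothetical differentiable model with three input features."""
    return 3.0 * v[0] + v[0] * v[1] - 2.0 * v[2] ** 2


x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients_sketch(toy_model, x, baseline)
print(attr)
print(sum(attr), toy_model(x) - toy_model(baseline))
```

* The attributions sum to the difference between the model output at the input and at the baseline, which is the completeness property of Integrated Gradients.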

### DeepLift


In [23]:
df_lift = get_attributions(spot_tuner, fun_control, attr_method="DeepLift", n_rel=10)
print(df_lift)
plot_attributions(df_lift, attr_method="DeepLift")

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 169    | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
169       Trainable params
0         Non-trainable params
169       Total params
0.001     Total estimated model params size (MB)
15        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 2867.108154296875, 'hp_metric': 2867.108154296875}
config: {'l1': 8, 'epochs': 2048, 'batch_size': 2048, 'act_fn': Swish(), 'optimizer': 'Adamax', 'dropout_prob': 0.025, 'lr_mult': 10.0, 'patience': 128, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN from runs/saved_models/8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=8, bias=True)
    (1): Swish()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=8, out_features=4, bias=True)
    (4): Swish()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=4, out_features=4, bias=True)
    (7): Swish()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=4, out_features=2, bias=True)
    (10): Swish()
    (11): Dropout(p=0.025, inplace=False)
   

<Figure size 3000x1800 with 1 Axes>

### Feature Ablation


In [24]:
df_fl = get_attributions(spot_tuner, fun_control, attr_method="FeatureAblation", n_rel=10)

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 169    | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
169       Trainable params
0         Non-trainable params
169       Total params
0.001     Total estimated model params size (MB)
15        Modules in train mode
0         Modules in eval mode


`Trainer.fit` stopped: `max_epochs=2048` reached.


train_model result: {'val_loss': 19317.923828125, 'hp_metric': 19317.923828125}
config: {'l1': 8, 'epochs': 2048, 'batch_size': 2048, 'act_fn': Swish(), 'optimizer': 'Adamax', 'dropout_prob': 0.025, 'lr_mult': 10.0, 'patience': 128, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN from runs/saved_models/8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=8, bias=True)
    (1): Swish()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=8, out_features=4, bias=True)
    (4): Swish()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=4, out_features=4, bias=True)
    (7): Swish()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=4, out_features=2, bias=True)
    (10): Swish()
    (11): Dropout(p=0.025, inplace=False)
    (12

In [25]:
print(df_fl)
plot_attributions(df_fl, attr_method="FeatureAblation")

   Feature Index Feature  FeatureAblationAttribution
0              8  s5_ltg                   -0.670175
1              3      bp                   -0.540340
2              1     sex                    0.462996
3              2     bmi                   -0.367669
4              7  s4_tch                   -0.346730
5              5  s2_ldl                    0.167847
6              9  s6_glu                    0.084294
7              0     age                    0.083628
8              6  s3_hdl                   -0.071274
9              4   s1_tc                    0.024698


<Figure size 3000x1800 with 1 Axes>
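
* Feature ablation measures how much the model output changes when a single feature is replaced by a baseline value. The following toy sketch shows the principle for one input. It is not the code behind `get_attributions`; `feature_ablation_sketch`, `toy_model`, and the baseline value of zero are assumptions chosen for illustration:

```python
def feature_ablation_sketch(f, x, baseline_value=0.0):
    """Attribution of feature i: f(x) minus f(x with feature i set to the baseline)."""
    reference = f(x)
    attributions = []
    for i in range(len(x)):
        ablated = list(x)
        ablated[i] = baseline_value
        attributions.append(reference - f(ablated))
    return attributions


def toy_model(v):
    """Hypothetical model: the third feature has no influence on the output."""
    return 2.0 * v[0] - 1.0 * v[1] + 0.0 * v[2]


x = [1.0, 3.0, 5.0]
attr = feature_ablation_sketch(toy_model, x)

# Sort the feature indices by absolute attribution, strongest first
order = sorted(range(len(attr)), key=lambda i: abs(attr[i]), reverse=True)
print(attr)
print(order)
```

* As in the table above, the sign of an attribution indicates the direction of the effect, and sorting by absolute value yields a ranking of the features.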

## Conductance


In [26]:
from spotpython.plot.xai import plot_conductance_last_layer, get_weights_conductance_last_layer
weights_last, layer_conductance_last = get_weights_conductance_last_layer(spot_tuner, fun_control)
plot_conductance_last_layer(weights_last, layer_conductance_last, figsize=(6, 6))

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 169    | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
169       Trainable params
0         Non-trainable params
169       Total params
0.001     Total estimated model params size (MB)
15        Modules in train mode
0         Modules in eval mode


GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 169    | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
169       Trainable params
0         Non-trainable params
169       Total params
0.001     Total estimated model params size (MB)
15        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 4836.5693359375, 'hp_metric': 4836.5693359375}
config: {'l1': 8, 'epochs': 2048, 'batch_size': 2048, 'act_fn': Swish(), 'optimizer': 'Adamax', 'dropout_prob': 0.025, 'lr_mult': 10.0, 'patience': 128, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN from runs/saved_models/8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=8, bias=True)
    (1): Swish()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=8, out_features=4, bias=True)
    (4): Swish()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=4, out_features=4, bias=True)
    (7): Swish()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=4, out_features=2, bias=True)
    (10): Swish()
    (11): Dropout(p=0.025, inplace=False)
    (12

train_model result: {'val_loss': 3858.173095703125, 'hp_metric': 3858.173095703125}
config: {'l1': 8, 'epochs': 2048, 'batch_size': 2048, 'act_fn': Swish(), 'optimizer': 'Adamax', 'dropout_prob': 0.025, 'lr_mult': 10.0, 'patience': 128, 'batch_norm': False, 'initialization': 'kaiming_normal'}
Loading model with 8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN from runs/saved_models/8_2048_2048_Swish_Adamax_0.025_10.0_128_False_kaiming_normal_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=8, bias=True)
    (1): Swish()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=8, out_features=4, bias=True)
    (4): Swish()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=4, out_features=4, bias=True)
    (7): Swish()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=4, out_features=2, bias=True)
    (10): Swish()
    (11): Dropout(p=0.025, inplace=False)
   

<Figure size 1800x1800 with 1 Axes>