---
execute:
  cache: false
  eval: true
  echo: true
  warning: false
jupyter: python3
title: Explainable AI with SpotPython and PyTorch
---

In [1]:
#| echo: false
#| label: imports
import warnings
warnings.filterwarnings("ignore")

In [2]:
#| label: configure_spot
from spotpython.data.diabetes import Diabetes
from spotpython.hyperdict.light_hyper_dict import LightHyperDict
from spotpython.fun.hyperlight import HyperLight
from spotpython.utils.init import (fun_control_init, surrogate_control_init, design_control_init)
from spotpython.utils.eda import gen_design_table
from spotpython.spot import spot
from spotpython.utils.file import get_experiment_filename
from spotpython.hyperparameters.values import set_hyperparameter
from math import inf

PREFIX="602_5"

data_set = Diabetes()

fun_control = fun_control_init(
    save_experiment=True,
    PREFIX=PREFIX,
    fun_evals=inf,
    max_time=60,
    data_set = data_set,
    core_model_name="light.regression.NNLinearRegressor",
    hyperdict=LightHyperDict,
    _L_in=10,
    _L_out=1)

fun = HyperLight().fun


set_hyperparameter(fun_control, "optimizer", [ "Adadelta", "Adam", "Adamax"])
set_hyperparameter(fun_control, "l1", [3,7])
set_hyperparameter(fun_control, "epochs", [10,12])
set_hyperparameter(fun_control, "batch_size", [4,11])
set_hyperparameter(fun_control, "dropout_prob", [0.0, 0.025])
set_hyperparameter(fun_control, "patience", [2,9])

design_control = design_control_init(init_size=7)

spot_tuner = spot.Spot(fun=fun,fun_control=fun_control, design_control=design_control)

Seed set to 123


module_name: light
submodule_name: regression
model_name: NNLinearRegressor


## Running the Hyperparameter Tuning or Loading the Existing Model


In [3]:
#| label: run_experiment
from spotpython.utils.file import get_experiment_filename, load_experiment
import os
overwrite = False
filename = get_experiment_filename(PREFIX)
if os.path.exists(filename) and not overwrite:
    (spot_tuner, fun_control, design_control,
    surrogate_control, optimizer_control) = load_experiment(filename)
else:
    print("File does not exist or overwrite is True. Starting new experiment.")
    res = spot_tuner.run()
# only needed for spotpython version < 0.16.0
fun_control.update({"_L_cond": None})

Loaded experiment from spot_602_5_experiment.pickle


## Results from the Hyperparameter Tuning Experiment

* After the hyperparameter tuning is finished, the following objects are available:
    * the `spot_tuner` object and
    * the associated `fun_control` dictionary.


In [4]:
#| label: print_results
res = spot_tuner.print_results(print_screen=True)

min y: 2759.4755859375
l1: 7.0
epochs: 12.0
batch_size: 11.0
act_fn: 2.0
optimizer: 1.0
dropout_prob: 0.025
lr_mult: 3.2952069054058097
patience: 5.0
batch_norm: 0.0
initialization: 1.0


In [5]:
#| label: 602_plot_progress_xai
spot_tuner.plot_progress()

<Figure size 2700x1800 with 1 Axes>

### Getting the Best Model, i.e., the Tuned Architecture

* The method `get_tuned_architecture` [[DOC]](https://sequential-parameter-optimization.github.io/spotPython/reference/spotpython/hyperparameters/values/#spotpython.hyperparameters.values.get_tuned_architecture) returns the best model architecture found during the hyperparameter tuning.
* It returns the transformed values, i.e., `batch_size = 2^x` if the hyperparameter `batch_size` was transformed with the `transform_power_2_int` function.
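
The relation between the raw tuner values and the transformed values can be checked by hand. The following minimal sketch assumes that the transform maps `x` to `int(2**x)`. The helper `power_2_int` is a local stand-in written for illustration, not `spotpython`'s implementation, and the assumption that `l1`, `epochs`, `batch_size`, and `patience` all use this transform is inferred from the printed results.

```python
# Raw values as printed by `print_results` above.
raw = {"l1": 7, "epochs": 12, "batch_size": 11, "patience": 5}


def power_2_int(x):
    """Local stand-in (assumption) for a power-of-two transform: x -> 2**x."""
    return int(2 ** x)


# Apply the transform to every raw value.
transformed = {name: power_2_int(value) for name, value in raw.items()}
print(transformed)
```

Under this assumption, the transformed values agree with the `config` printed by the following cell (`l1 = 128`, `epochs = 4096`, `batch_size = 2048`, `patience = 32`).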


In [6]:
#| label: get_tuned_architecture
from spotpython.hyperparameters.values import get_tuned_architecture
import pprint
config = get_tuned_architecture(spot_tuner, fun_control)
pprint.pprint(config)

{'act_fn': ReLU(),
 'batch_norm': False,
 'batch_size': 2048,
 'dropout_prob': 0.025,
 'epochs': 4096,
 'initialization': 'kaiming_uniform',
 'l1': 128,
 'lr_mult': 3.2952069054058097,
 'optimizer': 'Adam',
 'patience': 32}


* Note: `get_tuned_architecture` has the option `force_minX`, which has no effect in this case.


In [7]:
#| label: get_tuned_architecture_force_minX
from spotpython.hyperparameters.values import get_tuned_architecture
config = get_tuned_architecture(spot_tuner, fun_control, force_minX=True)
pprint.pprint(config)

{'act_fn': ReLU(),
 'batch_norm': False,
 'batch_size': 2048,
 'dropout_prob': 0.025,
 'epochs': 4096,
 'initialization': 'kaiming_uniform',
 'l1': 128,
 'lr_mult': 3.2952069054058097,
 'optimizer': 'Adam',
 'patience': 32}


## Training the Tuned Architecture on the Test Data

* Since we are interested in the explainability of the model, we will train the tuned architecture on the test data.
* `spotpython`'s `test_model` function [[DOC]](https://sequential-parameter-optimization.github.io/spotPython/reference/spotpython/light/testmodel/) is used to train the model on the test data.
* Note: Until now, we have not used any information about the NN's weights and biases. Only the architecture, which is available as the `config`, is used.
* `spotpython` uses the TensorBoard logger to save the training process in the `./runs` directory. Therefore, we have to enable the TensorBoard logger in the `fun_control` dictionary. To get a clean start, we remove an existing `runs` folder.


In [8]:
#| label: test_model
from spotpython.light.testmodel import test_model
from spotpython.light.loadmodel import load_light_from_checkpoint
import os
import shutil
# If the directory "./runs" exists, delete it to get a clean start
if os.path.exists("./runs"):
    shutil.rmtree("./runs")
fun_control.update({"tensorboard_log": True})
test_model(config, fun_control)

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 19.4 K | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
19.4 K    Trainable params
0         Non-trainable params
19.4 K    Total params
0.078     Total estimated model params size (MB)
23        Modules in train mode
0         Modules in eval mode


Restoring states from the checkpoint path at /Users/bartz/workspace/Hyperparameter-Tuning-Cookbook/runs/saved_models/128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TEST/last.ckpt


Loaded model weights from the checkpoint at /Users/bartz/workspace/Hyperparameter-Tuning-Cookbook/runs/saved_models/128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TEST/last.ckpt


test_model result: {'val_loss': 3069.13818359375, 'hp_metric': 3069.13818359375}


(3069.13818359375, 3069.13818359375)

In [9]:
#| label: load_model_from_chkpt
model = load_light_from_checkpoint(config, fun_control)

config: {'l1': 128, 'epochs': 4096, 'batch_size': 2048, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.025, 'lr_mult': 3.2952069054058097, 'patience': 32, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TEST from runs/saved_models/128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TEST/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=128, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=128, out_features=64, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=64, out_features=64, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=64, out_features=32, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025, inplace=False)
    (12): Linear(in_features=32, out_features=32, bias=True)
    (13)

#### Details of the Training Process on the Test Data

* The `test_model` method initializes the model with the tuned architecture as follows:

```python
model = fun_control["core_model"](**config, _L_in=_L_in, _L_out=_L_out, _torchmetric=_torchmetric)
```

* Then, the Lightning Trainer is initialized with the `fun_control` dictionary and the model as follows:
    
    ```python
    trainer = L.Trainer(
        default_root_dir=os.path.join(fun_control["CHECKPOINT_PATH"], config_id),
        max_epochs=model.hparams.epochs,
        accelerator=fun_control["accelerator"],
        devices=fun_control["devices"],
        logger=TensorBoardLogger(
            save_dir=fun_control["TENSORBOARD_PATH"],
            version=config_id,
            default_hp_metric=True,
            log_graph=fun_control["log_graph"],
        ),
        callbacks=[
            EarlyStopping(monitor="val_loss", patience=config["patience"], mode="min", strict=False, verbose=False),
            ModelCheckpoint(
                dirpath=os.path.join(fun_control["CHECKPOINT_PATH"], config_id), save_last=True
            ),
        ],
        enable_progress_bar=enable_progress_bar,
    )
    trainer.fit(model=model, datamodule=dm)
    test_result = trainer.test(datamodule=dm, ckpt_path="last")
    ```

* As shown in the code above, the last checkpoint is saved.
* `spotpython`'s method `load_light_from_checkpoint` is used to load the last checkpoint and to get the model's weights and biases. It is called with the `config` and the `fun_control` dictionary, which together identify the checkpoint directory (named by the `config_id`).
* Now, the model is trained and the weights and biases are available.
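
The `EarlyStopping` callback in the Trainer code above monitors `val_loss` with `mode="min"` and a `patience` taken from the tuned `config`. The following minimal sketch illustrates the general idea of patience-based early stopping in plain Python. It is not Lightning's implementation: the function name `early_stopping_epoch`, the loss values, and the choice of a zero improvement threshold are assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has not improved for
    `patience` consecutive checks (illustrative rule, zero threshold).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember the new best value and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count this check against the patience budget.
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, one value per epoch.
losses = [5.0, 4.0, 4.5, 4.2, 4.1, 3.9]
print(early_stopping_epoch(losses, patience=2))
print(early_stopping_epoch(losses, patience=5))
```

With a small patience, the run stops before the later improvement at the last epoch is seen. With a larger patience, all epochs are used. This is why `patience` interacts with `epochs` during tuning.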

## Visualizing the Neural Network Architecture


In [10]:
#| label: viz_net_spotpython
from spotpython.plot.xai import viz_net
viz_net(model, device="mps")

![architecture](./model_architecture.png)

## XAI Methods

* `spotpython` provides methods to explain the model's predictions. The following neural network elements can be analyzed: weights, activations, and gradients.

### Weights

* Weights are the parameters of the neural network that are learned from the data during training. They connect neurons between layers and determine the strength and direction of the signal sent from one neuron to another. The network adjusts the weights during training to minimize the error between the predicted output and the actual output.
* Interpretation of the weights: A high weight value indicates a strong influence of the input neuron on the output. Positive weights suggest a positive correlation, whereas negative weights suggest an inverse relationship between neurons.

### Activations

* Activations are the outputs produced by neurons after applying an activation function to the weighted sum of inputs. The activation function (e.g., ReLU, sigmoid, tanh) adds non-linearity to the model, allowing it to learn more complex relationships.
* Interpretation of the activations: The value of activations indicates the intensity of the signal passed to the next layer. Certain activation patterns can highlight which features or parts of the data the network is focusing on.
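
To make the terms concrete, the following minimal sketch computes the activations of a single fully connected layer with ReLU in plain Python and averages them over a small batch. All numbers, the layer size, and the helper names `relu`, `dense_relu`, and `mean_over_batch` are assumptions chosen for illustration; the sketch does not reproduce how `spotpython` collects activations.

```python
def relu(z):
    """ReLU activation: pass positive values, clip negative values to zero."""
    return max(0.0, z)


def dense_relu(x, weights, biases):
    """Activations of one fully connected layer: relu(W x + b)."""
    activations = []
    for w_row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b  # weighted sum
        activations.append(relu(z))
    return activations


def mean_over_batch(batch_activations):
    """Average each neuron's activation over all samples in the batch."""
    n = len(batch_activations)
    return [sum(col) / n for col in zip(*batch_activations)]


# Hypothetical layer with 2 inputs and 3 neurons.
weights = [[0.5, 0.25], [-1.0, 1.0], [2.0, 0.5]]
biases = [0.0, 0.5, -0.5]
batch = [[1.0, -2.0], [0.0, 2.0]]

batch_acts = [dense_relu(x, weights, biases) for x in batch]
mean_acts = mean_over_batch(batch_acts)
print(batch_acts)
print(mean_acts)
```

For the first sample, two of the three neurons are inactive (their weighted sum is not positive), whereas all neurons fire for the second sample. Averaging over the batch yields one value per neuron, which is the kind of quantity shown in a mean-activation heatmap.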

### Gradients

* Gradients are the partial derivatives of the loss function with respect to the parameters (weights and biases) of the network. During backpropagation, gradients are computed and then used by methods like gradient descent to update the parameters in the direction that reduces the loss.
* Interpretation of the gradients: The magnitude of the gradient indicates how much a parameter should change to reduce the error. A large gradient implies a steeper slope and a bigger update, while a small gradient suggests that the parameter is near an optimal point. If gradients are too small (vanishing gradient problem), the network may learn slowly or stop learning. If they are too large (exploding gradient problem), the updates may be unstable.
* `spotpython` provides the method `get_gradients` to get the gradients of the model.
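
The role of the gradient can be illustrated with a single linear neuron and a squared-error loss. The following minimal sketch, in plain Python, compares the analytic gradient with a finite-difference approximation and performs one gradient descent step. The model, the data, the learning rate, and all function names are assumptions chosen for illustration; `get_gradients` itself is not used here.

```python
def predict(w, b, x):
    """Linear neuron: weighted sum of the inputs plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss(w, b, x, target):
    """Squared error for a single sample."""
    return (predict(w, b, x) - target) ** 2


def analytic_grad_w(w, b, x, target):
    """dL/dw_i = 2 * (prediction - target) * x_i."""
    err = predict(w, b, x) - target
    return [2.0 * err * xi for xi in x]


def numeric_grad_w(w, b, x, target, eps=1e-6):
    """Central finite differences as an independent check."""
    grads = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        diff = loss(w_plus, b, x, target) - loss(w_minus, b, x, target)
        grads.append(diff / (2.0 * eps))
    return grads


x, target = [1.0, 2.0], 1.0
w, b = [0.5, -0.25], 0.0

grad = analytic_grad_w(w, b, x, target)
grad_check = numeric_grad_w(w, b, x, target)

# One gradient descent step with a hypothetical learning rate.
lr = 0.1
w_new = [wi - lr * gi for wi, gi in zip(w, grad)]

print(grad, grad_check)
print(loss(w, b, x, target), loss(w_new, b, x, target))
```

The weight attached to the larger input receives the larger gradient, and moving against the gradient lowers the loss.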


In [11]:
#| label: import_xai
from spotpython.plot.xai import (get_activations, get_gradients, get_weights, visualize_weights, visualize_gradients, visualize_mean_activations, visualize_gradient_distributions, visualize_weights_distributions, visualize_activations_distributions)
batch_size = config["batch_size"]

### Getting the Weights


In [12]:
#| label: get_weights
from spotpython.plot.xai import sort_layers
weights, _ = get_weights(model)
# sort_layers(weights)

In [13]:
#| label: visualize_weights
visualize_weights(model, absolute=True, cmap="GreenYellowRed", figsize=(6, 6))

1280 values in Layer Layer 0. Geometry: (128, 10)


<Figure size 1800x1800 with 2 Axes>

8192 values in Layer Layer 3. Geometry: (64, 128)


<Figure size 1800x1800 with 2 Axes>

4096 values in Layer Layer 6. Geometry: (64, 64)


<Figure size 1800x1800 with 2 Axes>

2048 values in Layer Layer 9. Geometry: (32, 64)


<Figure size 1800x1800 with 2 Axes>

1024 values in Layer Layer 12. Geometry: (32, 32)


<Figure size 1800x1800 with 2 Axes>

1024 values in Layer Layer 15. Geometry: (32, 32)


<Figure size 1800x1800 with 2 Axes>

512 values in Layer Layer 18. Geometry: (16, 32)


<Figure size 1800x1800 with 2 Axes>

256 values in Layer Layer 21. Geometry: (16, 16)


<Figure size 1800x1800 with 2 Axes>

256 values in Layer Layer 24. Geometry: (16, 16)


<Figure size 1800x1800 with 2 Axes>

256 values in Layer Layer 27. Geometry: (16, 16)


<Figure size 1800x1800 with 2 Axes>

16 values in Layer Layer 30. Geometry: (1, 16)


<Figure size 1800x1800 with 2 Axes>

In [14]:
#| label: visualize_weights_distributions
visualize_weights_distributions(model, color=f"C{0}", columns=4)

n:11


<Figure size 3240x2250 with 12 Axes>

### Getting the Activations


In [15]:
#| label: get_activations
from spotpython.plot.xai import get_activations
activations, mean_activations, layer_sizes = get_activations(net=model, fun_control=fun_control, batch_size=batch_size, device="mps")

stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160


In [16]:
#| label: visualize_mean_activations
visualize_mean_activations(mean_activations, layer_sizes=layer_sizes, absolute=True, cmap="GreenYellowRed", figsize=(6, 6))

128 values in Layer 0. Geometry: (1, 128)


<Figure size 1800x1800 with 2 Axes>

64 values in Layer 3. Geometry: (1, 64)


<Figure size 1800x1800 with 2 Axes>

64 values in Layer 6. Geometry: (1, 64)


<Figure size 1800x1800 with 2 Axes>

32 values in Layer 9. Geometry: (1, 32)


<Figure size 1800x1800 with 2 Axes>

32 values in Layer 12. Geometry: (1, 32)


<Figure size 1800x1800 with 2 Axes>

32 values in Layer 15. Geometry: (1, 32)


<Figure size 1800x1800 with 2 Axes>

16 values in Layer 18. Geometry: (1, 16)


<Figure size 1800x1800 with 2 Axes>

16 values in Layer 21. Geometry: (1, 16)


<Figure size 1800x1800 with 2 Axes>

16 values in Layer 24. Geometry: (1, 16)


<Figure size 1800x1800 with 2 Axes>

16 values in Layer 27. Geometry: (1, 16)


<Figure size 1800x1800 with 2 Axes>

In [17]:
#| label: visualize_activations_distributions
visualize_activations_distributions(activations=activations,
                                    net=model, color="C0", columns=4)

<Figure size 3240x2250 with 12 Axes>

### Getting the Gradients


In [18]:
#| label: get_gradients
gradients, _ = get_gradients(net=model, fun_control=fun_control, batch_size=batch_size, device = "mps")

stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160


In [19]:
#| label: visualize_gradients
visualize_gradients(model, fun_control, batch_size, absolute=True, cmap="GreenYellowRed", figsize=(6, 6), device="mps")

stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160
1280 values in Layer layers.0.weight. Geometry: (128, 10)


<Figure size 1800x1800 with 2 Axes>

8192 values in Layer layers.3.weight. Geometry: (64, 128)


<Figure size 1800x1800 with 2 Axes>

4096 values in Layer layers.6.weight. Geometry: (64, 64)


<Figure size 1800x1800 with 2 Axes>

2048 values in Layer layers.9.weight. Geometry: (32, 64)


<Figure size 1800x1800 with 2 Axes>

1024 values in Layer layers.12.weight. Geometry: (32, 32)


<Figure size 1800x1800 with 2 Axes>

1024 values in Layer layers.15.weight. Geometry: (32, 32)


<Figure size 1800x1800 with 2 Axes>

512 values in Layer layers.18.weight. Geometry: (16, 32)


<Figure size 1800x1800 with 2 Axes>

256 values in Layer layers.21.weight. Geometry: (16, 16)


<Figure size 1800x1800 with 2 Axes>

256 values in Layer layers.24.weight. Geometry: (16, 16)


<Figure size 1800x1800 with 2 Axes>

256 values in Layer layers.27.weight. Geometry: (16, 16)


<Figure size 1800x1800 with 2 Axes>

16 values in Layer layers.30.weight. Geometry: (1, 16)


<Figure size 1800x1800 with 2 Axes>

In [20]:
#| label: visualize_gradient_distributions
visualize_gradient_distributions(model, fun_control, batch_size=batch_size, color=f"C{0}", device="mps", columns=3)

stage: fit
full_sizefull_train_size: 0.6
full_sizeval_size: 0.24
full_sizetrain_size: 0.36
full_sizetest_size: 0.4
train_size: 0.36, val_size: 0.24 used for train & val data.
LightDataModule.train_dataloader(). data_train size: 160


n:11


<Figure size 2430x3000 with 12 Axes>

## Feature Attributions

### Integrated Gradients


In [21]:
#| label: get_attributions_xai
from spotpython.plot.xai import get_attributions, plot_attributions
df_att = get_attributions(spot_tuner, fun_control, attr_method="IntegratedGradients", n_rel=10)
plot_attributions(df_att, attr_method="IntegratedGradients")

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 19.4 K | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
19.4 K    Trainable params
0         Non-trainable params
19.4 K    Total params
0.078     Total estimated model params size (MB)
23        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 2884.680419921875, 'hp_metric': 2884.680419921875}
config: {'l1': 128, 'epochs': 4096, 'batch_size': 2048, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.025, 'lr_mult': 3.2952069054058097, 'patience': 32, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN from runs/saved_models/128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=128, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=128, out_features=64, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=64, out_features=64, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=64, out_features=32, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025

<Figure size 3000x1800 with 1 Axes>
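
Integrated Gradients attributes a prediction to the input features by averaging the model's gradients along a straight path from a baseline input to the actual input and scaling the result by the difference between input and baseline. The following minimal sketch approximates this path integral for a toy function in plain Python. The function `f`, its analytic gradient, the zero baseline, and the number of steps are assumptions chosen for illustration; this is not the code behind `get_attributions`.

```python
def f(x):
    """Toy model (assumption): f(x) = x0 * x1 + x2**2."""
    return x[0] * x[1] + x[2] ** 2


def grad_f(x):
    """Analytic gradient of the toy model."""
    return [x[1], x[0], 2.0 * x[2]]


def integrated_gradients(x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Scale the averaged gradient by the distance from the baseline.
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

print(attributions)
print(sum(attributions), f(x) - f(baseline))
```

The attributions sum to the difference between the model output at the input and at the baseline. This completeness property is a useful sanity check when interpreting attribution plots.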

### DeepLift


In [22]:
#| label: get_attributions_deep_lift
df_lift = get_attributions(spot_tuner, fun_control, attr_method="DeepLift",n_rel=10)
print(df_lift)
plot_attributions(df_lift,  attr_method="DeepLift")

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 19.4 K | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
19.4 K    Trainable params
0         Non-trainable params
19.4 K    Total params
0.078     Total estimated model params size (MB)
23        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 2942.842529296875, 'hp_metric': 2942.842529296875}
config: {'l1': 128, 'epochs': 4096, 'batch_size': 2048, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.025, 'lr_mult': 3.2952069054058097, 'patience': 32, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN from runs/saved_models/128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=128, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=128, out_features=64, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=64, out_features=64, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=64, out_features=32, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025

<Figure size 3000x1800 with 1 Axes>

### Feature Ablation


In [23]:
#| label: get_attributions_feature_ablation
df_fl = get_attributions(spot_tuner, fun_control, attr_method="FeatureAblation",n_rel=10)

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 19.4 K | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
19.4 K    Trainable params
0         Non-trainable params
19.4 K    Total params
0.078     Total estimated model params size (MB)
23        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 2884.307373046875, 'hp_metric': 2884.307373046875}
config: {'l1': 128, 'epochs': 4096, 'batch_size': 2048, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.025, 'lr_mult': 3.2952069054058097, 'patience': 32, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN from runs/saved_models/128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=128, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=128, out_features=64, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=64, out_features=64, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=64, out_features=32, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025

In [24]:
#| label: plot_attributions_feature_ablation
print(df_fl)
plot_attributions(df_fl, attr_method="FeatureAblation")

   Feature Index Feature  FeatureAblationAttribution
0              2     bmi                  108.357697
1              8  s5_ltg                   90.321625
2              3      bp                   85.288071
3              6  s3_hdl                   71.089119
4              1     sex                   60.189651
5              0     age                   59.186375
6              5  s2_ldl                   58.824371
7              9  s6_glu                   57.201794
8              7  s4_tch                   37.141697
9              4   s1_tc                   28.066246


<Figure size 3000x1800 with 1 Axes>
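
Feature ablation is a perturbation-based approach: each feature is replaced by a baseline value in turn, and the resulting change in the model output is taken as that feature's attribution. The following minimal sketch shows the idea in plain Python. The toy model `f`, the input, the zero baseline, and the helper `feature_ablation` are assumptions chosen for illustration and are unrelated to the diabetes model above.

```python
def f(x):
    """Toy model (assumption): f(x) = 3*x0 - 2*x1 + x0*x2."""
    return 3.0 * x[0] - 2.0 * x[1] + x[0] * x[2]


def feature_ablation(model, x, baseline):
    """Attribution of feature i = model(x) - model(x with feature i ablated)."""
    reference = model(x)
    attributions = []
    for i in range(len(x)):
        ablated = list(x)
        ablated[i] = baseline[i]  # replace one feature by its baseline value
        attributions.append(reference - model(ablated))
    return attributions


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attr = feature_ablation(f, x, baseline)

# Rank features by the magnitude of their attribution, largest first.
ranking = sorted(range(len(attr)), key=lambda i: abs(attr[i]), reverse=True)
print(attr)
print(ranking)
```

Because `x0` also enters through the interaction term `x0*x2`, ablating it changes the output more than its linear coefficient alone would suggest. A ranking by magnitude corresponds to the ordering used in an attribution table such as the one above.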

## Conductance


In [25]:
#| label: get_conductance
from spotpython.plot.xai import plot_conductance_last_layer, get_weights_conductance_last_layer
weights_last, layer_conductance_last = get_weights_conductance_last_layer(spot_tuner, fun_control)
plot_conductance_last_layer(weights_last, layer_conductance_last, figsize=(6, 6))

GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 19.4 K | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
19.4 K    Trainable params
0         Non-trainable params
19.4 K    Total params
0.078     Total estimated model params size (MB)
23        Modules in train mode
0         Modules in eval mode


GPU available: True (mps), used: True


TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs



  | Name   | Type       | Params | Mode  | In sizes   | Out sizes
-----------------------------------------------------------------------
0 | layers | Sequential | 19.4 K | train | [2048, 10] | [2048, 1]
-----------------------------------------------------------------------
19.4 K    Trainable params
0         Non-trainable params
19.4 K    Total params
0.078     Total estimated model params size (MB)
23        Modules in train mode
0         Modules in eval mode


train_model result: {'val_loss': 2810.6845703125, 'hp_metric': 2810.6845703125}
config: {'l1': 128, 'epochs': 4096, 'batch_size': 2048, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.025, 'lr_mult': 3.2952069054058097, 'patience': 32, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN from runs/saved_models/128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=128, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=128, out_features=64, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=64, out_features=64, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=64, out_features=32, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025, in

train_model result: {'val_loss': 2993.82763671875, 'hp_metric': 2993.82763671875}
config: {'l1': 128, 'epochs': 4096, 'batch_size': 2048, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.025, 'lr_mult': 3.2952069054058097, 'patience': 32, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN from runs/saved_models/128_4096_2048_ReLU_Adam_0.025_3.2952_32_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=128, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.025, inplace=False)
    (3): Linear(in_features=128, out_features=64, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.025, inplace=False)
    (6): Linear(in_features=64, out_features=64, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.025, inplace=False)
    (9): Linear(in_features=64, out_features=32, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.025, 

<Figure size 1800x1800 with 1 Axes>