# Imports

In [1]:
import numpy as np

# Add these lines if you clone from https://github.com/fdalvi/NeuroX
# import sys
# sys.path.append("/path/to/NeuroX")
# In the case of the notebook, the path is the parent directory
import sys
sys.path.append("..")

# Introduction

This notebook uses the [CoNLL-2003](https://huggingface.co/datasets/conll2003) dataset for Named Entity Recognition to illustrate the central methods in NeuroX:
- Extracting activations from a transformer LM
- Loading activations and a labeled dataset
- Training a probe on all activations of the LM
- Training a probe on specific layers and neuron subsets
- Running a control task to assess the quality of the probe


The notebook was built by [David Arps](https://github.com/davidarps) and [Younes Samih](https://user.phil-fak.uni-duesseldorf.de/~samih/) on the basis of the [tutorial](https://github.com/fdalvi/NeuroX/blob/master/examples/End%20to%20End%20Example.ipynb) from [Fahim Dalvi](https://fdalvi.github.io).

# Data

In [2]:
from datasets import load_dataset

In [14]:
conll_dataset = load_dataset("conll2003")

Reusing dataset conll2003 (/home/david/.cache/huggingface/datasets/conll2003/conll2003/1.0.0/63f4ebd1bcb7148b1644497336fd74643d4ce70123334431a3c053b7ee4e96ee)


  0%|          | 0/3 [00:00<?, ?it/s]

Adapt the cutoffs below to your hardware. As a rule of thumb, DistilBERT representations in `float16` precision take 150-200 MB per 1,000 sentences.
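
The rule of thumb can be checked with a back-of-the-envelope calculation. The sketch below assumes 7 layers of 768 dimensions each (the shape reported by the extraction log in this notebook), 2 bytes per `float16` value, and the 29031 tokens that the first 2000 training sentences contain (the count reported by `create_tensors` later on). `estimate_activation_bytes` is a hypothetical helper, not part of NeuroX, and it ignores any HDF5 overhead.

```python
def estimate_activation_bytes(num_tokens, num_layers=7, hidden_size=768, bytes_per_value=2):
    """Rough size of word-level activations, ignoring file-format overhead."""
    return num_tokens * num_layers * hidden_size * bytes_per_value


# 29031 tokens in the first 2000 training sentences
total_mb = estimate_activation_bytes(29031) / 1e6
per_1k_sentences_mb = total_mb / 2
print(f"{total_mb:.0f} MB total, {per_1k_sentences_mb:.0f} MB per 1k sentences")
```

Under these assumptions the estimate comes out at roughly 156 MB per 1,000 sentences, which is in line with the rule of thumb.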

In [80]:
train_cutoff = 2000 # max: 14042
valid_cutoff = 200  # max: 3251

In [81]:
# write the training set to file
with open('train.tok', 'w') as f:
    for sent in conll_dataset['train']['tokens'][:train_cutoff]:
        f.write(' '.join(sent))
        f.write('\n')
with open('train.ner', 'w') as f:
    # for decoding ix->label
    feat_names = conll_dataset['train'].features['ner_tags'].feature.names
    for ix_seq in conll_dataset['train']['ner_tags'][:train_cutoff]:
        f.write(' '.join([feat_names[t] for t in ix_seq]))
        f.write('\n')

In [82]:
# write the validation set to file
with open('validation.tok', 'w') as f:
    for sent in conll_dataset['validation']['tokens'][:valid_cutoff]:
        f.write(' '.join(sent))
        f.write('\n')
with open('validation.ner', 'w') as f:
    # for decoding ix->label
    feat_names = conll_dataset['validation'].features['ner_tags'].feature.names
    for ix_seq in conll_dataset['validation']['ner_tags'][:valid_cutoff]:
        f.write(' '.join([feat_names[t] for t in ix_seq]))
        f.write('\n')
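
Since the probe relies on the token and label files being line-by-line parallel, it can be useful to verify this before extracting activations. The following is a minimal sketch; `check_parallel` is a hypothetical helper (not part of NeuroX), demonstrated here on two temporary toy files in the same format as the files written above.

```python
import os
import tempfile


def check_parallel(tok_path, label_path):
    """Return the number of sentences; raise if tokens and labels are misaligned."""
    with open(tok_path) as f_tok, open(label_path) as f_lab:
        tok_lines = [line.rstrip('\n') for line in f_tok]
        lab_lines = [line.rstrip('\n') for line in f_lab]
    if len(tok_lines) != len(lab_lines):
        raise ValueError(f"{len(tok_lines)} token lines vs. {len(lab_lines)} label lines")
    for i, (toks, labs) in enumerate(zip(tok_lines, lab_lines)):
        if len(toks.split(' ')) != len(labs.split(' ')):
            raise ValueError(f"line {i}: token and label counts differ")
    return len(tok_lines)


# Toy demonstration on two temporary files
with tempfile.TemporaryDirectory() as tmp:
    tok_path = os.path.join(tmp, 'toy.tok')
    ner_path = os.path.join(tmp, 'toy.ner')
    with open(tok_path, 'w') as f:
        f.write('EU rejects German call\n')
    with open(ner_path, 'w') as f:
        f.write('B-ORG O B-MISC O\n')
    num_sentences = check_parallel(tok_path, ner_path)
print(num_sentences)
```

In the notebook, the same check could be run as `check_parallel('train.tok', 'train.ner')`. Note that `data_loader.load_data`, used further down, performs its own sanity checks as well.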

# Extract Representations

In [26]:
import neurox.data.extraction.transformers_extractor as transformers_extractor

In [83]:
transformers_extractor.extract_representations('distilbert-base-uncased',
    'train.tok',
    'train_activations.hdf5',
    aggregation="average", #last, first
    output_type='hdf5',
    dtype='float16'
)

Loading model: distilbert-base-uncased


Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_projector.weight', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Reading input corpus
Preparing output file
Extracting representations from model
Sentence         : "EU rejects German call to boycott British lamb ."
Original    (009): ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']
Tokenized   (011): ['[CLS]', 'eu', 'rejects', 'german', 'call', 'to', 'boycott', 'british', 'lamb', '.', '[SEP]']
Filtered   (009): ['eu', 'rejects', 'german', 'call', 'to', 'boycott', 'british', 'lamb', '.']
Detokenized (009): ['eu', 'rejects', 'german', 'call', 'to', 'boycott', 'british', 'lamb', '.']
Counter: 9
Hidden states:  (7, 9, 768)
# Extracted words:  9


In [84]:
transformers_extractor.extract_representations('distilbert-base-uncased',
    'validation.tok',
    'validation_activations.hdf5',
    aggregation="average", #last, first
    output_type='hdf5',
    dtype='float16'
)

Loading model: distilbert-base-uncased


Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_projector.weight', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Reading input corpus
Preparing output file
Extracting representations from model
Sentence         : "CRICKET - LEICESTERSHIRE TAKE OVER AT TOP AFTER INNINGS VICTORY ."
Original    (011): ['CRICKET', '-', 'LEICESTERSHIRE', 'TAKE', 'OVER', 'AT', 'TOP', 'AFTER', 'INNINGS', 'VICTORY', '.']
Tokenized   (013): ['[CLS]', 'cricket', '-', 'leicestershire', 'take', 'over', 'at', 'top', 'after', 'innings', 'victory', '.', '[SEP]']
Filtered   (011): ['cricket', '-', 'leicestershire', 'take', 'over', 'at', 'top', 'after', 'innings', 'victory', '.']
Detokenized (011): ['cricket', '-', 'leicestershire', 'take', 'over', 'at', 'top', 'after', 'innings', 'victory', '.']
Counter: 11
Hidden states:  (7, 11, 768)
# Extracted words:  11
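
The `aggregation` argument controls how the vectors of a word that the tokenizer splits into several subwords are combined into one word-level vector (in the two logged sentences, every word happens to map to a single subword). A minimal pure-Python sketch of the three options follows; `aggregate_subwords` and the `word_ids` alignment are hypothetical and do not reproduce NeuroX's implementation.

```python
def aggregate_subwords(subword_vectors, word_ids, aggregation="average"):
    """Combine subword vectors into one vector per word.

    subword_vectors: list of equal-length lists of floats, one per subword
    word_ids: for each subword, the index of the word it belongs to
    """
    groups = {}
    for vec, wid in zip(subword_vectors, word_ids):
        groups.setdefault(wid, []).append(vec)
    word_vectors = []
    for wid in sorted(groups):
        vecs = groups[wid]
        if aggregation == "first":
            word_vectors.append(list(vecs[0]))
        elif aggregation == "last":
            word_vectors.append(list(vecs[-1]))
        elif aggregation == "average":
            word_vectors.append([sum(dim) / len(vecs) for dim in zip(*vecs)])
        else:
            raise ValueError(f"unknown aggregation: {aggregation}")
    return word_vectors


# Toy input: word 0 is a single subword, word 1 is split into two subwords
subword_vectors = [[1.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
word_ids = [0, 1, 1]
print(aggregate_subwords(subword_vectors, word_ids, "average"))
print(aggregate_subwords(subword_vectors, word_ids, "first"))
print(aggregate_subwords(subword_vectors, word_ids, "last"))
```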


# Prepare Data

The following cells load the activation file for the training and validation data.

In [33]:
import neurox.data.loader as data_loader
activations, num_layers = data_loader.load_activations('train_activations.hdf5', 768, dtype='float16')

Loading hdf5 activations from train_activations.hdf5...


In [46]:
valid_activations, num_layers = data_loader.load_activations('validation_activations.hdf5', 768, dtype='float16')

Loading hdf5 activations from validation_activations.hdf5...


You can explore the `activations` variable. What type is it? What do its size and the sizes of its components mean?

In [105]:
# print(activations)

In [39]:
activations[0].shape

(9, 5376)

In [40]:
activations[1].shape

(2, 5376)
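
The two shapes can be read as (number of words in the sentence, number of features), where the feature axis holds all layers side by side: 7 layers × 768 dimensions = 5376. The sketch below rebuilds this structure with plain nested lists as a toy stand-in; the real `activations` holds one array per sentence, and the assumption here is only that its second axis is the concatenation of all layers.

```python
NUM_LAYERS = 7      # as reported in the extraction log: Hidden states (7, ..., 768)
HIDDEN_SIZE = 768

# Toy stand-in for `activations`: one (num_words x num_features) table per sentence,
# mirroring the shapes (9, 5376) and (2, 5376) printed above
toy_activations = [
    [[0.0] * (NUM_LAYERS * HIDDEN_SIZE) for _ in range(9)],
    [[0.0] * (NUM_LAYERS * HIDDEN_SIZE) for _ in range(2)],
]

for sentence in toy_activations:
    num_words = len(sentence)
    num_features = len(sentence[0])
    print((num_words, num_features), num_features // HIDDEN_SIZE, "layers")
```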

Here, the text and NER tags are loaded.

In [41]:
# load_data also does sanity checks for parallelism between tokens, labels and activations
tokens = data_loader.load_data('train.tok',
                               'train.ner',
                               activations,
                               512 # max_sent_l
                              )

In [47]:
# load validation text and labels here
valid_tokens = data_loader.load_data('validation.tok',
                               'validation.ner',
                               valid_activations,
                               512 # max_sent_l
                              )

This creates tensors and mappings, which are used as input to the probes.

In [49]:
import neurox.interpretation.utils as utils
X, y, mapping = utils.create_tensors(tokens, activations, 'NN', dtype='float16')
label2idx, idx2label, src2idx, idx2src = mapping

Number of tokens:  29031
length of source dictionary:  7020
length of target dictionary:  9
29031
Total instances: 29031
['Scorer', 'Social', 'RACING', 'promised', 'evacuated', 'second-seeded', 'weapon', 'scores', 'weighed', 'Sales', '2-1', 'writers', 'draft', 'audience', 'landed', 'Northamptonshire', 'steps', '6-7(3-7', 'risk', 'championships']
Number of samples:  29031
Stats: Labels with their frequencies in the final set
O 23888
B-ORG 656
B-PER 1084
I-LOC 152
B-MISC 577
I-ORG 428
B-LOC 1255
I-PER 791
I-MISC 200


In [55]:
X_valid, y_valid, mapping = utils.create_tensors(valid_tokens, valid_activations, 'NN', mappings=mapping, dtype='float16')
X[0]  # one row per token: 7 layers * 768 neurons = 5376 features

Number of tokens:  2591
length of source dictionary:  7020
length of target dictionary:  9
2591
Total instances: 2591
['471', '199', 'Medvedev', 'silence', 'U.S.', 'CAPTAIN', '31st', 'Africa', 'Glenn', '56', 'feel', '6-4', 'scores', '426', 'Test', 'CHAMPIONSHIP', 'taking', '372', '2-1', 'Northamptonshire']
Number of samples:  2591
Stats: Labels with their frequencies in the final set
O 2022
B-ORG 119
B-PER 122
I-LOC 24
B-MISC 27
I-ORG 20
B-LOC 125
I-PER 111
I-MISC 21


array([-0.7524,  0.0982,  1.195 , ..., -0.523 ,  0.576 ,  0.1161],
      dtype=float16)
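
Conceptually, this step flattens the per-sentence data into one row per token and turns label strings into integer indices; passing `mappings=mapping` for the validation set reuses the training indices so that both sets agree on what each index means. The following is a simplified pure-Python sketch of that idea. `flatten_to_tensors` is a hypothetical helper and not the NeuroX implementation, which also builds the source vocabulary and returns arrays.

```python
def flatten_to_tensors(sentences_labels, sentence_activations, label2idx=None):
    """Stack per-sentence activations into token-level rows and label indices."""
    if label2idx is None:
        # Build the label vocabulary from the data (first-seen order)
        label2idx = {}
        for labels in sentences_labels:
            for label in labels:
                label2idx.setdefault(label, len(label2idx))
    rows, targets = [], []
    for labels, acts in zip(sentences_labels, sentence_activations):
        assert len(labels) == len(acts), "one activation row per token expected"
        for label, row in zip(labels, acts):
            rows.append(row)
            targets.append(label2idx[label])
    idx2label = {i: label for label, i in label2idx.items()}
    return rows, targets, label2idx, idx2label


# Toy data: two sentences with 2-dimensional "activations"
toy_labels = [['B-ORG', 'O'], ['B-PER', 'O', 'O']]
toy_acts = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]]
X_toy, y_toy, toy_label2idx, toy_idx2label = flatten_to_tensors(toy_labels, toy_acts)
print(len(X_toy), y_toy, toy_label2idx)

# Reusing the training mapping for another split keeps label indices consistent
_, y_other, _, _ = flatten_to_tensors([['O', 'B-PER']], [[[0.0, 0.0], [1.0, 1.0]]],
                                      label2idx=toy_label2idx)
print(y_other)
```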

# Train Probing Classifier

In [106]:
import neurox.interpretation.linear_probe as linear_probe

# check out the parameters of this method for default hyperparameters
probe = linear_probe.train_logistic_regression_probe(X, y, lambda_l1=0.001, lambda_l2=0.001)

Training classification probe
Creating model...
Number of training instances: 29031
Number of classes: 9


epoch [1/10]: 0it [00:00, ?it/s]

Epoch: [1/10], Loss: 0.0149


epoch [2/10]: 0it [00:00, ?it/s]

Epoch: [2/10], Loss: 0.0137


epoch [3/10]: 0it [00:00, ?it/s]

Epoch: [3/10], Loss: 0.0135


epoch [4/10]: 0it [00:00, ?it/s]

Epoch: [4/10], Loss: 0.0134


epoch [5/10]: 0it [00:00, ?it/s]

Epoch: [5/10], Loss: 0.0133


epoch [6/10]: 0it [00:00, ?it/s]

Epoch: [6/10], Loss: 0.0133


epoch [7/10]: 0it [00:00, ?it/s]

Epoch: [7/10], Loss: 0.0133


epoch [8/10]: 0it [00:00, ?it/s]

Epoch: [8/10], Loss: 0.0133


epoch [9/10]: 0it [00:00, ?it/s]

Epoch: [9/10], Loss: 0.0132


epoch [10/10]: 0it [00:00, ?it/s]

Epoch: [10/10], Loss: 0.0132
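
The `lambda_l1` and `lambda_l2` arguments weight an elastic-net style regularizer on the probe's weights. As a rough illustration of such an objective, the sketch below computes the mean cross-entropy of a linear softmax classifier plus an L1 and a squared-L2 penalty. This is a generic formulation written for this example: the exact penalty form and loss scaling inside NeuroX may differ, so its values are not comparable with the log above, and `elastic_net_loss` is a hypothetical name.

```python
import math


def elastic_net_loss(weights, inputs, targets, lambda_l1=0.001, lambda_l2=0.001):
    """Mean cross-entropy of a linear softmax classifier plus L1 and squared-L2 penalties.

    weights: one weight vector per class (bias omitted for brevity)
    inputs:  list of feature vectors
    targets: list of gold class indices
    """
    total_ce = 0.0
    for x, t in zip(inputs, targets):
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_norm = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        total_ce += log_norm - logits[t]
    mean_ce = total_ce / len(inputs)
    l1 = sum(abs(w_i) for w in weights for w_i in w)
    l2 = sum(w_i ** 2 for w in weights for w_i in w)
    return mean_ce + lambda_l1 * l1 + lambda_l2 * l2


# With all-zero weights the penalties vanish and the loss equals log(num_classes)
zero_weights = [[0.0, 0.0] for _ in range(9)]
loss_at_zero = elastic_net_loss(zero_weights, [[1.0, -1.0]], [3])
print(loss_at_zero, math.log(9))
```

Larger `lambda_l1` values push more weights towards zero, which is what makes the weight-based neuron ranking later in this notebook more selective.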


# Evaluate Model

In [54]:
print('Eval on training data:')
linear_probe.evaluate_probe(probe, X, y, idx_to_class=idx2label)

Eval on training data:


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.94


{'__OVERALL__': 0.9425786228514347,
 'O': 0.996483590087073,
 'B-ORG': 0.29115853658536583,
 'B-PER': 0.9575645756457565,
 'I-LOC': 0.25,
 'B-MISC': 0.6845753899480069,
 'I-ORG': 0.29439252336448596,
 'B-LOC': 0.7960159362549801,
 'I-PER': 0.8470290771175727,
 'I-MISC': 0.515}

In [101]:
print('Eval on validation data:')
scores = linear_probe.evaluate_probe(probe, X_valid, y_valid, idx_to_class=idx2label)
scores

Eval on validation data:


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.93


{'__OVERALL__': 0.9293708992666924,
 'O': 0.996538081107814,
 'B-ORG': 0.36134453781512604,
 'B-PER': 0.9836065573770492,
 'I-LOC': 0.5,
 'B-MISC': 0.5555555555555556,
 'I-ORG': 0.15,
 'B-LOC': 0.752,
 'I-PER': 0.8558558558558559,
 'I-MISC': 0.5238095238095238}
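
The overall accuracy is dominated by the `O` class, so it helps to compare it with a majority-class baseline. The short calculation below uses the validation label frequencies printed by `create_tensors` earlier in this notebook; nothing else is assumed.

```python
# Validation label frequencies, copied from the create_tensors output above
valid_label_counts = {
    'O': 2022, 'B-ORG': 119, 'B-PER': 122, 'I-LOC': 24, 'B-MISC': 27,
    'I-ORG': 20, 'B-LOC': 125, 'I-PER': 111, 'I-MISC': 21,
}

total = sum(valid_label_counts.values())
majority_label = max(valid_label_counts, key=valid_label_counts.get)
majority_baseline = valid_label_counts[majority_label] / total
print(f"Always predicting '{majority_label}' yields {majority_baseline:.3f} accuracy")
```

A probe that always predicts `O` already reaches about 0.78 on this validation subset, so the per-class scores are more informative than the overall score of 0.93.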

# Layerwise probing

In [57]:
# Column layout of X: [layer0_neuron0, layer0_neuron1, ..., layer1_neuron0, layer1_neuron1, ..., layer6_neuron767]
# [768, 769, 770, ..., 1535] <- indices for layer 1
import neurox.interpretation.ablation as ablation

The following cell shows how to train and evaluate a probe on individual layers: in this case, the embedding layer (layer 0) and two hidden layers (layers 2 and 4).

In [58]:
layers_to_evaluate = [0,2,4]
train_results = []
valid_results = []
for layer in layers_to_evaluate:
    layer_l_X = ablation.filter_activations_by_layers(X, [layer], 7)
    layer_l_X_valid = ablation.filter_activations_by_layers(X_valid, [layer], 7)
    probe_layer_l = linear_probe.train_logistic_regression_probe(layer_l_X, y, lambda_l1=0.001, lambda_l2=0.001)
    train_results.append(linear_probe.evaluate_probe(probe_layer_l, layer_l_X, y, idx_to_class=idx2label))
    valid_results.append(linear_probe.evaluate_probe(probe_layer_l, layer_l_X_valid, y_valid, idx_to_class=idx2label))

Training classification probe
Creating model...
Number of training instances: 29031
Number of classes: 9


epoch [1/10]: 0it [00:00, ?it/s]

Epoch: [1/10], Loss: 0.0200


epoch [2/10]: 0it [00:00, ?it/s]

Epoch: [2/10], Loss: 0.0152


epoch [3/10]: 0it [00:00, ?it/s]

Epoch: [3/10], Loss: 0.0146


epoch [4/10]: 0it [00:00, ?it/s]

Epoch: [4/10], Loss: 0.0144


epoch [5/10]: 0it [00:00, ?it/s]

Epoch: [5/10], Loss: 0.0142


epoch [6/10]: 0it [00:00, ?it/s]

Epoch: [6/10], Loss: 0.0141


epoch [7/10]: 0it [00:00, ?it/s]

Epoch: [7/10], Loss: 0.0141


epoch [8/10]: 0it [00:00, ?it/s]

Epoch: [8/10], Loss: 0.0140


epoch [9/10]: 0it [00:00, ?it/s]

Epoch: [9/10], Loss: 0.0140


epoch [10/10]: 0it [00:00, ?it/s]

Epoch: [10/10], Loss: 0.0139


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.90


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.88
Training classification probe
Creating model...
Number of training instances: 29031
Number of classes: 9


epoch [1/10]: 0it [00:00, ?it/s]

Epoch: [1/10], Loss: 0.0142


epoch [2/10]: 0it [00:00, ?it/s]

Epoch: [2/10], Loss: 0.0107


epoch [3/10]: 0it [00:00, ?it/s]

Epoch: [3/10], Loss: 0.0103


epoch [4/10]: 0it [00:00, ?it/s]

Epoch: [4/10], Loss: 0.0101


epoch [5/10]: 0it [00:00, ?it/s]

Epoch: [5/10], Loss: 0.0100


epoch [6/10]: 0it [00:00, ?it/s]

Epoch: [6/10], Loss: 0.0099


epoch [7/10]: 0it [00:00, ?it/s]

Epoch: [7/10], Loss: 0.0099


epoch [8/10]: 0it [00:00, ?it/s]

Epoch: [8/10], Loss: 0.0098


epoch [9/10]: 0it [00:00, ?it/s]

Epoch: [9/10], Loss: 0.0098


epoch [10/10]: 0it [00:00, ?it/s]

Epoch: [10/10], Loss: 0.0098


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.94


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.93
Training classification probe
Creating model...
Number of training instances: 29031
Number of classes: 9


epoch [1/10]: 0it [00:00, ?it/s]

Epoch: [1/10], Loss: 0.0124


epoch [2/10]: 0it [00:00, ?it/s]

Epoch: [2/10], Loss: 0.0093


epoch [3/10]: 0it [00:00, ?it/s]

Epoch: [3/10], Loss: 0.0089


epoch [4/10]: 0it [00:00, ?it/s]

Epoch: [4/10], Loss: 0.0087


epoch [5/10]: 0it [00:00, ?it/s]

Epoch: [5/10], Loss: 0.0086


epoch [6/10]: 0it [00:00, ?it/s]

Epoch: [6/10], Loss: 0.0086


epoch [7/10]: 0it [00:00, ?it/s]

Epoch: [7/10], Loss: 0.0086


epoch [8/10]: 0it [00:00, ?it/s]

Epoch: [8/10], Loss: 0.0085


epoch [9/10]: 0it [00:00, ?it/s]

Epoch: [9/10], Loss: 0.0085


epoch [10/10]: 0it [00:00, ?it/s]

Epoch: [10/10], Loss: 0.0085


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.96


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.95


In [66]:
print('Layer     : ', '      '.join([str(l) for l in layers_to_evaluate]))
print('Train acc.: ','   '.join([str(l['__OVERALL__'])[:4] for l in train_results]))
print('Valid acc.: ','   '.join([str(l['__OVERALL__'])[:4] for l in valid_results]))


Layer     :  0      2      4
Train acc.:  0.89   0.94   0.95
Valid acc.:  0.88   0.93   0.94
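
Selecting a layer amounts to keeping one contiguous block of 768 columns. The sketch below illustrates this with plain lists, assuming the layer-major column layout described in the index comment above; `layer_columns` and `keep_layers` are hypothetical helpers and not the code behind `ablation.filter_activations_by_layers`.

```python
def layer_columns(layer, hidden_size=768):
    """Column indices of one layer in a layer-major concatenation of activations."""
    start = layer * hidden_size
    return range(start, start + hidden_size)


def keep_layers(rows, layers, hidden_size=768):
    """Keep only the columns that belong to the given layers, in the given order."""
    columns = [c for layer in layers for c in layer_columns(layer, hidden_size)]
    return [[row[c] for c in columns] for row in rows]


# Toy example with hidden_size=2 and 3 layers:
# columns are [l0n0, l0n1, l1n0, l1n1, l2n0, l2n1]
toy_rows = [[0, 1, 10, 11, 20, 21]]
print(keep_layers(toy_rows, [1], hidden_size=2))
print(list(layer_columns(2))[:3])
```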


# Get Neuron Ranking

The following code snippets create a ranking of all neurons, i.e. of all dimensions of the concatenated representation. A probing classifier is then trained on only the neurons that are most salient according to the ranking algorithm.


In [69]:
ordering, cutoffs = linear_probe.get_neuron_ordering(probe, label2idx)

  0%|          | 0/101 [00:00<?, ?it/s]

In [107]:
ordering[:20]

[3489,
 3746,
 2590,
 584,
 3113,
 3438,
 3442,
 3357,
 4094,
 3681,
 3618,
 2435,
 3593,
 4330,
 3254,
 3326,
 4099,
 867,
 1544,
 3176]
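
The indices in the ranking refer to columns of the concatenated representation. Assuming the layer-major layout with 768 neurons per layer, each index can be decoded into a layer and a position within that layer, which shows where the top-ranked neurons are located.

```python
from collections import Counter

HIDDEN_SIZE = 768

# Top 20 neurons, copied from the ranking above
top_neurons = [3489, 3746, 2590, 584, 3113, 3438, 3442, 3357, 4094, 3681,
               3618, 2435, 3593, 4330, 3254, 3326, 4099, 867, 1544, 3176]

# divmod splits a global index into (layer, position within the layer)
layer_and_position = [divmod(n, HIDDEN_SIZE) for n in top_neurons]
neurons_per_layer = Counter(layer for layer, _ in layer_and_position)
print(layer_and_position[:3])
print(sorted(neurons_per_layer.items()))
```

Under this layout, 12 of the 20 top-ranked neurons fall into layer 4 and none into the last layer (layer 6).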

# Train on top N neurons

In [79]:
N=20

In [74]:
X_selected = ablation.filter_activations_keep_neurons(X, ordering[:N])
X_selected_valid = ablation.filter_activations_keep_neurons(X_valid, ordering[:N])

In [72]:
X_selected.shape

(29031, 20)

In [73]:
probe_selected = linear_probe.train_logistic_regression_probe(X_selected, y, lambda_l1=0.001, lambda_l2=0.001)

Training classification probe
Creating model...
Number of training instances: 29031
Number of classes: 9


epoch [1/10]: 0it [00:00, ?it/s]

Epoch: [1/10], Loss: 0.0436


epoch [2/10]: 0it [00:00, ?it/s]

Epoch: [2/10], Loss: 0.0258


epoch [3/10]: 0it [00:00, ?it/s]

Epoch: [3/10], Loss: 0.0215


epoch [4/10]: 0it [00:00, ?it/s]

Epoch: [4/10], Loss: 0.0195


epoch [5/10]: 0it [00:00, ?it/s]

Epoch: [5/10], Loss: 0.0185


epoch [6/10]: 0it [00:00, ?it/s]

Epoch: [6/10], Loss: 0.0180


epoch [7/10]: 0it [00:00, ?it/s]

Epoch: [7/10], Loss: 0.0176


epoch [8/10]: 0it [00:00, ?it/s]

Epoch: [8/10], Loss: 0.0174


epoch [9/10]: 0it [00:00, ?it/s]

Epoch: [9/10], Loss: 0.0173


epoch [10/10]: 0it [00:00, ?it/s]

Epoch: [10/10], Loss: 0.0172


In [76]:
print('Acc. on training data: ')
linear_probe.evaluate_probe(probe_selected, X_selected, y, idx_to_class=idx2label)

Acc. on training data: 


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.86


{'__OVERALL__': 0.8557404154180015,
 'O': 0.9938044206296048,
 'B-ORG': 0.051829268292682924,
 'B-PER': 0.25830258302583026,
 'I-LOC': 0.03289473684210526,
 'B-MISC': 0.21837088388214904,
 'I-ORG': 0.007009345794392523,
 'B-LOC': 0.38326693227091635,
 'I-PER': 0.21997471554993678,
 'I-MISC': 0.0}

In [78]:
print('Acc. on validation data: ')
linear_probe.evaluate_probe(probe_selected, X_selected_valid, y_valid, idx_to_class=idx2label)

Acc. on validation data: 


Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.82


{'__OVERALL__': 0.8189888074102664,
 'O': 0.9945598417408507,
 'B-ORG': 0.025210084033613446,
 'B-PER': 0.2540983606557377,
 'I-LOC': 0.0,
 'B-MISC': 0.07407407407407407,
 'I-ORG': 0.0,
 'B-LOC': 0.4,
 'I-PER': 0.22522522522522523,
 'I-MISC': 0.0}
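
Because `O` makes up most of the tokens, the overall accuracy of 0.82 hides how much the entity classes suffer when only 20 neurons are kept. One way to make this visible is the unweighted mean of the per-class scores. The sketch below uses the validation scores reported in this notebook, rounded to four decimals; `macro_average` is a hypothetical helper.

```python
def macro_average(per_class_scores):
    """Unweighted mean over classes, skipping the '__OVERALL__' entry."""
    values = [v for k, v in per_class_scores.items() if k != '__OVERALL__']
    return sum(values) / len(values)


# Validation scores copied (rounded) from the outputs above
full_probe = {'__OVERALL__': 0.9294, 'O': 0.9965, 'B-ORG': 0.3613, 'B-PER': 0.9836,
              'I-LOC': 0.5, 'B-MISC': 0.5556, 'I-ORG': 0.15, 'B-LOC': 0.752,
              'I-PER': 0.8559, 'I-MISC': 0.5238}
top20_probe = {'__OVERALL__': 0.8190, 'O': 0.9946, 'B-ORG': 0.0252, 'B-PER': 0.2541,
               'I-LOC': 0.0, 'B-MISC': 0.0741, 'I-ORG': 0.0, 'B-LOC': 0.4,
               'I-PER': 0.2252, 'I-MISC': 0.0}

print(f"macro avg., all neurons: {macro_average(full_probe):.2f}")
print(f"macro avg., top 20:      {macro_average(top20_probe):.2f}")
```

The macro average drops from about 0.63 to about 0.22, a much larger gap than the difference in overall accuracy suggests.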

# Control task

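This section implements the sequence labeling control task described by [Hewitt and Liang (2019)](https://aclanthology.org/D19-1275.pdf).
The control task is used to assess the selectivity of a probe. Selectivity is the difference between the probe's accuracy on the real task and the accuracy of a probe of the same kind on a control task, in which every word type is assigned a random label. Control task labels can only be learned by memorizing word types, so a low selectivity indicates that the accuracy on the real task may also stem from memorization rather than from information encoded in the representations.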
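
The core idea of a control task can be sketched in a few lines: every word type receives one random label, and all tokens of that type share it. The sketch draws labels uniformly, in the spirit of the `sample_from='uniform'` argument used in this section. `make_control_labels` is a hypothetical helper and not the NeuroX implementation, and it leaves out how words that only occur in the validation set are handled.

```python
import random


def make_control_labels(sentences, num_labels, seed=0):
    """Assign every word type one random label and reuse it for all of its tokens."""
    rng = random.Random(seed)
    type2label = {}
    labeled = []
    for sentence in sentences:
        labels = []
        for word in sentence:
            if word not in type2label:
                type2label[word] = rng.randrange(num_labels)  # uniform over labels
            labels.append(type2label[word])
        labeled.append(labels)
    return labeled, type2label


toy_sentences = [['the', 'cat', 'saw', 'the', 'dog'], ['the', 'dog', 'barked']]
control_labels, type2label = make_control_labels(toy_sentences, num_labels=9)
print(control_labels)
```

All occurrences of `the` and of `dog` receive the same label in both sentences, so the only way to predict these labels is to recognize the word type.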

In [91]:
import neurox.data.control_task as ct
import neurox.interpretation.utils as utils

In [95]:
# prepare control task labels based on the training data
[ct_tokens, ct_valid_tokens] = ct.create_sequence_labeling_dataset(tokens, dev_source=valid_tokens['source'], sample_from='uniform')

In [108]:
# example from the dataset: 
print([s+'/'+str(t) for s,t in zip(ct_tokens['source'][0], ct_tokens['target'][0])])

['EU/7', 'rejects/2', 'German/1', 'call/4', 'to/3', 'boycott/3', 'British/4', 'lamb/6', './6']


In [99]:
# control task tensors
X_ct, y_ct, mapping_ct = utils.create_tensors(ct_tokens, activations, 'NN')
label2idx_ct, idx2label_ct, src2idx_ct, idx2src_ct = mapping_ct
X_valid_ct, y_valid_ct, mapping_ct = utils.create_tensors(ct_valid_tokens, valid_activations, 'NN', mappings=mapping_ct)

Number of tokens:  29031
length of source dictionary:  7020
length of target dictionary:  9
29031
Total instances: 29031
['Scorer', 'Social', 'RACING', 'promised', 'evacuated', 'second-seeded', 'weapon', 'scores', 'weighed', 'Sales', '2-1', 'writers', 'draft', 'audience', 'landed', 'Northamptonshire', 'steps', '6-7(3-7', 'risk', 'championships']
Number of samples:  29031
Stats: Labels with their frequencies in the final set
0 2636
1 3500
2 2300
3 3954
4 2954
5 4461
6 3275
7 3518
8 2433
Number of tokens:  2591
length of source dictionary:  7020
length of target dictionary:  9
2591
Total instances: 2591
['471', '199', 'Medvedev', 'silence', 'U.S.', 'CAPTAIN', '31st', 'Africa', 'Glenn', '56', 'feel', '6-4', 'scores', '426', 'Test', 'CHAMPIONSHIP', 'taking', '372', '2-1', 'Northamptonshire']
Number of samples:  2591
Stats: Labels with their frequencies in the final set
0 250
1 235
2 207
3 395
4 265
5 338
6 311
7 386
8 204


In [100]:
ct_probe = linear_probe.train_logistic_regression_probe(X_ct, y_ct, lambda_l1=0.001, lambda_l2=0.001)

Training classification probe
Creating model...
Number of training instances: 29031
Number of classes: 9


epoch [1/10]: 0it [00:00, ?it/s]

Epoch: [1/10], Loss: 0.0565


epoch [2/10]: 0it [00:00, ?it/s]

Epoch: [2/10], Loss: 0.0539


epoch [3/10]: 0it [00:00, ?it/s]

Epoch: [3/10], Loss: 0.0531


epoch [4/10]: 0it [00:00, ?it/s]

Epoch: [4/10], Loss: 0.0528


epoch [5/10]: 0it [00:00, ?it/s]

Epoch: [5/10], Loss: 0.0526


epoch [6/10]: 0it [00:00, ?it/s]

Epoch: [6/10], Loss: 0.0525


epoch [7/10]: 0it [00:00, ?it/s]

Epoch: [7/10], Loss: 0.0524


epoch [8/10]: 0it [00:00, ?it/s]

Epoch: [8/10], Loss: 0.0524


epoch [9/10]: 0it [00:00, ?it/s]

Epoch: [9/10], Loss: 0.0523


epoch [10/10]: 0it [00:00, ?it/s]

Epoch: [10/10], Loss: 0.0523


In [110]:
ct_scores = linear_probe.evaluate_probe(ct_probe, X_valid_ct, y_valid_ct, idx_to_class=idx2label_ct)
selectivity = scores['__OVERALL__'] - ct_scores['__OVERALL__']
print('Selectivity (Diff. between true task and control task performance on validation data): ', selectivity)

Evaluating: 0it [00:00, ?it/s]

Score (accuracy) of the probe: 0.56
Selectivity (Diff. between true task and control task performance on validation data):  0.3708992666923967
