# End-to-end notebook!

Here is the workflow:
* Sampling configs (sampling parameters, etc.) lead to...
* Weaving configs (blank model settings, donor model settings, layer assignments) lead to...
* Models (probably TFRobertaForSequenceClassification in all cases) lead to...
* Performance scores (numbers from 0-100)

In [5]:
# install dependencies

! pip install -q joblib  # joblib for memoizing functions
! pip install -q ipywidgets widgetsnbextension pandas-profiling # IProgress for progress bars

# ! pip install -q tensorflow==2.13.0 tensorflow-datasets==4.9.2 tensorflow-probability==0.21.0 transformers==4.35.0  datasets==2.14.6 torch==2.1.0 scipy==1.10.1 scikit-learn==1.3.2


In [2]:
# Add model_merging to the python path

import os
import sys

model_merging_base = os.path.abspath("../model_merging/")
# Make sure the directory exists before adding it to the path
assert os.path.exists(model_merging_base)
if model_merging_base not in sys.path:
    sys.path.append(model_merging_base)

In [3]:
# import joblib for caching and distributed computing
from math import sqrt

from joblib import Memory, Parallel, delayed

memory = Memory(location="cache", verbose=10)

parallel = Parallel(n_jobs=2, return_as="generator")
output_generator = parallel(delayed(sqrt)(i**2) for i in range(10))

In [4]:
# Imports and cached functions

import os

from llm_weaver import (
    calculate_score_from_weaving_config,
    test_weaver,
)

# Disable parallelism in tokenizers to avoid deadlocks
os.environ["TOKENIZERS_PARALLELISM"] = "false"

calculate_score_from_weaving_config_cached = memory.cache(
    calculate_score_from_weaving_config
)
test_weaver_cached = memory.cache(test_weaver)
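The idea behind wrapping the scoring function in a disk cache can be illustrated with a small standard-library sketch. This is not how joblib implements `Memory.cache` (joblib uses its own argument hashing and storage layout); `disk_cache` and `toy_score` are hypothetical names used only for illustration. The point is that a call with identical arguments is served from disk, while any change to an argument triggers a fresh computation.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def disk_cache(cache_dir):
    """Decorator factory: memoize a function's results as pickle files on disk."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        def wrapper(*args, **kwargs):
            # Serialize the arguments deterministically so equal inputs share a key.
            key_source = json.dumps([func.__name__, args, kwargs], sort_keys=True)
            key = hashlib.md5(key_source.encode("utf-8")).hexdigest()
            path = cache_dir / f"{key}.pkl"
            if path.exists():
                return pickle.loads(path.read_bytes())  # cache hit
            result = func(*args, **kwargs)  # cache miss: compute and store
            path.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator


calls = []  # records every real (non-cached) execution

with tempfile.TemporaryDirectory() as tmp:

    @disk_cache(tmp)
    def toy_score(config, n_examples=128):
        # Stand-in for an expensive evaluation; the formula is arbitrary.
        calls.append(n_examples)
        return {"accuracy": len(config["layer_assignments"]) / n_examples}

    config = {"layer_assignments": [0, 1, 2, 3]}
    first = toy_score(config, n_examples=128)
    second = toy_score(config, n_examples=128)  # same arguments: read from disk
    third = toy_score(config, n_examples=129)  # changed argument: recomputed
```

In this sketch the function body runs only twice for three calls, which is the behaviour the notebook relies on when re-running cells with unchanged configs.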

## Step 0: Get RTE scores

* RTE vanilla
* RTE isotropically merged with MNLI, using a suitably chosen weight
* RTE Fisher-merged with MNLI, using a suitably chosen weight
* Replacing certain layers?
* Shifting?
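The difference between the two merge styles listed above can be sketched on toy parameter vectors. This is a minimal illustration under assumptions: the internals of `llm_weaver` are not shown in this notebook, so `isotropic_merge` and `fisher_merge` are hypothetical helpers, and the Fisher variant follows the commonly used normalized form in which each parameter is weighted by its Fisher value.

```python
def isotropic_merge(theta_a, theta_b, alpha):
    """Weighted average with a single scalar weight per model."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(theta_a, theta_b)]


def fisher_merge(theta_a, theta_b, fisher_a, fisher_b, alpha, eps=1e-12):
    """Element-wise average where each parameter is also weighted by its Fisher value."""
    merged = []
    for a, b, fa, fb in zip(theta_a, theta_b, fisher_a, fisher_b):
        wa = alpha * fa
        wb = (1.0 - alpha) * fb
        # Normalize so the two effective weights sum to one per parameter.
        merged.append((wa * a + wb * b) / (wa + wb + eps))
    return merged


# Toy values, not taken from any real model.
theta_rte = [1.0, 2.0, 3.0]
theta_mnli = [3.0, 2.0, 1.0]
fisher_rte = [4.0, 1.0, 1.0]
fisher_mnli = [1.0, 1.0, 4.0]

iso = isotropic_merge(theta_rte, theta_mnli, alpha=0.5)
fis = fisher_merge(theta_rte, theta_mnli, fisher_rte, fisher_mnli, alpha=0.5)
```

With equal scalar weights the isotropic result is the plain midpoint, while the Fisher-weighted result is pulled toward whichever model has the larger Fisher value for that parameter.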

## Steps: configs to graph


In [5]:
model_id = "textAttack/roberta-base-RTE"

In [9]:
import pandas as pd
from llm_weaver import dict_overwrite, get_model_config, normalize_glue_task_name

model_id = "textAttack/roberta-base-RTE"

mnli_model_id = "textAttack/roberta-base-MNLI"


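def RTEVanilla(model_id):
    """Yield a single weaving config that copies every hidden layer from `model_id`.

    The blank model config, GLUE task, and classification head come from the MNLI model.
    """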
    num_layers = get_model_config(model_id)["num_hidden_layers"]
    layer_assignments = [
        {
            "type": "SingleLayer",
            "params": {
                "donor": model_id,
                "hidden_layer_number": i,
            },
        }
        for i in range(num_layers)
    ]

    blank_model_config = dict_overwrite(
        get_model_config(mnli_model_id),
        {
            "num_hidden_layers": len(layer_assignments),
        },
    )
    config = {
        "glue_task": normalize_glue_task_name(mnli_model_id),
        "tokenizer_model_id": model_id,
        "blank_model_config": blank_model_config,
        "layer_assignments": layer_assignments,
        "classification_head": {
            "type": "SingleClassificationHead",
            "params": {
                "donor": mnli_model_id,
            },
        },
        "embeddings": {
            "type": "SingleEmbeddings",
            "params": {
                "donor": model_id,
            },
        },
    }

    yield config


weave_configs = list(RTEVanilla(model_id))

scores = Parallel(n_jobs=5, return_as="list")(
    delayed(calculate_score_from_weaving_config_cached)(
        weave_config,
        # n_examples=4096,
        n_examples=128,
        split="validation",
    )
    for weave_config in weave_configs
)
accuracies = [score["accuracy"] for score in scores]

records = []
for weave_config, accuracy in zip(weave_configs, accuracies):
    record = {}
    record["name"] = "RTEVanilla"
    record["accuracy"] = accuracy
    records.append(record)
df_rte_vanilla = pd.DataFrame.from_records(records)
df_rte_vanilla

All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validation')
calculating score for weaving config md5sum: 58dcf2497ce3acdd7774b984a956f0


Loading textAttack/roberta-base-RTE


  return hfds.load_metric("glue", task)
2023-11-28 15:36:58.272709: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.


_____________________________calculate_score_from_weaving_config - 11.5s, 0.2min


| | name | accuracy |
|---|---|---|
| 0 | RTEVanilla | 0.28125 |
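Because the score above comes from only `n_examples=128` validation examples, it helps to see how coarse the accuracy figure is. The following sketch is plain arithmetic on the reported value; the standard-error formula assumes the examples are independent draws, which is an assumption made here and not something the notebook states.

```python
import math

n_examples = 128
accuracy = 0.28125  # RTEVanilla value from the table above

# Number of correctly classified examples implied by the accuracy.
correct = accuracy * n_examples
# Smallest possible step between two accuracy values at this sample size.
resolution = 1.0 / n_examples
# Binomial standard error of an accuracy estimated from n independent examples.
std_error = math.sqrt(accuracy * (1.0 - accuracy) / n_examples)
```

At this sample size one example moves the accuracy by a little under one percentage point, and the standard error is about four percentage points, so small differences between configurations should be read with caution.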


In [10]:
import pandas as pd
from llm_weaver import dict_overwrite, get_model_config, normalize_glue_task_name

model_id = "textAttack/roberta-base-RTE"


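def RTEMNLIIsotropic(model_id):
    """Yield one weaving config per alpha, blending every hidden layer.

    Each layer puts weight `alpha` on `model_id` and `1 - alpha` on the MNLI model.
    """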
    for alpha in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0]:
        num_layers = get_model_config(model_id)["num_hidden_layers"]
        layer_assignments = [
            {
                "type": "IsotropicLinearCombination",
                "params": {
                    "donors": [
                        {"donor": model_id, "hidden_layer_number": i, "weight": alpha},
                        {
                            "donor": "textAttack/roberta-base-MNLI",
                            "hidden_layer_number": i,
                            "weight": 1.0 - alpha,
                        },
                    ]
                },
            }
            for i in range(num_layers)
        ]

        blank_model_config = dict_overwrite(
            get_model_config(mnli_model_id),
            {
                "num_hidden_layers": len(layer_assignments),
            },
        )
        config = {
            "glue_task": normalize_glue_task_name(mnli_model_id),
            "tokenizer_model_id": model_id,
            "blank_model_config": blank_model_config,
            "layer_assignments": layer_assignments,
            "classification_head": {
                "type": "SingleClassificationHead",
                "params": {
                    "donor": mnli_model_id,
                },
            },
            "embeddings": {
                "type": "SingleEmbeddings",
                "params": {
                    "donor": model_id,
                },
            },
        }

        yield config


weave_configs = list(RTEMNLIIsotropic(model_id))

scores = Parallel(n_jobs=5, return_as="list")(
    delayed(calculate_score_from_weaving_config_cached)(
        weave_config,
        # n_examples=4096,
        n_examples=128,
        split="validation",
    )
    for weave_config in weave_configs
)
accuracies = [score["accuracy"] for score in scores]

records = []
for weave_config, accuracy in zip(weave_configs, accuracies):
    record = {}
    record["name"] = "RTEMNLIIsotropic"
    record["accuracy"] = accuracy
    records.append(record)
df_rte_vanilla = pd.DataFrame.from_records(records)
df_rte_vanilla

________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validation')
calculating score for weaving config md5sum: 7ce5474470e0b49df9a00a9c3b6cb3



________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validation')
calculating score for weaving config md5sum: 5ab9b2c051949c9807ce9a856c41d9


Loading textAttack/roberta-base-MNLI
Loading textAttack/roberta-base-MNLI
Loading textAttack/roberta-base-MNLI
Loading textAttack/roberta-base-RTE



_____________________________calculate_score_from_weaving_config - 14.5s, 0.2min
________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validat


_____________________________calculate_score_from_weaving_config - 30.4s, 0.5min
________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validat



_____________________________calculate_score_from_weaving_config - 26.5s, 0.4min



_____________________________calculate_score_from_weaving_config - 20.8s, 0.3min
_____________________________calculate_score_from_weaving_config - 22.7s, 0.4min
_____________________________calculate_score_from_weaving_config - 22.3s, 0.4min
_____________________________calculate_score_from_weaving_config - 22.5s, 0.4min


| | name | accuracy |
|---|---|---|
| 0 | RTEMNLIIsotropic | 0.273438 |
| 1 | RTEMNLIIsotropic | 0.257812 |
| 2 | RTEMNLIIsotropic | 0.25 |
| 3 | RTEMNLIIsotropic | 0.25 |
| 4 | RTEMNLIIsotropic | 0.296875 |
| 5 | RTEMNLIIsotropic | 0.3125 |
| 6 | RTEMNLIIsotropic | 0.328125 |
| 7 | RTEMNLIIsotropic | 0.335938 |
| 8 | RTEMNLIIsotropic | 0.351562 |
| 9 | RTEMNLIIsotropic | 0.28125 |
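The table does not show which alpha produced each row. The rows follow the order of the alpha list in `RTEMNLIIsotropic`, since the configs are built in that order and `Parallel(..., return_as="list")` returns results in input order. A short sketch that pairs the two, using only the values already shown above:

```python
# Alpha values exactly as listed in the generator (note there is no 0.6 entry).
alphas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0]
# Accuracies copied from the table above, in row order.
accuracies = [0.273438, 0.257812, 0.25, 0.25, 0.296875,
              0.3125, 0.328125, 0.335938, 0.351562, 0.28125]

by_alpha = dict(zip(alphas, accuracies))
best_alpha = max(by_alpha, key=by_alpha.get)
```

Here `best_alpha` is 0.9, where alpha is the weight on the RTE model. The last row (alpha = 1.0, all weight on RTE) matches the RTEVanilla score of 0.28125, which is a useful sanity check on the ordering.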


In [11]:
import pandas as pd
from llm_weaver import dict_overwrite, get_model_config, normalize_glue_task_name

model_id = "textAttack/roberta-base-RTE"


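def RTEMNLIIsotropicMarenLayers(model_id):
    """Yield one weaving config per alpha, blending only the layers in `replacement_layers`.

    All other layers are taken from `model_id` unchanged (weight 1.0 versus 0.0).
    """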
    replacement_layers = [0, 1, 4, 11]
    for alpha in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0]:
        num_layers = get_model_config(model_id)["num_hidden_layers"]
        layer_assignments = [
            {
                "type": "IsotropicLinearCombination",
                "params": {
                    "donors": [
                        {
                            "donor": model_id,
                            "hidden_layer_number": i,
                            "weight": alpha if (i in replacement_layers) else 1.0,
                        },
                        {
                            "donor": "textAttack/roberta-base-MNLI",
                            "hidden_layer_number": i,
                            "weight": (1.0 - alpha)
                            if (i in replacement_layers)
                            else 0.0,
                        },
                    ]
                },
            }
            for i in range(num_layers)
        ]

        blank_model_config = dict_overwrite(
            get_model_config(mnli_model_id),
            {
                "num_hidden_layers": len(layer_assignments),
            },
        )
        config = {
            "glue_task": normalize_glue_task_name(mnli_model_id),
            "tokenizer_model_id": model_id,
            "blank_model_config": blank_model_config,
            "layer_assignments": layer_assignments,
            "classification_head": {
                "type": "SingleClassificationHead",
                "params": {
                    "donor": mnli_model_id,
                },
            },
            "embeddings": {
                "type": "SingleEmbeddings",
                "params": {
                    "donor": model_id,
                },
            },
        }

        yield config


weave_configs = list(RTEMNLIIsotropicMarenLayers(model_id))

scores = Parallel(n_jobs=5, return_as="list")(
    delayed(calculate_score_from_weaving_config_cached)(
        weave_config,
        # n_examples=4096,
        n_examples=128,
        split="validation",
    )
    for weave_config in weave_configs
)
accuracies = [score["accuracy"] for score in scores]

records = []
for weave_config, accuracy in zip(weave_configs, accuracies):
    record = {}
    record["name"] = "RTEMNLIIsotropicMarenLayers"
    record["accuracy"] = accuracy
    records.append(record)
df_rte_vanilla = pd.DataFrame.from_records(records)
df_rte_vanilla

________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validation')
calculating score for weaving config md5sum: 5dd70f99a6823d9489feb60104d892


_____________________________calculate_score_from_weaving_config - 24.8s, 0.4min
________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validat


_____________________________calculate_score_from_weaving_config - 20.9s, 0.3min
_____________________________calculate_score_from_weaving_config - 22.1s, 0.4min
_____________________________calculate_score_from_weaving_config - 21.3s, 0.4min
_____________________________calculate_score_from_weaving_config - 21.4s, 0.4min


| | name | accuracy |
|---|---|---|
| 0 | RTEMNLIIsotropicMarenLayers | 0.328125 |
| 1 | RTEMNLIIsotropicMarenLayers | 0.335938 |
| 2 | RTEMNLIIsotropicMarenLayers | 0.3125 |
| 3 | RTEMNLIIsotropicMarenLayers | 0.304688 |
| 4 | RTEMNLIIsotropicMarenLayers | 0.320312 |
| 5 | RTEMNLIIsotropicMarenLayers | 0.335938 |
| 6 | RTEMNLIIsotropicMarenLayers | 0.328125 |
| 7 | RTEMNLIIsotropicMarenLayers | 0.3125 |
| 8 | RTEMNLIIsotropicMarenLayers | 0.3125 |
| 9 | RTEMNLIIsotropicMarenLayers | 0.28125 |
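The per-layer weighting used in `RTEMNLIIsotropicMarenLayers` can be isolated into a small helper, which makes the rule easier to check by eye. `layer_weights` is a hypothetical name introduced for this sketch; the logic mirrors the two conditional expressions in the cell above, and the 12-layer count is that of roberta-base.

```python
def layer_weights(layer_index, alpha, replacement_layers=(0, 1, 4, 11)):
    """Return (rte_weight, mnli_weight) for one hidden layer."""
    if layer_index in replacement_layers:
        # Blended layer: alpha on RTE, the remainder on MNLI.
        return alpha, 1.0 - alpha
    # Untouched layer: taken entirely from the RTE model.
    return 1.0, 0.0


# roberta-base has 12 hidden layers.
weights = [layer_weights(i, alpha=0.25) for i in range(12)]
blended = [i for i, (rte_w, mnli_w) in enumerate(weights) if mnli_w > 0.0]
```

Only layers 0, 1, 4 and 11 receive any MNLI weight; every other layer is a pure copy of the RTE layer regardless of alpha.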


In [12]:
import pandas as pd
from llm_weaver import dict_overwrite, get_model_config, normalize_glue_task_name

model_id = "textAttack/roberta-base-RTE"


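def FisherAllLayers(model_id):
    """Yield one weaving config per alpha using element-wise (Fisher-weighted) merging.

    Each donor's Fisher information is read from an .h5 file under ../data/fisher_info/.
    """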
    for alpha in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0]:
        num_layers = get_model_config(model_id)["num_hidden_layers"]
        layer_assignments = [
            {
                "type": "ElementWiseLinearCombination",
                "params": {
                    "donors": [
                        {
                            "donor": model_id,
                            "hidden_layer_number": i,
                            "weight": alpha,
                            "element_wise_multiplier_filename": f"../data/fisher_info/{model_id.replace('/', '_')}-fisher-info.h5",
                        },
                        {
                            "donor": "textAttack/roberta-base-MNLI",
                            "hidden_layer_number": i,
                            "weight": 1.0 - alpha,
                            "element_wise_multiplier_filename": "../data/fisher_info/textAttack_roberta-base-MNLI-fisher-info.h5",
                        },
                    ],
                    "normalize": True,
                },
            }
            for i in range(num_layers)
        ]

        blank_model_config = dict_overwrite(
            get_model_config(mnli_model_id),
            {
                "num_hidden_layers": len(layer_assignments),
            },
        )
        config = {
            "glue_task": normalize_glue_task_name(mnli_model_id),
            "tokenizer_model_id": model_id,
            "blank_model_config": blank_model_config,
            "layer_assignments": layer_assignments,
            "classification_head": {
                "type": "SingleClassificationHead",
                "params": {
                    "donor": mnli_model_id,
                },
            },
            "embeddings": {
                "type": "SingleEmbeddings",
                "params": {
                    "donor": model_id,
                },
            },
        }

        yield config


weave_configs = list(FisherAllLayers(model_id))

scores = Parallel(n_jobs=5, return_as="list")(
    delayed(calculate_score_from_weaving_config_cached)(
        weave_config,
        # n_examples=4096,
        n_examples=129,
        split="validation",
    )
    for weave_config in weave_configs
)
accuracies = [score["accuracy"] for score in scores]

records = []
for weave_config, accuracy in zip(weave_configs, accuracies):
    record = {}
    record["name"] = "FisherAllLayers"
    record["accuracy"] = accuracy
    records.append(record)
df_rte_vanilla = pd.DataFrame.from_records(records)
df_rte_vanilla

________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=129, split='validation')
___________________________________________________________________________


_____________________________calculate_score_from_weaving_config - 30.0s, 0.5min
________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=129, split='validat


_____________________________calculate_score_from_weaving_config - 26.3s, 0.4min
_____________________________calculate_score_from_weaving_config - 28.4s, 0.5min
_____________________________calculate_score_from_weaving_config - 28.1s, 0.5min
_____________________________calculate_score_from_weaving_config - 28.6s, 0.5min
_____________________________calculate_score_from_weaving_config - 28.6s, 0.5min


index,name,accuracy
0,FisherAllLayers,0.27907
1,FisherAllLayers,0.255814
2,FisherAllLayers,0.263566
3,FisherAllLayers,0.255814
4,FisherAllLayers,0.286822
5,FisherAllLayers,0.310078
6,FisherAllLayers,0.317829
7,FisherAllLayers,0.341085
8,FisherAllLayers,0.356589
9,FisherAllLayers,0.286822
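
The Fisher sweeps in this notebook (the next cell included) describe each merged layer as an `ElementWiseLinearCombination` with a per-donor `weight`, an `element_wise_multiplier_filename` pointing at Fisher information, and `"normalize": True`. A minimal sketch of what such a combination plausibly computes is the standard Fisher-weighted average: each parameter is averaged across donors with weight `weight * fisher`, then divided by the sum of those weights. This is an assumption about `llm_weaver`, not its actual code; the function name `fisher_weighted_merge`, the `eps` guard, and the toy numbers are all hypothetical.

```python
def fisher_weighted_merge(params, fishers, weights, eps=1e-8):
    """Element-wise Fisher-weighted average of several donors.

    params  -- one flat list of parameter values per donor
    fishers -- one flat list of Fisher information values per donor
    weights -- one scalar weight per donor (e.g. alpha and 1 - alpha)
    eps     -- hypothetical guard against a zero denominator
    """
    merged = []
    for values, infos in zip(zip(*params), zip(*fishers)):
        # Each donor contributes in proportion to weight * Fisher information.
        numerator = sum(w * f * v for w, f, v in zip(weights, infos, values))
        denominator = sum(w * f for w, f in zip(weights, infos))
        merged.append(numerator / (denominator + eps))
    return merged


# Toy values, not taken from any real model.
rte_params = [1.0, 2.0, 3.0]
mnli_params = [3.0, 2.0, -1.0]
rte_fisher = [0.5, 0.5, 3.0]
mnli_fisher = [0.5, 1.5, 1.0]

alpha = 0.5
merged = fisher_weighted_merge(
    [rte_params, mnli_params],
    [rte_fisher, mnli_fisher],
    [alpha, 1.0 - alpha],
)
print(merged)
```

Under this reading, `alpha = 1.0` gives the second donor zero weight, so the merged layer reduces to the first donor's parameters wherever its Fisher information is non-zero.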


In [13]:
import pandas as pd
from llm_weaver import dict_overwrite, get_model_config, normalize_glue_task_name

model_id = "textAttack/roberta-base-RTE"


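def FisherMARENSLayers(model_id):
    """Yield one weaving config per alpha.

    Layers 0, 1, 4 and 11 are Fisher-weighted blends of `model_id` and the
    MNLI donor; every other layer is copied unchanged from `model_id`.
    """
    # NOTE: `mnli_model_id` is not set in this cell; it must already be defined
    # in the notebook session.
    replacement_layers = [0, 1, 4, 11]
    # The layer count does not depend on alpha, so look it up once.
    num_layers = get_model_config(model_id)["num_hidden_layers"]
    # 0.6 is not part of this sweep, so the 10 result rows map to these 10 values in order.
    for alpha in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0]: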
        layer_assignments = [
            (
                {
                    "type": "ElementWiseLinearCombination",
                    "params": {
                        "donors": [
                            {
                                "donor": model_id,
                                "hidden_layer_number": i,
                                "weight": alpha,
                                "element_wise_multiplier_filename": f"../data/fisher_info/{model_id.replace('/', '_')}-fisher-info.h5",
                            },
                            {
                                "donor": "textAttack/roberta-base-MNLI",
                                "hidden_layer_number": i,
                                "weight": 1.0 - alpha,
                                "element_wise_multiplier_filename": "../data/fisher_info/textAttack_roberta-base-MNLI-fisher-info.h5",
                            },
                        ],
                        "normalize": True,
                    },
                }
                if (i in replacement_layers)
                else {
                    "type": "SingleLayer",
                    "params": {
                        "donor": model_id,
                        "hidden_layer_number": i,
                    },
                }
            )
            for i in range(num_layers)
        ]

        blank_model_config = dict_overwrite(
            get_model_config(mnli_model_id),
            {
                "num_hidden_layers": len(layer_assignments),
            },
        )
        config = {
            "glue_task": normalize_glue_task_name(mnli_model_id),
            "tokenizer_model_id": model_id,
            "blank_model_config": blank_model_config,
            "layer_assignments": layer_assignments,
            "classification_head": {
                "type": "SingleClassificationHead",
                "params": {
                    "donor": mnli_model_id,
                },
            },
            "embeddings": {
                "type": "SingleEmbeddings",
                "params": {
                    "donor": model_id,
                },
            },
        }

        yield config


weave_configs = list(FisherMARENSLayers(model_id))

scores = Parallel(n_jobs=5, return_as="list")(
    delayed(calculate_score_from_weaving_config_cached)(
        weave_config,
        # n_examples=4096,
        n_examples=128,
        split="validation",
    )
    for weave_config in weave_configs
)
accuracies = [score["accuracy"] for score in scores]

# One row per config, in the same order as the alpha sweep in FisherMARENSLayers.
records = [
    {"name": "FisherMARENSLayers", "accuracy": accuracy} for accuracy in accuracies
]
# NOTE: despite its name, this frame holds the FisherMARENSLayers sweep,
# not the vanilla RTE scores.
df_rte_vanilla = pd.DataFrame.from_records(records)
df_rte_vanilla

________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validation')
calculating score for weaving config md5sum: bb090244692bdc2828839a7ef33b05

2023-11-28 15:40:08.825775: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.

_____________________________calculate_score_from_weaving_config - 26.6s, 0.4min
________________________________________________________________________________
[Memory] Calling llm_weaver.calculate_score_from_weaving_config...
calculate_score_from_weaving_config({ 'blank_model_config': { 'add_cross_attention': False,
                          'architectures': ['RobertaForSequenceClassification'],
                          'attention_probs_dropout_prob': 0.1,
                          'bad_words_ids': None,
                          'begin_suppress_tokens': None,
                          'bos_token_id': 0,
                          'chunk_size_feed_forward': 0,
                          'classifier_dropout': None,
                          'cross_attention_hidden_size': None,
                          'decoder_start_token_id': None,
                          'diversity_penalty': 0.0,
                          'do_sample': False,
                    ..., n_examples=128, split='validat


_____________________________calculate_score_from_weaving_config - 27.8s, 0.5min
_____________________________calculate_score_from_weaving_config - 28.8s, 0.5min
_____________________________calculate_score_from_weaving_config - 28.7s, 0.5min
_____________________________calculate_score_from_weaving_config - 28.7s, 0.5min
_____________________________calculate_score_from_weaving_config - 28.7s, 0.5min


index,name,accuracy
0,FisherMARENSLayers,0.328125
1,FisherMARENSLayers,0.335938
2,FisherMARENSLayers,0.3125
3,FisherMARENSLayers,0.3125
4,FisherMARENSLayers,0.304688
5,FisherMARENSLayers,0.328125
6,FisherMARENSLayers,0.3125
7,FisherMARENSLayers,0.3125
8,FisherMARENSLayers,0.304688
9,FisherMARENSLayers,0.28125
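
The table above has no alpha column, so each row has to be matched to the sweep by position. The sketch below pairs the ten accuracies with the ten alpha values from the cell and attaches a rough standard error for `n_examples=128`. It treats each accuracy as an independent binomial proportion, which is a simplifying assumption; the helper name `binomial_standard_error` is hypothetical.

```python
from math import sqrt

# Sweep values and accuracies copied from the cell and table above.
alphas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0]
accuracies = [0.328125, 0.335938, 0.3125, 0.3125, 0.304688,
              0.328125, 0.3125, 0.3125, 0.304688, 0.28125]
n_examples = 128


def binomial_standard_error(accuracy, n):
    """Standard error of a proportion estimated from n independent examples."""
    return sqrt(accuracy * (1.0 - accuracy) / n)


rows = [
    {
        "alpha": alpha,
        "accuracy": accuracy,
        "stderr": binomial_standard_error(accuracy, n_examples),
    }
    for alpha, accuracy in zip(alphas, accuracies)
]

best = max(rows, key=lambda row: row["accuracy"])
worst = min(rows, key=lambda row: row["accuracy"])
spread = best["accuracy"] - worst["accuracy"]

for row in rows:
    print(f"alpha={row['alpha']:.1f}  accuracy={row['accuracy']:.4f}  +/- {row['stderr']:.4f}")
print(f"best alpha: {best['alpha']}, spread across sweep: {spread:.4f}")
```

Under that assumption each estimate carries a standard error of about 0.04, the same order as the gap between the best and worst rows (about 0.055), so this 128-example sweep probably cannot rank alpha values reliably. The commented-out `n_examples=4096` setting in the cell would shrink the error bars considerably.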
