## Load the `textattack` models as input models and inspect them

Note: We load the checkpoints with `TFAutoModelForSequenceClassification` rather than `AutoModelForSequenceClassification`,
because loading them with the PyTorch class, as in the snippet below, prints the following warning:
```python

from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("textattack/roberta-base-RTE")
model = AutoModelForSequenceClassification.from_pretrained("textattack/roberta-base-RTE")
```
```
Some weights of the model checkpoint at textattack/roberta-base-RTE were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```

For reference, see the PyTorch RoBERTa implementation: https://github.com/huggingface/transformers/blob/2fc33ebead50383f7707b17f0e2a178d86347d10/src/transformers/models/roberta/modeling_roberta.py#L1157-L1246


In [1]:
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

model_identifiers = [
    "textattack/roberta-base-MNLI",
    "textattack/roberta-base-RTE"
]

def get_model_and_tokenizer(identifier):
    tokenizer = AutoTokenizer.from_pretrained(identifier)
    # from_pt=True converts the PyTorch checkpoint weights to TensorFlow on load
    model = TFAutoModelForSequenceClassification.from_pretrained(identifier, from_pt=True)
    return model, tokenizer

input_models = {
    identifier: get_model_and_tokenizer(identifier)[0] for identifier in model_identifiers
}
# https://github.com/huggingface/transformers/blob/2fc33ebead50383f7707b17f0e2a178d86347d10/src/transformers/models/roberta/modeling_tf_roberta.py#L1237-L1303



All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.
All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


## Create a target model, i.e. the Blank Model

This model has the final architecture we want to use, so its classification head must output one logit per class of the target task.

In [2]:
# Instead of using an actual blank model, we'll use "textattack/roberta-base-RTE" as our blank model
# and overwrite it.
target_model = get_model_and_tokenizer("textattack/roberta-base-RTE")[0]

All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


In [3]:
# Show the model architecture
target_model.summary()

Model: "tf_roberta_for_sequence_classification_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 roberta (TFRobertaMainLaye  multiple                  124055040 
 r)                                                              
                                                                 
 classifier (TFRobertaClass  multiple                  592130    
 ificationHead)                                                  
                                                                 
Total params: 124647170 (475.49 MB)
Trainable params: 124647170 (475.49 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
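
The 592130 parameters reported for `classifier` can be reproduced by hand. This is a minimal sketch, assuming the head consists of a hidden-to-hidden dense layer followed by an output projection to `num_labels` logits, with a hidden size of 768 (taken from the weight shapes in this notebook). The helper name `classification_head_params` is hypothetical, and the 3-class case is only an illustration of a task with a different number of labels.

```python
def classification_head_params(hidden_size: int, num_labels: int) -> int:
    """Count parameters of dense(hidden -> hidden) + out_proj(hidden -> num_labels)."""
    dense = hidden_size * hidden_size + hidden_size    # kernel + bias
    out_proj = hidden_size * num_labels + num_labels   # kernel + bias
    return dense + out_proj


rte_head = classification_head_params(hidden_size=768, num_labels=2)
three_class_head = classification_head_params(hidden_size=768, num_labels=3)

print(rte_head)              # 592130, the classifier row in the summary
print(three_class_head)      # 592899
print(124055040 + rte_head)  # 124647170, the total in the summary
```

Because the head size depends on the number of labels, only the encoder layers are interchangeable between models trained on different tasks, which is why the target model fixes the head up front.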


In [4]:
for weight_obj in target_model.roberta.weights:
    print(weight_obj.name, weight_obj.shape)

tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/self/query/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/self/query/bias:0 (768,)
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/self/key/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/self/key/bias:0 (768,)
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/self/value/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/self/value/bias:0 (768,)
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/output/dense/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/output/dense/bias:0 (768,)
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/output/LayerNorm/gamma:0 (768,)
tf_roberta_for_sequence_classification_2/roberta/en

## Group a model's weights by encoder layer number

In [5]:
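import re
from collections import defaultdict


def _get_layer_to_weights_map(model):
    """Map each encoder layer number to a dict of {weight-name suffix: weight}."""
    layer_to_weights_map = defaultdict(dict)
    for weight in model.weights:
        # Weights whose name has no layer number are skipped
        matches = re.findall(r'/layer_\._(\d+)/', weight.name)
        if not matches:
            continue

        layer_number = int(matches[0])
        # Key by the part of the name after "/layer_._N/" so it is comparable across models
        suffix = weight.name.partition(f"/layer_._{layer_number}/")[-1]
        layer_to_weights_map[layer_number][suffix] = weight

    # Return plain dicts so that looking up a missing layer raises a KeyError
    return {
        layer_number: dict(weights)
        for layer_number, weights in layer_to_weights_map.items()
    }
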

for layer_number, weights in _get_layer_to_weights_map(target_model).items():
    for weight_name, weight_object in weights.items():
        print(f"Layer: {layer_number} -> weight.name suffix: {weight_name}")


Layer: 0 -> weight.name suffix: attention/self/query/kernel:0
Layer: 0 -> weight.name suffix: attention/self/query/bias:0
Layer: 0 -> weight.name suffix: attention/self/key/kernel:0
Layer: 0 -> weight.name suffix: attention/self/key/bias:0
Layer: 0 -> weight.name suffix: attention/self/value/kernel:0
Layer: 0 -> weight.name suffix: attention/self/value/bias:0
Layer: 0 -> weight.name suffix: attention/output/dense/kernel:0
Layer: 0 -> weight.name suffix: attention/output/dense/bias:0
Layer: 0 -> weight.name suffix: attention/output/LayerNorm/gamma:0
Layer: 0 -> weight.name suffix: attention/output/LayerNorm/beta:0
Layer: 0 -> weight.name suffix: intermediate/dense/kernel:0
Layer: 0 -> weight.name suffix: intermediate/dense/bias:0
Layer: 0 -> weight.name suffix: output/dense/kernel:0
Layer: 0 -> weight.name suffix: output/dense/bias:0
Layer: 0 -> weight.name suffix: output/LayerNorm/gamma:0
Layer: 0 -> weight.name suffix: output/LayerNorm/beta:0
Layer: 1 -> weight.name suffix: attention/
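
The grouping step can be tried on plain strings without loading a model. The sketch below assumes weight names of the form shown in the output above; the layer-1 names are extrapolated from that pattern, the embeddings name is purely illustrative, and `suffixes_by_layer` is a hypothetical helper. It shows that the suffixes do not depend on the model-name prefix, which is what lets layers of two different model instances be matched up.

```python
import re
from collections import defaultdict


def suffixes_by_layer(names):
    """Group weight-name suffixes by the layer number embedded in each name."""
    grouped = defaultdict(set)
    for name in names:
        match = re.search(r"/layer_\._(\d+)/", name)
        if match is None:
            continue  # no layer number in this name, so skip it
        # Everything after "/layer_._N/" is the model-independent suffix
        _, _, suffix = name.partition(match.group(0))
        grouped[int(match.group(1))].add(suffix)
    return dict(grouped)


def make_names(model_prefix):
    return [
        f"{model_prefix}/roberta/encoder/layer_._{layer}/attention/self/query/{kind}:0"
        for layer in (0, 1)
        for kind in ("kernel", "bias")
    ]


source_names = make_names("tf_roberta_for_sequence_classification")
target_names = make_names("tf_roberta_for_sequence_classification_2")
target_names.append("illustrative_model/embeddings/weight:0")  # skipped: no layer number

print(suffixes_by_layer(target_names))
print(suffixes_by_layer(source_names) == suffixes_by_layer(target_names))  # True
```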

In [6]:
def assign_weights_from_one_layer_to_another(source_model, target_model, source_layer_number, target_layer_number):

    # This part is recalculated often, but it's fast. In the future we could 
    # cache it in a class as a cached property, but we'll leave it here for now.
    target_model_layer_to_weights_map = _get_layer_to_weights_map(target_model)
    source_model_layer_to_weights_map = _get_layer_to_weights_map(source_model)

    # Get the layer objects
    source_layer = source_model_layer_to_weights_map[source_layer_number]
    target_layer = target_model_layer_to_weights_map[target_layer_number]

    # Make sure that all the suffixes match
    assert set(source_layer.keys()) == set(target_layer.keys())

    # Make sure that all the shapes match
    for weight_name, weight_object in source_layer.items():
        assert weight_object.shape == target_layer[weight_name].shape
    
    # Assign weights from one layer to another
    for weight_name, weight_object in source_layer.items():
        target_layer[weight_name].assign(weight_object.numpy())
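
The function above runs every check before it assigns anything, so a mismatch cannot leave the target layer half-copied. The same pattern is sketched here with plain dicts of lists standing in for the TensorFlow variables; `copy_layer` is a hypothetical name, and the explicit `ValueError`s are a design choice for this sketch (Python removes `assert` statements when run with `-O`).

```python
def copy_layer(source_layer: dict, target_layer: dict) -> None:
    """Copy values from source_layer into target_layer after validating both.

    Each layer maps a weight-name suffix to a list of floats.
    """
    if set(source_layer) != set(target_layer):
        raise ValueError("weight names differ between source and target")
    for name, values in source_layer.items():
        if len(values) != len(target_layer[name]):
            raise ValueError(f"shape mismatch for {name}")
    # Mutate only after every check has passed
    for name, values in source_layer.items():
        target_layer[name][:] = values


source = {"query/bias:0": [1.0, 2.0], "key/bias:0": [3.0, 4.0]}
target = {"query/bias:0": [0.0, 0.0], "key/bias:0": [0.0, 0.0]}
copy_layer(source, target)
print(target)

# A mismatched target is rejected and left unchanged
bad_target = {"query/bias:0": [0.0, 0.0], "key/bias:0": [0.0]}
try:
    copy_layer(source, bad_target)
except ValueError as error:
    print(error)
print(bad_target)
```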


In [7]:
# Before assigning
print(target_model.weights[0].name)
print(target_model.weights[0].numpy()[:10, 0])

# tf_roberta_for_sequence_classification_4/roberta/encoder/layer_._0/attention/self/query/kernel:0
# [ 0.07277144 -0.00318369 -0.09025939 -0.00312138 -0.09253632  0.18560258
#  -0.04307384  0.02022669  0.00224997  0.06398872]

tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/self/query/kernel:0
[ 0.07277144 -0.00318369 -0.09025939 -0.00312138 -0.09253632  0.18560258
 -0.04307384  0.02022669  0.00224997  0.06398872]


In [9]:
source_model = input_models["textattack/roberta-base-MNLI"]

assign_weights_from_one_layer_to_another(
    source_model=source_model,
    target_model=target_model,
    source_layer_number=0,
    target_layer_number=0
)

# Where we copied from
print(source_model.weights[0].name)
print(source_model.weights[0].numpy()[:10, 0])

# After assigning
print(target_model.weights[0].name)
print(target_model.weights[0].numpy()[:10, 0])

tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/self/query/kernel:0
[ 0.07214946 -0.00097258 -0.0901752  -0.0041216  -0.09332298  0.18340059
 -0.04258322  0.0205772  -0.00232515  0.06097307]
tf_roberta_for_sequence_classification_2/roberta/encoder/layer_._0/attention/self/query/kernel:0
[ 0.07214946 -0.00097258 -0.0901752  -0.0041216  -0.09332298  0.18340059
 -0.04258322  0.0205772  -0.00232515  0.06097307]


## Define a configuration for the target model using the donor models

The configuration below only uses the `SingleLayer` assignment type, which copies one donor layer into one target layer. It could be extended with further types, such as Isometric or Fisher, that combine the weights of several donor layers into a single target layer.

In [10]:
# Config for a strictly alternating weave (ABABABABABAB):
# even layers come from MNLI (A), odd layers from RTE (B)
model_weaving_config = {
    # Template for the target model; its classification head must match the task at hand
    "target_model_template": "textattack/roberta-base-RTE",
    # Layer assignments, one per hidden layer
    "layer_assignments": [
        {
            "type": "SingleLayer",
            "params": {
                "donor": "textattack/roberta-base-MNLI" if (i % 2 == 0) else "textattack/roberta-base-RTE",
                "hidden_layer_number": i
            },
        } for i in range(12)
    ],
}

model_weaving_config


{'target_model_template': 'textattack/roberta-base-RTE',
 'layer_assignments': [{'type': 'SingleLayer',
   'params': {'donor': 'textattack/roberta-base-MNLI',
    'hidden_layer_number': 0}},
  {'type': 'SingleLayer',
   'params': {'donor': 'textattack/roberta-base-RTE',
    'hidden_layer_number': 1}},
  {'type': 'SingleLayer',
   'params': {'donor': 'textattack/roberta-base-MNLI',
    'hidden_layer_number': 2}},
  {'type': 'SingleLayer',
   'params': {'donor': 'textattack/roberta-base-RTE',
    'hidden_layer_number': 3}},
  {'type': 'SingleLayer',
   'params': {'donor': 'textattack/roberta-base-MNLI',
    'hidden_layer_number': 4}},
  {'type': 'SingleLayer',
   'params': {'donor': 'textattack/roberta-base-RTE',
    'hidden_layer_number': 5}},
  {'type': 'SingleLayer',
   'params': {'donor': 'textattack/roberta-base-MNLI',
    'hidden_layer_number': 6}},
  {'type': 'SingleLayer',
   'params': {'donor': 'textattack/roberta-base-RTE',
    'hidden_layer_number': 7}},
  {'type': 'SingleLaye
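
A config like this can be sanity-checked before any model is loaded. The sketch below rebuilds the same `layer_assignments` list and verifies that each of the 12 hidden layers is assigned exactly once, then prints the donor pattern. `check_layer_assignments` is a hypothetical helper, and the A/B letters are simply assigned to the donor names in alphabetical order.

```python
def check_layer_assignments(layer_assignments, num_hidden_layers=12):
    """Return the donor pattern as a string; raise if a layer is missing or repeated."""
    layer_numbers = [a["params"]["hidden_layer_number"] for a in layer_assignments]
    if sorted(layer_numbers) != list(range(num_hidden_layers)):
        raise ValueError(f"expected each layer 0..{num_hidden_layers - 1} exactly once")

    # Give each donor a letter, in alphabetical order of the donor name
    donors = sorted({a["params"]["donor"] for a in layer_assignments})
    letters = {donor: chr(ord("A") + index) for index, donor in enumerate(donors)}

    ordered = sorted(layer_assignments, key=lambda a: a["params"]["hidden_layer_number"])
    return "".join(letters[a["params"]["donor"]] for a in ordered)


layer_assignments = [
    {
        "type": "SingleLayer",
        "params": {
            "donor": "textattack/roberta-base-MNLI" if (i % 2 == 0) else "textattack/roberta-base-RTE",
            "hidden_layer_number": i,
        },
    }
    for i in range(12)
]

pattern = check_layer_assignments(layer_assignments)
print(pattern)  # ABABABABABAB
```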

In [11]:
def weave_models(target_model_template, layer_assignments):
    target_model, target_tokenizer = get_model_and_tokenizer(target_model_template)

    source_model_names = set(
        layer_assignment["params"]["donor"]
        for layer_assignment in layer_assignments
    )
    source_models = {
        source_model_name: get_model_and_tokenizer(source_model_name)[0]
        for source_model_name in source_model_names
    }

    for layer_assignment in layer_assignments:
        if layer_assignment["type"] == "SingleLayer":
            assign_weights_from_one_layer_to_another(
                source_model=source_models[layer_assignment["params"]["donor"]],
                target_model=target_model,
                source_layer_number=layer_assignment["params"]["hidden_layer_number"],
                target_layer_number=layer_assignment["params"]["hidden_layer_number"]
            )
        else:
            raise NotImplementedError(f"Unknown layer assignment type: {layer_assignment['type']}")

    return target_model, target_tokenizer
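
The `else` branch above rejects every type other than `SingleLayer`. One possible way to add further types is a dispatch table keyed by the `type` field. The sketch below is an assumption, not part of this notebook's implementation: it uses plain lists of floats in place of model weights, and the hypothetical `AverageLayers` type is a plain element-wise average, standing in for (and much simpler than) Isometric or Fisher merging.

```python
def single_layer(donor_layers, params):
    """Copy one layer from a single donor."""
    return list(donor_layers[params["donor"]][params["hidden_layer_number"]])


def average_layers(donor_layers, params):
    """Element-wise average of the same layer across several donors (hypothetical type)."""
    layer = params["hidden_layer_number"]
    rows = [donor_layers[donor][layer] for donor in params["donors"]]
    return [sum(column) / len(rows) for column in zip(*rows)]


ASSIGNMENT_HANDLERS = {
    "SingleLayer": single_layer,
    "AverageLayers": average_layers,
}


def resolve_assignment(donor_layers, assignment):
    """Look up the handler for the assignment type and compute the new layer values."""
    handler = ASSIGNMENT_HANDLERS.get(assignment["type"])
    if handler is None:
        raise NotImplementedError(f"Unknown layer assignment type: {assignment['type']}")
    return handler(donor_layers, assignment["params"])


# Toy donors: one layer each, two values per layer
donor_layers = {"model_a": {0: [1.0, 2.0]}, "model_b": {0: [3.0, 6.0]}}

single = {"type": "SingleLayer", "params": {"donor": "model_a", "hidden_layer_number": 0}}
averaged = {"type": "AverageLayers", "params": {"donors": ["model_a", "model_b"], "hidden_layer_number": 0}}

print(resolve_assignment(donor_layers, single))    # [1.0, 2.0]
print(resolve_assignment(donor_layers, averaged))  # [2.0, 4.0]
```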


    

In [12]:
weaved_model, weaved_model_tokenizer = weave_models(**model_weaving_config)

All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.
All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.
All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can 

In [13]:
weaved_model.summary()

Model: "tf_roberta_for_sequence_classification_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 roberta (TFRobertaMainLaye  multiple                  124055040 
 r)                                                              
                                                                 
 classifier (TFRobertaClass  multiple                  592130    
 ificationHead)                                                  
                                                                 
Total params: 124647170 (475.49 MB)
Trainable params: 124647170 (475.49 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [16]:
weaved_model.save_pretrained("data/weaved_model")

In [17]:
! du -h ./data/weaved_model

476M	./data/weaved_model


## Evaluate the weaved model's performance

TODO