# Thoughts on the blank model

The weaving-prototype notebook loads the model it will overwrite in the same way it loads the source models.

Below, we load the "textattack/roberta-base-RTE" checkpoint and use it as the "blank" model, so it arrives with pretrained weights rather than an empty architecture.


In [1]:
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

def get_model_and_tokenizer(identifier):
    tokenizer = AutoTokenizer.from_pretrained(identifier)
    model = TFAutoModelForSequenceClassification.from_pretrained(identifier, from_pt=True)
    return model, tokenizer

target_model = get_model_and_tokenizer("textattack/roberta-base-RTE")[0]

All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


In [2]:
# Show the model architecture
target_model.summary()

Model: "tf_roberta_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 roberta (TFRobertaMainLaye  multiple                  124055040 
 r)                                                              
                                                                 
 classifier (TFRobertaClass  multiple                  592130    
 ificationHead)                                                  
                                                                 
Total params: 124647170 (475.49 MB)
Trainable params: 124647170 (475.49 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [3]:
# Show all the weights, from both the roberta and the classifier part of the model
# (target_model.roberta.weights would list only the roberta part)
for weight_obj in target_model.weights:
    print(weight_obj.name, weight_obj.shape)

tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/self/query/kernel:0 (768, 768)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/self/query/bias:0 (768,)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/self/key/kernel:0 (768, 768)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/self/key/bias:0 (768,)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/self/value/kernel:0 (768, 768)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/self/value/bias:0 (768,)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/output/dense/kernel:0 (768, 768)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/output/dense/bias:0 (768,)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/attention/output/LayerNorm/gamma:0 (768,)
tf_roberta_for_sequence_classification/roberta/encoder/layer_._0/atte

In [4]:
type(target_model)

transformers.models.roberta.modeling_tf_roberta.TFRobertaForSequenceClassification

## Can we create a blank model more generally?

Take a look at the class's code:

https://github.com/huggingface/transformers/blob/2fc33ebead50383f7707b17f0e2a178d86347d10/src/transformers/models/roberta/modeling_tf_roberta.py#L1237-L1246

The relevant part is the constructor:
```
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.num_labels = config.num_labels

        self.roberta = TFRobertaMainLayer(config, add_pooling_layer=False, name="roberta")
        self.classifier = TFRobertaClassificationHead(config, name="classifier")
```

It seems like we might be able to instantiate one with the right value for `config`.

For instance, we might be able to specify something other than 12 hidden layers.

In [5]:
# Let's look at the config for target_model

target_model.config

RobertaConfig {
  "_name_or_path": "textattack/roberta-base-RTE",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "finetuning_task": "glue:rte",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.35.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

In [8]:
# Awesome! These look like the parameters that we'll need, except possibly _name_or_path and maybe architectures
# Let's try to create a new model with these parameters

from transformers import RobertaConfig

new_config_dict = target_model.config.to_dict()

new_config_dict["_name_or_path"] = "awesome_cs194_group/blank-model"

new_config = RobertaConfig(**new_config_dict)
new_config



RobertaConfig {
  "_name_or_path": "awesome_cs194_group/blank-model",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "finetuning_task": "glue:rte",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.35.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

In [18]:
from transformers.models.roberta.modeling_tf_roberta import TFRobertaForSequenceClassification


blank_model = TFRobertaForSequenceClassification(new_config)
blank_model.summary()

ValueError: This model has not yet been built. Build the model first by calling `build()` or by calling the model on a batch of data.

In [19]:
# Oh, let's run .build first

blank_model.build()
blank_model

<transformers.models.roberta.modeling_tf_roberta.TFRobertaForSequenceClassification at 0x344cff460>

In [20]:
for item in blank_model.weights:
    print(item.name, item.shape)

tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/self/query/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/self/query/bias:0 (768,)
tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/self/key/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/self/key/bias:0 (768,)
tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/self/value/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/self/value/bias:0 (768,)
tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/output/dense/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/output/dense/bias:0 (768,)
tf_roberta_for_sequence_classification_6/roberta/encoder/layer_._0/attention/output/LayerNorm/gamma:0 (768,)
tf_roberta_for_sequence_classification_6/roberta/en

Nice!!
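The printed listing is truncated, so eyeballing it only confirms the first few tensors. A minimal sketch of a fuller check follows, assuming we compare `(name, shape)` pairs after dropping the leading Keras scope (`tf_roberta_for_sequence_classification` vs. `tf_roberta_for_sequence_classification_6`). The helper names `strip_model_prefix` and `diff_weight_specs` are hypothetical, not part of transformers, and the sample data is copied from the outputs above.

```python
def strip_model_prefix(name):
    """Drop the leading Keras scope, e.g. 'tf_roberta_for_sequence_classification_6/'."""
    return name.split("/", 1)[1] if "/" in name else name


def diff_weight_specs(specs_a, specs_b):
    """Return the (name, shape) pairs that appear in only one of the two models."""
    a = {(strip_model_prefix(n), tuple(s)) for n, s in specs_a}
    b = {(strip_model_prefix(n), tuple(s)) for n, s in specs_b}
    return sorted(a - b), sorted(b - a)


# Sample data taken from the first printed lines of the two listings above.
suffix = "roberta/encoder/layer_._0/attention/self/query/"
pretrained = [
    ("tf_roberta_for_sequence_classification/" + suffix + "kernel:0", (768, 768)),
    ("tf_roberta_for_sequence_classification/" + suffix + "bias:0", (768,)),
]
blank = [
    ("tf_roberta_for_sequence_classification_6/" + suffix + "kernel:0", (768, 768)),
    ("tf_roberta_for_sequence_classification_6/" + suffix + "bias:0", (768,)),
]

only_pretrained, only_blank = diff_weight_specs(pretrained, blank)
print(only_pretrained, only_blank)  # both lists are empty when the specs match
```

In the notebook, the inputs would presumably be built as `[(w.name, w.shape) for w in target_model.weights]` and likewise for `blank_model`. Matching names and shapes say nothing about the values: a model built from a config alone starts from freshly initialized weights.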

## Can we create a blank model with a different number of layers?

In [21]:
from transformers.models.roberta.modeling_tf_roberta import TFRobertaForSequenceClassification


new_config_dict_15_layer = target_model.config.to_dict()
new_config_dict_15_layer["num_hidden_layers"] = 15

new_config_15_layer = RobertaConfig(**new_config_dict_15_layer)

blank_model_15_layer = TFRobertaForSequenceClassification(new_config_15_layer)
blank_model_15_layer.build()
blank_model_15_layer.summary()

Model: "tf_roberta_for_sequence_classification_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 roberta (TFRobertaMainLaye  multiple                  145318656 
 r)                                                              
                                                                 
 classifier (TFRobertaClass  multiple                  592130    
 ificationHead)                                                  
                                                                 
Total params: 145910786 (556.61 MB)
Trainable params: 145910786 (556.61 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
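As a sanity check on the two summaries, the parameter counts can be reproduced by hand from the config values shown earlier. This is a sketch under a few assumptions: the function names are made up for illustration, the main layer has no pooler (as `add_pooling_layer=False` in the constructor suggests), and the classification head is a 768-to-768 dense layer followed by a projection to 2 labels (`num_labels` is not printed in the config, so 2 is assumed).

```python
def dense_params(n_in, n_out):
    """Kernel plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def encoder_layer_params(hidden=768, intermediate=3072):
    layer_norm = 2 * hidden  # gamma and beta
    # query, key, value, and attention output projections, then a LayerNorm
    attention = 4 * dense_params(hidden, hidden) + layer_norm
    # intermediate and output dense layers, then a LayerNorm
    feed_forward = (
        dense_params(hidden, intermediate)
        + dense_params(intermediate, hidden)
        + layer_norm
    )
    return attention + feed_forward


def main_layer_params(num_layers, hidden=768, intermediate=3072,
                      vocab=50265, max_pos=514, type_vocab=1):
    # word, position, and token-type embeddings, plus the embedding LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    return embeddings + num_layers * encoder_layer_params(hidden, intermediate)


def classifier_params(hidden=768, num_labels=2):
    return dense_params(hidden, hidden) + dense_params(hidden, num_labels)


print(encoder_layer_params())  # 7087872
print(main_layer_params(12))   # 124055040
print(main_layer_params(15))   # 145318656
print(classifier_params())     # 592130
```

The 12-layer and 15-layer totals match the `roberta` rows of the two summaries, and the difference between them is exactly three encoder layers, so the extra parameters are fully accounted for by `num_hidden_layers`.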


In [23]:
for item in blank_model_15_layer.weights:
    print(item.name, item.shape)

tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/self/query/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/self/query/bias:0 (768,)
tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/self/key/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/self/key/bias:0 (768,)
tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/self/value/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/self/value/bias:0 (768,)
tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/output/dense/kernel:0 (768, 768)
tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/output/dense/bias:0 (768,)
tf_roberta_for_sequence_classification_7/roberta/encoder/layer_._0/attention/output/LayerNorm/gamma:0 (768,)
tf_roberta_for_sequence_classification_7/roberta/en

Is that it? Are we missing something?
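One partial way to probe that question: the truncated listing only shows `layer_._0`, so it does not by itself confirm that all 15 layers exist. The sketch below tallies weight tensors per encoder layer from the weight names. The helper name `weights_per_layer` is hypothetical, and the regular expression assumes the `layer_._N` naming seen in the output above.

```python
import re
from collections import Counter

# Matches the encoder layer index in names like '.../encoder/layer_._3/attention/...'
LAYER_RE = re.compile(r"/layer_\._(\d+)/")


def weights_per_layer(weight_names):
    """Map each encoder layer index to the number of weight tensors it owns."""
    counts = Counter()
    for name in weight_names:
        match = LAYER_RE.search(name)
        if match:  # embeddings and classifier weights have no layer index
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


# Tiny hand-written sample; in the notebook, pass the real weight names instead.
sample = [
    "m/roberta/encoder/layer_._0/attention/self/query/kernel:0",
    "m/roberta/encoder/layer_._0/attention/self/query/bias:0",
    "m/roberta/encoder/layer_._1/attention/self/query/kernel:0",
    "m/classifier/dense/kernel:0",
]
print(weights_per_layer(sample))  # {0: 2, 1: 1}
```

In the notebook this would presumably be called as `weights_per_layer(w.name for w in blank_model_15_layer.weights)`, and the keys should run from 0 to 14 if the config change took effect. It also makes one open issue visible: layers 12 to 14 have no counterpart in a 12-layer source model, so whatever overwrites the blank model needs a rule for them.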