## Automodelを列挙していく

* https://engineering.mobalab.net/2021/01/29/bert%E3%81%AE%E3%83%A2%E3%83%87%E3%83%AB%E6%A7%8B%E9%80%A0%E3%82%92%E3%82%82%E3%81%86%E5%B0%91%E3%81%97%E8%A9%B3%E3%81%97%E3%81%8F/

In [20]:
import torch
from transformers import AutoConfig, AutoModel
from torchinfo import summary

#model_name = "cl-tohoku/bert-base-japanese-whole-word-masking"
model_name = "cl-tohoku/bert-large-japanese"

### AutoModel (bare)

In [21]:
m1 = AutoModel.from_pretrained(model_name)

Some weights of the model checkpoint at cl-tohoku/bert-large-japanese were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [22]:
batch_size = 32
max_length = 512
summary(m1, input_size=(batch_size, max_length), dtypes=[torch.int, torch.long], depth=2)

Layer (type:depth-idx)                             Output Shape              Param #
BertModel                                          [32, 1024]                --
├─BertEmbeddings: 1-1                              [32, 512, 1024]           --
│    └─Embedding: 2-1                              [32, 512, 1024]           33,554,432
│    └─Embedding: 2-2                              [32, 512, 1024]           2,048
│    └─Embedding: 2-3                              [1, 512, 1024]            524,288
│    └─LayerNorm: 2-4                              [32, 512, 1024]           2,048
│    └─Dropout: 2-5                                [32, 512, 1024]           --
├─BertEncoder: 1-2                                 [32, 512, 1024]           --
│    └─ModuleList: 2-6                             --                        302,309,376
├─BertPooler: 1-3                                  [32, 1024]                --
│    └─Linear: 2-7                                 [32, 1024]                1,049,600


### AutoModelForClassification
* Bareの後ろにDropout-Linearをひっつけてるだけやね

In [23]:
from transformers import AutoModelForSequenceClassification
m2 = AutoModelForSequenceClassification.from_pretrained(model_name)

Some weights of the model checkpoint at cl-tohoku/bert-large-japanese were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were 

In [24]:
summary(m2, input_size=(batch_size, max_length), dtypes=[torch.int, torch.long], depth=3)

Layer (type:depth-idx)                                  Output Shape              Param #
BertForSequenceClassification                           [32, 2]                   --
├─BertModel: 1-1                                        [32, 1024]                --
│    └─BertEmbeddings: 2-1                              [32, 512, 1024]           --
│    │    └─Embedding: 3-1                              [32, 512, 1024]           33,554,432
│    │    └─Embedding: 3-2                              [32, 512, 1024]           2,048
│    │    └─Embedding: 3-3                              [1, 512, 1024]            524,288
│    │    └─LayerNorm: 3-4                              [32, 512, 1024]           2,048
│    │    └─Dropout: 3-5                                [32, 512, 1024]           --
│    └─BertEncoder: 2-2                                 [32, 512, 1024]           --
│    │    └─ModuleList: 3-6                             --                        302,309,376
│    └─BertPooler: 2-3          

### AutoModelForCausalLM

* outputのテンソル形が違う
* This is a generic model class that will be instantiated as one of the model classes of the library.
    * (with a causal language modeling head)
        * CLM : Causal Language Model ... 左から右に順々に単語を予測するモデル, https://cl.asahi.com/api_data/language_model.html

In [25]:
from transformers import AutoModelForCausalLM
m3 = AutoModelForCausalLM.from_pretrained(model_name)

If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Some weights of the model checkpoint at cl-tohoku/bert-large-japanese were not used when initializing BertLMHeadModel: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertLMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [26]:
summary(m3, input_size=(batch_size, max_length), dtypes=[torch.int, torch.long], depth=4)

Layer (type:depth-idx)                                  Output Shape              Param #
BertLMHeadModel                                         [32, 512, 32768]          --
├─BertModel: 1-1                                        [32, 512, 1024]           --
│    └─BertEmbeddings: 2-1                              [32, 512, 1024]           --
│    │    └─Embedding: 3-1                              [32, 512, 1024]           33,554,432
│    │    └─Embedding: 3-2                              [32, 512, 1024]           2,048
│    │    └─Embedding: 3-3                              [1, 512, 1024]            524,288
│    │    └─LayerNorm: 3-4                              [32, 512, 1024]           2,048
│    │    └─Dropout: 3-5                                [32, 512, 1024]           --
│    └─BertEncoder: 2-2                                 [32, 512, 1024]           289,713,152
│    │    └─ModuleList: 3-54                            --                        (recursive)
│    │    │    └─BertLa

### AutoModelForMaskedLM

In [27]:
from transformers import AutoModelForCausalLM
m4 = AutoModelForCausalLM.from_pretrained(model_name)

If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Some weights of the model checkpoint at cl-tohoku/bert-large-japanese were not used when initializing BertLMHeadModel: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertLMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [28]:
summary(m4, input_size=(batch_size, max_length), dtypes=[torch.int, torch.long], depth=1)

Layer (type:depth-idx)                                  Output Shape              Param #
BertLMHeadModel                                         [32, 512, 32768]          --
├─BertModel: 1-1                                        [32, 512, 1024]           336,392,192
├─BertOnlyMLMHead: 1-48                                 --                        (recursive)
├─BertModel: 1-49                                       --                        (recursive)
├─BertOnlyMLMHead: 1-48                                 --                        (recursive)
├─BertModel: 1-49                                       --                        (recursive)
├─BertOnlyMLMHead: 1-48                                 --                        (recursive)
├─BertModel: 1-49                                       --                        (recursive)
├─BertOnlyMLMHead: 1-48                                 --                        (recursive)
├─BertModel: 1-49                                       --               