In [1]:
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"


from transformers import (
    AutoModel,
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

In [2]:
sents = [
    "选择珠江花园的原因就是方便。",
    "笔记本的键盘确实爽。",
    "房间太小。其他的都一般。",
    "今天才知道这书还有第6卷,真有点郁闷.",
    "机器背面似乎被撕了张什么标签，残胶还在。",
]

# The model accepts several inputs, but input_ids alone is enough; it must be a 2-D tensor of shape (batch size, sequence length).
inputs = tokenizer(sents, padding=True, return_tensors="pt")
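
As a rough illustration of what `padding=True` does, the sketch below pads every sentence to the length of the longest one and builds the matching attention mask. It assumes one token per character plus a leading `[CLS]` and a trailing `[SEP]`, which works out for these five sentences but is not how the real tokenizer is implemented; all variable names are illustrative.

```python
sents = [
    "选择珠江花园的原因就是方便。",
    "笔记本的键盘确实爽。",
    "房间太小。其他的都一般。",
    "今天才知道这书还有第6卷,真有点郁闷.",
    "机器背面似乎被撕了张什么标签，残胶还在。",
]

# Assumption: one token per character, wrapped in [CLS] ... [SEP].
tokens = [["[CLS]", *sent, "[SEP]"] for sent in sents]

# padding=True pads to the longest sequence in the batch.
max_len = max(len(seq) for seq in tokens)
padded = [seq + ["[PAD]"] * (max_len - len(seq)) for seq in tokens]

# 1 marks a real token, 0 marks padding the model should ignore.
attention_mask = [
    [1] * len(seq) + [0] * (max_len - len(seq)) for seq in tokens
]

print(max_len)
print(attention_mask[1])
```

The longest sentence has 20 characters, so the padded length is 22, which matches the sequence-length dimension in the output shapes printed in this notebook.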

# Output of the AutoModel architecture

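The AutoModel architecture contains only the base Transformer module: given some inputs, it **outputs what we call hidden states**, also known as features. For each input token we get a high-dimensional vector that represents the Transformer model's contextual understanding of that token; you can think of it as a contextual word embedding.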

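1. The output is a BaseModelOutputWithPoolingAndCrossAttentions.
2. It contains two elements, 'last_hidden_state' and 'pooler_output'.
3. 'last_hidden_state' has shape (batch size, sequence length, 768).
4. 'pooler_output' has shape (batch size, 768).

> pooler_output is the vector at the [CLS] token, passed through a fully connected layer and then a tanh activation.

Although these hidden states are useful on their own, they usually serve as the input to another part of the model, called the head. In the pipeline section, the same architecture could perform different tasks because each of those tasks has a different head associated with it.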
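
The pooler step described in the note above can be sketched without any library. This is a minimal illustration, not the library's implementation: the hidden size is shrunk from 768 to 2, and the weights, bias, and hidden states are hypothetical values chosen for readability.

```python
import math


def pooler(cls_vector, weight, bias):
    """Apply a dense layer followed by tanh to the [CLS] hidden state."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, cls_vector)) + b)
        for row, b in zip(weight, bias)
    ]


# Toy last_hidden_state for one sentence: 3 tokens, hidden size 2.
last_hidden_state = [[0.5, -1.0], [0.1, 0.2], [0.3, 0.4]]

# [CLS] is the first token of the sequence.
cls_vector = last_hidden_state[0]

# Hypothetical parameters; the real pooler learns a 768 x 768 matrix.
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

pooled = pooler(cls_vector, weight, bias)
print(pooled)
```

Because of the tanh, every component of the pooled vector lies strictly between -1 and 1, and there is one pooled vector per sentence rather than one per token.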

In [3]:
model = AutoModel.from_pretrained("bert-base-chinese")
outputs = model(**inputs)
outputs.keys()
outputs.last_hidden_state.shape
outputs.pooler_output.shape

Some weights of the model checkpoint at bert-base-chinese were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


odict_keys(['last_hidden_state', 'pooler_output'])

torch.Size([5, 22, 768])

torch.Size([5, 768])

# Output of the AutoModelForMaskedLM architecture

1. The output is a MaskedLMOutput.
2. It contains a 'logits' element of shape [batch size, sequence length, 21128], where 21128 is the 'vocab_size'.
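
To make the vocabulary-sized last dimension concrete, here is a small sketch of how a masked-LM logits row is typically read: take the row at the masked position and rank the vocabulary entries by score. The five-token vocabulary, the logits, and the mask position are all hypothetical; the real model scores 21128 entries per position.

```python
# Hypothetical toy vocabulary standing in for the 21128-entry one.
vocab = ["[PAD]", "[MASK]", "好", "坏", "书"]

# Hypothetical logits for one sequence: 3 positions x 5 vocabulary entries.
logits = [
    [0.1, 0.0, 2.0, 0.5, 0.3],
    [0.0, 0.0, 0.2, 0.1, 3.0],
    [1.5, 0.0, 0.1, 0.2, 0.3],
]

# Assume the [MASK] token sits at position 1.
mask_position = 1
row = logits[mask_position]

# Rank vocabulary ids by score, highest first, and keep the top two.
top2_ids = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:2]
top2_tokens = [vocab[i] for i in top2_ids]

print(top2_tokens)
```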

In [4]:
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
outputs = model(**inputs)
outputs.keys()
outputs.logits.shape

Some weights of the model checkpoint at bert-base-chinese were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


odict_keys(['logits'])

torch.Size([5, 22, 21128])

# Output of the AutoModelForSequenceClassification architecture

1. The output is a SequenceClassifierOutput.
2. It contains a 'logits' element of shape [batch size, 2].

In [5]:
model = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese")
outputs = model(**inputs)
outputs.keys()
outputs.logits.shape

Some weights of the model checkpoint at bert-base-chinese were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

odict_keys(['logits'])

torch.Size([5, 2])

# Output of the AutoModelForTokenClassification architecture

1. The output is a TokenClassifierOutput.
2. It contains a 'logits' element of shape [batch size, sequence length, 2], where 2 is the default number of labels.

In [6]:
model = AutoModelForTokenClassification.from_pretrained("bert-base-chinese")
outputs = model(**inputs)
outputs.keys()
outputs.logits.shape

Some weights of the model checkpoint at bert-base-chinese were not used when initializing BertForTokenClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-c

odict_keys(['logits'])

torch.Size([5, 22, 2])

# Interpreting the model's output logits

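Logits are the raw, unnormalized scores produced by the model's last layer. To turn them into probabilities, pass them through a softmax. (**All 🤗 Transformers models output logits, because the loss function used for training usually fuses the last activation function, such as SoftMax, with the actual loss function, such as cross-entropy.**)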
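
The fusion mentioned above can be checked numerically. The sketch below is a dependency-free illustration, not PyTorch's implementation: it computes cross-entropy once as softmax followed by negative log, and once directly from the logits with the log-sum-exp identity. The logits and label are made-up values.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_two_step(logits, label):
    """Softmax first, then the negative log-probability of the true label."""
    return -math.log(softmax(logits)[label])


def cross_entropy_fused(logits, label):
    """Same loss computed straight from the logits via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


logits = [2.0, -1.0]  # made-up scores for a two-class problem
label = 0

probs = softmax(logits)
loss_two_step = cross_entropy_two_step(logits, label)
loss_fused = cross_entropy_fused(logits, label)

print(probs, loss_two_step, loss_fused)
```

Both losses agree, which is why a training loop can consume logits directly and softmax is only needed when probabilities are wanted for inspection.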

In [7]:
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# The result is now a set of interpretable probability scores
# print(predictions)

## To get the label for each position, inspect the id2label attribute of the model config

In [8]:
model.config.id2label

{0: 'LABEL_0', 1: 'LABEL_1'}
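
A minimal sketch of how this mapping is typically applied: take the index of the highest probability at each position and look it up in `id2label`. The probabilities below are hypothetical stand-ins for one three-token slice of `predictions`. Because the classification head here is freshly initialized, real predictions from this notebook would not be meaningful before fine-tuning.

```python
# Mapping as printed by model.config.id2label above.
id2label = {0: "LABEL_0", 1: "LABEL_1"}

# Hypothetical softmax output for one sentence with three tokens.
probs = [
    [0.9, 0.1],
    [0.3, 0.7],
    [0.5, 0.5],
]

# Argmax per token; ties resolve to the lowest index.
pred_ids = [max(range(len(row)), key=row.__getitem__) for row in probs]
labels = [id2label[i] for i in pred_ids]

print(labels)
```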