#### ALBERT: A Lite BERT
1. As its full name suggests, ALBERT is a lightweight version of the BERT model.

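2. Factorized embedding parameterization: the V * H embedding matrix (V: vocabulary size, H: hidden size) is factorized into a V * E matrix and an E * H matrix, so the embedding parameter count drops from V * H to V * E + E * H. With E = 128, for example, this reduces the number of parameters substantially.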

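3. Cross-layer parameter sharing: the attention and fully connected (feed-forward) parameters are shared across all 12 layers.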

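4. The SOP (sentence-order prediction) task replaces the NSP (next sentence prediction) task:

- SOP uses two sentences from the same document in reversed order as the negative example. During pretraining, the model predicts whether a sentence pair is in its original order or swapped. Because both sentences come from the same document, the task cannot be solved by topic prediction, so the model has to learn the harder coherence prediction. Experiments show that SOP yields larger gains than NSP.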
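
To make the SOP idea concrete, here is a minimal sketch of how such sentence pairs could be built from one document. It is an illustration only, not the ALBERT pretraining code: the helper `make_sop_pairs`, the 50/50 swap probability, and the label convention (0 for original order, 1 for swapped) are all assumptions of this sketch.

```python
import random


def make_sop_pairs(sentences, seed=0):
    """Build (sentence_a, sentence_b, label) triples from one document.

    Hypothetical helper for illustration.
    label 0: the pair is in its original order
    label 1: the pair has been swapped
    """
    rng = random.Random(seed)
    pairs = []
    # Walk over consecutive sentence pairs of the same document
    for first, second in zip(sentences, sentences[1:]):
        if rng.random() < 0.5:
            pairs.append((first, second, 0))  # positive: original order
        else:
            pairs.append((second, first, 1))  # negative: swapped order
    return pairs


document = [
    "ALBERT factorizes the embedding matrix.",
    "It also shares parameters across layers.",
    "Both changes reduce the parameter count.",
]

pairs = make_sop_pairs(document)
for sentence_a, sentence_b, label in pairs:
    print(label, "|", sentence_a, "|", sentence_b)
```

Both sentences of every pair come from the same document, so the topic is identical for positive and negative examples and only the order carries the signal.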

In [6]:
from transformers import AlbertModel
from transformers import AlbertTokenizer

In [8]:
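# Local directory containing the albert-base-v2 checkpoint
pretrained_model = '../../models/albert-base-v2/'

# Load the tokenizer and the bare encoder (no pretraining heads)
tokenizer = AlbertTokenizer.from_pretrained(pretrained_model)
model = AlbertModel.from_pretrained(pretrained_model)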

Some weights of the model checkpoint at ../../models/albert-base-v2/ were not used when initializing AlbertModel: ['predictions.LayerNorm.bias', 'predictions.decoder.weight', 'predictions.LayerNorm.weight', 'predictions.dense.bias', 'predictions.dense.weight', 'predictions.bias', 'predictions.decoder.bias']
- This IS expected if you are initializing AlbertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [9]:
model

AlbertModel(
  (embeddings): AlbertEmbeddings(
    (word_embeddings): Embedding(30000, 128, padding_idx=0)
    (position_embeddings): Embedding(512, 128)
    (token_type_embeddings): Embedding(2, 128)
    (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0, inplace=False)
  )
  (encoder): AlbertTransformer(
    (embedding_hidden_mapping_in): Linear(in_features=128, out_features=768, bias=True)
    (albert_layer_groups): ModuleList(
      (0): AlbertLayerGroup(
        (albert_layers): ModuleList(
          (0): AlbertLayer(
            (full_layer_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (attention): AlbertAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (attention_dropout): Dropout(p=0, inplace=False)
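
The shapes in the printout above are enough to check the parameter savings by hand. The following is a rough sketch of that arithmetic, not a full parameter count: V, E, and H are read off the `Embedding(30000, 128)` and `Linear(in_features=128, out_features=768)` entries, the layer count of 12 comes from the notes above, and only the query/key/value projections are counted for the sharing comparison because the rest of the layer is cut off in the printout. The bias of the 128 to 768 projection is ignored to match the V * E + E * H formula.

```python
# Sizes taken from the model printout
V = 30000        # vocabulary size
E = 128          # embedding size
H = 768          # hidden size
num_layers = 12  # number of encoder layers (from the notes above)

# Word embedding parameters: BERT-style V * H vs. ALBERT-style V * E + E * H
unfactorized = V * H
factorized = V * E + E * H

# query, key and value are each Linear(H, H) with a bias vector
qkv_per_layer = 3 * (H * H + H)
qkv_unshared = num_layers * qkv_per_layer  # one copy per layer
qkv_shared = qkv_per_layer                 # one copy reused by every layer

print(f"embedding, unfactorized: {unfactorized:,}")
print(f"embedding, factorized:   {factorized:,}")
print(f"q/k/v, unshared:         {qkv_unshared:,}")
print(f"q/k/v, shared:           {qkv_shared:,}")
```

With these sizes the factorized embedding needs roughly 3.9M parameters instead of about 23M, and sharing divides the q/k/v count by the number of layers.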
      

In [10]:
text = "Replace me by any text you'd like."
# Tokenize into PyTorch tensors, then run a forward pass through the encoder
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

In [18]:
output

BaseModelOutputWithPooling(last_hidden_state=tensor([[[ 1.0633,  0.6634,  1.2338,  ..., -1.5131, -0.4445,  1.2011],
         [-0.2914, -0.5385, -1.6138,  ...,  0.2044,  2.1072, -0.3526],
         [ 0.3940,  0.8559, -0.5069,  ...,  0.8633,  0.4893,  0.2798],
         ...,
         [ 0.4754, -1.4797, -0.7564,  ...,  1.2648,  1.6309,  0.4099],
         [ 0.0298,  0.1406,  0.2338,  ..., -0.2372,  0.6055, -0.0437],
         [ 0.0726,  0.1270, -0.0512,  ..., -0.0985,  0.1229,  0.2115]]],
       grad_fn=<NativeLayerNormBackward0>), pooler_output=tensor([[-1.3040e-01,  1.0424e-01,  3.9711e-01, -4.7383e-01, -1.3758e-02,
         -9.8533e-01, -5.0131e-02,  1.2261e-02, -5.6495e-02, -9.9441e-01,
          9.4768e-01, -1.4119e-01, -5.6679e-01, -8.5202e-01, -8.8070e-01,
          2.3836e-01, -1.9595e-01, -1.6009e-01,  9.9556e-01,  6.8593e-02,
         -6.8246e-01, -9.9791e-01,  9.9637e-01,  9.3427e-01,  9.4481e-01,
          1.9624e-01, -1.2116e-01, -9.9659e-01, -9.5589e-01,  1.7546e-01,
         -9

In [22]:
from transformers import pipeline

# Build a fill-mask pipeline from the same local checkpoint
unmasker = pipeline('fill-mask', model=pretrained_model)
unmasker("spaCy is [MASK] to help you do real work — to build real products,")

[{'score': 0.3715081512928009,
  'token': 1380,
  'token_str': 'meant',
  'sequence': 'spacy is meant to help you do real work  to build real products,'},
 {'score': 0.21246449649333954,
  'token': 2293,
  'token_str': 'supposed',
  'sequence': 'spacy is supposed to help you do real work  to build real products,'},
 {'score': 0.027506107464432716,
  'token': 1006,
  'token_str': 'designed',
  'sequence': 'spacy is designed to help you do real work  to build real products,'},
 {'score': 0.01515924371778965,
  'token': 301,
  'token_str': 'something',
  'sequence': 'spacy is something to help you do real work  to build real products,'},
 {'score': 0.013648479245603085,
  'token': 677,
  'token_str': 'done',
  'sequence': 'spacy is done to help you do real work  to build real products,'}]