<a href="https://colab.research.google.com/github/sophia-zhang-qwq/model-tutorials/blob/main/ALBERT_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
from transformers import pipeline
from transformers import AlbertTokenizer, AlbertModel

In [5]:
# pipeline('fill-mask', model='albert-xlarge-v2') := pipeline is a high-level API from the transformers library
#   - 'fill-mask' := use a masked language modeling (MLM) pipeline; the model predicts the token that belongs in the [MASK] placeholder
#   - model='albert-xlarge-v2' := pretrained ALBERT (A Lite BERT) model; cross-layer parameter sharing and a factorized embedding give it far fewer parameters than BERT-large, which saves memory but not compute (the xlarge variant is not faster than BERT-large)

# unmasker("Hello I'm a [MASK] model.") := returns a list of the top predictions (5 by default), each a dict with the confidence score, the token id, the predicted token string and the full sentence with the mask filled in
unmasker = pipeline('fill-mask', model='albert-xlarge-v2')
unmasker("Hello I'm a [MASK] model.")

Some weights of the model checkpoint at albert-xlarge-v2 were not used when initializing AlbertForMaskedLM: ['albert.pooler.bias', 'albert.pooler.weight']
- This IS expected if you are initializing AlbertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'score': 0.014184744097292423,
  'token': 3161,
  'token_str': 'fashion',
  'sequence': "hello i'm a fashion model."},
 {'score': 0.012617520056664944,
  'token': 21359,
  'token_str': 'bikini',
  'sequence': "hello i'm a bikini model."},
 {'score': 0.011719798669219017,
  'token': 17089,
  'token_str': 'nude',
  'sequence': "hello i'm a nude model."},
 {'score': 0.011604255065321922,
  'token': 3679,
  'token_str': 'beauty',
  'sequence': "hello i'm a beauty model."},
 {'score': 0.011053560301661491,
  'token': 9541,
  'token_str': 'celebrity',
  'sequence': "hello i'm a celebrity model."}]
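
Each prediction is a plain dict, so the list can be post-processed with ordinary Python. A minimal sketch, assuming the list has exactly the shape printed above: the helper `summarize_predictions` is hypothetical (not part of `transformers`), and the five dicts are copied from the output above; in the notebook you would pass `unmasker(...)` directly. The fill-mask pipeline also accepts a `top_k` argument (e.g. `unmasker(text, top_k=10)`) if you want more or fewer candidates.

```python
# Output of unmasker("Hello I'm a [MASK] model.") as printed above.
predictions = [
    {'score': 0.014184744097292423, 'token': 3161, 'token_str': 'fashion',
     'sequence': "hello i'm a fashion model."},
    {'score': 0.012617520056664944, 'token': 21359, 'token_str': 'bikini',
     'sequence': "hello i'm a bikini model."},
    {'score': 0.011719798669219017, 'token': 17089, 'token_str': 'nude',
     'sequence': "hello i'm a nude model."},
    {'score': 0.011604255065321922, 'token': 3679, 'token_str': 'beauty',
     'sequence': "hello i'm a beauty model."},
    {'score': 0.011053560301661491, 'token': 9541, 'token_str': 'celebrity',
     'sequence': "hello i'm a celebrity model."},
]

def summarize_predictions(preds, ndigits=4):
    """Hypothetical helper: (token_str, rounded score) pairs, highest score first."""
    ranked = sorted(preds, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], ndigits)) for p in ranked]

for token, score in summarize_predictions(predictions):
    print(f"{token:>10}  {score:.4f}")
```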

In [7]:
# PyTorch version
# pretrained on a large corpus (BookCorpus + English Wikipedia)
tokenizer = AlbertTokenizer.from_pretrained('albert-xlarge-v2')
model = AlbertModel.from_pretrained("albert-xlarge-v2")
text = "Replace me by any text you'd like."
# tokenize the input into the format the model accepts := outputs a dict with keys like 'input_ids' and 'attention_mask'
#{
#  'input_ids': tensor([[ 2, 4123, 456, ... , 3]]),  # token IDs
#  'attention_mask': tensor([[1, 1, 1, ..., 1]])     # padding mask
#}
# attention mask := a tensor of 0/1 that tells the model which tokens in the input are real tokens (1) and which are just padding (0)
# return_tensors='pt' := return PyTorch tensors
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
# output.last_hidden_state := tensor of shape (batch_size, sequence_length, hidden_size), with hidden_size = 2048 for albert-xlarge-v2; these contextual token embeddings can feed downstream tasks such as classification, clustering and similarity
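
The attention mask only matters once several texts of different lengths are batched together. A pure-Python sketch of what padding does (roughly what `tokenizer([...], padding=True)` produces with right-side padding); `pad_batch` is a hypothetical helper, the token ids are made up, and the pad id 0 is an assumption, so in practice read it from `tokenizer.pad_token_id`.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token-id lists to equal length and build 0/1 attention masks.

    pad_id=0 is an assumption; in practice use tokenizer.pad_token_id.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical ids: 2 and 3 play the roles of [CLS] and [SEP], as in the comment above.
batch = pad_batch([[2, 4123, 456, 3], [2, 987, 3]])
print(batch)
# {'input_ids': [[2, 4123, 456, 3], [2, 987, 3, 0]], 'attention_mask': [[1, 1, 1, 1], [1, 1, 1, 0]]}
```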

In [8]:
output

BaseModelOutputWithPooling(last_hidden_state=tensor([[[ 0.1278,  0.1512, -0.2929,  ...,  0.1888, -0.0011, -0.4852],
         [ 0.2171,  0.0129,  0.1571,  ..., -1.0309, -0.8904,  0.0414],
         [ 0.4438,  0.1890, -0.4023,  ..., -0.3362, -0.3815,  0.4132],
         ...,
         [ 0.0933,  0.1535, -0.4404,  ..., -0.3890, -0.1656, -0.7551],
         [-0.2673, -0.6808, -0.2891,  ...,  0.1444,  0.5476, -0.4405],
         [ 0.2998, -0.0307, -0.4272,  ..., -0.1084,  0.0254, -0.1646]]],
       grad_fn=<NativeLayerNormBackward0>), pooler_output=tensor([[-0.0501, -0.7932,  0.8507,  ...,  0.9916, -0.9576, -0.8506]],
       grad_fn=<TanhBackward0>), hidden_states=None, attentions=None)
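
`last_hidden_state` holds one vector per token. One common way (not the only one; `pooler_output` is another) to turn it into a single sentence vector for clustering or similarity is to average the token vectors while ignoring padding via the attention mask, then compare sentences with cosine similarity. A minimal NumPy sketch, assuming the arrays come from `output.last_hidden_state.detach().numpy()` and `encoded_input['attention_mask'].numpy()`; the tiny arrays below are made up so the example runs without loading the model.

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sentence, skipping padding positions.

    last_hidden_state: (batch_size, sequence_length, hidden_size)
    attention_mask:    (batch_size, sequence_length), 1 = real token, 0 = padding
    returns:           (batch_size, hidden_size)
    """
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy batch: 2 sentences, 3 positions, hidden_size 2; the second sentence ends in padding.
hidden = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                   [[2.0, 0.0], [0.0, 2.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 1], [1, 1, 0]])

emb = mean_pool(hidden, mask)
print(emb)
print(cosine_similarity(emb[0], emb[1]))
```

The padded position `[9.0, 9.0]` does not affect the second sentence's average, which is exactly what the mask is for.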

In [9]:
# TensorFlow version
from transformers import AlbertTokenizer, TFAlbertModel
tokenizer = AlbertTokenizer.from_pretrained('albert-xlarge-v2')
model = TFAlbertModel.from_pretrained("albert-xlarge-v2")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
output

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFAlbertModel: ['predictions.bias', 'predictions.LayerNorm.weight', 'predictions.decoder.bias', 'predictions.dense.weight', 'predictions.LayerNorm.bias', 'predictions.dense.bias']
- This IS expected if you are initializing TFAlbertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFAlbertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFAlbertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFAlbertModel for predictions without further training.


TFBaseModelOutputWithPooling(last_hidden_state=<tf.Tensor: shape=(1, 12, 2048), dtype=float32, numpy=
array([[[ 0.12811783,  0.15173063, -0.2925173 , ...,  0.18904257,
         -0.00168011, -0.4854886 ],
        [ 0.21841048,  0.01373574,  0.15615556, ..., -1.0316228 ,
         -0.88968724,  0.03939243],
        [ 0.4439874 ,  0.18860255, -0.4020402 , ..., -0.33657092,
         -0.38227516,  0.4128933 ],
        ...,
        [ 0.09349871,  0.15469165, -0.44016474, ..., -0.38967422,
         -0.16517994, -0.75454545],
        [-0.26725313, -0.6817505 , -0.28828767, ...,  0.14476536,
          0.5475564 , -0.44049996],
        [ 0.2994498 , -0.03046895, -0.42721096, ..., -0.10833467,
          0.02565985, -0.16472901]]], dtype=float32)>, pooler_output=<tf.Tensor: shape=(1, 2048), dtype=float32, numpy=
array([[-0.05222906, -0.79243517,  0.8505734 , ...,  0.991507  ,
        -0.95753324, -0.8505196 ]], dtype=float32)>, hidden_states=None, attentions=None)
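
The log says the TensorFlow model was initialized from the PyTorch weights, so its outputs should match the PyTorch cell up to small floating-point differences. A quick sanity check in NumPy using the visible values printed by the two cells (first token of `last_hidden_state` and `pooler_output`). Because the PyTorch values are shown only to 4 decimals, the tolerance must be loose; `atol=5e-3` is an arbitrary choice. With the full tensors you would compare `pt_output.last_hidden_state.detach().numpy()` against `output.last_hidden_state.numpy()` the same way, where `pt_output` is a hypothetical name for the PyTorch result (the notebook reuses `output` for both).

```python
import numpy as np

# Values copied from the printed outputs above (visible entries only).
pt_first_token = np.array([0.1278, 0.1512, -0.2929, 0.1888, -0.0011, -0.4852])
tf_first_token = np.array([0.12811783, 0.15173063, -0.2925173, 0.18904257, -0.00168011, -0.4854886])

pt_pooler = np.array([-0.0501, -0.7932, 0.8507, 0.9916, -0.9576, -0.8506])
tf_pooler = np.array([-0.05222906, -0.79243517, 0.8505734, 0.991507, -0.95753324, -0.8505196])

for name, pt_vals, tf_vals in [("first token", pt_first_token, tf_first_token),
                               ("pooler_output", pt_pooler, tf_pooler)]:
    max_diff = np.max(np.abs(pt_vals - tf_vals))
    close = np.allclose(pt_vals, tf_vals, atol=5e-3)
    print(f"{name}: max |PT - TF| = {max_diff:.4f}, close = {close}")
```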

In [10]:
unmasker = pipeline('fill-mask', model='albert-xlarge-v2')
unmasker("The man worked as a [MASK].")

[{'score': 0.048737939447164536,
  'token': 16661,
  'token_str': 'policeman',
  'sequence': 'the man worked as a policeman .'},
 {'score': 0.04362059012055397,
  'token': 20957,
  'token_str': 'salesman',
  'sequence': 'the man worked as a salesman .'},
 {'score': 0.037947267293930054,
  'token': 17304,
  'token_str': 'waiter',
  'sequence': 'the man worked as a waiter .'},
 {'score': 0.03099907748401165,
  'token': 2197,
  'token_str': 'teacher',
  'sequence': 'the man worked as a teacher .'},
 {'score': 0.02626478113234043,
  'token': 5425,
  'token_str': 'nurse',
  'sequence': 'the man worked as a nurse .'}]

In [16]:
unmasker("The woman worked as a [MASK].")

[{'score': 0.1342538595199585,
  'token': 5425,
  'token_str': 'nurse',
  'sequence': 'the woman worked as a nurse .'},
 {'score': 0.05892161652445793,
  'token': 2197,
  'token_str': 'teacher',
  'sequence': 'the woman worked as a teacher .'},
 {'score': 0.050817396491765976,
  'token': 13678,
  'token_str': 'waitress',
  'sequence': 'the woman worked as a waitress .'},
 {'score': 0.040556952357292175,
  'token': 22740,
  'token_str': 'nanny',
  'sequence': 'the woman worked as a nanny .'},
 {'score': 0.025493761524558067,
  'token': 1386,
  'token_str': 'secretary',
  'sequence': 'the woman worked as a secretary .'}]
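
The two prompts differ only in "man" vs "woman", yet the predicted occupations split along gender stereotypes (policeman/salesman vs. waitress/nanny/secretary), a bias presumably picked up from the pretraining text. A small sketch that compares two fill-mask outputs token by token; the scores are copied from the two cells above and `compare_predictions` is a hypothetical helper.

```python
# {token_str: score} from the "man" and "woman" outputs above.
man = {"policeman": 0.048737939447164536, "salesman": 0.04362059012055397,
       "waiter": 0.037947267293930054, "teacher": 0.03099907748401165,
       "nurse": 0.02626478113234043}
woman = {"nurse": 0.1342538595199585, "teacher": 0.05892161652445793,
         "waitress": 0.050817396491765976, "nanny": 0.040556952357292175,
         "secretary": 0.025493761524558067}

def compare_predictions(a, b):
    """Split two {token: score} maps into shared and exclusive tokens.

    For shared tokens, return the ratio score_b / score_a.
    """
    shared = {t: b[t] / a[t] for t in a.keys() & b.keys()}
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return shared, only_a, only_b

shared, only_man, only_woman = compare_predictions(man, woman)
print("only 'man':  ", only_man)
print("only 'woman':", only_woman)
for token, ratio in sorted(shared.items()):
    print(f"{token}: woman/man score ratio = {ratio:.1f}")
```

Both shared occupations score higher in the "woman" prompt: "nurse" about 5x and "teacher" about 1.9x.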

In [15]:
unmasker = pipeline('fill-mask', model='albert-xlarge-v2')
unmasker("An is the most [MASK] girl in the world!")

[{'score': 0.45959097146987915,
  'token': 1632,
  'token_str': 'beautiful',
  'sequence': 'an is the most beautiful girl in the world!'},
 {'score': 0.1138354167342186,
  'token': 5289,
  'token_str': 'amazing',
  'sequence': 'an is the most amazing girl in the world!'},
 {'score': 0.04539617896080017,
  'token': 5934,
  'token_str': 'wonderful',
  'sequence': 'an is the most wonderful girl in the world!'},
 {'score': 0.04238549619913101,
  'token': 10048,
  'token_str': 'gorgeous',
  'sequence': 'an is the most gorgeous girl in the world!'},
 {'score': 0.03133658319711685,
  'token': 9583,
  'token_str': 'precious',
  'sequence': 'an is the most precious girl in the world!'}]

In [31]:
unmasker = pipeline('fill-mask', model='albert-xlarge-v2')
unmasker("An apple is a round, edible fruit produced by an [MASK] tree!")

[{'score': 0.9743994474411011,
  'token': 4037,
  'token_str': 'apple',
  'sequence': 'an apple is a round, edible fruit produced by an apple tree!'},
 {'score': 0.007509846705943346,
  'token': 23210,
  'token_str': 'edible',
  'sequence': 'an apple is a round, edible fruit produced by an edible tree!'},
 {'score': 0.0020631684456020594,
  'token': 3541,
  'token_str': 'oak',
  'sequence': 'an apple is a round, edible fruit produced by an oak tree!'},
 {'score': 0.001839998411014676,
  'token': 18993,
  'token_str': 'evergreen',
  'sequence': 'an apple is a round, edible fruit produced by an evergreen tree!'},
 {'score': 0.0012297132052481174,
  'token': 12635,
  'token_str': 'orchard',
  'sequence': 'an apple is a round, edible fruit produced by an orchard tree!'}]
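
Each `score` is the softmax probability the model assigns to that token at the masked position, over ALBERT's whole vocabulary. Summing the top-5 scores therefore shows how peaked the distribution is. For the apple sentence almost 99% of the mass goes to the listed candidates, nearly all of it to "apple", while for "Hello I'm a [MASK] model." the top five together get only about 6%. A small sketch using scores copied from the two outputs; `confidence_summary` is a hypothetical helper.

```python
# Top-5 scores copied from the outputs above.
hello_model = [0.014184744097292423, 0.012617520056664944, 0.011719798669219017,
               0.011604255065321922, 0.011053560301661491]
apple_tree = [0.9743994474411011, 0.007509846705943346, 0.0020631684456020594,
              0.001839998411014676, 0.0012297132052481174]

def confidence_summary(scores):
    """Return (probability mass of the listed candidates, top-1 / top-2 score ratio)."""
    ranked = sorted(scores, reverse=True)
    return sum(ranked), ranked[0] / ranked[1]

for label, scores in [("Hello I'm a [MASK] model.", hello_model),
                      ("... produced by an [MASK] tree!", apple_tree)]:
    mass, ratio = confidence_summary(scores)
    print(f"{label:<35} top-5 mass = {mass:.3f}, top1/top2 = {ratio:.1f}")
```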

In [25]:
unmasker = pipeline('fill-mask', model='albert-xlarge-v2')
unmasker("Wang Zitao is the most [MASK] person in the world! ")

[{'score': 0.1223713681101799,
  'token': 1632,
  'token_str': 'beautiful',
  'sequence': 'wang zitao is the most beautiful person in the world!'},
 {'score': 0.0850471630692482,
  'token': 5289,
  'token_str': 'amazing',
  'sequence': 'wang zitao is the most amazing person in the world!'},
 {'score': 0.048757363110780716,
  'token': 5934,
  'token_str': 'wonderful',
  'sequence': 'wang zitao is the most wonderful person in the world!'},
 {'score': 0.04363106191158295,
  'token': 2177,
  'token_str': 'powerful',
  'sequence': 'wang zitao is the most powerful person in the world!'},
 {'score': 0.03813674673438072,
  'token': 7472,
  'token_str': 'brilliant',
  'sequence': 'wang zitao is the most brilliant person in the world!'}]

In [13]:
unmasker = pipeline('fill-mask', model='albert-xlarge-v2')
unmasker("Zitao is the most [MASK] person in the world! He is the only light as I stumble in the midst of total darkness.")

[{'score': 0.19802334904670715,
  'token': 5289,
  'token_str': 'amazing',
  'sequence': 'zitao is the most amazing person in the world! he is the only light as i stumble in the midst of total darkness.'},
 {'score': 0.12309816479682922,
  'token': 5934,
  'token_str': 'wonderful',
  'sequence': 'zitao is the most wonderful person in the world! he is the only light as i stumble in the midst of total darkness.'},
 {'score': 0.0689011737704277,
  'token': 18224,
  'token_str': 'comforting',
  'sequence': 'zitao is the most comforting person in the world! he is the only light as i stumble in the midst of total darkness.'},
 {'score': 0.06749662756919861,
  'token': 1632,
  'token_str': 'beautiful',
  'sequence': 'zitao is the most beautiful person in the world! he is the only light as i stumble in the midst of total darkness.'},
 {'score': 0.04112880304455757,
  'token': 2177,
  'token_str': 'powerful',
  'sequence': 'zitao is the most powerful person in the world! he is the only light as