### Getting started with pipeline

The easiest way to use a pretrained model on a given task is to use pipeline. 🤗 Transformers provides the following tasks out of the box:

Sentiment analysis: is a text positive or negative?
Text generation (in English): provide a prompt and the model will generate what follows.
Named entity recognition (NER): in an input sentence, label each word with the entity it represents (person, place, etc.)
Question answering: provide the model with some context and a question, extract the answer from the context.
Filling masked text: given a text with masked words (e.g., replaced by [MASK]), fill the blanks.
Summarization: generate a summary of a long text.
Translation: translate a text into another language.
Feature extraction: return a tensor representation of the text.
Let's see how this works for sentiment analysis (the other tasks are all covered in the task summary):



In [2]:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [3]:
classifier('I am happy to show you the 🤗 Transformers library.')

[{'label': 'POSITIVE', 'score': 0.9997319579124451}]

In [4]:
classifier('where can I find stability')

[{'label': 'NEGATIVE', 'score': 0.842963695526123}]

In [6]:
classifier('my sadness is my strength')

[{'label': 'POSITIVE', 'score': 0.9979116320610046}]

You can use it on a list of sentences, which will be preprocessed and then fed to the model as a batch, returning a list of dictionaries like this one:

In [7]:
# Classify several sentences in one call by passing a list of strings.
results = classifier(["We are very happy to show you the 🤗 Transformers library.",
                      "We hope you don't hate it."])
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309


You can see the second sentence has been classified as negative (the model has to pick either positive or negative), but its score of about 0.53 is fairly neutral.
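Because the pipeline always returns one of the two labels, a score close to 0.5 is the only sign that the model is unsure. Here is a minimal sketch of one way to surface such cases, using rounded copies of the scores printed above. The `flag_uncertain` helper and the 0.6 threshold are illustrative assumptions, not part of 🤗 Transformers:

```python
# Rounded copies of the pipeline results printed above.
results = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.5309},
]


def flag_uncertain(predictions, threshold=0.6):
    """Hypothetical helper: replace the label with 'UNCERTAIN' when the
    winning score is below `threshold` (an arbitrary cut-off)."""
    flagged = []
    for prediction in predictions:
        confident = prediction["score"] >= threshold
        label = prediction["label"] if confident else "UNCERTAIN"
        flagged.append({"label": label, "score": prediction["score"]})
    return flagged


for item in flag_uncertain(results):
    print(f"label: {item['label']}, with score: {item['score']}")
```

The right threshold depends on the application, so it should be tuned on labelled examples rather than taken from this sketch.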

By default, the model downloaded for this pipeline is called "distilbert-base-uncased-finetuned-sst-2-english". We can look at its model page to get more information about it. It uses the DistilBERT architecture and has been fine-tuned on a dataset called SST-2 for the sentiment analysis task.

Let's say we want to use another model; for instance, one that has been trained on French data. We can search through the model hub that gathers models pretrained on a lot of data by research labs, but also community models (usually fine-tuned versions of those big models on a specific dataset). Applying the tags "French" and "text-classification" gives back a suggestion "nlptown/bert-base-multilingual-uncased-sentiment". Let's see how we can use it.

You can directly pass the name of the model to use to pipeline:

In [8]:
classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")


In [9]:
classifier('Je suis très heureux de montrer mon travail')

[{'label': '5 stars', 'score': 0.643379271030426}]
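Unlike the default model, this one reports its prediction as a star rating string rather than POSITIVE or NEGATIVE. If a numeric rating is more convenient, the label can be parsed. The sketch below assumes every label starts with the number of stars, as in the '5 stars' output above. The `stars_to_int` helper is hypothetical and not part of the library:

```python
import re


def stars_to_int(label):
    """Hypothetical helper: extract the leading integer from a label
    such as '5 stars' or '1 star'."""
    match = re.match(r"\s*(\d+)\s+stars?\s*$", label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return int(match.group(1))


# Prediction copied from the pipeline output shown above.
prediction = {"label": "5 stars", "score": 0.643379271030426}
rating = stars_to_int(prediction["label"])
print(rating, round(prediction["score"], 4))
```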

Now, to download the model and tokenizer we found previously, we just have to use the from_pretrained method of TFAutoModelForSequenceClassification (the TensorFlow counterpart of AutoModelForSequenceClassification) and of AutoTokenizer (feel free to replace model_name by any other model from the model hub):

In [11]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

In [12]:
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
# This model only exists in PyTorch, so we use the `from_pt` flag to import that model in TensorFlow.
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

All the weights of TFBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [13]:
classifier("I am a awesome girl")

[{'label': '5 stars', 'score': 0.8656461238861084}]

In [14]:
inputs = tokenizer("We are very happy to show you the 🤗 Transformers library.")

This returns a dictionary mapping strings to lists of ints. It contains the ids of the tokens, but also additional arguments that will be useful to the model. Here, for instance, we also have an attention mask that the model will use to have a better understanding of the sequence:



In [15]:
inputs

{'input_ids': [101, 11312, 10320, 12495, 19308, 10114, 11391, 10855, 10103, 100, 58263, 13299, 119, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

In [19]:
tf_batch = tokenizer(
    ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="tf"
)

In [20]:
for key, value in tf_batch.items():
    print(f"{key}: {value.numpy().tolist()}")

input_ids: [[101, 11312, 10320, 12495, 19308, 10114, 11391, 10855, 10103, 100, 58263, 13299, 119, 102], [101, 11312, 18763, 10855, 11530, 112, 162, 39487, 10197, 119, 102, 0, 0, 0]]
token_type_ids: [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
attention_mask: [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
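The second sentence is shorter than the first, so its ids are padded with zeros and its attention mask marks the padded positions with 0. The following pure-Python sketch reproduces that behaviour for illustration only. The `pad_batch` helper is hypothetical, and the padding id of 0 is an assumption read off the output above. In practice the tokenizer does this for you when `padding=True`:

```python
def pad_batch(sequences, pad_id=0):
    """Pad every list of token ids to the longest length in the batch and
    build the matching attention mask (1 = real token, 0 = padding)."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Unpadded token ids of the two sentences, copied from the output above.
first = [101, 11312, 10320, 12495, 19308, 10114, 11391,
         10855, 10103, 100, 58263, 13299, 119, 102]
second = [101, 11312, 18763, 10855, 11530, 112, 162, 39487, 10197, 119, 102]

batch = pad_batch([first, second])
print(batch["input_ids"][1])
print(batch["attention_mask"][1])
```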


In [21]:
tf_outputs = model(tf_batch)

In 🤗 Transformers, all model outputs can be indexed like tuples (possibly with only one element). Here, the output contains just the final activations of the model, the logits.

In [22]:
print(tf_outputs)

TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[-2.6222007 , -2.7745318 , -0.8966622 ,  2.0137324 ,  3.3063853 ],
       [ 0.00635777, -0.12577419, -0.05034586, -0.16553022,  0.13285828]],
      dtype=float32)>, hidden_states=None, attentions=None)


In [24]:
import tensorflow as tf
tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)

In [24]:
tf_predictions

<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[0.00205668, 0.00176608, 0.01154936, 0.2120929 , 0.77253497],
       [0.20841917, 0.18262216, 0.19692987, 0.17550425, 0.23652449]],
      dtype=float32)>
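To make the softmax step concrete, the sketch below recomputes the probabilities from the logits printed earlier using only the standard library, then picks the most likely class for each sentence. The mapping from class index i to a rating of i + 1 stars is an assumption based on the '5 stars' labels returned by the pipeline:

```python
import math

# Logits printed by the model above, one row per sentence.
logits = [
    [-2.6222007, -2.7745318, -0.8966622, 2.0137324, 3.3063853],
    [0.00635777, -0.12577419, -0.05034586, -0.16553022, 0.13285828],
]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = [softmax(row) for row in logits]

for row in probabilities:
    best = max(range(len(row)), key=row.__getitem__)
    # Assumption: class index i corresponds to a rating of i + 1 stars.
    print(f"{best + 1} stars, probability {row[best]:.4f}")
```

Subtracting the row maximum before exponentiating does not change the result but avoids overflow for large logits.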

Models are standard torch.nn.Module or tf.keras.Model objects, so you can use them in your usual training loop. 🤗 Transformers also provides a Trainer (or TFTrainer if you are using TensorFlow) class to help with your training (taking care of things such as distributed training, mixed precision, etc.). See the training tutorial for more details. If you pass labels to the model along with the inputs, the output also includes the loss:

In [27]:
import tensorflow as tf
tf_outputs = model(tf_batch, labels = tf.constant([1, 0]))

In [28]:
tf_outputs

TFSequenceClassifierOutput(loss=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([6.3389955, 1.568204 ], dtype=float32)>, logits=<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[-2.6222007 , -2.7745318 , -0.8966622 ,  2.0137324 ,  3.3063853 ],
       [ 0.00635777, -0.12577419, -0.05034586, -0.16553022,  0.13285828]],
      dtype=float32)>, hidden_states=None, attentions=None)
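The per-example loss values above are consistent with a standard cross-entropy computed from the logits. A minimal sketch, assuming the loss for each sentence is the negative log of the softmax probability assigned to its label (this is inferred from the printed numbers, not taken from the library source):

```python
import math

# Logits and labels copied from the cells above.
logits = [
    [-2.6222007, -2.7745318, -0.8966622, 2.0137324, 3.3063853],
    [0.00635777, -0.12577419, -0.05034586, -0.16553022, 0.13285828],
]
labels = [1, 0]


def cross_entropy(row, label):
    """Negative log-probability of `label` under softmax(row),
    computed with the log-sum-exp trick for numerical stability."""
    shift = max(row)
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in row))
    return log_sum_exp - row[label]


losses = [cross_entropy(row, label) for row, label in zip(logits, labels)]
print([round(loss, 4) for loss in losses])
```

The first loss is large because the model puts almost all of its probability on 4 and 5 stars while the supplied label is class 1.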

Once your model is fine-tuned, you can save it with its tokenizer in the following way:

In [29]:
tokenizer.save_pretrained("/content/tokenizer_est")
model.save_pretrained("/content/model_test")

In [30]:
from transformers import AutoTokenizer, AutoModel, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("/content/tokenizer_est")
model = TFAutoModel.from_pretrained("/content/model_test")


Some layers from the model checkpoint at /content/model_test were not used when initializing TFBertModel: ['classifier', 'dropout_37']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at /content/model_test.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


You can then load this model back using the AutoModel.from_pretrained method by passing the directory name instead of the model name. One cool feature of 🤗 Transformers is that you can easily switch between PyTorch and TensorFlow: any model saved as before can be loaded back either in PyTorch or TensorFlow. If you are loading a saved PyTorch model in a TensorFlow model, use TFAutoModel.from_pretrained as in the previous cell. You can also ask the model to return all hidden states and all attention weights if you need them:

In [31]:
tf_outputs = model(tf_batch, output_hidden_states=True, output_attentions=True)
all_hidden_states, all_attentions = tf_outputs[-2:]

In [32]:
len(all_hidden_states), len(all_attentions)

(13, 12)

In [33]:
all_attentions[0].shape

TensorShape([2, 12, 14, 14])

In [34]:
all_attentions[0]

<tf.Tensor: shape=(2, 12, 14, 14), dtype=float32, numpy=
array([[[[9.20647621e-01, 5.28097851e-03, 4.93410928e-03, ...,
          6.98074116e-04, 1.98035147e-02, 7.79700279e-03],
         [3.56333293e-02, 5.03868163e-02, 4.86882091e-01, ...,
          1.66191289e-03, 1.10877398e-02, 3.06572020e-03],
         [2.09704246e-02, 2.12142691e-01, 1.85903590e-02, ...,
          1.37747324e-03, 4.42773942e-03, 4.62271133e-03],
         ...,
         [6.27527237e-02, 4.47570020e-03, 2.34503532e-03, ...,
          2.36112308e-02, 1.80595696e-01, 8.00733045e-02],
         [6.04147874e-02, 3.66150727e-03, 9.11446009e-03, ...,
          2.17197910e-02, 1.48267508e-01, 6.34760320e-01],
         [9.01934206e-01, 4.92470805e-04, 5.00642229e-04, ...,
          7.89331447e-04, 7.08459988e-02, 1.51186492e-02]],

        [[7.72505581e-01, 5.84377022e-03, 3.81425978e-03, ...,
          2.70711910e-03, 2.70551387e-02, 1.07837811e-01],
         [2.31092274e-01, 3.46895941e-02, 3.47997993e-02, ...,
          