# Transformer Models
---

## `pipeline()`

`pipeline()` is the most basic object in the `Transformers` library. It connects a model with the preprocessing and postprocessing steps it needs, so we can pass in raw text and get back a readable answer. Some of the currently available pipelines are:

- feature-extraction (get the vector representation of a text)
- fill-mask
- ner (named entity recognition)
- question-answering
- sentiment-analysis
- summarization
- text-generation
- translation
- zero-shot-classification
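
To make the "preprocessing, model, postprocessing" chain concrete, here is a minimal sketch in plain Python. It is not how the library is implemented: the vocabulary, the weights, and the names `preprocess`, `model`, `postprocess`, and `toy_pipeline` are all hypothetical stand-ins chosen for illustration.

```python
import math

# Hypothetical toy vocabulary and weights; a real pipeline loads these from a checkpoint.
VOCAB = {"good": 0, "bad": 1, "course": 2}
# Per-token contribution to each label, in the order (NEGATIVE, POSITIVE).
WEIGHTS = {0: (-1.0, 2.0), 1: (2.0, -1.0), 2: (0.0, 0.5)}
LABELS = ("NEGATIVE", "POSITIVE")


def preprocess(text):
    """Turn raw text into ids the toy model understands (unknown words are dropped)."""
    return [VOCAB[word] for word in text.lower().split() if word in VOCAB]


def model(token_ids):
    """Stand-in for the neural network: sum per-token scores into one logit per label."""
    logits = [0.0, 0.0]
    for token_id in token_ids:
        for i, weight in enumerate(WEIGHTS[token_id]):
            logits[i] += weight
    return logits


def postprocess(logits):
    """Apply a softmax to the logits and return the best label with its probability."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages, as a pipeline object does behind a single call."""
    return postprocess(model(preprocess(text)))


result = toy_pipeline("A good course")
print(result["label"])
```

The caller only ever sees text going in and a labeled score coming out, which is the convenience `pipeline()` provides.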

### Zero-shot classification

In [5]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445994853973389, 0.11197379231452942, 0.043426696211099625]}
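
The returned dictionary can be read with ordinary Python. The sketch below copies the result shown above and pairs each label with its score. It assumes the default single-label mode, in which the scores are normalized across the candidate labels and therefore sum to roughly 1.

```python
# Result copied from the zero-shot classification output above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445994853973389, 0.11197379231452942, 0.043426696211099625],
}

# Labels and scores are parallel lists, already sorted by descending score.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]
total = sum(result["scores"])

print(top_label, round(top_score, 3))
print(round(total, 3))
```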

### Text generation

In [None]:
# Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.
# You can control how many different sequences are generated with the argument `num_return_sequences`
# and the total length of the output text with the argument `max_length`.

from transformers import pipeline

generator = pipeline("text-generation")
out = generator("In this course, we will teach you how to")

In [8]:
out

[{'generated_text': 'In this course, we will teach you how to use JavaScript to transform different kinds of data into plain, easily readable documents and then use this to generate real-time charts that can be visualized as "fact sheets."\n\nDownload the course and'}]

In [9]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use the technology to understand the value of the technology and how to use it to enable future generations to'},
 {'generated_text': 'In this course, we will teach you how to build a simple, flexible web-controller. Each page can be accessed via a web application or via'}]
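
Why do results differ between runs? The following is a toy sketch of sampling, not the model's actual decoding code. `NEXT_WORDS` is a hypothetical lookup table standing in for the model's predicted next-word distribution, and `generate` is an illustrative helper whose `max_length` counts words (the real pipeline counts tokens).

```python
import random

# Hypothetical next-word table standing in for a language model's predictions.
NEXT_WORDS = {
    "how": ["to"],
    "to": ["use", "build", "train"],
    "use": ["models", "pipelines"],
    "build": ["models", "pipelines"],
    "train": ["models"],
}


def generate(prompt, max_length, num_return_sequences, seed=None):
    """Sample continuations word by word until max_length words or a dead end."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        words = prompt.split()
        # Each step draws the next word at random, which is why runs can differ.
        while len(words) < max_length and words[-1] in NEXT_WORDS:
            words.append(rng.choice(NEXT_WORDS[words[-1]]))
        outputs.append(" ".join(words))
    return outputs


samples = generate("teach you how to", max_length=6, num_return_sequences=2, seed=0)
for sample in samples:
    print(sample)
```

Fixing the seed makes the toy sampler reproducible; leaving it as `None` gives a different draw on each run.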

### Mask filling
- The `top_k` argument controls how many candidate completions are returned.
- The model fills in the special `<mask>` word, which is called a **mask token**. Other mask-filling models may use a different mask token (for example, `[MASK]` for `bert-base-uncased`), so check the model's page on the Hub for the correct mask word before trying another model.

In [11]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.1961982399225235,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052723944187164,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]
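
The effect of `top_k` can be sketched without a model: rank the candidates by score and keep the first `k`. In the block below the first two scores are rounded from the output above, while the other two candidates and the helper `fill_mask` are hypothetical. A real model scores its entire vocabulary.

```python
# First two scores are rounded from the output above; the last two are made up.
candidate_scores = {
    "mathematical": 0.196,
    "computational": 0.041,
    "statistical": 0.020,
    "predictive": 0.015,
}


def fill_mask(template, scores, top_k):
    """Return the top_k highest-scoring candidates substituted into the template."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {"score": score, "token_str": token, "sequence": template.replace("<mask>", token)}
        for token, score in ranked[:top_k]
    ]


predictions = fill_mask(
    "This course will teach you all about <mask> models.", candidate_scores, top_k=2
)
for prediction in predictions:
    print(prediction["sequence"])
```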

### Named entity recognition
- Passing `grouped_entities=True` when creating the pipeline tells it to regroup the parts of the sentence that belong to the same entity. Here the model correctly grouped “Hugging” and “Face” into a single organization, even though the name consists of multiple words. Recent releases of the library favor the `aggregation_strategy` argument for the same purpose.

In [13]:
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9981694,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9796019,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932106,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
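
The regrouping step can be pictured as merging neighboring tokens that carry the same entity type. The sketch below is a simplified illustration, not the library's aggregation logic: the per-word tags are hypothetical (a real model tags sub-word tokens and also reports scores and offsets), and `group_entities` is an illustrative helper.

```python
def group_entities(tagged_tokens):
    """Merge consecutive tokens of the same entity type into one entry."""
    groups = []
    previous = None  # entity type of the previous token, or None outside an entity
    for word, label in tagged_tokens:
        if label == "O":  # "O" marks a token that is outside any entity
            previous = None
            continue
        prefix, entity = label.split("-", 1)
        if prefix == "I" and previous == entity:
            # Continuation of the current entity: extend the last group.
            groups[-1]["word"] += " " + word
        else:
            groups.append({"entity_group": entity, "word": word})
        previous = entity
    return groups


# Hypothetical per-word tags for the sentence used above.
tagged = [
    ("My", "O"), ("name", "O"), ("is", "O"), ("Sylvain", "B-PER"),
    ("and", "O"), ("I", "O"), ("work", "O"), ("at", "O"),
    ("Hugging", "B-ORG"), ("Face", "I-ORG"), ("in", "O"), ("Brooklyn", "B-LOC"),
]
entities = group_entities(tagged)
for entity in entities:
    print(entity["entity_group"], entity["word"])
```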

### Question answering
Note that this pipeline works by extracting information from the provided context; it does not generate the answer.

In [14]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.694976270198822, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
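
Because the pipeline is extractive, the `start` and `end` values are character offsets into the context, and slicing the context with them recovers the answer. The check below reuses the context and the result shown above.

```python
# Context and result copied from the question-answering example above.
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
result = {"score": 0.694976270198822, "start": 33, "end": 45, "answer": "Hugging Face"}

# `end` is exclusive, so a plain slice yields exactly the answer span.
span = context[result["start"]:result["end"]]
print(span)  # prints: Hugging Face
```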

### Summarization
As with text generation, you can specify a `max_length` or a `min_length` for the result.

In [15]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

### Translation
- For translation, you can use a default model by putting a language pair in the task name (such as `"translation_en_to_fr"`), but the easiest way is to pick the model you want from the [Model Hub](https://huggingface.co/models).
- As with text generation and summarization, you can specify a `max_length` or a `min_length` for the result.

In [17]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")
translator("Welcome to the world of AI!")

[{'translation_text': 'एआई की दुनिया में आपका स्वागत है!'}]

## Types of transformer models

1. Encoder Models
    - Contain only the encoder block
    - Pretraining usually involves masking random words and predicting them (masked language modeling)
    - At each stage, the attention layers can access all the words in the initial sentence
    - Best suited for tasks that require an understanding of the language and its grammar, such as sentiment analysis, token classification, extractive question answering, and mask filling
    - e.g. - BERT, ALBERT, RoBERTa, DistilBERT, etc.
2. Decoder Models
    - Contain only the decoder block
    - Pretraining usually involves predicting the next word (causal language modeling)
    - At each stage, for a given word, the attention layers can only access the words positioned before it in the sentence
    - Best suited for tasks that require text generation.
    - e.g. - GPT, GPT-2, CTRL, Transformer-XL, etc.
3. Encoder-Decoder Models (seq-2-seq models)
    - Contain both the encoder and decoder blocks
    - Pretraining can use the objectives of encoder or decoder models, but usually involves something a bit more complex
    - At each stage, the attention layers of the encoder can access all the words in the initial sentence, whereas the attention layers of the decoder can only access the words positioned before a given word in the decoder's input
    - Best suited for tasks that require generating new sentences conditioned on a given input, such as summarization, translation, or generative question answering
    - e.g. - BART, T5, mBART, etc.
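
The difference in what the attention layers can access is often expressed as an attention mask. The sketch below builds both kinds for a three-word sentence, where `1` means "position *i* may attend to position *j*". The helper names are illustrative, and note one convention: in the usual causal mask a position also attends to itself, in addition to the words before it.

```python
def full_mask(n):
    """Encoder-style mask: every position can attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i attends to positions 0..i, never to later ones."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


words = ["I", "like", "transformers"]

print("encoder (bidirectional):")
for row in full_mask(len(words)):
    print(row)

print("decoder (causal):")
for row in causal_mask(len(words)):
    print(row)
```

In an encoder-decoder model, the encoder would use the first kind of mask over the input sentence and the decoder the second kind over the words generated so far.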

## Bias and limitations

If your intent is to use a pretrained model or a fine-tuned version in production, please be aware that, while these models are powerful tools, they come with limitations. The biggest of these is that, to enable pretraining on large amounts of data, researchers often scrape all the content they can find, taking the best as well as the worst of what is available on the internet.

In [6]:
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])

result = unmasker("This woman works as a [MASK].")
print([r["token_str"] for r in result])

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']
['nurse', 'maid', 'teacher', 'waitress', 'prostitute']


When you use these tools, you therefore need to keep in the back of your mind that the original model you are using could very easily generate sexist, racist, or homophobic content. **Fine-tuning the model on your data won’t make this intrinsic bias disappear**.

A model can pick up bias from any of the following sources:
- The model is a fine-tuned version of a pretrained model and it picked up its bias from it
- The data the model was trained on is biased
- The metric the model was optimizing for is biased