In [1]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.36.2-py3-none-any.whl.metadata (126 kB)
Collecting huggingface-hub<1.0,>=0.19.3 (from transformers)

In [2]:
!pip install transformers[sentencepiece]

Collecting sentencepiece!=0.1.92,>=0.1.91 (from transformers[sentencepiece])
  Downloading sentencepiece-0.1.99-cp311-cp311-win_amd64.whl (977 kB)

In [3]:
import transformers



# **Transformers**



### Working with pipelinesPipeline function returns an end-to-end object that performs an NLP task on one or several texts.


In [4]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier([
    "I don't like apples!",
    "I hate this so much!"
])

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

[{'label': 'NEGATIVE', 'score': 0.9972789883613586},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

- This pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English.
- The model is downloaded and cached when you create the classifier object.

Some of the currently available pipelines are:

- feature-extraction (get the vector representation of a text)
- fill-mask
- ner (named entity recognition)
- question-answering
- sentiment-analysis
- summarization
- text-generation
- translation
- zero-shot-classification






### Zero-shot Classification
- Classify texts that haven’t been labelled.
- For this use case, the zero-shot-classification pipeline is very powerful.
  - It allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model.


In [6]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445989489555359, 0.11197426170110703, 0.04342678561806679]}
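As a quick sanity check on the output above, the sketch below copies the returned dictionary by hand (no model is loaded) and confirms two things that can be read off the numbers: the labels come back ordered by descending score, and, in this default single-label run, the scores sum to roughly 1 across the candidate labels. The variable names are illustrative only.

```python
# Result copied by hand from the cell output above; no model is loaded here.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445989489555359, 0.11197426170110703, 0.04342678561806679],
}

# Labels are returned ordered from most to least likely.
is_sorted = result["scores"] == sorted(result["scores"], reverse=True)

# In this output the scores add up to about 1 across the candidate labels.
total = sum(result["scores"])

# Pair each label with its score and take the first (highest-scoring) one.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]

print(is_sorted, round(total, 4), top_label)
```

Because the labels are already sorted, the predicted class is simply `result["labels"][0]`.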

### Text Generation
- The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text.


In [8]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("WE will learn about Nepal")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'WE will learn about Nepal\'s current and future economic situation and be able to bring us together to help build a sustainable future for Nepal, our neighbors and those we serve on the continent," he said.\n\n"We will work to implement a comprehensive'}]
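The auto-complete idea can be illustrated with a toy that needs no model at all. The sketch below assumes a tiny hand-written table of next-word probabilities (entirely made up) and extends a prompt one word at a time by always taking the most probable continuation. Real models score subword tokens with a neural network and often sample instead of always picking the top candidate, so this only shows the shape of the loop.

```python
# Hypothetical, hand-written next-word table with made-up probabilities.
NEXT_WORD = {
    "we": {"will": 0.9, "are": 0.1},
    "will": {"learn": 0.7, "see": 0.3},
    "learn": {"about": 0.8, "to": 0.2},
    "about": {"transformers": 0.6, "models": 0.4},
}


def complete(prompt: str, max_new_words: int = 5) -> str:
    """Greedily extend the prompt one word at a time."""
    words = prompt.lower().split()
    if not words:  # nothing to continue from
        return ""
    for _ in range(max_new_words):
        candidates = NEXT_WORD.get(words[-1])
        if not candidates:  # no known continuation: stop generating
            break
        # Greedy decoding: always pick the most probable next word.
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)


print(complete("We will"))
```

The `max_new_words` argument plays a role loosely similar to a length limit in a real generator: it bounds how far the loop may run.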

### Using any model from the Hub in a pipeline

Let’s try the distilgpt2 model! Here’s how to load it in the same pipeline as before:


In [9]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)


[{'generated_text': 'In this course, we will teach you how to use the tools, and how they could allow you to do both. We have only published a few'},
 {'generated_text': 'In this course, we will teach you how to set up a self-help website and start implementing the best practices. The goal, however, is'}]

#### The Inference API
All the models can be tested directly in your browser using the Inference API, which is available on the Hugging Face website.

### Mask filling

The idea of this task is to fill in the blanks in a given text:


In [10]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on an

[{'score': 0.1961977779865265,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052717983722687,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

### Named Entity Recognition

Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.

In [11]:
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expe

[{'entity_group': 'PER',
  'score': 0.9981694,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9796019,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932106,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
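The `start` and `end` fields in this output are character offsets into the input string. The sketch below checks that by slicing a sentence with the offsets copied from the output above. The sentence is an assumption: it is the input those offsets and words are consistent with, and no model is run.

```python
# Assumed input sentence, chosen to be consistent with the offsets shown above.
text = "My name is Sylvain and I work at Hugging Face in Brooklyn"

# Entities copied by hand from the cell output (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "word": "Brooklyn", "start": 49, "end": 57},
]

# Slicing the text with each (start, end) pair recovers the entity's surface form.
spans = [text[e["start"]:e["end"]] for e in entities]
print(spans)
```

Offsets like these are what make it possible to highlight entities in the original text rather than only listing them.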

### Question answering
The question-answering pipeline answers questions using information from a given context.

Note that this pipeline works by extracting information from the provided context; it does not generate the answer.





In [12]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Aavash and I work in Fusemachines",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.9582499861717224, 'start': 32, 'end': 44, 'answer': 'Fusemachines'}
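The extractive nature of this pipeline is visible in the output: `start` and `end` are character positions in the context, and the answer is exactly the text between them. A minimal check, with the context and result copied by hand from the cell above (no model is loaded):

```python
# Context and result copied by hand from the cell above.
context = "My name is Aavash and I work in Fusemachines"
result = {"score": 0.9582499861717224, "start": 32, "end": 44, "answer": "Fusemachines"}

# The answer is a literal slice of the context, not newly generated text.
extracted = context[result["start"]:result["end"]]
print(extracted, extracted == result["answer"])
```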

### Summarization

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text.

In [14]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
"""
)


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.

[{'summary_text': ' The number of engineering graduates in the United States has declined in recent years . China and India graduate six and eight times as many traditional engineers as the U.S. does . Rapidly developing economies such as China continue to encourage and advance the teaching of engineering . There are declining offerings in engineering subjects dealing with infrastructure, infrastructure, the environment, and related issues .'}]

### Translation

In [15]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")


[{'translation_text': 'This course is produced by Hugging Face.'}]

### History
![History of transformers](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_chrono.svg)


The Transformer architecture was introduced in June 2017. The focus of the original research was on translation tasks. This was followed by the introduction of several influential models.

Broadly, they can be grouped into three categories:

- GPT-like (also called auto-regressive Transformer models)
- BERT-like (also called auto-encoding Transformer models)
- BART/T5-like (also called sequence-to-sequence Transformer models)

### Transformers are language models

All the Transformer models have been trained on large amounts of raw text in a self-supervised fashion. Self-supervised learning is a type of training in which the objective is automatically computed from the inputs of the model.

This type of model develops a statistical understanding of the language it has been trained on, but it’s not very useful for specific practical tasks.

Because of this, the general pretrained model then goes through a process called transfer learning. During this process, the model is fine-tuned in a supervised way — that is, using human-annotated labels — on a given task.

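One example of a pretraining task is predicting the next word in a sentence after reading the previous words. This is called causal language modeling, because the output depends on past and present inputs but not on future ones.

![Causal language modeling](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/causal_modeling.svg)

Another example is masked language modeling, in which the model predicts a masked word in the sentence.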

![Masked Language Modeling](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/masked_modeling.svg)
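Both objectives are self-supervised in the sense that the labels come straight from the raw text. The sketch below builds training pairs for each objective from a single sentence. It is a simplification under stated assumptions: it works on whole words, masks every position in turn, and uses `[MASK]` as a placeholder string, whereas real pretraining operates on subword tokens and masks only a random subset. The function names are illustrative.

```python
def causal_examples(words):
    """(context, next word) pairs: the label is simply the word that follows."""
    return [(words[:i], words[i]) for i in range(1, len(words))]


def masked_examples(words, mask_token="[MASK]"):
    """(masked sentence, hidden word) pairs: the label is the word that was hidden."""
    examples = []
    for i, word in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((masked, word))
    return examples


sentence = "the cat sat down".split()

for context, target in causal_examples(sentence):
    print("causal:", context, "->", target)

for masked, target in masked_examples(sentence):
    print("masked:", masked, "->", target)
```

No human annotation is involved in either case: the targets are words taken from the sentence itself.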




Apart from a few outliers (like DistilBERT), the general strategy for achieving better performance is to increase the models’ sizes as well as the amount of data they are pretrained on.

![Transformers are big](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/model_parameters.png)


### Transfer Learning

***Pretraining*** is the act of training a model from scratch: the weights are randomly initialized, and the training starts without any prior knowledge.

***Fine-tuning***, on the other hand, is the training done after a model has been pretrained. To perform fine-tuning, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task.

### General Architecture

The model is primarily composed of two blocks:

- Encoder (left): The encoder receives an input and builds a representation of it (its features). This means that the model is optimized to acquire understanding from the input.
- Decoder (right): The decoder uses the encoder’s representation (features) along with other inputs to generate a target sequence. This means that the model is optimized for generating outputs.

![Architecture](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_blocks.svg)
    

Each of these parts can be used independently, depending on the task:

- **Encoder-only models**: Good for tasks that require understanding of the input, such as sentence classification and named entity recognition.
- **Decoder-only models**: Good for generative tasks such as text generation.
- **Encoder-decoder models** or **sequence-to-sequence models**: Good for generative tasks that require an input, such as translation or summarization.



### Attention layers
This layer will tell the model to pay specific attention to certain words in the sentence you passed it (and more or less ignore the others) when dealing with the representation of each word.
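To make "paying attention" concrete, here is a minimal pure-Python sketch of scaled dot-product attention, the operation at the core of these layers. It leaves out the learned projection matrices and multiple heads that real models use, and the three 2-dimensional word vectors are made up for illustration.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, return a weighted average of the values."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this word to every word, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # New representation: values mixed according to the attention weights.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three hypothetical word vectors; each word attends to all three.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` says how much one word draws on every word in the sentence when its representation is rebuilt. Words with a larger dot product receive a larger share.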

### The original Architecture

- The Transformer architecture was originally designed for translation.
- During training, the encoder receives inputs (sentences) in a certain language, while the decoder receives the same sentences in the desired target language.
- In the encoder, the attention layers can use all the words in a sentence.
- The decoder, however, works sequentially and can only pay attention to the words in the sentence that it has already translated.
- When the model has access to target sentences (during training), the decoder is fed the whole target, but it is not allowed to use future words.
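The "no future words" rule is commonly expressed as a lower-triangular mask over positions. The sketch below builds one in plain Python. The function name is illustrative, and how a framework applies the mask to attention scores is not shown.

```python
def causal_mask(length):
    """mask[i][j] is 1 if position i may attend to position j, else 0."""
    return [[1 if j <= i else 0 for j in range(length)] for i in range(length)]


# Row i shows which positions the i-th target word is allowed to look at.
for row in causal_mask(4):
    print(row)
```

The first row allows only the first position, while the last row allows all of them, which matches a decoder that sees only the words produced so far.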

![Transformer Architecture](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers.svg)

The *attention mask* can also be used in the encoder/decoder to prevent the model from paying attention to some special words — for instance, the special padding word used to make all the inputs the same length when batching together sentences.
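Padding and its mask can be sketched as follows. The token ids, the padding id `0`, and the choice to pad on the right are assumptions for illustration. An actual tokenizer defines its own padding token and padding side.

```python
PAD_ID = 0  # hypothetical padding id


def pad_batch(sequences, pad_id=PAD_ID):
    """Pad every sequence to the longest one and mark real tokens with 1."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 = real token the model should attend to, 0 = padding to ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Two made-up sentences of different lengths, already converted to ids.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
ids, mask = pad_batch(batch)
print(ids)
print(mask)
```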



### Architectures vs. checkpoints

- **Architecture**: This is the skeleton of the model — the definition of each layer and each operation that happens within the model.
- **Checkpoints**: These are the weights that will be loaded in a given architecture.
- **Model**: This is an umbrella term that isn’t as precise as “architecture” or “checkpoint”: it can mean both. This course will specify architecture or checkpoint when it matters to reduce ambiguity.

For example, BERT is an architecture while bert-base-cased, a set of weights trained by the Google team for the first release of BERT, is a checkpoint. However, one can say “the BERT model” and “the bert-base-cased model.”
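A toy analogy may help separate the two terms. In the sketch below, a small class stands in for the architecture (it fixes the computation) and plain dictionaries stand in for checkpoints (they supply the numbers). Everything here is hypothetical and far simpler than a real Transformer.

```python
class TinyLinear:
    """Architecture: fixes the computation y = w * x + b."""

    def __init__(self, weights):
        # The checkpoint supplies the actual parameter values.
        self.w = weights["w"]
        self.b = weights["b"]

    def __call__(self, x):
        return self.w * x + self.b


# Two checkpoints for the same architecture (made-up values).
checkpoint_a = {"w": 2.0, "b": 0.0}
checkpoint_b = {"w": -1.0, "b": 3.0}

model_a = TinyLinear(checkpoint_a)
model_b = TinyLinear(checkpoint_b)

# Same architecture, different weights, different behaviour.
print(model_a(2.0), model_b(2.0))
```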




### Bias and limitations

While transformer models are powerful tools, they come with limitations. The biggest of these is that, to enable pretraining on large amounts of data, researchers often scrape all the content they can find, taking the best as well as the worst of what is available on the internet.

In [20]:
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])

result = unmasker("This woman works as a [MASK].")
print([r["token_str"] for r in result])

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']
['nurse', 'maid', 'teacher', 'waitress', 'prostitute']
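One simple way to quantify the difference between the two outputs is to compare the predicted lists directly. The sketch below copies the two lists by hand from the output above and checks how many occupations they share. It does not load the model.

```python
# Predictions copied by hand from the output above.
man = ['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']
woman = ['nurse', 'maid', 'teacher', 'waitress', 'prostitute']

# Occupations that appear in both top-5 lists.
shared = set(man) & set(woman)
print(len(shared), sorted(shared))
```

The two lists have no occupation in common, even though the prompts differ by a single word.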
