In [1]:
# transformer models are used to solve all kinds of NLP tasks

# the pipeline function groups all the preprocessing, model, and postprocessing steps required

In [2]:
import torch
from transformers import pipeline



# Sentiment Analysis

In [9]:
# Sentiment analysis classifies text as positive or negative.

classifier = pipeline('sentiment-analysis')

classifier([
    "I've been waiting for a HuggingFace course my whole life.",
    "I have the worst cold.",
    "I like dark clouds",
    "I love dark clouds",
    "I like winter",
    "Winter is coming"
])

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598051905632019},
 {'label': 'NEGATIVE', 'score': 0.9997242093086243},
 {'label': 'NEGATIVE', 'score': 0.7810968160629272},
 {'label': 'POSITIVE', 'score': 0.9995312690734863},
 {'label': 'POSITIVE', 'score': 0.9976855516433716},
 {'label': 'POSITIVE', 'score': 0.9872573018074036}]
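The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the model name and revision printed in that warning; the `PINNED` dictionary and the `build_classifier` helper are illustrative names, not part of the library:

```python
# Model name and revision are copied from the warning printed above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported here so the configuration can be inspected without loading transformers.
    from transformers import pipeline

    return pipeline(**PINNED)
```

Calling `build_classifier()` downloads the pinned checkpoint on first use and should no longer print the "No model was supplied" warning.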

# Zero shot classification

In [11]:
# The zero shot classification pipeline lets you select the labels for classification.

classifier = pipeline('zero-shot-classification')

classifier(
    "This is a course about the Transformers library",
    candidate_labels = ["education", "politics", "business"])

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445988893508911, 0.11197422444820404, 0.04342687502503395]}
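The `labels` and `scores` lists in the result are aligned with each other and sorted from most to least likely. A short sketch of pairing them up, using the output above copied by hand (the variable names are illustrative):

```python
# Result copied from the zero-shot output above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445988893508911, 0.11197422444820404, 0.04342687502503395],
}

# Pair each label with its score.
label_scores = dict(zip(result["labels"], result["scores"]))

# The first entry is the most likely label because the lists are sorted by score.
best_label = result["labels"][0]

# By default the scores are normalized across the candidate labels.
total = sum(result["scores"])

print(best_label, round(label_scores[best_label], 3), round(total, 3))
```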

# Text generation

In [15]:
# Text generation pipeline uses an input prompt to generate text

generator = pipeline('text-generation')

generator('In this course, we will teach you how to')

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to successfully handle and manage tasks through the use of interactive UI elements such as buttons, lists and widgets. In this course we will understand interactive UI elements using common programming concepts.\n\nIn this class,'}]

In [16]:
generator('My cat is small and')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'My cat is small and there is not a lot of room in our house for her to have such a small cat." She added that she still had the intention to open the house for cats.\n\nThe family say the first move will not only'}]

In [17]:
generator('My cat has no teeth, but')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'My cat has no teeth, but she can chew on a lot of potatoes this week.\n\n"But this is just my best food, and I\'m having a lot of fun having all of that at the same time," he said. "'}]

In [19]:
generator('Our cat is cute and she also likes to wear boots')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Our cat is cute and she also likes to wear boots. She wears them to protect her hands as she\'s always wearing her best pair. She can do her job when she needs to," she said.'}]

In [21]:
generator('Yesterday, I had a piece of candy')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Yesterday, I had a piece of candy. When I got home from the gym, I looked at the candy. That was on top of my head. It was actually the first candy I'd ever seen it. I got a little bit scared and"}]

# Text generation with distilgpt2

In [27]:
generator = pipeline('text-generation', model='distilgpt2')

generator(
    'In this course, we will teach you how to',
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to get some basic resources for use with a new framework to help you start your own project.'},
 {'generated_text': 'In this course, we will teach you how to practice it. I\u200d\u200d\u200d is a course of knowledge, instruction and learning in'}]

In [28]:
generator(
    'My cat is small and',
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'My cat is small and needs to learn to be active, as well as enjoy the outdoors. With that said, what this cat feels like when you'},
 {'generated_text': "My cat is small and you can't see what has gone wrong if you're trying to see the details of my dog as normal. Her mouth is"}]

In [35]:
generator(
    'Who wants a quarter or a',
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Who wants a quarter or a grand. But not all people are like that, because if they were to get back into it, they would surely start'},
 {'generated_text': "Who wants a quarter or a half years? Do their kids have any problems? Do they know they're in a situation when you'd get a shot"}]

# Fill-mask

In [36]:
# The fill-mask pipeline will predict missing words in a sentence

unmasker = pipeline('fill-mask')
unmasker('This course will teach you all about <mask> models', top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.1963152289390564,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models'},
 {'score': 0.04449247196316719,
  'token': 745,
  'token_str': ' building',
  'sequence': 'This course will teach you all about building models'}]

In [40]:
unmasker('My <mask> moved to the front of the line.', top_k=2)

[{'score': 0.03633992746472359,
  'token': 512,
  'token_str': ' car',
  'sequence': 'My car moved to the front of the line.'},
 {'score': 0.03284739330410957,
  'token': 1141,
  'token_str': ' wife',
  'sequence': 'My wife moved to the front of the line.'}]

In [46]:
unmasker('Hurricane <mask> brings high winds and heavy rain.', top_k=2)

[{'score': 0.381716251373291,
  'token': 8547,
  'token_str': ' Irma',
  'sequence': 'Hurricane Irma brings high winds and heavy rain.'},
 {'score': 0.1378900557756424,
  'token': 4508,
  'token_str': ' Matthew',
  'sequence': 'Hurricane Matthew brings high winds and heavy rain.'}]

# Named Entity Recognition

In [47]:
# The NER pipeline identifies entities such as persons, organizations, or locations in a sentence

ner = pipeline('ner', grouped_entities=True)

ner('At least 200,000 Russians have left the country since President Vladimir Putin’s draft began.')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'MISC',
  'score': 0.9973852,
  'word': 'Russians',
  'start': 17,
  'end': 25},
 {'entity_group': 'PER',
  'score': 0.9991536,
  'word': 'Vladimir Putin',
  'start': 64,
  'end': 78}]

In [52]:
ner("You’re doing it for your family and your friends, Dr. Ashish Jha, \
the White House’s Covid coordinator, told The Washington Post.")

[{'entity_group': 'PER',
  'score': 0.9984539,
  'word': 'Ashish Jha',
  'start': 54,
  'end': 64},
 {'entity_group': 'LOC',
  'score': 0.99894553,
  'word': 'White House',
  'start': 70,
  'end': 81},
 {'entity_group': 'ORG',
  'score': 0.6163364,
  'word': 'Covid',
  'start': 84,
  'end': 89},
 {'entity_group': 'ORG',
  'score': 0.9986109,
  'word': 'The Washington Post',
  'start': 108,
  'end': 127}]
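The `start` and `end` values are character offsets into the input string, with `end` exclusive, so slicing the text recovers each entity. A quick check, using the sentence and two of the entities from the output above:

```python
# Same sentence as above, written as one string.
text = (
    "You’re doing it for your family and your friends, Dr. Ashish Jha, "
    "the White House’s Covid coordinator, told The Washington Post."
)

# Offsets copied from the NER output above.
entities = [
    {"entity_group": "PER", "word": "Ashish Jha", "start": 54, "end": 64},
    {"entity_group": "ORG", "word": "The Washington Post", "start": 108, "end": 127},
]

# Slice the original text with each pair of offsets.
spans = [text[e["start"]:e["end"]] for e in entities]
print(spans)
```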

# Question Answering

In [53]:
question_answerer = pipeline('question-answering')

question_answerer(
    question='Why are we doing this?',
    context='You’re doing it for your family and your friends, Dr. Ashish Jha, \
the White House’s Covid coordinator, told The Washington Post.')

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.

{'score': 0.3739520013332367,
 'start': 16,
 'end': 48,
 'answer': 'for your family and your friends'}

In [55]:
question_answerer(
    question='What should I do next?',
    context='Today I need to climb a volcano, hug a redwood, and take a falconry class.')

{'score': 0.15458625555038452,
 'start': 52,
 'end': 73,
 'answer': 'take a falconry class'}

In [56]:
question_answerer(
    question='What should I do first?',
    context='Today I need to climb a volcano, hug a redwood, and take a falconry class.')

{'score': 0.2575291693210602,
 'start': 16,
 'end': 31,
 'answer': 'climb a volcano'}

In [69]:
question_answerer(
    question='Who are you?',
    context='Today I need to climb a volcano, hug a redwood, and take a falconry class.')

{'score': 0.0881502777338028,
 'start': 33,
 'end': 46,
 'answer': 'hug a redwood'}

In [68]:
question_answerer(
    question='Who am I?',
    context='Today I need to climb a volcano, hug a redwood, and take a falconry class.')

{'score': 0.06383596360683441,
 'start': 59,
 'end': 73,
 'answer': 'falconry class'}
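By default, an extractive question-answering pipeline returns some span of the context even when the context does not contain an answer, as the last two questions show. One possible safeguard is to reject answers whose score falls below a cutoff. In this sketch the `answer_or_none` helper and the 0.1 cutoff are assumptions chosen for illustration, and the results are copied from the outputs above:

```python
def answer_or_none(result, min_score=0.1):
    """Return the answer text, or None when the model's score is below min_score."""
    return result["answer"] if result["score"] >= min_score else None


# Results copied from the question-answering outputs above.
first = {"score": 0.2575291693210602, "answer": "climb a volcano"}
who_am_i = {"score": 0.06383596360683441, "answer": "falconry class"}

print(answer_or_none(first))
print(answer_or_none(who_am_i))
```

A suitable cutoff depends on the model and the data, so it would need to be tuned on examples with known answers.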

# Summarization

In [57]:
summarizer = pipeline('summarization')

summarizer('Whether to get a booster shot is a closer call for healthy people under 50, \
            many experts believe. Rates of severe Covid are already so low among this group \
            that booster shots don’t seem to have a huge health benefit. Of course, the downsides \
            of the shots also seem to be small, because research has consistently shown them to be safe. \
            But getting a booster shot is not wholly without downsides. Some people are fearful of \
            needles or prefer to avoid taking unnecessary medicines. Other people were sick for a day \
            or two after getting an earlier Covid shot and would prefer not to repeat the experience. \
            For hourly workers and single parents, a day in bed can also bring financial or logistical burdens,\
            especially in a country without guaranteed sick leave or child care.')

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' Rates of severe Covid are already so low among this group that booster shots don’t seem to have a huge health benefit . Some people are fearful of  eedles or prefer to avoid taking unnecessary medicines . Others were sick for a day or two after getting an earlier Covid shot and would prefer not to repeat the experience .'}]

In [58]:
summarizer('Not long after dropping out of college to pursue a career in cryptocurrencies, Ben Weintraub woke up to some bad news.Mr. Weintraub and two classmates from the University of Chicago had spent the past few months working on a software platform called Beanstalk, which offered a stablecoin, a type of cryptocurrency with a fixed value of $1. To their surprise, Beanstalk became an overnight sensation, attracting crypto speculators who viewed it as an exciting contribution to the experimental field of decentralized finance, or DeFi. \
Then it collapsed. In April, a hacker exploited a flaw in Beanstalk’s design to steal more than $180 million from users, one of a series of thefts this year targeting DeFi ventures. The morning of the hack, Mr. Weintraub, 24, was home for Passover in Montclair, N.J. He walked into his parents’ bedroom. \
“Wake up,” he said. “Beanstalk is dead.” \
Hackers have terrorized the crypto industry for years, stealing Bitcoin from online wallets and raiding the exchanges where investors buy and sell digital currencies. But the rapid proliferation of DeFi start-ups like Beanstalk has given rise to a new type of threat.\
These loosely regulated ventures allow people to borrow, lend and conduct other transactions without banks or brokers, relying instead on a system governed by code. Using DeFi software, investors can take out loans without revealing their identities or even undergoing a credit check. As the market surged last year, the emerging sector was hailed as the future of finance, a democratic alternative to Wall Street that would give amateur traders access to more capital. Crypto users entrusted roughly $100 billion in virtual currency to hundreds of DeFi projects.\
But some of the software was built on faulty code. This year, $2.2 billion in cryptocurrency has been stolen from DeFi projects, according to the crypto tracking firm Chainalysis, putting the overall industry on pace for its worst year of hacking losses.')

[{'summary_text': ' Ben Weintraub dropped out of college to pursue a career in cryptocurrencies . In April, a hacker exploited a flaw in Beanstalk’s design to steal more than $180 million from users . This year, $2.2 billion in cryptocurrency has been stolen from DeFi projects .'}]

# Translation

In [59]:
# The translator provides translation from one language to another.

translator = pipeline('translation', model='Helsinki-NLP/opus-mt-fr-en')
translator('Ce cours est produit par Hugging Face.')



[{'translation_text': 'This course is produced by Hugging Face.'}]

# Transformers

There are three main categories of transformer models:
* GPT-like (auto-regressive)
* BERT-like (auto-encoding)
* BART/T5-like (sequence-to-sequence)

These are all language models that have been trained on large amounts of raw text.  
They are self-supervised: the labels are created automatically from the inputs, so humans are not needed to label the data.  
Better performance is generally achieved by increasing the model size, but that has a COST (time, compute resources, and carbon).  
Ways to reduce that cost:
* Choose a low-carbon compute center
* Use a pretrained model
* Fine-tune a model instead of training from scratch
* Start smaller and debug as you go
* Do a literature review to choose hyperparameter ranges
* Do a random search instead of a grid search
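The last tip can be sketched with the standard library alone. Random search draws each trial's hyperparameters independently from the search space instead of enumerating every combination. The search space, the parameter names, and the helper functions below are hypothetical placeholders:

```python
import math
import random

# Hypothetical search space: (low, high) ranges and a list of discrete choices.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-3),
    "batch_size": [8, 16, 32],
    "warmup_ratio": (0.0, 0.2),
}


def sample_config(rng):
    """Draw one hyperparameter configuration at random."""
    lr_low, lr_high = SEARCH_SPACE["learning_rate"]
    warm_low, warm_high = SEARCH_SPACE["warmup_ratio"]
    return {
        # Sample the learning rate on a log scale so small values are covered too.
        "learning_rate": 10 ** rng.uniform(math.log10(lr_low), math.log10(lr_high)),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "warmup_ratio": rng.uniform(warm_low, warm_high),
    }


def random_search(n_trials, seed=0):
    """Return n_trials independent configurations; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [sample_config(rng) for _ in range(n_trials)]


trials = random_search(n_trials=5)
for trial in trials:
    print(trial)
```

Each configuration would then be used for one training run. The number of trials is a budget you choose, whereas a grid grows multiplicatively with every hyperparameter added.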

## Transfer Learning

#### Pretraining

Pretraining is the act of training a model from scratch.  
Weights are randomly initialized.  
Training starts without any prior knowledge.  


#### Fine-tuning

Start from a pretrained language model, then perform additional training on a dataset specific to your task.

#### Transfer learning 
Transfer learning means initializing a model with another model's weights instead of random values.

GPT-2 was pretrained on about 40GB of text from web pages linked in Reddit posts.  
BERT was pretrained on the content of the English Wikipedia and 11,000 unpublished books.  

Transfer learning is applied by dropping the head of the pretrained model while keeping its body.  
The pretraining task and data should be as similar as possible to the task the model will be fine-tuned on.  
The pretrained model transfers its knowledge but also any bias it contains. 
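Dropping the head while keeping the body can be pictured with a toy model. This is a conceptual sketch in plain Python, not the Transformers API: the "model" is just a dictionary of weight lists, and the `transfer` helper and all sizes are made up for illustration.

```python
import random


def random_weights(size, rng):
    """Randomly initialized weights, as at the start of pretraining."""
    return [rng.gauss(0.0, 0.02) for _ in range(size)]


def transfer(pretrained, new_head_size, rng):
    """Keep the pretrained body and attach a freshly initialized head."""
    return {
        "body": list(pretrained["body"]),            # reused knowledge (and bias)
        "head": random_weights(new_head_size, rng),  # task-specific, starts random
    }


rng = random.Random(0)

# Stand-in for a pretrained checkpoint: a body plus a head for the pretraining task.
pretrained = {"body": random_weights(8, rng), "head": random_weights(4, rng)}

# Model to fine-tune on a hypothetical 3-class task.
model = transfer(pretrained, new_head_size=3, rng=rng)
print(model["body"] == pretrained["body"], len(model["head"]))
```

Fine-tuning would then update these weights on the task-specific dataset, starting from the copied body rather than from random values.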

# Transformer Architecture

The transformer is based on the attention mechanism.  
The transformer architecture has two pieces: an encoder and a decoder.

The encoder encodes the input text into numerical representations, also called embeddings or features.  
These are fed into the decoder.  
The decoder decodes the representations from the encoder and produces output probabilities.  
  
Attention layers tell the model to pay attention to specific words in the sentence you provide to it.  
A word has meaning but it is deeply affected by the context in which it appears.  
The attention mask can be used to prevent the model from paying attention to certain special tokens, such as the padding added when sentences are batched together.  
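A small numeric sketch of how such a mask works: positions where the mask is 0 are excluded before the softmax, so they receive zero attention weight. The scores below are made-up numbers and `masked_softmax` is an illustrative helper, not a library function:

```python
import math


def masked_softmax(scores, mask):
    """Softmax over scores, ignoring positions where mask is 0."""
    # Masked positions get -inf, so exp(-inf) == 0.0 removes them from the sum.
    masked = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Attention scores of one word towards four positions; the last one is padding.
scores = [2.0, 1.0, 0.5, 3.0]
attention_mask = [1, 1, 1, 0]

weights = masked_softmax(scores, attention_mask)
print([round(w, 3) for w in weights])
```

Even though the padded position has the highest raw score, it ends up with a weight of zero, and the remaining weights still sum to one. The sketch assumes at least one position is unmasked.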


**Architecture** - the skeleton of the model: the definition of each layer and each operation that happens within the model.  
**Checkpoints** - the weights that will be loaded in a given architecture.

## Encoders

The encoder outputs a numerical representation, called a feature vector (or feature tensor), for each word of the input sequence.  
Each word in the initial sequence affects *every* word's representation.  
Each vector holds the meaning of its word in the context of the surrounding text.  


##### When to use?
* bi-directional - context from the left and the right
* good at extracting meaningful information
* they are very good at: sequence classification, sentiment analysis, masked language modeling, named entity recognition, extractive question answering
* natural language understanding
* BERT, RoBERTa, ALBERT

## Decoders

The decoder creates a feature tensor from the initial sequence.  
Like the encoder, it outputs a numerical representation for each input word.  
Its self-attention mechanism differs from the encoder's.  
It uses masked self-attention, which hides the context on the right.  
Each word can only see the words on its left side. The right side is hidden.  
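Masked self-attention can be pictured as a lower-triangular matrix: row `i` marks which positions word `i` may attend to (itself and everything to its left). A minimal sketch in plain Python; the `causal_mask` helper and the example sentence are illustrative:

```python
def causal_mask(length):
    """Row i has 1 for positions 0..i (itself and words to its left) and 0 for the right."""
    return [[1 if col <= row else 0 for col in range(length)] for row in range(length)]


words = ["My", "cat", "is", "small"]
mask = causal_mask(len(words))

# Show which words each position is allowed to see.
for word, row in zip(words, mask):
    visible = [w for w, keep in zip(words, row) if keep]
    print(f"{word!r} sees {visible}")
```

An encoder, by contrast, would use a mask of all ones here, since every word may attend to both sides.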



##### When to use?
* uni-directional - access to their left OR right context
* great at causal tasks and generating sequences, guessing the next word in a sentence, text generation
* NLG - natural language generation 
* GPT-2, GPT Neo

## Sequence-to-sequence models

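Encoder-decoder models use both parts of the architecture.  
The encoder outputs are used as an input for the decoder. We also give the decoder a start-of-sequence word.  
Using this representation and the start-of-sequence word as a prompt, the decoder generates a first word.  
From this point on, the encoder is no longer needed: its output is kept and reused, so it does not have to run again.  
The word the decoder has just output is then fed back as an input to the decoder to generate the next word.  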



The encoder takes care of understanding the sequence and extracting the context into a vector.  
The decoder takes care of generating a sequence according to the understanding of the encoder.   
Output length is independent of input length in the encoder-decoder model.  
They also handle variable output lengths.  
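The generation loop described above can be sketched with stand-in functions. `encode` and `decode_step` are toy placeholders rather than real networks (here the "model" just uppercases its input); the point is the control flow, in which the encoder runs once and the decoder is called repeatedly on its own previous outputs.

```python
START, END = "<s>", "</s>"


def encode(source_tokens):
    """Stand-in for a real encoder: returns a toy 'representation' of the input."""
    return tuple(token.upper() for token in source_tokens)


def decode_step(encoder_output, generated):
    """Stand-in for a real decoder: picks the next word from the encoder output
    and the words generated so far."""
    position = len(generated) - 1  # generated always begins with START
    if position < len(encoder_output):
        return encoder_output[position]
    return END


def generate(source_tokens, max_steps=10):
    encoder_output = encode(source_tokens)  # the encoder runs only once
    generated = [START]                     # the decoder starts from the start word
    for _ in range(max_steps):
        next_token = decode_step(encoder_output, generated)
        if next_token == END:
            break
        generated.append(next_token)        # feed the output back in as input
    return generated[1:]


print(generate(["hello", "world"]))
```

The loop stops either when the decoder emits the end word or when `max_steps` is reached, which is why the output length does not have to match the input length.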

##### When to use?
* many-to-many, translation, summarization, generative question answering
* weights are not necessarily shared across the encoder and decoder
* input distribution is different from output distribution
* BART, T5, Pegasus, ProphetNet

# Beware of Bias

#### The models aren't neutral. 
#### Depending upon the input data, the model could very easily generate sexist, racist, or homophobic content
#### You can't fine tune this away

In [72]:
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])

result = unmasker("This woman works as a [MASK].")
print([r["token_str"] for r in result])

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']
['nurse', 'maid', 'teacher', 'waitress', 'prostitute']
