## How can I leverage State-of-the-Art Natural Language Models with only one line of code?

Introduced in transformers v2.3.0, **pipelines** provide a high-level, easy-to-use
API for running inference on a variety of downstream tasks, including:

- ***Sentence Classification _(Sentiment Analysis)_***: Indicate whether the overall sentence is positive or negative, i.e. a *binary classification task* or *logistic regression task*.
- ***Token Classification (Named Entity Recognition, Part-of-Speech tagging)***: For each sub-entity (*token*) in the input, assign a label, i.e. a classification task.
- ***Question-Answering***: Given a tuple (`question`, `context`), the model should find the span of text in `context` that answers the `question`.
- ***Mask-Filling***: Suggests possible word(s) to fill the masked input with respect to the provided `context`.
- ***Summarization***: Summarizes the `input` article into a shorter article.
- ***Translation***: Translates the input from one language to another.
- ***Feature Extraction***: Maps the input to a higher-dimensional vector space learned from the data.

Pipelines encapsulate the overall workflow of an NLP task:
 
 1. *Tokenization*: Split the initial input into multiple sub-entities with ... properties (i.e. tokens).
 2. *Inference*: Maps every token into a more meaningful representation.
 3. *Decoding*: Use the above representation to generate and/or extract the final output for the underlying task.
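
The three steps above can be sketched with a toy example. This is not how transformers implements them: the lexicon `TOY_WEIGHTS` and the functions `tokenize`, `infer`, `decode` and `toy_pipeline` are hypothetical stand-ins that only illustrate how the stages hand data to each other.

```python
import math

# Hypothetical toy lexicon standing in for a trained model's weights.
TOY_WEIGHTS = {"bad": -2.0, "nice": 2.0, "terrible": -3.0, "great": 3.0}
LABELS = ["NEGATIVE", "POSITIVE"]

def tokenize(text):
    """Step 1: split the input into lowercase tokens (real tokenizers use subwords)."""
    stripped = (tok.strip("!?.,").lower() for tok in text.split())
    return [tok for tok in stripped if tok]

def infer(tokens):
    """Step 2: map the tokens to one logit per label (NEGATIVE, POSITIVE)."""
    total = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return [-total, total]

def decode(logits):
    """Step 3: turn logits into probabilities (softmax) and pick the best label."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}

def toy_pipeline(text):
    return decode(infer(tokenize(text)))

result = toy_pipeline("Such a bad weather outside !")
print(result)
```

A real pipeline replaces each stage with a learned component (a subword tokenizer, a transformer model, a task-specific head), but the flow of data is the same.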

The overall API is exposed to the end-user through the `pipeline()` function with the following
structure:

```python
from transformers import pipeline

# Using default model and tokenizer for the task
pipeline("<task-name>")

# Using a user-specified model
pipeline("<task-name>", model="<model_name>")

# Using a custom model and tokenizer, both given as strings
pipeline("<task-name>", model="<model_name>", tokenizer="<tokenizer_name>")
```

In [1]:
!pip install -q transformers

In [2]:
!nvidia-smi

Thu Nov 11 20:30:31 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [4]:
from __future__ import print_function
import ipywidgets as widgets
from transformers import pipeline

## 1. Sentence Classification - Sentiment Analysis

In [5]:
nlp_sentence_classif = pipeline('sentiment-analysis')
nlp_sentence_classif('Such a bad weather outside !')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)

[{'label': 'NEGATIVE', 'score': 0.9998005032539368}]

In [8]:
nlp_sentence_classif('such a nice weather outside!')

[{'label': 'POSITIVE', 'score': 0.9997655749320984}]

In [9]:
nlp_sentence_classif('A black tree behind a black car.')

[{'label': 'NEGATIVE', 'score': 0.9803095459938049}]

In [14]:
nlp_sentence_classif('Das Wetter ist heute nice')

[{'label': 'POSITIVE', 'score': 0.9226218461990356}]

## 2. Token Classification - Named Entity Recognition

In [15]:
nlp_token_class = pipeline('ner')
nlp_token_class('Donald Trump is not any longer the president of the United States.')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)

[{'end': 6,
  'entity': 'I-PER',
  'index': 1,
  'score': 0.9991111,
  'start': 0,
  'word': 'Donald'},
 {'end': 12,
  'entity': 'I-PER',
  'index': 2,
  'score': 0.9994508,
  'start': 7,
  'word': 'Trump'},
 {'end': 58,
  'entity': 'I-LOC',
  'index': 11,
  'score': 0.9996983,
  'start': 52,
  'word': 'United'},
 {'end': 65,
  'entity': 'I-LOC',
  'index': 12,
  'score': 0.9995801,
  'start': 59,
  'word': 'States'}]
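
The output above labels each token separately, so `Donald` and `Trump` appear as two `I-PER` entries. A minimal sketch of merging adjacent tokens into whole entities follows, using the `start`/`end` character offsets and the `index` field from the output. The helper `group_entities` is hypothetical and not part of the library; depending on the installed version, the pipeline may offer its own grouping option.

```python
def group_entities(tokens):
    """Merge consecutive tokens that share an entity type into one span."""
    groups = []
    for tok in tokens:
        kind = tok["entity"].split("-")[-1]  # 'I-PER' -> 'PER'
        last = groups[-1] if groups else None
        if last and last["type"] == kind and tok["index"] == last["last_index"] + 1:
            # Same type and directly adjacent: extend the current span.
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
        else:
            groups.append({"type": kind, "start": tok["start"],
                           "end": tok["end"], "last_index": tok["index"]})
    return groups

text = "Donald Trump is not any longer the president of the United States."
# Fields copied from the pipeline output above (scores omitted for brevity).
ner_output = [
    {"entity": "I-PER", "index": 1, "start": 0, "end": 6, "word": "Donald"},
    {"entity": "I-PER", "index": 2, "start": 7, "end": 12, "word": "Trump"},
    {"entity": "I-LOC", "index": 11, "start": 52, "end": 58, "word": "United"},
    {"entity": "I-LOC", "index": 12, "start": 59, "end": 65, "word": "States"},
]

entities = [(g["type"], text[g["start"]:g["end"]]) for g in group_entities(ner_output)]
print(entities)
```

Slicing the original text with the offsets, rather than joining the `word` fields, keeps the original spacing intact.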

## 3. Question Answering

In [16]:
nlp_qa = pipeline('question-answering')
nlp_qa(context='Joe Biden has won the elections in 2020. Donald Trump is not any longer the president of the United States.', question='Who is not any longer the president?')

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)

{'answer': 'Donald Trump', 'end': 53, 'score': 0.9965042471885681, 'start': 41}
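
The `start` and `end` values in the result are character offsets into the `context` string, so the answer can be recovered or highlighted by slicing. A small sketch with the values copied from the output above; the helper `highlight_answer` is a hypothetical name used only for illustration.

```python
context = ("Joe Biden has won the elections in 2020. "
           "Donald Trump is not any longer the president of the United States.")
# Result copied from the question-answering output above.
qa_output = {"answer": "Donald Trump", "end": 53,
             "score": 0.9965042471885681, "start": 41}

def highlight_answer(context, result, left="[", right="]"):
    """Wrap the answer span in markers, using the character offsets."""
    start, end = result["start"], result["end"]
    return context[:start] + left + context[start:end] + right + context[end:]

span = context[qa_output["start"]:qa_output["end"]]
highlighted = highlight_answer(context, qa_output)
print(span)
print(highlighted)
```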

## 4. Text Generation - Mask Filling

In [17]:
nlp_fill = pipeline('fill-mask')
nlp_fill('There is a ' + nlp_fill.tokenizer.mask_token +' tree in our garden.')


No model was supplied, defaulted to distilroberta-base (https://huggingface.co/distilroberta-base)

[{'score': 0.08593308925628662,
  'sequence': 'There is a maple tree in our garden.',
  'token': 27287,
  'token_str': ' maple'},
 {'score': 0.07691144198179245,
  'sequence': 'There is a Christmas tree in our garden.',
  'token': 1619,
  'token_str': ' Christmas'},
 {'score': 0.06446558237075806,
  'sequence': 'There is a mango tree in our garden.',
  'token': 32184,
  'token_str': ' mango'},
 {'score': 0.06433035433292389,
  'sequence': 'There is a cherry tree in our garden.',
  'token': 20075,
  'token_str': ' cherry'},
 {'score': 0.06248725578188896,
  'sequence': 'There is a pine tree in our garden.',
  'token': 22716,
  'token_str': ' pine'}]
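
Each `score` is the probability the model assigns to that token at the masked position, so the five candidates can be compared directly. The sketch below, with values copied from the output above, adds up the probability covered by the top five and picks the best candidate; the variable names are illustrative only.

```python
# Scores and tokens copied from the fill-mask output above.
fill_mask_output = [
    {"score": 0.08593308925628662, "token_str": " maple"},
    {"score": 0.07691144198179245, "token_str": " Christmas"},
    {"score": 0.06446558237075806, "token_str": " mango"},
    {"score": 0.06433035433292389, "token_str": " cherry"},
    {"score": 0.06248725578188896, "token_str": " pine"},
]

# Probability mass covered by the five returned candidates.
top_k_mass = sum(candidate["score"] for candidate in fill_mask_output)

# The leading space in `token_str` marks a word boundary; strip it for display.
best = max(fill_mask_output, key=lambda candidate: candidate["score"])
best_word = best["token_str"].strip()

print(f"top-5 probability mass: {top_k_mass:.4f}")
print(f"best candidate: {best_word}")
```

The top five candidates together cover only about a third of the probability mass, which suggests the model considers many other kinds of tree plausible here.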

In [35]:
results = nlp_fill(f'I think Donald Trump is {nlp_fill.tokenizer.mask_token}.')
for result in results:
    print(result["sequence"])

I think Donald Trump is wrong.
I think Donald Trump is right.
I think Donald Trump is delusional.
I think Donald Trump is crazy.
I think Donald Trump is nuts.


## 5. Summarization

Summarization is currently supported by `Bart` and `T5`.

In [19]:
TEXT_TO_SUMMARIZE = """ 
New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York. 
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband. 
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other. 
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage. 
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the 
2010 marriage license application, according to court documents. 
Prosecutors said the marriages were part of an immigration scam. 
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further. 
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective 
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002. 
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say. 
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages. 
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted. 
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s 
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali. 
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force. 
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""

summarizer = pipeline('summarization')
summarizer(TEXT_TO_SUMMARIZE)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)

[{'summary_text': ' Liana Barrientos pleaded not guilty to two counts of "offering a false instrument for filing in the first degree" She has been married to 10 men, nine of them between 1999 and 2002 . She is believed to still be married to four men, and at one time, she was married to eight men at once .'}]

In [21]:
little_red_riding_hood = """
Once upon a time there was a sweet little girl. Everyone who saw her liked her, but most of all her grandmother, who did not know what to give the child next. Once she gave her a little cap made of red velvet. Because it suited her so well, and she wanted to wear it all the time, she came to be known as Little Red Riding Hood. One day her mother said to her: "Come Little Red Riding Hood. Here is a piece of cake and a bottle of wine. Take them to your grandmother. She is sick and weak, and they will do her well. Mind your manners and give her my greetings. Behave yourself on the way, and do not leave the path, or you might fall down and break the glass, and then there will be nothing for your sick grandmother."

Little Red Riding Hood promised to obey her mother. The grandmother lived out in the woods, a half hour from the village. When Little Red Riding Hood entered the woods a wolf came up to her. She did not know what a wicked animal he was, and was not afraid of him. "Good day to you, Little Red Riding Hood." - "Thank you, wolf." - "Where are you going so early, Little Red Riding Hood?" - "To grandmother's." - "And what are you carrying under your apron?" - "Grandmother is sick and weak, and I am taking her some cake and wine. We baked yesterday, and they should give her strength." - "Little Red Riding Hood, just where does your grandmother live?" - "Her house is a good quarter hour from here in the woods, under the three large oak trees. There's a hedge of hazel bushes there. You must know the place," said Little Red Riding Hood. The wolf thought to himself: "Now there is a tasty bite for me. Just how are you going to catch her?" Then he said: "Listen, Little Red Riding Hood, haven't you seen the beautiful flowers that are blossoming in the woods? Why don't you go and take a look? And I don't believe you can hear how beautifully the birds are singing. You are walking along as though you were on your way to school in the village. It is very beautiful in the woods."

Little Red Riding Hood opened her eyes and saw the sunlight breaking through the trees and how the ground was covered with beautiful flowers. She thought: "If a take a bouquet to grandmother, she will be very pleased. Anyway, it is still early, and I'll be home on time." And she ran off into the woods looking for flowers. Each time she picked one she thought that she could see an even more beautiful one a little way off, and she ran after it, going further and further into the woods. But the wolf ran straight to the grandmother's house and knocked on the door. "Who's there?" - "Little Red Riding Hood. I'm bringing you some cake and wine. Open the door for me." - "Just press the latch," called out the grandmother. "I'm too weak to get up." The wolf pressed the latch, and the door opened. He stepped inside, went straight to the grandmother's bed, and ate her up. Then he took her clothes, put them on, and put her cap on his head. He got into her bed and pulled the curtains shut.
"""

summarizer(little_red_riding_hood)

[{'summary_text': " Once a sweet little girl was known as Little Red Riding Hood . Her grandmother lived out in the woods, a half hour from the village . A wolf came up to her and took her to her grandmother's house . The wolf ate her up in the grandmother's bed and pulled the curtains shut ."}]

In [22]:
# https://www.apple.com/mac/m1/
apple_m1 = """
M1 is here. Our first chip designed specifically for Mac, it delivers incredible performance, custom technologies, 
and revolutionary power efficiency. And it was designed from the very start to work with the most advanced desktop operating system in the world, macOS Big Sur. 
With a giant leap in performance per watt, every Mac with M1 is transformed into a completely different class of product. This isn’t an upgrade. It’s a breakthrough.

Until now, a Mac needed multiple chips to deliver all of its features — including the processor, I/O, security, and memory. With M1, these technologies are 
combined into a single system on a chip (SoC), delivering a new level of integration for more simplicity, more efficiency, and amazing performance. And with 
incredibly small transistors measured at an atomic scale, M1 is remarkably complex — packing the largest number of transistors we’ve ever put into a single 
chip. It’s also the first personal computer chip built using industry‑leading 5‑nanometer process technology.

M1 also features our unified memory architecture, or UMA. M1 unifies its high‑bandwidth, low‑latency memory into a single pool within a custom package. 
As a result, all of the technologies in the SoC can access the same data without copying it between multiple pools of memory. This dramatically improves 
performance and power efficiency. Video apps are snappier. Games are richer and more detailed. Image processing is lightning fast. And your entire system 
is more responsive.

The 8‑core CPU in M1 is by far the highest‑performance CPU we’ve ever built. Designed to crush tasks using the least amount of power, M1 features two types 
of cores: high performance and high efficiency. So from editing family photos to exporting iMovie videos for the web to managing huge RAW libraries in Lightroom 
to checking your email, M1 blazes right through it all — without blazing through battery life.

M1 features four performance cores, each designed to run a single task as efficiently as possible while maximizing performance. Our high‑performance 
core is the world’s fastest CPU core when it comes to low‑power silicon.3 And because M1 has four of them, multithreaded workloads take a huge leap 
in performance as well.

M1 has four efficiency cores to handle lighter workloads. They use a tenth of the power while still delivering outstanding performance. 
These e‑cores are the most efficient place to run lightweight tasks, allowing the performance cores to be used for your most demanding workflows.
"""

summarizer(apple_m1)

[{'summary_text': " M1 is Apple's first chip designed specifically for Mac, it delivers incredible performance, custom technologies, and revolutionary power efficiency . Until now, a Mac needed multiple chips to deliver all of its features — including processor, I/O, security, and memory . With M1, these technologies are combined into a single system on a chip (SoC)"}]

## 6. Translation

Translation is currently supported by `T5` for the language mappings English-to-French (`translation_en_to_fr`), English-to-German (`translation_en_to_de`) and English-to-Romanian (`translation_en_to_ro`).

In [23]:
# English to French
translator = pipeline('translation_en_to_fr')
translator("HuggingFace is a French company that is based in New York City HuggingFace's mission is to solve NLP one commit at a time")

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)

[{'translation_text': 'HuggingFace est une entreprise française établie à New York. La mission de HuggingFace est de résoudre les problèmes de PNL, un engagement à la fois.'}]

In [24]:
# English to German
translator = pipeline('translation_en_to_de')
translator("The history of natural language processing (NLP) generally started in the 1950s, although work can be found from earlier periods.")

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)


[{'translation_text': 'Die Geschichte der natürlichen Sprachenverarbeitung (NLP) begann im Allgemeinen in den 1950er Jahren, obwohl Arbeit aus früheren Zeiten gefunden werden kann.'}]

In [25]:
translator("M1 is Apple's first chip designed specifically for Mac, it delivers incredible performance, custom technologies, and revolutionary power efficiency.")

[{'translation_text': 'M1 ist Apples erster speziell für Mac entwickelter Chip, er bietet unglaubliche Leistung, kundenspezifische Technologien und revolutionäre Leistungseffizienz.'}]

## 7. Text Generation

Text generation is currently supported by GPT-2, OpenAI GPT, Transformer-XL, XLNet, CTRL and Reformer.

In [26]:
text_generator = pipeline("text-generation")
text_generator("Today is a beautiful day and I will")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Today is a beautiful day and I will always be here for you this weekend. I look forward to making that night special again and for you guys to come back tomorrow for you to see this beautiful wedding. So far thank you to all of you for'}]

In [27]:
text_generator = pipeline("text-generation", model="dbmdz/german-gpt2")
text_generator("Heute ist ein schöner Tag und ich werde")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Heute ist ein schöner Tag und ich werde alles für Sie tun.\nDie Arbeit wurde gestrichen.\nAlles klar.\nHaben Sie schon mit den anderen gearbeitet?\nIch konnte nur sehen, dass sie in der Wohnung arbeiteten.\nJetzt sind sie nicht'}]

In [28]:
text_generator = pipeline("text-generation", model="dbmdz/german-gpt2-faust")
text_generator("Heute ist ein schöner Tag und ich werde")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:52000 for open-end generation.


[{'generated_text': 'Heute ist ein schöner Tag und ich werde ihn nie vergessen. Allein wie traurig steigt die Flut heran! Man sagt, der König wolle mit wenig Federzügen warten, Und, wie die Sage Alter Zeitenreicht, Da wird ein großes Unglück mich ergötzen'}]

## 8. Projection - Feature Extraction

In [29]:
import numpy as np
nlp_features = pipeline('feature-extraction')
output = nlp_features('Hugging Face is a French company based in Paris')
np.array(output).shape   # (Samples, Tokens, Vector Size)


No model was supplied, defaulted to distilbert-base-cased (https://huggingface.co/distilbert-base-cased)

Some weights of the model checkpoint at distilbert-base-cased were not used when initializing DistilBertModel: ['vocab_projector.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

(1, 12, 768)
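
The shape `(1, 12, 768)` means one vector of size 768 per token. A common way to get a single vector per sentence is to average over the token axis (mean pooling); two sentence vectors can then be compared with cosine similarity. The sketch below works on plain nested lists like those the pipeline returns and uses random numbers in place of real model output. `mean_pool`, `cosine_similarity` and `fake_output` are hypothetical names, and mean pooling is one possible choice, not something the pipeline does itself.

```python
import math
import random

def mean_pool(token_vectors):
    """Average a (tokens x hidden) list of vectors into one vector of size hidden."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Random stand-in with the same shape as the real output: (1, 12, 768).
random.seed(0)
fake_output = [[[random.gauss(0.0, 1.0) for _ in range(768)] for _ in range(12)]]

sentence_vector = mean_pool(fake_output[0])  # first (and only) sample
print(len(sentence_vector))
```

With real pipeline output, `mean_pool(output[0])` would give one 768-dimensional vector for the sentence.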

Alright! You now have a good picture of what is possible with transformers' pipelines, and there is more
to come in future releases.

In the meantime, you can try the different pipelines with your own inputs.

In [30]:
task = widgets.Dropdown(
    options=['sentiment-analysis', 'ner', 'fill_mask'],
    value='ner',
    description='Task:',
    disabled=False
)

# Named `text_input` so that the built-in `input()` is not shadowed.
text_input = widgets.Text(
    value='',
    placeholder='Enter something',
    description='Your input:',
    disabled=False
)

def forward(_):
    """Run the selected pipeline on the submitted text and print the result."""
    text = text_input.value
    if len(text) > 0:
        if task.value == 'ner':
            output = nlp_token_class(text)
        elif task.value == 'sentiment-analysis':
            output = nlp_sentence_classif(text)
        else:
            # Fill-mask needs a mask token; append one if the user did not type it.
            mask = nlp_fill.tokenizer.mask_token
            if mask not in text:
                text = text + ' ' + mask
            output = nlp_fill(text)
        print(output)

text_input.on_submit(forward)
display(task, text_input)

Dropdown(description='Task:', index=1, options=('sentiment-analysis', 'ner', 'fill_mask'), value='ner')

Text(value='', description='Your input:', placeholder='Enter something')

In [31]:
context = widgets.Textarea(
    value='Einstein is famous for the general theory of relativity',
    placeholder='Enter something',
    description='Context:',
    disabled=False
)

query = widgets.Text(
    value='Why is Einstein famous for ?',
    placeholder='Enter something',
    description='Question:',
    disabled=False
)

def forward(_):
    if len(context.value) > 0 and len(query.value) > 0: 
        output = nlp_qa(question=query.value, context=context.value)            
        print(output)

query.on_submit(forward)
display(context, query)

Textarea(value='Einstein is famous for the general theory of relativity', description='Context:', placeholder=…

Text(value='Why is Einstein famous for ?', description='Question:', placeholder='Enter something')