# Pre-Trained Models with Pipelines (Part II)

In this tutorial, we continue exploring how to use pre-trained models from the *transformers* library in a very convenient way: using *pipelines*.

Have fun!



In [2]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

# 1. Text Generation
Models trained on the classic language modeling task (also known as causal language modeling) can be used for text generation. By default, this pipeline uses GPT-2 (124M).

Let's try GPT-2 Large, which has 774M parameters. You may notice that the download and execution times are much longer than for the models we tried earlier.

In [1]:
import torch
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
device

device(type='cpu')

In [4]:
from transformers import pipeline
text_generator = pipeline("text-generation", model="gpt2-large", device=device)

Downloading pytorch_model.bin:   0%|          | 0.00/3.25G [00:00<?, ?B/s]

Downloading tf_model.h5:   0%|          | 0.00/3.10G [00:00<?, ?B/s]

ValueError: Could not load model gpt2-large with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>, <class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>, <class 'transformers.models.gpt2.modeling_tf_gpt2.TFGPT2LMHeadModel'>).

## 1.1 Generating using Greedy Search

In [None]:
text = text_generator("As far as I am concerned, I will", max_length=100, do_sample=False)
print(text[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


As far as I am concerned, I will not be a part of this. I will not be a part of this. I will not be a part of this. I will not be a part of this. I will not be a part of this. I will not be a part of this. I will not be a part of this. I will not be a part of this. I will not be a part of this. I will not be a part of this. I will not
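
The loop in the output above is typical of greedy search: the most probable next token is always chosen, so once the text returns to an earlier state it repeats the same path. The sketch below illustrates this with a toy next-token table; the table and the function name `greedy_decode` are hypothetical and are not taken from GPT-2 or the library.

```python
# Toy next-token probabilities (hypothetical values, NOT taken from GPT-2).
NEXT_TOKEN_PROBS = {
    "I": {"will": 0.6, "am": 0.4},
    "will": {"not": 0.7, "go": 0.3},
    "not": {"be": 0.8, "go": 0.2},
    "be": {"a": 0.9, "there": 0.1},
    "a": {"part": 1.0},
    "part": {"of": 1.0},
    "of": {"this.": 1.0},
    "this.": {"I": 1.0},
}


def greedy_decode(start, steps):
    """Append the single most probable next token at every step."""
    tokens = [start]
    for _ in range(steps):
        candidates = NEXT_TOKEN_PROBS[tokens[-1]]
        tokens.append(max(candidates, key=candidates.get))
    return tokens


# The decoder is deterministic, so it cycles through the same phrase.
print(" ".join(greedy_decode("I", 16)))
```

Because nothing in the procedure is random, running it twice gives identical text, which matches `do_sample=False` in the call above.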


## 1.2 Bringing in random selection of the next word according to its conditional probability distribution

In [None]:

text = text_generator("As far as I am concerned, I will", max_length=100, do_sample=True, num_return_sequences=3)
for t in text:
  print('-----')
  print(t['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


-----
As far as I am concerned, I will have to go back and look at how I implemented that decision."

McIlwraith said he did have to address the media.

"I was quite honest with the media that I couldn't afford to travel, because of my illness, to the game in Adelaide and, if I could have flown out, I wouldn't have been able to do anything," he said.

"So I can't tell you I didn't listen
-----
As far as I am concerned, I will have a complete understanding of each of the issues in order to take every possible action I should find applicable. And, what I know about their plans, I will have the time it takes to study it before I actually vote. I will not, however, blindly follow what I read in the Guardian as it will be impossible for me to do so. In the end, you will be the judge about what I am doing to help my country.


-----
As far as I am concerned, I will continue to be a vocal leader on the issues surrounding the future of the club and we are already hard at work building our club
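
With `do_sample=True`, the next token is drawn at random in proportion to its probability instead of always taking the maximum. A minimal sketch of that single step, assuming a made-up distribution (`PROBS` and `sample_next` are illustrative names, not library objects):

```python
import random

# Hypothetical conditional distribution over the next token.
PROBS = {"have": 0.5, "continue": 0.3, "not": 0.15, "never": 0.05}


def sample_next(probs, rng):
    """Draw one token with probability proportional to its weight."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_next(PROBS, rng) for _ in range(1000)]
counts = {t: draws.count(t) for t in PROBS}
print(counts)
```

Likely tokens dominate the counts, but unlikely ones still appear occasionally, which is why the three sequences above differ from one another.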

## 1.3 Using beam search, other high-probability sequences get a chance, too. Try different numbers of beams.

In [None]:

text = text_generator("As far as I am concerned, I will", max_length=100, num_beams=5, num_return_sequences=3)
for t in text:
  print('-----')
  print(t['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


-----
As far as I am concerned, I will do whatever is necessary to keep this country safe," he said.

"I will do whatever is necessary to keep this country safe. I will do whatever is necessary to keep this country safe. I will do whatever is necessary to keep this country safe. I will do whatever is necessary to keep this country safe. I will do whatever is necessary to keep this country safe. I will do whatever is necessary to keep this country safe. I will do
-----
As far as I am concerned, I will be the last person to say anything about it."

He added: "I don't want to say anything about it because I don't want to get involved in it.

"I don't want to get involved in it because I don't want to get involved in it. I don't want to get involved in it because I don't want to get involved in it. I don't want to get involved in it because I don
-----
As far as I am concerned, I will be the first to admit that I am not a fan of the current state of affairs in the US. I do not want to see 
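
Beam search keeps the `num_beams` best partial sequences at every step, so a sequence that starts with a less likely word can still win overall. The sketch below shows the idea on a tiny hand-made model; `MODEL` and `beam_search` are hypothetical and far simpler than what the library does.

```python
import math

# Hypothetical model: maps a prefix to next-token probabilities.
MODEL = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.6, "drives": 0.4},
}


def beam_search(prefix, steps, num_beams):
    """Return the best `num_beams` sequences as (log-probability, tokens)."""
    beams = [(0.0, tuple(prefix))]
    for _ in range(steps):
        candidates = []
        for logp, tokens in beams:
            for token, p in MODEL[tokens].items():
                candidates.append((logp + math.log(p), tokens + (token,)))
        # Keep only the highest-scoring partial sequences.
        beams = sorted(candidates, reverse=True)[:num_beams]
    return beams


print(beam_search(["The"], steps=2, num_beams=1)[0][1])  # same as greedy search
print(beam_search(["The"], steps=2, num_beams=2)[0][1])
```

With one beam the search commits to "nice" (0.5) and ends at probability 0.5 × 0.4 = 0.2; with two beams it also keeps "dog" and finds "The dog has" at 0.4 × 0.9 = 0.36.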

## 1.4 Stopping the annoying repetition. Try different n-gram sizes.

In [None]:

text = text_generator("As far as I am concerned, I will", max_length=100, num_beams=5, no_repeat_ngram_size=3, num_return_sequences=3)
for t in text:
  print('-----')
  print(t['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


-----
As far as I am concerned, I will do whatever I can to make sure that this does not happen again," she said.

"I'm not going to let this happen again. I'm going to do everything in my power to ensure that this doesn't happen to anyone else."

Topics: crime, law-crime-and-justice, crime-prevention, sydney-2000, nsw, australia

First posted
-----
As far as I am concerned, I will continue to do everything in my power to make sure that we do not have a repeat of what happened in Charlottesville."

Trump, who has been criticized for his response to the violence in Charlottesville, said in a statement that he condemned "in the strongest possible terms this egregious display of hatred, bigotry and violence on many sides. On many sides!"

"Racism is evil, and those who cause violence in its name are criminals and thugs
-----
As far as I am concerned, I will never be able to go back to the way things were.

"I have to move on. I have to get on with my life. I can't live in the past."

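The effect of `no_repeat_ngram_size=3` can be pictured as a ban list: before each step, any token that would complete a 3-gram already present in the text is excluded. The sketch below is an illustration of that rule under simplifying assumptions (whitespace tokens, a hypothetical helper `banned_next_tokens`); it is not the library's implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return tokens that would complete an n-gram already seen in `tokens`."""
    if len(tokens) < ngram_size:
        return set()  # no complete n-gram exists yet
    # The last (n - 1) tokens form the start of the n-gram being built.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


history = "I will not be a part of this . I will".split()
print(banned_next_tokens(history, 3))  # {'not'}
```

Here "I will not" has already occurred, so after the second "I will" the token "not" is forbidden and the generator must continue differently.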

## 1.5 Sampling can be helpful to avoid boredom. Let's try Top-K Sampling

In [None]:

text = text_generator("As far as I am concerned, I will", max_length=100, do_sample=True, top_k=10, num_return_sequences=3)
for t in text:
  print('-----')
  print(t['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


-----
As far as I am concerned, I will always remain a member of that club."

The club has not commented on any of that and there have been no comments from the club regarding the incident.
-----
As far as I am concerned, I will always be loyal to the team," said the veteran center. "I want to stay with the team, I want to be around the team and I want to help the team. And I want to help this organization. I want to help this team win."

The Capitals' new captain, Alexander Ovechkin, is also eager to stay with the Capitals.

"Yeah I am, yeah I am, I am," he said
-----
As far as I am concerned, I will take it all the way, to the end of the battle."

"I am glad to hear this, Captain," said the chief, "because the last thing that I want is to lose you at once, for you are my most valuable asset. You will have to take charge of the fleet."

"I will not leave my men in the hands of those who do not belong to the navy."

"You will have to

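Top-K sampling restricts the random draw to the K most probable tokens and renormalizes their probabilities, which removes the long tail of unlikely words. A minimal sketch of the filtering step, using a made-up distribution (`PROBS` and `top_k_filter` are illustrative names only):

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize them to sum to 1."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {token: p / total for token, p in top}


# Hypothetical next-token distribution with two implausible tail tokens.
PROBS = {"always": 0.4, "take": 0.3, "never": 0.2, "zebra": 0.07, "qux": 0.03}
filtered = top_k_filter(PROBS, k=3)
print(filtered)
```

Sampling then proceeds from `filtered` instead of the full distribution, so the tail tokens can never be chosen.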

## 1.6 And Top-P Sampling

In [None]:
text = text_generator("As far as I am concerned, I will", max_length=100, do_sample=True, top_p=0.9, num_return_sequences=3)
for t in text:
  print('-----')
  print(t['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


-----
As far as I am concerned, I will continue to fight it. I will continue to fight it every single day, because I don't care what the court says, I care about my children. The court is saying that I don't have the right to make these decisions for myself," she said, adding she will ask her lawyer to take the case to the next level.

The Supreme Court in a ruling on Jan 15 had ruled that the parents could keep two of their children, who
-----
As far as I am concerned, I will not be in the team that has a chance to get to the finals or even a quarter-final if I am not fit," he said.

The World Cup is just four weeks away. But for the young gun, the prospect of playing for Australia or Australia A is still daunting.

"It's been a long process and every moment feels like a big loss. Now, it is my time to come out, show what I can
-----
As far as I am concerned, I will never stop fighting the battle of life. My heart will never be satisfied. I will not stop until we kill this plague in e
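
Top-P (nucleus) sampling keeps the smallest set of most probable tokens whose cumulative probability reaches `top_p`, so the number of candidates adapts to how confident the model is. The sketch below illustrates the filtering step on two made-up distributions; `top_p_filter`, `peaked`, and `flat` are hypothetical names.

```python
def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical distributions: one confident, one uncertain.
peaked = {"fight": 0.85, "stop": 0.1, "go": 0.03, "sing": 0.02}
flat = {"a": 0.3, "b": 0.3, "c": 0.25, "d": 0.15}

print(top_p_filter(peaked, 0.8))  # a single candidate survives
print(top_p_filter(flat, 0.8))    # three candidates survive
```

Unlike Top-K, the cutoff is not a fixed count: a confident distribution leaves one candidate, while an uncertain one leaves several.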

# 2. Text Summarization
Summarization condenses a long text or article into a shorter one. By default, the pipeline uses a BART-based model fine-tuned on the CNN / Daily Mail dataset.

In [None]:
#=====summarization
from transformers import pipeline
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Downloading (…)lve/main/config.json:   0%|          | 0.00/1.80k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

In [None]:
import textwrap
ARTICLE = """Democrats formally nominated Joe Biden for president on Tuesday (Aug 18), with elder statesmen and rising stars promising he would  repair a pandemic-devastated America and end the chaos of Republican President Donald Trump.
The convention's second night, under the theme "Leadership Matters", aimed to make the case that Biden would represent a return to normalcy.
"At a time like this, the Oval Office should be a command centre," former US President Bill Clinton said in a prerecorded video.
"Instead, it's a storm centre. There's only chaos. Just one thing never changes - his determination to deny responsibility and shift the blame."
With the four-day convention largely virtual due to the coronavirus, delegates from around the country cast votes remotely to confirm Biden as the nominee.
In clips from around the country, Democrats of all stripes explained why they were supporting Biden while putting their own state-specific spin on the proceedings, from a calamari appetiser in Rhode Island to a herd of cattle in Montana.
Following his home state of Delaware, which went last in his honor, Biden appeared live for the first time at a Delaware school, where his wife, Jill, was set to deliver the night's headline address later in the evening.
"Thank you very, very much from the bottom of my heart," said Biden, who will deliver his acceptance speech on Thursday. "It means the world to me and my family."
Democratic presidential candidate and former Vice President Joe Biden and running mate Senator Kamala Harris are seen on screen at virtual 2020 Democratic Convention hosted from Milwaukee, Wisconsin.
The programme started by showcasing some of the party's rising politicians. But rather than a single keynote speech that could be a star-making turn, as it was for then-state Senator Barack Obama in 2004, the programme featured 17 stars in a video address, including Stacey Abrams, the one-time Georgia gubernatorial nominee whom Biden once considered for a running mate.
"America faces a triple threat: A public health catastrophe, and economic collapse and a reckoning with racial justice and inequality," Abrams said.
"So our choice is clear: A steady experienced public servant who can lead us out of this crisis just like he's done before, or a man who only knows how to deny and distract."
As they did on Monday's opening night, Democrats featured a handful of Republicans who have crossed party lines to praise Biden, 77, over Trump, 74, ahead of the Nov 3 election.
Cindy McCain, widow of Republican Senator John McCain, was scheduled to appear in a video talking about her husband's long friendship with Biden, according to a preview posted online. Trump clashed with McCain, who was the Republican nominee for president in 2008, and the president criticised McCain even after his 2018 death.
Republican former Secretary of State Colin Powell, a retired four-star general who endorsed Biden in June, was one of several national security officials due to speak on the Democrat's behalf.
"Our country needs a commander in chief who takes care of our troops in the same way he would his own family," he said.
“He will trust our diplomats and our intelligence community, not the flattery of dictators and despots. He will make it his job to know when anyone dares to threaten us. He will stand up to our adversaries with strength and experience. They will know he means business.”
Democratic former Secretary of State John Kerry said of Trump: "When this president goes overseas, it isn’t a goodwill mission, it’s a blooper reel. He breaks up with our allies and writes love letters to dictators. America deserves a president who is looked up to, not laughed at."
Biden's vice presidential pick, Senator Kamala Harris, will headline Wednesday night's programme along with Obama.
Without the cheering crowds at the in-person gathering originally planned for Milwaukee, Wisconsin, TV viewership on Monday was down from 2016. But an additional 10.2 million people watched on digital platforms, the Biden campaign said, for a total audience of nearly 30 million.
Aiming to draw attention away from Biden, Trump, trailing in opinion polls, held a campaign rally in Arizona, a hotly contested battleground state that can swing to either party and play a decisive role in the election.
The convention was being held amid worries about the safety of in-person voting. Democrats have pushed mail-in ballots as an alternative and pressured the head of the US Postal Service, a top Trump donor, to suspend cost cuts that delayed mail deliveries.
Bowing to that pressure, Postmaster General Louis DeJoy put off the cost-cutting measures until after the election.
"""


In [None]:
result = summarizer(ARTICLE, max_length=100, min_length=20, do_sample=False)


In [None]:
print(textwrap.fill(result[0]['summary_text'], width=100))

 Democrats formally nominate Joe Biden for president on Tuesday (Aug 18), with elder statesmen and
rising stars promising he would repair a pandemic-devastated America and end the chaos of Republican
President Donald Trump . Biden appeared live for the first time at a Delaware school, where his
wife, Jill, was set to deliver the night's headline address later in the evening . Biden's vice
presidential pick, Senator Kamala Harris, will headline Wednesday night's programme along with Obama
.


We can also use T5 for the summarization task.

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [None]:
# T5 uses a max_length of 512 so we cut the article to 512 tokens.
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=100, min_length=20, repetition_penalty=2.5, length_penalty=1.0, num_beams=2, early_stopping=True)
print(outputs)

tensor([[    0,     3, 22878,    45,   300,     8,   684,  4061, 11839, 20081,
            12,  3606,  4967,  2106,   537,    38,  2753,     3,     5,    96,
           155,   598,     8,   296,    12,   140,    11,    82,   384,   976,
           845,  2647,   537,     6,   113,    56,  2156, 11122,  5023,    30,
          2721,     3,     5,  4291,  8346,  6523,    57, 11064,    18,    89,
            23,  6079,     3,    17,   208,  1229,     3,    75,    29,    29,
            19,   294,    13,  6503,  2066,     3,     5,     1]])


In [None]:

print(textwrap.fill(tokenizer.decode(outputs[0]), width=100))

<pad> delegates from around the country cast votes remotely to confirm Joe Biden as president. "it
means the world to me and my family," says biden, who will deliver acceptance speech on Thursday.
virtual convention hosted by wi-fi giant tv network cnn is part of 2020 campaign.</s>


You can try both models on other texts.



# 3. Machine Translation
T5 supports machine translation from English into several European languages, such as French and German.

In [None]:
#=====translation===
from transformers import pipeline
translator = pipeline("translation_en_to_fr")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)


Downloading:   0%|          | 0.00/1.17k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/850M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/773k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.32M [00:00<?, ?B/s]

[{'translation_text': 'Hugging Face est une entreprise technologique basée à New York et à Paris.'}]


In [None]:
translator = pipeline("translation_en_to_de")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)


[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]


# 4. Conversation
With models trained on dialogue data, conversational responses can be generated based on user inputs.

In [None]:
from transformers import pipeline, Conversation
chat = pipeline("conversational", model="microsoft/DialoGPT-large")

Downloading (…)lve/main/config.json:   0%|          | 0.00/642 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.75G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

In [None]:

conversation_1 = Conversation("Going to the movies tonight - any suggestions?")
conversation_2 = Conversation("What's the last book you have read?")
chat([conversation_1, conversation_2])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


[Conversation id: b82061cf-00ab-4932-bee8-d2118ee612d7 
 user >> Going to the movies tonight - any suggestions? 
 bot >> The Last Airbender ,
 Conversation id: f6952112-79b6-42ce-9ddc-e326537c1a02 
 user >> What's the last book you have read? 
 bot >> The Last Book of the Jedi ]

In [None]:
conversation_1.add_user_input("Is it an action movie?")
conversation_2.add_user_input("What is the genre of this book?")

chat([conversation_1, conversation_2])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


[Conversation id: b82061cf-00ab-4932-bee8-d2118ee612d7 
 user >> Going to the movies tonight - any suggestions? 
 bot >> The Last Airbender 
 user >> Is it an action movie? 
 bot >> It's a comedy. ,
 Conversation id: f6952112-79b6-42ce-9ddc-e326537c1a02 
 user >> What's the last book you have read? 
 bot >> The Last Book of the Jedi 
 user >> What is the genre of this book? 
 bot >> It's a Star Wars novel. ]

In [None]:
conversation_1.add_user_input("Can you recommend an action movie?")
chat([conversation_1])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Conversation id: b82061cf-00ab-4932-bee8-d2118ee612d7 
user >> Going to the movies tonight - any suggestions? 
bot >> The Last Airbender 
user >> Is it an action movie? 
bot >> It's a comedy. 
user >> Can you recommend an action movie? 
bot >> The Last Airbender 

In [None]:
conversation_1.add_user_input("But you said it's a comedy? I want to watch an action movie.")
chat([conversation_1])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Conversation id: b82061cf-00ab-4932-bee8-d2118ee612d7 
user >> Going to the movies tonight - any suggestions? 
bot >> The Last Airbender 
user >> Is it an action movie? 
bot >> It's a comedy. 
user >> Can you recommend an action movie? 
bot >> The Last Airbender 
user >> But you said it's a comedy? I want to watch an action movie. 
bot >> The Last Airbender 

# 5. Zero-shot Classification
Zero-shot classification lets us classify text against arbitrary labels without task-specific fine-tuning on labeled data.

In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


Downloading:   0%|          | 0.00/1.13k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.52G [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/878k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/446k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.29M [00:00<?, ?B/s]

In [None]:
classifier(
    "Singapore new private home sales decline at slower pace than launches in December",
    candidate_labels=["politics", "business", "sports", "technology", "entertainment"],
)

{'labels': ['business', 'entertainment', 'technology', 'sports', 'politics'],
 'scores': [0.7892554998397827,
  0.09854736924171448,
  0.05912058427929878,
  0.028882605955004692,
  0.024193953722715378],
 'sequence': 'Singapore new private home sales decline at slower pace than launches in December'}
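
In the output above, `labels` and `scores` are parallel lists, and the scores add up to roughly 1. The sketch below shows one way to read such a result; the dictionary is copied from the output above, and the variable names are illustrative.

```python
# Result copied from the classifier output shown above.
result = {
    "labels": ["business", "entertainment", "technology", "sports", "politics"],
    "scores": [
        0.7892554998397827,
        0.09854736924171448,
        0.05912058427929878,
        0.028882605955004692,
        0.024193953722715378,
    ],
    "sequence": "Singapore new private home sales decline at slower pace "
                "than launches in December",
}

# Pair each label with its score and pick the highest-scoring one.
label_scores = dict(zip(result["labels"], result["scores"]))
best_label = max(label_scores, key=label_scores.get)

print(best_label, round(label_scores[best_label], 3))
print(round(sum(result["scores"]), 6))
```

Pairing the lists explicitly avoids relying on their order when picking the top label.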

In [None]:
classifier(
    "Spicy Peanut Chicken Stir-Fry",
    candidate_labels=["Korean", "Chinese", "Western", "Mediterranean"],
)

{'labels': ['Chinese', 'Korean', 'Western', 'Mediterranean'],
 'scores': [0.7245069146156311,
  0.15263397991657257,
  0.07127302885055542,
  0.051586002111434937],
 'sequence': 'Spicy Peanut Chicken Stir-Fry'}

In [None]:
classifier(
    "renal severe, vision is good, fall risk is high",
    candidate_labels=["good vision", "bad vision"],
)

{'labels': ['good vision', 'bad vision'],
 'scores': [0.9932595491409302, 0.006740417797118425],
 'sequence': 'renal severe, vision is good, fall risk is high'}

# Reference
Transformers documentation: https://huggingface.co/transformers/index.html