<a href="https://colab.research.google.com/github/iVibudh/Hugging-Face-Course/blob/main/notebooks/CER_00_intro_what_can_transformers_do.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformers, what can they do?

This is an introductory notebook to quickly go over the main tasks we can do with the `transformers` library.

We'll start by importing the `pipeline` function, which is the highest-level API in the library. It lets you quickly use pre-trained models on a given task: all you need to do is specify the task you want to perform. As we'll see, you can also specify the model you want to use.

Notice in the cells below that we first instantiate the `pipeline` object and then classify the sentiment of a given sentence.

In [None]:
!pip install datasets transformers --quiet

# Imports

In [None]:
from transformers import pipeline
from time import time

In [None]:
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for the Huggingface course my whole life!")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9881473183631897}]

As we can see, the pre-trained model is downloaded from the Hugging Face model hub and then used to classify the sentiment of the sentence. In this case, the default model is `distilbert-base-uncased-finetuned-sst-2-english`, a DistilBERT model (a smaller, distilled member of the BERT family) that was fine-tuned on English text to classify the sentiment of a sentence.

Note: once the model has been downloaded, it is cached on your machine. This speeds up future runs of the notebook since you won't have to download it again.

Now, let's pass a list of strings to the classifier.

In [None]:
classifier(["I've been waiting for the Huggingface course my whole life!", 
"I hate this so much"])

[{'label': 'POSITIVE', 'score': 0.9881473183631897},
 {'label': 'NEGATIVE', 'score': 0.9995144605636597}]

There are three main steps involved when you pass some text to a pipeline:

1. Tokenization: The text is preprocessed into a format the model can understand (token IDs).
2. Forward pass: The tokenized inputs are passed through the model.
3. Post-processing: The model's predictions are converted into a form you can make sense of, such as labels and scores.
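The three steps can also be written out by hand. What follows is a minimal sketch, not the actual internals of `pipeline`: `softmax`, `postprocess`, and `run_three_steps` are illustrative names, the example logits are made up, and `run_three_steps` assumes PyTorch is installed and uses the default checkpoint named in the log message above.

```python
import math


def softmax(logits):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label):
    """Step 3: pick the most probable class and attach its readable label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def run_three_steps(text, checkpoint="distilbert-base-uncased-finetuned-sst-2-english"):
    """Tokenize, run the forward pass, and post-process by hand (downloads the model)."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    inputs = tokenizer(text, return_tensors="pt")      # step 1: text -> token IDs
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()    # step 2: forward pass
    return postprocess(logits, model.config.id2label)  # step 3: logits -> label and score


# Made-up logits for a two-class model (index 0 = NEGATIVE, index 1 = POSITIVE).
example = postprocess([-2.0, 2.0], {0: "NEGATIVE", 1: "POSITIVE"})
print(example["label"])
```

Calling `run_three_steps("I hate this so much")` should return a dictionary in the same shape as the `pipeline` output, although the exact score may differ slightly between library versions.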

There are a ton of different tasks you can do with `pipeline`. Here are a few examples:

* feature-extraction (get the vector representation of a text)
* fill-mask
* ner (named entity recognition)
* question-answering
* sentiment-analysis
* summarization
* text-generation
* translation
* zero-shot-classification

We will cover some of these below.

What do you think the next task does?

In [None]:
classifier = pipeline("zero-shot-classification")
classifier("This is a course about the Transformers library",
candidate_labels=["education", "politics", "business"])

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


{'labels': ['education', 'business', 'politics'],
 'scores': [0.8445994257926941, 0.11197391897439957, 0.043426647782325745],
 'sequence': 'This is a course about the Transformers library'}
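In this output, each candidate label is paired with a score, listed from highest to lowest, and the scores add up to about 1. Here is a small sketch of reading the result, using the values printed above; `top_label` is a hypothetical helper, not part of `transformers`.

```python
# Output copied from the zero-shot classification cell above.
result = {
    "labels": ["education", "business", "politics"],
    "scores": [0.8445994257926941, 0.11197391897439957, 0.043426647782325745],
    "sequence": "This is a course about the Transformers library",
}


def top_label(result):
    """Return the (label, score) pair with the highest score."""
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


label, score = top_label(result)
print(f"{label}: {score:.2f}")
```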

Below we have the NER task, which recognizes named entities in a text. In the output, the model tags `CER` as an organization (`ORG`) and `Gitane De Silva` as a person (`PER`).

In [None]:
ner = pipeline('ner', aggregation_strategy="simple")
ner("The current CEO of the CER is Gitane De Silva.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


[{'end': 26,
  'entity_group': 'ORG',
  'score': 0.9947985,
  'start': 23,
  'word': 'CER'},
 {'end': 45,
  'entity_group': 'PER',
  'score': 0.99895877,
  'start': 30,
  'word': 'Gitane De Silva'}]
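The `start` and `end` values are character offsets into the input string, so each entity can be recovered by slicing. Here is a minimal sketch using the output above; `group_entities` is a hypothetical helper name.

```python
text = "The current CEO of the CER is Gitane De Silva."

# Entities copied from the NER output above (scores omitted for brevity).
entities = [
    {"entity_group": "ORG", "start": 23, "end": 26, "word": "CER"},
    {"entity_group": "PER", "start": 30, "end": 45, "word": "Gitane De Silva"},
]


def group_entities(text, entities):
    """Map each entity type to the list of text spans found for it."""
    grouped = {}
    for ent in entities:
        span = text[ent["start"]:ent["end"]]  # slice the original string by offsets
        grouped.setdefault(ent["entity_group"], []).append(span)
    return grouped


grouped = group_entities(text, entities)
print(grouped)
```

Slicing by offsets, rather than reading `word`, keeps the original casing and spacing of the source text, which can matter when entities are mapped back onto a document.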

In [None]:
question_answerer = pipeline('question-answering')
question_answerer(question="Which bachelor's degree did Gitane De Silva recieve?", 
context="""The current CEO of the CER is Gitane De Silva. Ms. De Silva became the Chief Executive Officer of the CER in August 2020. Prior to joining the CER, she was a Special Advisor at TransAlta Corporation. She previously served as Alberta's Senior Representative to the United States and as Deputy Minister for Alberta International and Intergovernmental Relations.

Before joining the Alberta Public Service, Ms. De Silva spent 12 years in Canada's Foreign Service as a specialist in Canada-U.S. relations, serving in a variety of roles, including as Consul General of Canada in Chicago and as Counsellor (Environment & Fisheries) at the Canadian Embassy in Washington, D.C. She also served as Deputy Head of Agency at Status of Women Canada.

Ms. De Silva has a Bachelor of Arts in International Relations from the University of British Columbia and is a 2013 recipient of The International Alliance for Women (TIAW) World of Difference Award.""",)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


{'answer': 'Bachelor of Arts in International Relations',
 'end': 801,
 'score': 0.9861729145050049,
 'start': 758}

You can also specify the model you want to use with `pipeline`. Here we will generate text using `distilgpt2`, a distilled (smaller) version of the `gpt2` model:

In [None]:
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to create and manage all your projects and get there automatically.'},
 {'generated_text': 'In this course, we will teach you how to build a well built, smart cloud-based cloud environment that includes features like multi-cloud cloud services'}]

Here we will summarize Gitane's short biography!

In [None]:
# Inference on CPU
model_load_start_time = time()
summarizer = pipeline("summarization", device=-1)
model_load_end_time = time()

inference_start_time = time()
print(summarizer(
"""
Ms. De Silva became the Chief Executive Officer of the CER in August 2020. Prior to joining the CER, she was a Special Advisor at TransAlta Corporation. She previously served as Alberta's Senior Representative to the United States and as Deputy Minister for Alberta International and Intergovernmental Relations.

Before joining the Alberta Public Service, Ms. De Silva spent 12 years in Canada's Foreign Service as a specialist in Canada-U.S. relations, serving in a variety of roles, including as Consul General of Canada in Chicago and as Counsellor (Environment & Fisheries) at the Canadian Embassy in Washington, D.C. She also served as Deputy Head of Agency at Status of Women Canada.

Ms. De Silva has a Bachelor of Arts in International Relations from the University of British Columbia and is a 2013 recipient of The International Alliance for Women (TIAW) World of Difference Award.
"""))
inference_end_time = time()
print('Model load time for CPU inference: ', str(model_load_end_time - model_load_start_time))
print('CPU inference time: ', str(inference_end_time - inference_start_time))
print('Total time: ', str(inference_end_time - model_load_start_time))

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


[{'summary_text': " Ms. De Silva became the Chief Executive Officer of the CER in August 2020 . She previously served as Alberta's Senior Representative to the United States and as Deputy Minister for Alberta International and Intergovernmental Relations . She has a Bachelor of Arts in International Relations from the University of British Columbia ."}]
Model load time for CPU inference:  8.28551435470581
CPU inference time:  6.071478366851807
Total time:  14.357011556625366


So far, all of the code has been running on the CPU. If you don't specify the `device` argument in `pipeline`, the model and its inputs are placed on the CPU (which `pipeline` denotes by `-1`).

Essentially, tokenization converts the text into tensors of token IDs, and the model then runs its calculations on the CPU, because that is where the model weights are loaded. To run inference on a GPU instead, pass `device=0` (the integer selects a specific GPU, so it would be `1` for your second GPU).
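If the same notebook has to run on machines with and without a GPU, the `device` value can be chosen at runtime. This is a minimal sketch that assumes a PyTorch backend; `pick_device` is a hypothetical helper, and it falls back to the CPU when PyTorch is missing or no CUDA GPU is visible.

```python
def pick_device():
    """Return 0 (first GPU) if PyTorch can see a CUDA device, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1  # no PyTorch available, so stay on the CPU
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("device =", device)
# The value can then be passed along, e.g. pipeline("summarization", device=device)
```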

Now, people don't typically use GPU inference in production unless they really need to speed up inference, because GPU compute costs a lot more. With pure PyTorch, you also take on the added complexity of moving the model and the data onto the GPU yourself. A library like `transformers` makes GPU inference very easy, so it's worth considering since there is little additional technical debt (well... except having to deal with CUDA).

For the CER, the most likely case where you'd want a GPU is when you have a lot of data to send to `pipeline` and it would simply take too long to run inference on the CPU. For example, if you have 30k sentences from PDF documents on which you'd like to perform Named Entity Recognition, it would likely be worth running inference on a GPU.
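For a workload of that size, one option is to feed the sentences to the pipeline in fixed-size chunks, since a pipeline accepts a list of strings. The sketch below shows only the chunking logic: `batched`, `run_in_batches`, and the batch size of 4 are illustrative choices, and `fake_ner` is a stand-in so the example runs without downloading a model. Pipelines also accept a `batch_size` argument when called, so check the documentation for your installed version before rolling your own.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(predict, items, batch_size=64):
    """Call `predict` on each batch and flatten the results into one list."""
    results = []
    for batch in batched(items, batch_size):
        results.extend(predict(batch))
    return results


def fake_ner(batch):
    """Stand-in for a real NER pipeline: returns one (empty) result per sentence."""
    return [[] for _ in batch]


sentences = [f"sentence {i}" for i in range(10)]
outputs = run_in_batches(fake_ner, sentences, batch_size=4)
print(len(outputs))
```

With a real pipeline, `fake_ner` would be replaced by the `ner` object created earlier, which returns one list of entities per input sentence.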

In [None]:
# Inference on GPU
model_load_start_time = time()
summarizer = pipeline("summarization", device=0)
model_load_end_time = time()

inference_start_time = time()
print(summarizer(
"""
Ms. De Silva became the Chief Executive Officer of the CER in August 2020. Prior to joining the CER, she was a Special Advisor at TransAlta Corporation. She previously served as Alberta's Senior Representative to the United States and as Deputy Minister for Alberta International and Intergovernmental Relations.

Before joining the Alberta Public Service, Ms. De Silva spent 12 years in Canada's Foreign Service as a specialist in Canada-U.S. relations, serving in a variety of roles, including as Consul General of Canada in Chicago and as Counsellor (Environment & Fisheries) at the Canadian Embassy in Washington, D.C. She also served as Deputy Head of Agency at Status of Women Canada.

Ms. De Silva has a Bachelor of Arts in International Relations from the University of British Columbia and is a 2013 recipient of The International Alliance for Women (TIAW) World of Difference Award.
"""))
inference_end_time = time()
print('Model load time for GPU inference: ', str(model_load_end_time - model_load_start_time))
print('GPU inference time: ', str(inference_end_time - inference_start_time))
print('Total time: ', str(inference_end_time - model_load_start_time))

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


[{'summary_text': " Ms. De Silva became the Chief Executive Officer of the CER in August 2020 . She previously served as Alberta's Senior Representative to the United States and as Deputy Minister for Alberta International and Intergovernmental Relations . She has a Bachelor of Arts in International Relations from the University of British Columbia ."}]
Model load time for GPU inference:  8.986631870269775
GPU inference time:  0.8285939693450928
Total time:  9.815245151519775


As you can see, loading the model weights onto the GPU takes just a bit longer, but inference is more than 7 times faster in this instance. Of course, we only asked it to summarize one piece of text, so I'm not certain how much faster it would be if we passed 1k examples. My suspicion is that the speedup would stay around 7 times or grow even larger.
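The comparison can be made explicit with a little arithmetic on the timings printed above. This is only a sketch over a single run, so the numbers will vary between machines; `speedup` is a hypothetical helper.

```python
# Timings in seconds, copied from the CPU and GPU runs above.
cpu = {"load": 8.28551435470581, "inference": 6.071478366851807}
gpu = {"load": 8.986631870269775, "inference": 0.8285939693450928}


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second measurement is than the first."""
    return slow_seconds / fast_seconds


inference_speedup = speedup(cpu["inference"], gpu["inference"])
total_speedup = speedup(sum(cpu.values()), sum(gpu.values()))
print(f"inference: {inference_speedup:.1f}x, load + inference: {total_speedup:.1f}x")
```

Because model loading takes about the same time on both devices, the end-to-end gain for a single input is much smaller than the inference-only gain. The GPU pays off mainly when many inputs share one model load.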