# Transformers, what can they do?

The explanation of this notebook is in the Hugging Face course, chapter 1, section 3: [Transformers, what can they do?](https://huggingface.co/course/chapter1/3?fw=pt)

The original code for this notebook is in the Hugging Face notebooks repository, linked here through SageMaker Studio Lab: [section3.ipynb](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter1/section3.ipynb)

## Run conditions

This notebook has been tested in the following environment:
- Environment: Project created in [Paperspace Gradient](https://gradient.paperspace.com) with Python 3.9.13.
- Machine: P5000 (30GiB RAM 8 CPU 16GiB GPU) (more details on [Paperspace Machines](https://docs.paperspace.com/gradient/machines/)).
- IDE: Visual Studio Code with remote Jupyter server.

## Install dependencies

Install the Transformers, Datasets, and Evaluate libraries, plus PyTorch as the backend, to run this notebook.

In [2]:
# Install the libraries datasets v2.7.1, evaluate v0.3.0, and transformers v4.25.1 with quiet and upgrade flags.
%pip install -q datasets==2.7.1 evaluate==0.3.0 transformers==4.25.1 --upgrade
# Install PyTorch with quiet and upgrade options.
%pip install -q torch==1.13.0 --upgrade

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
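
After restarting the kernel, it can help to confirm that the pinned versions are the ones actually loaded. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `version_report` are made up for this example, and the version numbers are copied from the install cell above.

```python
from importlib import metadata

# Versions pinned by the install cell above.
PINNED = {
    "datasets": "2.7.1",
    "evaluate": "0.3.0",
    "transformers": "4.25.1",
    "torch": "1.13.0",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(pinned):
    """Map each package name to (expected, installed, matches)."""
    report = {}
    for package, expected in pinned.items():
        found = installed_version(package)
        # Ignore a local build suffix such as "+cu117" when comparing.
        matches = found is not None and found.split("+")[0] == expected
        report[package] = (expected, found, matches)
    return report


for package, (expected, found, ok) in version_report(PINNED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{package}: expected {expected}, found {found} [{status}]")
```

A `MISMATCH` line usually means the kernel was not restarted or another environment is active.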


## Sentiment analysis pipeline

In [3]:
# Create a sentiment analysis classifier with the transformers pipeline API.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
# Run the classifier on a sentence.
classifier("We are very happy to show you the 🤗 Transformers library.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9997795224189758}]
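
The warning above recommends naming the model and revision explicitly. One way to do that is sketched below, using the model name and revision printed in the log. The helper name `build_pinned_classifier` is hypothetical, and the sketch assumes the short revision hash from the log is accepted by the `revision` argument.

```python
# Values copied from the warning printed by the default pipeline above.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def build_pinned_classifier():
    """Create a sentiment analysis pipeline with an explicit model and revision."""
    # Imported inside the function so that defining the helper stays cheap.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=MODEL_REVISION)
```

Calling `build_pinned_classifier()` should behave like `pipeline("sentiment-analysis")` in this environment, but without depending on whichever default a later library version ships with.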

In [4]:
# Run the classifier on two sentences.
classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])

[{'label': 'POSITIVE', 'score': 0.9997795224189758},
 {'label': 'NEGATIVE', 'score': 0.5308621525764465}]
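
The second score (about 0.53) is close to 0.5, which suggests the model is nearly undecided on that sentence. A small sketch of how such predictions could be flagged follows. The `flag_uncertain` helper and the 0.75 threshold are assumptions chosen for illustration, not part of the library.

```python
# Output copied from the cell above.
predictions = [
    {"label": "POSITIVE", "score": 0.9997795224189758},
    {"label": "NEGATIVE", "score": 0.5308621525764465},
]


def flag_uncertain(results, threshold=0.75):
    """Return a copy of each prediction with an added 'uncertain' boolean."""
    # The threshold is an arbitrary illustrative value; tune it for real data.
    return [{**r, "uncertain": r["score"] < threshold} for r in results]


for item in flag_uncertain(predictions):
    print(item)
```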

## Zero-shot pipeline

In [5]:
# Create a zero-shot classifier and score one sentence against three candidate labels.
from transformers import pipeline

zero_shot_classifier = pipeline("zero-shot-classification")
sequence_to_classify = "This is a course about the Transformers library"
candidate_labels = ["education", "politics", "business"]
zero_shot_classifier(sequence_to_classify, candidate_labels)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445981740951538, 0.11197477579116821, 0.04342705383896828]}
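
The zero-shot output keeps labels and scores in two parallel lists. The sketch below pairs them up and picks the best label; the helper names `ranked_labels` and `top_label` are hypothetical. In the output above the scores add up to roughly 1, so they can be read as a distribution over the candidate labels.

```python
# Output copied from the cell above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445981740951538, 0.11197477579116821, 0.04342705383896828],
}


def ranked_labels(output):
    """Pair each label with its score, highest score first."""
    pairs = zip(output["labels"], output["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def top_label(output):
    """Return the (label, score) pair with the highest score."""
    return ranked_labels(output)[0]


print(top_label(result))
print(sum(result["scores"]))
```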

## Text generation pipeline

In [6]:
# Create a text generation pipeline with the default model and generate two sequences of at most 30 tokens each (prompt included).
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to", max_length=30, num_return_sequences=2)

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to create and analyze data to help you solve problems and improve your workflow. You will become a strong entrepreneur'},
 {'generated_text': 'In this course, we will teach you how to construct a set of algorithms to implement a number of different aspects of programming - which will provide you with'}]

In [7]:
# Create a text generation pipeline with an explicitly chosen model (distilgpt2).
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator("In this course, we will teach you how to", max_length=30, num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to think, create and manipulate different types of minds. This course is in cooperation with a team of psychologists'},
 {'generated_text': 'In this course, we will teach you how to use your favorite tools: WebKit. We will create a class to explain how you can use web'}]
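
Each `generated_text` value starts with the prompt itself. When only the newly generated part is needed, the prompt can be removed afterwards, as in this minimal sketch (the `continuation` helper is hypothetical). The text generation pipeline also has a `return_full_text` argument intended for this purpose, which is not exercised here.

```python
prompt = "In this course, we will teach you how to"

# Output copied from the distilgpt2 cell above.
outputs = [
    {"generated_text": "In this course, we will teach you how to think, create and manipulate different types of minds. This course is in cooperation with a team of psychologists"},
    {"generated_text": "In this course, we will teach you how to use your favorite tools: WebKit. We will create a class to explain how you can use web"},
]


def continuation(generated_text, prompt):
    """Return the generated text without the leading prompt, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


for item in outputs:
    print(continuation(item["generated_text"], prompt))
```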

## Mask filling pipeline

In [8]:
# Create a mask filling pipeline and return the two most likely replacements (top_k=2) for the <mask> token.
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("Hello I'm a <mask> model.", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.0467366948723793,
  'token': 265,
  'token_str': ' business',
  'sequence': "Hello I'm a business model."},
 {'score': 0.03846113383769989,
  'token': 18150,
  'token_str': ' freelance',
  'sequence': "Hello I'm a freelance model."}]
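
Note that each `token_str` above begins with a space, which appears to come from how this model's tokenizer encodes the preceding space as part of the token. A small sketch for turning the raw output into clean (word, score) pairs follows; the `candidates` helper is made up for this example.

```python
# Output copied from the cell above.
fill_mask_output = [
    {"score": 0.0467366948723793, "token": 265, "token_str": " business",
     "sequence": "Hello I'm a business model."},
    {"score": 0.03846113383769989, "token": 18150, "token_str": " freelance",
     "sequence": "Hello I'm a freelance model."},
]


def candidates(results):
    """Return (word, score) pairs with surrounding whitespace removed from the word."""
    return [(r["token_str"].strip(), r["score"]) for r in results]


for word, score in candidates(fill_mask_output):
    print(f"{word}: {score:.1%}")
```

The low scores (both under 5%) indicate that many different words fit this sentence almost equally well.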

## Named entity recognition pipeline

In [9]:
# Create a named entity recognition (NER) pipeline that groups sub-word tokens into whole entities.
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
ner("Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge which is visible from the window.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'ORG',
  'score': 0.9970663,
  'word': 'Hugging Face Inc',
  'start': 0,
  'end': 16},
 {'entity_group': 'LOC',
  'score': 0.9993778,
  'word': 'New York City',
  'start': 40,
  'end': 53},
 {'entity_group': 'LOC',
  'score': 0.9571147,
  'word': 'DUMBO',
  'start': 79,
  'end': 84},
 {'entity_group': 'LOC',
  'score': 0.9838141,
  'word': 'Manhattan Bridge',
  'start': 114,
  'end': 130}]
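
With grouped entities, the output is a flat list with one dictionary per entity. Collecting the words by entity type is a common next step. The sketch below assumes the output shown above; `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict

# Output copied from the cell above (scores kept as printed).
entities = [
    {"entity_group": "ORG", "score": 0.9970663, "word": "Hugging Face Inc", "start": 0, "end": 16},
    {"entity_group": "LOC", "score": 0.9993778, "word": "New York City", "start": 40, "end": 53},
    {"entity_group": "LOC", "score": 0.9571147, "word": "DUMBO", "start": 79, "end": 84},
    {"entity_group": "LOC", "score": 0.9838141, "word": "Manhattan Bridge", "start": 114, "end": 130},
]


def group_by_type(items):
    """Map each entity type to the list of words found, in order of appearance."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


print(group_by_type(entities))
```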

## Question answering pipeline

In [10]:
# From transformers create a question answering pipeline for a question and a context.
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="What is the name of the repository ?",
    context="Pipeline have been included in the huggingface/transformers repository",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.5135967135429382,
 'start': 35,
 'end': 59,
 'answer': 'huggingface/transformers'}
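
The `start` and `end` values are character offsets into the context, so the answer is extracted from the context rather than generated. The sketch below uses them to mark the answer in place; `highlight_answer` and its bracket markers are assumptions made for illustration.

```python
# Context and output copied from the cell above.
context = "Pipeline have been included in the huggingface/transformers repository"
answer = {
    "score": 0.5135967135429382,
    "start": 35,
    "end": 59,
    "answer": "huggingface/transformers",
}


def highlight_answer(text, result, left="[", right="]"):
    """Wrap the answer span in markers using the character offsets."""
    start, end = result["start"], result["end"]
    return text[:start] + left + text[start:end] + right + text[end:]


print(highlight_answer(context, answer))
```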

## Summarization pipeline

In [11]:
# Create a summarization pipeline and summarize a long article into 30 to 130 tokens without sampling.
from transformers import pipeline

summarizer = pipeline("summarization")
ARTICLE = """
America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America suffers an increasingly serious decline in the number of engineering graduates and a lack of well-educated engineers . Rapidly developing economies such as China and India, as well as other industrial countries, continue to encourage and advance the teaching of engineering .'}]
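
The summary above has a leading space and spaces before the periods. If the text is going to be displayed, a light cleanup pass may be useful. This is a minimal sketch using a regular expression; `tidy_summary` is a hypothetical helper, not part of the library.

```python
import re

# Output copied from the cell above.
summary = [{"summary_text": " America suffers an increasingly serious decline in the number of engineering graduates and a lack of well-educated engineers . Rapidly developing economies such as China and India, as well as other industrial countries, continue to encourage and advance the teaching of engineering ."}]


def tidy_summary(text):
    """Remove whitespace before punctuation and trim both ends."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text).strip()


print(tidy_summary(summary[0]["summary_text"]))
```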

## Translation pipeline

In [12]:
# From transformers create a translation pipeline with Helsinki-NLP model to translate from English to Spanish.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator("Hugging Face is a technology company based in New York and Paris")



[{'translation_text': 'Hugging Face es una empresa tecnológica con sede en Nueva York y París.'}]