In [1]:
from transformers import pipeline  # requires PyTorch (torch) to be installed

`pipeline()` is the most basic object in the Transformers library.
It connects a model with its necessary preprocessing and postprocessing steps.

In [2]:
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598049521446228}]

In [3]:
# we can use several sentences
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

When we create the pipeline without specifying a model, it selects a default pretrained model that has been fine-tuned for the task at hand, here sentiment analysis in English.\
The model is downloaded and cached when you create the `classifier` object. Rerunning the command uses the cached model, so nothing is downloaded again.
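
The warning printed above recommends naming the model and revision explicitly. A minimal sketch of that, assuming the checkpoint and revision hash reported in the warning are the ones wanted. `PINNED` and `build_classifier` are names introduced here for illustration only, not part of the library:

```python
# Checkpoint name and revision are copied from the warning printed by the
# unpinned pipeline above; PINNED and build_classifier are illustrative names.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier(config=PINNED):
    """Create a pipeline with an explicit model and revision."""
    # Imported lazily so the configuration can be inspected without loading transformers.
    from transformers import pipeline

    return pipeline(config["task"], model=config["model"], revision=config["revision"])
```

Calling `build_classifier()` should then behave like the cell above, without the "No model was supplied" warning.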

Passing text to a pipeline involves three main steps:
- The text is preprocessed into a format the model understands.
- The preprocessed inputs are passed to the model.
- The model's predictions are post-processed so that we can make sense of them.
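
The three steps above can be sketched by hand, roughly as follows. This is an approximation of what the sentiment pipeline does, not its actual implementation; `postprocess` and `classify_manually` are hypothetical helper names, and the checkpoint is assumed to be the default one reported in the warning above.

```python
import math


def postprocess(logits, id2label):
    """Step 3: turn one row of raw scores into a label and a probability."""
    # Numerically stable softmax over the raw scores.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def classify_manually(texts, checkpoint="distilbert-base-uncased-finetuned-sst-2-english"):
    """Run the three steps explicitly (downloads the checkpoint on first use)."""
    # Imported lazily: these are only needed when the model is actually run.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    # Step 1: preprocess the text into tensors of token ids.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # Step 2: pass the preprocessed inputs to the model.
    with torch.no_grad():
        logits = model(**inputs).logits

    # Step 3: post-process the raw scores into readable predictions.
    return [postprocess(row, model.config.id2label) for row in logits.tolist()]
```

Calling `classify_manually(["I hate this so much!"])` should give a result in the same shape as the pipeline output, a list of `{'label': ..., 'score': ...}` dictionaries.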

Some of the [available pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) are:

    feature-extraction (get the vector representation of a text)
    fill-mask
    ner (named entity recognition)
    question-answering
    sentiment-analysis
    summarization
    text-generation
    translation
    zero-shot-classification



## Zero-shot classification

Zero-shot classification uses a pretrained model to classify text against your **own** labels, without fine-tuning the model on those labels.

In [4]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445998430252075, 0.11197364330291748, 0.04342653974890709]}
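
In the output above, the labels come back ordered by decreasing score, and the scores sum to roughly 1. A small sketch of reading such a result follows; `rank_labels` and the 0.5 cutoff are assumptions made for illustration, not part of the library.

```python
def rank_labels(result, threshold=0.5):
    """Pair each candidate label with its score and list those above a cutoff."""
    pairs = list(zip(result["labels"], result["scores"]))
    confident = [label for label, score in pairs if score >= threshold]
    return pairs, confident


# Output of the zero-shot cell above, copied verbatim.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445998430252075, 0.11197364330291748, 0.04342653974890709],
}

pairs, confident = rank_labels(result)
print(pairs[0])   # ('education', 0.8445998430252075)
print(confident)  # ['education']
```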

## Text generation

Juicy!\
The main idea is to provide a prompt that the model auto-completes by generating the remaining text. The argument `num_return_sequences` controls how many different sequences are generated, and `max_length` caps the total length of the output in tokens, prompt included.

In [5]:
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In this course, we will teach you how to create real-time interactive tools for both desktop and mobile computing. We'll discuss how to create an online dashboard-based web application as well as help you create interactive applications from simple scripts. At least"}]

In [6]:
generator = pipeline("text-generation", num_return_sequences=20)
generator("In this course, we will teach you how to")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to build your own network over your desktop and mobile device.\n\nThis course will be provided by NetBSD.\n\n\nInstructions for the introductory computer course and computer related equipment is described later in this'},
 {'generated_text': 'In this course, we will teach you how to make your own tea in a manner similar to your own, and introduce you to the basics of a traditional Japanese tea recipe.'},
 {'generated_text': 'In this course, we will teach you how to use your hands to manipulate the shapes of the body using special materials, such as the film atlas of your feet. The material will change in length and shape over time so you can make your own'},
 {'generated_text': 'In this course, we will teach you how to become more comfortable in your social situations. From your job to your college to your day job, it is important for you to become more relaxed when dealing with your peers and with family and friends.\n'},
 ... (remaining output truncated)

In [7]:
generator = pipeline("text-generation", max_length=20)
generator("In this course, we will teach you how to")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use Python objects, how to use C++ classes'}]
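
Each generated text above starts with the prompt itself. If only the new text is wanted, it can be separated afterwards. A minimal sketch, where `continuations` is a hypothetical helper rather than a library function:

```python
def continuations(prompt, outputs):
    """Return only the newly generated part of each sequence."""
    texts = []
    for item in outputs:
        text = item["generated_text"]
        # In the outputs above, the generated text begins with the prompt.
        if text.startswith(prompt):
            text = text[len(prompt):]
        texts.append(text.strip())
    return texts


prompt = "In this course, we will teach you how to"
# Output of the max_length=20 cell above, copied verbatim.
outputs = [
    {"generated_text": "In this course, we will teach you how to use Python objects, how to use C++ classes"}
]
print(continuations(prompt, outputs))  # ['use Python objects, how to use C++ classes']
```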

## Using any model from the Hub in a pipeline

Above we used the default models (e.g. GPT-2 for text generation), but we can also choose a specific model for a pipeline.\
Models are listed on the [Model Hub](https://huggingface.co/models) page.\
In the example below we'll use the [distilgpt2](https://huggingface.co/distilgpt2) model.

In [8]:

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use a number of the key elements of the application. Most of which will be required to read and'},
 {'generated_text': 'In this course, we will teach you how to use your experience building software. In this course, you will learn how to create beautiful software.'}]