### Working with pipelines

In [1]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace counrce my whole life.")

[{'label': 'POSITIVE', 'score': 0.8458861112594604}]

In [2]:
classifier([
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!"
])

[{'label': 'POSITIVE', 'score': 0.9598047137260437},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
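
Each result is a dict with a `label` and a `score`, returned in the same order as the inputs. A minimal sketch of pairing predictions back with their texts and collapsing each into a single signed number; `signed_score` is a hypothetical helper, and the values are copied from the output above rather than recomputed.

```python
# Inputs and outputs copied from the cell above (no model is run here).
texts = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
results = [
    {"label": "POSITIVE", "score": 0.9598047137260437},
    {"label": "NEGATIVE", "score": 0.9994558691978455},
]


def signed_score(result):
    """Hypothetical helper: POSITIVE maps to +score, anything else to -score."""
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]


# Results come back in input order, so zip pairs each text with its prediction.
for text, result in zip(texts, results):
    print(f"{signed_score(result):+.3f}  {text}")
```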

### Zero-shot classification

In [3]:
classifier = pipeline("zero-shot-classification")
classifier(
   "This is a counrse about the Transformer library",
   candidate_labels=["education", "politics", "business"] 
)

No model was supplied, defaulted to FacebookAI/roberta-large-mnli and revision 130fb28 (https://huggingface.co/FacebookAI/roberta-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.

{'sequence': 'This is a counrse about the Transformer library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.48972809314727783, 0.3369224965572357, 0.17334936559200287]}
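
The zero-shot result lists `labels` sorted by descending score, with `scores` aligned index by index. A small sketch of reading the top label and checking that, in the default single-label mode, the scores are normalised across the candidate labels; the dict is copied from the output above.

```python
# Output copied from the cell above (no model is run here).
result = {
    "sequence": "This is a counrse about the Transformer library",
    "labels": ["education", "business", "politics"],
    "scores": [0.48972809314727783, 0.3369224965572357, 0.17334936559200287],
}

# Labels and scores are aligned, and the best candidate comes first.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]

# With one label expected per input, the scores should add up to about 1.
total = sum(result["scores"])

print(f"top label: {top_label} ({top_score:.3f}), total: {total:.3f}")
```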

### Text generation

In [4]:
generator = pipeline("text-generation")
generator("In this cource, we will teach you how to")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end gen

[{'generated_text': 'In this cource, we will teach you how to be very careful. After my death, and during my life in jail, it was only to help people at our service. It may not have taken to much to save lives, but I knew'}]
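
Note that `generated_text` starts with the prompt itself. A minimal sketch of separating the newly generated part from the prompt; `continuation` is a hypothetical helper, and the text is copied from the output above.

```python
# Prompt and output copied from the cell above (no model is run here).
prompt = "In this cource, we will teach you how to"
outputs = [
    {
        "generated_text": (
            "In this cource, we will teach you how to be very careful. "
            "After my death, and during my life in jail, it was only to help "
            "people at our service. It may not have taken to much to save "
            "lives, but I knew"
        )
    }
]


def continuation(prompt, generated_text):
    """Hypothetical helper: drop the leading prompt if the text starts with it."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


new_text = continuation(prompt, outputs[0]["generated_text"])
print(new_text)
```

The text-generation pipeline also accepts `return_full_text=False`, which should give the same effect at generation time.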

### Using any model from the Hub in a pipeline

In [5]:
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you ca

[{'generated_text': 'In this course, we will teach you how to make a successful decision. This course teaches you how to identify and motivate and motivate more students to become'},
 {'generated_text': 'In this course, we will teach you how to use two kinds of tools:\n\n\n\n\n\n\n\n\n\n\n\n\n\n'}]
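
Several cells in this notebook rely on a default checkpoint and log that using a pipeline without a model name and revision is not recommended in production. One way to address that, sketched here under assumptions, is to pin both explicitly. The `PINNED` table and `build_pipeline` helper are hypothetical names; the checkpoint names and short revision hashes are the ones reported in this notebook's log messages, and passing them through `revision=` is assumed to resolve the same way the defaults do.

```python
# Checkpoints and revisions as reported by the "No model was supplied"
# messages in this notebook.
PINNED = {
    "zero-shot-classification": ("FacebookAI/roberta-large-mnli", "130fb28"),
    "text-generation": ("openai-community/gpt2", "6c0e608"),
    "fill-mask": ("distilbert/distilroberta-base", "ec58a5b"),
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
    "question-answering": (
        "distilbert/distilbert-base-cased-distilled-squad",
        "626af31",
    ),
}


def build_pipeline(task, **kwargs):
    """Hypothetical helper: create a pipeline with an explicit model and revision."""
    # Imported lazily so the table above can be inspected without transformers.
    from transformers import pipeline

    model, revision = PINNED[task]
    return pipeline(task, model=model, revision=revision, **kwargs)
```

Calling `build_pipeline("text-generation")` would then download and load the pinned GPT-2 checkpoint instead of whatever the library default happens to be.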

### Mask filling

In [6]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you about <mask> models", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.

[{'score': 0.20977766811847687,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you about mathematical models'},
 {'score': 0.05309794843196869,
  'token': 27930,
  'token_str': ' predictive',
  'sequence': 'This course will teach you about predictive models'}]
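
Each prediction carries the candidate token (`token_str`) and the full `sequence` with the mask filled in. A small sketch showing how the two relate, using values copied from the output above; the leading space in `token_str` comes from the tokenizer's encoding of word boundaries, so it is stripped before substitution.

```python
# Prompt and output copied from the cell above (no model is run here).
prompt = "This course will teach you about <mask> models"
predictions = [
    {
        "score": 0.20977766811847687,
        "token": 30412,
        "token_str": " mathematical",
        "sequence": "This course will teach you about mathematical models",
    },
    {
        "score": 0.05309794843196869,
        "token": 27930,
        "token_str": " predictive",
        "sequence": "This course will teach you about predictive models",
    },
]

# Substitute each candidate back into the prompt by hand.
filled_sequences = [
    prompt.replace("<mask>", p["token_str"].strip()) for p in predictions
]

for p, filled in zip(predictions, filled_sequences):
    print(f"{p['score']:.1%}  {filled}")
```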

### Named entity recognition

In [7]:
ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Sylvain and I work at Hugging Face in Brooklyn")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

[{'entity_group': 'PER',
  'score': 0.9986171,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9777994,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9889684,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
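
The `start` and `end` fields are character offsets into the input string, with `end` exclusive. A minimal sketch that slices the original text with them and groups the recovered spans by entity type; the values are copied from the output above.

```python
from collections import defaultdict

# Input and output copied from the cell above (no model is run here).
text = "My name is Sylvain and I work at Hugging Face in Brooklyn"
entities = [
    {"entity_group": "PER", "score": 0.9986171, "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "score": 0.9777994, "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "score": 0.9889684, "word": "Brooklyn", "start": 49, "end": 57},
]

# Slice the input with the reported offsets to recover each entity span.
spans = [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]

# Group the recovered spans by entity type.
by_type = defaultdict(list)
for group, span in spans:
    by_type[group].append(span)

print(dict(by_type))
```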

### Question answering

In [8]:
question_answer = pipeline("question-answering")
question_answer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn"
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.

{'score': 0.6949760317802429, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
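
The question-answering result is extractive: `answer` is a span of the context located by `start` and `end`. A minimal sketch of accepting an answer only when its score clears a threshold and the offsets agree with the context; `accept_answer` and the 0.5 default are assumptions made for illustration, and the values are copied from the output above.

```python
# Context and output copied from the cell above (no model is run here).
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
result = {"score": 0.6949760317802429, "start": 33, "end": 45, "answer": "Hugging Face"}


def accept_answer(result, context, min_score=0.5):
    """Hypothetical helper: return the answer span, or None if it looks unreliable.

    The threshold is an arbitrary illustration value, not a library default.
    """
    span = context[result["start"]:result["end"]]
    if result["score"] >= min_score and span == result["answer"]:
        return span
    return None


print(accept_answer(result, context))
print(accept_answer(result, context, min_score=0.9))
```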

### Summarization