# Using pipelines

In [3]:
from transformers import pipeline

In [4]:
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%|██████████| 629/629 [00:00<00:00, 142kB/s]
Downloading: 100%|██████████| 268M/268M [00:06<00:00, 40.3MB/s] 
2022-11-18 14:55:54.239548: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
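If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.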

In [7]:
test = classifier('Hello it is raining outside')
test

[{'label': 'POSITIVE', 'score': 0.9752295017242432}]
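The warning printed when the classifier was created recommends naming the model and revision explicitly. A minimal sketch of doing that follows. The model name and the abbreviated revision `af0f99b` are copied from that warning. `PINNED` and `build_classifier` are hypothetical names used only for illustration, and the Hub may need the full commit hash from the model page rather than the abbreviated one.

```python
# Model name and revision are copied from the warning message above.
# PINNED and build_classifier are illustrative names, not part of transformers.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",  # abbreviated hash as printed; the full hash may be required
}


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so that defining this function triggers no download.
    from transformers import pipeline

    return pipeline(**PINNED)
```

Calling `build_classifier()('Hello it is raining outside')` should then behave like the cell above, without relying on whatever default the library currently picks.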

In [9]:
generator = pipeline('text-generation', model='distilgpt2')
generator('We will go outside',
          max_length=30,
          num_return_sequences=2)

Downloading: 100%|██████████| 762/762 [00:00<00:00, 437kB/s]
Downloading: 100%|██████████| 328M/328M [00:07<00:00, 41.7MB/s] 
All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at distilgpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Downloading: 100%|██████████| 1.04M/1.04M [00:00<00:00, 1.12MB/s]
Downloading: 100%|██████████| 456k/456k [00:00<00:00, 560kB/s]  
Downloading: 100%|██████████| 1.36M/1.36M [00:00<00:00, 1.52MB/s]
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


[{'generated_text': "We will go outside and you'll hear a lot about that and you'll even see a lot about it. You'll almost always get a new one"},
 {'generated_text': 'We will go outside and take care of this, and I don\'t think it will be a waste of time," he said. "We have to'}]
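The generator returns a list with one dict per requested sequence, each holding the prompt plus its continuation under `generated_text`. The sketch below works on a copy of the output shown above, so it runs without loading the model. It assumes every `generated_text` starts with the prompt, which holds for this output. A rerun of the cell will likely produce different text.

```python
prompt = "We will go outside"

# Copied from the cell output above; a fresh run will likely differ.
outputs = [
    {"generated_text": "We will go outside and you'll hear a lot about that and you'll even see a lot about it. You'll almost always get a new one"},
    {"generated_text": 'We will go outside and take care of this, and I don\'t think it will be a waste of time," he said. "We have to'},
]

# Drop the prompt prefix to keep only the newly generated part.
continuations = [o["generated_text"][len(prompt):] for o in outputs]

for i, text in enumerate(continuations, start=1):
    print(f"{i}: {text.strip()}")
```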

In [10]:
classifier = pipeline('zero-shot-classification')
classifier('This is about how to create your own GCP project.', candidate_labels=['education','politics','business'])

No model was supplied, defaulted to roberta-large-mnli and revision 130fb28 (https://huggingface.co/roberta-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%|██████████| 688/688 [00:00<00:00, 231kB/s]
Downloading: 100%|██████████| 1.43G/1.43G [00:24<00:00, 57.4MB/s]
All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at roberta-large-mnli.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.
Downloading: 100%|██████████| 899k/899k [00:00<00:00, 1.23MB/s] 
Downloading: 100%|██████████| 456k/456k [00:00<00:00, 1.01MB/s]
Downloading: 100%|██████████| 1.36M/1.36M [00:00<00:00, 2.19MB/s]


{'sequence': 'This is about how to create your own GCP project.',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.4043148159980774, 0.345976322889328, 0.24970892071723938]}
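The zero-shot result keeps `labels` and `scores` as parallel lists sorted from most to least likely. A small sketch of reading it, using a copy of the dict printed above so that no model is needed:

```python
# Copied from the cell output above.
result = {
    "sequence": "This is about how to create your own GCP project.",
    "labels": ["education", "business", "politics"],
    "scores": [0.4043148159980774, 0.345976322889328, 0.24970892071723938],
}

# Pair each label with its score.
ranked = list(zip(result["labels"], result["scores"]))

# The first entry is the top prediction because the lists are already sorted.
best_label, best_score = ranked[0]
print(f"top label: {best_label} ({best_score:.3f})")

# In this output the scores across the candidate labels sum to roughly 1.
total = sum(result["scores"])
print(f"sum of scores: {total:.6f}")
```

With a top score of about 0.40 the model is not very confident here, which is worth checking before acting on the label.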

In [None]:
# pipelines: https://huggingface.co/docs/transformers/main_classes/pipelines

# Using components of pipelines (model, preprocessing, etc.)

In [4]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

In [2]:
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Downloading: 100%|██████████| 268M/268M [00:05<00:00, 47.8MB/s] 


In [6]:
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

In [7]:
classifier('It is snowing')

[{'label': 'POSITIVE', 'score': 0.9834219813346863}]
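Between the model and the dict printed above sits a post-processing step. For a two-class model like this one, the pipeline is understood to turn the raw logits into probabilities with a softmax and report the highest one. The sketch below illustrates that step in plain Python. The logits are hypothetical values, not taken from the model, and the `id2label` mapping is assumed to match this checkpoint's configuration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed label mapping for the SST-2 checkpoint.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Hypothetical logits, chosen only to illustrate the calculation.
logits = [-2.0, 2.0]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
result = [{"label": id2label[best], "score": probs[best]}]
print(result)
```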

## Take a look at the tokenizer

In [10]:
sentence = 'The flowers are beautiful.'
print(sentence)

print(tokenizer(sentence))

tokens = tokenizer.tokenize(sentence)
print(tokens)

ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

decoded = tokenizer.decode(ids)
print(decoded)

The flowers are beautiful.
{'input_ids': [101, 1996, 4870, 2024, 3376, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
['the', 'flowers', 'are', 'beautiful', '.']
[1996, 4870, 2024, 3376, 1012]
the flowers are beautiful.
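Comparing the printed lines shows what `tokenizer(sentence)` adds on top of `tokenize` plus `convert_tokens_to_ids`: the same five ids, wrapped in two extra ids, 101 at the start and 102 at the end. In the BERT uncased vocabulary these are the `[CLS]` and `[SEP]` special tokens. The sketch below rebuilds the full encoding from the numbers printed above, so it runs without the tokenizer.

```python
# Ids from tokenizer.convert_tokens_to_ids(tokens), copied from the output above.
ids = [1996, 4870, 2024, 3376, 1012]

# Special token ids in the BERT uncased vocabulary.
CLS_ID = 101  # [CLS], marks the start of the sequence
SEP_ID = 102  # [SEP], marks the end of the sequence

# Wrap the plain ids with the special tokens.
input_ids = [CLS_ID] + ids + [SEP_ID]

# Every position is a real token (no padding), so the mask is all ones.
attention_mask = [1] * len(input_ids)

encoding = {"input_ids": input_ids, "attention_mask": attention_mask}
print(encoding)
```

With the real tokenizer, the same ids should be available as `tokenizer.cls_token_id` and `tokenizer.sep_token_id`. The lowercase `the` in the token list is expected, since this checkpoint is an uncased model.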
