# Sentiment analysis example

In [1]:
from transformers import (
    pipeline,
    TFAutoModelForSequenceClassification,
    AutoTokenizer
)
from pprint import pprint

In [2]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
Device set to use 0


In [3]:
sentence = "I've been waiting for a HuggingFace course my whole life."
res = classifier(sentence)
print(res)

[{'label': 'POSITIVE', 'score': 0.9598047137260437}]
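
The `score` above is a probability for the predicted label. As a rough sketch of that post-processing step, assuming the pipeline applies a softmax over the model's two output logits and reports the highest-scoring class, the same shape of result can be produced in plain Python. The logit values and the `softmax` helper below are made up for illustration; they are not the model's real outputs or the library's implementation.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed label order for the SST-2 checkpoint: index 0 = NEGATIVE, index 1 = POSITIVE.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Hypothetical logits, chosen only for illustration.
logits = [-1.5, 1.7]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
result = [{"label": id2label[best], "score": probs[best]}]
print(result)
```

The two probabilities always sum to 1, so a `POSITIVE` score of about 0.96 implies a `NEGATIVE` score of about 0.04.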


In [4]:
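# Load the same checkpoint explicitly instead of relying on the pipeline default.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Pass the loaded model and tokenizer objects to the pipeline directly.
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)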

All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
Device set to use 0


In [5]:
res = classifier(sentence)
print(res)

[{'label': 'POSITIVE', 'score': 0.9598047137260437}]


In [6]:
res = tokenizer(sentence)
pprint(res)

{'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 'input_ids': [101,
               1045,
               1005,
               2310,
               2042,
               3403,
               2005,
               1037,
               17662,
               12172,
               2607,
               2026,
               2878,
               2166,
               1012,
               102]}
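
The `attention_mask` is all ones here because a single sentence needs no padding. The mask becomes useful when sequences of different lengths are batched together. The following is a minimal sketch of that idea, assuming right-padding with id 0 (the `[PAD]` id in the BERT uncased vocabulary); `pad_batch` is a hypothetical helper, not a tokenizer method, and the two short sequences are toy examples assembled from ids in the output above.

```python
PAD_ID = 0  # assumed id of [PAD] in the BERT uncased vocabulary

def pad_batch(batch, pad_id=PAD_ID):
    """Right-pad every id sequence to the longest one and build attention masks."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([
    [101, 1045, 2310, 2042, 3403, 102],  # toy sequence of six ids
    [101, 2607, 102],                    # toy sequence of three ids
])
print(batch["input_ids"])
print(batch["attention_mask"])
```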


In [7]:
tokens = tokenizer.tokenize(sentence)
print(tokens)

['i', "'", 've', 'been', 'waiting', 'for', 'a', 'hugging', '##face', 'course', 'my', 'whole', 'life', '.']
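
The split of "HuggingFace" into `hugging` and `##face` is typical of WordPiece-style tokenization, where a word missing from the vocabulary is broken into known sub-word pieces. The sketch below shows a simplified greedy longest-match-first version of that idea. The `wordpiece` function and `toy_vocab` are hypothetical and written for illustration; the real tokenizer also lowercases the text and splits on whitespace and punctuation before this step.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into sub-tokens by greedy longest-match-first lookup."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces

# Toy vocabulary, far smaller than the real one.
toy_vocab = {"hugging", "##face", "course", "hug", "##ging"}

print(wordpiece("huggingface", toy_vocab))
print(wordpiece("course", toy_vocab))
```

Because the longest match wins, `hugging` is chosen over the shorter `hug` even though both are in the toy vocabulary.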


In [8]:
ids = tokenizer.convert_tokens_to_ids(tokens)
pprint(ids)

[1045,
 1005,
 2310,
 2042,
 3403,
 2005,
 1037,
 17662,
 12172,
 2607,
 2026,
 2878,
 2166,
 1012]


In [9]:
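# Decoding the bare ids gives back the normalized (lowercased) sentence.
decoded_string = tokenizer.decode(ids)
print(decoded_string)
# Decoding the full encoding also shows the [CLS] and [SEP] special tokens.
decoded_string = tokenizer.decode(res.input_ids)
print(decoded_string)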

i've been waiting for a huggingface course my whole life.
[CLS] i've been waiting for a huggingface course my whole life. [SEP]
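
The two decoded strings differ only by `[CLS]` and `[SEP]`, which correspond to ids 101 and 102 in the earlier `input_ids` output. As a small sketch of how the pieces fit together, the token and id lists printed above can be combined into a toy vocabulary and wrapped in the special tokens. The `encode` helper is hypothetical and covers only this one sentence.

```python
tokens = ['i', "'", 've', 'been', 'waiting', 'for', 'a',
          'hugging', '##face', 'course', 'my', 'whole', 'life', '.']
ids = [1045, 1005, 2310, 2042, 3403, 2005, 1037,
       17662, 12172, 2607, 2026, 2878, 2166, 1012]

# Toy vocabulary restricted to this sentence, plus the two special tokens.
vocab = dict(zip(tokens, ids))
vocab.update({"[CLS]": 101, "[SEP]": 102})

def encode(tokens, vocab):
    """Look up each token id and wrap the sequence in [CLS] ... [SEP]."""
    return [vocab["[CLS]"]] + [vocab[t] for t in tokens] + [vocab["[SEP]"]]

input_ids = encode(tokens, vocab)
attention_mask = [1] * len(input_ids)  # no padding, so every position is a real token
print(input_ids)
print(attention_mask)
```

The result has 16 entries, two more than the 14 ids returned by `convert_tokens_to_ids`, which matches the length of the `attention_mask` printed by the tokenizer call.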


# Text generation example

In [10]:
generator = pipeline("text-generation", model="distilgpt2")

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Device set to use 0


In [11]:
input_sentence = "In this course we will teach you how to"

res = generator(
    input_sentence,
    max_length=30,
    num_return_sequences=2,
)

pprint(res)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course we will teach you how to successfully '
                    'communicate with other people within the network, such as '
                    'with other people. It\u200a should be noted'},
 {'generated_text': 'In this course we will teach you how to become as '
                    'effective as possible by using the best teaching '
                    'materials you have ever heard by professional and '
                    'personal tutor.'}]


In [12]:
input_sentence = "to be or not to be"

res = generator(
    input_sentence,
    max_length=30,
    num_return_sequences=3,
)

pprint(res)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'to be or not to be seen, or that to be taken under some '
                    'circumstances.\n'
                    '\n'
                    '\n'
                    '\n'
                    '\n'
                    '\n'
                    'The rules governing a woman, if'},
 {'generated_text': 'to be or not to be the most intelligent person you '
                    'know.›\n'
                    '\n'
                    '\n'
                    'When a woman is seen staring at her for what might be'},
 {'generated_text': 'to be or not to be."\n'
                    '\n'
                    '\n'
                    '\n'
                    '\n'
                    '\n'
                    '"It is a matter of time before the authorities or '
                    'politicians, or the elected officials'}]
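
The same prompt produces a different continuation in each returned sequence, which suggests that tokens are sampled from a probability distribution rather than always taking the most likely one. The toy loop below sketches that behaviour under stated assumptions: `NEXT_TOKEN_PROBS` is an invented lookup table standing in for the model, and `sample_continuation` is a hypothetical helper. As with `max_length` in the cells above, the length limit here counts the prompt tokens too.

```python
import random

# Invented next-token probabilities; a real model computes these with a neural network.
NEXT_TOKEN_PROBS = {
    "be": {"seen": 0.4, "the": 0.4, ".": 0.2},
    "seen": {".": 1.0},
    "the": {"best": 0.5, "most": 0.5},
    "best": {".": 1.0},
    "most": {".": 1.0},
}

def sample_continuation(prompt_tokens, max_length, rng):
    """Append sampled tokens until max_length (prompt included) or a dead end."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:  # no known continuation for this token
            break
        choices, weights = zip(*dist.items())
        tokens.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(tokens)

prompt = ["to", "be", "or", "not", "to", "be"]
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_continuation(prompt, max_length=9, rng=rng) for _ in range(3)]
for s in samples:
    print(s)
```

Fixing the seed makes the samples repeatable across runs; changing it gives a different set of continuations.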


# Zero-shot classification example

In [13]:
classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to FacebookAI/roberta-large-mnli and revision 2a8f12d (https://huggingface.co/FacebookAI/roberta-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.
Device set to use 0


In [14]:
sentence = "I am looking for a course on natural language processing"
candidate_labels = ["education", "politics", "business"]

res = classifier(
    sentence,
    candidate_labels
)

print(res)

{'sequence': 'I am looking for a course on natural language processing', 'labels': ['education', 'politics', 'business'], 'scores': [0.7484874129295349, 0.1414242684841156, 0.11008834093809128]}
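
The three scores above sum to roughly 1 and the labels are listed from highest to lowest score. A plausible reading, sketched below, is that the pipeline scores each candidate label with the NLI model (how strongly the sentence entails a hypothesis about that label) and then normalizes those scores with a softmax across labels. The logit values and the `rank_labels` helper are invented for illustration and are not the pipeline's actual code.

```python
import math

def rank_labels(entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    top = max(entailment_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(x - top) for label, x in entailment_logits.items()}
    total = sum(exps.values())
    scores = {label: e / total for label, e in exps.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }

# Hypothetical entailment logits, one per candidate label.
logits = {"politics": 0.3, "education": 2.0, "business": 0.1}

ranking = rank_labels(logits)
print(ranking)
```

Because the softmax is taken across the candidate labels, adding or removing a label changes every score, not just the one for that label.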


# GPT2

In [15]:
# Use a pipeline as a high-level helper
pipe = pipeline("text-generation", model="openai-community/gpt2")

All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Device set to use 0


In [16]:
pipe('hi!')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'hi! Just like you know what happens in real life?"\n\nShen Qing was completely silent and her eyes did not waver.\n\nThe little dragon smiled once: "That\'s right! Although in a city where even death is rare'}]

# facebook/blenderbot-400M-distill

In [17]:
# Use a pipeline as a high-level helper
pipe = pipeline("text2text-generation", model="facebook/blenderbot-400M-distill")

All model checkpoint layers were used when initializing TFBlenderbotForConditionalGeneration.

Some layers of TFBlenderbotForConditionalGeneration were not initialized from the model checkpoint at facebook/blenderbot-400M-distill and are newly initialized: ['final_logits_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use 0


In [18]:
pipe('Hi!')

[{'generated_text': ' Hello! How are you doing today? I just got back from walking my dog, how about you?'}]