
# Transformers, what can they do?

## Install the Transformers, Datasets, and Evaluate libraries

In [22]:
# !pip install datasets evaluate transformers[sentencepiece]

## Sentiment analysis

In [23]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("There should be peace and harmony on Earth.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9986742734909058}]

In [4]:
classifier(
    ["That was the best day of my life.", "He is suffering from depression"]
)

[{'label': 'POSITIVE', 'score': 0.9998372793197632},
 {'label': 'NEGATIVE', 'score': 0.9987448453903198}]
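Each call returns a list with one dictionary per input, holding a `label` and a confidence `score`. As a minimal sketch, the hypothetical helper below (not part of the Transformers API) folds these into one signed value per input. It assumes the only labels are `POSITIVE` and `NEGATIVE`, as in the output above.

```python
def signed_sentiment(results):
    """Hypothetical helper: +score for POSITIVE, -score for NEGATIVE.

    Assumes the binary SST-2 labels shown in the output above.
    """
    signs = {"POSITIVE": 1.0, "NEGATIVE": -1.0}
    return [signs[r["label"]] * r["score"] for r in results]


# Output copied from the cell above
results = [
    {"label": "POSITIVE", "score": 0.9998372793197632},
    {"label": "NEGATIVE", "score": 0.9987448453903198},
]
print(signed_sentiment(results))
```

A signed value like this is convenient for averaging or sorting many inputs at once.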

## Zero-Shot Classification

In [10]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "Hurry, there is a storm coming!",
    candidate_labels=["education", "politics", "business", "geography", "weather", "art", "technology"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'Hurry, there is a storm coming!',
 'labels': ['weather',
  'business',
  'geography',
  'technology',
  'politics',
  'education',
  'art'],
 'scores': [0.9494072794914246,
  0.02139625884592533,
  0.00732309278100729,
  0.007233645301312208,
  0.005403521936386824,
  0.004826264455914497,
  0.004409927874803543]}
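The labels come back sorted from most to least likely. With the default `multi_label=False`, the scores are normalized across all candidate labels, so they sum to about 1; passing `multi_label=True` scores each label independently instead. The sketch below works on the output above to pick the top label and check that sum.

```python
import math

# Output copied from the cell above
result = {
    "sequence": "Hurry, there is a storm coming!",
    "labels": ["weather", "business", "geography", "technology", "politics", "education", "art"],
    "scores": [0.9494072794914246, 0.02139625884592533, 0.00732309278100729,
               0.007233645301312208, 0.005403521936386824, 0.004826264455914497,
               0.004409927874803543],
}

# Pair each label with its score; the pipeline already sorts them in descending order
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]
print(f"{top_label}: {top_score:.3f}")

# With multi_label=False the scores form a distribution over the candidate labels
print(math.isclose(sum(result["scores"]), 1.0, abs_tol=1e-3))
```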

If no model is supplied, the pipeline automatically falls back to a default model, and each task has its own default. For example, **GPT-2** is the default model for **text generation**.
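The warnings printed in this notebook recommend naming the model explicitly in production. As a sketch, the dictionary below records the defaults reported by this notebook's own log messages. These can change between Transformers versions, which is why pinning is recommended. A hypothetical helper turns them into keyword arguments for `pipeline()`.

```python
# Defaults as reported by the "No model was supplied" messages in this notebook.
# They depend on the installed Transformers version, so treat them as a snapshot.
OBSERVED_DEFAULTS = {
    "sentiment-analysis": "distilbert-base-uncased-finetuned-sst-2-english",
    "zero-shot-classification": "facebook/bart-large-mnli",
    "text-generation": "gpt2",
    "fill-mask": "distilroberta-base",
    "ner": "dbmdz/bert-large-cased-finetuned-conll03-english",
    "question-answering": "distilbert-base-cased-distilled-squad",
    "summarization": "sshleifer/distilbart-cnn-12-6",
}


def pinned_kwargs(task):
    """Hypothetical helper: build pipeline() arguments that name the model explicitly."""
    return {"task": task, "model": OBSERVED_DEFAULTS[task]}


print(pinned_kwargs("text-generation"))
# Usage (downloads the model): pipeline(**pinned_kwargs("text-generation"))
```

For full reproducibility, `pipeline()` also accepts a `revision` argument to fix a specific commit of the model repository.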

## Text generation

In [9]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("ChatGPT has revolutionized the")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'ChatGPT has revolutionized the way we spend our money. With a focus on simplicity, high performance, and speed, our services have become popular worldwide. Today, almost 100 million users use us daily, but our technology is getting better daily thanks'}]

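The model simply continues the prompt. The continuation has nothing to do with ChatGPT because GPT-2 was trained years before ChatGPT existed and knows nothing about it. Generation also involves random sampling here, so rerunning the cell will likely produce a different completion.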

You can also choose the model explicitly by passing its name to `pipeline()` through the `model` argument. In addition, `max_length` caps the total length of each output in tokens (prompt included), and `num_return_sequences` sets how many different completions are returned.

In [8]:
# Text generation using DistilGPT-2 model
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "Just writing some random sentence",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Just writing some random sentence for today's series. It does not just make it funny, it's funny. It also gives the series a fun and"},
 {'generated_text': 'Just writing some random sentence that we will always be happy to show them are true, yet the idea is simple and elegant.'}]
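By default the text-generation pipeline returns the prompt followed by the continuation in `generated_text`. Passing `return_full_text=False` to the call asks for the continuation only. The hypothetical helper below does the same thing after the fact, using the two sequences printed above. If you want to limit only the newly generated part rather than the total length, `max_new_tokens` can be passed instead of `max_length`.

```python
def continuation_only(prompt, outputs):
    """Hypothetical helper: drop the echoed prompt from each generated_text."""
    texts = []
    for out in outputs:
        text = out["generated_text"]
        # The pipeline echoes the prompt at the start unless return_full_text=False
        if text.startswith(prompt):
            text = text[len(prompt):]
        texts.append(text.strip())
    return texts


prompt = "Just writing some random sentence"
# Output copied from the cell above (num_return_sequences=2 gives two entries)
outputs = [
    {"generated_text": "Just writing some random sentence for today's series. It does not just make it funny, it's funny. It also gives the series a fun and"},
    {"generated_text": "Just writing some random sentence that we will always be happy to show them are true, yet the idea is simple and elegant."},
]
for text in continuation_only(prompt, outputs):
    print(text)
```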

## Filling masked values

In [12]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("There is a huge <mask> between the rich and the poor.", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.38455602526664734,
  'token': 4044,
  'token_str': ' gap',
  'sequence': 'There is a huge gap between the rich and the poor.'},
 {'score': 0.16631922125816345,
  'token': 35957,
  'token_str': ' gulf',
  'sequence': 'There is a huge gulf between the rich and the poor.'}]
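Each candidate's `token_str` starts with a space (`' gap'`) because RoBERTa's byte-level BPE tokenizer folds the preceding space into the token. The mask placeholder also depends on the model: RoBERTa-based checkpoints such as the default `distilroberta-base` use `<mask>`, while BERT-style models use `[MASK]`. `unmasker.tokenizer.mask_token` gives the right one for whichever model is loaded. The pure-Python sketch below cleans up the candidates from the output above; the helper name is hypothetical.

```python
def candidate_words(results):
    """Hypothetical helper: (word, score) pairs with the BPE leading space removed."""
    return [(r["token_str"].strip(), round(r["score"], 3)) for r in results]


# Output copied from the cell above (top_k=2)
results = [
    {"score": 0.38455602526664734, "token": 4044, "token_str": " gap",
     "sequence": "There is a huge gap between the rich and the poor."},
    {"score": 0.16631922125816345, "token": 35957, "token_str": " gulf",
     "sequence": "There is a huge gulf between the rich and the poor."},
]
print(candidate_words(results))

# With a pipeline loaded, build the prompt from the model's own mask token, e.g.:
# unmasker(f"There is a huge {unmasker.tokenizer.mask_token} between the rich and the poor.")
```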

## Named Entity Recognition (NER)

In [15]:
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Jon Snow and I work at Game of Thrones in London.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.99814796,
  'word': 'Jon Snow',
  'start': 11,
  'end': 19},
 {'entity_group': 'ORG',
  'score': 0.97871894,
  'word': 'Game of Thrones',
  'start': 34,
  'end': 49},
 {'entity_group': 'LOC',
  'score': 0.998781,
  'word': 'London',
  'start': 53,
  'end': 59}]
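`start` and `end` are character offsets into the input string (end exclusive), so slicing the text recovers each entity. The hypothetical helper below groups the entities by type. It assumes output in the grouped form shown above, with an `entity_group` key.

```python
from collections import defaultdict

text = "My name is Jon Snow and I work at Game of Thrones in London."
# Output copied from the cell above (scores omitted for brevity)
entities = [
    {"entity_group": "PER", "word": "Jon Snow", "start": 11, "end": 19},
    {"entity_group": "ORG", "word": "Game of Thrones", "start": 34, "end": 49},
    {"entity_group": "LOC", "word": "London", "start": 53, "end": 59},
]


def entities_by_type(text, entities):
    """Hypothetical helper: map each entity type to the text spans tagged with it."""
    grouped = defaultdict(list)
    for ent in entities:
        # Slice the original string with the character offsets the pipeline returns
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


print(entities_by_type(text, entities))
```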

## Question Answering

In [17]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="What is the color of the bag?",
    context="I am standing on 46th crossroad with a red color bag in my hand.",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.9564961194992065, 'start': 39, 'end': 42, 'answer': 'red'}
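The question-answering pipeline extracts a span from the context rather than generating new text: `start` and `end` are the character offsets of `answer` within `context`. As a sketch, the hypothetical helper below marks that span and returns `None` when the score falls under a threshold. The 0.5 cutoff is an arbitrary assumption, not a library default.

```python
def highlight_answer(context, result, min_score=0.5):
    """Hypothetical helper: wrap the answer span in brackets, or return None if unsure.

    min_score=0.5 is an arbitrary illustrative threshold.
    """
    if result["score"] < min_score:
        return None
    s, e = result["start"], result["end"]
    return f"{context[:s]}[{context[s:e]}]{context[e:]}"


context = "I am standing on 46th crossroad with a red color bag in my hand."
# Output copied from the cell above
result = {"score": 0.9564961194992065, "start": 39, "end": 42, "answer": "red"}
print(highlight_answer(context, result))
```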

## Text Summarization

In [19]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    Philosophy of Education is a label applied to the study of the purpose, process, nature, and ideals of education.
    It can be considered a branch of both philosophy and education. Education can be defined as the teaching and
    learning of specific skills, and the imparting of knowledge, judgment, and wisdom, and is something broader than
    the societal institution of education we often speak of.

    Many educationalists consider it a weak and woolly field, too far removed from the practical applications of the
    real world to be useful. But philosophers dating back to Plato and the Ancient Greeks have given the area much
    thought and emphasis, and there is little doubt that their work has helped shape the practice of education over
    the millennia.

    Plato is the earliest important educational thinker, and education is an essential element in “The Republic” (his
    most important work on philosophy and political theory, written around 360 B.C.). In it, he advocates some rather
    extreme methods: removing children from their mothers’ care and raising them as wards of the state, and
    differentiating children suitable to the various castes, the highest receiving the most education, so that they
    could act as guardians of the city and care for the less able. He believed that education should be holistic,
    including facts, skills, physical discipline, music, and art. Plato believed that talent and intelligence are not
    distributed genetically and thus are found in children born to all classes, although his proposed system of
    selective public education for an educated minority of the population does not follow a democratic model.

    Aristotle considered human nature, habit, and reason to be equally important forces to be cultivated in education,
    the ultimate aim of which should be to produce good and virtuous citizens. He proposed that teachers lead their
    students systematically, and that repetition be used as a key tool to develop good habits, unlike Socrates’ emphasis
    on questioning his listeners to bring out their ideas. He emphasized the balancing of the theoretical and practical
    aspects of subjects taught, among which he explicitly mentions reading, writing, mathematics, music, physical
    education, literature, history, and a wide range of sciences, as well as play, which he also considered important.
    """
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' Philosophy of Education is a label applied to the study of the purpose, process, nature, and ideals of education . Many educationalists consider it a weak and woolly field, too far removed from the practical applications of the real world to be useful . But philosophers dating back to Plato and the Ancient Greeks have given the area much thought and emphasis .'}]
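The summary is largely extractive here: its sentences are taken almost verbatim from the input. It also has a space before each period (`education .`), which likely reflects the tokenized news text that the default `sshleifer/distilbart-cnn-12-6` checkpoint was fine-tuned on. A small regex sketch can tidy this up. The length of the summary itself can be steered by passing `min_length` and `max_length` to the `summarizer` call.

```python
import re


def tidy_summary(text):
    """Remove spaces before punctuation and trim surrounding whitespace."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text).strip()


# Output copied from the cell above
summary = (" Philosophy of Education is a label applied to the study of the purpose, process, "
           "nature, and ideals of education . Many educationalists consider it a weak and woolly "
           "field, too far removed from the practical applications of the real world to be useful . "
           "But philosophers dating back to Plato and the Ancient Greeks have given the area much "
           "thought and emphasis .")
print(tidy_summary(summary))
```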

## Language translation

In [21]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Voici votre carte d'embarquement. L'embarquement aura lieu à la porte 3")

[{'translation_text': 'This is your boarding pass. The boarding will take place at Gate 3'}]
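The Helsinki-NLP checkpoints follow an `opus-mt-{source}-{target}` naming pattern, as `opus-mt-fr-en` (French to English) shows. The hypothetical helper below builds a model name from two language codes. It is only a convenience sketch: not every language pair exists on the Hub, so check the name before relying on it.

```python
def opus_mt_model(src, tgt):
    """Hypothetical helper: build a Helsinki-NLP model name from language codes."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


print(opus_mt_model("fr", "en"))
# Usage (downloads the model if the pair exists):
# translator = pipeline("translation", model=opus_mt_model("fr", "en"))
```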