In [None]:
!pip install datasets evaluate transformers[sentencepiece]


In [4]:
# pipeline() connects a model with its required preprocessing and postprocessing steps,
# so we can pass in raw text directly and get back a readable prediction.
from transformers import pipeline

In [5]:

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for you. I hate this soo much")

# By default, this pipeline selects a pretrained model that has been fine-tuned for English sentiment analysis.
# Passing text to a pipeline involves three main steps:
# 1. The text is preprocessed into a format the model can understand.
# 2. The preprocessed input is passed to the model.
# 3. The model's predictions are post-processed into a human-readable result.

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.988263726234436}]
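
The third step listed in the cell above, post-processing, is what turns raw model outputs into a label and score like the one just shown. The following is a minimal sketch of that step only, not the library's actual implementation. The logits are made-up values, the `id2label` mapping is an assumption for illustration, and `softmax` and `postprocess` are hypothetical helper names.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Pick the most probable class and return it in pipeline-like form."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Assumed label mapping and hypothetical logits, for illustration only.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [2.3, -2.1]

result = postprocess(logits, id2label)
print(result)
```

A large gap between the two logits produces a score close to 1, which is why a confident prediction shows up as a value such as 0.98.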

**Some Examples of Pipeline models**

In [7]:
# Zero-shot classification
# Classifies text that has not been labelled in advance.
# You choose which candidate labels to use at call time.

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about transformers library",
    candidate_labels=['education', 'politics', 'business'],
)

# This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it.
# It can directly return probability scores for any list of labels you want.

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8946443200111389, 0.08065718412399292, 0.024698464199900627]}
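
Note that in this output the labels come back ordered by descending score, not in the order they were passed in, and the scores add up to roughly 1. A short sketch of reading such a result follows. The dictionary is copied from the output above, and `THRESHOLD` is a hypothetical cut-off chosen only for illustration.

```python
# Result copied from the zero-shot output above.
result = {
    "sequence": "This is a course about transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8946443200111389, 0.08065718412399292, 0.024698464199900627],
}

# Pair each label with its score.
label_scores = dict(zip(result["labels"], result["scores"]))

# Scores are in descending order here, so the first label is the most likely one.
top_label = result["labels"][0]

# Keep only labels at or above a hypothetical confidence threshold.
THRESHOLD = 0.5
confident = [label for label, score in label_scores.items() if score >= THRESHOLD]

total = sum(result["scores"])
print(top_label, confident, round(total, 4))
```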

In [9]:
# Text generation
# Provide a prompt and the model auto-completes it by generating the remaining text.

generator = pipeline("text-generation")
generator("In this world, there is no")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this world, there is no God but God. The only truth is in God. No human form can stand beyond God. Only the most un-natural will is the answer. If God is God, He has to come first - first because'}]
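
The `generated_text` field above contains the original prompt followed by the model's continuation. If only the new text is wanted, it can be sliced off, as in this small sketch. The output string is a shortened copy of the result above, and `continuation` is a hypothetical helper name.

```python
prompt = "In this world, there is no"

# Shortened copy of the output above (only the first two sentences are kept).
outputs = [{"generated_text": "In this world, there is no God but God. The only truth is in God."}]

def continuation(prompt, generated_text):
    """Return only the newly generated part if the text starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text

new_text = continuation(prompt, outputs[0]["generated_text"])
print(repr(new_text))
```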

In [10]:
# Mask filling
# This pipeline fills in the blanks (the <mask> token) in a given text.
# top_k controls how many candidate fillers are returned.

unmasker = pipeline("fill-mask")
unmasker(
    "this course will teach you about <mask> models.",
    top_k=2
)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.17697356641292572,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'this course will teach you about mathematical models.'},
 {'score': 0.05288225784897804,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'this course will teach you about computational models.'}]
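
Each candidate in the fill-mask output carries a `token_str` with a leading space, which is part of the token itself. A brief sketch of tidying the results for display follows. The list is copied from the output above, and the variable names are illustrative.

```python
# Results copied from the fill-mask output above.
results = [
    {"score": 0.17697356641292572,
     "token": 30412,
     "token_str": " mathematical",
     "sequence": "this course will teach you about mathematical models."},
    {"score": 0.05288225784897804,
     "token": 38163,
     "token_str": " computational",
     "sequence": "this course will teach you about computational models."},
]

# Strip the leading space from each token before showing it.
words = [r["token_str"].strip() for r in results]

# Select the highest-scoring candidate explicitly rather than relying on order.
best = max(results, key=lambda r: r["score"])

for word, r in zip(words, results):
    print(f"{word}: {r['score']:.3f}")
print("best:", best["sequence"])
```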

In [12]:
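# Named entity recognition
# Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.

# aggregation_strategy="simple" merges sub-word tokens belonging to the same entity;
# it replaces the deprecated grouped_entities=True argument.
ner = pipeline("ner", aggregation_strategy="simple")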
ner("My name is Hani and I work at Facebook in brooklyn")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'PER',
  'score': 0.99850345,
  'word': 'Hani',
  'start': 11,
  'end': 15},
 {'entity_group': 'ORG',
  'score': 0.99868757,
  'word': 'Facebook',
  'start': 30,
  'end': 38}]
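
The `start` and `end` values in the NER output are character offsets into the input string, so each entity can be recovered by slicing. The sketch below checks this against the output above. The scores are copied as plain floats, and the variable names are illustrative.

```python
text = "My name is Hani and I work at Facebook in brooklyn"

# Entities copied from the NER output above.
entities = [
    {"entity_group": "PER", "score": 0.99850345, "word": "Hani", "start": 11, "end": 15},
    {"entity_group": "ORG", "score": 0.99868757, "word": "Facebook", "start": 30, "end": 38},
]

# Slice the original text with each entity's character offsets.
spans = [text[ent["start"]:ent["end"]] for ent in entities]

for ent, span in zip(entities, spans):
    print(f"{ent['entity_group']}: {span!r} (chars {ent['start']}-{ent['end']})")
```

Note that "brooklyn" does not appear among the entities. One plausible reason, stated here as an assumption, is that the default model is cased and the word is written in lowercase.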

In [13]:
# Summarization
# Condenses a text into a shorter version while keeping its key points.

summarizer = pipeline("summarization")
summarizer(
    """
     America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
    """
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil, electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India, as well as other industrial countries in Europe and Asia, continue to encourage and advance engineering .'}]

In [14]:
# Translation
# Here the model is chosen explicitly: a French-to-English model from the Hub.

translator = pipeline("translation", model='Helsinki-NLP/opus-mt-fr-en')
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]