# Transformers!

This notebook explains the major out-of-the-box functionality of the Hugging Face Transformers library.

In [2]:
! pip install transformers

Collecting transformers
  Downloading transformers-4.19.2-py3-none-any.whl (4.2 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.6.0-py3-none-any.whl (84 kB)
Installing collected packages: pyyaml, tokenizers, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Successfully uninstalled PyYAML-3.13
Successfully installed huggingface-hub-0.6.0 py

In [3]:
import transformers

### Pipeline

First things first: pipelines in the Transformers library.

The most basic object in the 🤗 Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.

In [5]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I'm really excited and really happy")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9998819828033447}]

In [7]:
# We can even pass a list of sentences

classifier(['Rain makes me happy!',
            'Rain makes me sad!'])

[{'label': 'POSITIVE', 'score': 0.9998798370361328},
 {'label': 'NEGATIVE', 'score': 0.992042601108551}]
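As the output above suggests, the predictions come back in the same order as the input sentences, so we can pair them up by position. A small sketch, using the scores copied from the output above (the variable names are only illustrative):

```python
sentences = ["Rain makes me happy!", "Rain makes me sad!"]

# Predictions copied from the output above, in the order they were returned.
predictions = [
    {"label": "POSITIVE", "score": 0.9998798370361328},
    {"label": "NEGATIVE", "score": 0.992042601108551},
]

# Pair each sentence with its predicted label by position.
labelled = {s: p["label"] for s, p in zip(sentences, predictions)}

for sentence, label in labelled.items():
    print(f"{sentence!r} -> {label}")
```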

When we create a pipeline without naming a model, it downloads the default model for the task and caches it, so the download happens only once.

There are three main steps involved when you pass some text to a pipeline:

1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The predictions of the model are post-processed, so you can make sense of them.
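The three steps above can be pictured with a toy, pure-Python sketch. This is not how the library implements them: the vocabulary, the weights, and the function names (preprocess, model, postprocess, toy_pipeline) are all hypothetical, chosen only to show how data flows from raw text to a labelled score.

```python
import math

# Hypothetical toy vocabulary and weights, invented for illustration only.
VOCAB = {"happy": 0, "excited": 1, "sad": 2}
WEIGHTS = [2.0, 1.5, -2.5]  # positive values push toward POSITIVE
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into token ids the toy model understands."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [VOCAB[w] for w in words if w in VOCAB]


def model(token_ids):
    """Step 2: map token ids to raw scores (logits), one per label."""
    positive = sum(WEIGHTS[i] for i in token_ids)
    return [-positive, positive]


def postprocess(logits):
    """Step 3: convert logits into a readable label and probability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, as a real pipeline does behind one call."""
    return postprocess(model(preprocess(text)))


print(toy_pipeline("Rain makes me happy!"))
print(toy_pipeline("Rain makes me sad!"))
```

A real pipeline replaces the toy vocabulary with a trained tokenizer and the weight list with a neural network, but the shape of the flow is the same: text in, ids, logits, then a label and a score out.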

Let's look at a few other text analysis pipelines.

### Zero-shot classification

Text classification can be challenging when there is little or no labelled training data. In such scenarios, zero-shot classification can be a powerful solution.

> This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
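One common way to get scores for arbitrary labels is to reuse a natural language inference (NLI) model: each candidate label is turned into a hypothesis sentence, the model scores how strongly the text entails each hypothesis, and those scores are normalised across labels. The sketch below assumes that approach. The function names are hypothetical, the template string is the library's documented default as far as we know, and the logits are made up rather than produced by a model.

```python
import math


def build_hypotheses(candidate_labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in candidate_labels]


def rank_labels(candidate_labels, entailment_logits):
    """Softmax the entailment logits across labels and sort by score."""
    peak = max(entailment_logits)
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(candidate_labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in scored],
        "scores": [score for _, score in scored],
    }


labels = ["education", "politics", "sports"]
print(build_hypotheses(labels))

# Hypothetical entailment logits, made up for illustration; a real NLI model
# would compute one per (text, hypothesis) pair.
print(rank_labels(labels, [1.2, 0.3, 0.9]))
```

Because the scores are normalised across the candidate list, adding or removing a label changes every score, not just one.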

In [11]:
from transformers import pipeline

zero_shot_classifier = pipeline('zero-shot-classification')

zero_shot_classifier('Keras and TensorFlow can be used to build really powerful networks',
                      candidate_labels = ['education', 'politics', 'sports']
                    )

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


{'labels': ['education', 'sports', 'politics'],
 'scores': [0.40961164236068726, 0.36124107241630554, 0.2291473001241684],
 'sequence': 'Keras and TensorFlow can be used to build really powerful networks'}

In [12]:
# What if we add 'technology' as a candidate label?

zero_shot_classifier('Keras and TensorFlow can be used to build really powerful networks',
                      candidate_labels = ['education', 'politics', 'sports', 'technology']
                    )

{'labels': ['technology', 'education', 'sports', 'politics'],
 'scores': [0.9920065402984619,
  0.0032742363400757313,
  0.0028875854332000017,
  0.0018316920613870025],
 'sequence': 'Keras and TensorFlow can be used to build really powerful networks'}

That's... wow! With 'technology' among the candidate labels, the model assigns it a score of about 0.99.
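Comparing the two runs suggests a simple way to use these results in practice: accept the top label only when its score clears a threshold. The sketch below uses the two outputs copied from above; top_label and the 0.5 cut-off are hypothetical choices for illustration, not part of the library.

```python
# Results copied from the two outputs above.
without_technology = {
    "labels": ["education", "sports", "politics"],
    "scores": [0.40961164236068726, 0.36124107241630554, 0.2291473001241684],
}
with_technology = {
    "labels": ["technology", "education", "sports", "politics"],
    "scores": [0.9920065402984619, 0.0032742363400757313,
               0.0028875854332000017, 0.0018316920613870025],
}


def top_label(result, min_score=0.5):
    """Return the best label, or None if no label is confident enough.

    min_score is a hypothetical threshold chosen for illustration.
    """
    label, score = max(zip(result["labels"], result["scores"]),
                       key=lambda pair: pair[1])
    return label if score >= min_score else None


print(top_label(without_technology))  # None
print(top_label(with_technology))     # technology
```

In the first run no label stands out, which is a hint that the right category was missing from the candidate list.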