# Hugging Face Pipeline Examples

- Hugging Face's `transformers` is one of the most widely used open-source libraries for natural language processing (NLP) models and tasks.
- `pipeline` is a high-level API in `transformers` that lets you run common NLP tasks with pre-trained models in a few lines of code.
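Conceptually, a pipeline chains three stages: preprocessing (tokenization), a forward pass through the model, and postprocessing of the raw outputs into readable results. The sketch below imitates that split so it can run without downloading anything. `ToyPipeline` and its word lists are hypothetical and not part of the `transformers` API, and a lexicon count stands in for the neural model. It is only meant to show the shape of the stages.

```python
from typing import Dict, List, Set


class ToyPipeline:
    """Hypothetical three-stage pipeline: preprocess -> forward -> postprocess."""

    def __init__(self, positive: Set[str], negative: Set[str]):
        self.positive = positive
        self.negative = negative

    def preprocess(self, text: str) -> List[str]:
        # Stand-in for a tokenizer: lowercase, split on whitespace, strip punctuation.
        return [tok.strip(".,!?") for tok in text.lower().split()]

    def forward(self, tokens: List[str]) -> Dict[str, int]:
        # Stand-in for the model: count lexicon hits instead of computing logits.
        pos = sum(tok in self.positive for tok in tokens)
        neg = sum(tok in self.negative for tok in tokens)
        return {"pos": pos, "neg": neg}

    def postprocess(self, counts: Dict[str, int]) -> Dict[str, str]:
        # Turn the raw "model" output into a human-readable label.
        label = "POSITIVE" if counts["pos"] >= counts["neg"] else "NEGATIVE"
        return {"label": label}

    def __call__(self, text: str) -> Dict[str, str]:
        return self.postprocess(self.forward(self.preprocess(text)))


pipe = ToyPipeline(positive={"fantastic", "great"}, negative={"boring", "bad"})
print(pipe("This is a fantastic movie!"))  # {'label': 'POSITIVE'}
print(pipe("A boring, bad plot."))         # {'label': 'NEGATIVE'}
```

With a real pipeline, `pipeline(...)` does the work of the constructor (loading a tokenizer and model), and each call runs the same three stages.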

## Installation

PyTorch and TensorFlow are deep learning frameworks that serve as computational backends for training and running models such as BERT. `transformers` relies on one of them to run its models, so install at least one backend before using the library. The examples below use PyTorch: install it (for example with `pip install torch`), then install the library with `pip install transformers`.
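To confirm which backend is installed before loading any models, you can ask Python whether the packages are importable. Below is a minimal sketch using only the standard library. The helper name `available_backends` is hypothetical, and it only checks the usual top-level module names `torch` and `tensorflow`.

```python
import importlib.util
from typing import List


def available_backends() -> List[str]:
    """Return the names of deep learning backends importable in this environment."""
    candidates = {"torch": "PyTorch", "tensorflow": "TensorFlow"}
    # find_spec returns None when a top-level package is not installed,
    # and it does so without actually importing the (large) package.
    return [name for module, name in candidates.items()
            if importlib.util.find_spec(module) is not None]


backends = available_backends()
if backends:
    print("Found backend(s):", ", ".join(backends))
else:
    print("No backend found; install one, e.g. `pip install torch`.")
```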

## Example 1: Using BERT for masked-language modeling

One of the tasks used to pre-train BERT is masked language modeling (MLM). In MLM, a fraction of the tokens in a sentence (15% in the original BERT setup) is corrupted, mostly by replacing them with the special `[MASK]` token. The model is trained to predict the original tokens at those positions.
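The corruption step can be sketched in plain Python. This assumes the commonly cited BERT recipe: about 15% of positions are selected, and of those, 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. Real implementations differ in details. For example, this sketch selects each token independently with probability `mask_prob` rather than masking an exact count. `mask_tokens` is a hypothetical helper, not a library function.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt a token list BERT-style (illustrative sketch).

    Each position is selected independently with probability `mask_prob`.
    A selected token becomes [MASK] 80% of the time, a random vocabulary
    token 10% of the time, and stays unchanged 10% of the time.
    Returns (corrupted tokens, labels); labels hold the original token at
    selected positions and None where nothing is predicted.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: the model is not asked to predict this position
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)
        # else: leave the original token in place
    return corrupted, labels


tokens = "the capital city of france is known for the eiffel tower".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, rng=random.Random(0))
print(corrupted)
print(labels)
```

During pre-training, the loss is computed only at positions whose label is not `None`. This is why the unchanged 10% still teach the model something.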

Because BERT was pre-trained on exactly this task, we can use its original pre-trained weights for MLM without any fine-tuning.

```python
from transformers import pipeline

# Downloads and caches the model on first use (bert-base-uncased is roughly 500 MB).
model = pipeline("fill-mask", model="bert-base-uncased")
```


Loading the model prints the following warning:

```text
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```

The warning is harmless here. The `cls.seq_relationship` weights belong to the next-sentence-prediction head used during BERT's pre-training, and the masked-language-modeling model (`BertForMaskedLM`) does not need them.


```python
query_text = "The capital city of [MASK] is known for the Eiffel Tower."

results = model(query_text)

# Each result holds the completed sentence and its probability (a value between 0 and 1).
for result in results:
    print(f"{result['sequence']:<70} with score: {result['score']:.4f}")
```

```text
the capital city of paris is known for the eiffel tower.               with score: 0.3470
the capital city of luxembourg is known for the eiffel tower.          with score: 0.2427
the capital city of france is known for the eiffel tower.              with score: 0.1014
the capital city of brussels is known for the eiffel tower.            with score: 0.0736
the capital city of monaco is known for the eiffel tower.              with score: 0.0530
```

The predictions are fluent but not always semantically correct. The top candidate, "paris", is a city rather than a country, and the intended answer, "france", ranks only third.
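The fill-mask scores are probabilities. At the `[MASK]` position the model outputs one logit per vocabulary entry, a softmax turns these into a distribution, and the pipeline keeps the highest-scoring candidates (5 by default, adjustable through its `top_k` parameter). A minimal sketch of that ranking step is below. The five-word vocabulary and the logits are made up for illustration and are not BERT's actual values, and `top_k_fill` is a hypothetical helper.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_fill(template: str, vocab: List[str], logits: List[float], k: int = 5) -> List[Dict[str, object]]:
    """Rank candidate fillers for [MASK] by softmax probability (illustrative only)."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)[:k]
    return [{"token_str": tok, "score": p, "sequence": template.replace("[MASK]", tok)}
            for tok, p in ranked]


# Hypothetical vocabulary and logits, chosen only for illustration.
vocab = ["paris", "france", "london", "banana", "the"]
logits = [3.0, 2.0, 1.0, -2.0, 0.0]
template = "the capital city of [MASK] is known for the eiffel tower."
for r in top_k_fill(template, vocab, logits, k=3):
    print(f"{r['sequence']:<60} score: {r['score']:.4f}")
```

Because only the top k candidates are returned, the displayed scores need not sum to 1. The five scores in the BERT output above add up to about 0.82, and the remaining probability mass is spread over the rest of the vocabulary.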


## Example 2: Using DistilBERT for sentiment analysis

Now let's try a different task: sentiment analysis, which means classifying a text as positive or negative. The original BERT model was not trained for this task, so we need a model that has been fine-tuned on sentiment data.
We'll use DistilBERT fine-tuned on the SST-2 dataset. DistilBERT is a distilled version of BERT that is about 40% smaller. SST-2 is the binary (positive/negative) version of the Stanford Sentiment Treebank, a collection of 11,855 sentences from movie reviews.

```python
from transformers import pipeline

# Downloads and caches the model on first use (roughly 250 MB).
model = pipeline(task="sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
```

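```python
reviews = [
    "This is a fantastic movie!",
    "Meh, I didn't like it that much.",
    "Liked the acting, not much the plot.",
]

# Passing a list classifies every review and returns one result per input.
results = model(reviews)

for review, result in zip(reviews, results):
    print(f"{review:<40} predicted label: {result['label']:<10} with score: {result['score']:.4f}")
```

```text
This is a fantastic movie!               predicted label: POSITIVE   with score: 0.9999
Meh, I didn't like it that much.         predicted label: NEGATIVE   with score: 0.9989
Liked the acting, not much the plot.     predicted label: POSITIVE   with score: 0.5962
```

The mixed third review shows a limitation of the model. It can only choose between POSITIVE and NEGATIVE, so for this review it returns a low-confidence label (a score close to 0.5) rather than a neutral one.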
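The sentiment model ends in a two-way classification head, and the pipeline's label and score can be understood as a softmax over two logits. Below is a minimal sketch that assumes the label mapping `{0: "NEGATIVE", 1: "POSITIVE"}` and uses made-up logits. `classify` and `ID2LABEL` are hypothetical names, not the pipeline's internals. With two classes, the winning probability stays close to 0.5 whenever the two logits are close, which is what happened with the mixed review above.

```python
import math
from typing import Dict, List

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}  # assumed label mapping


def classify(logits: List[float], id2label: Dict[int, str] = ID2LABEL) -> Dict[str, object]:
    """Turn raw classifier logits into a pipeline-style {'label', 'score'} dict."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
    return {"label": id2label[best], "score": probs[best]}


print(classify([-4.0, 4.5]))  # far-apart logits: confident POSITIVE
print(classify([0.1, 0.5]))   # close logits: POSITIVE with a score near 0.5
```

For two classes this softmax equals the logistic sigmoid of the logit difference, so the score depends only on how far apart the two logits are.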
