# Workbook NLP Course Hugging Face

https://huggingface.co/learn/nlp-course/chapter1/1

# 1. Transformer Models

## Transformers, what can they do?

https://huggingface.co/learn/nlp-course/chapter1/3?fw=pt#transformers-what-can-they-do

The most basic object in the Hugging Face Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, so we can pass in raw text directly and get back an intelligible answer.

Some of the currently available pipelines are:

- feature-extraction (get the vector representation of a text)
- fill-mask
- ner (named entity recognition)
- question-answering
- sentiment-analysis
- summarization
- text-generation
- translation
- zero-shot-classification
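
To make the three stages concrete, here is a minimal toy sketch in plain Python. It is not how the library implements pipeline(): the word list, the stub "model" and all function names are assumptions made up for illustration. It only shows the shape of the flow, from raw text to tokens to raw scores (logits) to a label with a score.

```python
import math

# Hypothetical mini-lexicon standing in for a trained model's knowledge.
LEXICON = {"love": 2.0, "great": 1.5, "hate": -2.0, "boring": -1.5}
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def preprocess(text):
    """Stand-in for tokenization: lowercase, split on spaces, strip punctuation."""
    return [token.strip(".,!?") for token in text.lower().split()]


def toy_model(tokens):
    """Stand-in for the neural network: returns one raw score (logit) per class."""
    total = sum(LEXICON.get(token, 0.0) for token in tokens)
    return [-total, total]  # [NEGATIVE logit, POSITIVE logit]


def postprocess(logits):
    """Convert logits to probabilities with softmax and report the best class."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract max for stability
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": ID2LABEL[best], "score": probs[best]}]


def toy_pipeline(text):
    """Chain the three stages, mirroring the output shape of a classifier pipeline."""
    return postprocess(toy_model(preprocess(text)))


result = toy_pipeline("I love this course!")
print(result)
```

The returned list of dictionaries with 'label' and 'score' keys mirrors the shape of the sentiment-analysis outputs shown in the examples below.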

### Some examples of using a pipeline:

#### Sentiment analysis: simple usage with one input sequence

In [1]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598049521446228}]

#### Sentiment analysis: simple usage with one input sequence given by the user

In [32]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
text = input("Type the desired sentence for the sentiment analysis:")
classifier(text)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Type the desired sentence for the sentiment analysis: The fish in my tank are swimming from left to right all the time.


[{'label': 'NEGATIVE', 'score': 0.9445313811302185}]

#### Sentiment analysis: Specifying the model/checkpoint, using a list of sequences as input and outputting to a pandas dataframe
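**Note**: padding=True pads the shorter sequences to a common length, which is required whenever several sequences are combined into one batch, because the model only works with tensors of the same size.  
**Note**: truncation=True is only needed when a sequence is longer than the model's max sequence length. The sequences here are short, so it acts purely as a safeguard.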

In [8]:
from transformers import pipeline
import pandas as pd

classifier = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english", task="sentiment-analysis", padding=True, truncation=True)
text = ["I love pizza!", "My mom is the best!", "Getting old is nothing for beginners!"]
outputs = classifier(text)
result = pd.DataFrame(outputs)  # create a dataframe from the model's output
result.insert(0, 'sequence', text)  # add the input texts as the first column of the dataframe
result

| | sequence | label | score |
|---|---|---|---|
| 0 | I love pizza! | POSITIVE | 0.999813 |
| 1 | My mom is the best! | POSITIVE | 0.999877 |
| 2 | Getting old is nothing for beginners! | NEGATIVE | 0.999535 |
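
The padding note for this example can be illustrated without any model. The sketch below is an assumption-laden toy: the token IDs are invented, the pad ID of 0 and right-side padding are assumed, and pad_batch is a hypothetical helper, not a library function. It shows why sequences of different lengths need padding before they fit into one rectangular batch, and how an attention mask marks which positions are real tokens.

```python
def pad_batch(batch, pad_id=0):
    """Right-pad every sequence to the length of the longest one.

    Returns the padded token IDs and an attention mask in which
    1 marks a real token and 0 marks padding.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return input_ids, attention_mask


# Invented token IDs for three sequences of different lengths.
batch = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
input_ids, attention_mask = pad_batch(batch)

print(input_ids)
print(attention_mask)
```

After padding, every row has the same length, so the batch can be stacked into a single tensor.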


#### Zero-shot classification: simple usage with one input sequence

In [47]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
outputs = classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
outputs

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445996046066284, 0.11197374761104584, 0.04342668503522873]}
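
The result is a plain dictionary whose 'labels' and 'scores' lists line up by position. As a small sketch, the output above is copied in as a literal (so no model is needed) and the best label is picked from it. The 0.1 threshold is a hypothetical value chosen only for illustration.

```python
# Output of the zero-shot classifier, copied from the cell above.
outputs = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445996046066284, 0.11197374761104584, 0.04342668503522873],
}

# Pair each label with its score and sort explicitly by score, highest first.
ranked = sorted(
    zip(outputs["labels"], outputs["scores"]),
    key=lambda pair: pair[1],
    reverse=True,
)

top_label, top_score = ranked[0]

# Hypothetical cut-off: keep every label that scores at least 0.1.
threshold = 0.1
confident = [label for label, score in ranked if score >= threshold]

print(top_label, round(top_score, 3))
print(confident)
```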

#### Zero-shot classification: simple usage with output to a dataframe

In [104]:
from transformers import pipeline
import pandas as pd

classifier = pipeline("zero-shot-classification")

sequence = "To deliver on infrastructure transition at speed and scale, we put digitalization and technology at the heart of our approach and empower our customers to scale sustainable impact. Together, we create energy efficiency – through CO2 transparency, renewable integration, and electrification. We help customers to improve asset performance, availability, and reliability, through resource-efficient and circular products which optimize production and supply chains throughout their entire lifecycle. We enable them to offer safe and comfortable environments that understand and adapt to the needs of their users."
candidate_labels = ["technology", "politics", "business", "leisure"]

outputs = classifier(sequence, candidate_labels)
pd.DataFrame(outputs)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


| | sequence | labels | scores |
|---|---|---|---|
| 0 | To deliver on infrastructure transition at spe... | technology | 0.863212 |
| 1 | To deliver on infrastructure transition at spe... | business | 0.122046 |
| 2 | To deliver on infrastructure transition at spe... | leisure | 0.010227 |
| 3 | To deliver on infrastructure transition at spe... | politics | 0.004514 |


#### Zero-shot classification: multiple sequences, unpack output to / from a dataframe

In [105]:
from transformers import pipeline
import pandas as pd

classifier = pipeline("zero-shot-classification")

sequence = ["My mom is fat", "My dad is in prison", "I stole the car of my friend", "I love crack"]
candidate_labels = ["crime", "fiction", "family"]

outputs = classifier(sequence, candidate_labels)

# one dataframe per input sequence, stacked; ignore_index=True renumbers the rows 0..n-1
results = pd.concat([pd.DataFrame(x) for x in outputs], ignore_index=True)
results

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


| | sequence | labels | scores |
|---|---|---|---|
| 0 | My mom is fat | family | 0.985589 |
| 1 | My mom is fat | fiction | 0.008131 |
| 2 | My mom is fat | crime | 0.006281 |
| 3 | My dad is in prison | family | 0.724129 |
| 4 | My dad is in prison | crime | 0.269179 |
| 5 | My dad is in prison | fiction | 0.006692 |
| 6 | I stole the car of my friend | crime | 0.963348 |
| 7 | I stole the car of my friend | family | 0.020569 |
| 8 | I stole the car of my friend | fiction | 0.016084 |
| 9 | I love crack | crime | 0.718623 |
