# Let's hug!
In this example we'll see how to use the HuggingFace `transformers` library, which is very useful for NLP.

NOTE: Most HuggingFace models are implemented in PyTorch.



In [5]:
import transformers
from transformers import pipeline #pipeline is an easy way to implement NLP tasks

#ipywidgets doesn't work well, so let's avoid it
#import logging
#transformers.logging.get_verbosity = lambda: logging.NOTSET

Transformers is a very simple library to use, and quite powerful. Let's see a simple example for sentiment classification:

In [6]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
# This model only exists in PyTorch, so we use the `from_pt` flag to import that model in TensorFlow.
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

All the weights of TFBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [7]:
#Now let's use it

results = classifier(["NLP is amazing and so easy to do with HuggingFace!",
           "I don't like japanese food"])
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: 5 stars, with score: 0.8788
label: 2 stars, with score: 0.3993
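
The labels returned by this model are strings such as `5 stars`, and the score is the model's confidence in that label. Here is a minimal sketch of how the results could be post-processed, assuming every label starts with the number of stars. The helper name `to_stars` and the 0.5 threshold are illustrative choices, not part of the library, and the `results` list is copied from the output above so the snippet runs without downloading the model:

```python
# Results copied from the output above so the snippet runs without the model.
results = [
    {"label": "5 stars", "score": 0.8788},
    {"label": "2 stars", "score": 0.3993},
]

def to_stars(label: str) -> int:
    """Turn a label such as '5 stars' into the integer 5 (assumed label format)."""
    return int(label.split()[0])

CONFIDENCE_THRESHOLD = 0.5  # arbitrary cut-off chosen for illustration

for result in results:
    stars = to_stars(result["label"])
    confident = result["score"] >= CONFIDENCE_THRESHOLD
    print(f"{stars} star(s), confident: {confident}")
```

The second prediction has a score below 0.5, so it is probably worth treating as uncertain rather than as a firm 2-star rating.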


With pipelines, you don't even need to specify a model if you don't want to; a default model is chosen for the task.

In [9]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I love machine learning and NLP so much!.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)



[{'label': 'POSITIVE', 'score': 0.9997584223747253}]

Some of the pipelines available in HuggingFace are:

* feature-extraction (get the vector representation of a text)
* fill-mask
* ner (named entity recognition)
* question-answering
* sentiment-analysis
* summarization
* text-generation
* translation
* zero-shot-classification

Zero-shot classification lets you assign labels of your choosing to a given text. "Zero-shot" comes from the fact that the model was not trained on those specific labels.
Let's give it a try:

In [10]:
clf = pipeline(task = 'zero-shot-classification')

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)



In [12]:
clf("This notebook is an example of how to use HuggingFace",
    candidate_labels = ["adventure", "food", "learning"])

{'sequence': 'This notebook is an example of how to use HuggingFace',
 'labels': ['learning', 'adventure', 'food'],
 'scores': [0.8656226992607117, 0.1023477166891098, 0.032029617577791214]}

In [13]:
clf("Alita Battle Angel is full of adrenaline and axcitement!",
    candidate_labels = ["adventure", "food", "learning"])

{'sequence': 'Alita Battle Angel is full of adrenaline and axcitement!',
 'labels': ['adventure', 'learning', 'food'],
 'scores': [0.9895821213722229, 0.008084353059530258, 0.0023334897123277187]}

In [14]:
clf("Gordon Ramsey is a famous TV chef",
    candidate_labels = ["adventure", "food", "learning"])

{'sequence': 'Gordon Ramsey is a famous TV chef',
 'labels': ['food', 'learning', 'adventure'],
 'scores': [0.8565803170204163, 0.10174693167209625, 0.04167276993393898]}
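
In the three outputs above, the labels come back sorted from highest to lowest score, and the scores add up to roughly 1. A small sketch of how one might pick the winning label while rejecting weak matches; `top_label` and its `min_score` parameter are hypothetical helpers for illustration, and the dictionary is copied from the last output:

```python
# Result copied from the last zero-shot output above.
result = {
    "sequence": "Gordon Ramsey is a famous TV chef",
    "labels": ["food", "learning", "adventure"],
    "scores": [0.8565803170204163, 0.10174693167209625, 0.04167276993393898],
}

def top_label(result: dict, min_score: float = 0.5):
    """Return the best-scoring label, or None when no label reaches min_score."""
    best_label, best_score = max(
        zip(result["labels"], result["scores"]), key=lambda pair: pair[1]
    )
    return best_label if best_score >= min_score else None

print(top_label(result))                 # best label with the default threshold
print(top_label(result, min_score=0.9))  # stricter threshold, may return None
```

Using `max` instead of taking the first element means the helper does not rely on the labels being pre-sorted.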

Text generation is what it sounds like: it generates text that continues a given prompt.

In [15]:
gen = pipeline(task = "text-generation")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)



In [19]:
gen("It was a dark and stormy night. She approached slowly and said",
    max_length=50,
    num_return_sequences=3,)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'It was a dark and stormy night. She approached slowly and said in a low, whisper. She looked out a window, and her eyes saw the clouds. A big blue cloud. She looked at it then. She wanted to see why it'},
 {'generated_text': "It was a dark and stormy night. She approached slowly and said with a smile, 'You had better stay at home.' And she did.' In fact, the day the wedding was happening in the spring (March 11, 1992) she never"},
 {'generated_text': 'It was a dark and stormy night. She approached slowly and said she was asleep and that he was dead. She was about to burst into tears as he had broken away before that was. The wind was so strong, so heavy and so strong'}]
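
Each `generated_text` above begins with the original prompt. If only the new text is wanted, the prompt can be stripped afterwards. A minimal sketch, where `continuation` is a hypothetical helper and the sample is the first sequence from the output above:

```python
prompt = "It was a dark and stormy night. She approached slowly and said"

# First generated sequence, copied from the output above.
outputs = [
    {"generated_text": "It was a dark and stormy night. She approached slowly and said in a low, whisper. She looked out a window, and her eyes saw the clouds. A big blue cloud. She looked at it then. She wanted to see why it"},
]

def continuation(generated_text: str, prompt: str) -> str:
    """Return only the newly generated part when the text starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

for output in outputs:
    print(continuation(output["generated_text"], prompt))
```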

Question answering extracts the answer to a question from a given context.

In [24]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Are we men or are we dancers?",
    context="My name is Francisco and I can be both a man and a dancer"
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)



{'score': 0.17471382021903992,
 'start': 39,
 'end': 57,
 'answer': 'a man and a dancer'}
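
The `start` and `end` values in the result are character offsets into the context string, so the answer can be recovered by slicing. A quick check using the values copied from the output above:

```python
context = "My name is Francisco and I can be both a man and a dancer"

# Result copied from the output above.
result = {
    "score": 0.17471382021903992,
    "start": 39,
    "end": 57,
    "answer": "a man and a dancer",
}

# Slice the context with the reported character offsets.
span = context[result["start"]:result["end"]]
print(span)
print(span == result["answer"])
```

The low score (about 0.17) suggests the model is not very sure about this answer, which makes sense given how loosely the question relates to the context.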

We can also build the pipeline ourselves, step by step. The first step in NLP analysis is
to use a tokenizer that will separate text into tokens.

In [25]:
from transformers import AutoTokenizer

checkpoint = "dccuchile/bert-base-spanish-wwm-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer


PreTrainedTokenizerFast(name_or_path='dccuchile/bert-base-spanish-wwm-cased', vocab_size=31002, model_max_len=512, is_fast=True, padding_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'})

In [29]:
raw_inputs = [
    "No entiendo como crearon BETO. Me imagino que fue magia.",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

{'input_ids': tensor([[    4,  1125,  4419,  1184, 17692, 13065,  6524,  1009,  1369, 14237,
          1038,  1341, 10490,  1009,     5]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}


HuggingFace works with tensors, so you need to specify the type of result you want. `return_tensors="pt"` indicates that we are working with PyTorch.
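
The tokenizer output above contains `input_ids` (one integer per token) and an `attention_mask` (1 for real tokens, 0 for padding). With a single sentence there is nothing to pad, so the mask is all ones. The toy sketch below illustrates what right-side padding and the mask look like for a batch of two sentences. It uses a made-up whitespace tokenizer and a hypothetical six-entry vocabulary, not the real BETO tokenizer:

```python
# Hypothetical toy vocabulary; real tokenizers use large subword vocabularies.
PAD_ID = 0
vocab = {"[PAD]": PAD_ID, "[CLS]": 1, "[SEP]": 2, "hola": 3, "mundo": 4, "adiós": 5}

def encode(sentence):
    """Split on whitespace and wrap the ids in [CLS] ... [SEP]."""
    return [vocab["[CLS]"]] + [vocab[w] for w in sentence.split()] + [vocab["[SEP]"]]

def pad_batch(sentences):
    """Right-pad every sequence to the longest one and build the attention mask."""
    encoded = [encode(s) for s in sentences]
    max_len = max(len(ids) for ids in encoded)
    input_ids = [ids + [PAD_ID] * (max_len - len(ids)) for ids in encoded]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in encoded]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch(["hola mundo", "adiós"])
print(batch["input_ids"])       # [[1, 3, 4, 2], [1, 5, 2, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```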

Now we need to download the model:

In [30]:
from transformers import AutoModel

model = AutoModel.from_pretrained(checkpoint)
model


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertModel were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-cased and are newly initialized: ['bert.pooler.dense.bi

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(31002, 768, padding_idx=1)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          

The outputs of this model are the *features* or *hidden states*, which can be used as you wish.
In particular, a model head can take them as input to produce the final output for a specific task.


In [40]:
res = model( **inputs )
res[0].shape

torch.Size([1, 15, 768])

The first number is the batch size, i.e., the number of sentences processed (only one here).

The 2nd number is the sequence length in tokens, 15 in this example.

The 3rd number is the hidden size, i.e., the dimension of the vector produced for each token.
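
To make the three dimensions concrete, here is a toy sketch in which a small nested list stands in for the hidden-state tensor (1 sentence, 3 tokens, hidden size 2; the numbers are made up). It also shows one common way to reduce the per-token vectors to a single vector per sentence, by averaging over the tokens. This is an illustration only, not something the notebook's model does for you:

```python
# Toy stand-in for res[0]: 1 sentence, 3 tokens, hidden size 2 (made-up values).
hidden_states = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
]

batch_size = len(hidden_states)
sequence_length = len(hidden_states[0])
hidden_size = len(hidden_states[0][0])
print((batch_size, sequence_length, hidden_size))  # (1, 3, 2)

# Average over the token axis to get one vector per sentence (mean pooling).
sentence_vectors = [
    [sum(token[d] for token in sentence) / len(sentence) for d in range(hidden_size)]
    for sentence in hidden_states
]
print(sentence_vectors)
```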

If we are working on a specific task, we can use a task-specific model class. For instance, for classification:


In [44]:
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
raw_inputs = [
    "I am a big fan of Resident Evil!",
    "I can't handle spicy food, it hurts my stomach."
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")


model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)


In [46]:
print(outputs.logits)

tensor([[-3.4606,  3.6981],
        [ 4.3463, -3.5400]], grad_fn=<AddmmBackward0>)


Logits are the raw, unnormalized scores produced by the model's classification *head*. To interpret them as probabilities, we need to normalize them.
For classification, that is done with a softmax function (during training, it is typically paired with a cross-entropy loss).

In [49]:
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
print( model.config.id2label )

#Indeed! First phrase was positive, second one was negative :)

tensor([[7.7749e-04, 9.9922e-01],
        [9.9962e-01, 3.7572e-04]], grad_fn=<SoftmaxBackward0>)
{0: 'NEGATIVE', 1: 'POSITIVE'}
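
The softmax step can also be reproduced by hand, which helps show what `torch.nn.functional.softmax` is doing. A minimal pure-Python sketch using the logits and the `id2label` mapping printed above (the `softmax` function here is our own illustrative implementation, not the PyTorch one):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits and label mapping copied from the outputs above.
logits = [[-3.4606, 3.6981], [4.3463, -3.5400]]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

predicted = []
for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
    predicted.append(id2label[best])
    print(id2label[best], probs)
```

Subtracting the largest logit before exponentiating does not change the result but avoids overflow for large values.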


Now you have the power of HuggingFace in your hands! Good luck :)


