# Session 10 - Using BERT-style models via ```HuggingFace```

In the lecture today, we saw how exploring the different layers and self-attention heads in BERT-style models can give us a more nuanced picture of how the model has performed and what it has learned.

There are three main tools which can be used for this task:

- BERTviz
    - https://github.com/jessevig/bertviz
- Ecco
    - https://github.com/jalammar/ecco
- Language Interpretability Toolkit (LIT)
    - https://github.com/PAIR-code/lit

Each of these is backed by empirical results in peer-reviewed journals as evidence of robustness, but each does something a little different. Feel free to explore them in this class, or in your own time.

A second thing we saw was that BERT (and BERT-style) models can be *finetuned* to perform specific tasks. In this class, we're going to see how this can be used for the purposes of cultural data science. To do this, we're going to be using the ```transformers``` library from ```HuggingFace``` (sometimes written as just ```🤗```).

## Creating ```HuggingFace``` pipelines

We're specifically going to use the ```pipeline()``` abstraction in HuggingFace. This allows us to load a finetuned model, initialize it with everything it needs (such as its tokenizer), and use it for the specific task for which it was finetuned. You can read more [here](https://huggingface.co/docs/transformers/v4.27.2/en/task_summary#natural-language-processing).

We're going to use the ```text-classification``` pipeline in this class (and [Assignment 4](https://classroom.github.com/a/BhnScEmU)).

In [3]:
from transformers import pipeline

### Text classification

To begin with, let's use the default sentiment classification model to see how we can return a binary sentiment classification for a document.

In [4]:
classifier = pipeline(task="sentiment-analysis") # defining the task as "sentiment-analysis"

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
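
The warning above recommends naming the model and revision explicitly. A minimal sketch of how that might look, reusing the model name and revision hash reported in the warning. The `PINNED` dictionary and the `build_classifier` helper are hypothetical names introduced here for illustration, and the import is deferred so that nothing is downloaded until the helper is called:

```python
# Model name and revision are copied from the warning printed above.
# PINNED and build_classifier are illustrative names, not part of transformers.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily; the first call downloads the model if it is not cached.
    from transformers import pipeline

    return pipeline(**PINNED)
```

Calling `build_classifier()` should give the same classifier as the cell above, but with the model choice written down in the code rather than left to the library default.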


In [5]:
preds = classifier("Hugging Face is the best thing since sliced bread!")  # using the classifier object on a string

In [6]:
print(preds)  # the predicted label and its score; the score is the model's confidence (a probability), here about 0.999, i.e. 99.9%

[{'label': 'POSITIVE', 'score': 0.9990912675857544}]
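
The pipeline returns a list with one dictionary per input document. A small sketch of reading that structure, using the output copied from the cell above; `describe` is a hypothetical helper name, not part of the library:

```python
# Output copied from the cell above: one dictionary per input document.
preds = [{"label": "POSITIVE", "score": 0.9990912675857544}]


def describe(pred):
    """Format one prediction as 'LABEL (xx.x%)'."""
    return f"{pred['label']} ({pred['score']:.1%})"


for pred in preds:
    print(describe(pred))  # POSITIVE (99.9%)
```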


### Question answering

We can also use BERT-style models for much more complex tasks, such as *question answering*. Again, there's a ```HuggingFace``` pipeline for this!

Let's start by defining a text we want to use as our *context*:

In [7]:
text = "In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. In the former task our best model outperforms even all previously reported ensembles."

We then initialize our question-answering pipeline.

In [8]:
question_answerer = pipeline(task="question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


And then we define the question we want to ask of our text:

In [9]:
answer = question_answerer(
    context = text,
    question="What are the main results of this paper?",
)

In [10]:
print(answer)
# The score (about 0.07) is low, so the model is not very confident in this answer.
# The pipeline picks the span of the context that best answers the question.
# The answer is extracted from the input text itself, so it cannot contain words that do not appear in `text`
# (for example, it cannot suddenly return my name, because my name does not appear in `text`).

{'score': 0.0676712617278099, 'start': 505, 'end': 570, 'answer': 'our best model outperforms even all previously reported ensembles'}
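
The `start` and `end` values in this output are character offsets into the context, which is another way of seeing that the answer is extracted rather than generated. A minimal sketch of that relationship, using a shortened context (only the last sentence of `text`) and recomputing the offsets with plain string methods, so the numbers here differ from the ones printed above:

```python
# A shortened, illustrative context: the last sentence of `text` above.
context = "In the former task our best model outperforms even all previously reported ensembles."
answer_text = "our best model outperforms even all previously reported ensembles"

# Recompute the character offsets that the pipeline reports as 'start' and 'end'.
start = context.find(answer_text)
end = start + len(answer_text)

# Slicing the context with those offsets gives back exactly the answer string.
print(context[start:end] == answer_text)
```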


### Text summarization

HuggingFace also allows us to use other styles of transformer models, such as T5 and GPT, which we'll be looking at in coming weeks. These allow us to do interesting things like *text summarization* and *text generation*.

In [11]:
summarizer = pipeline(task="summarization")

summary = summarizer(text)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your max_length is set to 142, but you input_length is only 117. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=58)


In [12]:
print(summary)

[{'summary_text': ' The Transformer is the first sequence transduction model based entirely on attention . It replaces the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention . For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers .'}]
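
The summary text above has a leading space and a space before each full stop. If that matters for later processing, an optional clean-up step could look like the sketch below; `tidy` is a hypothetical helper, and the example string is the first sentence of the output above:

```python
import re

# Output copied from the cell above (shortened to the first sentence).
summary = [{"summary_text": " The Transformer is the first sequence transduction model based entirely on attention ."}]


def tidy(text):
    """Strip outer whitespace and remove spaces before punctuation marks."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text.strip())


print(tidy(summary[0]["summary_text"]))
```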


### Text generation 

Compare how this performs relative to your trained RNN and consider that we're only using the default parameters here:

In [13]:
prompt = "Hugging Face is a community-based open-source platform for machine learning."

In [14]:
generator = pipeline(task="text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [15]:
generated = generator(prompt)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [16]:
print(generated)

[{'generated_text': 'Hugging Face is a community-based open-source platform for machine learning. Every Sunday since 1st of April, we host a workshop on Machine Learning by Mark Ellington based on the results from the following papers. The workshop takes place at'}]
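
As the output shows, `generated_text` begins with the prompt itself. A small sketch of separating the model's continuation from the prompt, using a truncated copy of the output above; the variable names are illustrative:

```python
prompt = "Hugging Face is a community-based open-source platform for machine learning."

# Output copied from the cell above (truncated here for brevity).
generated = [{"generated_text": prompt + " Every Sunday since 1st of April, we host a workshop on Machine Learning"}]

full_text = generated[0]["generated_text"]

# The generated text starts with the prompt, so slice it off to keep only the new part.
if full_text.startswith(prompt):
    continuation = full_text[len(prompt):].strip()
else:
    continuation = full_text

print(continuation)
```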


### Using a different model

So far, we've only been using the default models and parameters for these tasks. But if you check out the ```HuggingFace``` model universe, you'll see that there are many (in some cases hundreds of) finetuned models which can be slotted into these pipelines.

The model we use below is an emotion classifier: https://huggingface.co/j-hartmann/emotion-english-distilroberta-base

Check out the options [here](https://huggingface.co/models).

In [17]:
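classifier = pipeline("text-classification",
                      model="j-hartmann/emotion-english-distilroberta-base",  # try changing this to another model
                      return_all_scores=True)  # return a score for every label, not just the top one (newer transformers versions suggest top_k=None instead)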



In [18]:
classifier("I love this!")

[[{'label': 'anger', 'score': 0.004419781267642975},
  {'label': 'disgust', 'score': 0.0016119900392368436},
  {'label': 'fear', 'score': 0.0004138521908316761},
  {'label': 'joy', 'score': 0.9771687984466553},
  {'label': 'neutral', 'score': 0.005764583125710487},
  {'label': 'sadness', 'score': 0.002092392183840275},
  {'label': 'surprise', 'score': 0.008528688922524452}]]
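
With all scores returned, the output is an outer list with one inner list per input, and each inner list holds one dictionary per emotion label. A minimal sketch of picking the most likely label from that structure, using the output copied from the cell above; `top_label` is a hypothetical helper name:

```python
# Output copied from the cell above: one inner list of label scores per input.
scores = [[
    {"label": "anger", "score": 0.004419781267642975},
    {"label": "disgust", "score": 0.0016119900392368436},
    {"label": "fear", "score": 0.0004138521908316761},
    {"label": "joy", "score": 0.9771687984466553},
    {"label": "neutral", "score": 0.005764583125710487},
    {"label": "sadness", "score": 0.002092392183840275},
    {"label": "surprise", "score": 0.008528688922524452},
]]


def top_label(label_scores):
    """Return the dictionary with the highest score for one document."""
    return max(label_scores, key=lambda item: item["score"])


for doc_scores in scores:
    best = top_label(doc_scores)
    print(best["label"], round(best["score"], 3))  # joy 0.977
```

The seven scores sum to roughly 1, so they can be read as a probability distribution over the emotion labels for that document.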

This final pipeline forms the basis of [Assignment 4](https://classroom.github.com/a/BhnScEmU), which you should start working on now!