# **Hugging Face Ecosystem**

## **TOC:**


- 1) **[Introduction](#intro)**
    - 1.1) **[Requirements](#requirements)**
    - 1.2) **[Ecosystem](#ecosystem)**
    - 1.3) **[Datasets](#datasets)**
    - 1.4) **[Accelerate](#accelerate)**


- 2) **[Usual Text Tasks](#tasks)**

    - 2.1) **[Sentiment Analysis](#sentiment_analysis)**
    - 2.2) **[Named Entity Recognition (NER)](#ner)**
    - 2.3) **[Question Answering](#question_answering)**
    - 2.4) **[Summarization](#summarization)**
    - 2.5) **[Translation](#translation)**
    - 2.6) **[Text Generation](#text_generation)**  


- 3) **[Main Challenges with Transformers](#challenges)**

# 1) **Introduction** <a class="anchor" id="intro"></a>

## 1.1) **Requirements** <a class="anchor" id="requirements"></a>

```text
transformers==4.18.0
sentencepiece==0.1.96
torch
pandas
```

## 1.2) **Ecosystem** <a class="anchor" id="ecosystem"></a>

<center><img src="figures/ecosystem.png" width=300></center>

## 1.3) **Datasets** <a class="anchor" id="datasets"></a>

A common interface to thousands of datasets hosted on the [Hub](https://huggingface.co/datasets).

**Features**:
- Smart caching (each dataset is downloaded only once)
- Handles RAM limitations (through a mechanism called memory mapping)
- Reproducible experiments
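
To make the memory-mapping idea concrete, here is a minimal sketch that uses only the Python standard library. It is not how Datasets is implemented (the library stores data in Apache Arrow files); the file name, record size and helper function are assumptions made up for this illustration. The point is only that a mapped file can be sliced like a byte string without reading the whole file into RAM.

```python
import mmap
import os
import tempfile

# Hypothetical fixed-width record file standing in for a large dataset on disk.
RECORD_SIZE = 16
NUM_RECORDS = 1000

path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as f:
    for i in range(NUM_RECORDS):
        # Pad every record to RECORD_SIZE bytes so offsets are easy to compute.
        f.write(f"record-{i:06d}".encode("ascii").ljust(RECORD_SIZE, b" "))


def read_record(mapped, index):
    """Return one record by slicing the mapped file at its byte offset."""
    offset = index * RECORD_SIZE
    return mapped[offset:offset + RECORD_SIZE].decode("ascii").rstrip()


with open(path, "rb") as f:
    # Length 0 maps the whole file; the OS loads pages only when they are touched.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        first = read_record(mapped, 0)
        last = read_record(mapped, NUM_RECORDS - 1)

print(first, last)
```

Random access to the last record costs about the same as access to the first one, which is the property that lets a dataset larger than RAM be indexed directly from disk.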

## 1.4) **Accelerate** <a class="anchor" id="accelerate"></a>

A layer of abstraction over the custom logic needed to train models, so that the infrastructure can be changed when necessary without rewriting the training code.

**Features**: 
- Easy use of multiple GPUs

---

# 2) **Usual Text Tasks** <a class="anchor" id="tasks"></a>

In [1]:
import pandas as pd
from transformers import pipeline

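The model is downloaded automatically on first use and cached. For the `transformers` version pinned above, the default cache folder is `~/.cache/huggingface/transformers` (the `~/.cache/huggingface/datasets` folder is used by the Datasets library).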

Reference: [Pipelines](https://huggingface.co/transformers/v3.0.2/main_classes/pipelines.html)

In [2]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

## 2.1) **Sentiment Analysis** <a class="anchor" id="sentiment_analysis"></a>


In [3]:
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
classifier(text)

[{'label': 'NEGATIVE', 'score': 0.9015461802482605}]

## 2.2) **Named Entity Recognition (NER)** <a class="anchor" id="ner"></a>


In [4]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


In [5]:
pd.DataFrame(outputs).head()

|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.87901 | Amazon | 5 | 11 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 3 | MISC | 0.55657 | Mega | 208 | 212 |
| 4 | PER | 0.590256 | ##tron | 212 | 216 |
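
In the output above, `Megatron` is split into two word pieces (`Mega` and `##tron`) that received different labels, and neither score is high. One possible post-processing step is sketched below. The helper `merge_wordpieces` is hypothetical and not part of `transformers`, and keeping the first piece's label together with the lower score is an arbitrary choice for illustration. The pipeline also accepts other `aggregation_strategy` values (for example `"first"`), which may avoid the split without extra code.

```python
def merge_wordpieces(entities):
    """Merge entries whose word starts with '##' into the preceding entry.

    Assumes each entry is a dict with 'entity_group', 'score', 'word',
    'start' and 'end' keys, as in the table above.
    """
    merged = []
    for ent in entities:
        is_continuation = bool(
            merged
            and ent["word"].startswith("##")
            and merged[-1]["end"] == ent["start"]
        )
        if is_continuation:
            prev = merged[-1]
            prev["word"] += ent["word"][2:]  # drop the '##' marker
            prev["end"] = ent["end"]
            # Keep the first piece's label; use the lower score as a cautious estimate.
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append(dict(ent))  # copy so the input list is not modified
    return merged


# The five rows shown above, copied by hand.
rows = [
    {"entity_group": "ORG", "score": 0.87901, "word": "Amazon", "start": 5, "end": 11},
    {"entity_group": "MISC", "score": 0.990859, "word": "Optimus Prime", "start": 36, "end": 49},
    {"entity_group": "LOC", "score": 0.999755, "word": "Germany", "start": 90, "end": 97},
    {"entity_group": "MISC", "score": 0.55657, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590256, "word": "##tron", "start": 212, "end": 216},
]

merged = merge_wordpieces(rows)
print([(e["entity_group"], e["word"]) for e in merged])
```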


## 2.3) **Question Answering** <a class="anchor" id="question_answering"></a>


In [6]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


In [7]:
pd.DataFrame([outputs])

|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.631292 | 335 | 358 | an exchange of Megatron |
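
The `start` and `end` values are character offsets into the context string, so the answer can be recovered by slicing the original text. The sketch below checks this with plain Python, using the values from the output above. The helper `show_in_context` and its `margin` parameter are hypothetical names added for illustration.

```python
text = """Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

# Values copied from the question-answering output above.
result = {"score": 0.631292, "start": 335, "end": 358, "answer": "an exchange of Megatron"}

# Slicing the context with the offsets reproduces the answer string.
span = text[result["start"]:result["end"]]
print(span)


def show_in_context(context, start, end, margin=30):
    """Return the span in square brackets with some surrounding characters."""
    left = max(0, start - margin)
    right = min(len(context), end + margin)
    return context[left:start] + "[" + context[start:end] + "]" + context[end:right]


print(show_in_context(text, result["start"], result["end"]))
```

The same offsets convention applies to the `start` and `end` columns of the NER output, for example `text[36:49]` is `Optimus Prime`.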


## 2.4) **Summarization** <a class="anchor" id="summarization"></a>


In [8]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)
Your min_length=56 must be inferior than your max_length=45.


In [9]:
print(outputs[0]['summary_text'])

 Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead.


## 2.5) **Translation** <a class="anchor" id="translation"></a>


In [10]:
translator = pipeline("translation_en_to_de",
                      model="Helsinki-NLP/opus-mt-en-de")

In [11]:
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt. Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, bald von Ihnen zu hören. Aufrichtig, Bumblebee.


## 2.6) **Text Generation** <a class="anchor" id="text_generation"></a>


In [12]:
generator = pipeline("text-generation")
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)


In [13]:
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. If the order

mistake was, your mistake and it's failure must remain, please contact Bumblebee immediately and we

can work out exactly what action figure is being shipped instead. Please note that I have purchased a Optimus Prime action figure that is "incomplete"




---

# 3) **Main Challenges with Transformers** <a class="anchor" id="challenges"></a>




- Language (most pretrained models are trained mainly on English text).
- Data availability (labeled data is needed).
- Working with long documents (performance depends on input length).
- Opacity (Why does the model give this prediction?)
- Bias (Is the model racist or sexist?)
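
For the long-document point, one common workaround is to split the input into overlapping windows that each fit within the model's maximum input length, and to run the model on each window. This technique is not described in the notes above, so treat the sketch below as an assumption: the function name, `window_size` and `stride` are illustrative, and plain words stand in for real model tokens.

```python
def sliding_windows(tokens, window_size, stride):
    """Split a token list into overlapping windows.

    Consecutive windows start `stride` tokens apart, so they share
    `window_size - stride` tokens. The last window may be shorter.
    """
    if window_size <= 0 or not 0 < stride <= window_size:
        raise ValueError("need window_size > 0 and 0 < stride <= window_size")
    if not tokens:
        return []
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the input
        start += stride
    return windows


# Words stand in for model tokens in this toy example.
tokens = "one two three four five six seven eight nine ten".split()
windows = sliding_windows(tokens, window_size=4, stride=3)
for window in windows:
    print(window)
```

The overlap means that a sentence cut at one window boundary still appears whole in the neighboring window, at the cost of processing some tokens twice.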