# NLP with Hugging Face and Transformers

Hugging Face is a company and open-source platform focused on Natural Language Processing (NLP) and machine learning. It is best known for its Transformers library, which provides a wide range of pre-trained models for tasks such as text classification, translation, and summarization. The library's `pipeline` function, used throughout this section, lets developers apply these models in a few lines of code and integrate them into their applications.

## Text Classification
Text classification is a fundamental NLP task that assigns one or more predefined labels (classes) to a piece of text based on its content.

As a running example, we will use the following customer comment and classify it with Hugging Face Transformers:

In [3]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, John."""

Create the text classification pipeline

In [4]:
from transformers import pipeline
classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
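
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the model ID and revision hash printed in the log. The constant names and the `load_classifier` helper are ours, introduced only for illustration:

```python
# Model ID and revision copied from the log message above.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def load_classifier():
    """Build the text-classification pipeline with a pinned checkpoint."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID, revision=REVISION)
```

Calling `load_classifier()` should behave like the cell above, except that the checkpoint no longer depends on the library's defaults.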


Let's make a prediction for the customer comment and display the result as a DataFrame:

In [5]:
import pandas as pd
outputs = classifier(text)
pd.DataFrame(outputs)

| | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.888494 |
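
The default checkpoint was fine-tuned on SST-2, a sentiment dataset, so its labels are `NEGATIVE` and `POSITIVE`. The `score` is a probability; for a single-label classifier like this one it is typically obtained by applying a softmax to the model's raw outputs (logits). The sketch below illustrates that computation with made-up logits. The `softmax` helper and the logit values are hypothetical and are not taken from the model above:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]
logits = [1.5, -0.5]  # hypothetical values, for illustration only

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])

# Same shape as one entry of the pipeline output: a label and its score.
prediction = {"label": labels[best], "score": probs[best]}
print(prediction)
```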


## Named Entity Recognition

In NLP, real-world objects like products, places, and people are called named entities, and extracting them from text is called named entity recognition (NER). Let's take a look at NER by loading the corresponding pipeline and feeding our customer review to it.

In [6]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


| | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.866123 | Amazon | 5 | 11 |
| 1 | MISC | 0.991707 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999765 | Germany | 90 | 97 |
| 3 | MISC | 0.580447 | Mega | 208 | 212 |
| 4 | PER | 0.502781 | ##tron | 212 | 216 |
| 5 | ORG | 0.683059 | Decept | 253 | 259 |
| 6 | MISC | 0.509665 | ##icons | 259 | 264 |
| 7 | MISC | 0.786567 | Megatron | 350 | 358 |
| 8 | MISC | 0.98893 | Optimus Prime | 367 | 380 |
| 9 | PER | 0.992709 | John | 502 | 506 |


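As you can see, the pipeline detected the entities in the text and assigned each one a category: ORG (organization), LOC (location), PER (person), or MISC (miscellaneous). The scores indicate how confident the model is in each prediction. Note that the first mention of "Megatron" and the word "Decepticons" were each split into two subword pieces (the `##` prefix marks a continuation of the previous piece) and tagged with low confidence and inconsistent categories.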
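
The split pieces can be stitched back together in a post-processing step. The sketch below is a hand-rolled heuristic, not part of the library: it joins a `##` piece to the preceding span when the two are adjacent and keeps the label of the higher-scoring piece. The `merge_wordpieces` helper is hypothetical, and only the relevant rows of the output above are reproduced. The pipeline also accepts other `aggregation_strategy` values (for example `"first"`, `"average"`, or `"max"`) that resolve subword pieces at the word level, which may be the simpler option in practice.

```python
def merge_wordpieces(entities):
    """Join '##' continuation pieces onto the span that precedes them."""
    merged = []
    for ent in entities:
        ent = dict(ent)  # copy so the input list is left untouched
        is_continuation = ent["word"].startswith("##")
        if merged and is_continuation and merged[-1]["end"] == ent["start"]:
            prev = merged[-1]
            prev["word"] += ent["word"][2:]  # drop the '##' marker
            prev["end"] = ent["end"]
            # Assumption: trust the label of the more confident piece.
            if ent["score"] > prev["score"]:
                prev["entity_group"] = ent["entity_group"]
                prev["score"] = ent["score"]
        else:
            merged.append(ent)
    return merged


# Rows 2 to 7 of the NER output above.
rows = [
    {"entity_group": "LOC", "score": 0.999765, "word": "Germany", "start": 90, "end": 97},
    {"entity_group": "MISC", "score": 0.580447, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.502781, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.683059, "word": "Decept", "start": 253, "end": 259},
    {"entity_group": "MISC", "score": 0.509665, "word": "##icons", "start": 259, "end": 264},
    {"entity_group": "MISC", "score": 0.786567, "word": "Megatron", "start": 350, "end": 358},
]

for ent in merge_wordpieces(rows):
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])
```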

## Question Answering

In question answering, we give the model a passage of text, known as the context, along with a question whose answer we want to extract. The model then returns the span of text that it considers to be the answer. Let's see what happens when we ask a specific question about the customer's feedback:

In [7]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.627353 | 335 | 358 | an exchange of Megatron |


The model extracts "an exchange of Megatron" as the answer, with a confidence score of about 0.63. Note that the pipeline also returns `start` and `end`, the character indices that mark where the answer span is located in the context.
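
Because `start` and `end` are character offsets into the context, slicing the original string with them recovers the answer. A small check of this, using the values from the output above (the `result` dictionary is copied by hand rather than produced by running the model):

```python
# The customer comment from the beginning of this section.
text = (
    "Dear Amazon, last week I ordered an Optimus Prime action figure "
    "from your online store in Germany. Unfortunately, when I opened the package, "
    "I discovered to my horror that I had been sent an action figure of Megatron "
    "instead! As a lifelong enemy of the Decepticons, I hope you can understand my "
    "dilemma. To resolve the issue, I demand an exchange of Megatron for the "
    "Optimus Prime figure I ordered. Enclosed are copies of my records concerning "
    "this purchase. I expect to hear from you soon. Sincerely, John."
)

# Values copied from the question-answering output above.
result = {"score": 0.627353, "start": 335, "end": 358, "answer": "an exchange of Megatron"}

# The end index is exclusive, as in ordinary Python slicing.
span = text[result["start"]:result["end"]]
print(span)

# A little surrounding context helps when showing the answer to a reader.
window = text[max(0, result["start"] - 20):result["end"] + 20]
print(window)
```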

## Summarization

With text summarization, you can take a long text as input and generate a short version. Let's take a look at this technique.

In [12]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your min_length=56 must be inferior than your max_length=45.


 John ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when he opened the package, he discovered to his horror that he had been sent an action figure of Megatron instead. As a


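As you can see, the model captured the essence of the problem, largely by reusing sentences from the original text. The summary is cut off mid-sentence because we capped it at `max_length=45`. The warning in the log appears because the model's default `min_length` of 56 is larger than the `max_length` we passed.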
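
One way to avoid the length warning is to pass a `min_length` that is smaller than `max_length`. The sketch below shows this under a few assumptions: the `summary_length_kwargs` and `summarize` helpers are hypothetical, and defaulting the minimum to half of the maximum is an arbitrary choice made for illustration. The model ID is the one reported in the log.

```python
def summary_length_kwargs(max_length, min_length=None):
    """Return length settings for which min_length < max_length holds."""
    if min_length is None:
        # Assumption: half of the maximum is a reasonable lower bound.
        min_length = max(1, max_length // 2)
    if min_length >= max_length:
        raise ValueError(
            f"min_length={min_length} must be smaller than max_length={max_length}"
        )
    return {"min_length": min_length, "max_length": max_length}


def summarize(text, max_length=45):
    """Summarize text with consistent length bounds (downloads the model)."""
    # Imported lazily so that the helper above can be used without the library.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
    outputs = summarizer(
        text,
        clean_up_tokenization_spaces=True,
        **summary_length_kwargs(max_length),
    )
    return outputs[0]["summary_text"]


print(summary_length_kwargs(45))
```

Both lengths are measured in tokens, not characters, so the summary can still end in the middle of a sentence.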

## Translation

Translation, like summarization, is a task whose output is generated text. To translate the English text into German, let's use a translation pipeline with the `Helsinki-NLP/opus-mt-en-de` model.

In [13]:
translator = pipeline("translation_en_to_de", 
                      model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])



Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur von Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt. Eingeschlossen sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, von Ihnen bald zu hören. Aufrichtig, John.


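As you can see, the translation captures the meaning of the original, although some of the word order and capitalization is not idiomatic German. You can find models for thousands of language pairs on the Hugging Face Hub.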
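
Switching to another language pair mostly means changing the task name and the model ID. The sketch below assumes the `Helsinki-NLP/opus-mt-{src}-{tgt}` naming pattern seen in the model used above; not every pair is published under that exact name, so check the Hub before relying on it. The three helper functions are hypothetical.

```python
def opus_mt_model_id(src, tgt):
    """Model ID following the naming pattern of the model used above."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translation_task(src, tgt):
    """Task name in the same form as 'translation_en_to_de'."""
    return f"translation_{src}_to_{tgt}"


def load_translator(src, tgt):
    """Build a translation pipeline for the given pair (downloads the model)."""
    # Imported lazily so that the helpers above work without the library.
    from transformers import pipeline

    return pipeline(translation_task(src, tgt), model=opus_mt_model_id(src, tgt))


print(translation_task("en", "de"), opus_mt_model_id("en", "de"))
```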

## Text Generation

Assume you want to be able to respond to customer feedback more quickly by having access to an autocomplete function. This is possible with a text generation model:

In [19]:
generator = pipeline("text-generation")
response = "Dear John, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, John.

Customer service response:
Dear John, I am sorry to hear that your order was mixed up. The order was shipped and your order shipped, however I must reiterate: this is not the case (the order was ordered in Japan, not in Germany, according to orders.com). I am sorry about this. As I said in my reply, with that kind of information, I'm sorry what happened to you and I have reached out to you sincerely


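A generated draft like this can serve as a starting point for a reply to the customer. As the output shows, however, a small general-purpose model such as GPT-2 can produce incoherent text and invent details (here, an order placed in Japan), so the draft should be reviewed and edited before it is sent.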
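
The output above repeats the whole prompt before the continuation, and `max_length=200` counts the prompt's tokens as well as the new ones. The sketch below shows two ways to keep only the newly generated part: stripping the prompt by hand, or asking the pipeline for the continuation alone with `return_full_text=False` and bounding it with `max_new_tokens`. The helper names and the sample continuation string are hypothetical.

```python
def continuation_only(prompt, generated_text):
    """Strip the echoed prompt from the generated text, if it is present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


def draft_reply(prompt, max_new_tokens=60):
    """Generate only the continuation of the prompt (downloads the model)."""
    # Imported lazily so that continuation_only works without the library.
    from transformers import pipeline

    generator = pipeline("text-generation", model="openai-community/gpt2")
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # limits new tokens, not the prompt
        return_full_text=False,         # leave the prompt out of the result
    )
    return outputs[0]["generated_text"]


prompt = (
    "Customer service response:\n"
    "Dear John, I am sorry to hear that your order was mixed up."
)
# Hypothetical model output, used only to demonstrate the string handling.
generated = prompt + " We will send a replacement."

print(continuation_only(prompt, generated))
```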

You have now seen several applications of transformer models. All the models used in this section are publicly available and have already been fine-tuned for the task at hand. In general, though, you can also fine-tune models on your own data.