<a href="https://colab.research.google.com/github/moghalis/almoghalisAI/blob/main/Hello_Hugging_Face_Transformers_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📚 Introduction to Hugging Face Transformers

Welcome to this hands-on tutorial exploring the Hugging Face Transformers library!
In this notebook, you will learn how to easily build powerful Natural Language Processing (NLP) applications using pre-trained models and the pipeline API.

✨ What You’ll Learn
- How to install and set up the Hugging Face Transformers library
- How to run NLP models with just a few lines of code
- How to use the pipeline abstraction for quick tasks such as:
  1. Text Classification (Sentiment Analysis)
  2. Named Entity Recognition (NER)
  3. Question Answering
  4. Summarization
  5. Translation
  6. Text Generation
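
Each of these tasks corresponds to a task string passed to `pipeline(...)`. As a quick reference, the sketch below collects the task strings used later in this notebook. The dictionary name `TASKS` is only illustrative and is not part of the library.

```python
# Task strings used in this notebook, keyed by the task they perform.
# `TASKS` is an illustrative name, not part of the transformers API.
TASKS = {
    "Sentiment Analysis": "text-classification",
    "Named Entity Recognition": "ner",
    "Question Answering": "question-answering",
    "Summarization": "summarization",
    "Translation (English to German)": "translation_en_to_de",
    "Text Generation": "text-generation",
}

for name, task in TASKS.items():
    print(f"{name:32s} -> pipeline({task!r})")
```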





In [None]:
# Uncomment to install the library if it is not already available in your environment.
# %pip install transformers
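
Before running the rest of the notebook, it can help to confirm that the library is actually importable. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("transformers")
print(version or "transformers is not installed; run the install cell above")
```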

# Hello Hugging Face Transformers

In [None]:
text = """Dear Amazon,
I am writing from Gotham City, New York, regarding a major disappointment. A few days ago, I ordered a premium Superman\n
collectible figure from your online store. When the box arrived at Wayne Manor, the packaging was crushed, and the figure inside\n
was in terrible condition — the shield emblem was chipped, and the left arm was broken.\n
I don’t tolerate failure in Gotham, and I certainly don’t expect it from a company of your reputation.\n
This was meant to be a tribute gift, and now it’s ruined. I demand either an immediate replacement in perfect condition\n
or a full refund. Photos of the damaged item and shipping labels are attached for your review.\n
You have 48 hours to respond before I escalate this complaint further.\n
— Batman (Bruce Wayne)\n
Wayne Manor, Gotham City, New York, USA\n
"""

### 1- Sentiment Analysis (Text Classification)

In [None]:
from transformers import pipeline
import pandas as pd

classifier = pipeline("text-classification")
output = classifier(text)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu
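
The log above warns against relying on the default model. One way to address this is to pass the model name and revision explicitly. The sketch below assumes you want the same checkpoint and revision that the log reports; `PINNED` and `build_classifier` are illustrative names.

```python
# Model name and revision copied from the log message above.
PINNED = {
    "task": "text-classification",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}


def build_classifier():
    """Create the pipeline with an explicit model and revision."""
    # Imported lazily so that defining the configuration needs no download.
    from transformers import pipeline

    return pipeline(**PINNED)


# classifier = build_classifier()  # downloads the model on first use
```

Pinning both values should make the notebook behave the same way even if the library's default model changes in a later release.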


In [None]:
output

[{'label': 'NEGATIVE', 'score': 0.9996312856674194}]

In [None]:
pd.DataFrame(output)

|   | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.999631 |

### 2- Named Entity Recognition (NER)

In [None]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)


No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



Device set to use cpu


In [None]:
outputs

[{'entity_group': 'ORG',
  'score': np.float32(0.9559783),
  'word': 'Amazon',
  'start': 5,
  'end': 11},
 {'entity_group': 'LOC',
  'score': np.float32(0.9985744),
  'word': 'Gotham City',
  'start': 31,
  'end': 42},
 {'entity_group': 'LOC',
  'score': np.float32(0.9993133),
  'word': 'New York',
  'start': 44,
  'end': 52},
 {'entity_group': 'MISC',
  'score': np.float32(0.5419842),
  'word': 'Superman',
  'start': 124,
  'end': 132},
 {'entity_group': 'LOC',
  'score': np.float32(0.8741362),
  'word': 'Wayne Manor',
  'start': 201,
  'end': 212},
 {'entity_group': 'LOC',
  'score': np.float32(0.9963251),
  'word': 'Gotham',
  'start': 381,
  'end': 387},
 {'entity_group': 'ORG',
  'score': np.float32(0.69862854),
  'word': 'Batman',
  'start': 748,
  'end': 754},
 {'entity_group': 'PER',
  'score': np.float32(0.7941628),
  'word': 'Bruce Wayne',
  'start': 756,
  'end': 767},
 {'entity_group': 'ORG',
  'score': np.float32(0.6171477),
  'word': 'Wayne',
  'start': 770,
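  'end': 775},
 {'entity_group': 'LOC',
  'score': np.float32(0.890812),
  'word': 'Manor',
  'start': 776,
  'end': 781}]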

In [None]:
pd.DataFrame(outputs)

|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.955978 | Amazon | 5 | 11 |
| 1 | LOC | 0.998574 | Gotham City | 31 | 42 |
| 2 | LOC | 0.999313 | New York | 44 | 52 |
| 3 | MISC | 0.541984 | Superman | 124 | 132 |
| 4 | LOC | 0.874136 | Wayne Manor | 201 | 212 |
| 5 | LOC | 0.996325 | Gotham | 381 | 387 |
| 6 | ORG | 0.698629 | Batman | 748 | 754 |
| 7 | PER | 0.794163 | Bruce Wayne | 756 | 767 |
| 8 | ORG | 0.617148 | Wayne | 770 | 775 |
| 9 | LOC | 0.890812 | Manor | 776 | 781 |
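
Several of the entities above have fairly low scores (for example, `Batman` is tagged as `ORG` with a score near 0.70). One possible way to work with the result is to keep only confident predictions and group them by type. The sketch below does this on a copy of the recorded values; the 0.8 cut-off and the name `group_confident` are assumptions for illustration.

```python
from collections import defaultdict

# Entities recorded in the table above (scores rounded to six decimals).
entities = [
    {"entity_group": "ORG", "score": 0.955978, "word": "Amazon"},
    {"entity_group": "LOC", "score": 0.998574, "word": "Gotham City"},
    {"entity_group": "LOC", "score": 0.999313, "word": "New York"},
    {"entity_group": "MISC", "score": 0.541984, "word": "Superman"},
    {"entity_group": "LOC", "score": 0.874136, "word": "Wayne Manor"},
    {"entity_group": "LOC", "score": 0.996325, "word": "Gotham"},
    {"entity_group": "ORG", "score": 0.698629, "word": "Batman"},
    {"entity_group": "PER", "score": 0.794163, "word": "Bruce Wayne"},
    {"entity_group": "ORG", "score": 0.617148, "word": "Wayne"},
    {"entity_group": "LOC", "score": 0.890812, "word": "Manor"},
]


def group_confident(entities, min_score=0.8):
    """Group entity words by type, keeping only those with score >= min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


print(group_confident(entities))
# {'ORG': ['Amazon'], 'LOC': ['Gotham City', 'New York', 'Wayne Manor', 'Gotham', 'Manor']}
```

With this threshold, `Bruce Wayne` (about 0.79) is dropped as well, which shows that a single cut-off is a trade-off rather than a fix.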


### 3- Question Answering

In [None]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)


No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



Device set to use cpu


In [None]:
outputs

{'score': 0.22134383022785187,
 'start': 531,
 'end': 594,
 'answer': 'an immediate replacement in perfect condition\n\nor a full refund'}

In [None]:
pd.DataFrame([outputs])

|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.221344 | 531 | 594 | `an immediate replacement in perfect condition\...` |
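
The answer is a span copied from the context, so it carries over the blank line from the original letter, and its score (about 0.22) is low. A minimal sketch of post-processing such a result follows; `tidy_answer` and the 0.5 confidence threshold are hypothetical choices for this example.

```python
# Output recorded above, copied here so the snippet runs on its own.
outputs = {
    "score": 0.22134383022785187,
    "start": 531,
    "end": 594,
    "answer": "an immediate replacement in perfect condition\n\nor a full refund",
}


def tidy_answer(result, min_score=0.5):
    """Collapse whitespace in the answer and flag low-confidence results."""
    answer = " ".join(result["answer"].split())
    return {"answer": answer, "confident": result["score"] >= min_score}


print(tidy_answer(outputs))
# {'answer': 'an immediate replacement in perfect condition or a full refund', 'confident': False}
```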


### 4- Summarization

In [None]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.



Device set to use cpu
Your min_length=56 must be inferior than your max_length=45.
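
The last warning appears because the call sets `max_length=45` while the minimum length reported in the message (56) is larger. One way to avoid it is to pass a `min_length` below `max_length`. The sketch below validates the pair before calling the summarizer; `summary_kwargs` is a hypothetical helper and `min_length=10` is an illustrative value, so the resulting summary may differ from the one recorded in this notebook.

```python
def summary_kwargs(max_length, min_length):
    """Build length arguments for the summarizer, rejecting inconsistent pairs."""
    if min_length >= max_length:
        raise ValueError(
            f"min_length={min_length} must be smaller than max_length={max_length}"
        )
    return {"max_length": max_length, "min_length": min_length}


kwargs = summary_kwargs(max_length=45, min_length=10)  # illustrative values
print(kwargs)  # {'max_length': 45, 'min_length': 10}

# With the summarizer defined above, the call would then look like:
# outputs = summarizer(text, clean_up_tokenization_spaces=True, **kwargs)
```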


In [None]:
outputs

[{'summary_text': " The packaging was crushed, and the figure inside was in terrible condition. The shield emblem was chipped and the left arm was broken. The figure was meant to be a tribute gift, and now it's ruined"}]

In [None]:
outputs[0]['summary_text']

" The packaging was crushed, and the figure inside was in terrible condition. The shield emblem was chipped and the left arm was broken. The figure was meant to be a tribute gift, and now it's ruined"

### 5- Translation

In [None]:
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)



Device set to use cpu


In [None]:
outputs

[{'translation_text': 'Lieber Amazon, ich schreibe aus Gotham City, New York, in Bezug auf eine große Enttäuschung. Vor ein paar Tagen, bestellte ich eine Premium Superman Sammlerfigur aus Ihrem Online-Shop. Als die Box kam in Wayne Manor, die Verpackung wurde zerquetscht, und die Figur innen war in schrecklichem Zustand — das Schild Emblem wurde gechipt, und der linke Arm wurde gebrochen. Ich verträgt nicht Versagen in Gotham, und ich sicherlich nicht erwarten, dass es von einer Firma Ihres Rufs. Dies sollte ein Tribut Geschenk sein, und jetzt ist es ruiniert. Ich verlange entweder einen sofortigen Ersatz in perfektem Zustand oder eine volle Rückerstattung. Fotos der beschädigten Artikel und Versandetiketten sind für Ihre Bewertung. Sie haben 48 Stunden, um zu antworten, bevor ich diese Beschwerde weiter eskalieren. — Batman (Bruce Wayne) Wayne Manor, Gotham City, New York, USA'}]

In [None]:
print(outputs[0]['translation_text'])

Lieber Amazon, ich schreibe aus Gotham City, New York, in Bezug auf eine große Enttäuschung. Vor ein paar Tagen, bestellte ich eine Premium Superman Sammlerfigur aus Ihrem Online-Shop. Als die Box kam in Wayne Manor, die Verpackung wurde zerquetscht, und die Figur innen war in schrecklichem Zustand — das Schild Emblem wurde gechipt, und der linke Arm wurde gebrochen. Ich verträgt nicht Versagen in Gotham, und ich sicherlich nicht erwarten, dass es von einer Firma Ihres Rufs. Dies sollte ein Tribut Geschenk sein, und jetzt ist es ruiniert. Ich verlange entweder einen sofortigen Ersatz in perfektem Zustand oder eine volle Rückerstattung. Fotos der beschädigten Artikel und Versandetiketten sind für Ihre Bewertung. Sie haben 48 Stunden, um zu antworten, bevor ich diese Beschwerde weiter eskalieren. — Batman (Bruce Wayne) Wayne Manor, Gotham City, New York, USA
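
The whole letter is translated in a single call here. For longer inputs, one option is to translate sentence by sentence, since pipelines also accept a list of strings. The sketch below shows a naive splitter; the regular expression is a simplification (it will mis-split abbreviations), and `split_sentences` is a hypothetical helper, not part of the library.

```python
import re


def split_sentences(text):
    """Normalize whitespace, then split after '.', '!' or '?' (naive rule)."""
    normalized = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", normalized) if s]


sample = "The box arrived\n\ncrushed. The arm was broken.\n\nI demand a refund!"
chunks = split_sentences(sample)
print(chunks)
# ['The box arrived crushed.', 'The arm was broken.', 'I demand a refund!']

# With the translator defined above, the chunks could be translated together:
# translated = translator(chunks, clean_up_tokenization_spaces=True)
# german = " ".join(item["translation_text"] for item in translated)
```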


### 6- Text Generation

In [None]:
from transformers import set_seed
set_seed(42)

generator = pipeline("text-generation")  # default model: openai-community/gpt2
response = "Dear Batman, We are sorry to hear that your order "
prompt = text + "\n\nCustomer service response:\n" + response
print(prompt)
outputs = generator(prompt, max_length=500)


No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dear Amazon,
I am writing from Gotham City, New York, regarding a major disappointment. A few days ago, I ordered a premium Superman

collectible figure from your online store. When the box arrived at Wayne Manor, the packaging was crushed, and the figure inside

was in terrible condition — the shield emblem was chipped, and the left arm was broken.

I don’t tolerate failure in Gotham, and I certainly don’t expect it from a company of your reputation.

This was meant to be a tribute gift, and now it’s ruined. I demand either an immediate replacement in perfect condition

or a full refund. Photos of the damaged item and shipping labels are attached for your review.

You have 48 hours to respond before I escalate this complaint further.

— Batman (Bruce Wayne)

Wayne Manor, Gotham City, New York, USA



Customer service response:
Dear Batman, We are sorry to hear that your order 


In [None]:
outputs

[{'generated_text': 'Dear Amazon,\nI am writing from Gotham City, New York, regarding a major disappointment. A few days ago, I ordered a premium Superman\n\ncollectible figure from your online store. When the box arrived at Wayne Manor, the packaging was crushed, and the figure inside\n\nwas in terrible condition — the shield emblem was chipped, and the left arm was broken.\n\nI don’t tolerate failure in Gotham, and I certainly don’t expect it from a company of your reputation.\n\nThis was meant to be a tribute gift, and now it’s ruined. I demand either an immediate replacement in perfect condition\n\nor a full refund. Photos of the damaged item and shipping labels are attached for your review.\n\nYou have 48 hours to respond before I escalate this complaint further.\n\n— Batman (Bruce Wayne)\n\nWayne Manor, Gotham City, New York, USA\n\n\n\nCustomer service response:\nDear Batman, We are sorry to hear that your order ’had been accepted and arrived with no change. We believe that if y

In [None]:
print(outputs[0]['generated_text'])

Dear Amazon,
I am writing from Gotham City, New York, regarding a major disappointment. A few days ago, I ordered a premium Superman

collectible figure from your online store. When the box arrived at Wayne Manor, the packaging was crushed, and the figure inside

was in terrible condition — the shield emblem was chipped, and the left arm was broken.

I don’t tolerate failure in Gotham, and I certainly don’t expect it from a company of your reputation.

This was meant to be a tribute gift, and now it’s ruined. I demand either an immediate replacement in perfect condition

or a full refund. Photos of the damaged item and shipping labels are attached for your review.

You have 48 hours to respond before I escalate this complaint further.

— Batman (Bruce Wayne)

Wayne Manor, Gotham City, New York, USA



Customer service response:
Dear Batman, We are sorry to hear that your order ’had been accepted and arrived with no change. We believe that if you do not return us an item, we may send yo
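
The generated text repeats the full prompt before the model's continuation. If only the new text is of interest, the prompt can be stripped afterwards. A minimal sketch, assuming the output starts with the exact prompt; `continuation` is a hypothetical helper and the `generated` string is a stand-in, not real model output.

```python
def continuation(prompt, generated_text):
    """Return only the text that follows the prompt, if the prompt is a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


prompt = "Customer service response:\nDear Batman, We are sorry to hear that your order "
generated = prompt + "was damaged."  # stand-in text, not a real generation
print(continuation(prompt, generated))  # was damaged.

# The text-generation pipeline can also omit the prompt itself, and
# `max_new_tokens` limits only the newly generated part (100 is illustrative):
# outputs = generator(prompt, max_new_tokens=100, return_full_text=False)
```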