# Hugging Face

Hugging Face is a platform that hosts models from many domains, with a particular focus on Natural Language Processing (NLP). This notebook uses its `transformers` library. You can find out more here: [Hugging Face](https://huggingface.co/)

![Hugging Face logo](https://avatars.githubusercontent.com/u/25720743?s=200&v=4)

In [None]:
# Installing the Library
!pip install transformers -q


In [None]:
from transformers import pipeline

## Applications of Pretrained Models Available in Hugging Face

A basic way to access pretrained models is through the Hugging Face `pipeline` function. Many models are prepared for different use cases and can be easily downloaded and used.

You can find more information in the [Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) documentation, and browse the available models on the [Model Hub](https://huggingface.co/models).

In this case we use Google Colab, which avoids downloading these large models to our own machines and lets us use Google's compute resources.
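As a minimal sketch of the pattern used throughout this notebook: a pipeline is created from a task name and a model id, called on a piece of text, and returns a list of dictionaries. The helper names `build_classifier` and `describe` are illustrative and not part of the library; the sample prediction is copied from the classification output in this notebook.

```python
def build_classifier(model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Create a sentiment-analysis pipeline (downloads the model on first use)."""
    # Imported lazily so that `describe` can be used without loading a model.
    from transformers import pipeline
    return pipeline("sentiment-analysis", model=model_name)


def describe(text, prediction):
    """Format one pipeline result (a list holding one dict) as a readable line."""
    best = prediction[0]
    return f"{text!r} -> {best['label']} ({best['score']:.2%})"


# A prediction in the format the pipeline returns (values taken from this notebook).
sample_prediction = [{"label": "POSITIVE", "score": 0.9998579025268555}]
print(describe("The movie was really good", sample_prediction))
```

To run it against a real model, call `describe(text, build_classifier()(text))`.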

## Classification

#### Model A

In [None]:
texts = [
    "I don't like the mondays",
    "I love the weather in the summer",
    "Why are you so late?",
    "The movie was really good",
    "I've been waiting for a huggingface course my whole life."
]

In [None]:
# Classification
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")


In [None]:
for text in texts:
    pred = classifier(text)
    print(text)
    print(pred)
    print()

I don't like the mondays
[{'label': 'NEGATIVE', 'score': 0.9946926236152649}]

I love the weather in the summer
[{'label': 'POSITIVE', 'score': 0.9998717308044434}]

Why are you so late?
[{'label': 'NEGATIVE', 'score': 0.9980792999267578}]

The movie was really good
[{'label': 'POSITIVE', 'score': 0.9998579025268555}]

I've been waiting for a huggingface course my whole life.
[{'label': 'POSITIVE', 'score': 0.9598049521446228}]
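A pipeline also accepts a list of strings and returns one prediction per input, which makes it easy to post-process results in bulk. The sketch below pairs each text with its prediction and flags low-confidence results. The helper names and the `0.99` threshold are assumptions chosen for illustration; the sample scores are copied from the output above.

```python
def pair_predictions(texts, predictions):
    """Combine each input text with its predicted label and score."""
    return [(text, pred["label"], pred["score"]) for text, pred in zip(texts, predictions)]


def low_confidence(rows, threshold=0.99):
    """Keep only the rows whose score is below the (illustrative) threshold."""
    return [row for row in rows if row[2] < threshold]


def classify_batch(classifier, texts):
    """Run the classifier once on the whole list instead of looping."""
    return pair_predictions(texts, classifier(texts))


# Predictions copied from the Model A output in this notebook.
sample_texts = [
    "I don't like the mondays",
    "I love the weather in the summer",
    "Why are you so late?",
    "The movie was really good",
    "I've been waiting for a huggingface course my whole life.",
]
sample_predictions = [
    {"label": "NEGATIVE", "score": 0.9946926236152649},
    {"label": "POSITIVE", "score": 0.9998717308044434},
    {"label": "NEGATIVE", "score": 0.9980792999267578},
    {"label": "POSITIVE", "score": 0.9998579025268555},
    {"label": "POSITIVE", "score": 0.9598049521446228},
]

rows = pair_predictions(sample_texts, sample_predictions)
for text, label, score in low_confidence(rows):
    print(f"{label} ({score:.3f}): {text}")
```

With a loaded pipeline, `classify_batch(classifier, texts)` would produce the same row structure from live predictions.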



#### Model B

In [None]:
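# Classifier with RoBERTa
# Note: roberta-large-mnli is a natural language inference (NLI) model. Its labels
# (ENTAILMENT, NEUTRAL, CONTRADICTION) are not sentiment labels.
classifier = pipeline(model="roberta-large-mnli")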


Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



In [None]:
for text in texts:
    pred = classifier(text)
    print(text)
    print(pred)
    print()

I don't like the mondays
[{'label': 'NEUTRAL', 'score': 0.7821896076202393}]

I love the weather in the summer
[{'label': 'NEUTRAL', 'score': 0.6869037747383118}]

Why are you so late?
[{'label': 'NEUTRAL', 'score': 0.6636735200881958}]

The movie was really good
[{'label': 'NEUTRAL', 'score': 0.7407053709030151}]

I've been waiting for a huggingface course my whole life.
[{'label': 'NEUTRAL', 'score': 0.7009155750274658}]
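Every sentence is labelled NEUTRAL because `roberta-large-mnli` is trained to judge the relation between two sentences (a premise and a hypothesis), not the sentiment of one. One way to use an NLI model for sentiment is the `zero-shot-classification` pipeline, which turns each candidate label into a hypothesis. The following is a sketch under that assumption; the helper names and the candidate labels `"positive"` and `"negative"` are illustrative choices, not something this notebook ran.

```python
def build_zero_shot(model_name="roberta-large-mnli"):
    """Create a zero-shot classifier backed by an NLI model (large download)."""
    # Imported lazily so the helpers below work without loading a model.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_name)


def top_label(result):
    """Return the (label, score) pair with the highest score.

    `result` is the dict returned by a zero-shot pipeline, which holds
    parallel "labels" and "scores" lists.
    """
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


def classify_sentiment(classifier, text, candidate_labels=("positive", "negative")):
    """Score `text` against each candidate label and return the best one."""
    result = classifier(text, candidate_labels=list(candidate_labels))
    return top_label(result)
```

For example, `classify_sentiment(build_zero_shot(), "The movie was really good")` would return the best-scoring label together with its score.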



## Text Generation

In [None]:
text_init = [
    "In this course we will teach you how to play chess",
    "It all begun in 1965, the factory workers went to work as usual",
    "In todays news, the weather is still getting colder",
    "Artificial Intelligence has been growing in popularity in the past years",
    "In this course we will teach you how to play chess",
]

In [None]:
# Text Generation (DistilGPT-2, a distilled version of GPT-2)

generator = pipeline("text-generation", model="distilgpt2")


In [None]:
for text in text_init:
    # Four sequences are sampled per prompt; only the first one is printed below.
    pred = generator(text, max_length=50, num_return_sequences=4)
    print()
    print(text)
    print(pred[0]['generated_text'])
    print()
    print()

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



In this course we will teach you how to play chess
In this course we will teach you how to play chess on a tablet or tablet. In this course we will show you how to play a chess game with the keyboard to use in your pocket.















Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



It all begun in 1965, the factory workers went to work as usual
It all begun in 1965, the factory workers went to work as usual on industrial products and products, but they weren't able to make up their mind. At the end of 1966, the factory was disbanded for the good of it. But in 1967




Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



In todays news, the weather is still getting colder
In todays news, the weather is still getting colder for several days




We‏ are already a bit early in the season, and that‍s the most anticipated release yet for the Xbox One
If you‏




Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Artificial Intelligence has been growing in popularity in the past years
Artificial Intelligence has been growing in popularity in the past years, and now its own research and development is revealing how advanced artificial intelligence can revolutionize our lives. There remain questions about whether artificial intelligence could be a real force on the human race, especially



In this course we will teach you how to play chess
In this course we will teach you how to play chess in the context of the language of chess.


If you enjoy how to play chess in the context of the language of chess, check out an online video that explains what the language means
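The generation loop requests four sequences per prompt but shows only the first, and the results change between runs because the text is sampled (note the two different continuations for the repeated chess prompt). A minimal sketch that collects every returned sequence and fixes the random seed is shown below. The helper names and the seed value `42` are assumptions for illustration; `set_seed` is the seeding utility from `transformers`.

```python
def continuation(prompt, generated_text):
    """Return only the newly generated part of a text-generation result."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


def generate_all(generator, prompt, n=4, max_length=50):
    """Return the continuations of all `n` sampled sequences for one prompt."""
    outputs = generator(prompt, max_length=max_length, num_return_sequences=n)
    return [continuation(prompt, out["generated_text"]) for out in outputs]


def build_generator(seed=42):
    """Create the distilgpt2 generator with a fixed seed for repeatable sampling."""
    # Imported lazily so the helpers above work without loading a model.
    from transformers import pipeline, set_seed
    set_seed(seed)
    return pipeline("text-generation", model="distilgpt2")
```

Calling `generate_all(build_generator(), "In this course we will teach you how to play chess")` would return a list of four continuations without the repeated prompt.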




## Summarization

In [None]:
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")


In [None]:
texts_to_summarize = [
    "Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). The university operates 19 organized research units as well as eight School of Medicine research units, six research centers at Scripps Institution of Oceanography and two multi-campus initiatives.[19] UC San Diego is also closely affiliated with several regional research centers, such as the Salk Institute, the Sanford Burnham Prebys Medical Discovery Institute, the Sanford Consortium for Regenerative Medicine, and the Scripps Research Institute. According to the National Science Foundation, UC San Diego spent $1.403 billion on research and development in fiscal year 2020, ranking it first in the University of California system and 6th in the nation.",
    "UC San Diego consists of twelve undergraduate, graduate and professional schools as well as seven undergraduate residential colleges. It received over 140,000 applications for undergraduate admissions in Fall 2021, making it the second most applied-to university in the United States. UC San Diego Health, the region's only academic health system, provides patient care, conducts medical research and educates future health care professionals at the UC San Diego Medical Center, Hillcrest, Jacobs Medical Center, Moores Cancer Center, Sulpizio Cardiovascular Center, Shiley Eye Institute, Institute for Genomic Medicine, Koman Family Outpatient Pavilion and various express care and urgent care clinics throughout San Diego."
]

In [None]:
for text in texts_to_summarize:
    pred = summarizer(text, min_length=5, max_length=20)
    print()
    print(text)
    print(pred)
    print()
    print()


Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). The university operates 19 organized research units as well as eight School of Medicine research units, six research centers at Scripps Institution of Oceanography and two multi-campus initiatives.[19] UC San Diego is also closely affiliated with several regional research centers, such as the Salk Institute, the Sanford Burnham Prebys Medical Discovery Institute, the Sanford Consortium for Regenerative Medicine, and the Scripps Research Institute. According to the National Science Foundation, UC San Diego spent $1.403 billion on research an
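Because the loop prints each full input followed by the raw result list, the summary itself is easy to miss. The sketch below returns only the summary strings, which the summarization pipeline stores under the `summary_text` key. The helper name `summarize_all` is illustrative; `min_length` and `max_length` are the same parameters used above and are counted in tokens, not words.

```python
def summarize_all(summarizer, texts, min_length=5, max_length=20):
    """Return one summary string per input text."""
    summaries = []
    for text in texts:
        # Each call returns a list holding one dict with a "summary_text" key.
        result = summarizer(text, min_length=min_length, max_length=max_length)
        summaries.append(result[0]["summary_text"].strip())
    return summaries
```

With the objects defined in this notebook, `summarize_all(summarizer, texts_to_summarize)` would give a list of two short summaries. Raising `max_length` allows longer summaries.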

## Translation

In [None]:
en_fr_translator = pipeline("translation_en_to_fr")
en_fr_translator("How old are you?")

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.



For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


[{'translation_text': ' quel âge êtes-vous?'}]

In [None]:
for text in texts:
    pred = en_fr_translator(text)
    print()
    print(text)
    print(pred)
    print()
    print()


I don't like the mondays
[{'translation_text': "Je n'aime pas les lundis"}]



I love the weather in the summer
[{'translation_text': "J'aime le temps en été"}]



Why are you so late?
[{'translation_text': 'Pourquoi êtes-vous si tard?'}]



The movie was really good
[{'translation_text': 'Le film était vraiment bon'}]



I've been waiting for a huggingface course my whole life.
[{'translation_text': "J'ai toujours attendu un cours sur l'embrasement."}]
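The warning printed when the translator was created recommends naming the model explicitly. A minimal sketch that pins the model to `t5-base` (the default reported in that warning) and returns plain strings instead of result dictionaries follows. The helper names are illustrative and not part of the library.

```python
def build_translator(model_name="t5-base"):
    """Create an English-to-French translator with an explicit model name."""
    # Imported lazily so `translate_all` can be used without loading a model.
    from transformers import pipeline
    return pipeline("translation_en_to_fr", model=model_name)


def translate_all(translator, texts):
    """Translate each text and return only the translated strings."""
    translations = []
    for text in texts:
        # Each call returns a list holding one dict with a "translation_text" key.
        result = translator(text)
        translations.append(result[0]["translation_text"].strip())
    return translations
```

With the list defined earlier, `translate_all(build_translator(), texts)` would return five French strings. As the outputs above show, the translations are not always accurate, so they are worth reviewing.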


