# Introduction to the Hugging Face Hub

In [1]:
from transformers import pipeline
import timm



## Images

In [2]:
image_classifier = pipeline(task="image-classification")

No model was supplied, defaulted to google/vit-base-patch16-224 and revision 5dca96d (https://huggingface.co/google/vit-base-patch16-224).
Using a pipeline without specifying a model name and revision in production is not recommended.
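
The warning above can be addressed by naming the model and revision explicitly. A minimal sketch, assuming the same checkpoint and revision hash that the log reports; `PINNED_CLASSIFIER` and `build_image_classifier` are hypothetical names introduced here for illustration.

```python
# Model id and revision are copied from the warning printed above.
PINNED_CLASSIFIER = {
    "task": "image-classification",
    "model": "google/vit-base-patch16-224",
    "revision": "5dca96d",
}


def build_image_classifier():
    """Create the pipeline with the model and revision fixed explicitly."""
    # Imported lazily so the configuration can be inspected
    # without loading transformers.
    from transformers import pipeline

    return pipeline(**PINNED_CLASSIFIER)
```

Calling `build_image_classifier()` should then load the same weights on every run, rather than whatever the task default happens to be at the time.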


In [3]:
banana_path = "../datasets/images/banana.jpeg"
image_classifier(banana_path)

[{'score': 0.9956451654434204, 'label': 'banana'},
 {'score': 0.000427508755819872, 'label': 'orange'},
 {'score': 0.0003286113205831498, 'label': 'lemon'},
 {'score': 0.0003006604965776205, 'label': 'pineapple, ananas'},
 {'score': 0.0001832905109040439, 'label': 'strawberry'}]
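
The pipeline returns a plain list of dictionaries, so ordinary Python is enough to post-process it. A minimal sketch of picking the most likely label and rejecting low-confidence results; `top_prediction` and the `min_score` threshold are hypothetical, chosen only for illustration.

```python
def top_prediction(predictions, min_score=0.5):
    """Return the highest-scoring entry, or None if it falls below min_score."""
    best = max(predictions, key=lambda p: p["score"])
    return best if best["score"] >= min_score else None


# First three entries of the output above, with scores rounded for brevity.
predictions = [
    {"score": 0.9956, "label": "banana"},
    {"score": 0.0004, "label": "orange"},
    {"score": 0.0003, "label": "lemon"},
]

best = top_prediction(predictions)
print(best["label"])  # banana
```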

In [4]:
image_segmentator = pipeline(task="image-segmentation")

No model was supplied, defaulted to facebook/detr-resnet-50-panoptic and revision fc15262 (https://huggingface.co/facebook/detr-resnet-50-panoptic).
Using a pipeline without specifying a model name and revision in production is not recommended.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
The `max_size` parameter is deprecated and will be removed in v4.26. Please specify in `size['longest_edge'] instead`.


In [None]:
platzi_image = "../datasets/images/platzi.jpeg"
image_segmentator(platzi_image)

## NLP

In [6]:
summarizer = pipeline(task="summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [7]:
text = """
Elon Musk has named a new chief executive of Twitter, just over six months after his controversial takeover of the social media company.

The billionaire said Linda Yaccarino, the former head of advertising at NBCUniversal, would oversee business operations at the site, which has been struggling to make money.

He said she would start in six weeks.

Mr Musk will remain involved as executive chairman and chief technology officer.

"Looking forward to working with Linda to transform this platform into X, the everything app," he wrote on Twitter, confirming the decision a day after he had stoked speculation by writing that he had found a new boss without revealing their identity.

Mr Musk - who bought the social media platform last year for $44bn - had been under pressure to find someone else to lead the company and refocus his attention on his other businesses, which include electric carmaker Tesla and rocket firm SpaceX.

With less than 9% of Fortune 500 tech companies headed by women, Ms Yaccarino will become that rare example of a woman at the top of a major tech firm, after rising steadily through the ranks of some of America's biggest media companies.
"""

In [8]:
summarizer(text)

[{'summary_text': " New CEO of Twitter announces he will take over as CEO of the company . He had been under pressure to focus on his other projects, including Tesla and SpaceX . The new boss will be the first woman to take over at the top of the firm's social network . He has been in charge of Twitter for six months and is expected to start in September ."}]

In [9]:
summarizer_es = pipeline(task="summarization", model="IIC/mt5-spanish-mlsum")



In [10]:
text_es = """
Hernán Díaz quería narrar desde la ficción los "engranajes misteriosos que rigen la vida del capital" en Wall Street.

Para eso, después de revisar los grandes clásicos de la teoría económica y la historia de Estados Unidos, decidió crear en su novela Trust el universo que rodea a Benjamin Rask, un magnate que a principios de la década de 1920 multiplica la herencia que recibe de su familia apostando al capital financiero.

Nacido en Argentina en 1973, Díaz creció en Suecia, estudió Letras en la Universidad de Buenos Aires y se doctoró en Filosofía en la Universidad de Nueva York. Desde hace 25 años reside en Estados Unidos.

Trust (Riverhead Books, 2022), traducida al español como "Fortuna" (Anagrama, 2023), fue incluida en la lista de libros favoritos de Barack Obama, llega cinco años después de su primera novela "A lo lejos" y acaba de ganar el premio Pulitzer en la categoría de Ficción.

Desde un hotel de Los Ángeles, donde mantiene reuniones para la adaptación de su libro a una serie de HBO que estará protagonizada por Kate Winslet, Díaz habla con BBC Mundo sobre el capitalismo en Estados Unidos, las desigualdades, las alianzas y las traiciones que orbitan en el mundo del dinero.
"""

In [11]:
summarizer_es(text_es, min_length=50, max_length=200)

[{'summary_text': 'Hernán Díaz narra los engranajes misteriosos de Wall Street. El autor argentino publica en su novela ‘Trust’ el universo que rodea a Benjamin Rask, un magnate que a principios de la década de 1920 multiplica la herencia que recibe de su familia apostando al capital financiero'}]

In [12]:
sentiment_classifier = pipeline(
    task="text-classification", model="pysentimiento/robertuito-sentiment-analysis"
)

Xformers is not installed correctly. If you want to use memorry_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


In [13]:
text_sentiment = [
    "Ver tanta informacion en internet es abrumante, no se por donde empezar",
    "Conocer el funcionamiento de un equipo de fisica cuantica es muchisimo trabajo",
]
sentiment_classifier(text_sentiment)

[{'label': 'NEG', 'score': 0.9638364911079407},
 {'label': 'POS', 'score': 0.750228226184845}]
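
The predictions come back in the same order as the inputs, but without the text attached. A minimal sketch of joining the two so each result can be read next to its sentence; `pair_with_inputs` is a hypothetical helper, and the scores below are rounded copies of the output above.

```python
def pair_with_inputs(texts, predictions):
    """Attach each input text to its predicted label and score."""
    if len(texts) != len(predictions):
        raise ValueError("expected one prediction per input text")
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]


texts = [
    "Ver tanta informacion en internet es abrumante, no se por donde empezar",
    "Conocer el funcionamiento de un equipo de fisica cuantica es muchisimo trabajo",
]
predictions = [
    {"label": "NEG", "score": 0.9638},
    {"label": "POS", "score": 0.7502},
]

for row in pair_with_inputs(texts, predictions):
    print(f'{row["label"]} ({row["score"]:.2f}): {row["text"]}')
```

Seen side by side, the second prediction is noticeably less confident than the first, which may be worth checking by hand.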

In [14]:
text_generator = pipeline(task="text-generation", model="bigscience/bloomz-560m")



In [15]:
text_reference = "Andy era mi gatita. Ella fallecio hace casi 2 años. Aun recuerdo"
text_generator(text_reference, min_length=30)



ValueError: Unfeasible length constraints: the minimum length (30) is larger than the maximum length (20)
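
The error arises because the call raises `min_length` to 30 while `max_length` stays at the 20 reported in the message. One possible fix is to pass both bounds together. A minimal sketch: `generate_with_min_length` and its `headroom` parameter are hypothetical names, and `generator` is assumed to be any callable that accepts `min_length` and `max_length` keyword arguments, as the text-generation pipeline does.

```python
def generate_with_min_length(generator, prompt, min_length, headroom=20):
    """Call generator with a max_length that always exceeds min_length."""
    if min_length < 0 or headroom <= 0:
        raise ValueError("min_length must be >= 0 and headroom must be > 0")
    max_length = min_length + headroom
    return generator(prompt, min_length=min_length, max_length=max_length)


# Intended use with the pipeline created above (not executed here):
# generate_with_min_length(text_generator, text_reference, min_length=30)
```

For decoder-only models such as this one, both bounds are understood to count the prompt tokens as well as the generated ones, so a long prompt may need a larger `headroom`.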