# HuggingFace pipelines

For this session we head to Google Colab and use the notebook linked below to explore pipelines, the HuggingFace high-level API.

https://colab.research.google.com/drive/1aMaEw8A56xs0bRM4lu8z7ou18jqyybGm?usp=sharing

A low-cost (or free) T4 GPU runtime is enough for this notebook, and the results look great!
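
Whether the runtime actually exposes a GPU can be checked before any pipeline is built. The helper below is a minimal sketch; `pick_device` is a hypothetical name that the notebook itself does not define.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper still works where torch is missing
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

The returned string can be passed as the `device` argument of `pipeline(...)`, so the same cell runs on both CPU and GPU runtimes.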

The notebook includes instructions for creating your HuggingFace token and storing it as a notebook secret.
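
As a rough sketch of what reading that token can look like in code, the helper below first tries Colab's secrets store and then falls back to an environment variable. The function name `get_hf_token` is hypothetical, and the secret name `HUGGINGFACE_API_KEY` is an assumption chosen to match the variable used later in this notebook; use whatever name you gave your secret.

```python
import os


def get_hf_token(name: str = "HUGGINGFACE_API_KEY"):
    """Look up the token in Colab secrets first, then in the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Secret missing or notebook access not granted: fall back to the environment
        return os.environ.get(name)


token = get_hf_token()
print("Token found" if token else "Token not set")
```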

In [14]:
#!pip install -q transformers datasets diffusers soundfile python-dotenv

In [25]:
# Imports
import torch
import os
import soundfile as sf
from dotenv import load_dotenv
from huggingface_hub import login
from transformers import pipeline
from diffusers import DiffusionPipeline
from datasets import load_dataset
from IPython.display import Audio

In [26]:
# Load environment variables from .env file
load_dotenv(override=True)

# API keys from environment
HUGGINGFACE_API_KEY = os.getenv('HUGGINGFACE_API_KEY')

# Verify API keys
if HUGGINGFACE_API_KEY:
    print(f"Hugging Face API Key loaded: {HUGGINGFACE_API_KEY[:4]}...")
else:
    print("Hugging Face API Key not set")

Hugging Face API Key loaded: hf_f...


In [27]:
# Login to Hugging Face
if HUGGINGFACE_API_KEY:
    login(HUGGINGFACE_API_KEY)
    print("Logged in to Hugging Face successfully!")
else:
    print("Error: HUGGINGFACE_API_KEY not found in environment variables")

Logged in to Hugging Face successfully!


In [29]:
# Run on the CPU; change device to "cuda" when a GPU is available
classifier = pipeline("sentiment-analysis", device="cpu")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [31]:
# Sentiment analysis

result = classifier("!Estoy super emocionado de estar en camino hacia la maestria en LLM!")
print(result)

[{'label': 'POSITIVE', 'score': 0.9305714964866638}]
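
The warning printed when the sentiment pipeline was created recommends naming the model and revision explicitly. A minimal sketch of doing so follows; the model name and revision are copied from that warning, while `SENTIMENT_MODEL` and `build_classifier` are hypothetical names introduced here for illustration.

```python
# Model name and revision are taken from the warning printed by the default pipeline
SENTIMENT_MODEL = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}


def build_classifier(device: str = "cpu"):
    """Create the sentiment pipeline with an explicit model and revision."""
    from transformers import pipeline  # lazy import: heavy dependency

    return pipeline(device=device, **SENTIMENT_MODEL)
```

Calling `build_classifier()` should load the same checkpoint as the default pipeline, without the warning and without the risk of the default changing between library versions.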


In [32]:
# Named Entity Recognition

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple", device="cpu")
result = ner("Barak Obama fue el 44 presidente de los estados unidos")
print(result)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


[{'entity_group': 'PER', 'score': 0.99757403, 'word': 'Barak Obama', 'start': 0, 'end': 11}]
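
The `start` and `end` fields in the result are character offsets into the input string, so each entity can be recovered by slicing. The sketch below reuses the input and output recorded above; nothing is downloaded or run through a model.

```python
text = "Barak Obama fue el 44 presidente de los estados unidos"

# Output copied from the NER cell above
entities = [
    {"entity_group": "PER", "score": 0.99757403, "word": "Barak Obama", "start": 0, "end": 11}
]

# Slice the original text with the reported character offsets
spans = [(ent["entity_group"], text[ent["start"]:ent["end"]]) for ent in entities]

for group, span in spans:
    print(f"{group}: {span}")
```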




In [34]:
# Question Answering with context

question_answerer = pipeline("question-answering", device="cpu")
result = question_answerer(question="¿Quien fue el 44 presidente de los estados unidos?", 
                           context="Barack Obama fue el 44 presidente de los estados unidos")
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.8796626329421997, 'start': 0, 'end': 12, 'answer': 'Barack Obama'}
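
The `score` field gives a rough confidence for the extracted answer, which makes it possible to reject weak answers. The sketch below applies a cutoff to the result recorded above; `confident_answer` is a hypothetical helper and the 0.5 threshold is an arbitrary assumption, not a recommended value.

```python
# Output copied from the question-answering cell above
result = {"score": 0.8796626329421997, "start": 0, "end": 12, "answer": "Barack Obama"}


def confident_answer(result: dict, min_score: float = 0.5):
    """Return the answer text, or None when the score is below the cutoff."""
    return result["answer"] if result["score"] >= min_score else None


print(confident_answer(result))
print(confident_answer(result, min_score=0.95))
```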


In [None]:
# Text summarization

summarizer = pipeline("summarization", device="cuda")
text = """Hugging Face Transformers ha estado causando sensación en el campo del Procesamiento del Lenguaje Natural (PLN). 
Ofrece una API fácil de usar que reduce los costos de computación aprovechando modelos pre-entrenados de última tecnología 
para varias tareas de PLN. Este artículo se adentrará en el mundo de Hugging Face Transformers, explorando sus 
características, beneficios y cómo se destacan en el panorama del PLN.

La biblioteca Hugging Face Transformers es un recurso completo que proporciona modelos pre-entrenados para tareas de PLN 
como análisis de sentimientos, clasificación de texto y reconocimiento de entidades nombradas. También ofrece herramientas 
para ajustar estos modelos para adaptarlos a casos de uso específicos. Este artículo te guiará a través de las 
complejidades de Hugging Face Transformers, sus aplicaciones y cómo usarlos de manera efectiva.
"""

# max_length and min_length are measured in tokens, not characters
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print(summary[0]['summary_text'])

In [None]:
# Translation

translator = pipeline("translation_en_to_fr", device="cuda")
result = translator("Hugging Face Transformers has been causing a sensation in the field of Natural Language Processing (NLP).")
print(result[0]['translation_text'])

In [None]:
# Zero-shot classification

classifier = pipeline("zero-shot-classification", device="cuda")
result = classifier("¡La biblioteca de Transformers de Hugging Face es increíble!",
                    candidate_labels=["tecnologia", "deporte", "politica"])
print(result)
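
The zero-shot pipeline returns a dictionary with the input `sequence`, the candidate `labels`, and a parallel list of `scores`. No output was recorded for this cell, so the sketch below uses invented scores purely to show how the two lists pair up; `rank_labels` is a hypothetical helper.

```python
# Hypothetical result: the scores are invented for illustration,
# only the dictionary layout mirrors what the pipeline returns.
result = {
    "sequence": "¡La biblioteca de Transformers de Hugging Face es increíble!",
    "labels": ["tecnologia", "deporte", "politica"],
    "scores": [0.90, 0.06, 0.04],
}


def rank_labels(result: dict):
    """Pair each label with its score, highest score first."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranking = rank_labels(result)
best_label, best_score = ranking[0]
print(f"Best label: {best_label} ({best_score:.0%})")
```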

In [None]:
# Text generation

generator = pipeline("text-generation", device="cuda")
result = generator("Si hay algo que quiero que recuerdes sobre el uso de los pipelines de Hugging Face,")
print(result[0]['generated_text'])
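
By default the text-generation pipeline returns the prompt followed by the continuation in `generated_text`. If only the new text is wanted, one option is to pass `return_full_text=False` when calling the pipeline; another is to strip the prompt afterwards, as in the sketch below. The helper `strip_prompt` and both example strings are hypothetical and were not produced by a model.

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Return only the continuation when the output starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Hypothetical prompt and output, invented for illustration
prompt = "Pipelines are useful because"
generated = "Pipelines are useful because they hide the tokenizer and model setup."

continuation = strip_prompt(prompt, generated)
print(continuation)
```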

In [None]:
# Image generation
# float16 weights are meant for a GPU, so the pipeline is moved to CUDA
imagen_gen = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
).to("cuda")


text = "Una clase de científicos de datos aprendiendo sobre IA, al estilo de Salvador Dalí"
image = imagen_gen(prompt=text).images[0]
image

In [None]:
from transformers import pipeline
import soundfile as sf
from IPython.display import Audio

# Text-to-speech (audio generation)
#synthesiser = pipeline("text-to-speech", "facebook/mms-tts-eng", device="cuda")
synthesiser = pipeline("text-to-speech", "facebook/mms-tts-spa", device="cuda")

# No speaker embeddings are needed: the MMS-TTS models work directly with the pipeline.

speech = synthesiser("¡Hola a un ingeniero de Inteligencia Artificial, en camino hacia la maestría!")

# Write 16-bit PCM WAV at the sampling rate reported by the model, then play it inline
sf.write("speech.wav", speech["audio"].flatten(), samplerate=speech["sampling_rate"], subtype='PCM_16', format='WAV')
Audio("speech.wav")