# Welcome to Pipelines!

The HuggingFace transformers library provides APIs at two different levels.

The high-level API, for running open-source models on typical inference tasks, is called "pipelines". It's incredibly easy to use.

You create a pipeline using something like:

`my_pipeline = pipeline("the_task_I_want_to_do")`

Followed by

`result = my_pipeline(my_input)`

And that's it!

See the end of this notebook for a list of all pipelines.

## Before we start: 2 important pro-tips for local use:

**Pro-tip 1:**

Data Science code often gives warnings and messages. They can mostly be safely ignored! Glance over them, and if something goes wrong later, perhaps they can give you a clue.

**Pro-tip 2:**

For local use, make sure you have:
- Python 3.8+ installed
- Required packages installed (see installation cell below)
- A Hugging Face account and API token set as an environment variable `HF_TOKEN`
- If you have an NVIDIA GPU, make sure CUDA is properly installed. The notebook will automatically detect and use it.
- On Mac with Apple Silicon, the notebook will use MPS (Metal Performance Shaders) if available
- If no GPU is available, the notebook will fall back to CPU (slower but will work)


## A sidenote:

You may already know this, but just in case you're not familiar with the word "inference" that I use here:

When working with Data Science models, you could be carrying out 2 very different activities: **training** and **inference**.

### 1. Training  

**Training** is when you provide a model with data so that it adapts and gets better at a task in the future. It does this by updating its internal settings - the parameters or weights of the model. If you're training a model that's already had some training, the activity is called "fine-tuning".

### 2. Inference

**Inference** is when you are working with a model that has _already been trained_. You are using that model to produce new outputs on new inputs, taking advantage of everything it learned while it was being trained. Inference is also sometimes referred to as "Execution" or "Running a model".

All of our use of the APIs for GPT, Claude and Gemini over the last few weeks has been **inference**. The "P" in GPT stands for "Pre-trained", meaning that it has already been trained with data (lots of it!). In week 6 we will try fine-tuning GPT ourselves.
  
The pipelines API in HuggingFace is for **inference** only - running a model that has already been trained. In week 7 we will be training our own model, and we will need to use the more advanced HuggingFace APIs that we look at in the upcoming lecture.
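
To make the distinction concrete, here is a toy sketch in plain Python. It is not how HuggingFace models are trained; it assumes a made-up model with a single weight `w` and made-up data following y = 2x, purely to show that training changes the weight while inference only uses it.

```python
# Toy model: y = w * x, with one parameter (weight) w.
# Everything here is illustrative; real models have millions or billions of weights.

def predict(w, x):
    """Inference: use the current weight to produce an output. Nothing is updated."""
    return w * x

def train(w, data, learning_rate=0.01, epochs=200):
    """Training: repeatedly nudge the weight to reduce the squared error on the data."""
    for _ in range(epochs):
        for x, y_true in data:
            error = predict(w, x) - y_true
            gradient = 2 * error * x        # derivative of error**2 with respect to w
            w -= learning_rate * gradient   # updating the weight is what makes this training
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up examples of y = 2x

w = train(w=0.0, data=data)        # training: the weight moves from 0.0 to roughly 2.0
print(round(w, 3))
print(round(predict(w, 10.0), 2))  # inference: a new input, and the weight stays fixed
```

Fine-tuning would be the same `train` call, but starting from a weight that earlier training had already moved, rather than from 0.0.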

I recorded this playlist on YouTube with more on parameters, training and inference:  
https://www.youtube.com/playlist?list=PLWHe-9GP9SMMdl6SLaovUQF2abiLGbMjs


In [None]:
# Pip installs should come first, at the top of the notebook.
# If your kernel ever resets, you need to run this again.

# !pip install -q --upgrade datasets==3.6.0

In [None]:
# Let's check the GPU and device availability

import torch
import subprocess
import sys

# Check for CUDA (NVIDIA GPU)
if torch.cuda.is_available():
    print(f"CUDA is available! Device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    try:
        result = subprocess.run(['nvidia-smi'], capture_output=True, text=True, timeout=5)
        if result.returncode == 0:
            print("\nNVIDIA-SMI output:")
            print(result.stdout)
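    except (OSError, subprocess.SubprocessError):
        # nvidia-smi is missing or timed out; this extra output is optional
        pass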
    device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    print("MPS (Apple Silicon) is available!")
    device = "mps"
else:
    print("No GPU detected. Will use CPU (this will be slower).")
    device = "cpu"

print(f"\nUsing device: {device}")

In [None]:
# Imports

import os
import torch
from huggingface_hub import login
from transformers import pipeline
from diffusers import DiffusionPipeline
from datasets import load_dataset
import soundfile as sf
from IPython.display import Audio

# Important Note - Hugging Face account

In Day 1, we set up a FREE account on https://huggingface.co

### If you skipped this:

Please go back and do it! Then go to the Avatar menu, Tokens, and create an API token. And make sure it has WRITE permissions!

### Setting up your token for local use:

Set your Hugging Face token as an environment variable:

**Linux/Mac:**
```bash
export HF_TOKEN="your_token_here"
```

**Windows (PowerShell):**
```powershell
$env:HF_TOKEN="your_token_here"
```

**Windows (Command Prompt):**
```cmd
set HF_TOKEN=your_token_here
```

Or you can create a `.env` file in your project directory with:
```
HF_TOKEN=your_token_here
```

And load it using python-dotenv (install with `pip install python-dotenv`).
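
A minimal sketch of that loading step, under some assumptions: the helper name `load_hf_token` is hypothetical, and the fallback parser is a deliberately simple stand-in that only runs if python-dotenv is not installed. It handles plain `KEY=VALUE` lines only.

```python
import os
from pathlib import Path


def load_hf_token(env_path=".env"):
    """Load variables from a .env file (if present) and return HF_TOKEN, or None."""
    try:
        # Preferred: python-dotenv, if it is installed
        from dotenv import load_dotenv
        load_dotenv(env_path)
    except ImportError:
        # Fallback: a very simple KEY=VALUE parser (no multi-line values or escapes)
        path = Path(env_path)
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    # Like load_dotenv's default, keep variables that are already set
                    os.environ.setdefault(key.strip(), value.strip().strip("\"'"))
    return os.getenv("HF_TOKEN")


hf_token = load_hf_token()
print("HF_TOKEN found" if hf_token else "HF_TOKEN not found")
```

If you use a `.env` file, keep it out of version control so the token is never committed.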

In [None]:
# Get Hugging Face token from environment variable
hf_token = os.getenv('HF_TOKEN')

if hf_token and hf_token.startswith("hf_"):
    print("HF token looks good!")
    login(hf_token, add_to_git_credential=True)
else:
    print("HF_TOKEN environment variable is not set or invalid.")
    print("Please set it using: export HF_TOKEN='your_token_here'")
    print("Or you can enter it manually below (not recommended for security):")
    # Uncomment the lines below if you want to enter the token manually:
    # hf_token = input("Enter your Hugging Face token: ")
    # if hf_token:
    #     login(hf_token, add_to_git_credential=True)

## Using Pipelines from Hugging Face

A simple way to run inference for common tasks without worrying about the plumbing; pipelines pick reasonable defaults for you.


### How it works:

STEP 1: Create a pipeline - a function you can then call

```python
my_pipeline = pipeline(task, model=xx, device=xx)
```

If you don't specify a model, then Hugging Face picks one for you that's the default for the task. 

For the device:
- Specify `"cuda"` for an NVIDIA GPU
- Specify `"mps"` on a Mac with Apple Silicon
- Specify `"cpu"` or omit the device parameter to use CPU (slower but works everywhere)

The device-check cell earlier in this notebook detects the best available device and stores it in the `device` variable, which the `pipeline(...)` calls below pass in.


STEP 2: Then call it as many times as you want:

```python
my_pipeline(input1)
my_pipeline(input2)
```
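
Besides calling a pipeline once per input, you can hand it a list of inputs and get back one result per input. The sketch below assumes a text-classification pipeline (such as sentiment analysis), whose results are dictionaries with `label` and `score` keys. The helper name `classify_many` is hypothetical.

```python
def classify_many(classifier, texts):
    """Run a text-classification pipeline over several texts.

    Returns a list of (text, label, score) tuples, one per input text.
    """
    texts = list(texts)
    results = classifier(texts)  # a list in, a list of {'label': ..., 'score': ...} out
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]


# Usage, once a sentiment-analysis pipeline has been created:
# analyzer = pipeline("sentiment-analysis", device=device)
# for text, label, score in classify_many(analyzer, ["I love this!", "This is terrible."]):
#     print(f"{label:10s} {score:.3f}  {text}")
```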

In [None]:
# Sentiment Analysis

my_simple_sentiment_analyzer = pipeline("sentiment-analysis", device=device)
result = my_simple_sentiment_analyzer("I'm super excited to be on the way to LLM mastery!")
print(result)

In [None]:
result = my_simple_sentiment_analyzer("I should be more excited to be on the way to LLM mastery!")
print(result)

In [None]:

better_sentiment = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment", device=device)
result = better_sentiment("I should be more excited to be on the way to LLM mastery!!")
print(result)

In [None]:
# Named Entity Recognition

ner = pipeline("ner", device=device)
result = ner("AI Engineers are learning about the amazing pipelines from HuggingFace locally from Ed Donner")
for entity in result:
    print(entity)

In [None]:
# Question Answering with Context

question="What are Hugging Face pipelines?"
context="Pipelines are a high level API for inference of LLMs with common tasks"

question_answerer = pipeline("question-answering", device=device)
result = question_answerer(question=question, context=context)
print(result)

In [None]:
# Text Summarization

summarizer = pipeline("summarization", device=device)
text = """
The Hugging Face transformers library is an incredibly versatile and powerful tool for natural language processing (NLP).
It allows users to perform a wide range of tasks such as text classification, named entity recognition, and question answering, among others.
It's an extremely popular library that's widely used by the open-source data science community.
It lowers the barrier to entry into the field by providing Data Scientists with a productive, convenient way to work with transformer models.
"""
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print(summary[0]['summary_text'])

In [None]:
# Translation

translator = pipeline("translation_en_to_fr", device=device)
result = translator("The Data Scientists were truly amazed by the power and simplicity of the HuggingFace pipeline API.")
print(result[0]['translation_text'])

In [None]:
# Another translation, showing a model being specified
# All translation models are here: https://huggingface.co/models?pipeline_tag=translation&sort=trending

translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es", device=device)
result = translator("The Data Scientists were truly amazed by the power and simplicity of the HuggingFace pipeline API.")
print(result[0]['translation_text'])

In [None]:
# Classification

classifier = pipeline("zero-shot-classification", device=device)
result = classifier("Hugging Face's Transformers library is amazing!", candidate_labels=["technology", "sports", "politics"])
print(result)

In [None]:
# Text Generation

generator = pipeline("text-generation", device=device)
result = generator("If there's one thing I want you to remember about using HuggingFace pipelines, it's")
print(result[0]['generated_text'])

In [None]:
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
pipe.to(device)  # use the device detected earlier ("cuda", "mps" or "cpu") instead of assuming CUDA

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
display(image)

In [None]:
# Image Generation - remember this?! Now you know what's going on
# Pipelines can be used for diffusion models as well as transformers

from IPython.display import display
from diffusers import AutoPipelineForText2Image
# import torch

pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
# pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo-tensorrt")
pipe.to(device)  # use the device detected earlier ("cuda", "mps" or "cpu") instead of assuming CUDA
prompt = "A class of students learning AI engineering in a vibrant pop-art style"
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
display(image)

In [None]:
# Audio Generation

from transformers import pipeline
from datasets import load_dataset
import soundfile as sf
import torch
from IPython.display import Audio

synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts", device=device)
embeddings_dataset = load_dataset("matthijs/cmu-arctic-xvectors", split="validation", trust_remote_code=True)
speaker_embedding = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
speech = synthesiser("Hi to an artificial intelligence engineer, on the way to mastery!", forward_params={"speaker_embeddings": speaker_embedding})

Audio(speech["audio"], rate=speech["sampling_rate"])

# All the available pipelines

Here are all the pipelines available from Transformers and Diffusers.

With thanks to student Lucky P for suggesting I include this!

There's a list of pipelines under Tasks on this page (you have to scroll down a bit, then expand the parameters to see the Tasks):

https://huggingface.co/docs/transformers/main_classes/pipelines
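
You can also ask your installed version of Transformers which task names it recognizes. This is a small sketch that assumes `get_supported_tasks` is importable from `transformers.pipelines` in your version; the wrapper `list_pipeline_tasks` is a hypothetical helper that returns an empty list if the import fails.

```python
def list_pipeline_tasks():
    """Return the task names known to the installed transformers, or [] if unavailable."""
    try:
        from transformers.pipelines import get_supported_tasks
    except ImportError:
        # transformers is not installed, or this version does not expose the function
        return []
    return sorted(get_supported_tasks())


for task in list_pipeline_tasks():
    print(task)
```

Any name it prints can be passed as the first argument to `pipeline(...)`.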

There's also this list of tasks for Diffusion models rather than Transformers, following on from the image generation example above, where I use `AutoPipelineForText2Image`.

https://huggingface.co/docs/diffusers/en/api/pipelines/overview

If you come up with some cool examples of other pipelines, please share them with me! It's wonderful how HuggingFace makes this advanced AI functionality available for inference with such a simple API.