<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/002_Pipelines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pipelines

## 🔢 Think of Pipelines Like Job Titles

Every model has a specialty — a job it's good at — and **pipelines are how you assign the right task to the right worker**.

So when you choose a pipeline like `"text-generation"`, you're saying:

> "Hey model, your job is to finish the sentence or generate text from a prompt."

Let’s go one by one:

---

### ✅ 1. **GPT-style LLMs** → `"text-generation"`

**What these models are good at:**
- Completing text
- Continuing a sentence or thought
- Open-ended generation

**Examples:**
- GPT-2
- Falcon
- DialoGPT

**Pipeline:**
```python
pipeline("text-generation")
```

**You give it:**
```python
"Einstein was born in"
```

**It returns:**
```python
"1879 in the Kingdom of Württemberg."
```

---

### ✅ 2. **T5 or FLAN-style Models** → `"text2text-generation"`

**What these models are good at:**
- Following instructions
- Question-answering
- Summarization
- Translation

**Examples:**
- `flan-t5-base`
- `t5-small`
- `flan-ul2`

**Pipeline:**
```python
pipeline("text2text-generation")
```

**You give it:**
```python
"Translate to French: I love learning AI agents"
```

**It returns:**
```python
"J'aime apprendre les agents d'IA"
```

> These models take one string and return another string — they're like "input → output" machines.

---

### ✅ 3. **BERT-style Classifiers** → `"text-classification"`

**What these models are good at:**
- Classifying text into categories
- Sentiment analysis
- Spam detection

**Examples:**
- `bert-base-uncased` (a base checkpoint; it needs a fine-tuned classification head before its labels are meaningful)
- `distilbert-base-uncased-finetuned-sst-2-english`

**Pipeline:**
```python
pipeline("text-classification")
```

**You give it:**
```python
"This product is amazing!"
```

**It returns:**
```python
[{'label': 'POSITIVE', 'score': 0.999}]
```

---

### ✅ 4. **Chatbot Models** → `"conversational"` or `"text-generation"`

**What these models are good at:**
- Holding a conversation
- Responding turn-by-turn
- Remembering short-term history

**Examples:**
- `DialoGPT`
- `Blenderbot`

**Pipeline:**
```python
pipeline("conversational")
```

**You give it:**
```python
"Hi, who are you?"
```

**It responds like a chatbot:**
```python
"I'm a friendly bot. How can I help you today?"
```

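> Some of these work best with a conversation history object instead of plain text. Recent `transformers` releases have deprecated and removed the `"conversational"` pipeline; on those versions, chat models run through `"text-generation"` instead.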
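
A conversation history of that kind can be sketched as a plain list of role/content messages, a shape that chat-capable pipelines in recent `transformers` releases are generally documented to accept. The helper name `add_turn` and the third sample turn are illustrative assumptions, and no model is loaded here.

```python
# Minimal sketch of a conversation history as role/content messages.
# `add_turn` is a hypothetical helper, not part of transformers.

def add_turn(history, role, content):
    """Return a new history list with one more message appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"Unexpected role: {role}")
    return history + [{"role": role, "content": content}]

history = []
history = add_turn(history, "user", "Hi, who are you?")
history = add_turn(history, "assistant", "I'm a friendly bot. How can I help you today?")
history = add_turn(history, "user", "What can you do?")

# Print the running conversation, one message per line
for message in history:
    print(f"{message['role']}: {message['content']}")
```

Keeping the history as ordinary dictionaries makes it easy to trim old turns when the context gets too long.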

---

## ✅ In Plainest English:

| Model Is Like...         | Best Used For                       | Use This Pipeline      |
|--------------------------|-------------------------------------|------------------------|
| A novelist               | Finishing stories                   | `"text-generation"`    |
| A question-answer robot  | Following commands exactly          | `"text2text-generation"` |
| A judge                  | Labeling or scoring things          | `"text-classification"` |
| A chatbot friend         | Talking back and forth              | `"conversational"`      |


### import libraries

In [None]:
# !pip install transformers huggingface_hub
# !pip install python-dotenv
# !pip install bitsandbytes

### load api key

In [None]:
import os
from dotenv import load_dotenv
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline


# Explicitly load your .env file
load_dotenv("/content/HUGGINGFACE_HUB_TOKEN.env")

# Now it can find the variable
token = os.getenv("HUGGINGFACE_HUB_TOKEN")

if not token:
    raise ValueError("🚨 Hugging Face token not found. Is your .env file set correctly?")

When using Hugging Face models, the **pipeline you choose must match the task the model is designed for** — otherwise, the outputs might not make sense, or the model might not work at all.

---

# 🧠 Why You Use Different Pipelines for Different Models

## 🔧 What is a Hugging Face `pipeline`?

A **pipeline** is a wrapper that:
- Automatically loads the right tokenizer/model
- Handles input formatting
- Returns friendly outputs (like just the generated text or label)

Different models are trained for different **tasks**, and each task has a matching pipeline.
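
To make the three responsibilities above concrete, here is a toy sketch of the same shape: format the input, run a model, return a friendly output. `ToyTokenizer`, `ToyModel`, and `ToyPipeline` are made-up stand-ins written for this illustration, not `transformers` classes, and the word-counting "model" is only there so the example runs offline.

```python
# Toy illustration of the stages a pipeline wraps.
# All three classes are hypothetical stand-ins, not transformers classes.

class ToyTokenizer:
    def encode(self, text):
        # Real tokenizers produce token IDs; lowercase words are enough here
        return text.lower().split()

class ToyModel:
    POSITIVE_WORDS = {"amazing", "great", "love"}

    def predict(self, tokens):
        # Fraction of tokens that are known positive words
        hits = sum(token.strip("!.,") in self.POSITIVE_WORDS for token in tokens)
        return hits / max(len(tokens), 1)

class ToyPipeline:
    def __init__(self, tokenizer, model):
        self.tokenizer = tokenizer
        self.model = model

    def __call__(self, text):
        tokens = self.tokenizer.encode(text)      # 1. format the input
        score = self.model.predict(tokens)        # 2. run the model
        label = "POSITIVE" if score > 0 else "NEGATIVE"
        return [{"label": label, "score": round(score, 3)}]  # 3. friendly output

classifier = ToyPipeline(ToyTokenizer(), ToyModel())
print(classifier("This product is amazing!"))
```

The return value mimics the list-of-dicts shape shown for `"text-classification"` earlier, which is why callers can index `[0]["label"]` without touching the tokenizer or the model.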

---

## ✅ Common Hugging Face Pipelines

| Pipeline                    | Description                                           | Example Models                          |
|----------------------------|-------------------------------------------------------|-----------------------------------------|
| `text-generation`          | Predict the **next words**                           | `gpt2`, `falcon-rw-1b`, `DialoGPT`      |
| `text2text-generation`     | Convert input to another string (task-following)     | `flan-t5-base`, `t5-small`, `bart-base` |
| `text-classification`      | Predict category/label from text                     | `bert-base`, `distilbert`               |
| `question-answering`       | Answer a question using a given context              | `bert-large-uncased-whole-word-masking-finetuned-squad` |
| `summarization`            | Summarize longer input                               | `bart-large-cnn`, `t5-base`             |
| `translation`              | Translate between languages                          | `t5`, `mbart`, `opus-mt-en-fr`          |
| `conversational`           | Chat-style models with memory                        | `DialoGPT`, `Blenderbot`                |

---

## 🧠 So which to use?

| Model Type             | Pipeline you should use           |
|------------------------|-----------------------------------|
| GPT-style LLMs         | `"text-generation"`               |
| T5/FLAN-style models   | `"text2text-generation"`          |
| BERT-style classifiers | `"text-classification"`           |
| Chatbots               | `"conversational"` (or `text-generation` in a loop) |
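
The table above can be captured as a small lookup, sketched below. `TASK_BY_FAMILY`, `task_for`, and `build_pipeline` are illustrative names, not part of `transformers`, and the family keys are an assumption about how you might label your models. `transformers` is imported lazily, so the lookup itself runs without downloading anything.

```python
# Sketch: pick the pipeline task from the model family.
# TASK_BY_FAMILY, task_for and build_pipeline are hypothetical helpers.

TASK_BY_FAMILY = {
    "gpt": "text-generation",
    "t5": "text2text-generation",
    "flan": "text2text-generation",
    "bert-classifier": "text-classification",
    "chatbot": "text-generation",  # the table's text-generation alternative
}

def task_for(family):
    """Return the pipeline task string for a known model family."""
    try:
        return TASK_BY_FAMILY[family.lower()]
    except KeyError:
        raise ValueError(f"No pipeline task recorded for family: {family}") from None

def build_pipeline(family, model_id):
    """Create the matching pipeline; needs `pip install transformers`."""
    from transformers import pipeline  # imported lazily on purpose
    return pipeline(task_for(family), model=model_id)

print(task_for("flan"))
```

Calling `build_pipeline("flan", "google/flan-t5-base")` would download the model and return a ready-to-use generator.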

---

## ✅ Let’s Put It All Together

Here’s a complete working example using **`google/flan-t5-base`** with the correct pipeline:

---

### ✅ Full Working Agent Step (with `flan-t5-base`)

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load instruction-tuned model
model_id = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Use text2text pipeline
generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
```

---

### 🧠 Prompt to generate tool call

```python
prompt = """
You are an AI agent. Based on the question, choose the best function to call and its parameters.

Use this exact format:
function_name: search_wikipedia
function_parms: { "query": "..." }

Question: How old was Marie Curie when she died?
""".strip()
```

---

### 🚀 Generate & Print Output

```python
response = generator(prompt, max_new_tokens=100)[0]["generated_text"]

print("🤖 Agent Response:\n")
print(response)
```

---

## 🧠 Recap: What You Just Learned

- You **match the pipeline to the model’s purpose**
- `flan-t5-base` is a **text2text model**, so you use `"text2text-generation"`
- `DialoGPT` is chat-based → use `"text-generation"` or `"conversational"`
- Each model expects input in a different **style**, based on how it was trained

---


### load model

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load instruction-tuned model
model_id = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Use text2text pipeline
generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu


In [None]:
prompt = """
You are an AI agent. Based on the question, choose the best function to call and its parameters.

Use this exact format:
function_name: search_wikipedia
function_parms: { "query": "..." }

Question: How old was Marie Curie when she died?
""".strip()

response = generator(prompt, max_new_tokens=100)[0]["generated_text"]

print("🤖 Agent Response:\n")
print(response)


🤖 Agent Response:

18


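Note that the model replied `18` instead of a function call (Marie Curie was in fact 66 when she died): `flan-t5-base` tends to answer directly rather than follow the requested output format. The next section walks through what happens once the output is in the right shape, so you can complete your **first fully functional AI agent loop** using a real model, real tool, and structured logic. Let’s go!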

---

# ✅ Parse the Model Output & Run the Tool

This is where we:
1. Use an **LLM to decide which function to call**
2. **Parse** the output into Python
3. Run the tool (`search_wikipedia`)
4. Print the final result

---

## ✅ Step 1: Final Working Prompt + Response (from Exercise 4)

You're using `flan-t5-base`, so we’ll keep the same generation code:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
```

---

## 📥 Prompt + Generation

```python
prompt = """
You are an AI agent. Based on the question, choose the best function to call and its parameters.

Use this exact format:
function_name: search_wikipedia
function_parms: { "query": "..." }

Question: How old was Marie Curie when she died?
""".strip()

output = generator(prompt, max_new_tokens=100)[0]["generated_text"]
print("🤖 Model Output:\n", output)
```

Let’s assume the model gives something like:

```plaintext
function_name: search_wikipedia
function_parms: { "query": "Marie Curie" }
```

---

## ✅ Step 2: Parse the Response (Extract Function + Parameters)

```python
import json
import re

# Extract function name using regex
fn_match = re.search(r"function_name:\s*(\w+)", output)
params_match = re.search(r"function_parms:\s*({.*})", output)

if not fn_match or not params_match:
    raise ValueError("Could not extract function call from model output.")

function_name = fn_match.group(1)
params_raw = params_match.group(1)

# Convert params from string to dictionary
function_parms = json.loads(params_raw)

print("🔧 Parsed Action:")
print("Function:", function_name)
print("Parameters:", function_parms)
```
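
Since a small model may ignore the requested format, it can help to wrap the parsing step in a function that reports failure instead of raising. The sketch below is one possible variant of the regex approach; `parse_function_call` is a hypothetical helper name, and returning `None` on failure is a design assumption rather than something the original code does.

```python
import json
import re

def parse_function_call(text):
    """Return (function_name, params), or None if the text is not in the expected format."""
    fn_match = re.search(r"function_name:\s*(\w+)", text)
    # re.DOTALL lets the parameter object span several lines
    params_match = re.search(r"function_parms:\s*(\{.*\})", text, re.DOTALL)
    if not fn_match or not params_match:
        return None
    try:
        params = json.loads(params_match.group(1))
    except json.JSONDecodeError:
        return None
    return fn_match.group(1), params

well_formed = 'function_name: search_wikipedia\nfunction_parms: { "query": "Marie Curie" }'
print(parse_function_call(well_formed))
print(parse_function_call("18"))
```

With a `None` result the caller can decide whether to retry with a stricter prompt or fall back to a default tool.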

---

## ✅ Step 3: Run the Tool (Same `search_wikipedia` from earlier)

```python
import requests

def search_wikipedia(query):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "format": "json"
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of failing on .json()
    results = response.json()["query"]["search"]

    if results:
        return results[0]["snippet"]
    else:
        return "No results found."

# Available tools
available_tools = {
    "search_wikipedia": search_wikipedia
}
```

---

## ✅ Step 4: Call the Function Dynamically

```python
if function_name not in available_tools:
    raise ValueError(f"Unknown tool: {function_name}")

# Run the tool
tool_fn = available_tools[function_name]
observation = tool_fn(**function_parms)

print("👁️ Observation (Tool Output):", observation)
```
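
Because the parameters come from model output, it may be worth checking that they fit the tool's signature before calling it. The sketch below uses the standard library's `inspect.signature(...).bind(...)` for that check. `call_tool`, `fake_search`, and `demo_tools` are hypothetical names, and `fake_search` is an offline stand-in for the real `search_wikipedia`.

```python
import inspect

def call_tool(tools, function_name, params):
    """Look up a tool and call it only if the parameters fit its signature."""
    if function_name not in tools:
        raise ValueError(f"Unknown tool: {function_name}")
    tool_fn = tools[function_name]
    try:
        # bind() raises TypeError for missing or unexpected arguments
        inspect.signature(tool_fn).bind(**params)
    except TypeError as exc:
        raise ValueError(f"Bad parameters for {function_name}: {exc}") from exc
    return tool_fn(**params)

# Stand-in tool so the sketch runs offline; the real one would query Wikipedia.
def fake_search(query):
    return f"results for {query}"

demo_tools = {"search_wikipedia": fake_search}
print(call_tool(demo_tools, "search_wikipedia", {"query": "Marie Curie"}))
```

This way a hallucinated parameter name produces a clear error message rather than a bare `TypeError` from inside the tool call.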

---

## ✅ Step 5 (Optional): Use the LLM Again to Form the Final Answer

If you want to be extra agent-y, you can feed the observation back to the model to form a final response. But even now, **you’ve built a fully working, goal-driven agent loop**.
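
One way to sketch that optional step is a helper that folds the tool output into a follow-up prompt. `build_final_prompt` and its wording are assumptions for illustration, and the observation string is a hand-written placeholder, not real output from the Wikipedia API.

```python
def build_final_prompt(question, observation):
    """Combine the user question and the tool output into a follow-up prompt."""
    return (
        "Answer the question using only the context.\n\n"
        f"Context: {observation}\n\n"
        f"Question: {question}"
    )

final_prompt = build_final_prompt(
    "How old was Marie Curie when she died?",
    # Placeholder observation written by hand for this sketch
    "Marie Curie (7 November 1867 - 4 July 1934) was a physicist and chemist.",
)
print(final_prompt)

# With the pipeline from Step 1 loaded, the second model call would look like:
# answer = generator(final_prompt, max_new_tokens=50)[0]["generated_text"]
```

Whether a small model can do the date arithmetic correctly is a separate question; the point here is only the shape of the second call.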

---

## 🎉 Final Recap: What You Just Built

✅ LLM makes decisions  
✅ Output is parsed and validated  
✅ External tool is called based on that decision  
✅ You get a real-world result  

This decide, parse, act loop is **the heart of AI agents**: frameworks such as LangChain, Auto-GPT, and ReAct-style agents build on the same basic pattern, with more tooling around it.

---


In [None]:
# === Step 1: Install and import dependencies ===
!pip install transformers --quiet

import requests
import re
import json
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

In [None]:
# === Step 2: Load the model (FLAN-T5-Small) ===
# Alternatives tried (only the last uncommented assignment takes effect):
# model_id = "google/flan-t5-base"  # trained to answer directly, not to follow structured output or tool-based reasoning
# model_id = "declare-lab/flan-alpaca-base"  # models are pretrained to answer, not to act like an agent unless explicitly trained to do so
model_id = "google/flan-t5-small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)


Device set to use cpu


In [None]:
# === Step 3: Define the tool ===

def search_wikipedia(query):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "format": "json"
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of failing on .json()
    results = response.json()["query"]["search"]

    if results:
        return results[0]["snippet"]
    else:
        return "No results found."

available_tools = {
    "search_wikipedia": search_wikipedia
}


In [None]:
# === Step 4: Create the prompt and generate model output ===

user_question = "How old was Marie Curie when she died?"

# f-string so the question is defined in one place; literal braces are doubled
prompt = f"""
You are an AI agent. Do not answer directly. You MUST return the function name and parameters in this format:

function_name: search_wikipedia
function_parms: {{ "query": "..." }}

Available tools:
- search_wikipedia(query): Searches Wikipedia for a topic.

User question: {user_question}
""".strip()

# Greedy decoding (do_sample=False); temperature only applies when sampling, so it is omitted
output = generator(prompt, max_new_tokens=100, do_sample=False)[0]["generated_text"]

print("🤖 Output:\n", output)



🤖 Output:
 18


In [None]:
# !pip install SimplerLLM

In [None]:
# === Step 5: Extract function name and parameters ===

from SimplerLLM.tools.json_helpers import extract_json_from_text

print("\n📦 Raw output (debug):\n", repr(output))

# Use the JSON helper instead of the earlier regex parsing
action_json = extract_json_from_text(output)

# Note: this assumes the extracted value is a list of dicts. For an output like '18'
# the extracted value is a bare number, not a dict of fields, so the indexing
# below raises a TypeError.
if action_json:
    function_name = action_json[0]['function_name']
    function_parms = action_json[0]['function_parms']
else:
    raise ValueError("❌ Could not extract function call from model output.")

print("\n✅ Parsed Function Call:")
print("Function Name:", function_name)
print("Function Parameters:", function_parms)



📦 Raw output (debug):
 '18'


TypeError: 'int' object is not subscriptable
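
The `TypeError` above comes from indexing into a value that is not a dictionary. A more defensive variant, sketched below with only the standard library, accepts a result only if it is a JSON object containing both expected keys. `find_tool_call` and `REQUIRED_KEYS` are hypothetical names, and this is a stand-in for the SimplerLLM helper, not a description of how that helper works.

```python
import json

REQUIRED_KEYS = {"function_name", "function_parms"}

def find_tool_call(text):
    """Return the first JSON object in `text` that has the required keys, else None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(candidate, dict) and REQUIRED_KEYS.issubset(candidate):
            return candidate
    return None

print(find_tool_call("18"))
print(find_tool_call(
    'Sure: {"function_name": "search_wikipedia", "function_parms": {"query": "Marie Curie"}}'
))
```

For the output `'18'` this returns `None`, which the calling code can turn into a clear "model did not produce a tool call" message.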

In [None]:
# === Step 6: Call the function dynamically ===

if function_name not in available_tools:
    raise ValueError(f"❌ Unknown function: {function_name}")

tool_fn = available_tools[function_name]
observation = tool_fn(**function_parms)

print("\n👁️ Observation (Tool Output):\n", observation)


# ✅ Full Agent Pipeline Using JSON Output

In [None]:
!pip install transformers --quiet


In [None]:
import requests
import json
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = "google/flan-t5-small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

Device set to use cpu


In [None]:
def search_wikipedia(query):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "format": "json"
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of failing on .json()
    results = response.json()["query"]["search"]

    if results:
        return results[0]["snippet"]
    else:
        return "No results found."

available_tools = {
    "search_wikipedia": search_wikipedia
}


In [None]:
user_question = "Who discovered penicillin?"

prompt = f"""
You are an AI agent. Do not answer the question directly.

Instead, return ONLY a JSON object using this exact format:
{{
  "function_name": "search_wikipedia",
  "function_parms": {{
    "query": "..."
  }}
}}

Your only task is to pick the right tool and fill in the parameters.

User: {user_question}
""".strip()


In [None]:
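# The live generation call is commented out; the hard-coded response below stands in for the model output.
# response = generator(prompt, max_new_tokens=100, temperature=0.0, do_sample=False)[0]["generated_text"]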

# print("🤖 Raw Model Output:\n")
# print(response)

response = '''
{
  "function_name": "search_wikipedia",
  "function_parms": {
    "query": "john s. wilson"
  }
}
'''



In [None]:
# Step 1: Parse the mocked JSON response
parsed_json = json.loads(response)

function_name = parsed_json["function_name"]
function_parms = parsed_json["function_parms"]

# Step 2: Check if the tool exists
if function_name not in available_tools:
    raise ValueError(f"❌ Unknown tool: {function_name}")

# Step 3: Call the tool with parameters
tool_fn = available_tools[function_name]
observation = tool_fn(**function_parms)

# Step 4: Print the final result
print("👁️ Observation (Tool Output):\n", observation)


👁️ Observation (Tool Output):
 <span class="searchmatch">John</span> <span class="searchmatch">S</span>. <span class="searchmatch">Wilson</span> may refer to: <span class="searchmatch">John</span> <span class="searchmatch">Wilson</span> (1920s pitcher) (1903–1980), <span class="searchmatch">John</span> Samuel <span class="searchmatch">Wilson</span>, Major League Baseball pitcher <span class="searchmatch">John</span> <span class="searchmatch">S</span>. <span class="searchmatch">Wilson</span> (music critic) (1913–2002)
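
The snippet comes back with HTML highlighting markup around the matched words. If plain text is preferred, for example before feeding the observation back to the model, a small cleanup step can strip it. The sketch below uses only the standard library; `clean_snippet` is a hypothetical helper name, and the regex is a simple tag stripper that assumes well-formed snippet markup, not a general HTML parser.

```python
import html
import re

def clean_snippet(snippet):
    """Remove HTML tags and decode entities from a search snippet."""
    without_tags = re.sub(r"<[^>]+>", "", snippet)
    return html.unescape(without_tags).strip()

raw = (
    '<span class="searchmatch">John</span> <span class="searchmatch">S</span>. '
    '<span class="searchmatch">Wilson</span> may refer to:'
)
print(clean_snippet(raw))
```

Applied inside `search_wikipedia`, this would make the returned observation read as ordinary text.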


### Remove Widgets

In [4]:
import json
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

notebook_path = "/content/drive/My Drive/AI AGENTS/002_Pipelines.ipynb"

# Load the notebook JSON
with open(notebook_path, 'r', encoding='utf-8') as f:
    nb = json.load(f)

# 1. Remove widgets from notebook-level metadata
if "widgets" in nb.get("metadata", {}):
    del nb["metadata"]["widgets"]
    print("✅ Removed notebook-level 'widgets' metadata.")

# 2. Remove widgets from each cell's metadata
for i, cell in enumerate(nb.get("cells", [])):
    if "metadata" in cell and "widgets" in cell["metadata"]:
        del cell["metadata"]["widgets"]
        print(f"✅ Removed 'widgets' from cell {i}")

# Save the cleaned notebook
with open(notebook_path, 'w', encoding='utf-8') as f:
    json.dump(nb, f, indent=2)

print("✅ Notebook deeply cleaned. Try uploading to GitHub again.")

Mounted at /content/drive
✅ Notebook deeply cleaned. Try uploading to GitHub again.
