# 1. What exactly defines an AI agent and how is it different from other LLM applications?

Think of it like this:

A normal LLM app (like ChatGPT answering a question) is like a calculator — you input something, it gives you one answer, and that’s it.

An AI agent is like a virtual assistant (like Jarvis from Iron Man). It can make decisions, use tools, and take multiple steps to complete a task.

✅ Example:

Normal LLM app:

- You ask: “What’s the weather in Paris?”
- It replies: “It’s 25°C and sunny.”

AI Agent:

- You ask: “Should I pack an umbrella for my Paris trip this weekend?”
- It:
  1. Checks the weather for the weekend.
  2. Analyzes the forecast.
  3. Replies: “Yes, there’s a 70% chance of rain on Saturday. Pack an umbrella.”
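
The contrast can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `get_rain_chance` is a hypothetical stand-in for a weather API with hard-coded data taken from the example above, and `single_call_app` merely imitates a one-shot model reply.

```python
from typing import Dict

# Hypothetical stand-in for a weather API; the data mirrors the example above.
_FORECASTS: Dict[tuple, float] = {("Paris", "Saturday"): 0.70}


def get_rain_chance(city: str, day: str) -> float:
    """Tool: return the chance of rain (0.0 to 1.0) for a city and day."""
    return _FORECASTS.get((city, day), 0.0)


def single_call_app(question: str) -> str:
    """One input, one output: no tools and no intermediate steps."""
    return f"(the model's direct reply to: {question})"


def umbrella_agent(city: str, day: str, threshold: float = 0.5) -> str:
    """Multi-step: call a tool, analyze the result, then answer."""
    rain = get_rain_chance(city, day)          # step 1: check the weather
    needs_umbrella = rain >= threshold         # step 2: analyze the forecast
    if needs_umbrella:                         # step 3: reply
        return f"Yes, there's a {rain:.0%} chance of rain on {day}. Pack an umbrella."
    return f"No, rain looks unlikely on {day} ({rain:.0%} chance)."


print(single_call_app("What's the weather in Paris?"))
print(umbrella_agent("Paris", "Saturday"))
```

The agent version differs only in that it takes several steps and consults a tool before answering; the 0.5 threshold is an arbitrary choice for the sketch.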


# 2. How do language models achieve autonomy in decision-making processes?
An agent gives a language model a degree of autonomy by combining these capabilities:

Memory: It remembers what you said earlier.

Planning: It breaks a big task into small steps.

Tool use: It can Google things, do math, or call APIs.

Loops: It can think, act, and check again until it gets the right answer.

✅ Example:

You ask: “Find me the cheapest flight from Delhi to London this weekend.”

An AI agent might:

1. Search flight websites using an API.
2. Compare prices.
3. Check dates.
4. Say: “The cheapest flight is $450 on Air India, departing Saturday night.”
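
A rough sketch of how memory, planning, tool use, and a loop fit together for this flight example is shown below. Everything here is an assumption made for illustration: `search_flights` is a hypothetical stub instead of a real flight API, only the Air India fare comes from the example above, and the plan is hard-coded, whereas a real agent would have the model produce and revise it.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical flight data standing in for a real search API.
# Only the Air India fare comes from the example above; the rest is made up.
FLIGHTS: List[Dict[str, Any]] = [
    {"airline": "Air India", "price": 450, "day": "Saturday"},
    {"airline": "Example Air", "price": 520, "day": "Saturday"},
    {"airline": "Sample Jet", "price": 390, "day": "Tuesday"},
]


def search_flights() -> List[Dict[str, Any]]:
    """Tool: pretend to query a flight-search API."""
    return list(FLIGHTS)


def find_cheapest(days: List[str], max_steps: int = 5) -> Tuple[str, List[str]]:
    memory: List[str] = []                                  # memory: notes per step
    plan = ["search", "check_dates", "compare", "answer"]   # planning: small steps
    candidates: List[Dict[str, Any]] = []
    answer = "Stopped before reaching an answer."
    for step in plan[:max_steps]:                           # loop: act, record, continue
        if step == "search":
            candidates = search_flights()                   # tool use
        elif step == "check_dates":
            candidates = [f for f in candidates if f["day"] in days]
        elif step == "compare":
            candidates = sorted(candidates, key=lambda f: f["price"])
        elif step == "answer":
            if candidates:
                best = candidates[0]
                answer = (f"The cheapest flight is ${best['price']} on "
                          f"{best['airline']}, departing {best['day']}.")
            else:
                answer = "No flights found for those dates."
        memory.append(f"{step}: {len(candidates)} candidate(s)")
    return answer, memory


answer, memory = find_cheapest(["Saturday", "Sunday"])
print(answer)
print(memory)
```

The `max_steps` cap is a simple safeguard so the loop cannot run indefinitely.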


# 3. What's the critical difference between agent workflows and truly autonomous agents?

Agent workflow = Like a recipe. The steps are already written.

Autonomous agent = Like a chef. It figures out the recipe based on the ingredients and goal.

✅ Example:

You ask both to “write and publish a blog about AI trends.”

Workflow agent:

1. Writes draft
2. Runs grammar check
3. Publishes to website

(You told it what to do.)

Autonomous agent:
1. Asks: What topic is trending?
2. Decides to write about “AI in Education”
3. Writes it
4. Adds images
5. Publishes it

(You only gave the goal — it figured out everything else.)
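
The recipe-versus-chef distinction can be sketched as follows. This is a toy illustration, not a real agent: the step functions only set flags in a dictionary, and `choose_next` is a hypothetical rule-based stand-in for the decision an LLM would make at each turn.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]


def pick_topic(s: State) -> State:
    s["topic"] = "AI in Education"
    return s


def write_draft(s: State) -> State:
    s["draft"] = f"Post about {s.get('topic', 'AI trends')}"
    return s


def grammar_check(s: State) -> State:
    s["checked"] = True
    return s


def add_images(s: State) -> State:
    s["images"] = True
    return s


def publish(s: State) -> State:
    s["published"] = True
    return s


def run_workflow(state: State) -> State:
    """Workflow: the steps and their order are fixed in advance."""
    for step in (write_draft, grammar_check, publish):
        state = step(state)
    return state


def choose_next(state: State) -> Optional[Callable[[State], State]]:
    """Hypothetical stand-in for the model deciding what to do next."""
    if "topic" not in state:
        return pick_topic
    if "draft" not in state:
        return write_draft
    if not state.get("images"):
        return add_images
    if not state.get("published"):
        return publish
    return None  # goal reached


def run_autonomous(state: State, max_steps: int = 10) -> Tuple[State, List[str]]:
    """Autonomous: only the goal is given; each step is chosen at runtime."""
    trace: List[str] = []
    for _ in range(max_steps):
        action = choose_next(state)
        if action is None:
            break
        state = action(state)
        trace.append(action.__name__)
    return state, trace


print(run_workflow({}))
print(run_autonomous({})[1])
```

In the workflow the caller fixes the sequence; in the autonomous variant the sequence emerges from inspecting the current state against the goal.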

# 4. How can you integrate tools with LLMs to create effective agent systems?

Effective agent systems often rely on tool integration for real-world capability. Typical tool types:

- APIs (e.g., weather, finance)
- Python functions (for computation, file I/O)
- Databases (retrieval-augmented generation, RAG)
- Browsers or search tools (real-time info)
- Vector DBs (for semantic search and memory)

Ways to integrate:

- LangChain tool wrappers
- OpenAI function calling
- Hugging Face tools and pipelines
- Custom ReAct / Plan-and-Execute strategies
- LangGraph nodes with tool execution
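
Independent of any particular framework, the core idea of tool integration is a registry of callable functions plus a dispatcher that executes whatever call the model requests. The sketch below assumes the model emits a JSON object of the form `{"name": ..., "arguments": {...}}`; that shape, the `tool` decorator, and both example tools are assumptions made for illustration, not the API of any library listed above.

```python
import json
from typing import Any, Callable, Dict

# Registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def get_weather(city: str) -> str:
    # Hypothetical stub; a real tool would call a weather API here.
    return f"Forecast for {city}: unavailable in this sketch"


def dispatch(tool_call_json: str) -> Any:
    """Run a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**call["arguments"])


# In a real system the model would emit this JSON; here it is hard-coded.
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

The tool's return value would normally be fed back to the model as context for its next step.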

# 5. What architectural patterns should you consider when designing AI agents?


| Pattern         | Simple Explanation                        | Example                                                       |
|----------------|--------------------------------------------|---------------------------------------------------------------|
| ReAct           | Think, then act, then check result         | "What's the capital of Brazil?" → Think → Google → Answer     |
| Plan-and-Execute| First make a plan, then do steps           | "Plan my day" → Creates a to-do list → Schedules meetings     |
| LangGraph FSM   | Like a flowchart with steps & decisions    | "Get a quote" → "Verify source" → "Save it"                   |
| Multi-Agent     | Many small agents working together         | One agent writes, one edits, one posts the blog               |
| RAG             | Uses a database to find facts before answering | "Who is CEO of Telstra?" → Searches company DB → Answers |



✅ Example using LangGraph:

A LangGraph agent might:

1. Take your question.
2. Go to a search node.
3. Move to a processing node.
4. Finish at an output node that returns the final answer.
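
The node-to-node flow can be imitated in plain Python to show the idea of a graph of steps. Note that this sketch does not use the LangGraph library or its API: the node functions, the `NODES` table, and `run_graph` are all hypothetical names, and the search node is a stub that returns a placeholder string.

```python
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, str]
# Each node returns the updated state and the name of the next node (None = stop).
Node = Callable[[State], Tuple[State, Optional[str]]]


def search_node(state: State) -> Tuple[State, Optional[str]]:
    # Hypothetical stub: a real node would call a search tool here.
    state["raw"] = f"search results for '{state['question']}'"
    return state, "process"


def process_node(state: State) -> Tuple[State, Optional[str]]:
    state["processed"] = state["raw"].upper()
    return state, "output"


def output_node(state: State) -> Tuple[State, Optional[str]]:
    state["answer"] = f"Final answer based on: {state['processed']}"
    return state, None


NODES: Dict[str, Node] = {
    "search": search_node,
    "process": process_node,
    "output": output_node,
}


def run_graph(question: str, entry: str = "search") -> State:
    """Walk the graph from the entry node until a node returns None."""
    state: State = {"question": question}
    current: Optional[str] = entry
    while current is not None:
        state, current = NODES[current](state)
    return state


print(run_graph("capital of Brazil")["answer"])
```

Because each node names its successor, adding a branch (for example, looping back to search when results are empty) only requires returning a different node name.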




In [None]:
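# Hugging Face API token: read it from the environment instead of hard-coding it
import os

from huggingface_hub import login

HUGGINGFACEHUB_API_TOKEN = os.environ.get("HUGGINGFACEHUB_API_TOKEN", "")
HUGGINGFACE_MODEL_ID = "HuggingFaceH4/zephyr-7b-alpha"

# Only log in when a token is provided; an empty string is not a valid token
if HUGGINGFACEHUB_API_TOKEN:
    login(token=HUGGINGFACEHUB_API_TOKEN)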

In [None]:
!pip install --upgrade --quiet transformers accelerate bitsandbytes



In [None]:
"""
Demonstration of a simple multi‑step workflow using Hugging Face transformers.

In a multi‑step workflow, the output of one language model call becomes the
input to the next.  This technique is often called *prompt chaining* and is
commonly used for tasks that can be decomposed into fixed subtasks.
By chaining together summarization, translation and sentiment analysis,
this example shows how to build a small pipeline where each step augments the
result of the previous one.

NOTE: Running this script requires the ``transformers`` library and internet
access to download the pretrained models.  See the Hugging Face
documentation for installation details and model usage.
"""

from __future__ import annotations
import torch
from typing import Dict

try:
    from transformers import pipeline
except ImportError as e:
    raise ImportError(
        "transformers is not installed. Please install it with `pip install transformers` "
        "to run this module."
    ) from e


class MultiStepWorkflow:
    """Implements a simple three‑step LLM workflow.

    Steps:
      1. **Summarization** – Condense a long piece of text into a concise summary.
      2. **Translation** – Translate the English summary into French.
      3. **Sentiment Analysis** – Classify the sentiment of the summary.

    The design follows the prompt‑chaining paradigm, where the output of
    one model feeds directly into the next.
    """

    def __init__(self) -> None:
        # Use GPU (device=0) if available, else fallback to CPU (-1)
        device = 0 if torch.cuda.is_available() else -1

        self.summarizer = pipeline(
            "summarization",
            model="facebook/bart-large-cnn",
            device=device
        )
        self.translator = pipeline(
            "translation_en_to_fr",
            model="Helsinki-NLP/opus-mt-en-fr",
            device=device
        )
        self.classifier = pipeline(
            "sentiment-analysis",
            device=device
        )

    def run(self, text: str) -> Dict[str, str]:
        """Execute the multi‑step workflow on the given text.

        Parameters
        ----------
        text : str
            The input document to process.

        Returns
        -------
        Dict[str, str]
            A dictionary containing the summary, its French translation, and
            the sentiment analysis result.
        """
        # Step 1: summarize the input text
        summary_output = self.summarizer(
            text,
            max_length=80,
            min_length=30,
            do_sample=False,
        )
        summary = summary_output[0]["summary_text"]

        # Step 2: translate the summary into French
        translation_output = self.translator(summary)
        translation = translation_output[0]["translation_text"]

        # Step 3: classify the sentiment of the summary
        sentiment_output = self.classifier(summary)
        sentiment = sentiment_output[0]["label"]

        return {
            "summary": summary,
            "translation": translation,
            "sentiment": sentiment,
        }


def main() -> None:
    workflow = MultiStepWorkflow()
    text = (
        "Artificial intelligence is transforming industries around the world. "
        "Large language models enable developers to build applications that can "
        "write articles, answer questions, translate languages and more. "
        "However, many tasks require chaining multiple calls together to achieve "
        "the desired result. In this example we will summarize this passage, "
        "translate the summary into French, and then classify its sentiment."
    )
    results = workflow.run(text)
    print("Summary:\n", results["summary"])
    print("\nTranslation (French):\n", results["translation"])
    print("\nSentiment:", results["sentiment"])


if __name__ == "__main__":
    main()

Device set to use cuda:0
Device set to use cuda:0
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Your max_length is set to 80, but your input_length is only 69. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=34)


Summary:
 Many tasks require chaining multiple calls together to achieve the desired result. In this example we will summarize this passage, translate the summary into French, and then classify its sentiment.

Translation (French):
 Dans cet exemple, nous allons résumer ce passage, traduire le résumé en français, puis classer son sentiment.

Sentiment: NEGATIVE


In [None]:
"""
Illustrative examples for building effective agents with LLMs.

This module defines several small demonstrations that correspond to common
questions about AI agents and multi‑step workflows.  It contrasts simple LLM
applications with agentic systems, shows how an agent can make decisions
autonomously, demonstrates the difference between predefined workflows and
autonomous agents, integrates external tools and outlines
an architectural pattern for an agent with memory and planning components.

Each function or class is self‑contained and can be run independently.  You
should have the ``transformers`` library installed and Internet access to
download the example models.
"""

from __future__ import annotations

from typing import Any, Callable, Dict, List, Optional

try:
    from transformers import pipeline
except ImportError as e:
    raise ImportError(
        "The transformers package is required for these examples. Install it via `pip install transformers`."
    ) from e


def simple_llm_application(text: str) -> str:
    """A basic LLM application that performs a single task.

    This function summarizes the input text using a Hugging Face pipeline.  It
    illustrates the simplest form of a language model application — a single
    call that returns a result.  There is no decision making or tool use.

    Parameters
    ----------
    text : str
        The text to summarize.

    Returns
    -------
    str
        The summary of the input text.
    """
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    summary = summarizer(text, max_length=80, min_length=30, do_sample=False)[0][
        "summary_text"
    ]
    return summary


class BasicAgent:
    """A simple agent that routes tasks to different pipelines.

    Unlike the single‑call example above, this agent parses user commands and
    chooses between summarization, translation or sentiment analysis.  It
    exhibits a minimal degree of autonomy: the agent decides which model to
    invoke based on the input (a simple keyword match).  This illustrates how
    agents differ from basic LLM applications.
    """

    def __init__(self) -> None:
        self.summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
        self.translator = pipeline(
            "translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr"
        )
        self.classifier = pipeline("sentiment-analysis")

    def run(self, prompt: str) -> Any:
        prompt_lower = prompt.lower()
        if prompt_lower.startswith("summarize:"):
            text = prompt.split(":", 1)[1].strip()
            return self.summarizer(text)[0]["summary_text"]
        if prompt_lower.startswith("translate:"):
            text = prompt.split(":", 1)[1].strip()
            return self.translator(text)[0]["translation_text"]
        if prompt_lower.startswith("classify sentiment:"):
            text = prompt.split(":", 1)[1].strip()
            return self.classifier(text)
        # default behaviour: report that the request is not supported
        return "I don't know how to handle this request."


class PredefinedWorkflow:
    """A fixed multi‑step workflow (prompt chaining).

    This class demonstrates a workflow where each step is predetermined —
    summarization followed by sentiment analysis.  It is akin to the prompt
    chaining pattern described in the LangGraph tutorial, where each LLM call
    processes the output of the previous one.  Because
    the sequence is fixed, this is considered a *workflow* rather than a
    full agent.
    """

    def __init__(self) -> None:
        self.summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
        self.classifier = pipeline("sentiment-analysis")

    def run(self, text: str) -> Dict[str, Any]:
        summary = self.summarizer(text, max_length=80, min_length=30, do_sample=False)[
            0
        ]["summary_text"]
        sentiment = self.classifier(summary)[0]
        return {"summary": summary, "sentiment": sentiment}


class AutonomousAgent:
    """A dynamically‑controlled agent with simple autonomy.

    This agent decides at runtime whether to translate a summary based on the
    sentiment label.  It performs summarization, then sentiment analysis, and
    finally translates the summary into French only if the sentiment is
    positive.  This conditional behaviour illustrates, in miniature, how an
    agent can make decisions beyond a fixed workflow.
    """

    def __init__(self) -> None:
        self.summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
        self.classifier = pipeline("sentiment-analysis")
        self.translator = pipeline(
            "translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr"
        )

    def run(self, text: str) -> Dict[str, Any]:
        summary = self.summarizer(text, max_length=80, min_length=30, do_sample=False)[
            0
        ]["summary_text"]
        sentiment_result = self.classifier(summary)[0]
        label = sentiment_result["label"]
        result = {"summary": summary, "sentiment": sentiment_result}
        # If sentiment is positive, add a translation step
        if label == "POSITIVE":
            result["translation"] = self.translator(summary)[0]["translation_text"]
        return result


def agent_with_tool(prompt: str) -> Any:
    """Agent that integrates an external arithmetic tool.

    This function defines a tiny tool (a multiplication function) and a simple
    logic that decides whether to call the tool or defer to an LLM.  If the
    prompt contains a multiplication query like "what is 3*4?", the agent
    extracts the numbers and calls the tool; otherwise it uses a summarization
    pipeline.  This demonstrates how tool integration can extend an agent’s
    capabilities beyond text generation, as described in many agentic
    frameworks.
    """

    def multiply(a: int, b: int) -> int:
        return a * b

    # simple parser for multiplication queries
    import re

    match = re.match(r".*?(\d+)\s*\*\s*(\d+).*", prompt)
    if match:
        a, b = int(match.group(1)), int(match.group(2))
        return multiply(a, b)

    # fallback to summarization for non‑arithmetic prompts
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    return summarizer(prompt)[0]["summary_text"]


class PatternedAgent:
    """Illustrates an architectural pattern with memory and planning.

    This class sketches a simple agent architecture inspired by the components
    discussed in LLM agent literature.  The agent
    maintains a short-term memory of past interactions, uses a rudimentary
    planning method to break tasks into steps, and integrates tools for
    computations.  It is deliberately simplified for educational purposes.
    """

    def __init__(self) -> None:
        # core LLM
        self.generator = pipeline("text-generation", model="gpt2", max_length=100)
        # tool: arithmetic
        self.tool: Dict[str, Callable[[int, int], int]] = {
            "multiply": lambda a, b: a * b,
        }
        # short‑term memory
        self.memory: List[str] = []

    def plan(self, prompt: str) -> List[Dict[str, Any]]:
        """Very simple planner to parse instructions.

        It looks for arithmetic operations and splits the task accordingly.  In a
        real agent, planning would involve chain‑of‑thought reasoning or
        algorithmic decomposition.
        """
        import re
        tasks = []
        match = re.match(r".*?(\d+)\s*\*\s*(\d+).*", prompt)
        if match:
            a, b = int(match.group(1)), int(match.group(2))
            tasks.append({"type": "tool", "name": "multiply", "args": (a, b)})
        else:
            tasks.append({"type": "generate", "text": prompt})
        return tasks

    def run(self, prompt: str) -> Any:
        # store in memory
        self.memory.append(prompt)
        steps = self.plan(prompt)
        print("step",steps)
        results = []
        for step in steps:
            if step["type"] == "tool":
                tool_name = step["name"]
                args = step["args"]
                if tool_name in self.tool:
                    results.append(self.tool[tool_name](*args))
            elif step["type"] == "generate":
                text = step["text"]
                response = self.generator(text, do_sample=True)[0]["generated_text"]
                results.append(response)
                self.memory.append(response)
        return results


def main() -> None:
    # Demonstrate the simple LLM application
    text = (
        "Language models enable developers to build applications that can perform"
        " various natural language processing tasks. In this example we will"
        " summarize this text."
    )
    print("Simple LLM application:\n", simple_llm_application(text))

    # Demonstrate the basic agent
    agent = BasicAgent()
    print("\nBasic agent summary:\n", agent.run("summarize: " + text))
    print("\nBasic agent translation:\n", agent.run("translate: Hello, world!"))
    print("\nBasic agent sentiment:\n", agent.run("classify sentiment: I love this!"))

    # Predefined workflow
    workflow = PredefinedWorkflow()
    print("\nPredefined workflow output:\n", workflow.run(text))

    # Autonomous agent
    autonomous = AutonomousAgent()
    print("\nAutonomous agent output (positive sentiment):\n", autonomous.run(text))
    negative_text = (
        "This movie was terrible. The plot was boring and the acting was mediocre."
    )
    print("\nAutonomous agent output (negative sentiment):\n", autonomous.run(negative_text))

    # Agent with tool
    print("\nAgent with tool (multiplication):\n", agent_with_tool("What is  2 * 3 ?"))
    print("\nAgent with tool (no math):\n", agent_with_tool(text))

    # Patterned agent
    patterned = PatternedAgent()
    print("\nPatterned agent arithmetic:\n", patterned.run("Compute 5 * 4"))
    print("\nPatterned agent generation:\n", patterned.run("Tell me a joke about cats"))


if __name__ == "__main__":
    main()

Device set to use cuda:0
Your max_length is set to 80, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)


Simple LLM application:
 Language models enable developers to build applications that can perform various natural language processing tasks. In this example we will summarize this text. In the next section, we will look at how a language model can be used to build an application.


Device set to use cuda:0
Device set to use cuda:0
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Your max_length is set to 142, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)



Basic agent summary:
 Language models enable developers to build applications that can perform various natural language processing tasks. In this example we will summarize this text. In the next section, we will look at how a language model can be used to build an application that can do various tasks in the natural language world.

Basic agent translation:
 Bonjour, le monde !

Basic agent sentiment:
 [{'label': 'POSITIVE', 'score': 0.9998764991760254}]


Device set to use cuda:0
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Your max_length is set to 80, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)



Predefined workflow output:
 {'summary': 'Language models enable developers to build applications that can perform various natural language processing tasks. In this example we will summarize this text. In the next section, we will look at how a language model can be used to build an application.', 'sentiment': {'label': 'POSITIVE', 'score': 0.9587017893791199}}


Device set to use cuda:0
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Device set to use cuda:0
Your max_length is set to 80, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)
Your max_length is set to 80, but your input_length is only 17. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=8)



Autonomous agent output (positive sentiment):
 {'summary': 'Language models enable developers to build applications that can perform various natural language processing tasks. In this example we will summarize this text. In the next section, we will look at how a language model can be used to build an application.', 'sentiment': {'label': 'POSITIVE', 'score': 0.9587017893791199}, 'translation': 'Dans cet exemple, nous résumerons ce texte. Dans la section suivante, nous examinerons comment un modèle de langue peut être utilisé pour construire une application.'}

Autonomous agent output (negative sentiment):
 {'summary': '"This movie was terrible. The plot was boring and the acting was mediocre," said one viewer. "It was just a terrible movie," said another.', 'sentiment': {'label': 'NEGATIVE', 'score': 0.9998108744621277}}

Agent with tool (multiplication):
 6


Device set to use cuda:0
Your max_length is set to 142, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)



Agent with tool (no math):
 Language models enable developers to build applications that can perform various natural language processing tasks. In this example we will summarize this text. In the next section, we will look at how a language model can be used to build an application that can do various tasks in the natural language world.


Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


step [{'type': 'tool', 'name': 'multiply', 'args': (5, 4)}]

Patterned agent arithmetic:
 [20]
step [{'type': 'generate', 'text': 'Tell me a joke about cats'}]

Patterned agent generation:
 ["Tell me a joke about cats and dogs. Does anything else bother you?\n\nThe most common reason for cat ownership is that of an animal's fear-based, self-serving, self-destructive behavior. The only reason cats and dogs"]


🚀 1. What Are Effective LLM Workflows?

An LLM workflow is a step-by-step process in which the model:

1. Gets input (like a user question).
2. Processes the task.
3. Uses tools if needed.
4. Checks the result.
5. Gives a reliable answer.

# 🤖 Designing Effective Workflows for LLMs (Large Language Models)

LLMs (like ChatGPT, Claude, or Mistral) are powerful, but to use them effectively in real-world applications, you need smart workflows. Here’s a simple guide.

---

## ✅ 1. What are the 5 Essential Design Patterns for Robust AI Systems?

| Pattern              | What It Means (Simple)                           | Example                                                                |
|----------------------|--------------------------------------------------|------------------------------------------------------------------------|
| **ReAct**            | Think, then act, then check                      | "What's the capital of Brazil?" → Think → Google → Answer              |
| **Plan-and-Execute** | Make a plan first, then follow steps             | "Plan my day" → List tasks → Schedule meetings                         |
| **FSM (LangGraph)**  | A flowchart of fixed steps and decisions         | "Upload file" → "Validate" → "Store"                                   |
| **Multi-Agent**      | Several LLMs with different jobs                 | One agent writes → One edits → One posts to blog                       |
| **RAG**              | Use external knowledge or database to answer     | "Who is CEO of Telstra?" → Search docs → Generate answer               |

---

## 🧪 2. How to Implement Validation & Quality Control?

To make sure LLMs produce good output:

1. **Self-check**  
   Ask the model to critique its own output.  
   _Example_: “Was the answer correct and complete?”

2. **Tool-based Validation**  
   Use code or APIs to check things.  
   _Example_: Use regex to validate an email address.

3. **Human-in-the-loop (HITL)**  
   Have humans approve critical outputs.  
   _Example_: Approve a legal summary before publishing.

4. **Voting / Redundancy**  
   Ask multiple times, compare results.  
   _Example_: Run 3 drafts and pick the best one.
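
Two of the checks above, tool-based validation and voting, are easy to sketch in code. The helpers below are illustrative assumptions: the email regex is deliberately simple and not a full address validator, and `generate_with_retry` takes any zero-argument callable in place of a real model call.

```python
import re
from collections import Counter
from typing import Callable, List

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_email(text: str) -> bool:
    """Tool-based validation: check output with code, not with the model."""
    return EMAIL_RE.match(text) is not None


def majority_vote(answers: List[str]) -> str:
    """Voting: return the most common answer after light normalization."""
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]


def generate_with_retry(
    generate: Callable[[], str],
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Call `generate` until `validate` accepts the output or attempts run out."""
    for _ in range(max_attempts):
        candidate = generate()
        if validate(candidate):
            return candidate
    raise ValueError(f"No valid output after {max_attempts} attempts")


print(is_valid_email("user@example.com"))
print(majority_vote(["Paris", "paris ", "Lyon"]))
```

When retries are exhausted, raising an error is one natural point to hand the case to a human reviewer.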

---

## 🏗️ 3. What Techniques Does Anthropic Recommend?

Anthropic recommends:

- Break problems into **clear, small tasks**
- Use **structured prompts** instead of one big one
- Let the model **review or refine its output**
- Use **transparent steps** that are easy to debug
- Add **guardrails** (rules, filters) to avoid mistakes

---

## 🔗 4. How to Orchestrate Multiple LLMs Together?

You can make multiple LLMs work like a team — each with a specific role. This approach makes your AI system smarter, more reliable, and easier to manage.

### 🔹 Step 1: Assign Roles

Give each LLM a clear task:

- **Writer** → Generates the initial content  
- **Reviewer** → Improves grammar, tone, or structure  
- **Fact Checker** → Verifies the correctness of information  
- **Action Agent** → Executes an action like sending an email or saving to a database

---

### 🔹 Step 2: Chain Them Together

Pass the output of one LLM to the next, like a relay race.

Writer → Reviewer → Fact Checker → Action Agent


Each step makes the result better, safer, or more accurate.

---

### 🔹 Step 3: Use a Controller

Use a central script, orchestration tool, or framework to manage the flow between LLMs.

**Example tools**:
- LangGraph (graph-based orchestration)
- CrewAI (multi-agent systems)
- Python scripts with conditional logic

---

### ✅ Example Workflow

**Goal**: Send a high-quality, personalized email using multiple LLMs.

1. **Writer Agent**: Drafts the email based on a customer support case.  
2. **Reviewer Agent**: Polishes tone and grammar.  
3. **Fact Checker Agent**: Verifies that product info is up to date.  
4. **Action Agent**: Sends the email via an API.

**Flow**: User Request → Writer → Reviewer → Fact Checker → Send Email
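
A minimal controller for this flow could look like the sketch below. Each role is a plain function standing in for an LLM call, `KNOWN_PRODUCTS` is a hypothetical catalog used by the fact checker, and the action agent only sets a flag where a real system would call an email API. All names are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Draft = Dict[str, object]

# Hypothetical product catalog used by the fact checker.
KNOWN_PRODUCTS = {"Widget Pro"}


def writer(draft: Draft) -> Draft:
    draft["text"] = f"hi {draft['customer']}, your issue with {draft['product']} is resolved"
    return draft


def reviewer(draft: Draft) -> Draft:
    text = str(draft["text"])
    draft["text"] = text[0].upper() + text[1:] + "."  # capitalize and add a period
    return draft


def fact_checker(draft: Draft) -> Draft:
    draft["facts_ok"] = draft["product"] in KNOWN_PRODUCTS
    return draft


def action_agent(draft: Draft) -> Draft:
    draft["sent"] = True  # a real action agent would call an email API here
    return draft


PIPELINE: List[Tuple[str, Callable[[Draft], Draft]]] = [
    ("writer", writer),
    ("reviewer", reviewer),
    ("fact_checker", fact_checker),
    ("action_agent", action_agent),
]


def run_pipeline(customer: str, product: str) -> Draft:
    """Controller: pass the draft along the chain, stopping if a check fails."""
    draft: Draft = {"customer": customer, "product": product, "sent": False}
    for name, step in PIPELINE:
        draft = step(draft)
        if draft.get("facts_ok") is False:
            draft["stopped_at"] = name
            break
    return draft


print(run_pipeline("Ana", "Widget Pro"))
```

Keeping the roles as separate functions means any one of them can be replaced without touching the controller.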


---

### 💡 Tip

Each LLM does one thing well. Keeping roles small and focused improves reliability and lets you swap or retrain parts independently.







In [None]:
from transformers import pipeline

# Load Hugging Face model (this will auto-download if not cached)
# max_length is not set here because call_llm() passes max_new_tokens per request
llm_pipeline = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
    torch_dtype="auto",
)

def call_llm(prompt: str, max_new_tokens: int = 512, temperature: float = 0.7) -> str:
    """Send prompt to HF model and return generated text."""
    prompt = f"<s>[INST] {prompt.strip()} [/INST]"
    response = llm_pipeline(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
        return_full_text=False
    )
    return response[0]["generated_text"].strip()


Device set to use cuda:0


# 1. Prompt Chaining (Sequential Workflows)

Simple Idea: Break a big task into smaller steps. The output of one step feeds the next.

Use case: Blog writing (outline → write post)


💡 Summary

This chain uses two prompts:

First: Static prompt — "Create an outline..."

Second: Dynamic prompt — built using the output from step 1

Each prompt is wrapped with [INST] ... [/INST] by call_llm() to fit Mistral's chat format.

In [None]:

import re

outline_prompt = "Create a numbered outline for a blog on the benefits of AI agents."
outline = call_llm(outline_prompt)
print("Generated Outline:\n", outline)

# Flexible section matcher
num_sections = len(re.findall(r"^\s*(\d+[\.\)]|[-•*])", outline, re.MULTILINE))
if num_sections < 3:
    raise ValueError(f"Outline has only {num_sections} sections. Expected at least 3.\nOutline:\n{outline}")

# Blog generation
post_prompt = f"Write a blog post using this outline:\n{outline}"
blog_post = call_llm(post_prompt, max_new_tokens=300)
print("\nGenerated Blog Post:\n", blog_post)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Generated Outline:
 I. Introduction

* Brief explanation of what AI agents are
* Importance of AI agents in today's technology landscape

II. Enhanced Productivity and Efficiency

* Automation of repetitive and mundane tasks
* Faster processing of large data sets
* Improved accuracy and consistency

III. Enhanced Customer Experience

* Personalized recommendations and interactions
* 24/7 availability and quick response times
* Multilingual and multichannel support

IV. Enhanced Decision Making

* Data analysis and trend identification
* Predictive analytics and forecasting
* Improved problem-solving abilities

V. Enhanced Creativity and Innovation

* Generation of new ideas and concepts
* Automated design and art creation
* Improved research and development

VI. Enhanced Safety and Security

* Monitoring and detection of threats
* Automated disaster response and recovery
* Improved risk assessment and management

VII. Enhanced Accessibility

* Assistance for individuals with disabiliti
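
Note that the outline above uses Roman-numeral headings with `*` sub-bullets, so a check that counts bullet lines measures sub-points rather than sections. Below is a minimal sketch of a stricter gate, assuming top-level headings start with an Arabic or Roman numeral followed by `.` or `)`. `count_top_level_sections` is a hypothetical helper, not part of the notebook.

```python
import re

# Assumption: a top-level heading looks like "1. Title", "2) Title" or "IV. Title".
HEADING = re.compile(r"^[ \t]*(?:\d+|[IVXLCDM]+)[.)][ \t]+\S", re.MULTILINE)


def count_top_level_sections(outline: str) -> int:
    """Count numbered or Roman-numeral headings, ignoring bullet sub-points."""
    return len(HEADING.findall(outline))


sample = """I. Introduction

* Brief explanation of what AI agents are

II. Enhanced Productivity and Efficiency

* Automation of repetitive and mundane tasks

III. Enhanced Customer Experience
"""
print(count_top_level_sections(sample))  # 3
```

A line such as "C. elegans is a model organism" would also count as a heading under this pattern, so treat it as a rough gate rather than a parser.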

# 2. Routing (Input Classification → Specialized Response)

Simple Idea: Identify the type of question and send it to the right handler or prompt.

Use case: Customer support (billing vs technical)

Example Code:

In [None]:
# router_example.py


def route_query(query: str) -> str:
    query_lower = query.lower()
    if any(word in query_lower for word in ["refund", "charge", "payment", "invoice"]):
        return "billing"
    elif any(word in query_lower for word in ["error", "bug", "issue", "crash"]):
        return "technical"
    return "general"

# Example query
query = "I need a refund for my last purchase."

# Routing
category = route_query(query)
print(f"Routed to category: {category}")

# Prompt based on category
prompt_map = {
    "billing": f"You are a helpful billing support agent. Assist the customer:\n{query}",
    "technical": f"You are a skilled technical support agent. Assist the customer:\n{query}",
    "general": f"You are a general support assistant. Help with the following:\n{query}"
}

# Run the LLM
response = call_llm(prompt_map[category])
print("\nLLM Response:\n", response)




Routed to category: billing

LLM Response:
 Of course, I'd be happy to help you with your refund request. Could you please provide me with your order number or the email address associated with the purchase, so I can look up the details of your order in our system? Additionally, it would be helpful if you could tell me why you are requesting a refund and whether there is any issue with the product or service you received. I will do my best to process your refund as quickly and efficiently as possible. Please keep in mind that refunds may take up to 5-7 business days to process, depending on your bank or payment method. Let me know if you have any questions or concerns.
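
Plain substring checks can misroute, because "error" also occurs inside "terror" and "charge" inside "discharge". Here is a hedged sketch of a variant that only matches keywords at the start of a word and picks the category with the most hits. `KEYWORDS` and `route_query_scored` are hypothetical names for this example, and the keyword lists are copied from the router above.

```python
import re
from typing import Dict, List

KEYWORDS: Dict[str, List[str]] = {
    "billing": ["refund", "charge", "payment", "invoice"],
    "technical": ["error", "bug", "issue", "crash"],
}

# One pattern per category; \b anchors each keyword to the start of a word,
# so "crashing" and "refunds" match but "terror" and "discharge" do not.
PATTERNS = {
    category: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")")
    for category, words in KEYWORDS.items()
}


def route_query_scored(query: str, default: str = "general") -> str:
    """Return the category with the most keyword hits, or `default` if none match."""
    text = query.lower()
    scores = {cat: len(pattern.findall(text)) for cat, pattern in PATTERNS.items()}
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best if scores[best] > 0 else default


print(route_query_scored("I need a refund for my last purchase."))
print(route_query_scored("The app keeps crashing with an error."))
print(route_query_scored("The terror of discharge paperwork."))
```

Prefix matching still has false positives (for example, "bugle" would count as a "bug" hit), so for ambiguous queries an LLM-based classifier may be a better router.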


In [None]:
import asyncio
import sys
import re
from transformers import pipeline

# --- Load Hugging Face Model ---
llm_pipeline = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    tokenizer="tiiuae/falcon-7b-instruct",
    device=-1  # -1 forces CPU (avoids the "meta" device error); use 0 for the first GPU
)

# --- Synchronous Model Call ---
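def _call_llm_sync(prompt: str, max_new_tokens: int = 256, temperature: float = 0.7) -> str:
    # Note: <s>[INST] ... [/INST] is Mistral's instruction format, carried over from the
    # earlier examples. Falcon-7B-Instruct does not document this format, so a plain
    # prompt may work as well or better here.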
    wrapped_prompt = f"<s>[INST] {prompt.strip()} [/INST]"
    response = llm_pipeline(
        wrapped_prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
        return_full_text=False
    )
    return response[0]["generated_text"].strip()


In [None]:
# --- Async Wrapper ---
async def async_call_llm(prompt: str, max_new_tokens: int = 256, temperature: float = 0.7) -> str:
    # Run the blocking pipeline call in the default thread pool so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _call_llm_sync, prompt, max_new_tokens, temperature)

# --- Summarize a Single Section ---
async def summarize(text_section: str) -> str:
    return await async_call_llm(f"Summarize this:\n{text_section}")

# --- Summarize Whole Document (Parallel) ---
async def summarize_document(doc: str) -> str:
    sections = [sec.strip() for sec in doc.split("\n\n") if sec.strip()]
    tasks = [summarize(sec) for sec in sections]
    partials = await asyncio.gather(*tasks)
    combined = "\n".join(partials)
    final_summary = await async_call_llm(f"Combine these into a final summary:\n{combined}")
    return final_summary

# --- Main ---
def run_summary_pipeline():
    long_doc = """
    AI agents are computer programs that can act autonomously on behalf of a user or another program.
    They are used in applications ranging from personal assistants to robotics.

    These agents can learn from their environment and improve their performance over time using machine learning.

    One of the biggest benefits of AI agents is their ability to automate repetitive or time-consuming tasks.

    However, challenges such as bias, interpretability, and safety must be addressed to ensure responsible AI use.
    """

    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None

    if loop and loop.is_running():
        # Jupyter/Colab or async environment
        import nest_asyncio
        nest_asyncio.apply()
        summary = asyncio.ensure_future(summarize_document(long_doc))
        loop.run_until_complete(summary)
        print("\n📄 Final Summary:\n", summary.result())
    else:
        # Script environment
        summary = asyncio.run(summarize_document(long_doc))
        print("\n📄 Final Summary:\n", summary)

# --- Entry Point ---
if __name__ == "__main__":
    run_summary_pipeline()

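
The split, summarize-in-parallel, then combine flow used above can be sketched without a model. This version also caps how many sections are processed at once, which can help when the underlying model or API cannot usefully serve many requests at the same time. `map_reduce`, `fake_summarize`, `fake_combine` and `max_concurrency` are hypothetical names for this example; the fake workers stand in for real LLM calls.

```python
import asyncio
from typing import Awaitable, Callable, List


async def map_reduce(
    sections: List[str],
    worker: Callable[[str], Awaitable[str]],
    reducer: Callable[[List[str]], Awaitable[str]],
    max_concurrency: int = 2,
) -> str:
    """Process sections concurrently (bounded), then combine the partial results."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(section: str) -> str:
        async with semaphore:  # at most `max_concurrency` workers run at once
            return await worker(section)

    # gather returns results in the same order as the inputs
    partials = await asyncio.gather(*(bounded(s) for s in sections))
    return await reducer(list(partials))


async def fake_summarize(text: str) -> str:
    # Stand-in for an LLM call: keep only the first sentence.
    await asyncio.sleep(0.01)
    return text.split(".")[0].strip() + "."


async def fake_combine(parts: List[str]) -> str:
    # Stand-in for the final "combine these summaries" prompt.
    return " ".join(parts)


doc = "Agents act autonomously. They use tools.\n\nAgents can automate tasks. This saves time."
sections = [s.strip() for s in doc.split("\n\n") if s.strip()]
result = asyncio.run(map_reduce(sections, fake_summarize, fake_combine))
print(result)
```

This runs as a plain script. Inside Jupyter or Colab, where an event loop is already running, `await map_reduce(...)` in a cell would replace the `asyncio.run` call.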
