# Notebook 03: LlamaStack Core Features

## 🎯 What is This Notebook About?

Welcome! This notebook introduces you to the **core features of LlamaStack** that make powerful autonomous agents possible.

**What we'll learn:**
1. **Simple Chat** - Basic LLM interactions
2. **RAG (Retrieval Augmented Generation)** - Enhancing LLMs with external knowledge
3. **MCP (Model Context Protocol)** - Integrating external tools and data sources
4. **Safety** - Content moderation and safety shields
5. **Evaluation** - Measuring and improving AI performance

**Why this matters:**
- Understanding these features helps you build better agents
- Each feature solves specific problems
- Combining them creates powerful autonomous systems
- These are the building blocks for advanced agents

---

## 📚 Learning Objectives

By the end of this notebook, you will:
- ✅ Understand each LlamaStack core feature
- ✅ Know when to use each feature
- ✅ See practical examples of each feature
- ✅ Be ready to combine features in advanced agents

---

## ⚙️ Setup

Let's start by setting up our environment and connecting to LlamaStack.


In [None]:
# Import required libraries
import os
import sys
from pathlib import Path

# Add src to path for imports
notebook_dir = Path().resolve()
src_path = notebook_dir.parent / 'src'
sys.path.insert(0, str(src_path))

# Import LlamaStack SDK
from llama_stack_client import LlamaStackClient

# Configuration
llamastack_url = os.getenv("LLAMA_STACK_URL", "http://localhost:8321")
model = os.getenv("LLAMA_MODEL", "ollama/llama3.2:3b")

print(f"📡 LlamaStack URL: {llamastack_url}")
print(f"🤖 Model: {model}")

# Initialize client
client = LlamaStackClient(base_url=llamastack_url)

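# Verify connection and check that the configured model is registered
try:
    models = client.models.list()
    print("✅ Connected to LlamaStack")
    print(f"   Available models: {len(models)}")
    # Depending on the client version, model objects expose `identifier` or `id`
    model_ids = [getattr(m, "identifier", None) or getattr(m, "id", None) for m in models]
    if model in model_ids:
        print(f"   Using model: {model}")
    else:
        print(f"⚠️  Model '{model}' is not registered; available: {model_ids}")
except Exception as e:
    print(f"❌ Cannot connect to LlamaStack: {e}")
    print("   Please ensure LlamaStack is running:")
    print("   python scripts/start_llama_stack.py")
    raise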


---

## Part 1: Simple Chat

### What is Chat?

**Chat completion** is the most basic way to interact with an LLM: you send a list of messages and receive the model's reply.

**Key Concepts:**
- **Messages** have roles: `system`, `user`, `assistant`
- **System messages** set the AI's behavior and personality
- **User messages** are your questions or requests
- **Assistant messages** are the model's earlier replies, sent back as conversation history in multi-turn chats
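
As a minimal sketch of how these roles fit together, the helper below assembles a message list in the `{"role": ..., "content": ...}` shape that the examples in this notebook pass to `client.chat.completions.create`. The function `build_messages` is hypothetical and not part of the LlamaStack SDK; it runs offline and only builds the list.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]

# This sketch only accepts the three roles described above;
# real chat APIs also define others (e.g. tool-result messages).
VALID_ROLES = {"system", "user", "assistant"}


def build_messages(system_prompt: str, history: Optional[List[Message]], user_input: str) -> List[Message]:
    """Hypothetical helper: system message first, then earlier turns, then the new user message."""
    history = history or []
    for msg in history:
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"unexpected role: {msg.get('role')!r}")
    return [{"role": "system", "content": system_prompt}, *history, {"role": "user", "content": user_input}]


# The assistant's earlier reply is replayed as history, which is how the model "remembers" it
messages = build_messages(
    "You are a helpful assistant.",
    [
        {"role": "user", "content": "My name is Alice."},
        {"role": "assistant", "content": "Nice to meet you, Alice!"},
    ],
    "What did I just tell you my name was?",
)
for m in messages:
    print(f"{m['role']:>9}: {m['content']}")
```

The model itself is stateless between requests, so "memory" in a chat is simply this list growing turn by turn.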

**Use Cases:**
- Simple Q&A
- Conversational interfaces
- Text generation
- Basic reasoning tasks

Let's see it in action!


In [None]:
# Example 1: Basic Chat Completion
print("=" * 60)
print("Example 1: Basic Chat")
print("=" * 60)

response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful IT operations assistant."},
        {"role": "user", "content": "What is the capital of France? Answer in one sentence."}
    ],
)

print("\n✅ Chat response received!")
if hasattr(response, 'choices') and response.choices:
    content = response.choices[0].message.content
    print(f"\nResponse: {content}")
else:
    print(f"Response: {response}")


In [None]:
# Example 2: Multi-turn Conversation
print("=" * 60)
print("Example 2: Multi-turn Conversation")
print("=" * 60)

# Keep the whole conversation in one list; each request resends the full history
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Alice. What's my name?"},
]

# First turn
response1 = client.chat.completions.create(model=model, messages=messages)
reply1 = response1.choices[0].message.content

print("Turn 1:")
print(f"  User: {messages[-1]['content']}")
print(f"  Assistant: {reply1}")

# Second turn: append the assistant's reply and the follow-up question
messages.append({"role": "assistant", "content": reply1})
messages.append({"role": "user", "content": "What did I just tell you my name was?"})
response2 = client.chat.completions.create(model=model, messages=messages)

print("\nTurn 2:")
print(f"  User: {messages[-1]['content']}")
print(f"  Assistant: {response2.choices[0].message.content}")


In [None]:
# Example 3: Streaming Response
print("=" * 60)
print("Example 3: Streaming Response")
print("=" * 60)

print("Question: Explain what an autonomous agent is in 2-3 sentences.\n")
print("Streaming response:")

stream = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful AI educator."},
        {"role": "user", "content": "Explain what an autonomous agent is in 2-3 sentences."}
    ],
    stream=True
)

full_response = ""
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
        print(content, end="", flush=True)
        full_response += content

print("\n\n✅ Streaming complete!")


---

## Part 2: RAG (Retrieval Augmented Generation)

### What is RAG?

**RAG** enhances LLMs by giving them access to external knowledge through **vector stores**.

**How it works:**
1. **Store documents** in a vector store (documents are split into chunks and each chunk is converted to an embedding)
2. **Search** the store with the user's query (the query is embedded and compared against the stored embeddings)
3. **Retrieve** the best-matching chunks as context
4. **Augment** the LLM prompt with the retrieved context
5. **Generate** a response using both the LLM's own knowledge and the retrieved context
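
To make the five steps concrete, here is a self-contained sketch that runs the whole loop in memory. It swaps real embeddings for a toy bag-of-words vector compared with cosine similarity, an assumption made only so the example runs without a model; a real vector store uses a learned embedding model and an index built for fast nearest-neighbour search. All names (`embed`, `ToyVectorStore`, `augment`) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        # Step 1: store the document alongside its vector
        self.items.append((doc, embed(doc)))

    def search(self, query: str, k: int = 2) -> List[Tuple[float, str]]:
        # Steps 2-3: embed the query, rank stored documents by similarity, keep the top k
        q = embed(query)
        scored = [(cosine(q, vec), doc) for doc, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def augment(query: str, hits: List[Tuple[float, str]]) -> str:
    # Step 4: put the retrieved text into the prompt; step 5 would send this to the LLM
    context = "\n".join(f"- {doc}" for _, doc in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


store = ToyVectorStore()
store.add("Restart the payment service with systemctl restart payments.")
store.add("Disk alerts fire when usage exceeds 90 percent.")
store.add("The on-call rotation changes every Monday.")

hits = store.search("How do I restart the payment service?", k=1)
print(augment("How do I restart the payment service?", hits))
```

The printed prompt is what step 5 would send as the user message in a chat completion.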

**Use Cases:**
- Answering questions about specific documents
- Knowledge bases and documentation
- Domain-specific information
- Up-to-date information not in training data

**Key Concepts:**
- **Vector Store**: Database of document embeddings
- **Embeddings**: Numerical representations of text
- **Retrieval**: Finding relevant documents
- **Context**: Retrieved information added to prompts

Let's explore RAG!


In [None]:
# Example 1: Check Vector Stores
print("=" * 60)
print("Example 1: Exploring Vector Stores")
print("=" * 60)

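# List existing vector stores (the SDK may return a page object whose items are in `.data`)
vector_stores = client.vector_stores.list()
stores = getattr(vector_stores, "data", vector_stores)
store_count = len(stores) if hasattr(stores, "__len__") else 0
print(f"Found {store_count} existing vector stores")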

# Show available methods
print(f"\nAvailable vector store operations:")
print(f"  - create: Create a new vector store")
print(f"  - list: List all vector stores")
print(f"  - search: Search for relevant documents")
print(f"  - files: Manage files in vector stores")


In [None]:
# Example 2: Create a Vector Store (if supported)
print("=" * 60)
print("Example 2: Creating a Vector Store")
print("=" * 60)

try:
    # Create a test vector store
    vs_response = client.vector_stores.create(
        name="test-rag-store",
        description="Test vector store for RAG demonstration"
    )
    
    store_id = vs_response.id if hasattr(vs_response, 'id') else 'created'
    print(f"✅ Created vector store: {store_id}")
    
    print("\n📝 To use RAG:")
    print("  1. Add files: client.vector_stores.files.create(vector_store_id=store_id, ...)")
    print("  2. Search: client.vector_stores.search(vector_store_id=store_id, query='...')")
    print("  3. Use retrieved context in chat completion")
    
    # Clean up the demo store so repeated runs don't accumulate stores
    if hasattr(vs_response, 'id'):
        try:
            client.vector_stores.delete(vs_response.id)
            print("\n🧹 Cleaned up test vector store")
        except Exception as cleanup_error:
            print(f"\n⚠️  Could not delete test vector store: {cleanup_error}")
            
except Exception as e:
    print(f"ℹ️  Vector store creation failed: {e}")
    print("\n📝 RAG Workflow:")
    print("  1. Create a vector store")
    print("  2. Add documents/files to the store")
    print("  3. Search for relevant documents")
    print("  4. Use retrieved context in chat completion")


---

## Part 3: MCP (Model Context Protocol)

### What is MCP?

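**MCP** is an open protocol for connecting LLM applications to external tools, data sources, and services. In LlamaStack, MCP servers are registered as tool groups and invoked through the tool runtime, so agents can call their tools.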
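
Under the hood, MCP messages are JSON-RPC 2.0: a client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`, passing the tool's `name` and `arguments`. The sketch below only builds and prints those request payloads. It does not open a connection, it omits the `initialize` handshake that starts a real session, and the tool name `get_pod_status` and its arguments are hypothetical.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # JSON-RPC requests carry an id so responses can be matched to them


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object (the transport, e.g. stdio or HTTP, is out of scope)."""
    request: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# 1) Ask the MCP server which tools it offers
list_req = jsonrpc_request("tools/list")

# 2) Call one tool by name with JSON arguments (hypothetical tool and arguments)
call_req = jsonrpc_request(
    "tools/call",
    {"name": "get_pod_status", "arguments": {"namespace": "payments"}},
)

print(json.dumps(list_req))
print(json.dumps(call_req, indent=2))
```

In practice LlamaStack's tool runtime speaks this protocol on the agent's behalf; the sketch only shows what travels over the wire.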

**Key Concepts:**
- **Tool Runtime**: Execution environment for tools
- **Tool Groups**: Collections of related tools
- **Tool Execution**: Running tools and getting results
- **External Integration**: Connecting to APIs, databases, services
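
The relationship between tool groups, a tool runtime, and tool execution can be sketched with a tiny in-process registry. This is a hypothetical illustration of the concepts, not LlamaStack's implementation: `ToolRuntime`, the `ops::monitoring` group, and its tools are made up, and the tools return stubbed data where real ones would call external APIs or MCP servers.

```python
from typing import Any, Callable, Dict, List


class ToolRuntime:
    """Hypothetical registry: each tool group maps tool names to Python callables."""

    def __init__(self) -> None:
        self.groups: Dict[str, Dict[str, Callable[..., Any]]] = {}

    def register_group(self, group_id: str, tools: Dict[str, Callable[..., Any]]) -> None:
        self.groups[group_id] = dict(tools)

    def list_tools(self, group_id: str) -> List[str]:
        return sorted(self.groups[group_id])

    def invoke(self, group_id: str, tool_name: str, **kwargs: Any) -> Any:
        # Look up the tool inside its group, then execute it with the given arguments
        tool = self.groups[group_id].get(tool_name)
        if tool is None:
            raise KeyError(f"{tool_name!r} not found in group {group_id!r}")
        return tool(**kwargs)


runtime = ToolRuntime()
runtime.register_group(
    "ops::monitoring",
    {
        "disk_usage": lambda host: {"host": host, "used_percent": 72},  # stubbed data
        "uptime": lambda host: {"host": host, "days": 14},  # stubbed data
    },
)
print(runtime.list_tools("ops::monitoring"))
print(runtime.invoke("ops::monitoring", "disk_usage", host="web-01"))
```

Grouping tools this way lets an agent be granted a whole capability set (here, monitoring) by group name rather than tool by tool.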

**Use Cases:**
- API integrations
- Database queries
- External service calls
- Custom tool execution

**Why it matters:**
- Agents can do more than just chat
- Access real-world data and services
- Extend agent capabilities
- Enable autonomous actions

Let's explore MCP!


In [None]:
# Example 1: Explore Tool Runtime
print("=" * 60)
print("Example 1: Tool Runtime")
print("=" * 60)

# Check tool runtime
if hasattr(client, 'tool_runtime'):
    print("✅ Tool runtime available")
    print(f"   Methods: {[x for x in dir(client.tool_runtime) if not x.startswith('_')]}")
    
    # List available tools
    try:
        tools = client.tool_runtime.list_tools()
        print(f"\n   Available tools: {len(tools) if hasattr(tools, '__len__') else 'N/A'}")
    except Exception as e:
        print(f"   ℹ️  Could not list tools (they may need to be configured): {e}")
else:
    print("⚠️  Tool runtime not available")


In [None]:
# Example 2: Tool Groups
print("=" * 60)
print("Example 2: Tool Groups")
print("=" * 60)

# Check tool groups
if hasattr(client, 'toolgroups'):
    toolgroups = client.toolgroups.list()
    group_count = len(toolgroups) if hasattr(toolgroups, '__len__') else 0
    print(f"Found {group_count} tool groups")
    
    print("\n📝 Tool groups organize related tools together")
    print("   - Tools can be grouped by functionality")
    print("   - Agents can access tools through tool groups")
    print("   - MCP servers can be registered as tool groups")
else:
    print("⚠️  Tool groups not available")


---

## Part 4: Safety

### What is Safety?

**Safety** features screen the content going into and coming out of a model so that harmful or policy-violating text can be blocked before it reaches users.

**Key Concepts:**
- **Safety Shields**: Filters that check content before/after generation
- **Content Moderation**: Detecting harmful or inappropriate content
- **Safety Policies**: Rules for what content is allowed
- **Safe AI Practices**: Best practices for responsible AI
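
A shield sits on both sides of the model call: an input shield screens the user's message before generation, and an output shield screens the reply afterwards. The sketch below shows that control flow with a keyword blocklist standing in for a real safety model such as Llama Guard; the blocklist, `check`, `shielded_chat`, and the `echo_model` stub are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical policy: a real shield would use a safety classifier, not keywords
BLOCKED_TERMS = ("rm -rf /", "drop database")


@dataclass
class ShieldResult:
    allowed: bool
    reason: Optional[str] = None


def check(text: str) -> ShieldResult:
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return ShieldResult(False, f"matched blocked term {term!r}")
    return ShieldResult(True)


def shielded_chat(user_input: str, generate: Callable[[str], str]) -> str:
    # Input shield: refuse before the model ever sees the message
    verdict = check(user_input)
    if not verdict.allowed:
        return f"[blocked input: {verdict.reason}]"
    reply = generate(user_input)
    # Output shield: screen what the model produced before returning it
    verdict = check(reply)
    if not verdict.allowed:
        return f"[blocked output: {verdict.reason}]"
    return reply


def echo_model(prompt: str) -> str:
    return f"You asked: {prompt}"  # stand-in for an LLM call


print(shielded_chat("How do I check disk usage?", echo_model))
print(shielded_chat("Please run rm -rf / on prod", echo_model))
```

Checking both directions matters because a harmless prompt can still produce an unsafe reply.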

**Use Cases:**
- Filtering harmful content
- Preventing inappropriate responses
- Ensuring compliance
- Protecting users

**Why it matters:**
- Essential for production systems
- Protects users and organizations
- Ensures responsible AI use
- Required for many applications

Let's explore Safety!


In [None]:
# Example 1: Safety API
print("=" * 60)
print("Example 1: Safety Infrastructure")
print("=" * 60)

if hasattr(client, 'safety'):
    print("✅ Safety API available")
    print(f"   Methods: {[x for x in dir(client.safety) if not x.startswith('_')]}")
    
    print("\n📝 Safety Features:")
    print("  - Safety shields: Filter content")
    print("  - Content moderation: Check for harmful content")
    print("  - Safety policies: Define what's allowed")
    print("  - Integration: Shields can run as agent input/output shields")
else:
    print("⚠️  Safety API not available")


In [None]:
# Example 2: Content Moderation (if shield configured)
print("=" * 60)
print("Example 2: Content Moderation")
print("=" * 60)

test_text = "This is a test message to check safety filters."

# Moderation is routed to a registered safety shield, so the model should be a safety
# model (e.g. Llama Guard) rather than the chat model. LLAMA_SAFETY_MODEL is a
# hypothetical variable for this notebook; it falls back to the chat model.
safety_model = os.getenv("LLAMA_SAFETY_MODEL", model)

if hasattr(client, 'moderations'):
    try:
        moderation = client.moderations.create(
            input=test_text,
            model=safety_model
        )
        print("✅ Moderation check successful!")
        print(f"   Input: {test_text}")
        if hasattr(moderation, 'results') and moderation.results:
            result = moderation.results[0]
            flagged = getattr(result, 'flagged', False)
            print(f"   Flagged: {flagged}")
            if flagged:
                categories = getattr(result, 'categories', {})
                print(f"   Categories: {categories}")
    except Exception as e:
        print("ℹ️  Moderation requires a configured safety shield")
        print(f"   Error: {e}")
        print("\n📝 To use safety:")
        print("  1. Register a safety shield (e.g. a Llama Guard model) with your stack")
        print("  2. Use client.safety.run_shield() to check content")
        print("  3. Attach shields to agents as input/output shields for automatic checks")
else:
    print("⚠️  Moderation API not available in this client version")


---

## Part 5: Evaluation

### What is Evaluation?

**Evaluation** measures how well your AI system performs.

**Key Concepts:**
- **Evaluation Dataset**: Test cases with inputs and expected outputs
- **Metrics**: Ways to measure performance (accuracy, BLEU, ROUGE, etc.)
- **Evaluation Jobs**: Running evaluations on datasets
- **Performance Tracking**: Monitoring improvements over time
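
A minimal sketch of an evaluation run: loop over a dataset, call the system under test, and score each output. It uses the same `input`/`expected_output` shape as the dataset defined later in this part, a stub with canned answers in place of the model, and two simple metrics: containment accuracy (does the expected answer appear in the output?) and a simplified ROUGE-1 F1 (unigram overlap). These are simplified stand-ins; library implementations of BLEU and ROUGE add tokenization rules, stemming, and other details. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_match(prediction: str, expected: str) -> float:
    """1.0 if the expected answer appears as a run of whole tokens in the prediction."""
    pred, exp = tokens(prediction), tokens(expected)
    n = len(exp)
    return float(any(pred[i:i + n] == exp for i in range(len(pred) - n + 1)))


def rouge1_f1(prediction: str, expected: str) -> float:
    """Unigram-overlap F1, a simplified ROUGE-1."""
    pred, ref = Counter(tokens(prediction)), Counter(tokens(expected))
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def evaluate(dataset: List[Dict[str, str]], system: Callable[[str], str]) -> Dict[str, float]:
    if not dataset:
        raise ValueError("dataset is empty")
    totals = {"accuracy": 0.0, "rouge1_f1": 0.0}
    for example in dataset:
        output = system(example["input"])
        totals["accuracy"] += contains_match(output, example["expected_output"])
        totals["rouge1_f1"] += rouge1_f1(output, example["expected_output"])
    # Average each metric over the dataset
    return {name: total / len(dataset) for name, total in totals.items()}


# Stub "model" with canned answers, so the example runs offline
canned = {
    "What is 2+2?": "2+2 is 4.",
    "What is the capital of France?": "The capital of France is Paris.",
    "What color is the sky?": "It is usually grey.",
}
dataset = [
    {"input": "What is 2+2?", "expected_output": "4"},
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What color is the sky?", "expected_output": "blue"},
]
print(evaluate(dataset, lambda q: canned[q]))
```

Here two of the three answers contain the expected text, so accuracy is 2/3; ROUGE-1 stays low because the correct answers are wrapped in extra words, which shows why the choice of metric matters.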

**Use Cases:**
- Measuring model performance
- Comparing different models
- Tracking improvements
- Quality assurance

**Why it matters:**
- Know if your system works well
- Identify areas for improvement
- Compare different approaches
- Ensure quality

Let's explore Evaluation!


In [None]:
# Example 1: Evaluation API
print("=" * 60)
print("Example 1: Evaluation Infrastructure")
print("=" * 60)

if hasattr(client, 'alpha') and hasattr(client.alpha, 'eval'):
    eval_api = client.alpha.eval
    print("✅ Evaluation API available")
    
    eval_methods = [x for x in dir(eval_api) if not x.startswith('_') 
                    and x not in ['with_raw_response', 'with_streaming_response']]
    print(f"   Available methods: {eval_methods}")
else:
    print("⚠️  Evaluation API not available")


In [None]:
# Example 2: Creating an Evaluation Dataset
print("=" * 60)
print("Example 2: Evaluation Dataset")
print("=" * 60)

# Create a simple evaluation dataset
test_data = [
    {
        "input": "What is 2+2?",
        "expected_output": "4"
    },
    {
        "input": "What is the capital of France?",
        "expected_output": "Paris"
    },
    {
        "input": "What color is the sky?",
        "expected_output": "blue"
    }
]

print(f"Defined a local evaluation dataset with {len(test_data)} examples:")
for i, example in enumerate(test_data, 1):
    print(f"  {i}. Input: '{example['input']}' → Expected: '{example['expected_output']}'")

# Method names and parameters differ between llama-stack-client versions,
# so this prints the general workflow rather than one exact call
print("\n📝 To run evaluation with LlamaStack (compare with the methods listed in Example 1):")
print("  1. Register the examples as a dataset")
print("  2. Register a benchmark that links the dataset to scoring functions")
print(f"  3. Run the benchmark against '{model}' through client.alpha.eval")
print("\n📝 Common Evaluation Metrics:")
print("  - accuracy: Correctness of responses")
print("  - bleu: N-gram overlap with a reference text (common for translation)")
print("  - rouge: Text overlap with a reference (common for summarization)")
print("  - custom: Your own metrics")


---

## Summary

### What We Learned

In this notebook, we explored five core LlamaStack features:

1. **Simple Chat** - Basic LLM interactions with messages and streaming
2. **RAG** - Enhancing LLMs with external knowledge through vector stores
3. **MCP** - Integrating external tools and data sources
4. **Safety** - Content moderation and safety shields
5. **Evaluation** - Measuring and improving AI performance

### When to Use Each Feature

- **Chat**: Simple Q&A, conversations, basic tasks
- **RAG**: When you need domain-specific or up-to-date knowledge
- **MCP**: When you need to interact with external systems
- **Safety**: Content filtering, which belongs in every production system
- **Evaluation**: Measuring performance, comparing models, quality assurance

### How Features Work Together

These features can be combined:
- **Agent + RAG**: Knowledge-augmented agents
- **Agent + MCP**: Agents with external tool access
- **Agent + Safety**: Safe action-taking
- **Agent + Eval**: Measured and improved performance
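
The combinations above amount to an ordering of steps around each model call. As a minimal sketch, with every component passed in as a plain function so any of them could be swapped for the real LlamaStack feature: shield the input, retrieve context, generate, shield the output, and record the turn for later evaluation. `Pipeline` and the lambda stubs are hypothetical; tool calls through MCP would happen inside `generate`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pipeline:
    is_safe: Callable[[str], bool]          # Safety: shield check
    retrieve: Callable[[str], List[str]]    # RAG: fetch relevant context
    generate: Callable[[str], str]          # Chat (MCP tool calls would happen here)
    log: List[Dict[str, str]] = field(default_factory=list)  # Eval: turns to score later

    def run(self, question: str) -> str:
        if not self.is_safe(question):
            answer = "Sorry, I can't help with that."
        else:
            context = "\n".join(self.retrieve(question))
            answer = self.generate(f"Context:\n{context}\n\nQuestion: {question}")
            if not self.is_safe(answer):
                answer = "Sorry, I can't share that response."
        self.log.append({"input": question, "output": answer})
        return answer


pipeline = Pipeline(
    is_safe=lambda text: "password" not in text.lower(),
    retrieve=lambda q: ["Backups run nightly at 02:00 UTC."],
    generate=lambda prompt: "Backups run nightly at 02:00 UTC.",  # stub LLM
)
print(pipeline.run("When do backups run?"))
print(pipeline.run("What is the admin password?"))
print(len(pipeline.log))
```

Passing components in as functions keeps each feature independently testable, and the recorded log has the same input/output shape an evaluation dataset needs.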

### Next Steps

In the next notebook, we'll see how to combine these features to build advanced autonomous agents!

---

## üéØ Key Takeaways

✅ Each feature solves specific problems  
✅ Features can be used independently or together  
✅ Understanding features helps build better agents  
✅ Safety and evaluation are essential for production  
✅ Next: Combining features in advanced agents
