# LlamaStack Client Integration Demo

This notebook demonstrates how to work with a LlamaStack Distribution on OpenShift AI through its HTTP API.

## What You'll Learn

1. **Connect to LlamaStack** - Verify the server is reachable with a health check
2. **List Available Models** - See vLLM and Azure OpenAI providers
3. **List MCP Servers** - Discover available tools
4. **Switch Providers** - Change from vLLM to Azure OpenAI
5. **Use MCP Tools** - Invoke tools directly through the tool runtime API

## Prerequisites

- LlamaStack Distribution deployed on OpenShift
- Access to the LlamaStack endpoint
- Python packages `llama-stack-client` and `requests` (both installed by the setup cell below)

## Setup

First, let's install the required packages and set up our environment.

In [None]:
# Install required packages
%pip install -q llama-stack-client requests python-dotenv

In [None]:
import os
import json
import requests
from typing import List, Dict

# LlamaStack endpoint - update this to your deployment
# For OpenShift internal access:
LLAMASTACK_URL = os.getenv("LLAMASTACK_URL", "http://lsd-genai-playground-service.my-first-model.svc.cluster.local:8321")

# For external access (if you have a route):
# LLAMASTACK_URL = "https://llamastack-route.apps.your-cluster.com"

print(f"LlamaStack URL: {LLAMASTACK_URL}")
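
If `LLAMASTACK_URL` is set with a trailing slash (easy to do when copying a route URL), the f-strings used throughout this notebook produce paths such as `...com//v1/health`, which the route or server may not normalize. A minimal sketch of a joining helper; `api_url` is a hypothetical name for illustration, not part of LlamaStack or `llama-stack-client`:

```python
def api_url(base: str, path: str) -> str:
    """Join a base URL and an API path with exactly one slash between them.

    Hypothetical helper for this notebook; not part of llama-stack-client.
    """
    return base.rstrip("/") + "/" + path.lstrip("/")


# Both spellings of the base URL resolve to the same endpoint.
print(api_url("https://llamastack-route.apps.your-cluster.com/", "/v1/health"))
print(api_url("http://localhost:8321", "v1/models"))
```

`urllib.parse.urljoin` is avoided here on purpose: it treats a base without a trailing slash as a file and replaces its last path segment, and an absolute path such as `/v1/health` discards any path prefix on the base.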

## 1. Connect to LlamaStack

Let's verify we can connect to the LlamaStack server.

In [None]:
def check_health():
    """Check if LlamaStack is healthy."""
    try:
        response = requests.get(f"{LLAMASTACK_URL}/v1/health", timeout=5)
        if response.status_code == 200:
            print("✅ LlamaStack is healthy!")
            return True
        else:
            print(f"❌ Health check failed: {response.status_code}")
            return False
    except requests.RequestException as e:
        # Covers DNS failures, refused connections, and timeouts.
        print(f"❌ Connection error: {e}")
        return False

check_health()
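
A freshly (re)deployed distribution may need some time before its health endpoint answers. A minimal polling sketch; `wait_until_healthy`, the default attempt count, and the doubling backoff are illustrative choices, not LlamaStack defaults:

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10,
                       delay: float = 3.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` until it returns True or the attempts run out.

    `check` is any zero-argument callable, e.g. check_health() above.
    The wait doubles after each failed attempt (simple exponential backoff).
    """
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay)
            delay *= 2
    return False


# In the notebook (hypothetical usage): wait_until_healthy(check_health, attempts=5)
```

The `sleep` parameter exists only so the function can be exercised without real waiting; in the notebook you would leave it at the default.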

## 2. List Available Models

LlamaStack can have multiple inference providers. Let's see what models are available.

In [None]:
def get_models() -> List[Dict]:
    """Fetch all available models from LlamaStack."""
    response = requests.get(f"{LLAMASTACK_URL}/v1/models", timeout=10)
    if response.status_code == 200:
        data = response.json()
        return data.get("data", [])
    return []

models = get_models()

# Filter to LLM models only
llm_models = [m for m in models if m.get("model_type") == "llm"]

print(f"📊 Total models available: {len(models)}")
print(f"🤖 LLM models: {len(llm_models)}")
print()

# Group by provider
providers = {}
for m in llm_models:
    provider = m.get("provider_id", "unknown")
    if provider not in providers:
        providers[provider] = []
    providers[provider].append(m.get("identifier", m.get("model_id")))

print("Models by Provider:")
print("=" * 50)
for provider, model_list in providers.items():
    print(f"\n🔹 {provider} ({len(model_list)} models)")
    # Show first 5 models
    for model in model_list[:5]:
        print(f"   • {model}")
    if len(model_list) > 5:
        print(f"   ... and {len(model_list) - 5} more")
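
If a model ID does not match a registered identifier, the chat calls in section 4 fail with an error status. A hedged sketch that checks an ID against the `get_models()` output before sending a request and suggests near matches with `difflib.get_close_matches`; `require_model` is a hypothetical helper, and the sample entries only mimic the `identifier`/`provider_id`/`model_type` fields used above:

```python
import difflib
from typing import Dict, List


def require_model(models: List[Dict], model_id: str) -> Dict:
    """Return the model entry whose identifier equals model_id.

    Raises ValueError with up to three close matches if the ID is unknown.
    """
    by_id = {m.get("identifier", m.get("model_id")): m for m in models}
    if model_id in by_id:
        return by_id[model_id]
    known = [k for k in by_id if k]
    suggestions = difflib.get_close_matches(model_id, known, n=3)
    raise ValueError(f"Unknown model {model_id!r}; did you mean {suggestions}?")


# Illustrative entries shaped like /v1/models records.
sample_models = [
    {"identifier": "vllm-inference/llama-32-3b-instruct", "provider_id": "vllm-inference", "model_type": "llm"},
    {"identifier": "azure-openai/gpt-4.1-mini", "provider_id": "azure-openai", "model_type": "llm"},
]
print(require_model(sample_models, "azure-openai/gpt-4.1-mini")["provider_id"])
```

In the notebook you would pass `llm_models` instead of `sample_models`.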

## 3. List MCP Servers (Tools)

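MCP (Model Context Protocol) servers provide tools that the LLM can call. LlamaStack registers each MCP server as a tool group, so grouping tools by `toolgroup_id` shows which server provides each tool. Let's see what's available.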

In [None]:
def get_tools() -> List[Dict]:
    """Fetch all available tools from LlamaStack."""
    response = requests.get(f"{LLAMASTACK_URL}/v1/tools", timeout=10)
    if response.status_code == 200:
        data = response.json()
        if isinstance(data, list):
            return data
        return data.get("data", [])
    return []

tools = get_tools()

print(f"🛠️ Total tools available: {len(tools)}")
print()

# Group by toolgroup (MCP server)
toolgroups = {}
for t in tools:
    tg = t.get("toolgroup_id", "unknown")
    if tg not in toolgroups:
        toolgroups[tg] = []
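    # The tool name may be under "name" or "identifier" depending on the LlamaStack version.
    toolgroups[tg].append(t.get("name") or t.get("identifier", "unknown"))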

print("MCP Servers (Tool Groups):")
print("=" * 50)
for tg, tool_list in sorted(toolgroups.items()):
    icon = "🌤️" if "weather" in tg else "👥" if "hr" in tg else "📋" if "jira" in tg else "🐙" if "github" in tg else "🔧"
    print(f"\n{icon} {tg} ({len(tool_list)} tools)")
    for tool in tool_list:
        print(f"   • {tool}")

## 4. Chat Completion with Different Providers

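Now let's switch between providers. LlamaStack exposes an OpenAI-compatible chat completions endpoint (`/v1/openai/v1/chat/completions`), and the `model` field in the request decides which provider serves it.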

In [None]:
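from typing import Dict, List, Optional

def chat_completion(model_id: str, message: str, tools: Optional[List[Dict]] = None) -> Dict:
    """Send a chat completion request to LlamaStack's OpenAI-compatible endpoint.

    `tools`, if given, must be OpenAI-style function definitions.
    """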
    payload = {
        "model": model_id,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant with access to various tools."},
            {"role": "user", "content": message}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
    }
    
    if tools:
        payload["tools"] = tools
        payload["tool_choice"] = "auto"
    
    response = requests.post(
        f"{LLAMASTACK_URL}/v1/openai/v1/chat/completions",
        json=payload,
        timeout=60
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        return {"error": f"Status {response.status_code}: {response.text}"}

print("Chat completion function ready!")
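
`chat_completion` accepts a `tools` list, but it expects OpenAI-style function definitions, not the raw records returned by `get_tools()`. A sketch of the conversion; `to_openai_tool` is hypothetical, and it assumes each record carries its name under `name` or `identifier`, a `description`, and its arguments either as a JSON Schema (`input_schema` or `parameters`) or as a list of `{name, parameter_type, description, required}` entries. These field names differ between LlamaStack versions, so compare them with your server's `/v1/tools` output:

```python
import json
from typing import Dict


def to_openai_tool(tool: Dict) -> Dict:
    """Convert one /v1/tools record into an OpenAI function-tool definition.

    Field names on the input side are assumptions; adjust them to your server.
    """
    schema = tool.get("input_schema")
    params = tool.get("parameters")
    if not isinstance(schema, dict) and isinstance(params, dict):
        # Assumed: some versions already return a JSON Schema under "parameters".
        schema = params
    if not isinstance(schema, dict):
        # Build a JSON Schema object from a list of parameter records.
        properties, required = {}, []
        for p in params or []:
            properties[p["name"]] = {
                "type": p.get("parameter_type", "string"),
                "description": p.get("description", ""),
            }
            if p.get("required", False):
                required.append(p["name"])
        schema = {"type": "object", "properties": properties, "required": required}
    return {
        "type": "function",
        "function": {
            "name": tool.get("name") or tool.get("identifier"),
            "description": tool.get("description", ""),
            "parameters": schema,
        },
    }


# Illustrative record; real records come from get_tools().
sample_tool = {
    "name": "list_employees",
    "description": "List employees",
    "parameters": [{"name": "department", "parameter_type": "string",
                    "description": "Filter by department", "required": False}],
}
print(json.dumps(to_openai_tool(sample_tool), indent=2))
```

Filtering `tools` by `toolgroup_id` before converting keeps the request small, since every definition sent counts toward the model's context.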

### 4.1 Using Local vLLM (Llama 3.2-3B)

In [None]:
# Use local vLLM model
VLLM_MODEL = "vllm-inference/llama-32-3b-instruct"

print(f"ü§ñ Using model: {VLLM_MODEL}")
print("=" * 50)

response = chat_completion(
    model_id=VLLM_MODEL,
    message="What is the capital of France? Answer in one sentence."
)

if "error" in response:
    print(f"❌ Error: {response['error']}")
else:
    content = response.get("choices", [{}])[0].get("message", {}).get("content", "")
    print("\n📝 Response from vLLM (Llama 3.2-3B):")
    print(content)
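
The parsing line above assumes at least one entry in `choices` and reads only `content`; when tools are passed, the model may answer with `tool_calls` instead. A slightly more defensive sketch based on the standard OpenAI chat-completion response shape; `extract_reply` is a hypothetical helper:

```python
import json
from typing import Dict, List, Tuple


def extract_reply(response: Dict) -> Tuple[str, List[Dict]]:
    """Return (text, tool_calls) from an OpenAI-style chat completion dict.

    Each tool call is returned as {"id", "name", "arguments"}, with the
    JSON-encoded arguments string decoded into a dict.
    """
    choices = response.get("choices") or []
    if not choices:
        return "", []
    message = choices[0].get("message") or {}
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call.get("function", {})
        calls.append({
            "id": call.get("id"),
            "name": fn.get("name"),
            "arguments": json.loads(fn.get("arguments") or "{}"),
        })
    return message.get("content") or "", calls


# Illustrative response in which the model asks for a tool instead of answering.
sample = {"choices": [{"message": {"role": "assistant", "content": None, "tool_calls": [
    {"id": "call_1", "type": "function",
     "function": {"name": "list_employees", "arguments": "{}"}}]}}]}
text, calls = extract_reply(sample)
print(repr(text), calls)
```

The `{"error": ...}` dict returned by `chat_completion` on failure has no `choices`, so it yields `("", [])` rather than raising.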

### 4.2 Using Azure OpenAI (GPT-4.1-mini)

Now let's switch to Azure OpenAI - just change the model ID!

In [None]:
# Switch to Azure OpenAI - just change the model ID!
AZURE_MODEL = "azure-openai/gpt-4.1-mini"

print(f"ü§ñ Using model: {AZURE_MODEL}")
print("=" * 50)

response = chat_completion(
    model_id=AZURE_MODEL,
    message="What is the capital of France? Answer in one sentence."
)

if "error" in response:
    print(f"❌ Error: {response['error']}")
else:
    content = response.get("choices", [{}])[0].get("message", {}).get("content", "")
    print("\n📝 Response from Azure OpenAI (GPT-4.1-mini):")
    print(content)
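
Because switching providers only means changing the model ID, the ID can also be chosen from the `get_models()` output instead of being hard-coded. A sketch assuming the `provider_id`, `identifier`, and `model_type` fields used in section 2, and assuming the provider IDs match the model-ID prefixes used above (`vllm-inference`, `azure-openai`); `first_model_for` is a hypothetical helper:

```python
from typing import Dict, List, Optional


def first_model_for(models: List[Dict], provider_id: str) -> Optional[str]:
    """Return the first LLM identifier served by provider_id, or None."""
    for m in models:
        if m.get("model_type") == "llm" and m.get("provider_id") == provider_id:
            return m.get("identifier", m.get("model_id"))
    return None


# Illustrative entries shaped like /v1/models records.
sample_models = [
    {"identifier": "all-minilm", "provider_id": "sentence-transformers", "model_type": "embedding"},
    {"identifier": "vllm-inference/llama-32-3b-instruct", "provider_id": "vllm-inference", "model_type": "llm"},
    {"identifier": "azure-openai/gpt-4.1-mini", "provider_id": "azure-openai", "model_type": "llm"},
]
for provider in ("vllm-inference", "azure-openai"):
    print(provider, "->", first_model_for(sample_models, provider))
```

In the notebook this would look like `chat_completion(first_model_for(models, "azure-openai"), "...")`, with a check for `None` if the provider is not configured.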

## 5. Using MCP Tools

Let's test the MCP servers by invoking tools directly.

In [None]:
def invoke_tool(tool_name: str, kwargs: Dict) -> str:
    """Invoke a tool directly via LlamaStack."""
    response = requests.post(
        f"{LLAMASTACK_URL}/v1/tool-runtime/invoke",
        json={"tool_name": tool_name, "kwargs": kwargs},
        timeout=30
    )
    
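    if response.status_code != 200:
        return f"Error {response.status_code}: {response.text}"

    result = response.json()
    if result.get("error_message"):
        return f"Tool error: {result['error_message']}"
    # content may be a plain string or a list of items like {"type": "text", "text": "..."}
    content = result.get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list) and content:
        return "\n".join(
            item.get("text", str(item)) if isinstance(item, dict) else str(item)
            for item in content
        )
    return str(result)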

# Test HR MCP - List employees
print("👥 HR MCP Server - List Employees")
print("=" * 50)
result = invoke_tool("list_employees", {})
print(result[:1000])  # Show first 1000 chars

In [None]:
# Test GitHub MCP - Search repositories
print("🐙 GitHub MCP Server - Search Repositories")
print("=" * 50)
result = invoke_tool("search_repositories", {"query": "llamastack"})
print(result[:1500])

In [None]:
# Test Jira MCP - List projects
print("📋 Jira/Confluence MCP Server - List Projects")
print("=" * 50)
result = invoke_tool("list_projects", {})
print(result)
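
The cells above call tools by hand. In an agent-style flow the model picks the tool instead: you send tool definitions with the request, execute whatever `tool_calls` come back, and return each result as a `role: "tool"` message before asking the model again. A sketch of that execution step in the OpenAI message format; `run_tool_calls` is hypothetical, and its `invoke` argument stands in for the notebook's `invoke_tool`:

```python
import json
from typing import Callable, Dict, List


def run_tool_calls(assistant_message: Dict, invoke: Callable[[str, Dict], str]) -> List[Dict]:
    """Execute each tool call in an assistant message and build the replies.

    Returns the messages to append to the conversation: the assistant message
    itself, then one {"role": "tool"} message per call, linked by tool_call_id.
    """
    messages = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn.get("arguments") or "{}")
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": invoke(fn["name"], args),
        })
    return messages


def fake_invoke(name: str, kwargs: Dict) -> str:
    """Stand-in for invoke_tool so the sketch runs without a server."""
    return f"{name} called with {kwargs}"


assistant = {"role": "assistant", "content": None, "tool_calls": [
    {"id": "call_1", "type": "function",
     "function": {"name": "search_repositories", "arguments": "{\"query\": \"llamastack\"}"}}]}
for msg in run_tool_calls(assistant, fake_invoke):
    print(msg["role"], msg.get("content"))
```

To close the loop in the notebook you would pass `invoke_tool` as `invoke`, append the returned messages to the original conversation, and post it back to the chat completions endpoint. The existing `chat_completion` helper builds its own two-message list, so this needs a variant that accepts a full `messages` list.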

## 6. Summary

In this notebook, we demonstrated:

1. ✅ **Connecting to LlamaStack** - Simple HTTP API
2. ✅ **Listing Models** - Both vLLM and Azure OpenAI providers
3. ✅ **Listing MCP Servers** - Weather, HR, Jira, GitHub tools
4. ✅ **Switching Providers** - Just change the `model_id`!
5. ✅ **Using MCP Tools** - Direct invocation

### Key Takeaways

- **Provider Switching is Easy**: Just change the model ID from `vllm-inference/llama-32-3b-instruct` to `azure-openai/gpt-4.1-mini`
- **MCP Tools are Unified**: All tools are accessible through the same API regardless of which MCP server provides them
- **OpenAI-Compatible API**: The chat endpoint accepts OpenAI-format requests and returns OpenAI-format responses, so familiar OpenAI SDK patterns carry over to LlamaStack

In [None]:
# Final summary
print("📊 LlamaStack Configuration Summary")
print("=" * 50)
print(f"\n🔗 Endpoint: {LLAMASTACK_URL}")
print("\n🤖 Inference Providers:")
for provider in providers.keys():
    print(f"   • {provider}")
print("\n🛠️ MCP Servers:")
for tg in toolgroups.keys():
    print(f"   • {tg}")
print(f"\n📈 Total: {len(llm_models)} LLM models, {len(tools)} tools")