# LlamaStack Client Demo - Phase 1

## Phase 1 Configuration
- **1 Inference Provider**: vLLM (Llama 3.2-3B)
- **1 MCP Server**: Weather only

This notebook demonstrates client integration with a minimal LlamaStack distribution.

In [None]:
# Install required packages
%pip install -q requests

In [None]:
import requests

# LlamaStack endpoint (internal OpenShift service)
LLAMASTACK_URL = "http://lsd-genai-playground-service.my-first-model.svc.cluster.local:8321"

print(f"LlamaStack URL: {LLAMASTACK_URL}")

## 1. List Available Models

In Phase 1, we should see **only 1 LLM model** (vLLM).

In [None]:
response = requests.get(f"{LLAMASTACK_URL}/v1/models", timeout=10)
response.raise_for_status()  # fail early on a non-2xx response
models = response.json().get("data", [])

# Filter to LLM models only
llm_models = [m for m in models if m.get("model_type") == "llm"]

print(f"🤖 LLM Models Available: {len(llm_models)}")
print("=" * 50)
for m in llm_models:
    print(f"  • {m.get('identifier')} ({m.get('provider_id')})")
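
The "only 1 LLM model" expectation can also be checked explicitly rather than read off the printed list. The following is a minimal sketch, assuming the entries carry the same `model_type` and `identifier` fields used in the cell above; `check_llm_count` is a hypothetical helper name, and the sample records (including the embedding model and its provider) are made up for illustration.

```python
from typing import Dict, List


def check_llm_count(models: List[Dict], expected: int = 1) -> List[str]:
    """Return the LLM identifiers, raising if their count differs from `expected`."""
    llm_ids = [m.get("identifier") for m in models if m.get("model_type") == "llm"]
    if len(llm_ids) != expected:
        raise ValueError(
            f"Expected {expected} LLM model(s), found {len(llm_ids)}: {llm_ids}"
        )
    return llm_ids


# Illustrative records only; real entries come from GET /v1/models.
sample_models = [
    {
        "identifier": "vllm-inference/llama-32-3b-instruct",
        "model_type": "llm",
        "provider_id": "vllm-inference",
    },
    {
        "identifier": "example-embedding-model",  # hypothetical non-LLM entry
        "model_type": "embedding",
        "provider_id": "example-provider",
    },
]

print(check_llm_count(sample_models))
```

In the notebook, passing the `models` list from the cell above in place of `sample_models` would apply the same check to the live response.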

## 2. List MCP Servers (Tools)

In Phase 1, we should see **only Weather MCP**.

In [None]:
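response = requests.get(f"{LLAMASTACK_URL}/v1/tools", timeout=10)
response.raise_for_status()  # fail early on a non-2xx response
data = response.json()
# The endpoint may return a bare list or wrap it in {"data": [...]}
tools = data if isinstance(data, list) else data.get("data", [])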

# Group by toolgroup (MCP server)
toolgroups = {}
for t in tools:
    tg = t.get("toolgroup_id", "unknown")
    if tg not in toolgroups:
        toolgroups[tg] = []
    toolgroups[tg].append(t.get("name", "unknown"))

# Count MCP servers (exclude builtin)
mcp_servers = [tg for tg in toolgroups.keys() if tg.startswith("mcp::")]

print(f"🛠️ MCP Servers: {len(mcp_servers)}")
print(f"📊 Total Tools: {len(tools)}")
print("=" * 50)
for tg, tool_list in sorted(toolgroups.items()):
    icon = "🌤️" if "weather" in tg else "🔧"
    print(f"\n{icon} {tg} ({len(tool_list)} tools)")
    for tool in tool_list:
        print(f"   • {tool}")
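
The grouping and MCP filtering above can be factored into small functions that are easy to test without a running server. This is a sketch under stated assumptions: `group_tools` and `mcp_server_ids` are hypothetical helper names, and the toolgroup IDs and tool names in `sample_tools` are illustrative placeholders, not values confirmed by the deployment.

```python
from collections import defaultdict
from typing import Dict, List


def group_tools(tools: List[Dict]) -> Dict[str, List[str]]:
    """Map each toolgroup ID to the names of the tools it contains."""
    grouped = defaultdict(list)
    for tool in tools:
        group_id = tool.get("toolgroup_id", "unknown")
        grouped[group_id].append(tool.get("name", "unknown"))
    return dict(grouped)


def mcp_server_ids(grouped: Dict[str, List[str]]) -> List[str]:
    """Return the sorted toolgroup IDs that use the `mcp::` prefix."""
    return sorted(tg for tg in grouped if tg.startswith("mcp::"))


# Illustrative records only; real entries come from GET /v1/tools.
sample_tools = [
    {"toolgroup_id": "builtin::rag", "name": "example_rag_tool_a"},
    {"toolgroup_id": "builtin::rag", "name": "example_rag_tool_b"},
    {"toolgroup_id": "mcp::weather", "name": "example_weather_tool"},
]

grouped = group_tools(sample_tools)
print(mcp_server_ids(grouped))
print({tg: len(names) for tg, names in grouped.items()})
```

Using `defaultdict(list)` removes the explicit "create the list if missing" branch while producing the same mapping.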

## 3. Test Chat Completion with vLLM

Using the **only available model**: vLLM (Llama 3.2-3B)

In [None]:
# Use the only available model: vLLM
MODEL_ID = "vllm-inference/llama-32-3b-instruct"

payload = {
    "model": MODEL_ID,
    "messages": [
        {"role": "user", "content": "What is the capital of France? Answer in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
}

print(f"🤖 Using model: {MODEL_ID}")
print("=" * 50)

response = requests.post(
    f"{LLAMASTACK_URL}/v1/openai/v1/chat/completions",
    json=payload,
    timeout=60
)

if response.status_code == 200:
    result = response.json()
    # Fall back to an empty choice so a missing or empty "choices" list does not raise
    choices = result.get("choices") or [{}]
    content = choices[0].get("message", {}).get("content", "")
    print("\n📝 Response from vLLM (Llama 3.2-3B):")
    print(content)
else:
    print(f"❌ Error: {response.status_code} - {response.text}")
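
To try other prompts without retyping the request body, the payload can be built by a small function. The sketch below mirrors the fields used in the cell above (`model`, `messages`, `temperature`, `max_tokens`); `build_chat_payload` is a hypothetical helper name, and the defaults simply repeat the values from this notebook.

```python
from typing import Any, Dict


def build_chat_payload(
    model: str,
    prompt: str,
    temperature: float = 0.7,
    max_tokens: int = 256,
) -> Dict[str, Any]:
    """Build a single-turn chat completion request body."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


example_payload = build_chat_payload(
    "vllm-inference/llama-32-3b-instruct",
    "What is the capital of France? Answer in one sentence.",
)
print(example_payload["messages"][0]["content"])
```

The returned dictionary can be passed as `json=` to the same `requests.post` call shown above.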

## Phase 1 Summary

| Component | Count | Details |
|-----------|-------|---------|
| **LLM Models** | 1 | vLLM (Llama 3.2-3B) |
| **MCP Servers** | 1 | Weather |
| **Total Tools** | 3 | 2 built-in RAG + 1 Weather (MCP) |

---

### Next Step

After the admin applies the **Phase 2 configuration**, run the **phase2_client_demo.ipynb** notebook to see:
- 2 LLM models (vLLM + Azure OpenAI)
- 3 MCP servers (Weather + HR + Jira)