# OpenClio: Analyzing AI Agent System Prompts

This notebook demonstrates how to use OpenClio with Vertex AI to analyze a collection of system prompts for AI agents.

## Installation

Install OpenClio from GitHub:

In [None]:
!pip install git+https://github.com/ggilligan12/openclio.git

## Setup

In [None]:
import openclio
from sentence_transformers import SentenceTransformer

# Your GCP project ID
PROJECT_ID = "your-project-id"  # Replace with your actual project ID
MODEL_NAME = "gemini-1.5-flash"  # or "gemini-1.5-pro" for better quality

## Load Your System Prompts

Replace the example list below with your own system prompts. Each prompt must be a string.

In [None]:
# Example system prompts (replace with your actual data)
system_prompts = [
    "You are a helpful customer support agent for a SaaS company. Help users troubleshoot technical issues with patience and clarity. Always be professional and empathetic.",
    "You are a creative writing assistant. Help users brainstorm story ideas, develop characters, and refine their prose. Be encouraging and provide constructive feedback.",
    "You are a code review bot. Analyze code for security vulnerabilities, performance issues, and style violations. Provide specific, actionable feedback with examples.",
    "You are a financial advisor chatbot. Provide general financial education and guidance. Never give specific investment advice. Always include appropriate disclaimers.",
    "You are a language learning tutor. Help users practice conversation, explain grammar concepts, and provide vocabulary assistance. Be encouraging and patient.",
    # Add your system prompts here...
]

print(f"Loaded {len(system_prompts)} system prompts")
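If your prompts live in a file rather than in the notebook, a small loader can build the same list of strings. The sketch below assumes a JSON Lines file with one object per line and a `system_prompt` field; the function name `load_prompts`, the file layout, and the field name are all hypothetical and not part of OpenClio.

```python
import json
from pathlib import Path


def load_prompts(path, field="system_prompt"):
    """Read one JSON object per line and return unique, non-empty prompt strings.

    Hypothetical helper: the JSONL layout and the field name are assumptions.
    """
    prompts = []
    seen = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            text = str(record.get(field, "")).strip()
            if text and text not in seen:  # drop empties and exact duplicates
                seen.add(text)
                prompts.append(text)
    return prompts
```

With such a file in place, `system_prompts = load_prompts("prompts.jsonl")` would replace the hard-coded list. Dropping exact duplicates before the run avoids spending LLM calls on identical prompts.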

## Initialize Models

In [None]:
# Initialize Vertex AI LLM
llm = openclio.VertexLLMInterface(
    model_name=MODEL_NAME,
    project_id=PROJECT_ID,
    location="us-central1",
    max_output_tokens=1000,
    temperature=0.7,
)

# Initialize embedding model
embedding_model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')

print("✓ Models initialized")

## Run Clio Analysis

Running `runClio` will:
1. Extract facets from each system prompt (Primary Purpose, Domain, Key Capabilities, Interaction Style)
2. Cluster similar prompts
3. Build a hierarchy
4. Generate UMAP visualization

**Note**: This may take several minutes depending on the number of prompts and Vertex AI API rate limits.

In [None]:
results = openclio.runClio(
    facets=openclio.systemPromptFacets,
    llm=llm,
    embeddingModel=embedding_model,
    data=system_prompts,
    outputDirectory="./clio_output",
    displayWidget=True,
    llmBatchSize=10,  # Lower batch size for Vertex AI rate limits
    verbose=True,
)

## Explore Results

The widget above shows:
- **Left**: UMAP plot of all system prompts
- **Right Top**: Hierarchical tree of clusters
- **Right Bottom**: Text viewer for selected cluster

Click on clusters in the tree to view the system prompts in that cluster!

## Programmatic Access

You can also access the results programmatically:

In [None]:
# Get facet values for a specific prompt
prompt_idx = 0
print(f"System Prompt: {system_prompts[prompt_idx][:100]}...\n")
print("Extracted Facets:")
for fv in results.facetValues[prompt_idx].facetValues:
    print(f"  {fv.facet.name}: {fv.value}")
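To see which facet values dominate across the whole dataset, the same attributes can be tallied for every prompt. This is a minimal sketch that relies only on the attribute layout used above (`.facetValues`, `.facet.name`, `.value`); the helper `tally_facet_values` is hypothetical, and the demo data uses stand-in objects rather than real OpenClio results so the block runs on its own.

```python
from collections import Counter, defaultdict
from types import SimpleNamespace


def tally_facet_values(per_prompt_facets):
    """Count how often each value appears, grouped by facet name."""
    counts = defaultdict(Counter)
    for prompt_facets in per_prompt_facets:
        for fv in prompt_facets.facetValues:
            counts[fv.facet.name][fv.value] += 1
    return counts


def _fake(values):
    """Build a stand-in object with the same attribute layout (not an OpenClio type)."""
    return SimpleNamespace(facetValues=[
        SimpleNamespace(facet=SimpleNamespace(name=name), value=value)
        for name, value in values.items()
    ])


demo = [
    _fake({"Domain": "Customer support"}),
    _fake({"Domain": "Finance"}),
    _fake({"Domain": "Customer support"}),
]
counts = tally_facet_values(demo)
for facet_name, counter in counts.items():
    print(facet_name, counter.most_common())
```

With real results, `tally_facet_values(results.facetValues)` should work the same way, assuming every entry exposes the attributes shown in the cell above.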

In [None]:
# Explore the hierarchy for a specific facet
facet_idx = 0  # Assumed to be the Primary Purpose facet; the facet.name printed below confirms it
facet = results.facets[facet_idx]
print(f"Hierarchy for facet: {facet.name}\n")

if results.rootClusters[facet_idx]:
    for i, root_cluster in enumerate(results.rootClusters[facet_idx]):
        print(f"Top-level cluster {i+1}: {root_cluster.name}")
        if root_cluster.children:
            for child in root_cluster.children[:3]:  # Show first 3 children
                print(f"  └─ {child.name}")
else:
    print("No clusters for this facet")
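The cell above prints only two levels and the first three children. To print the whole hierarchy, a short recursive walk is enough. The sketch below assumes only that each cluster has a `name` and a `children` attribute, as used above; `DemoCluster` and `render_tree` are hypothetical stand-ins so the example is self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class DemoCluster:
    """Stand-in for a cluster node; only `name` and `children` are assumed."""
    name: str
    children: list = field(default_factory=list)


def render_tree(cluster, depth=0):
    """Return one indented line per cluster, visiting the tree depth-first."""
    lines = ["  " * depth + cluster.name]
    for child in (cluster.children or []):  # tolerate None or an empty list
        lines.extend(render_tree(child, depth + 1))
    return lines


root = DemoCluster("Assistants", [
    DemoCluster("Support", [DemoCluster("SaaS troubleshooting")]),
    DemoCluster("Education"),
])
print("\n".join(render_tree(root)))
```

Against real results, calling `render_tree(root_cluster)` for each entry in `results.rootClusters[facet_idx]` would give the full tree for that facet.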

## Tips

- **Rate Limits**: Vertex AI enforces rate limits. If you hit quota errors, set `llmBatchSize` to 10 or lower
- **Model Selection**: `gemini-1.5-flash` is faster and cheaper; `gemini-1.5-pro` gives better quality
- **Data Size**: Large datasets (>1000 prompts) can take a long time to process, so consider running them overnight
- **Checkpointing**: Results are cached in `outputDirectory`, so you can resume if interrupted
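
If quota errors persist even with a small batch size, one common pattern is to retry with exponential backoff. The sketch below is generic Python, not an OpenClio feature: `call_with_backoff` is a hypothetical helper, and this notebook does not say whether `runClio` already retries internally.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt plus jitter, then retry.

    Hypothetical helper. It catches every Exception for brevity; in practice,
    narrow this to the quota or rate-limit error your client actually raises.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the original error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

A call such as `call_with_backoff(lambda: openclio.runClio(...))` would wrap the whole run. Because results are cached in `outputDirectory`, a retried run should be able to pick up from the cached work rather than starting over.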