# 10 Minutes to kanoa

This notebook is a quick tour of **kanoa**, an AI-powered analytics interpreter.

## Features Covered

1. **Multi-Backend Support**: Gemini, Claude, and OpenAI (vLLM/Molmo/Gemma 3)
2. **Knowledge Base Integration**: Text and PDF support
3. **Matplotlib Figure Interpretation**: Direct visualization analysis
4. **DataFrame Analysis**: Tabular data interpretation
5. **Cost Tracking**: Token usage and cost monitoring
6. **Caching**: Efficient context reuse

---

## Setup

First, let's import the necessary libraries and set up our environment.

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# Import kanoa
from kanoa import AnalyticsInterpreter

# Set plot style
plt.style.use("seaborn-v0_8-darkgrid")

print("✓ Setup complete!")

## 1. Basic Figure Interpretation with Gemini

Let's start with a simple example: interpreting a matplotlib figure using the Gemini backend.

In [None]:
# Create sample data
np.random.seed(42)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [45000, 52000, 48000, 61000, 58000, 67000]
costs = [32000, 35000, 33000, 40000, 38000, 42000]

# Create visualization
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(len(months))
width = 0.35

ax.bar(x - width / 2, sales, width, label="Sales", color="#2ecc71")
ax.bar(x + width / 2, costs, width, label="Costs", color="#e74c3c")

ax.set_xlabel("Month", fontsize=12)
ax.set_ylabel("Amount ($)", fontsize=12)
ax.set_title("Q1-Q2 2024 Sales vs Costs", fontsize=14, fontweight="bold")
ax.set_xticks(x)
ax.set_xticklabels(months)
ax.legend()
ax.grid(axis="y", alpha=0.3)

plt.tight_layout()
plt.show()

print("Figure created! Now let's interpret it...")

In [None]:
# Initialize interpreter with Gemini backend
interpreter = AnalyticsInterpreter(backend="gemini-3", track_costs=True)

# Interpret the figure
result = interpreter.interpret_figure(
    fig=fig,
    context="This shows our company's sales and costs for the first half of 2024",
    focus="Identify trends and calculate profit margins",
    display_result=True,
)

## 2. DataFrame Analysis

kanoa can also interpret tabular data directly.

In [None]:
# Create sample DataFrame
df = pd.DataFrame(
    {
        "Month": months,
        "Sales": sales,
        "Costs": costs,
        "Profit": [s - c for s, c in zip(sales, costs)],
        "Margin_%": [(s - c) / s * 100 for s, c in zip(sales, costs)],
    }
)

print("Financial Summary:")
display(df)

In [None]:
# Interpret the DataFrame
result = interpreter.interpret_dataframe(
    df=df,
    context="Monthly financial performance data",
    focus="Analyze profit trends and identify any concerning patterns",
    display_result=True,
)

## 3. Knowledge Base Integration

Let's create a simple knowledge base and see how it enhances interpretations.

In [None]:
# Create a temporary knowledge base directory
kb_dir = Path("./temp_kb")
kb_dir.mkdir(exist_ok=True)

# Write domain knowledge
kb_content = """# Company Context

## Business Model
We are a SaaS company selling data analytics software to enterprise clients.

## Key Metrics
- Target profit margin: 35-40%
- Seasonal patterns: Q2 typically sees 15-20% growth over Q1
- Cost structure: 60% fixed costs, 40% variable

## Strategic Goals
- Maintain profit margins above 35%
- Achieve 10% month-over-month growth
- Keep cost growth below revenue growth
"""

# Write with an explicit encoding so the file reads back the same on every platform
(kb_dir / "company_context.md").write_text(kb_content, encoding="utf-8")

print("✓ Knowledge base created!")

In [None]:
# Initialize interpreter with knowledge base
interpreter_with_kb = AnalyticsInterpreter(
    backend="gemini-3", kb_path=str(kb_dir), kb_type="text", track_costs=True
)

# Interpret with domain knowledge
result = interpreter_with_kb.interpret_dataframe(
    df=df,
    context="Monthly financial performance",
    focus="Evaluate performance against our strategic goals and key metric targets",
    display_result=True,
)
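
As a cross-check on the model's answer, the strategic goals in the knowledge base can be evaluated directly with pandas. This is a minimal sketch that does not use kanoa: the thresholds are taken from `company_context.md` above, while the column names `Growth_%`, `Cost_Growth_%`, and the `Meets_*` flags are illustrative choices, not part of any API.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [45000, 52000, 48000, 61000, 58000, 67000]
costs = [32000, 35000, 33000, 40000, 38000, 42000]

check = pd.DataFrame({"Month": months, "Sales": sales, "Costs": costs})
check["Margin_%"] = (check["Sales"] - check["Costs"]) / check["Sales"] * 100

# Month-over-month growth; January has no previous month, so it is NaN
check["Growth_%"] = check["Sales"].pct_change() * 100
check["Cost_Growth_%"] = check["Costs"].pct_change() * 100

# Goals from the knowledge base: margin above 35%, 10% MoM growth,
# and cost growth below revenue growth (NaN comparisons evaluate to False)
check["Meets_Margin"] = check["Margin_%"] > 35
check["Meets_Growth"] = check["Growth_%"] >= 10
check["Cost_Below_Revenue"] = check["Cost_Growth_%"] < check["Growth_%"]

print(check.round(1).to_string(index=False))
```

With these figures, only June clears the 35% margin target, and sales growth reaches 10% in February, April, and June. An interpretation that contradicts these numbers deserves a second look.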

## 4. Multi-Backend Comparison

Let's compare interpretations across different backends by sending the same figure, context, and focus to each one.

In [None]:
# Create a time series plot
fig2, ax2 = plt.subplots(figsize=(10, 6))

# Generate synthetic time series data
days = np.arange(30)
baseline = 100
trend = days * 0.5
seasonal = 10 * np.sin(2 * np.pi * days / 7)
noise = np.random.normal(0, 3, 30)
values = baseline + trend + seasonal + noise

ax2.plot(days, values, marker="o", linewidth=2, markersize=4, color="#3498db")
ax2.axhline(y=baseline, color="gray", linestyle="--", alpha=0.5, label="Baseline")
ax2.fill_between(days, baseline, values, alpha=0.2, color="#3498db")

ax2.set_xlabel("Days", fontsize=12)
ax2.set_ylabel("Metric Value", fontsize=12)
ax2.set_title("30-Day Performance Metric", fontsize=14, fontweight="bold")
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
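
Because this series is synthetic, its true structure is known: a baseline of 100, a trend of 0.5 per day, and a weekly cycle with amplitude 10. The sketch below recovers those components with an ordinary least-squares fit, which gives a reference to judge each backend's interpretation against. It assumes a fresh random seed, so the noise differs from the plot above; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: fresh seed, not the notebook's noise
days = np.arange(30)
values = 100 + 0.5 * days + 10 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 3, 30)

# Design matrix: intercept, linear trend, and one weekly sine/cosine pair
X = np.column_stack(
    [
        np.ones(len(days)),
        days,
        np.sin(2 * np.pi * days / 7),
        np.cos(2 * np.pi * days / 7),
    ]
)
coef, *_ = np.linalg.lstsq(X, values, rcond=None)
intercept, slope, a_sin, a_cos = coef

# Amplitude of the weekly component, independent of its phase
amplitude = np.hypot(a_sin, a_cos)

print(f"baseline ~ {intercept:.1f}, slope ~ {slope:.2f}/day, weekly amplitude ~ {amplitude:.1f}")
```

An interpretation that reports an upward trend of roughly half a unit per day with a seven-day cycle agrees with how the data was generated.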

### 4.1 Gemini Interpretation

In [None]:
gemini_interpreter = AnalyticsInterpreter(backend="gemini-3", track_costs=True)

print("=" * 60)
print("GEMINI INTERPRETATION")
print("=" * 60)

gemini_result = gemini_interpreter.interpret_figure(
    fig=fig2,
    context="Daily performance metric tracking",
    focus="Identify trends, patterns, and anomalies",
    display_result=True,
)

### 4.2 Claude Interpretation

In [None]:
# Note: Requires ANTHROPIC_API_KEY environment variable
try:
    claude_interpreter = AnalyticsInterpreter(backend="claude", track_costs=True)

    print("=" * 60)
    print("CLAUDE INTERPRETATION")
    print("=" * 60)

    claude_result = claude_interpreter.interpret_figure(
        fig=fig2,
        context="Daily performance metric tracking",
        focus="Identify trends, patterns, and anomalies",
        display_result=True,
    )
except Exception as e:
    print(f"⚠️ Claude backend not available: {e}")
    print("Set ANTHROPIC_API_KEY to enable Claude backend.")
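
Catching a broad `Exception` keeps the notebook running, but it also hides why the backend failed. One option is to check the environment up front. The helper below is a hypothetical convenience, not part of kanoa; it only assumes the `ANTHROPIC_API_KEY` variable named in the cell above.

```python
import os


def missing_env_vars(required):
    """Return the names in `required` that are unset or empty in the environment."""
    return [name for name in required if not os.environ.get(name)]


missing = missing_env_vars(["ANTHROPIC_API_KEY"])
if missing:
    print(f"Skipping Claude backend; missing: {', '.join(missing)}")
else:
    print("ANTHROPIC_API_KEY is set; the Claude backend can be initialized.")
```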

### 4.3 OpenAI / vLLM (Local) Interpretation

Connect to a local vLLM server hosting open-source models like **Ai2 Molmo** or **Gemma 3**.

This approach lets you run capable open models on your own infrastructure (via `kanoa-mlops`), so figures and data never leave your machines.

In [None]:
# Note: Requires local vLLM server running (see kanoa-mlops)
try:
    # Connect to local vLLM hosting Molmo or Gemma 3
    vllm_interpreter = AnalyticsInterpreter(
        backend="openai",
        api_base="http://localhost:8000/v1",
        model="allenai/Molmo-7B-D-0924",  # or 'google/gemma-3-12b-it'
        track_costs=True,
    )

    print("=" * 60)
    print("vLLM (Molmo/Gemma 3) INTERPRETATION")
    print("=" * 60)

    vllm_result = vllm_interpreter.interpret_figure(
        fig=fig2,
        context="Daily performance metric tracking",
        focus="Identify trends, patterns, and anomalies",
        display_result=True,
    )
except Exception as e:
    print(f"⚠️ vLLM backend not available: {e}")
    print("Ensure vLLM server is running on localhost:8000")
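
Before constructing the interpreter, it can help to confirm that the server is reachable and which model it serves. The sketch below queries the `/models` route that OpenAI-compatible servers such as vLLM expose. `list_served_models` is a hypothetical helper, not a kanoa function, and it assumes the server does not require an API key.

```python
import json
import urllib.request


def list_served_models(api_base, timeout=2.0):
    """Return model ids reported by an OpenAI-compatible server, or None if unreachable."""
    url = api_base.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("id") for m in payload.get("data", [])]


models = list_served_models("http://localhost:8000/v1")
if models is None:
    print("No server reachable at localhost:8000")
else:
    print(f"Served models: {models}")
```

The `model` argument passed to `AnalyticsInterpreter` should match one of the ids the server reports.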

## 5. Cost Tracking

With `track_costs=True`, each interpreter records its API calls, token counts, and estimated cost. `get_cost_summary()` returns these as a dictionary.

In [None]:
# Get cost summary for Gemini
gemini_costs = gemini_interpreter.get_cost_summary()

print("\n" + "=" * 60)
print("COST SUMMARY - GEMINI")
print("=" * 60)
print(f"Backend: {gemini_costs['backend']}")
print(f"Total API calls: {gemini_costs['total_calls']}")
print(f"Input tokens: {gemini_costs['total_tokens']['input']:,}")
print(f"Output tokens: {gemini_costs['total_tokens']['output']:,}")
print(f"Total cost: ${gemini_costs['total_cost_usd']:.4f}")
print(f"Average cost per call: ${gemini_costs['avg_cost_per_call']:.4f}")

# Compare with Claude if available
try:
    claude_costs = claude_interpreter.get_cost_summary()
    print("\n" + "=" * 60)
    print("COST SUMMARY - CLAUDE")
    print("=" * 60)
    print(f"Backend: {claude_costs['backend']}")
    print(f"Total API calls: {claude_costs['total_calls']}")
    print(f"Input tokens: {claude_costs['total_tokens']['input']:,}")
    print(f"Output tokens: {claude_costs['total_tokens']['output']:,}")
    print(f"Total cost: ${claude_costs['total_cost_usd']:.4f}")
    print(f"Average cost per call: ${claude_costs['avg_cost_per_call']:.4f}")
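except Exception as e:
    # Typically a NameError: claude_interpreter is undefined if the Claude cell failed
    print(f"⚠️ Claude cost summary not available: {e}")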
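
The two print blocks above differ only in the title and the dictionary they read, so they can be folded into one function. This is a sketch that assumes the summary has exactly the keys used above; `format_cost_summary` is a hypothetical helper and the sample numbers are made up for illustration, not real kanoa output.

```python
def format_cost_summary(costs, title):
    """Render a cost summary dictionary as a printable report."""
    rule = "=" * 60
    lines = [
        rule,
        f"COST SUMMARY - {title}",
        rule,
        f"Backend: {costs['backend']}",
        f"Total API calls: {costs['total_calls']}",
        f"Input tokens: {costs['total_tokens']['input']:,}",
        f"Output tokens: {costs['total_tokens']['output']:,}",
        f"Total cost: ${costs['total_cost_usd']:.4f}",
        f"Average cost per call: ${costs['avg_cost_per_call']:.4f}",
    ]
    return "\n".join(lines)


# Hypothetical values, shaped like the dictionary read in the cell above
sample = {
    "backend": "gemini-3",
    "total_calls": 2,
    "total_tokens": {"input": 3000, "output": 800},
    "total_cost_usd": 0.0124,
    "avg_cost_per_call": 0.0062,
}
print(format_cost_summary(sample, "GEMINI"))
```

In the notebook, the same call would take `gemini_interpreter.get_cost_summary()` in place of `sample`.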

## 6. Advanced: Complex Visualization

Let's test kanoa with a more complex multi-panel visualization.

In [None]:
# Create complex multi-panel figure
fig3, axes = plt.subplots(2, 2, figsize=(14, 10))
fig3.suptitle("Comprehensive Analytics Dashboard", fontsize=16, fontweight="bold")

# Panel 1: Line plot with confidence interval
x = np.linspace(0, 10, 100)
y = np.sin(x) + x / 5
# Illustrative fixed-width band (±0.3), not a statistically computed interval
y_upper = y + 0.3
y_lower = y - 0.3

axes[0, 0].plot(x, y, "b-", linewidth=2, label="Actual")
axes[0, 0].fill_between(x, y_lower, y_upper, alpha=0.3, label="95% CI")
axes[0, 0].set_title("Time Series with Confidence Interval")
axes[0, 0].set_xlabel("Time")
axes[0, 0].set_ylabel("Value")
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Panel 2: Scatter plot with trend
x_scatter = np.random.randn(100)
y_scatter = 2 * x_scatter + np.random.randn(100) * 0.5
axes[0, 1].scatter(x_scatter, y_scatter, alpha=0.6, c=y_scatter, cmap="viridis")
z = np.polyfit(x_scatter, y_scatter, 1)
p = np.poly1d(z)
axes[0, 1].plot(x_scatter, p(x_scatter), "r--", alpha=0.8, linewidth=2)
axes[0, 1].set_title("Correlation Analysis")
axes[0, 1].set_xlabel("Feature X")
axes[0, 1].set_ylabel("Feature Y")
axes[0, 1].grid(True, alpha=0.3)

# Panel 3: Distribution
data_dist = np.random.normal(100, 15, 1000)
axes[1, 0].hist(data_dist, bins=30, edgecolor="black", alpha=0.7, color="#e74c3c")
axes[1, 0].axvline(
    data_dist.mean(),
    color="blue",
    linestyle="--",
    linewidth=2,
    label=f"Mean: {data_dist.mean():.1f}",
)
axes[1, 0].set_title("Distribution Analysis")
axes[1, 0].set_xlabel("Value")
axes[1, 0].set_ylabel("Frequency")
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3, axis="y")

# Panel 4: Box plot comparison
data_groups = [
    np.random.normal(100, 10, 100),
    np.random.normal(110, 15, 100),
    np.random.normal(95, 8, 100),
    np.random.normal(105, 12, 100),
]
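axes[1, 1].boxplot(data_groups)
# Set tick labels separately: boxplot's `labels` keyword is deprecated since Matplotlib 3.9
axes[1, 1].set_xticklabels(["Group A", "Group B", "Group C", "Group D"])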
axes[1, 1].set_title("Group Comparison")
axes[1, 1].set_ylabel("Metric Value")
axes[1, 1].grid(True, alpha=0.3, axis="y")

plt.tight_layout()
plt.show()

In [None]:
# Interpret the complex dashboard
result = interpreter.interpret_figure(
    fig=fig3,
    context="Multi-panel analytics dashboard showing various statistical analyses",
    focus="Provide a comprehensive interpretation of all four panels, highlighting key insights and relationships",
    display_result=True,
)
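
The panels are also generated from known parameters, so a few reference statistics are easy to compute and compare with the interpretation. The sketch below does this for the correlation panel only. It assumes a fresh seed, so the exact values differ from the figure, but the slope should sit near 2 and the correlation near 0.97, since the data is `y = 2x` plus noise with standard deviation 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)  # assumption: fresh seed, not the notebook's draws
x_scatter = rng.standard_normal(100)
y_scatter = 2 * x_scatter + rng.standard_normal(100) * 0.5

# Least-squares line and Pearson correlation for the scatter panel
slope, intercept = np.polyfit(x_scatter, y_scatter, 1)
r = np.corrcoef(x_scatter, y_scatter)[0, 1]

print(f"fitted slope ~ {slope:.2f}, intercept ~ {intercept:.2f}, Pearson r ~ {r:.3f}")
```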

## 7. Cleanup

In [None]:
# Clean up temporary knowledge base
import shutil

if kb_dir.exists():
    shutil.rmtree(kb_dir)
    print("✓ Cleaned up temporary files")

## Summary

This notebook demonstrated the core capabilities of kanoa:

✓ **Multi-Backend Support**: Seamlessly switch between Gemini, Claude, and OpenAI-compatible servers (vLLM serving Molmo or Gemma 3)

✓ **Knowledge Base Integration**: Enhance interpretations with domain-specific context

✓ **Flexible Input Types**: Interpret matplotlib figures and DataFrames

✓ **Cost Tracking**: Monitor token usage and API costs

✓ **Easy Integration**: Simple API that works directly in Jupyter notebooks

### Next Steps

Explore more advanced features:
- [User Guide](../docs/source/user_guide/index.md)
- [API Reference](../docs/source/api/index.md)
- [Backend-specific features](../docs/source/user_guide/backends.md)

---

*For more information, see the [kanoa documentation](https://github.com/lhzn-io/kanoa).*