# AI Skills + Ollama (Local LLMs)

This notebook demonstrates how to use AI Skills with locally running Ollama models.

## Prerequisites

1. Install Ollama: https://ollama.ai
2. Pull a model: `ollama pull llama3.1`
3. Ensure Ollama is running: `ollama serve`
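
Before running the cells below, it can help to confirm that the server is reachable and the model has been pulled. The following is a minimal sketch using only the standard library. It assumes Ollama listens on its default address `http://localhost:11434` and serves the list of local models at `GET /api/tags`. The helper names `list_ollama_models` and `has_model` are illustrative and not part of AI Skills.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address; adjust if yours differs


def list_ollama_models(base_url=OLLAMA_URL, timeout=2.0):
    """Return the names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers invalid JSON
        return None
    return [m["name"] for m in payload.get("models", [])]


def has_model(names, wanted):
    """True if `wanted` is pulled, treating a missing tag as ':latest'."""
    return wanted in names or f"{wanted}:latest" in names


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable. Start it with: ollama serve")
elif not has_model(models, "llama3.1"):
    print("llama3.1 not found. Run: ollama pull llama3.1")
else:
    print("Ollama is running and llama3.1 is available.")
```

Returning `None` for an unreachable server, rather than an empty list, keeps "server down" distinct from "server up but no models pulled".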

## Setup

In [None]:
!pip install "aiskills[ollama,search]" -q

## Check Ollama Status

In [None]:
from aiskills.integrations import create_ollama_client

client = create_ollama_client(model="llama3.1")

# Check if model is available
if client.is_model_available():
    print("llama3.1 is available!")
else:
    print("llama3.1 not found. Run: ollama pull llama3.1")

# List available models
print("\nAvailable models:")
for model in client.list_local_models():
    print(f"  - {model['name']}")

## Quick Start: Tool Calling (Recommended)

For models that support tool calling (llama3.1, mistral, qwen2, etc.):

In [None]:
from aiskills.integrations import create_ollama_client

# Tool calling is auto-enabled for supported models
client = create_ollama_client(model="llama3.1")

response = client.chat("Help me debug a memory leak in Python")
print(response)

## Prompt Injection Mode

For models without tool support (codellama, phi, etc.), use prompt injection:

In [None]:
# Disable tools and use prompt injection instead.
# llama3.1 is reused here for convenience; substitute a model without tool support as needed.
client = create_ollama_client(model="llama3.1", use_tools=False)

# Skill content is injected into the prompt
response = client.chat_with_skill(
    skill_query="python debugging",
    user_message="How do I find where my memory is being leaked?"
)
print(response)
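
Conceptually, prompt injection means placing the skill text directly in the prompt rather than letting the model request it through a tool. The sketch below illustrates the idea with a plain string template. The function `build_injected_prompt`, its delimiters, and the sample skill text are all hypothetical; the template AI Skills actually uses is not shown in this notebook and may differ.

```python
def build_injected_prompt(skill_content, user_message, max_skill_chars=4000):
    """Prepend skill text to the user's question (hypothetical template, for illustration)."""
    skill_text = skill_content.strip()
    if len(skill_text) > max_skill_chars:
        # Trim long skills so the prompt leaves room for the question and the answer
        skill_text = skill_text[:max_skill_chars].rstrip() + "\n[skill truncated]"
    return (
        "Use the following reference material when answering.\n\n"
        f"--- SKILL ---\n{skill_text}\n--- END SKILL ---\n\n"
        f"Question: {user_message.strip()}"
    )


prompt = build_injected_prompt(
    # Placeholder skill text, not taken from a real skill
    skill_content="Use tracemalloc snapshots to compare allocations over time.",
    user_message="How do I find where my memory is being leaked?",
)
print(prompt)
```

Because the skill is part of the prompt, a single model call is enough, at the cost of spending context on the skill text for every request.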

## Direct Skill Operations

In [None]:
# List available skills
skills = client.list_skills()
print(f"Found {len(skills)} skills:\n")
for skill in skills[:5]:
    print(f"  - {skill['name']}: {skill['description'][:50]}...")

In [None]:
# Search for skills
results = client.search_skills("testing")
print(f"Found {results.total} testing-related skills:")
for r in results.results:
    print(f"  - {r['name']}")

In [None]:
# Use a skill directly
result = client.use_skill("write unit tests for Python")
if result.success:
    print(f"Skill: {result.skill_name}")
    print(f"Tokens: {result.tokens_used}")
    print(f"\nPreview:\n{result.content[:400]}...")
else:
    print("Skill lookup did not succeed.")

## Generation Mode

For code completions instead of chat:

In [None]:
client = create_ollama_client(model="llama3.1", use_tools=False)

# Generate code with skill context
code = client.generate_with_skill(
    skill_query="python testing",
    prompt="Write pytest tests for this function:\n\ndef add(a, b):\n    return a + b"
)
print(code)

## Different Models

Compare responses from different models:

In [None]:
models_to_try = ["llama3.1", "mistral", "qwen2"]
question = "What's the best way to handle errors in Python?"

for model_name in models_to_try:
    try:
        client = create_ollama_client(model=model_name)
        if client.is_model_available():
            print(f"\n{'='*50}")
            print(f"Model: {model_name}")
            print('='*50)
            response = client.chat(question)
            print((response[:500] + "...") if len(response) > 500 else response)
        else:
            print(f"\n{model_name}: Not installed")
    except Exception as e:
        print(f"\n{model_name}: Error - {e}")
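
When comparing models, response time is often as informative as the text itself. The sketch below adds two small helpers, `timed` and `preview`, both hypothetical names introduced here. It uses a stand-in function so that it runs without Ollama; in the notebook, `client.chat` would take the place of `fake_chat`.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def preview(text, limit=500):
    """Return text unchanged if it fits, otherwise the first `limit` characters plus '...'."""
    return text if len(text) <= limit else text[:limit] + "..."


def fake_chat(question):
    # Stand-in for client.chat so the example runs without a model
    return "Use specific exception types and avoid bare except clauses. " * 20


answer, seconds = timed(fake_chat, "What's the best way to handle errors in Python?")
print(f"{seconds:.3f}s  {preview(answer, limit=80)}")
```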

## Tool Calling Details

See exactly what tools are available:

In [None]:
from aiskills.integrations import get_ollama_tools

tools = get_ollama_tools()
print("Available tools for Ollama:\n")
for tool in tools:
    func = tool['function']
    print(f"Tool: {func['name']}")
    print(f"  Description: {func['description'][:80]}...")
    print(f"  Parameters: {list(func['parameters']['properties'].keys())}")
    print()
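
The loop above relies on each tool being a dictionary with a `function` entry that carries a `name`, a `description`, and JSON Schema `parameters`. As a sketch of that shape, the example below defines one such tool by hand and summarizes it. The tool name `use_skill`, its `context` parameter, and the helper `summarize_tool` are hypothetical placeholders; the real definitions come from `get_ollama_tools()`.

```python
import json

# Hypothetical tool definition, written only to illustrate the expected structure
example_tool = {
    "type": "function",
    "function": {
        "name": "use_skill",
        "description": "Find and load the skill that best matches a task description.",
        "parameters": {
            "type": "object",
            "properties": {
                "context": {
                    "type": "string",
                    "description": "Natural-language description of the task.",
                },
            },
            "required": ["context"],
        },
    },
}


def summarize_tool(tool):
    """Extract the tool name plus its declared and required parameter names."""
    func = tool["function"]
    params = func["parameters"]
    return {
        "name": func["name"],
        "parameters": list(params.get("properties", {})),
        "required": list(params.get("required", [])),
    }


print(json.dumps(summarize_tool(example_tool), indent=2))
```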

## Interactive Chat Loop

In [None]:
from aiskills.integrations import create_ollama_client

client = create_ollama_client(model="llama3.1")

print("Chat with Ollama + AI Skills (type 'quit' to exit)")
print("="*50)

while True:
    user_input = input("\nYou: ").strip()
    if user_input.lower() in ['quit', 'exit', 'q']:
        break

    response = client.chat(user_input)
    print(f"\nAssistant: {response}")
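
A bare `while True` loop around `input()` raises an exception if the kernel is interrupted or input ends. One possible variant, sketched below, wraps the loop in a function that takes the reply function and the I/O functions as arguments, so it exits cleanly and can be exercised without a model. The name `run_chat_loop` and its parameters are assumptions made for this example; in the notebook, `client.chat` would be passed as `reply_fn`.

```python
def run_chat_loop(reply_fn, read_line=input, write=print):
    """Read messages until a quit command or end of input; return the number of turns."""
    turns = 0
    while True:
        try:
            user_input = read_line("\nYou: ").strip()
        except (EOFError, KeyboardInterrupt):
            break  # end of input or an interrupt ends the session cleanly
        if user_input.lower() in {"quit", "exit", "q"}:
            break
        if not user_input:
            continue  # skip empty lines instead of sending them to the model
        write(f"\nAssistant: {reply_fn(user_input)}")
        turns += 1
    return turns


# Scripted demo with a stand-in reply function instead of a real model
scripted = iter(["hello", "", "quit"])
run_chat_loop(lambda msg: f"(echo) {msg}", read_line=lambda _prompt: next(scripted))
```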

## Performance Tips

1. **Use smaller models** for faster responses: `gemma2:2b`, `phi3:mini`
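2. **Prompt injection** is typically faster than tool calling for simple queries, since it avoids the extra round trip in which the model requests a tool and waits for the result
3. **Pre-load skills** if you know which ones you need, so repeated requests can reuse them instead of looking them up again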
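
One way to pre-load skills, sketched here under assumptions, is to cache lookups so that each query is resolved only once per session. `fetch_skill_content` is a stand-in written for this example; in the notebook it could call `client.use_skill(query)` and return the result's `content`. The cache itself is the standard library's `functools.lru_cache`.

```python
from functools import lru_cache

lookup_calls = []  # records every real lookup so the effect of the cache is visible


def fetch_skill_content(query):
    """Stand-in for an expensive skill lookup (hypothetical, for illustration)."""
    lookup_calls.append(query)
    return f"[skill content for: {query}]"


@lru_cache(maxsize=32)
def cached_skill(query):
    """Return skill content, performing the lookup only on the first request per query."""
    return fetch_skill_content(query)


# Warm the cache up front with the skills you expect to need
for q in ("python debugging", "python testing"):
    cached_skill(q)

cached_skill("python debugging")  # served from the cache, no new lookup
print(f"Lookups performed: {len(lookup_calls)}")  # prints: Lookups performed: 2
```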

## Troubleshooting

- **Connection refused**: Ensure `ollama serve` is running
- **Model not found**: Run `ollama pull <model>`
- **Slow responses**: Try a smaller model or use `chat_with_skill()` for simpler flows