# 🧩 Mini-Lab: Local LLM Setup with Ollama

**Module 1: Setup & Working Style for LLM Apps** | **Duration: ~20 min** | **Type: Mini-Lab**

---

## Learning Objectives

By the end of this mini-lab, you will be able to:

1. **Understand** open vs closed models
2. **Install and configure** Ollama for local LLM inference
3. **Use** local models through the OpenAI-compatible API

## Target Concepts

| Concept | Description |
|---------|-------------|
| Open vs Closed Models | Downloadable weights versus API-only access |
| Local LLMs | Running models locally for privacy and cost savings |
| Ollama | Easy-to-use local LLM deployment tool |

## 1. Open vs Closed Models

### 🔓 Open Source Models (Open Weights)

| Model | Provider | Parameters | Best For |
|-------|----------|------------|----------|
| **Llama 3.2** | Meta | 1B-90B | General purpose |
| **Mistral** | Mistral AI | 7B | Fast inference |
| **Phi-3** | Microsoft | 3.8B | Edge devices |
| **Qwen 2.5** | Alibaba | 0.5B-72B | Coding, math |

**Benefits:** ✅ Run locally, ✅ No per-request API fees, ✅ Privacy, ✅ Customizable

### 🔒 Closed/Proprietary Models (API Only)

| Model | Provider | Best For |
|-------|----------|----------|
| **GPT-4o** | OpenAI | Multimodal, reasoning |
| **Claude 3** | Anthropic | Analysis, long context |
| **Gemini Pro** | Google | Multimodal |

**Benefits:** ✅ Often the strongest quality, ✅ Easy to use, ✅ No local hardware needed

## 2. Installing Ollama

Ollama makes it easy to run LLMs locally: you pull and run models by name, much as Docker does with images.

### Installation

**macOS:**
```bash
brew install ollama
```

**Windows:**
1. Download from [ollama.ai/download](https://ollama.ai/download)
2. Run the installer
3. Ollama runs as a background service

**Linux:**
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

### Start Ollama Server

On Windows, Ollama starts automatically as a background service. On macOS or Linux, start the server manually if it is not already running:
```bash
ollama serve
```
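
Right after `ollama serve` starts, the server can take a moment before it accepts requests. A minimal sketch of a readiness check that polls until the server answers, using only the standard library. The helper name `wait_for_ollama` and its timeout values are illustrative assumptions; `/api/tags` is the same endpoint this lab queries when it checks the server.

```python
import time
import urllib.error
import urllib.request


def wait_for_ollama(base_url: str = "http://localhost:11434",
                    timeout_s: float = 10.0,
                    interval_s: float = 0.5) -> bool:
    """Poll the server until it answers with HTTP 200 or the timeout expires.

    Hypothetical helper for illustration; not part of Ollama itself.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server is not accepting connections yet
        time.sleep(interval_s)
    return False
```

Calling `wait_for_ollama()` before the first request avoids a spurious "server not running" result when the server was started only seconds earlier.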

In [None]:
import subprocess
import requests

def check_ollama():
    """Check if Ollama is installed and running."""
    
    print("🔍 Checking Ollama Installation")
    print("=" * 50)

    # Check that the CLI is on PATH and responds
    try:
        result = subprocess.run(["ollama", "--version"], capture_output=True, text=True, timeout=5)
        if result.returncode == 0:
            print(f"✅ Ollama CLI: {result.stdout.strip()}")
        else:
            print(f"❌ Ollama CLI returned an error: {result.stderr.strip()}")
            return False
    except FileNotFoundError:
        print("❌ Ollama not installed - download from ollama.ai")
        return False
    except subprocess.TimeoutExpired:
        # A slow CLI does not rule out a running server, so keep going
        print("⚠️ Ollama CLI did not respond within 5 seconds")

    # Check that the server answers on its default port
    try:
        response = requests.get("http://localhost:11434/api/tags", timeout=5)
    except requests.RequestException:
        print("❌ Ollama server not running")
        print("   Start with: ollama serve")
        return False

    if response.status_code != 200:
        print(f"❌ Ollama server returned HTTP {response.status_code}")
        return False

    print("✅ Ollama server running on port 11434")
    models = response.json().get("models", [])
    if models:
        print("\n📦 Installed models:")
        for m in models:
            size_gb = m.get("size", 0) / (1024**3)
            print(f"   - {m['name']} ({size_gb:.1f} GB)")
    else:
        print("\n📦 No models installed yet")
    return True

check_ollama()

## 3. Download a Model

### Recommended Models

| Model | Size | RAM | Best For |
|-------|------|-----|----------|
| `llama3.2:3b` | ~2GB | 4GB+ | Learning |
| `llama3.1:8b` | ~5GB | 8GB+ | General use |
| `qwen2.5-coder:7b` | ~4GB | 8GB+ | Coding |
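
The Size column can be approximated from the parameter count: the weights take roughly parameters × bits per weight ÷ 8 bytes. The sketch below assumes about 4.5 bits per weight, a rough figure for 4-bit quantized models and not a value reported by Ollama; actual downloads vary with the quantization scheme. The RAM column is higher than the weight size because the runtime also holds the context cache and working buffers.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.5 is an assumed average for 4-bit quantization.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Compare the estimates with the Size column in the table
for label, params in [("3B", 3), ("7B", 7), ("8B", 8)]:
    print(f"{label}: ~{estimate_weights_gb(params):.1f} GB of weights")
```

Passing `bits_per_weight=16` gives the size of unquantized half-precision weights, which shows why quantized builds are the practical choice on a laptop.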

### Download Command

Run in your terminal (not in Jupyter):

```bash
# Small model - good for testing
ollama pull llama3.2:3b

# Test it
ollama run llama3.2:3b "What is Python?"
```
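
The same download can also be triggered from Python through Ollama's REST API, which streams one JSON status object per line while pulling. A hedged sketch using only the standard library: `pull_model` and `format_pull_status` are hypothetical helper names, and the request body uses the `model` field (older server versions may expect `name` instead).

```python
import json
import urllib.request


def format_pull_status(line: bytes) -> str:
    """Turn one JSON status line from the pull stream into a readable message."""
    event = json.loads(line)
    status = event.get("status", "")
    total, completed = event.get("total"), event.get("completed")
    if total and completed is not None:
        return f"{status}: {100 * completed / total:.0f}%"
    return status


def pull_model(model: str, base_url: str = "http://localhost:11434") -> bool:
    """Request a model download and print progress; True if the stream ends in success."""
    request = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"model": model}).encode("utf-8"),  # Supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    last_status = ""
    with urllib.request.urlopen(request) as response:
        for line in response:  # The server streams one JSON object per line
            if line.strip():
                last_status = format_pull_status(line)
                print(last_status)
    return last_status == "success"
```

For example, `pull_model("llama3.2:3b")` would download the same model as the terminal command, provided the server is running.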

In [None]:
def list_models():
    """Return the names of locally installed Ollama models."""
    try:
        response = requests.get("http://localhost:11434/api/tags", timeout=5)
        if response.status_code == 200:
            return [m["name"] for m in response.json().get("models", [])]
    except requests.RequestException:
        pass  # Server unreachable: treat as no models available
    return []

models = list_models()
if models:
    print(f"✅ {len(models)} model(s) available: {models}")
else:
    print("⚠️ No models installed")
    print("   Run: ollama pull llama3.2:3b")

## 4. Using Ollama with Python

Ollama exposes an **OpenAI-compatible API** under `/v1`, so the same `openai` client code works once you point it at a different `base_url`.
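
To make "OpenAI-compatible" concrete, here is a sketch of the underlying HTTP request built with the standard library only. It assumes the server accepts the usual Chat Completions JSON body at `/v1/chat/completions`; `build_chat_request` and `chat_once` are hypothetical helper names, and the bearer token is a placeholder that a local server is not expected to check.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11434/v1") -> urllib.request.Request:
    """Build the POST request that an OpenAI-style client would send."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # Placeholder key
        },
    )


def chat_once(model: str, prompt: str) -> str:
    """Send one prompt and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The `openai` library wraps exactly this kind of request, which is why changing `base_url` is enough to switch backends.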

In [None]:
from openai import OpenAI

# Create client pointing to Ollama
ollama_client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Required but not used
)

print("✅ Ollama client initialized")
print("   Base URL: http://localhost:11434/v1")

In [None]:
from typing import Optional

def test_ollama(model_name: Optional[str] = None) -> bool:
    """Send a short prompt to a local model and report whether it succeeded."""

    # Auto-select the first installed model if none was given
    if model_name is None:
        available = list_models()
        if not available:
            print("❌ No models available")
            print("   Run: ollama pull llama3.2:3b")
            return False
        model_name = available[0]

    print(f"\n🧪 Testing: {model_name}")
    print("=" * 50)

    try:
        response = ollama_client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": "Say 'Hello from local LLM!' in exactly those words."}],
            max_tokens=30,
            temperature=0
        )

        print("✅ Success!")
        print(f"📤 Response: {response.choices[0].message.content}")
        return True

    except Exception as e:
        # Broad catch is deliberate: connection, missing-model, and API errors all mean "not working"
        print(f"❌ Error: {e}")
        return False

test_ollama()
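
Local models can take a while to finish a full reply, so printing text as it arrives often feels more responsive. A minimal sketch, assuming the server honors `stream=True` on the OpenAI-compatible endpoint; `stream_reply` is a hypothetical helper that accepts any `openai`-style client, such as the `ollama_client` created earlier.

```python
def stream_reply(client, model: str, prompt: str) -> str:
    """Print the reply piece by piece and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # Some chunks carry only metadata
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # Finish the line once the stream ends
    return "".join(parts)
```

A call such as `stream_reply(ollama_client, "llama3.2:3b", "What is Python?")` would print the answer incrementally instead of waiting for the whole response.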

## 5. Switching Between Local and Cloud

Because both backends speak the same API, one helper can return a client for either provider:

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI
from typing import Literal

load_dotenv()

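def get_client(provider: Literal["openai", "ollama"] = "openai") -> tuple[OpenAI, str]:
    """Return an LLM client and a default model name for the given provider."""

    if provider == "openai":
        client = OpenAI()  # Reads OPENAI_API_KEY from the environment
        model = "gpt-4o-mini"
    elif provider == "ollama":
        client = OpenAI(
            base_url="http://localhost:11434/v1",
            api_key="ollama",  # Required by the client, ignored by Ollama
        )
        models = list_models()
        # Fall back to the lab's default tag if the server reports no models
        model = models[0] if models else "llama3.2:3b"
    else:
        raise ValueError(f"Unknown provider: {provider!r}")

    return client, model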


def ask(question: str, provider: Literal["openai", "ollama"] = "openai") -> str:
    """Ask a question using the specified provider and return the answer text."""

    client, model = get_client(provider)
    print(f"\n🤖 Using: {provider} / {model}")

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        max_tokens=100,
        temperature=0.7
    )

    answer = response.choices[0].message.content
    print(f"💬 {answer}")
    return answer

In [None]:
# Test with OpenAI (requires OPENAI_API_KEY in your environment or .env file)
ask("What is 2 + 2?", provider="openai")

In [None]:
# Test with Ollama (if available)
if list_models():
    ask("What is 2 + 2?", provider="ollama")
else:
    print("⚠️ No Ollama models - skipping")
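
One way to switch providers without editing code is to read the choice from the environment. A small sketch under stated assumptions: the variable name `LLM_PROVIDER` and the helper `resolve_provider` are hypothetical, not something the `openai` library or Ollama defines.

```python
import os

KNOWN_PROVIDERS = ("openai", "ollama")


def resolve_provider(default: str = "ollama") -> str:
    """Pick the provider from the LLM_PROVIDER environment variable (hypothetical name)."""
    name = os.getenv("LLM_PROVIDER", default).strip().lower()
    if name not in KNOWN_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {KNOWN_PROVIDERS}, got {name!r}")
    return name
```

With `LLM_PROVIDER=openai` in the `.env` file that `load_dotenv()` already reads, `ask("What is 2 + 2?", provider=resolve_provider())` would use the cloud; removing the line falls back to the local model.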

## 6. When to Use Each

### Use **Local (Ollama)** When:
- 🔒 Processing sensitive data
- 💰 High volume, simple tasks
- 🌐 Offline environments
- 🧪 Learning & experimentation

### Use **Cloud (OpenAI)** When:
- 🎯 Maximum quality needed
- 🚀 Production applications
- 📱 Limited local hardware
- 📊 Complex reasoning tasks
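
The two lists above can be read as a simple routing rule. The sketch below is one possible encoding, not a definitive policy: the function name `choose_provider`, its flags, and the priority order (privacy and offline constraints win over quality) are assumptions made for illustration.

```python
def choose_provider(sensitive_data: bool = False,
                    offline: bool = False,
                    complex_reasoning: bool = False,
                    limited_hardware: bool = False) -> str:
    """Map the criteria from the lists above to a provider name."""
    # Hard constraints first: data must stay on the machine, or there is no network
    if sensitive_data or offline:
        return "ollama"
    # Otherwise prefer the cloud when quality or hardware is the bottleneck
    if complex_reasoning or limited_hardware:
        return "openai"
    # Default for simple, high-volume, or exploratory work
    return "ollama"
```

The return value matches the provider names used by `get_client`, so it can be passed straight to `ask`.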

## 🎯 Summary

### Key Takeaways

1. **Open vs Closed Models**
   - Open: Llama, Mistral, Phi, Qwen - run locally, customize
   - Closed: GPT-4o, Claude, Gemini - API only, often the strongest quality

2. **Ollama Setup**
   - Install from ollama.ai
   - Pull models: `ollama pull llama3.2:3b`
   - OpenAI-compatible API on port 11434

3. **Python Integration**
   - Same `openai` library, different `base_url`
   - Easy to switch between local and cloud

### Setup Checklist

- [ ] Ollama installed and running
- [ ] At least one model downloaded
- [ ] Tested both OpenAI and Ollama

### Next Steps

- **lab-hello-llm**: Build a CLI that works with both providers