# When Data Processing Needs Knowledge

## The Problem: When Regex Fails

How can a script tell that these two headlines are about the same area?

- "Pacific Palisades evacuated due to fires"  
- "Los Angeles under emergency status"

**Traditional approach fails:**

```powershell
# Traditional regex approach - doesn't work!
$headline1 = "Pacific Palisades evacuated due to fires"
$headline2 = "Los Angeles under emergency status"

# Every check returns False: no pattern connects the two headlines
$headline1 -match "\bLA\b"             # False
$headline1 -match "Los Angeles"        # False
$headline1 -match "\bAngeles\b"        # False
$headline2 -match "Pacific Palisades"  # False
$headline1 -match "90272"              # False (Pacific Palisades ZIP code)
```
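The only way to rescue the pattern-matching approach is to supply the knowledge yourself. A minimal sketch, assuming a hand-maintained lookup table; `$neighborhoodToCity`, `Get-MentionedCity`, and `Test-SameArea` are hypothetical names for illustration, and every neighborhood you care about needs its own entry:

```powershell
# Hypothetical hand-maintained lookup table: neighborhood -> city
$neighborhoodToCity = @{
    'Pacific Palisades' = 'Los Angeles'
    'Venice'            = 'Los Angeles'
    'Hollywood'         = 'Los Angeles'
}

function Get-MentionedCity {
    param([string]$Headline)
    # Direct mention of the city itself
    if ($Headline -match '\bLos Angeles\b') { return 'Los Angeles' }
    # Otherwise look for any neighborhood we happen to know about
    foreach ($place in $neighborhoodToCity.Keys) {
        if ($Headline -match "\b$([regex]::Escape($place))\b") {
            return $neighborhoodToCity[$place]
        }
    }
    return $null
}

function Test-SameArea {
    param([string]$First, [string]$Second)
    $a = Get-MentionedCity -Headline $First
    $b = Get-MentionedCity -Headline $Second
    return ($null -ne $a) -and ($a -eq $b)
}

Test-SameArea "Pacific Palisades evacuated due to fires" "Los Angeles under emergency status"  # True
```

It works, but only for entries someone typed in: a Malibu headline returns False until Malibu is added, and "Venice" will also match a headline about Venice, Italy.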

### The Magic: AI with Structured Output

Let's ask our local model a simple true/false question using **structured output**:

```powershell
# Setup: make sure Ollama is running (ollama serve)
# and you've pulled the model (ollama pull llama3.1).
# JSON-schema structured output needs Ollama 0.5 or later.

# Demo 1: Simple true/false question
$question = "Is Pacific Palisades part of Los Angeles?"

# Define schema for true/false response
$schema = @{
    type = "object"
    properties = @{
        answer = @{
            type = "boolean"
        }
    }
    required = @("answer")
}
```
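The request body is plain JSON, so it can help to build it in one place and print it before sending. A minimal sketch; `New-OllamaChatBody` is a hypothetical helper, and it uses the same `model`, `messages`, `stream`, and `format` fields as the API call below:

```powershell
function New-OllamaChatBody {
    param(
        [Parameter(Mandatory)] [string]$Prompt,
        [Parameter(Mandatory)] [hashtable]$Schema,
        [string]$Model = "llama3.1"
    )
    # A generous -Depth keeps nested schema properties from being
    # truncated to strings like "System.Collections.Hashtable"
    @{
        model    = $Model
        messages = @(@{ role = "user"; content = $Prompt })
        stream   = $false
        format   = $Schema
    } | ConvertTo-Json -Depth 10
}

# Same schema as above, repeated so this block runs on its own
$schema = @{
    type       = "object"
    properties = @{ answer = @{ type = "boolean" } }
    required   = @("answer")
}

# Print the JSON payload to inspect exactly what will be sent
New-OllamaChatBody -Prompt "Is Pacific Palisades part of Los Angeles?" -Schema $schema
```

The resulting string can be passed straight to `Invoke-RestMethod -Body`.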

Now make the REST API call. (Yes! Under the hood, AI tools are just a bunch of JSON.)

```powershell
# Make the API call
# -Depth 10 leaves headroom so the nested schema is never truncated
$body = @{
    model = "llama3.1"
    messages = @(
        @{
            role = "user"
            content = $question
        }
    )
    stream = $false
    format = $schema
} | ConvertTo-Json -Depth 10

$response = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -Body $body -ContentType "application/json; charset=utf-8"

# Parse the structured response (message.content is a JSON string)
$answerData = $response.message.content | ConvertFrom-Json

"Question: $question"
"Answer: $($answerData.answer)"
```

Clean, structured data we can work with 🎉
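Structured output constrains the model, but a defensive script can still check each reply against the schema before acting on it. A minimal sketch, assuming PowerShell 7 (the `-Schema` parameter of `Test-Json` is not available in Windows PowerShell 5.1); the two reply strings are hypothetical stand-ins for `$response.message.content`:

```powershell
# Same true/false schema as above, serialized for Test-Json
$schema = @{
    type       = "object"
    properties = @{ answer = @{ type = "boolean" } }
    required   = @("answer")
}
$schemaJson = $schema | ConvertTo-Json -Depth 10

# Hypothetical replies standing in for $response.message.content
$goodReply = '{"answer": true}'
$badReply  = '{"answer": "yes"}'

# Test-Json returns $true or $false; SilentlyContinue hides the validation error record
Test-Json -Json $goodReply -Schema $schemaJson -ErrorAction SilentlyContinue
Test-Json -Json $badReply  -Schema $schemaJson -ErrorAction SilentlyContinue
```

Only replies that pass go on to `ConvertFrom-Json`; anything else can be retried or logged.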

### Real-World Example: MP3 Filename Cleaning

Let's see structured output tackle a practical problem: cleaning messy MP3 filenames using **world knowledge**.

```powershell
# Sample messy MP3 filenames
$messyMp3Files = @(
    "01_bohrhap_queen.mp3",
    "material_girl-madonna85.mp3",
    "hotel_cali_eagles1976.mp3",
    "IMAGINE-J-LENNON-track2.mp3",
    "hey_jude_(beetles)_1968_.mp3",
    "billiejean_MJ_thriller.mp3",
    "sweet_child_of_mine_gnr87.mp3",
    "shake_it_off-taylorswift.mp3",
    "purple-haze-jimmy_hendrix_1967.mp3",
    "bohemian(queen)rhaps.mp3",
    "smells_like_teen_spirit_nirvana91.mp3",
    "halo_beyonce_2008.mp3"
)

# Define the schema for clean artist/song extraction
$schema = @{
    type = "object"
    properties = @{
        artist = @{ type = "string" }
        song = @{ type = "string" }
    }
    required = @("artist", "song")
}

# The prompt that makes all the difference - includes examples!
$prompt = @"
You are an AI that extracts artist and song names from messy MP3 filenames.

Examples:
1. "hotel_cali_eagles1976.mp3" → {"artist": "Eagles", "song": "Hotel California"}
2. "rolling_in_the_deep-adele_2011.mp3" → {"artist": "Adele", "song": "Rolling in the Deep"}
3. "californication-RHCP.mp3" → {"artist": "Red Hot Chili Peppers", "song": "Californication"}

Now, extract from this filename:
"@

# Process each file individually ("asking tiny questions") and collect the results
$results = foreach ($file in $messyMp3Files) {
    # Create the message for the LLM
    $msg = "$prompt $file"

    # Create the payload with the schema object
    $body = @{
        model = "llama3.1"
        messages = @(@{role="user"; content=$msg})
        stream = $false
        format = $schema
    } | ConvertTo-Json -Depth 6 -Compress

    # Call the local LLM API (sent as UTF-8 JSON because the prompt contains '→')
    $response = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -Body $body -ContentType "application/json; charset=utf-8"
    $info = $response.message.content | ConvertFrom-Json

    # Emit one PowerShell object per file; foreach collects them into $results
    [pscustomobject]@{
        Old = $file
        New = "$($info.artist) - $($info.song).mp3"
    }
}

$results | Format-Table -AutoSize
```
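The loop only proposes new names; the model's output still needs a safety pass before it touches the file system. An artist such as AC/DC would put a `/` into the new name, which no file system accepts in a filename. A minimal sketch, assuming results shaped like the `Old`/`New` objects above; `ConvertTo-SafeFileName` is a hypothetical helper and the sample results are made up:

```powershell
function ConvertTo-SafeFileName {
    param([Parameter(Mandatory)] [string]$Name)
    # Characters the current OS forbids in file names (always includes '/')
    $invalid = [System.IO.Path]::GetInvalidFileNameChars()
    # Split on every invalid character and rejoin with '_'
    $Name.Split($invalid) -join '_'
}

# Hypothetical results in the same shape the loop above emits
$results = @(
    [pscustomobject]@{ Old = "hey_jude_(beetles)_1968_.mp3"; New = "The Beatles - Hey Jude.mp3" }
    [pscustomobject]@{ Old = "back_in_black-acdc.mp3";       New = "AC/DC - Back in Black.mp3" }
)

foreach ($r in $results) {
    $safeName = ConvertTo-SafeFileName -Name $r.New
    # Only preview renames for files that exist; -WhatIf changes nothing
    if (Test-Path -LiteralPath $r.Old) {
        Rename-Item -LiteralPath $r.Old -NewName $safeName -WhatIf
    }
}
```

Drop `-WhatIf` once the preview looks right.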

### Performance Reality Check

**Hardware matters A LOT with local models.**
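To measure how your own hardware performs, you can read the timing fields returned by a non-streaming `/api/chat` call. A minimal sketch, assuming the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields described in Ollama's API documentation; `Get-TokensPerSecond` is a hypothetical helper and the sample numbers are made up:

```powershell
function Get-TokensPerSecond {
    param([Parameter(Mandatory)] $Response)
    # Missing or zero duration: nothing to compute
    if (-not $Response.eval_duration) { return $null }
    # eval_duration is reported in nanoseconds
    [math]::Round($Response.eval_count / ($Response.eval_duration / 1e9), 1)
}

# Made-up sample shaped like an Ollama /api/chat response
$sample = [pscustomobject]@{ eval_count = 120; eval_duration = 3000000000 }
Get-TokensPerSecond -Response $sample   # 40
```

Against a real call, pass the `$response` from the cells above.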

### Why This Matters for PowerShell Automation

## 🆚 Traditional vs AI-Enhanced Data Processing

| Challenge                | Traditional Regex Approach | Local LLM Approach                 |
| ------------------------ | -------------------------- | ---------------------------------- |
| Geographic relationships | ❌ Impossible               | ✅ Knows Pacific Palisades ∈ LA     |
| Misspelled data          | ❌ Brittle rules            | ✅ Fixes "beetles" → "Beatles"      |
| Abbreviations            | ❌ Endless patterns         | ✅ Expands "MJ" → "Michael Jackson" |
| Context understanding    | ❌ No context               | ✅ Understands intent               |
| Domain knowledge         | ❌ Requires databases       | ✅ Built-in world knowledge         |

## 🔥 Benefits of Local LLMs

* 🌐 No internet required once the model is pulled
* 💰 No API costs
* 🔒 Complete privacy: your data never leaves your machine
* 🎯 Perfect for batch processing
* 🧠 Built-in world knowledge
* 📊 Returns clean PowerShell objects
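Because each answer comes back as a `[pscustomobject]`, batch results drop straight into the normal pipeline. A minimal sketch that saves a rename plan to CSV for review before a batch job applies it; the sample objects and the `rename-plan.csv` file name are hypothetical:

```powershell
# Hypothetical results in the Old/New shape produced by the MP3 loop
$results = @(
    [pscustomobject]@{ Old = "halo_beyonce_2008.mp3";        New = "Beyonce - Halo.mp3" }
    [pscustomobject]@{ Old = "shake_it_off-taylorswift.mp3"; New = "Taylor Swift - Shake It Off.mp3" }
)

# Save the plan; -NoTypeInformation stops Windows PowerShell 5.1 from writing a #TYPE line
$planPath = Join-Path ([System.IO.Path]::GetTempPath()) "rename-plan.csv"
$results | Export-Csv -LiteralPath $planPath -NoTypeInformation -Encoding UTF8

# Later (or in another job): load the plan back as objects
$plan = Import-Csv -LiteralPath $planPath
$plan | Format-Table -AutoSize
```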

### Model Recommendations

**Popular local models to try:**


| Model              | Ollama tag | Parameters    | Description |
|--------------------|------------|---------------|-------------|
| **Llama 3.1 (Meta)** | `llama3.1` | 8B, 70B, 405B | My favorite, and the model used in this notebook. The 70B variant achieves excellent factual accuracy but needs serious hardware. The 8B version is far more accessible but may hallucinate more. Solid instruction-following capabilities. |
| **Mistral 7B**     | `mistral`  | 7B            | Highly efficient; outperforms the larger Llama 2 13B on many benchmarks. Notable for lower hallucination rates, making it more reliable for factual queries. |
| **Gemma (Google)** | `gemma`    | 2B, 7B        | The 2B variant is designed for CPU and on-device use. The instruction-tuned 7B model competes well with other 7B models. Known for a friendly style and structured output using bullet points and Markdown. |
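Before switching the `model` field to another entry from the table, it's worth checking that the model is actually pulled. A minimal sketch, assuming Ollama's `GET /api/tags` endpoint, which lists local models with names such as `llama3.1:latest`; `Test-OllamaModel` is a hypothetical helper that returns `$false` when the server can't be reached:

```powershell
function Test-OllamaModel {
    param(
        [Parameter(Mandatory)] [string]$Name,
        [string]$BaseUri = "http://localhost:11434"
    )
    try {
        $tags = Invoke-RestMethod -Uri "$BaseUri/api/tags" -Method Get -TimeoutSec 5 -ErrorAction Stop
    } catch {
        # Server not running or unreachable
        return $false
    }
    # Accept either the exact tag or the implicit ":latest" tag
    $names = @($tags.models | ForEach-Object { $_.name })
    return ($names -contains $Name) -or ($names -contains "${Name}:latest")
}

foreach ($model in "llama3.1", "mistral", "gemma") {
    "{0}: {1}" -f $model, (Test-OllamaModel -Name $model)
}
```

Anything reported as `False` can be fetched with `ollama pull <tag>`.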

### Key Takeaways

**🎯 When to use Local LLMs for PowerShell automation:**

1. **Beyond Regex**: When you need world knowledge, not patterns
2. **Privacy Matters**: Sensitive data stays local
3. **Batch Processing**: Perfect for overnight jobs
4. **Cost Control**: No per-request API fees
5. **Offline Capable**: No internet dependency
6. **PowerShell Objects**: Clean, pipeable data

**💡 Remember: Ask tiny questions, get reliable answers!**

---

### Next Steps

- Try this notebook with your own messy data
- Experiment with different models in Ollama
- Check out the full blog series: **AI Integration for Automation Engineers**
- Consider local LLMs for your next PowerShell automation project!

**Questions? Let's chat after the session!** 🎤