<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/112_AI_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Install dependencies

In [1]:
!pip -q install anthropic python-dotenv rich


## 2. Load API key

In [2]:
import os
from dotenv import load_dotenv
from anthropic import Anthropic, APIError
from rich.console import Console
from rich.markdown import Markdown

# Adjust path to your secrets file
load_dotenv("/content/API_KEYS.env")

anthropic_key = os.getenv("ANTHROPIC_API_KEY")
if not anthropic_key:
    raise RuntimeError("Missing ANTHROPIC_API_KEY in /content/API_KEYS.env")

print("✅ Anthropic key loaded")

console = Console()
client = Anthropic(api_key=anthropic_key)

# Default to Claude 3.5 Haiku for speed & low cost
MODEL_NAME = os.environ.get("CLAUDE_MODEL", "claude-3-5-haiku-latest")

✅ Anthropic key loaded


## 3. Helper Functions

In [24]:
import textwrap

def ask_claude(prompt: str, system: str = "You are a helpful coding assistant.",
               render: str = "markdown",  # 'markdown' | 'wrapped' | 'none'
               return_text: bool = False) -> str | None:
    if not anthropic_key:
        raise RuntimeError("Missing ANTHROPIC_API_KEY.")
    msg = client.messages.create(
        model=MODEL_NAME,
        max_tokens=1000,
        temperature=0.2,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    parts = [b.text for b in msg.content if getattr(b, "type", None) == "text"]
    output = "\n\n".join(parts).strip() or "(No text)"

    if render == "markdown":
        console.print(Markdown(output))
    elif render == "wrapped":
        smart_print_markdown(output)
    # render == 'none' skips printing

    return output if return_text else None

conversation = []  # shared message history used by chat_with_claude

def smart_print_markdown(output: str, width: int = 100):
    """
    Wrap plain text, preserve fenced code blocks.
    """
    in_code = False
    para_buf = []

    def flush_paragraph():
        if para_buf:
            text = " ".join(para_buf)
            print(textwrap.fill(text, width=width, replace_whitespace=False))
            print()
            para_buf.clear()

    for line in output.splitlines():
        fence = line.strip().startswith("```")
        if fence:
            # Finish any pending wrapped paragraph before toggling code
            flush_paragraph()
            print(line)
            in_code = not in_code
            continue

        if in_code:
            # Inside code block -> print verbatim
            print(line)
        else:
            # Outside code block -> buffer/wrap paragraphs
            if line.strip() == "":
                flush_paragraph()
            else:
                para_buf.append(line)

    flush_paragraph()

def chat_with_claude(
    prompt: str,
    system: str = "You are a helpful coding assistant.",
    render: str = "markdown",      # 'markdown' | 'wrapped' | 'none'
    return_text: bool = False,
    wrap_width: int = 100,
) -> str | None:
    """
    Send a prompt with conversation memory.
    - render='markdown'  -> pretty Markdown rendering (code blocks look great)
    - render='wrapped'   -> wrap only plain text, preserve code fences
    - render='none'      -> print nothing (use return_text=True if you need the string)
    """
    if not anthropic_key:
        raise RuntimeError("Missing ANTHROPIC_API_KEY.")

    conversation.append({"role": "user", "content": prompt})

    try:
        msg = client.messages.create(
            model=MODEL_NAME,
            max_tokens=3000,
            temperature=0.2,
            system=system,
            messages=conversation,
        )
        parts = [b.text for b in msg.content if getattr(b, "type", None) == "text"]
        output = "\n\n".join(parts).strip() or "(No text)"

        if render == "markdown":
            console.print(Markdown(output))
        elif render == "wrapped":
            smart_print_markdown(output, width=wrap_width)
        # render == 'none' -> no printing

        conversation.append({"role": "assistant", "content": output})
        return output if return_text else None

    except APIError as e:
        # Remove the unanswered user turn so a retry doesn't resend it
        conversation.pop()
        print("Anthropic API error:", e)
        raise

# Optional helpers
def reset_conversation():
    conversation.clear()

def last_reply() -> str | None:
    for m in reversed(conversation):
        if m["role"] == "assistant":
            return m["content"]
    return None




# Exercise: AI Evaluation of Code & Feature Implementations

## Prerequisites

* ✅ Completed Tutorial 2.2 (Best-of-N Pattern implementation)
* ✅ Three different implementations of the data export feature in separate git branches
* ✅ Basic understanding of code evaluation principles
* ✅ Claude Code installed and working

---

## Part 1: Why Code Evaluation Matters for AI Development

### 🧠 The Evaluation Challenge

When working with **AI labor**, you'll often have **multiple working solutions** to choose from.

In traditional development, time constraints usually mean only one version gets built. AI’s speed makes it practical to build and explore several alternatives instead.

But this creates a new challenge:

> **How do you systematically evaluate and compare different implementations?**

---

### 🔁 Traditional vs. AI Development Evaluation

| Traditional Development           | AI Development                                |
| --------------------------------- | --------------------------------------------- |
| Evaluate one solution             | Evaluate **multiple working implementations** |
| Based on experience and intuition | **Compare real code**, not theoretical ideas  |
| Hard to test different approaches | **Test actual user experiences side-by-side** |
| High cost to change direction     | **Make data-driven architectural decisions**  |

---

### 🎯 What We're Evaluating

In this tutorial, we will evaluate **three different implementations of the same feature** across multiple dimensions:

* Code structure and maintainability
* UI/UX quality
* Technical architecture
* Libraries and frameworks used
* Export logic and flexibility
* Error handling and edge-case coverage
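To turn these dimensions into one comparable number per branch, one option is a weighted score. This is a minimal sketch that assumes you score each dimension yourself (for example from 1 to 5) after reviewing a branch. The weights in `WEIGHTS` are illustrative assumptions, not values from this tutorial, and `weighted_score` / `rank_implementations` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# Evaluation dimensions from the list above; the weights are illustrative assumptions.
WEIGHTS: Dict[str, float] = {
    "Code structure and maintainability": 0.25,
    "UI/UX quality": 0.20,
    "Technical architecture": 0.20,
    "Libraries and frameworks used": 0.10,
    "Export logic and flexibility": 0.15,
    "Error handling and edge-case coverage": 0.10,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-dimension scores into one weighted average."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


def rank_implementations(all_scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (branch, weighted score) pairs sorted from best to worst."""
    ranked = [(branch, weighted_score(scores)) for branch, scores in all_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Pass `rank_implementations` one score dictionary per branch (for example keyed by `feature-data-export-v1`, `-v2`, `-v3`). Dividing by the total weight keeps the result on the same scale as the individual scores, even if you later change the weights so they no longer sum to 1.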

---

## Part 2: Setting Up for Systematic Evaluation

### ✅ Step 1: Verify Your Implementations

Make sure you have all 3 versions locally:

```bash
cd expense-tracker-ai
git branch -a
```

You should see:

* `main`
* `feature-data-export-v1` → *Simple CSV export*
* `feature-data-export-v2` → *Advanced export with options*
* `feature-data-export-v3` → *Cloud integration features*
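The same check can be run from Python. This sketch parses `git branch -a` output and reports which of the four expected branches are missing. It drops the two-character marker column (`* ` for the current branch) and the `remotes/<remote>/` prefix, so a branch that only exists on a remote still counts. The helper names are hypothetical.

```python
import subprocess
from typing import List, Set

EXPECTED_BRANCHES = [
    "main",
    "feature-data-export-v1",
    "feature-data-export-v2",
    "feature-data-export-v3",
]


def parse_branch_list(output: str) -> Set[str]:
    """Turn `git branch -a` output into a set of plain branch names."""
    names = set()
    for line in output.splitlines():
        name = line[2:].strip()  # drop the marker column ("* " or "  ")
        if not name or "->" in name or name.startswith("("):
            continue  # skip blanks, symbolic refs (HEAD -> ...), detached HEAD
        if name.startswith("remotes/"):
            name = name.split("/", 2)[-1]  # remotes/origin/x -> x
        names.add(name)
    return names


def missing_branches(output: str, expected: List[str] = EXPECTED_BRANCHES) -> List[str]:
    """List expected branches that do not appear in the `git branch -a` output."""
    present = parse_branch_list(output)
    return [branch for branch in expected if branch not in present]


def git_branch_output(repo: str = ".") -> str:
    """Run `git branch -a` inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "branch", "-a"], cwd=repo, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `missing_branches(git_branch_output("/content/expense-tracker-ai"))` returns an empty list when all four branches are present.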

---

### 🚀 Step 2: Launch Claude Code

Run the Claude CLI:

```bash
claude
```

---

### 📊 Step 3: Create & Execute the Evaluation

We’ll now ask Claude to:

1. **Switch between each branch**
2. **Analyze the implementation** based on:

   * Architecture
   * UI/UX patterns
   * Library usage
   * Code clarity and scalability
3. **Document the findings** in a structured comparison table

This process will help us understand:

* **How** each version works
* **What trade-offs** were made
* **Which design choices** are most valuable
* Whether to **adopt one version**, or **synthesize** the best elements from all
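It helps to fix the shape of the comparison table before asking Claude to fill it in. A minimal sketch, assuming you collect one short finding per criterion per branch into a dictionary; `comparison_table` and `CRITERIA` are hypothetical names, and the criteria are the four listed above.

```python
from typing import Dict, List

CRITERIA = ["Architecture", "UI/UX patterns", "Library usage", "Code clarity and scalability"]


def comparison_table(findings: Dict[str, Dict[str, str]], criteria: List[str] = CRITERIA) -> str:
    """Render per-branch findings as a Markdown table with one row per criterion."""
    branches = list(findings)
    header = "| Criterion | " + " | ".join(branches) + " |"
    divider = "| --- | " + " | ".join("---" for _ in branches) + " |"
    rows = []
    for criterion in criteria:
        # Escape pipes so a finding can't break the table; "n/a" marks gaps
        cells = [findings[b].get(criterion, "n/a").replace("|", "\\|") for b in branches]
        rows.append(f"| {criterion} | " + " | ".join(cells) + " |")
    return "\n".join([header, divider, *rows])
```

The output is plain Markdown, so it can be pasted into the project's `README.md` or an evaluation report as-is.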



In [5]:
!pwd

/content


To simulate and manage a simple project like `expense-tracker-ai` inside a **Colab notebook**, you can use shell commands prefixed with `!` to interact with the command line (bash). Here's a step-by-step guide to:

1. Set up the project folder structure
2. Initialize Git
3. Create placeholder files
4. Make the first commit

---

### ✅ Step 1: Create the Project Directory

Before running commands in a **Google Colab** notebook, it helps to know the two prefixes Colab uses:

---

### ✅ `!` is for **shell commands**

It tells Colab to run the line as if it were in a terminal (bash shell). For example:

* `!mkdir folder` – creates a directory
* `!ls` – lists files
* `!git init` – initializes a Git repo

Think of `!` as saying: “Run this command like I would in a terminal.”

---

### ✅ `%` is for **magic commands**

Magic commands are built into the Jupyter/Colab environment itself.

* `%cd /path/to/dir` – changes the current working directory **for the notebook kernel**
* `%time` – measures execution time
* `%matplotlib inline` – displays plots inline in the notebook

The `%cd` magic is necessary because `!cd /some/dir` won’t change the **working directory of the notebook kernel**. Each `!` line runs in its own short-lived shell, so the `cd` only applies inside that single command and is lost as soon as it finishes.

---

### 🔥 Example to illustrate:

```python
!cd /content
!mkdir test
!cd test
!pwd  # <-- You're still in /content, not /content/test
```

Now compare with:

```python
%cd /content
!mkdir test
%cd test
!pwd  # <-- Now you're inside /content/test
```

---

### 📌 In short:

| Symbol | Purpose            | Affects Notebook State?        |
| ------ | ------------------ | ------------------------------ |
| `!`    | Runs shell command | ❌ (only inside that line)      |
| `%`    | Runs magic command | ✅ (can change kernel behavior) |
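The same behavior can be reproduced in plain Python, which may make the difference easier to see. This sketch assumes a POSIX shell, as in Colab: `subprocess.run(..., shell=True)` starts a child shell the way a `!` line does, while `os.chdir` changes the directory of the running process, which is the effect `%cd` has on the kernel. The function names are hypothetical.

```python
import os
import shlex
import subprocess
import tempfile


def shell_cd(target: str) -> str:
    """Mimic `!cd target`: the cd runs in a child shell that exits right away."""
    subprocess.run(f"cd {shlex.quote(target)}", shell=True, check=True)
    return os.getcwd()  # unchanged: the child shell's directory is gone with it


def kernel_cd(target: str) -> str:
    """Mimic `%cd target`: change the working directory of this Python process."""
    os.chdir(target)
    return os.getcwd()


if __name__ == "__main__":
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        print("after shell_cd: ", shell_cd(tmp))   # still the starting directory
        print("after kernel_cd:", kernel_cd(tmp))  # now the temporary directory
        os.chdir(start)  # go back before the temporary directory is deleted
```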



In [8]:
# Create the main project directory
!mkdir -p /content/expense-tracker-ai

# Navigate into it (for future operations)
%cd /content/expense-tracker-ai

/content/expense-tracker-ai


In [9]:
!pwd

/content/expense-tracker-ai


### ‚úÖ Step 2: Initialize a Git Repository

```python
# Initialize Git repo
!git init
```



In [10]:
!git init

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /content/expense-tracker-ai/.git/


### ✅ If You Want to Rename the Branch to `main` (Recommended)

Many teams now use `main` as the default branch name. You can rename the current branch with `git branch -m` or `git branch -M`, and the **capital `-M`** does make a difference in Git.

---

### 🔤 `-m` vs `-M` in `git branch`

| Option | Meaning                          | Behavior                                        |
| ------ | -------------------------------- | ----------------------------------------------- |
| `-m`   | **Move (rename)** a branch       | Fails if the new branch name already exists     |
| `-M`   | **Force move** (rename) a branch | Overwrites an existing branch with the new name |

---

### 🧠 In Practice

* If you're **renaming a branch to a new, unused name** (like `main` in your case), both `-m` and `-M` will work exactly the same.
* If there's **already a `main` branch** and you want to **replace** it with the current branch, then you'd need `-M`.

---

### ✅ Summary

For your use case in Colab:

```python
!git branch -M main
```

renames the current branch to `main`. Because `-M` forces the rename, it would also overwrite an existing `main` branch; that is harmless here because no `main` branch exists yet.



In [11]:
!git branch -M main


### ✅ Step 3: Make the Initial Commit

```python
!git add .
!git commit -m "Initial expense tracker implementation"
!git status
```



In [12]:
!git add .
!git commit -m "Initial expense tracker implementation"
!git status

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@d60f460db5ae.(none)')
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)
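Both failures above (the unknown author identity and `nothing to commit`) only printed messages: a failing `!` command does not stop the notebook, so later cells keep running as if the commit had worked. If you prefer failures to be loud, a small wrapper around Python's `subprocess` can raise instead. This is a sketch; `run` is a hypothetical helper name.

```python
import subprocess
from typing import List, Optional


def run(cmd: List[str], cwd: Optional[str] = None) -> str:
    """Run a command and return its stdout; raise RuntimeError on a non-zero exit.

    Unlike a `!` line, a failure here stops the cell with a visible traceback.
    """
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        # Some tools print their explanation to stdout, so fall back to it
        details = result.stderr.strip() or result.stdout.strip()
        raise RuntimeError(f"{' '.join(cmd)} exited with {result.returncode}:\n{details}")
    return result.stdout
```

For example, `run(["git", "commit", "-m", "Initial expense tracker implementation"], cwd="/content/expense-tracker-ai")` would have raised at this point instead of letting the notebook continue.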


In [13]:
# Replace with your own name and email (--global applies them to every repo in this runtime)
!git config --global user.name "Your Name"
!git config --global user.email "you@example.com"

In [14]:
!git status

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)


In [15]:
!git checkout -b feature-data-export-v1

Switched to a new branch 'feature-data-export-v1'


In [17]:
chat = '''
I want to add data export functionality to my expense tracker. For this first version, implement a SIMPLE approach.

VERSION CONTROL:
- Before you start, create a new branch called "feature-data-export-v1"
- Make all your changes in this branch
- Commit your changes when complete

VERSION 1 REQUIREMENTS:
- Add an "Export Data" button to the main dashboard
- When clicked, export all expenses as a CSV file
- Include columns: Date, Category, Amount, Description
- Use a simple, straightforward implementation
- Keep the UI minimal - just a button that triggers the download

IMPLEMENTATION APPROACH:
Focus on simplicity and getting it working quickly. Don't overthink the user experience - just make it functional. Use standard browser APIs for file download.

PROCESS:
1. Create and checkout the new branch "feature-data-export-v1"
2. Implement the CSV export functionality
3. Add the export button to the dashboard
4. Test that it works correctly
5. Commit your changes with a descriptive message

Remember: This is Version 1 of 3 - keep it simple and functional.
'''

chat_with_claude(chat)
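Checking requirements is part of evaluating each version, so it can help to verify the downloaded file directly. This sketch checks a CSV against the four columns the prompt asks for (Date, Category, Amount, Description). It also flags non-numeric amounts and empty dates; those two checks are assumptions, not stated requirements. `check_export_csv` is a hypothetical name.

```python
import csv
import io
from typing import List

# Column order required by the Version 1 prompt
REQUIRED_COLUMNS = ["Date", "Category", "Amount", "Description"]


def check_export_csv(text: str) -> List[str]:
    """Return a list of problems in an exported CSV; an empty list means it looks OK."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [f"Header is {reader.fieldnames}, expected {REQUIRED_COLUMNS}"]

    problems = []
    for record_no, row in enumerate(reader, start=1):
        # Assumption: amounts are plain numbers such as 12.50 (no currency symbols)
        try:
            float(row["Amount"])
        except (TypeError, ValueError):
            problems.append(f"Record {record_no}: Amount {row['Amount']!r} is not a number")
        # Assumption: every expense has a date
        if not row["Date"]:
            problems.append(f"Record {record_no}: missing Date")
    return problems
```

To use it, read the exported file (the filename depends on the implementation) with `open(path, newline="")` and pass its contents to `check_export_csv`. Using the `csv` module rather than splitting on commas means descriptions that contain commas are parsed correctly.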

In [18]:
!git checkout -b feature-data-export-v1

Switched to a new branch 'feature-data-export-v1'


In [19]:
!git add .
!git commit -m "Add CSV export functionality to expense tracker dashboard"

On branch feature-data-export-v1

Initial commit

nothing to commit (create/copy files and use "git add" to track)




### ✅ **Files You Should Create (based on Claude’s output)**

1. `components/Dashboard.js` – contains the **main UI** and the **export button**
2. `context/ExpenseContext.js` – provides the `useExpenses()` hook
3. (Optional) `App.js` – if your app uses a main entry point for routing/layout

---

### 🗂️ Recommended Folder Structure

```bash
/content/expense-tracker-ai
├── components/
│   └── Dashboard.js
├── context/
│   └── ExpenseContext.js
├── App.js
└── README.md
```

---

### 📌 Next Steps in Colab

Run the following to create the folders and files Claude is referencing:

```python
# Create necessary folders
!mkdir -p components
!mkdir -p context

# Create placeholder files
!touch components/Dashboard.js
!touch context/ExpenseContext.js
!touch App.js
!touch README.md
```

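Once those are in place, you can re-run your Claude prompt. Creating the files does not give Claude access to them, though: `chat_with_claude` only exchanges text with the Messages API, so Claude cannot see or edit files in the Colab runtime. Copy the code from its reply into the matching files yourself; otherwise you commit empty placeholders, as the `0 insertions` in the commit below shows.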
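Because `chat_with_claude` can return the reply as a string (`return_text=True`), one way to move Claude's code into the project files is to pull the fenced code blocks out of the reply and save them. A sketch, assuming the reply uses standard triple-backtick fences at the start of a line (indented fences are not matched). `extract_code_blocks` and `save_block` are hypothetical helpers, and deciding which block belongs in which file is still up to you.

```python
import re
from pathlib import Path
from typing import List, Tuple

TICKS = "`" * 3  # built this way so the pattern doesn't contain a literal fence
# Opening fence at the start of a line, optional language tag, the code body,
# then a closing fence at the start of a line.
FENCE_RE = re.compile(
    "^" + TICKS + r"([\w+-]*)[^\n]*\n(.*?)^" + TICKS, re.DOTALL | re.MULTILINE
)


def extract_code_blocks(reply: str) -> List[Tuple[str, str]]:
    """Return (language, code) pairs for every fenced block in a model reply."""
    return [(m.group(1), m.group(2)) for m in FENCE_RE.finditer(reply)]


def save_block(code: str, path: str) -> None:
    """Write one code block to a project file, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(code)
```

For example, `reply = chat_with_claude(chat, render="none", return_text=True)` returns the text; after checking which block is the dashboard component, `save_block(code, "components/Dashboard.js")` writes it relative to the current `%cd` directory.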



In [21]:
!mkdir -p components
!mkdir -p context
!touch components/Dashboard.js
!touch context/ExpenseContext.js
!touch App.js
!touch README.md

In [25]:
chat_with_claude(chat)

In [26]:
!git checkout -b feature-data-export-v1

Switched to a new branch 'feature-data-export-v1'


In [27]:
!git add .

In [28]:
!git commit -m "Add simple CSV export functionality for expenses"

[feature-data-export-v1 (root-commit) f6d6ee3] Add simple CSV export functionality for expenses
 4 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 App.js
 create mode 100644 README.md
 create mode 100644 components/Dashboard.js
 create mode 100644 context/ExpenseContext.js
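The commit above reports `0 insertions` across four new files, which means only empty placeholders were committed. Before committing a version for evaluation, a quick check for empty source files can catch this. A sketch; `find_empty_files` is a hypothetical helper, and the `.js`/`.md` suffixes are an assumption based on this project's files.

```python
from pathlib import Path
from typing import Iterable, List


def find_empty_files(root: str, suffixes: Iterable[str] = (".js", ".md")) -> List[str]:
    """Return paths (relative to root) of zero-byte files with the given suffixes."""
    root_path = Path(root)
    wanted = set(suffixes)
    empty = []
    for path in root_path.rglob("*"):
        rel = path.relative_to(root_path)
        if ".git" in rel.parts:  # ignore Git's own internal files
            continue
        if path.is_file() and path.suffix in wanted and path.stat().st_size == 0:
            empty.append(str(rel))
    return sorted(empty)
```

Run against `/content/expense-tracker-ai` in the state committed above, it would list all four files.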


In [29]:
!git push -u origin feature-data-export-v1

fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
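The push fails because `git init` creates a purely local repository with no remote, so there is no `origin` to push to. To push, you would first create an empty repository on your Git host and register it with `git remote add origin <url>`, using the URL the host gives you. A sketch for checking this from Python before pushing; `parse_remotes` and `list_remotes` are hypothetical helpers.

```python
import subprocess
from typing import Dict


def parse_remotes(output: str) -> Dict[str, str]:
    """Map each remote name to its push URL, given `git remote -v` output."""
    remotes = {}
    for line in output.splitlines():
        parts = line.split()  # e.g. ["origin", "<url>", "(push)"]
        if len(parts) == 3 and parts[2] == "(push)":
            remotes[parts[0]] = parts[1]
    return remotes


def list_remotes(repo: str = ".") -> Dict[str, str]:
    """Run `git remote -v` inside `repo` and parse the result."""
    result = subprocess.run(
        ["git", "remote", "-v"], cwd=repo, capture_output=True, text=True, check=True
    )
    return parse_remotes(result.stdout)
```

For this repository, `"origin" in list_remotes("/content/expense-tracker-ai")` is `False` until a remote has been added.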
