## Preflight Check

If this check fails with a 401 error, regenerate your OpenRouter key, update `.env`, and restart the kernel.


In [1]:
from dotenv import load_dotenv
from openai import OpenAI
import os

load_dotenv(dotenv_path=".env", override=False)

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"].strip(),
    default_headers={
        "HTTP-Referer": "http://localhost:8888",
        "X-Title": "Dallas Agent Workshop",
    },
)

resp = client.chat.completions.create(
    model=os.getenv("OPENROUTER_MODEL", "arcee-ai/trinity-large-preview:free"),
    messages=[{"role": "user", "content": "Reply with exactly: MODEL WORKING"}],
)

print(resp.choices[0].message.content)


MODEL WORKING


# Dallas AI: Hands-on Agent Building (LangGraph + OpenRouter)

This notebook is the main workshop surface:
- You will run the agent locally.
- The model is accessed via OpenRouter.
- The agent can generate Python code and execute it using a controlled tool.

**Goal:** experience a real *plan → code → execute → fix* loop.


## Agenda

- 5:30 - Check-in, food, networking (30 min)
- 6:00 - Why Agents? (Motivation & Framing) (5 min)
- 6:05 - Core Concepts & Architectures (5 min)
- 6:10 - Setup & Environment (30 min)
- 6:40 - Live Code Walkthrough (20 min)
- 7:00 - Hands-On Build Session (40 min)
- 7:40 - Engineering Discipline for Agents (10 min)
- 7:50 - Next: Virtual sessions, submitting PRs (10 min)
- 8:00 - Curated Resources


## Why Agents? (Motivation & Framing)

An agent is a loop that can **plan**, **use tools**, and **iterate** toward a goal. In this workshop, the agent can:
- Write Python code
- Execute it in a controlled way
- Read the result (stdout/stderr) and try again

Why this matters: lots of real work is not a single prompt. It needs multi-step problem solving, verification, and guardrails.


## Core Concepts & Architectures

We will use a simple LangGraph workflow that looks like:

`plan -> exec -> (fix -> exec)* -> finish`

Key ideas:
- **State**: the data that flows between steps (task, generated code, last run result).
- **Tools**: the controlled actions the agent can take (here: running Python; for research: web search).
- **Guardrails**: constraints that keep the agent safe/reliable (timeouts, blocked imports, "must print" requirement).
- **Evaluation mindset**: make the agent produce observable outputs so you can debug quickly (stdout, logs, reproducible steps).

During the live walkthrough, we will inspect `agent_lib.py` and connect each node to what you see on screen.
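
To make the `plan -> exec -> (fix -> exec)* -> finish` shape concrete, here is a minimal plain-Python sketch. It is not the code in `agent_lib.py` and does not use LangGraph. `AgentState`, `plan`, `execute`, `fix`, and `run_loop` are hypothetical names, and the planner and fixer are hard-coded stubs standing in for model calls.

```python
import contextlib
import io
import traceback
from typing import TypedDict


class AgentState(TypedDict):
    task: str        # what the user asked for
    code: str        # latest generated code
    last_run: dict   # result of the most recent execution
    attempts: int    # number of executions so far


def plan(state: AgentState) -> AgentState:
    # Stub planner: a real one would ask the model. The typo is deliberate.
    return {**state, "code": "print(totl)"}


def execute(state: AgentState) -> AgentState:
    # Toy executor: runs in-process with no sandboxing and captures stdout.
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(state["code"], {"total": 6})  # namespace provides `total`
        run = {"ok": True, "stdout": buf.getvalue(), "stderr": ""}
    except Exception:
        run = {"ok": False, "stdout": buf.getvalue(), "stderr": traceback.format_exc()}
    return {**state, "last_run": run, "attempts": state["attempts"] + 1}


def fix(state: AgentState) -> AgentState:
    # Stub fixer: a real one would send the code and stderr back to the model.
    return {**state, "code": state["code"].replace("totl", "total")}


def run_loop(task: str, max_attempts: int = 3) -> AgentState:
    state: AgentState = {"task": task, "code": "", "last_run": {}, "attempts": 0}
    state = execute(plan(state))
    # Guardrail: stop after max_attempts executions even if the code still fails.
    while not state["last_run"]["ok"] and state["attempts"] < max_attempts:
        state = execute(fix(state))
    return state


final = run_loop("print the total")
print(final["attempts"], final["last_run"]["ok"], repr(final["last_run"]["stdout"]))
```

The first execution fails with a `NameError`, the stub fixer patches the typo, and the second execution succeeds. The state dictionary is the only thing passed between steps, which is what makes each step easy to inspect.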


## 0) Setup

1. Create a venv and install deps:
```bash
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
2. Copy `.env.example` to `.env` and set `OPENROUTER_API_KEY`.
3. Restart kernel after editing `.env`.


In [2]:
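import os

# Pin the model used for the rest of the notebook.
os.environ["OPENROUTER_MODEL"] = "arcee-ai/trinity-large-preview:free"

# Check the key for stray whitespace, which can cause 401 errors.
# Note: this prints the last 10 characters of the key; clear the output before sharing.
k = os.environ["OPENROUTER_API_KEY"]
print("repr:", repr(k[-10:]))
print("endswith newline?", k.endswith("\n"))
print("has spaces?", (" " in k))
print("len raw:", len(k), "len strip:", len(k.strip()))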

repr: 'e000c20e40'
endswith newline? False
has spaces? False
len raw: 73 len strip: 73


In [3]:
import os, requests
headers = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY'].strip()}"}
r = requests.get("https://openrouter.ai/api/v1/models", headers=headers, timeout=20)
print(r.status_code)
print(r.text[:300])


200
{"data":[{"id":"google/gemini-3.1-pro-preview","canonical_slug":"google/gemini-3.1-pro-preview-20260219","hugging_face_id":"","name":"Google: Gemini 3.1 Pro Preview","created":1771509627,"description":"Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineer


## 1) Sanity check: run the Python execution tool

This runs locally with timeouts and basic restrictions. It is **not** a hardened sandbox.
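
For intuition, a time-limited runner with a temporary working directory could look roughly like the sketch below. This is an assumption about the general approach, not the implementation of `run_python` in `tools.py`; `run_python_sketch` and `timeout_s` are hypothetical names, and the sketch omits the import and call blocking that the real tool advertises.

```python
import subprocess
import sys
import tempfile


def run_python_sketch(code: str, timeout_s: float = 5.0) -> dict:
    """Run code in a child interpreter inside a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir,          # files the code writes land here and are deleted
                capture_output=True,
                text=True,
                timeout=timeout_s,    # the child is killed if it runs too long
            )
        except subprocess.TimeoutExpired:
            return {
                "ok": False,
                "stdout": "",
                "stderr": f"timed out after {timeout_s}s",
                "exit_code": None,
            }
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
    }


result = run_python_sketch("print('hello from sketch')\nprint(sum([1, 2, 3]))")
print(result)
```

A separate process gives a clean timeout and exit code, but the child still runs with your user's permissions, which is one reason this kind of runner is not a hardened sandbox.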


In [4]:
from tools import run_python

code = """
print('hello from tool')
print(sum([1,2,3]))
"""

run_python(code)


{'ok': True,
 'stdout': 'hello from tool\n6\n',
 'stderr': '',
 'exit_code': 0,
 'note': 'Execution policy: temporary working directory, time-limited, and blocks some risky imports/calls. This is NOT a hardened sandbox.'}

## 2) Run a single agent task

We will run one end-to-end task:
- The Planner proposes a solution and code
- The Executor runs the code
- If the run fails, the Fixer patches the code and retries (up to 3 attempts)


In [5]:
from agent_lib import run_task

task = "Write a Python function to compute Fibonacci(n) efficiently and print Fibonacci(35)."
result = run_task(task)

result['last_run']


=== GENERATED CODE ===
# iterative approach to compute Fibonacci(35)
a, b = 0, 1
for _ in range(35):
    a, b = b, a + b
print(a)


{'ok': True,
 'stdout': '9227465\n',
 'stderr': '',
 'exit_code': 0,
 'note': 'Execution policy: temporary working directory, time-limited, and blocks some risky imports/calls. This is NOT a hardened sandbox.'}

## 3) Workshop exercises

Try the tasks below. You can also author your own.
Tip: keep tasks self-contained and offline.


In [6]:
tasks = [
    "Parse this CSV string and compute the average of the 'latency_ms' column:\n\nts,latency_ms\n1,120\n2,110\n3,130\n4,90\n",
    "Implement rolling z-score anomaly score for this list and print the top 3 most anomalous points: [10,11,9,10,10,200,11,10,9,10]",
    "Given a list of (user_id, event_time, event_type), compute per-user session counts (30-min gap) and print a dict."
]

for t in tasks:
    print('\n' + '='*80)
    print('TASK:', t)
    out = run_task(t)
    print('OK:', out['last_run']['ok'])
    print('STDOUT:\n', out['last_run']['stdout'])
    if not out['last_run']['ok']:
        print('STDERR:\n', out['last_run']['stderr'])



TASK: Parse this CSV string and compute the average of the 'latency_ms' column:

ts,latency_ms
1,120
2,110
3,130
4,90

=== GENERATED CODE ===
import csv
from io import StringIO

csv_string = """ts,latency_ms
1,120
2,110
3,130
4,90"""

f = StringIO(csv_string)
reader = csv.DictReader(f)
latencies = [int(row['latency_ms']) for row in reader]
average = sum(latencies) / len(latencies)
print(average)
OK: True
STDOUT:
 112.5


TASK: Implement rolling z-score anomaly score for this list and print the top 3 most anomalous points: [10,11,9,10,10,200,11,10,9,10]
=== GENERATED CODE ===
import numpy as np

data = [10, 11, 9, 10, 10, 200, 11, 10, 9, 10]
window_size = 3

z_scores = []
for i in range(window_size - 1, len(data)):
    window = data[i - window_size + 1:i + 1]
    mean = np.mean(window)
    std = np.std(window, ddof=1)  # Use sample std (ddof=1)
    if std == 0:
        z = 0
    else:
        z = (data[i] - mean) / std
    z_scores.append((i, data[i], z))

# Sort by absolute z-score in

## 3a) Advanced: tighten/loosen execution policy

In `tools.py`, you can change:
- timeout
- banned patterns

For meetup safety, keep it restrictive.
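
One way to think about these two knobs is as a small policy object that is checked before any code runs. The sketch below is illustrative only: `ExecutionPolicy` and `violations` are hypothetical names, and the patterns are examples, not the list that ships in `tools.py`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionPolicy:
    timeout_s: float = 5.0
    # Example patterns only; the real list in tools.py may differ.
    banned_patterns: tuple = (
        r"\bimport\s+subprocess\b",
        r"\bimport\s+socket\b",
        r"\bos\.system\s*\(",
        r"\beval\s*\(",
    )

    def violations(self, code: str) -> list:
        """Return the banned patterns that match the given source text."""
        return [p for p in self.banned_patterns if re.search(p, code)]


strict = ExecutionPolicy()
loose = ExecutionPolicy(timeout_s=30.0, banned_patterns=())

print(strict.violations("import os\nos.system('ls')"))  # one pattern matches
print(strict.violations("print(sum([1, 2, 3]))"))       # no matches
print(loose.violations("import socket"))                # nothing is banned
```

Text-pattern checks like this are easy to bypass (for example through `__import__` or string concatenation), so treat them as a speed bump that catches accidents, not as a security boundary.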


## 4) Applied Exercise: Research Agent

Unlike the code-execution agent, this agent:
- Plans multi-step searches
- Gathers information from the web via Tavily
- Synthesizes findings into a structured report

**Use case:** competitive intelligence, market research, due diligence
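
The plan, gather, synthesize flow can be sketched offline as follows. This is not the code in `research_agent.py`: `collect_sources`, `fake_search`, and `build_report` are hypothetical names, `fake_search` returns made-up results in place of a real search client, and the "report" only lists sources where a real agent would ask the model to synthesize them.

```python
from typing import Callable


def collect_sources(queries: list, search_fn: Callable[[str], list], per_query: int = 3) -> list:
    """Run each planned query and keep up to per_query unseen URLs from each."""
    seen, sources = set(), []
    for query in queries:
        kept = 0
        for hit in search_fn(query):
            if kept == per_query:
                break
            if hit["url"] in seen:
                continue  # skip duplicates across queries
            seen.add(hit["url"])
            sources.append({**hit, "query": query})
            kept += 1
    return sources


def fake_search(query: str) -> list:
    # Stand-in for a real web search call; results are canned and fictional.
    slug = query.replace(" ", "-")
    return [
        {"url": f"https://example.com/{slug}/{i}", "title": f"{query} #{i}"}
        for i in range(5)
    ]


def build_report(question: str, sources: list) -> str:
    # Placeholder for the synthesis step, which would normally be a model call.
    lines = [f"Question: {question}", f"Sources ({len(sources)}):"]
    lines += [f"- {s['title']} ({s['url']})" for s in sources]
    return "\n".join(lines)


queries = ["query one", "query two", "query three"]
sources = collect_sources(queries, fake_search)
print(build_report("example question", sources))
```

Passing the search function in as a parameter keeps the pipeline testable without network access or an API key.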


In [7]:
from research_agent import run_research

question = "What are the top 3 AI chip companies in 2024 and what's their competitive advantage?"

print(f"RESEARCH QUESTION:\n{question}\n")

result = run_research(question)

print("\n" + "="*60)
print("FINAL REPORT:")
print("="*60)
print(result["report"])

RESEARCH QUESTION:
What are the top 3 AI chip companies in 2024 and what's their competitive advantage?


PLANNED QUERIES:
  1. top AI chip companies 2024 competitive advantage
  2. leading AI chip manufacturers 2024 market position
  3. AI chip industry leaders 2024 technology comparison

🔍 Searching: top AI chip companies 2024 competitive advantage
   → Found 3 sources
🔍 Searching: leading AI chip manufacturers 2024 market position
   → Found 3 sources
🔍 Searching: AI chip industry leaders 2024 technology comparison
   → Found 3 sources

✓ Collected 9 total sources


FINAL REPORT:
**Executive Summary**
NVIDIA, AMD, and Google (Alphabet) are the top three AI chip companies in 2024, each leveraging distinct competitive advantages. NVIDIA dominates with over 70% of AI semiconductor sales, AMD offers cost-effective solutions, and Google excels in custom AI chip development for its ecosystem.

**Key Findings**
- NVIDIA leads the AI chip market with over 70% of AI semicond

## 5) Engineering Discipline for Agents

Agents feel "magical" until they fail. The fastest way to make them reliable is good engineering hygiene:

- **State management**: write down what the agent knows (inputs/outputs) and pass it explicitly.
- **Context management**: keep prompts short and structured; include only what is necessary; summarize when needed.
- **Memory (optional)**: decide what should persist across runs (none vs. per-session vs. long-term).
- **Token budgeting**: constrain output formats; avoid dumping large logs or huge documents into the model.
- **Governance & safety**: limit tools and permissions; log tool calls; treat credentials carefully; assume untrusted outputs.

In this repo, we keep things workshop-safe by forcing observable stdout, logging generated code, adding timeouts, and blocking risky Python calls.


## 7) Next: Virtual sessions + Submitting PRs

Stretch goals (great for follow-up sessions):
- Add more tools (file I/O, data APIs) with careful safety boundaries
- Turn the workflow into multiple collaborating agents (planner + specialists)
- Add lightweight evaluations (golden tests, regression prompts, success criteria)

If you improve the workshop materials, please open a PR against this repo.


## 8) Curated Resources

- Prompting best practices (Claude): https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices
- LangGraph docs: https://langchain-ai.github.io/langgraph/
- OpenRouter docs: https://openrouter.ai/docs

Tip: when learning, keep a small set of repeatable test prompts (like `2+2`, Fibonacci, CSV parse) to validate changes quickly.
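
A small harness for such repeatable prompts might look like the sketch below. It assumes only what this notebook shows about `run_task`, namely that it returns a dict whose `last_run` entry has `ok` and `stdout` fields. `GOLDEN`, `run_golden`, and `fake_run_task` are hypothetical names, and `fake_run_task` is a canned stand-in so the example runs offline.

```python
from typing import Callable

# Expected stdout per prompt. The Fibonacci and CSV values match the
# outputs shown earlier in this notebook.
GOLDEN = [
    ("Compute 2+2 and print the result.", "4"),
    ("Print Fibonacci(35).", "9227465"),
    ("Print the average of latency_ms for the rows 120, 110, 130, 90.", "112.5"),
]


def run_golden(run_fn: Callable[[str], dict]) -> list:
    """Run every prompt and return (prompt, passed) pairs."""
    report = []
    for prompt, expected in GOLDEN:
        out = run_fn(prompt)["last_run"]
        passed = bool(out["ok"]) and out["stdout"].strip() == expected
        report.append((prompt, passed))
    return report


def fake_run_task(prompt: str) -> dict:
    # Stand-in for agent_lib.run_task; the answers are canned.
    canned = {"2+2": "4\n", "Fibonacci": "9227465\n", "latency_ms": "112.5\n"}
    stdout = next(v for k, v in canned.items() if k in prompt)
    return {"last_run": {"ok": True, "stdout": stdout, "stderr": ""}}


for prompt, passed in run_golden(fake_run_task):
    print("PASS" if passed else "FAIL", "-", prompt)
```

With a working setup you would pass the real `run_task` in place of `fake_run_task`. The exact-match comparison is strict on purpose; relax it if your prompts allow the agent to print extra text.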


## 6) Optional: make it multi-agent

You can extend `agent_lib.py` into multiple agents:
- planner
- coder
- executor
- verifier

LangGraph makes these edges explicit.
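
To show what "explicit edges" means without depending on the LangGraph API, here is a plain-Python sketch in which the edges are a dictionary. Everything in it is hypothetical: the four node functions are stubs, and `EDGES`, `after_verifier`, and `run_graph` are invented names.

```python
# Each node takes the shared state and returns an updated copy. All four are stubs.
def planner(state: dict) -> dict:
    return {**state, "plan": f"solve: {state['task']}"}


def coder(state: dict) -> dict:
    return {**state, "code": "print(2 + 2)"}


def executor(state: dict) -> dict:
    # A real executor would call the execution tool; this stub fakes the result.
    return {**state, "stdout": "4\n"}


def verifier(state: dict) -> dict:
    return {**state, "verified": state["stdout"].strip() == "4"}


def after_verifier(state: dict) -> str:
    # Router: finish on success, otherwise send the work back to the coder.
    return "finish" if state["verified"] else "coder"


NODES = {"planner": planner, "coder": coder, "executor": executor, "verifier": verifier}

# An edge is either the next node's name or a router that picks it from the state.
EDGES = {
    "planner": "coder",
    "coder": "executor",
    "executor": "verifier",
    "verifier": after_verifier,
}


def run_graph(task: str, start: str = "planner", max_steps: int = 12) -> dict:
    state, node, trace = {"task": task}, start, []
    for _ in range(max_steps):  # step cap so a bad router cannot loop forever
        if node == "finish":
            break
        trace.append(node)
        state = NODES[node](state)
        edge = EDGES[node]
        node = edge(state) if callable(edge) else edge
    return {**state, "trace": trace}


final = run_graph("add two and two")
print(final["trace"], final["verified"])
```

Because the routing lives in one table, adding a specialist agent means adding one node and one or two edges, and the trace shows exactly which path a run took.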
