# Lab 5 — Tool-Calling & Structured Output

**Module reference:** [Module 3, §3.6](https://github.com/kunalsuri/prompt-engineering-playbook/blob/main/learn/03-patterns.md) — Constrained Output  
[Module 5, §5.4](https://github.com/kunalsuri/prompt-engineering-playbook/blob/main/learn/05-advanced-patterns.md) — Evaluation Pipelines

This lab compares two strategies for getting reliably structured JSON output from LLMs:

1. **JSON-Mode Prompting** — natural-language instructions to emit JSON; validate with `json.loads()`
2. **Function-Calling (Tool-Calling)** — typed function schema enforced at the API level
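
To make the contrast concrete, here is a minimal sketch of what each strategy sends to the model. The field names `name` and `price`, the function name `record_product`, and the sample text are hypothetical; only `key_specs` is mentioned in this lab. The tool definition follows the OpenAI-style `tools` format, and the actual schema in `lab_05_tool_calling.py` may differ.

```python
import json

# Hypothetical fields for illustration; only `key_specs` is named in this lab.
PRODUCT_FIELDS = {
    "name": {"type": "string"},
    "price": {"type": "number"},
    "key_specs": {"type": "array", "items": {"type": "string"}},
}


def build_json_mode_prompt(description: str) -> str:
    """Strategy 1: the expected shape is described only in the prompt text."""
    field_list = ", ".join(PRODUCT_FIELDS)
    return (
        "Extract the product details from the text below. "
        f"Respond with a single JSON object with the keys: {field_list}. "
        "Do not add commentary or markdown fences.\n\n"
        f"Text: {description}"
    )


def build_tool_definition() -> dict:
    """Strategy 2: the expected shape travels as a typed tool definition."""
    return {
        "type": "function",
        "function": {
            "name": "record_product",  # hypothetical function name
            "description": "Record structured details extracted from a product description.",
            "parameters": {
                "type": "object",
                "properties": PRODUCT_FIELDS,
                "required": list(PRODUCT_FIELDS),
            },
        },
    }


# Made-up sample input, not one of the lab's product descriptions.
prompt = build_json_mode_prompt("The Acme X1 laptop costs $999 and has 16 GB of RAM.")
tool = build_tool_definition()
print(prompt)
print(json.dumps(tool, indent=2))
```

With the first strategy, nothing stops the model from ignoring the instructions, so the caller has to validate the reply. With the second, the provider receives the schema as data and can use it to shape the arguments it returns.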

**What you'll measure:** valid-JSON rate, field completeness, and consistency across runs.

---
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kunalsuri/prompt-engineering-playbook/blob/main/learn/labs/lab_05_tool_calling.ipynb)

## Setup — Install dependencies & set API key

**Free providers (no credit card required):**
- [Google Gemini](https://aistudio.google.com/apikey) — 15 RPM, 1M tokens/day ⭐
- [Groq](https://console.groq.com) — 30 RPM free tier

> **Note on tool-calling:** Use OpenAI or Google Gemini for the function-calling strategy.
> Groq supports tool-calling only on select models (llama3, mixtral).

In [None]:
# Install dependencies (first run only)
%pip install -q openai python-dotenv

import os
# Uncomment ONE of the following and paste your API key:
# os.environ['GOOGLE_API_KEY'] = 'your-key-here'   # Free at aistudio.google.com/apikey
# os.environ['GROQ_API_KEY']   = 'your-key-here'   # Free at console.groq.com
# os.environ['OPENAI_API_KEY'] = 'your-key-here'

key = os.getenv('GOOGLE_API_KEY') or os.getenv('GROQ_API_KEY') or os.getenv('OPENAI_API_KEY')
print(f'✓ API key found ({len(key)} chars)' if key else '⚠ No API key found. Set one above.')

In [None]:
# Download lab files if running in Colab
import os
if not os.path.exists('lab_utils.py'):
    base = 'https://raw.githubusercontent.com/kunalsuri/prompt-engineering-playbook/main/learn/labs/'
    !wget -q {base}lab_utils.py
    !wget -q {base}lab_05_tool_calling.py
    !wget -q {base}requirements.txt
    %pip install -q -r requirements.txt
    print('Lab files downloaded.')
else:
    print('Lab files already present.')

## Run the Experiment

The experiment runs both strategies on five unstructured product descriptions.
It then prints a comparison table of valid-JSON rate and field completeness for each strategy.
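
As a rough illustration of how such metrics could be computed from raw model replies, the sketch below scores a list of output strings. The required field names, the metric definitions, and the sample replies are assumptions made for this example; `run_experiment()` may define and compute them differently.

```python
import json
from collections import Counter

# Hypothetical required fields; only `key_specs` is named in this lab.
REQUIRED_FIELDS = ("name", "price", "key_specs")


def score_outputs(raw_outputs, required_fields=REQUIRED_FIELDS):
    """Score raw model replies for validity, completeness, and consistency."""
    parsed = []
    for raw in raw_outputs:
        try:
            obj = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue  # not valid JSON
        if isinstance(obj, dict):
            parsed.append(obj)

    total = len(raw_outputs)
    valid_rate = len(parsed) / total if total else 0.0

    # Completeness: share of required fields that are present and non-empty,
    # averaged over the replies that parsed.
    def filled(obj):
        return sum(1 for f in required_fields if obj.get(f) not in (None, "", []))

    if parsed:
        completeness = sum(filled(o) for o in parsed) / (len(parsed) * len(required_fields))
    else:
        completeness = 0.0

    # Consistency: share of all replies that equal the most common parsed reply.
    counts = Counter(json.dumps(o, sort_keys=True) for o in parsed)
    consistency = counts.most_common(1)[0][1] / total if parsed else 0.0

    return {
        "valid_json_rate": valid_rate,
        "field_completeness": completeness,
        "consistency": consistency,
    }


# Made-up replies: two identical, one missing a field, one not JSON at all.
sample_outputs = [
    '{"name": "Lamp", "price": 20, "key_specs": ["LED"]}',
    '{"name": "Lamp", "price": 20, "key_specs": ["LED"]}',
    '{"name": "Lamp", "price": 20}',
    "Sure! Here is the product as JSON.",
]
scores = score_outputs(sample_outputs)
print(scores)
```

Here three of the four replies parse, so the valid-JSON rate is 0.75; the parsed replies fill 8 of 9 required fields; and the most common reply accounts for 2 of 4 runs.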

In [None]:
from lab_05_tool_calling import run_experiment
run_experiment()

## Explore Individual Strategies

In [None]:
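import json
from lab_utils import get_client
from lab_05_tool_calling import run_json_mode, run_tool_calling, PRODUCT_DESCRIPTIONS

client = get_client()
description = PRODUCT_DESCRIPTIONS[0]  # first of the sample product descriptions

# Run the JSON-mode strategy on one description; each result is one run
# and reports whether the output parsed as JSON.
print('Testing JSON-mode on:', description[:60], '...\n')
results = run_json_mode(description, client)
for i, r in enumerate(results, 1):
    if r['success']:
        print(f'Run {i}: ✓  →  {json.dumps(r["data"], indent=2)}')
    else:
        print(f'Run {i}: ✗  →  Invalid JSON: {r["error"]}')

# run_tool_calling is imported so you can repeat this loop with the other strategy;
# check its signature in lab_05_tool_calling.py before calling it.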

## Reflection Questions

1. Which strategy had a higher valid-JSON rate on your run? Was the difference large enough to matter, given that only five descriptions were tested?
2. Did the model wrap output in markdown fences despite instructions? How did the JSON-mode code handle that?
3. Under what conditions would you choose JSON-mode over tool-calling in a production system?
4. How does constraining `key_specs` to be an `array` change model behavior compared to a plain `string` field?
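
For question 2, one common way to tolerate markdown fences is to strip them before parsing. The sketch below is an assumption about how that could look, not a copy of what `lab_05_tool_calling.py` does; the helper names are hypothetical, and the returned keys mirror the `success`, `data`, and `error` keys read in the exploration cell above.

```python
import json

FENCE = "`" * 3  # three backticks, built here so the example stays easy to read


def strip_fences(raw: str) -> str:
    """Remove a surrounding markdown code fence, if there is one."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line (with or without a language tag)
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines).strip()
    return text


def parse_model_json(raw: str) -> dict:
    """Parse a model reply as JSON, tolerating a surrounding code fence."""
    try:
        return {"success": True, "data": json.loads(strip_fences(raw))}
    except json.JSONDecodeError as exc:
        return {"success": False, "error": str(exc)}


# Made-up replies for illustration.
fenced = FENCE + 'json\n{"name": "Lamp"}\n' + FENCE
plain = '{"name": "Lamp"}'
chatty = "Here is your JSON: not really"

fenced_result = parse_model_json(fenced)
plain_result = parse_model_json(plain)
chatty_result = parse_model_json(chatty)
print(fenced_result, plain_result, chatty_result)
```

This only handles a fence that wraps the entire reply. Replies with prose before or after the JSON still fail, which is one of the cases where a schema enforced at the API level can help.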

**See also:** Module 3 §3.6 (Constrained Output) and [failure-gallery case 04](../failure-gallery/04-ambiguous-format/) for the failure this lab fixes.