In [9]:
import json
import os

from openai import OpenAI

# Initialize the OpenAI client.
# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

print("Client initialized successfully.")

Client initialized successfully.
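Because the client depends on an API key in the environment, a small pre-flight check can catch a missing key before any request is made. This is a minimal sketch, assuming the key lives in the standard `OPENAI_API_KEY` variable; the helper name `api_key_is_configured` is hypothetical and not part of the SDK.

```python
import os


def api_key_is_configured(env=None, var_name="OPENAI_API_KEY"):
    """Return True if the environment mapping holds a non-empty API key."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return bool(env.get(var_name, "").strip())


if not api_key_is_configured():
    print("OPENAI_API_KEY is not set; the client will not be able to authenticate.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.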


## The Test Problem

We'll use a classic "back-of-the-envelope" estimation puzzle:

> **"How many piano tuners are there in Chicago?"**

This kind of question has no answer you can simply look up. It requires breaking the problem into smaller, logical steps, which is exactly the kind of task where deeper reasoning should help. It also makes it easy to see how different reasoning levels affect the quality and detail of the answer.
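To make "breaking the problem into smaller steps" concrete, here is a hand-written version of the estimate. It is only a sketch: every default value is an illustrative assumption rather than measured data, and the function name `estimate_piano_tuners` is hypothetical.

```python
def estimate_piano_tuners(
    population=2_700_000,            # assumed city population
    people_per_household=2.5,        # assumed average household size
    piano_share=0.05,                # assumed fraction of households with a piano
    tunings_per_piano_per_year=1,    # assumed tuning frequency
    tunings_per_tuner_per_year=500,  # assumed annual workload of one tuner
):
    """Chain the Fermi steps: people -> households -> pianos -> tunings -> tuners."""
    households = population / people_per_household
    pianos = households * piano_share
    tunings_needed = pianos * tunings_per_piano_per_year
    return round(tunings_needed / tunings_per_tuner_per_year)


print(estimate_piano_tuners())  # prints 108
```

Changing any single assumption, for example `piano_share=0.03`, shifts the result proportionally, which is why the answer is only meaningful to an order of magnitude.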

In [10]:
# A slightly complex reasoning task
prompt = "Estimate the number of piano tuners in Chicago based on first principles. Use Fermi estimation steps."

## 1. Deliberation Control: `reasoning.effort`
**Goal:** Control how much "thinking" the model does before answering.

**Parameter:** `reasoning={"effort": "low" | "medium" | "high"}`

We compare `"low"` effort against `"high"` effort on the same prompt.

In [11]:
# Helper function to print results clearly
def print_output(title, response):
    print(f"\n{'='*60}")
    print(f"  {title}")
    print(f"{'='*60}")
    print(f"Reasoning Tokens Used: {response.usage.output_tokens_details.reasoning_tokens}")
    print(f"Total Output Tokens: {response.usage.output_tokens}")
    print("\nFull Answer:")
    print("-"*60)
    print(response.output_text)
    print("-"*60)

print("Helper function defined.")

Helper function defined.


### 1A. Low Reasoning Effort

In [12]:
# 1A. Low Effort
print("Requesting Low Effort...")
response_low = client.responses.create(
    model="gpt-5.2",
    input=[{"role": "user", "content": prompt}],
    reasoning={"effort": "low"}
)
print_output("Low Effort Result", response_low)

Requesting Low Effort...

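============================================================
  Low Effort Result
============================================================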
Reasoning Tokens Used: 0
Total Output Tokens: 627

Full Answer:
------------------------------------------------------------
### Goal  
Estimate the number of **piano tuners in Chicago** using a **Fermi (first-principles) approach**.

---

## 1) Population of Chicago
Use city proper (not metro), since the question says “in Chicago.”

- Chicago population ≈ **2.7 million** people

Assume average household size:

- ≈ **2.5 people/household**  
- Households ≈ 2.7M / 2.5 ≈ **1.1 million households**

---

## 2) How many pianos are there?
We estimate the fraction of households with a piano.

- Many homes don’t have one; some schools/churches/venues do.
- Assume **~5% of households** have a piano (upright or grand).  
  - Pianos in households ≈ 1.1M × 0.05 ≈ **55,000**

Now add **institutions** (schools, churches, universities, venues, studios).

A rough institutional estimate:
- Suppose there are on the order of **a few thousand** institutions t

### 1B. High Reasoning Effort

In [13]:
# 1B. High Effort
print("Requesting High Effort...")
response_high = client.responses.create(
    model="gpt-5.2",
    input=[{"role": "user", "content": prompt}],
    reasoning={"effort": "high"}
)
print_output("High Effort Result", response_high)

Requesting High Effort...

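============================================================
  High Effort Result
============================================================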
Reasoning Tokens Used: 484
Total Output Tokens: 1145

Full Answer:
------------------------------------------------------------
### Goal
Estimate the number of **full‑time equivalent (FTE) piano tuners serving Chicago** using a Fermi approach.

I’ll interpret “Chicago” as **city proper** (not the whole metro). The result will be order‑of‑magnitude.

---

## 1) How many pianos are in Chicago?

**Population / households**
- Chicago population ≈ **2.7 million**
- Average household size ≈ **2.4**
- Households ≈ 2.7M / 2.4 ≈ **1.1 million households**

**Fraction of households with a piano**
- Many households don’t; some do (kids’ lessons, hobbyists, inherited uprights).
- Assume **~3%** of households have a piano (ballpark 1–5%).
- Household pianos ≈ 1.1M × 0.03 ≈ **33,000 pianos**

**Non-household pianos (institutions/businesses)**
Schools, churches, universities, rehearsal studios, venues, hotels, etc.
- As a rough add-on, assume institutio
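The token counts reported by the two runs can be compared directly. The sketch below hard-codes the figures printed above (0 of 627 tokens for low effort, 484 of 1145 for high effort); the `UsageRecord` class is a hypothetical helper, not an SDK type.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    label: str
    reasoning_tokens: int
    output_tokens: int

    @property
    def visible_tokens(self) -> int:
        # Output tokens include reasoning, so the visible part is the remainder.
        return self.output_tokens - self.reasoning_tokens

    @property
    def reasoning_share(self) -> float:
        # Guard against division by zero for an empty response.
        return self.reasoning_tokens / self.output_tokens if self.output_tokens else 0.0


runs = [
    UsageRecord("low", reasoning_tokens=0, output_tokens=627),
    UsageRecord("high", reasoning_tokens=484, output_tokens=1145),
]

for run in runs:
    print(f"{run.label:>5}: visible={run.visible_tokens}, "
          f"reasoning share={run.reasoning_share:.0%}")
```

In this single run the visible answers are of similar length (627 versus 661 tokens), so most of the additional output tokens at high effort went to hidden reasoning. One run is not enough to generalize from.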

## 2. Stopping Controls: `max_output_tokens`
**Goal:** Prevent runaway costs or "overthinking" by setting a hard budget on total output tokens.

**Parameter:** `max_output_tokens=500` 

This caps the *total* output token budget (reasoning plus visible answer). With high effort, reasoning can consume most or all of that budget, so the visible answer may be truncated or empty.
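Rather than inferring truncation from token counts alone, the response object itself can be inspected. The sketch below assumes the Responses API reports `status == "incomplete"` together with an `incomplete_details.reason` field when the cap is hit; the helper name `truncation_reason` and the stand-in object are hypothetical.

```python
from types import SimpleNamespace


def truncation_reason(response):
    """Return the reported reason if the response is incomplete, else None."""
    if getattr(response, "status", None) != "incomplete":
        return None
    details = getattr(response, "incomplete_details", None)
    # Fall back to a generic label if no reason is attached.
    return getattr(details, "reason", None) or "unknown"


# Stand-in object for illustration; a real call would pass the API response.
fake_response = SimpleNamespace(
    status="incomplete",
    incomplete_details=SimpleNamespace(reason="max_output_tokens"),
)
print(truncation_reason(fake_response))  # prints max_output_tokens
```

Using `getattr` with defaults keeps the helper from raising if a field is absent in a given SDK version.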

In [14]:
# 2. Strict Output Budget
# We set a total output token limit (reasoning + visible answer)
output_budget = 500

print(f"Requesting High Effort with Output Budget ({output_budget} tokens)...")

response_budget = client.responses.create(
    model="gpt-5.2",
    input=[{"role": "user", "content": prompt}],
    reasoning={"effort": "high"},
    max_output_tokens=output_budget
)

print_output("Budget Constrained Result", response_budget)

# Check token usage
reasoning_used = response_budget.usage.output_tokens_details.reasoning_tokens
total_output = response_budget.usage.output_tokens
visible_tokens = total_output - reasoning_used

print(f"\n[INFO] Budget: {output_budget} tokens")
print(f"  - Reasoning tokens: {reasoning_used}")
print(f"  - Visible answer tokens: {visible_tokens}")
print(f"  - Total output: {total_output}")

if total_output >= output_budget:
    print("[!] Output may have been truncated due to budget constraint.")

Requesting High Effort with Output Budget (500 tokens)...

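============================================================
  Budget Constrained Result
============================================================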
Reasoning Tokens Used: 500
Total Output Tokens: 500

Full Answer:
------------------------------------------------------------

------------------------------------------------------------

[INFO] Budget: 500 tokens
  - Reasoning tokens: 500
  - Visible answer tokens: 0
  - Total output: 500
[!] Output may have been truncated due to budget constraint.
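When the whole budget goes to reasoning, as in the run above, one possible response is to retry with a larger cap. The following is a sketch of that idea, not a recommended production policy: `request_with_growing_budget` and `fake_send` are hypothetical names, and the fake sender stands in for a real API call.

```python
from types import SimpleNamespace


def request_with_growing_budget(send, start_budget=500, max_budget=4000, factor=2):
    """Call send(budget) with increasing budgets until a visible answer arrives."""
    budget = start_budget
    while True:
        response = send(budget)
        # Stop on the first non-empty answer, or once the ceiling has been tried.
        if response.output_text.strip() or budget >= max_budget:
            return budget, response
        budget = min(budget * factor, max_budget)


# Fake sender for illustration: pretends the model needs 1200 tokens in total.
def fake_send(budget):
    text = "about 100 tuners" if budget >= 1200 else ""
    return SimpleNamespace(output_text=text)


budget, response = request_with_growing_budget(fake_send)
print(budget, repr(response.output_text))  # prints 2000 'about 100 tuners'
```

With the real client, `send` could be a function that forwards `budget` as `max_output_tokens` to `client.responses.create`. Each retry is a separate billed request, so the ceiling matters.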


## 3. Output Shaping via Prompt Engineering
**Goal:** Control what the user actually sees, regardless of the internal reasoning depth.

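**Technique:** The `response_format` parameter belongs to the Chat Completions API and is not accepted by `client.responses.create`; the Responses API configures structured output through `text.format` instead. This notebook takes the simpler route of explicit prompt instructions that ask for JSON.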

Even if the model thinks for thousands of tokens, we can ask it to deliver a clean JSON object.
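For comparison with prompt-based shaping, schema-enforced output can be requested through the `text` parameter. This is a sketch that assumes the installed SDK version and the chosen model support Structured Outputs in the Responses API; the schema name `piano_tuner_estimate` and the builder function are hypothetical, and the field names mirror the JSON keys requested in this section.

```python
def build_json_schema_format():
    """Build a `text` payload describing the Fermi-estimate fields."""
    properties = {
        "population_chicago": {"type": "integer"},
        "households_with_pianos": {"type": "number"},
        "tuning_frequency_per_year": {"type": "number"},
        "total_tunings_needed": {"type": "integer"},
        "tunings_per_tuner_per_year": {"type": "integer"},
        "estimated_tuners": {"type": "integer"},
    }
    return {
        "format": {
            "type": "json_schema",
            "name": "piano_tuner_estimate",  # hypothetical schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                # Strict mode expects every property to be listed as required.
                "required": list(properties),
                "additionalProperties": False,
            },
        }
    }


text_config = build_json_schema_format()
# Assumed usage, not executed here:
# client.responses.create(model="gpt-5.2", input=..., text=text_config)
```

The trade-off is that a schema constrains the shape of the answer at the API level, while prompt instructions only request it and therefore need a parse check afterwards.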

In [15]:
# 3. Structured Output (via Prompt Engineering)
# client.responses.create does not accept the Chat Completions 'response_format' parameter
# in this environment, so we use explicit prompt instructions to shape the output into JSON.

json_instruction = """
Output the final answer as a valid JSON object with the following keys:
- population_chicago (integer)
- households_with_pianos (number, fraction of households as a decimal, e.g. 0.05 for 5%)
- tuning_frequency_per_year (number)
- total_tunings_needed (integer)
- tunings_per_tuner_per_year (integer)
- estimated_tuners (integer)

Do not include markdown formatting (like ```json). Just the raw JSON string.
"""

print("\nRequesting High Effort with Output Shaping (JSON via Prompt)...")

# Append instructions to the user prompt
shaped_prompt = prompt + "\n\n" + json_instruction

response_shaped = client.responses.create(
    model="gpt-5.2",
    input=[{"role": "user", "content": shaped_prompt}],
    reasoning={"effort": "high"}
)

print("\n--- Shaped Output (High Reasoning, JSON View) ---")
print(f"Reasoning Tokens (Hidden): {response_shaped.usage.output_tokens_details.reasoning_tokens}")
print(f"Visible Answer:\n{response_shaped.output_text}")

# Verify it parses as JSON
try:
    data = json.loads(response_shaped.output_text)
    print("\n[SUCCESS] Output is valid JSON.")
except json.JSONDecodeError:
    print("\n[WARNING] Output is not valid JSON (prompt shaping is less strict than schema-enforced output).")


Requesting High Effort with Output Shaping (JSON via Prompt)...

--- Shaped Output (High Reasoning, JSON View) ---
Reasoning Tokens (Hidden): 1013
Visible Answer:
{
  "population_chicago": 2700000,
  "households_with_pianos": 0.05,
  "tuning_frequency_per_year": 1.0,
  "total_tunings_needed": 54000,
  "tunings_per_tuner_per_year": 500,
  "estimated_tuners": 108
}

[SUCCESS] Output is valid JSON.
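Parsing successfully only shows that the text is JSON, not that it has the requested fields or that the numbers agree with each other. A slightly stricter check might look like the sketch below; `validate_estimate` and `EXPECTED_TYPES` are hypothetical helpers, and the sample string is the answer shown above.

```python
import json

EXPECTED_TYPES = {
    "population_chicago": int,
    "households_with_pianos": (int, float),
    "tuning_frequency_per_year": (int, float),
    "total_tunings_needed": int,
    "tunings_per_tuner_per_year": int,
    "estimated_tuners": int,
}


def validate_estimate(raw_text):
    """Return a list of problems found in the JSON answer (empty if none)."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif isinstance(data[key], bool) or not isinstance(data[key], expected):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"wrong type for {key}")
    if problems:
        return problems

    # Cross-check the arithmetic: tuners should be tunings divided by workload.
    workload = data["tunings_per_tuner_per_year"]
    if workload <= 0:
        problems.append("tunings_per_tuner_per_year must be positive")
    elif abs(data["total_tunings_needed"] / workload - data["estimated_tuners"]) > 1:
        problems.append("estimated_tuners does not match tunings / workload")
    return problems


sample = """{
  "population_chicago": 2700000,
  "households_with_pianos": 0.05,
  "tuning_frequency_per_year": 1.0,
  "total_tunings_needed": 54000,
  "tunings_per_tuner_per_year": 500,
  "estimated_tuners": 108
}"""
print(validate_estimate(sample))  # prints []
```

The consistency check allows a difference of one tuner to absorb rounding in the model's final step.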
