# Developer Recipe Cookbook 🍳 – Notebook 1: Core Concepts  
![Neuron Circuit](https://source.unsplash.com/featured/?neural,abstract)

> **What’s inside**  
> 1. Anatomy of a “reasoning token”  
> 2. Reasoning‑effort & inference‑time trade‑offs  
> 3. Scaling laws (o1 vs o4-mini)  
> 4. 🔑 Quick‑start helpers (OpenAI & Azure OpenAI)

### Deeper Reading  
- [OpenAI Docs – Reasoning best practices](https://platform.openai.com/docs/guides/reasoning-best-practices)  
- [Prompt engineering for o1 & o3‑mini](https://techcommunity.microsoft.com/blog/azure-ai-services-blog/prompt-engineering-for-openai%E2%80%99s-o1-and-o3-mini-reasoning-models/4374010)  
- Wei et al., *Chain‑of‑Thought Prompting* (2022)  


## Recipe 1 — Anatomy of a Reasoning Token
### 1️⃣ What we’ll do
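Inspect a chat completion and measure how many tokens the model spends *thinking*. The o‑series models do not return the reasoning tokens themselves; the API reports only their count in `usage`.

In [1]:
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "Think step-by-step: prove that √2 is irrational."
resp = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role":"user","content":prompt}],
    # `logprobs` and a non-default `temperature` are rejected by o-series models, so neither is passed
)
# Reasoning tokens are hidden; only their count appears in the usage block
details = resp.usage.completion_tokens_details
print("Reasoning tokens:", details.reasoning_tokens)
print("Visible completion tokens:", resp.usage.completion_tokens - details.reasoning_tokens)
print("Usage:", resp.usage)


Requesting `logprobs=True` from this model is rejected by the API:

PermissionDeniedError: Error code: 403 - {'error': {'message': 'You are not allowed to request logprobs from this model', 'type': 'invalid_request_error', 'param': None, 'code': None}}

**Take‑away**  – Reasoning tokens count toward completion tokens but are never returned, so `usage.completion_tokens_details.reasoning_tokens` is the only view of how much the model deliberated.

**Exercise:** Re-run the cell with `reasoning_effort` set to `"low"`, `"medium"`, and `"high"`, and compare reasoning-token count against answer quality.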
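
Since reasoning tokens are reported only as a count, a small helper can split a response's completion tokens into hidden reasoning and visible output. This is a minimal sketch: `split_completion_tokens` is a hypothetical helper, and `fake_usage` is a stand-in for `resp.usage` with made-up numbers so the snippet runs without an API call.

```python
from types import SimpleNamespace

def split_completion_tokens(usage):
    """Return (reasoning, visible) completion-token counts from a usage object."""
    details = getattr(usage, "completion_tokens_details", None)
    # Older or non-reasoning models may omit the details block entirely
    reasoning = getattr(details, "reasoning_tokens", None) or 0
    return reasoning, usage.completion_tokens - reasoning

# Stand-in for `resp.usage`; the numbers are invented for illustration
fake_usage = SimpleNamespace(
    completion_tokens=900,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=640),
)
reasoning, visible = split_completion_tokens(fake_usage)
share = reasoning / fake_usage.completion_tokens
print(f"reasoning={reasoning} visible={visible} share={share:.0%}")
# prints: reasoning=640 visible=260 share=71%
```

With a live response, pass `resp.usage` instead of `fake_usage`.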

## Recipe 2 — Effort × Time Trade‑off
### 1️⃣ What we’ll do
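Benchmark o1 vs o4‑mini on the same task at different output-token caps. The o‑series models reject the older `max_tokens` parameter, so the cap is set with `max_completion_tokens`, which covers hidden reasoning tokens as well as the visible reply.

In [None]:
import time
import pandas as pd

# Reuses `client` from Recipe 1
models = ["o1", "o4-mini"]
rows = []
for m in models:
    for mt in [128, 256, 512]:
        t0 = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
        r = client.chat.completions.create(
            model=m,
            messages=[{"role":"user","content":"Explain the Nash Equilibrium in 5 bullets"}],
            max_completion_tokens=mt,  # caps reasoning + visible tokens
        )
        dur = time.perf_counter() - t0
        rows.append((m, mt, dur, r.usage.total_tokens))
df = pd.DataFrame(rows, columns=["model","max_completion_tokens","sec","tokens"])
print(df)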


### 3️⃣ Take‑away
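- Raising the output-token cap lets the model reason and write for longer, so latency tends to track the tokens actually generated rather than the cap itself; check whether the growth is linear in your own measurements.
- With a very small cap, a reasoning model can spend the whole budget on hidden reasoning and return an empty reply.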
- **Exercise:** Graph `sec` vs `tokens` for both models.
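
A single request per configuration gives a noisy latency figure, so it can help to repeat each call and report the median. The following is a minimal sketch of that idea: `time_call` is a hypothetical helper, and `fake_request` is a stand-in for an API call so the snippet runs offline.

```python
import statistics
import time

def time_call(fn, repeats=3):
    """Run `fn` `repeats` times; return (median seconds, last result)."""
    durations = []
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = fn()
        durations.append(time.perf_counter() - t0)
    return statistics.median(durations), result

# Hypothetical stand-in for a chat completion request
def fake_request():
    time.sleep(0.01)
    return {"total_tokens": 123}

median_sec, last = time_call(fake_request, repeats=5)
print(f"median={median_sec:.3f}s tokens={last['total_tokens']}")
```

In the benchmark loop, `fn` would be a `lambda` wrapping the `client.chat.completions.create(...)` call for one model and one cap.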

## Recipe 3 — Scaling‑Laws Playground
Run the benchmark above across `reasoning_effort ∈ {"low","medium","high"}` and plot accuracy vs cost. (The o‑series models accept only the default `temperature`, so it cannot be swept.)

In [None]:
# …placeholder for detailed experiment…
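
One way to get the cost axis, sketched under loud assumptions: cost is estimated from token counts and per-million-token prices. `estimate_cost_usd` is a hypothetical helper, the prices are placeholders rather than real pricing, and the `runs` rows are invented values that only show the shape of the data, not measured results.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens, price_in_per_m, price_out_per_m):
    """Estimate one request's cost from token counts and per-1M-token prices.

    `completion_tokens` should include hidden reasoning tokens.
    """
    return (prompt_tokens * price_in_per_m + completion_tokens * price_out_per_m) / 1_000_000

# Placeholder prices in USD per 1M tokens; substitute your provider's current rates
PRICE_IN, PRICE_OUT = 1.0, 4.0

# Invented rows: (config label, accuracy, prompt_tokens, completion_tokens)
runs = [
    ("cfg-1", 0.60, 200, 1_000),
    ("cfg-2", 0.70, 200, 3_000),
    ("cfg-3", 0.75, 200, 9_000),
]

for label, acc, p_tok, c_tok in runs:
    cost = estimate_cost_usd(p_tok, c_tok, PRICE_IN, PRICE_OUT)
    print(f"{label}  accuracy={acc:.2f}  cost=${cost:.4f}")
```

In a real experiment the accuracy column would come from grading model answers against a labelled task set, and the token counts from each response's `usage`.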

## 🔑 Quick‑start Helpers (importable)

In [None]:
def pretty_chat_openai(model:str, prompt:str, **kwargs):
    """Call OpenAI and pretty‑print tokens + response"""
    from openai import OpenAI
    c = OpenAI()
    r = c.chat.completions.create(
        model=model,
        messages=[{"role":"user","content":prompt}],
        **kwargs
    )
    print("🔢 Total tokens:", r.usage.total_tokens)
    print("📝 Reply:\n", r.choices[0].message.content)
    return r

def pretty_chat_azure(model:str, prompt:str, **kwargs):
    """Call Azure OpenAI and pretty‑print tokens + response.

    `model` is the name of your Azure deployment.
    """
    import os
    from openai import AzureOpenAI
    c = AzureOpenAI(
        # Read the secret from the environment instead of hard-coding it
        api_key = os.environ["AZURE_OPENAI_KEY"],
        # The Azure client requires an API version
        api_version = os.environ["OPENAI_API_VERSION"],
        azure_endpoint = "https://<your-resource>.openai.azure.com/"
    )
    r = c.chat.completions.create(
        model=model,
        messages=[{"role":"user","content":prompt}],
        **kwargs
    )
    print("🔢 Total tokens:", r.usage.total_tokens)
    print("📝 Reply:\n", r.choices[0].message.content)
    return r
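
Both helpers forward `**kwargs` untouched, so a parameter that reasoning models reject (such as `logprobs`, which produced the 403 in Recipe 1) fails the whole call. The sketch below filters such parameters first. `reasoning_safe_kwargs` is a hypothetical helper, and the set of rejected parameters is an assumption that should be checked against the current model documentation.

```python
# Assumed list of parameters that o-series models reject; verify against current docs
UNSUPPORTED_FOR_REASONING = {
    "temperature", "top_p", "logprobs", "top_logprobs",
    "presence_penalty", "frequency_penalty", "logit_bias",
}

def reasoning_safe_kwargs(kwargs):
    """Return a copy of `kwargs` adjusted for o-series models (hypothetical helper)."""
    safe = {k: v for k, v in kwargs.items() if k not in UNSUPPORTED_FOR_REASONING}
    # Translate the older `max_tokens` cap; an explicit `max_completion_tokens` wins
    if "max_tokens" in safe:
        legacy_cap = safe.pop("max_tokens")
        safe.setdefault("max_completion_tokens", legacy_cap)
    return safe

print(reasoning_safe_kwargs({"temperature": 0.3, "max_tokens": 256}))
# prints: {'max_completion_tokens': 256}
```

A call might then look like `pretty_chat_openai("o4-mini", prompt, **reasoning_safe_kwargs(user_kwargs))`, where `user_kwargs` is whatever dictionary of options the caller supplies.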
