# 🏀 Basketball Scouting Prompt Distillation (Notebook 02)

This notebook is part of the **Tinker Hello World** project.

**High-level goal:**

Turn raw scouting notes about college basketball players into a **structured JSON scouting report**, then later distill that behavior into a **small, cheap specialist model** using Tinker.

We’ll use a large model (teacher) to:

1. Read messy scouting input (stats + notes)
2. Follow a strict schema
3. Output clean JSON reports

In later notebooks, we’ll:

- Use Tinker to distill this behavior into a smaller LoRA student model
- Plug that student into our broader basketball analytics + consulting workflows

## 🗺️ Notebook Roadmap

In this notebook we will:

1. **Check the environment & imports**
   - Confirm Python, Tinker SDK, Transformers, and OpenAI are available.
   - Load API keys from `.env`.

2. **Define the scouting report schema**
   - Decide exactly what fields every JSON report must contain.

3. **Create hand-written gold examples**
   - A few high-quality scouting examples in query/response format.

4. **Save the dataset to JSONL**
   - Store examples as `{"query": ..., "response": ...}` lines.

5. **Design the teacher prompt**
   - A carefully written instruction template that converts raw notes → JSON.

6. **Run a small teacher-model test**
   - Verify that the prompt + model combination produces the outputs we want.

Large-scale synthetic data generation and full LoRA training will live in **later notebooks**,
built on top of the foundations we create here; this notebook only previews both on a handful of examples.

In [1]:
import sys
import os
from pathlib import Path

print("Python version:", sys.version)
print("Working directory:", os.getcwd())

PROJECT_ROOT = Path.cwd().resolve().parents[0]
print("Project root:", PROJECT_ROOT)

print("Files in project root:", os.listdir(PROJECT_ROOT))
print("Environment files present:", [p for p in os.listdir(PROJECT_ROOT) if p.startswith('.env')])

Python version: 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]
Working directory: C:\Users\user\Desktop\tinker-hello-world\notebooks
Project root: C:\Users\user\Desktop\tinker-hello-world
Files in project root: ['.env', '.env.example', '.git', '.gitignore', '.venv', 'data', 'LICENSE', 'notebooks', 'README.md', 'requirements.txt', 'test_env.py']
Environment files present: ['.env', '.env.example']


In [2]:
from dotenv import load_dotenv

load_dotenv()

print("TINKER_API_KEY loaded:", bool(os.getenv("TINKER_API_KEY")))
print("OPENAI_API_KEY loaded:", bool(os.getenv("OPENAI_API_KEY")))

TINKER_API_KEY loaded: True
OPENAI_API_KEY loaded: True


In [3]:
import tinker
import transformers
from openai import OpenAI

client = OpenAI()

TEACHER_MODEL = "gpt-4o-mini"  # can bump to gpt-4o for “platinum” examples if desired

print("Tinker SDK version:", tinker.__version__)
print("Transformers version:", transformers.__version__)
print("Using teacher model:", TEACHER_MODEL)

Tinker SDK version: 0.3.0
Transformers version: 4.57.1
Using teacher model: gpt-4o-mini


---

## 🗂️ Where Our Data Lives (and Why It Matters)

Before we write a single example, we want a clear mental model:

- **Code** lives in `/notebooks`
- **Raw / generated data** lives in `/data/...`
- This notebook should be able to run end-to-end and recreate everything it needs.

For this project we’ll store our scouting distillation data in:

```text
project_root/
  data/
    basketball_distillation/
      scouting_examples.jsonl
```

In [6]:
from pathlib import Path

# PROJECT_ROOT was set earlier
DATA_DIR = PROJECT_ROOT / "data" / "basketball_distillation"
DATA_DIR.mkdir(parents=True, exist_ok=True)

DATA_FILE = DATA_DIR / "scouting_examples.jsonl"

print("Data directory:", DATA_DIR)
print("Dataset file:", DATA_FILE)

Data directory: C:\Users\user\Desktop\tinker-hello-world\data\basketball_distillation
Dataset file: C:\Users\user\Desktop\tinker-hello-world\data\basketball_distillation\scouting_examples.jsonl


---

## 📐 Scouting Report JSON Schema

Every output from our future model should follow the **same shape**:

```json
{
  "player_name": "",
  "team": "",
  "birthdate": "",
  "position": "",
  "year": "",
  "conference": "",
  "archetype": "",
  "offense_summary": "",
  "defense_summary": "",
  "intangibles": "",
  "projection": ""
}
```

A few notes:

- **birthdate** gives us age, which is critical for evaluating development.

- **archetype** is a short, human-readable label we’ll use for grouping and search.

- The three paragraphs (**offense_summary**, **defense_summary**, **intangibles**) are where the “voice” of our scouting model really lives.

- **projection** forces the model to make a time-bound, realistic statement.

This schema is the contract between:

- the **teacher prompt**
- the **distilled student model**
- and any **downstream tools** that consume these reports.
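Because `birthdate` is stored rather than age, downstream tools can derive age for any evaluation date. A minimal sketch, assuming `birthdate` is an ISO `YYYY-MM-DD` string; the helper name `age_on` is hypothetical and not part of this notebook:

```python
from datetime import date

def age_on(birthdate: str, on: date) -> int:
    """Whole years between an ISO 'YYYY-MM-DD' birthdate and the date `on`."""
    born = date.fromisoformat(birthdate)
    # Subtract one year if the birthday has not happened yet in `on`'s year.
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)

print(age_on("2005-03-14", date(2025, 3, 13)))  # 19
print(age_on("2005-03-14", date(2025, 3, 14)))  # 20
```

Storing the date and computing age on demand keeps reports from going stale as a season progresses.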

In [7]:
SCOUTING_SCHEMA_KEYS = [
    "player_name",
    "team",
    "birthdate",
    "position",
    "year",
    "conference",
    "archetype",
    "offense_summary",
    "defense_summary",
    "intangibles",
    "projection",
]

def validate_scouting_json(obj: dict) -> bool:
    """Quick check: does a dict match our expected schema keys?"""
    return set(obj.keys()) == set(SCOUTING_SCHEMA_KEYS)
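When the check above fails, it helps to know which keys are off. A hedged sketch of a diagnostic variant follows; `schema_key_diff` is a hypothetical helper, and the key list is repeated only so the block runs by itself:

```python
SCOUTING_SCHEMA_KEYS = [
    "player_name", "team", "birthdate", "position", "year", "conference",
    "archetype", "offense_summary", "defense_summary", "intangibles", "projection",
]

def schema_key_diff(obj: dict) -> tuple[list[str], list[str]]:
    """Return (missing_keys, unexpected_keys) relative to the schema."""
    expected = set(SCOUTING_SCHEMA_KEYS)
    actual = set(obj)
    return sorted(expected - actual), sorted(actual - expected)

# Build a report that drops one required key and adds one unknown key.
report = {key: "" for key in SCOUTING_SCHEMA_KEYS}
del report["projection"]
report["height"] = "6'4"

missing, unexpected = schema_key_diff(report)
print("missing:", missing)        # ['projection']
print("unexpected:", unexpected)  # ['height']
```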

In [8]:
from dataclasses import dataclass
import json

@dataclass
class ScoutingExample:
    # raw text we feed into the model
    query: str
    # JSON string the model should learn to produce
    response: str

def to_jsonl_line(example: ScoutingExample) -> str:
    """
    Convert an example into a single JSONL line:
    {"query": "...", "response": "..."}
    """
    return json.dumps(
        {"query": example.query, "response": example.response},
        ensure_ascii=False,
    )
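Note that `response` is itself a JSON string, so each JSONL line is JSON nested inside JSON and needs two decoding steps to get back to a report dict. A small round-trip sketch with a made-up record (the values are illustrative only):

```python
import json

# A made-up example record; `response` holds a JSON *string*, not a dict.
query = "Player: Test Player\nNotes:\n- Example note.\n"
response = json.dumps({"player_name": "Test Player"}, ensure_ascii=False)

line = json.dumps({"query": query, "response": response}, ensure_ascii=False)

# Decoding step 1: the JSONL line becomes a dict of two strings.
record = json.loads(line)
# Decoding step 2: the response string becomes the report dict.
report = json.loads(record["response"])

print("\n" not in line)       # True: newlines are escaped, so one record stays on one line
print(report["player_name"])  # Test Player
```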

---

# ✍️ Creating Gold Scouting Examples

Our distilled model will ultimately learn from **patterns**, not code.

So before we generate hundreds of synthetic examples using a teacher model,  
we need a small number of **hand-crafted “gold standard” examples**.

Why these matter:

- They set the *tone* and *voice*
- They anchor the *quality bar*
- They show the model exactly how we want archetypes written
- They teach structure, pacing, and vocabulary
- They stabilize the teacher model’s outputs (less drift)

Structure of each gold example:

1. **QUERY** → messy scouting notes (stats, bullet points, observations)  
2. **RESPONSE** → perfect JSON following our schema

We’ll create three foundational archetypes:

1. **On-ball creator guard**  
2. **Rim-running energy big**  
3. **3&D wing**

These three roles cover many of the most common player types in NCAA basketball.
They also represent distinct language patterns, which helps the student generalize.

---

In [9]:
raw_examples = []  # fresh list

guard_example = ScoutingExample(
    query=(
        "Player: Jamal Rivers\n"
        "School: NC State\n"
        "Conference: ACC\n"
        "Position: 6'4 guard (combo)\n"
        "Year: Sophomore\n"
        "Birthdate: 2005-03-14\n\n"
        "Recent games:\n"
        "- vs Duke: 24 pts, 5 ast, 3 reb, 3 TO, 9/17 FG, 3/7 3PT\n"
        "- vs UNC: 18 pts, 7 ast, 2 reb, 2 TO, 6/13 FG, 2/5 3PT\n"
        "- vs Wake: 16 pts, 8 ast, 4 reb, 1 TO, 5/12 FG\n\n"
        "Notes:\n"
        "- Primary on-ball creator for long stretches.\n"
        "- Good burst, consistently gets two feet in the paint.\n"
        "- Pull-up three is improving but still streaky.\n"
        "- Can get loose with the ball vs pressure (careless TOs).\n"
        "- Vocal, emotional leader for this group.\n"
    ),
    response=json.dumps(
        {
            "player_name": "Jamal Rivers",
            "team": "NC State",
            "birthdate": "2005-03-14",
            "position": "Guard (combo)",
            "year": "Sophomore",
            "conference": "ACC",
            "archetype": "On-ball creator guard",
            "offense_summary": (
                "Primary creator who lives on the ball. Creates paint touches with "
                "burst and change of pace, comfortable in pick-and-roll. Pull-up "
                "three is trending up but remains streaky, which affects consistency. "
                "Turnovers mostly come from loose handle against pressure and trying "
                "to force hero passes, not lack of vision."
            ),
            "defense_summary": (
                "Competes defensively and can navigate screens when locked in. "
                "Still developing discipline off the ball and can be late on "
                "tag responsibilities. Rebounds well for position and uses length "
                "to contest at the rim."
            ),
            "intangibles": (
                "Vocal emotional leader who raises team energy. Plays with fire and "
                "confidence. Needs to improve composure late in close games to reduce "
                "careless turnovers."
            ),
            "projection": (
                "Projects as a high-major lead guard with pro upside if shooting "
                "consistency and turnover control continue trending positively."
            ),
        },
        ensure_ascii=False,
    ),
)

raw_examples.append(guard_example)
len(raw_examples)

1

---

## 🎯 Why Example #1 Is So Important

This is our template for **creation-heavy guards**, which are the trickiest
archetypes to evaluate because:

- Usage is high  
- Turnovers matter contextually  
- Shot selection has range  
- “Leader” qualities must be described concisely  
- Pace, driving, and playmaking require nuanced summaries  

A good distilled model must learn:
- What *realistic* strengths/weaknesses look like  
- How to talk about *paint touches*, *PnR comfort*, and *pull-up shooting trends*  
- How to articulate *decision-making* cleanly  

This one example teaches all of that.

---

## 🏀 Gold Example #2: Rim-Running Energy Big

Next up is the **rim-running energy big** archetype.

Why this one matters:

- Every roster has some version of this guy.
- Box scores alone rarely capture his value.
- He lives off **effort, vertical spacing, and offensive boards**.
- Free throws and fouls often cap his minutes.

We want the model to learn how to:

- Describe role-player value without overselling it.
- Talk about rim gravity, motor, and offensive rebounding.
- Balance strengths (energy, verticality) with limits (touch, FT%).

This example also shows how to phrase a realistic, non-hyped **projection**
for a player who may never be a star, but absolutely belongs in winning lineups.

In [10]:
big_example = ScoutingExample(
    query=(
        "Player: Marcus Lane\n"
        "School: Arkansas\n"
        "Conference: SEC\n"
        "Position: 6'8 forward (small-ball 5)\n"
        "Year: Junior\n"
        "Birthdate: 2004-01-22\n\n"
        "Recent games:\n"
        "- vs Kentucky: 10 pts, 11 reb (5 OR), 2 blk\n"
        "- vs Tennessee: 8 pts, 9 reb, 3 blk\n"
        "- vs LSU: 14 pts, 8 reb, 1 blk, 6/7 FG (all at rim)\n\n"
        "Notes:\n"
        "- Very high motor, sprints the floor every possession.\n"
        "- Limited touch outside 10 feet, mostly finishes at the rim.\n"
        "- Good timing as weak-side rim protector.\n"
        "- Can switch onto some guards in a pinch.\n"
        "- FT% still under 60%; teams intentionally foul late.\n"
    ),
    response=json.dumps(
        {
            "player_name": "Marcus Lane",
            "team": "Arkansas",
            "birthdate": "2004-01-22",
            "position": "Forward / Small-ball 5",
            "year": "Junior",
            "conference": "SEC",
            "archetype": "Rim-running energy big",
            "offense_summary": (
                "Plays almost entirely inside the arc as a vertical spacer and rim runner. "
                "Generates efficient offense through sprints in transition, hard rolls, "
                "and relentless offensive rebounding. Finishes well at the rim but offers "
                "little shooting or self-creation; free-throw inconsistency limits late-game usage."
            ),
            "defense_summary": (
                "High-motor defender who provides timely weak-side rim protection and contests "
                "above the rim. Can survive in switches against some guards for short stretches. "
                "Defensive value is tied to activity and foul discipline; can pick up cheap fouls "
                "when late on rotations."
            ),
            "intangibles": (
                "Brings consistent energy and physicality, changes pace of the game with hustle plays. "
                "Comfortable in a role-player lane and does not hunt touches. Needs improvement in "
                "communication on the back line and composure at the free-throw line."
            ),
            "projection": (
                "Projects as a rotation-level energy big at the high-major level with a path to "
                "professional opportunities if free-throw shooting and decision-making continue to improve."
            ),
        },
        ensure_ascii=False,
    ),
)

raw_examples.append(big_example)
len(raw_examples)

2

In [11]:
print("BIG EXAMPLE – QUERY (truncated):\n", big_example.query[:350], "\n")
print("BIG EXAMPLE – RESPONSE (truncated):\n", big_example.response[:350])

BIG EXAMPLE – QUERY (truncated):
 Player: Marcus Lane
School: Arkansas
Conference: SEC
Position: 6'8 forward (small-ball 5)
Year: Junior
Birthdate: 2004-01-22

Recent games:
- vs Kentucky: 10 pts, 11 reb (5 OR), 2 blk
- vs Tennessee: 8 pts, 9 reb, 3 blk
- vs LSU: 14 pts, 8 reb, 1 blk, 6/7 FG (all at rim)

Notes:
- Very high motor, sprints the floor every possession.
- Limited touch 

BIG EXAMPLE – RESPONSE (truncated):
 {"player_name": "Marcus Lane", "team": "Arkansas", "birthdate": "2004-01-22", "position": "Forward / Small-ball 5", "year": "Junior", "conference": "SEC", "archetype": "Rim-running energy big", "offense_summary": "Plays almost entirely inside the arc as a vertical spacer and rim runner. Generates efficient offense through sprints in transition, har


---

## 🎯 Gold Example #3: Prototype 3&D Wing

This example covers one of the most **valuable, scalable archetypes** in modern basketball:
the dependable 3&D wing.

Why this archetype matters for our model:

- The language used for 3&D wings is **distinct**:  
  spacing, discipline, frame, closeouts, defensive versatility.

- These players often **don’t self-create**, so the model learns how to describe
  value **without usage or creation**.

- Coaches and scouts rely heavily on players like this because their game
  “**travels**”: their strengths translate across systems and levels.

For our dataset, this example teaches the model to:
- Recognize shooting gravity  
- Describe defensive fundamentals  
- Summarize low-maintenance offensive roles  
- Provide realistic, grounded projections  

This completes our 3-example gold set and gives the teacher model the diversity
it needs to begin generating high-quality synthetic scouting reports.

---

In [12]:
wing_example = ScoutingExample(
    query=(
        "Player: Devin Clark\n"
        "School: Colorado\n"
        "Conference: Pac-12\n"
        "Position: 6'6 wing\n"
        "Year: Senior\n"
        "Birthdate: 2002-11-03\n\n"
        "Recent games:\n"
        "- vs Arizona: 12 pts, 4/8 3PT, 6 reb, 2 stl\n"
        "- vs USC: 9 pts, 3/6 3PT, 5 reb, 1 blk\n"
        "- vs Oregon: 14 pts, 4/9 3PT, 7 reb, 3 ast\n\n"
        "Notes:\n"
        "- Reliable spot-up shooter with deep range.\n"
        "- Good defender, strong frame, disciplined on closeouts.\n"
        "- Limited self-creation off the bounce.\n"
        "- Makes simple reads, doesn't force plays.\n"
        "- Strong rebounder for position.\n"
    ),
    response=json.dumps(
        {
            "player_name": "Devin Clark",
            "team": "Colorado",
            "birthdate": "2002-11-03",
            "position": "Wing",
            "year": "Senior",
            "conference": "Pac-12",
            "archetype": "3&D wing",
            "offense_summary": (
                "High-level spot-up shooter with clean mechanics and deep range. "
                "Attacks closeouts in straight lines but offers limited creation off "
                "the dribble. Makes simple, correct reads within the flow of the "
                "offense and rarely forces shots."
            ),
            "defense_summary": (
                "Strong positional defender who uses frame and balance well. "
                "Disciplined on closeouts and mirrors drivers without gambling. "
                "Rebounds well for size and provides occasional weak-side contests."
            ),
            "intangibles": (
                "Low-maintenance rotation wing with strong feel for role. "
                "Competes consistently, communicates well on defense, and provides "
                "steady spacing on offense. High maturity and professionalism."
            ),
            "projection": (
                "Projects as a plug-and-play 3&D contributor at the high-major level "
                "with professional potential due to shooting volume, defensive "
                "reliability, and physical maturity."
            ),
        },
        ensure_ascii=False,
    ),
)

raw_examples.append(wing_example)
len(raw_examples)

3

In [13]:
print("WING EXAMPLE – QUERY (truncated):\n", wing_example.query[:350], "\n")
print("WING EXAMPLE – RESPONSE (truncated):\n", wing_example.response[:350])

WING EXAMPLE – QUERY (truncated):
 Player: Devin Clark
School: Colorado
Conference: Pac-12
Position: 6'6 wing
Year: Senior
Birthdate: 2002-11-03

Recent games:
- vs Arizona: 12 pts, 4/8 3PT, 6 reb, 2 stl
- vs USC: 9 pts, 3/6 3PT, 5 reb, 1 blk
- vs Oregon: 14 pts, 4/9 3PT, 7 reb, 3 ast

Notes:
- Reliable spot-up shooter with deep range.
- Good defender, strong frame, disciplined on c 

WING EXAMPLE – RESPONSE (truncated):
 {"player_name": "Devin Clark", "team": "Colorado", "birthdate": "2002-11-03", "position": "Wing", "year": "Senior", "conference": "Pac-12", "archetype": "3&D wing", "offense_summary": "High-level spot-up shooter with clean mechanics and deep range. Attacks closeouts in straight lines but offers limited creation off the dribble. Makes simple, correc


---

# 🧠 Teacher Prompt: The Blueprint We’re Distilling

We now have:

- A **clear JSON schema**
- Three **gold scouting examples**

Next, we design the **teacher prompt**.

This prompt is the *blueprint* that a large model (teacher) will follow to:

1. Read raw scouting notes (`query`)
2. Produce a **structured JSON scouting report** (`response`)
3. Stay inside our schema, tone, and constraints

Later, we’ll:

- Feed many different scouting inputs into this teacher prompt  
- Capture its JSON outputs as training data  
- Use Tinker to distill that behavior into a smaller LoRA model

Because every future model will learn from this pattern, we treat the teacher prompt
like a contract: **strict, predictable, and easy to parse.**

---

In [14]:
TEACHER_PROMPT_TEMPLATE = """
You are an expert basketball scout and analytics assistant.

Your task is to convert RAW_SCOUTING_NOTES into a structured JSON scouting report.
Follow these rules exactly:

1. Always output valid JSON.
2. Use the exact schema below:

{{
  "player_name": "",
  "team": "",
  "birthdate": "",
  "position": "",
  "year": "",
  "conference": "",
  "archetype": "",
  "offense_summary": "",
  "defense_summary": "",
  "intangibles": "",
  "projection": ""
}}

3. Only use information provided in RAW_SCOUTING_NOTES.
4. Do NOT hallucinate missing data.
5. Keep summaries concise but informative (2–4 sentences per section).
6. Use professional basketball scouting terminology.
7. For "archetype", choose a short, meaningful label
   (e.g. "On-ball creator guard", "Rim-running energy big", "3&D wing",
    "Two-way combo guard", "Pick-and-pop big").
8. For "projection", give a realistic 1–3 year outlook based ONLY on the notes.

---

RAW_SCOUTING_NOTES:
{input_text}

---

Now output ONLY the structured JSON scouting report. No explanations, no extra text.
"""
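The doubled braces around the schema are deliberate: `str.format` treats `{{` and `}}` as literal braces and substitutes only `{input_text}`. A tiny demonstration on a cut-down template (the template text here is shortened for illustration):

```python
MINI_TEMPLATE = """Schema:
{{
  "player_name": ""
}}

RAW_SCOUTING_NOTES:
{input_text}
"""

# Only {input_text} is replaced; the doubled braces collapse to single ones.
prompt = MINI_TEMPLATE.format(input_text="Player: Jamal Rivers")
print(prompt)
```

Substituted text is not re-scanned by `format`, so scouting notes that happen to contain braces are inserted unchanged.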

---

## 🧩 Why This Prompt Is Structured Like This

A few key design choices:

- **Strict JSON schema**  
  The model knows exactly what keys to produce. That makes training, evaluation,
  and downstream usage much easier.

- **“Only use information provided”**  
  We don’t want creative fiction; we want grounded scouting based on given stats/notes.

- **2–4 sentence summaries**  
  Long enough to be useful, short enough to stay punchy and consistent.

- **Explicitly named `RAW_SCOUTING_NOTES` block**  
  Makes it obvious to the model what text to transform.

- **“Output ONLY JSON”**  
  Prevents extra commentary that would break our parsers and training scripts.

This same pattern can be adapted to:
- Insurance claim summaries  
- Legal case memos  
- Cyber incident reports  
- Any place you want “messy → structured JSON”.
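
As a rough sketch of that adaptation, the schema block can be generated from a key list instead of typed by hand. The function name, the prompt wording, and the insurance field names below are hypothetical, not taken from this project:

```python
import json

def build_extraction_prompt(role: str, schema_keys: list[str], raw_text: str) -> str:
    """Assemble a 'messy text -> strict JSON' prompt for an arbitrary schema."""
    # json.dumps renders the empty skeleton, so no manual brace escaping is needed.
    skeleton = json.dumps({key: "" for key in schema_keys}, indent=2)
    return (
        f"You are {role}.\n\n"
        "Convert RAW_INPUT into JSON using exactly this schema:\n\n"
        f"{skeleton}\n\n"
        "Only use information provided in RAW_INPUT.\n\n"
        f"RAW_INPUT:\n{raw_text}\n\n"
        "Output ONLY the JSON. No explanations, no extra text."
    )

prompt = build_extraction_prompt(
    role="an insurance claims analyst",
    schema_keys=["claim_id", "incident_summary", "estimated_loss"],
    raw_text="Claim 123: water damage in kitchen, approx. $4,000.",
)
print(prompt)
```

Generating the skeleton from one key list keeps the prompt and any key-based validator from drifting apart.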

---

In [15]:
test_input = raw_examples[0].query  # Jamal Rivers example

prompt = TEACHER_PROMPT_TEMPLATE.format(input_text=test_input)

response = client.responses.create(
    model=TEACHER_MODEL,  # "gpt-4o-mini"
    input=prompt,
)

teacher_output = response.output_text
print(teacher_output)

```json
{
  "player_name": "Jamal Rivers",
  "team": "NC State",
  "birthdate": "2005-03-14",
  "position": "Combo guard",
  "year": "Sophomore",
  "conference": "ACC",
  "archetype": "On-ball creator guard",
  "offense_summary": "Rivers serves as the primary on-ball creator, showcasing good burst and the ability to penetrate the defense. His pull-up three-point shooting is improving but remains inconsistent, with some streakiness evident in his recent performances.",
  "defense_summary": "While not elaborated upon extensively, Rivers may struggle with ball security under pressure, leading to careless turnovers. This could indicate areas for improvement on the defensive end as well.",
  "intangibles": "He is a vocal and emotional leader for his team, contributing significantly to team morale and cohesiveness during games.",
  "projection": "With continued development, Rivers could become a reliable secondary ball handler and scoring option in the next 1-3 years."
}
```
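
One cheap guard for rule 3 ("only use information provided") is to confirm that the identity fields were copied from the notes. This is a heuristic sketch, not a full hallucination check: `ungrounded_fields` is a hypothetical helper, and it only looks for case-insensitive substring matches.

```python
def ungrounded_fields(report: dict, notes: str, fields: list[str]) -> list[str]:
    """Return the fields whose value does not appear verbatim in the notes."""
    haystack = notes.lower()
    return [f for f in fields if report.get(f, "").lower() not in haystack]

notes = (
    "Player: Jamal Rivers\n"
    "School: NC State\n"
    "Conference: ACC\n"
    "Position: 6'4 guard (combo)\n"
    "Year: Sophomore\n"
    "Birthdate: 2005-03-14\n"
)
report = {
    "player_name": "Jamal Rivers",
    "team": "NC State",
    "birthdate": "2005-03-14",
    "position": "Combo guard",
    "year": "Sophomore",
    "conference": "ACC",
}
print(ungrounded_fields(report, notes, list(report)))  # ['position']
```

A flagged field is not necessarily wrong (here the position was merely rephrased), so a check like this is better suited to routing outputs for review than to rejecting them outright.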


---

# 🧪 Synthetic Generation & Expanded Dataset

Now that our teacher prompt works, we can start using it to **expand** the dataset.

For this notebook, we’ll keep things simple and transparent:

1. Use the teacher model (`gpt-4o-mini`) to generate **JSON scouting reports**
   from multiple raw scouting inputs.

2. Store each result as a `ScoutingExample`:
   - `query`  = raw notes we fed into the teacher
   - `response` = JSON string returned by the teacher

3. Append these to our existing `raw_examples` list
   (which already contains our 3 hand-crafted gold examples).

4. Save everything to `data/basketball_distillation/scouting_examples.jsonl`.

This gives us a small but meaningful dataset we can later plug into a Tinker
distillation recipe.
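
The steps above can be sketched as a loop that keeps going when one teacher call fails. To stay runnable offline, the teacher is passed in as a plain function; `generate_examples` and `fake_teacher` are hypothetical stand-ins, not the notebook's API call:

```python
import json
from typing import Callable

def generate_examples(
    inputs: list[str],
    teacher: Callable[[str], dict],
) -> tuple[list[dict], list[tuple[int, str]]]:
    """Run the teacher on every input; collect successes and failures separately."""
    examples, failures = [], []
    for index, raw_text in enumerate(inputs):
        try:
            parsed = teacher(raw_text)
        except (ValueError, KeyError) as exc:
            # json.JSONDecodeError subclasses ValueError, so bad JSON lands here too.
            failures.append((index, str(exc)))
            continue
        examples.append(
            {"query": raw_text, "response": json.dumps(parsed, ensure_ascii=False)}
        )
    return examples, failures

def fake_teacher(raw_text: str) -> dict:
    """Stand-in for the real model call; fails on empty notes."""
    if not raw_text.strip():
        raise ValueError("empty scouting notes")
    return {"player_name": raw_text.split(":", 1)[1].strip()}

examples, failures = generate_examples(["Player: A. Example", "   "], fake_teacher)
print(len(examples), failures)  # 1 [(1, 'empty scouting notes')]
```

Recording failures by index makes it easy to retry or inspect only the inputs that went wrong, which matters more as the input list grows.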

---

In [16]:
import re

def extract_json_from_text(text: str) -> str:
    """
    Extract the JSON object from a raw model string.

    Handles cases like:
    ```json
    { ... }
    ```
    or extra commentary, by keeping the span from the first "{" to the last "}".
    """
    text = text.strip()
    # If fenced with ```...```, strip the fences
    if text.startswith("```"):
        # Remove backticks and optional language label
        text = re.sub(r"^```[a-zA-Z0-9]*\s*", "", text)
        text = re.sub(r"\s*```$", "", text, flags=re.DOTALL).strip()

    # As a fallback, keep the span from the first "{" to the last "}"
    start = text.find("{")
    end = text.rfind("}")
    if start != -1 and end != -1 and end > start:
        return text[start : end + 1]

    return text


def call_teacher_and_parse(input_text: str) -> dict:
    """Call the teacher model and return a parsed JSON dict."""
    prompt = TEACHER_PROMPT_TEMPLATE.format(input_text=input_text)

    resp = client.responses.create(
        model=TEACHER_MODEL,
        input=prompt,
    )

    raw_output = resp.output_text
    json_str = extract_json_from_text(raw_output)
    obj = json.loads(json_str)

    if not validate_scouting_json(obj):
        raise ValueError("Teacher output does not match expected schema keys.")

    return obj
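
`validate_scouting_json` only compares key names. A stricter pass might also look at the values. The sketch below assumes every field should be a non-empty string and that `birthdate` parses as an ISO date such as `YYYY-MM-DD`; that matches the gold examples but is an assumption for teacher outputs, and `value_problems` is a hypothetical helper:

```python
from datetime import date

def value_problems(report: dict) -> list[str]:
    """Describe value-level issues in a report whose keys already match the schema."""
    problems = []
    for key, value in report.items():
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key}: expected a non-empty string")
    birthdate = report.get("birthdate")
    if isinstance(birthdate, str) and birthdate.strip():
        try:
            date.fromisoformat(birthdate)
        except ValueError:
            problems.append("birthdate: not an ISO-format date")
    return problems

print(value_problems({"player_name": "Jamal Rivers", "birthdate": "2005-03-14"}))
print(value_problems({"player_name": "", "birthdate": "March 2005"}))
```

If the notes genuinely omit a field, an empty string may be the honest answer, so whether to treat empties as errors is a policy choice rather than a given.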

In [17]:
test_parsed = call_teacher_and_parse(raw_examples[0].query)

print("Parsed keys:", test_parsed.keys())
print("\nOffense summary:\n", test_parsed["offense_summary"])
print("\nProjection:\n", test_parsed["projection"])

Parsed keys: dict_keys(['player_name', 'team', 'birthdate', 'position', 'year', 'conference', 'archetype', 'offense_summary', 'defense_summary', 'intangibles', 'projection'])

Offense summary:
 Rivers serves as the primary on-ball creator, showcasing good burst and the ability to penetrate defenses. His pull-up three-point shot is improving but remains inconsistent. Recent performances reflect his scoring potential, highlighted by significant contributions against Duke and UNC.

Projection:
 With continued development, Rivers has the potential to emerge as a reliable scoring option and playmaker in the next 1-3 years, particularly if he shores up his decision-making under pressure.


In [18]:
synthetic_inputs = [
    # Two-way combo guard
    (
        "Player: Chris Miller\n"
        "School: Virginia Tech\n"
        "Conference: ACC\n"
        "Position: 6'3 guard (combo)\n"
        "Year: Junior\n"
        "Birthdate: 2003-06-10\n\n"
        "Recent games:\n"
        "- vs Miami: 19 pts, 6 ast, 4 reb, 2 stl\n"
        "- vs Syracuse: 15 pts, 7 ast, 3 reb, 1 stl\n"
        "- vs Clemson: 11 pts, 8 ast, 5 reb, 3 TO\n\n"
        "Notes:\n"
        "- Splits ball-handling duties, comfortable on or off the ball.\n"
        "- Solid catch-and-shoot threat from three.\n"
        "- Takes on tough guard matchups defensively.\n"
        "- Makes mature reads in ball screens, low-mistake style.\n"
    ),

    # Stretch big
    (
        "Player: Andre Novak\n"
        "School: Creighton\n"
        "Conference: Big East\n"
        "Position: 6'10 forward/center\n"
        "Year: Sophomore\n"
        "Birthdate: 2004-09-01\n\n"
        "Recent games:\n"
        "- vs Marquette: 17 pts, 7 reb, 3/6 3PT\n"
        "- vs Xavier: 13 pts, 8 reb, 2/5 3PT\n"
        "- vs Villanova: 9 pts, 6 reb, 1/4 3PT\n\n"
        "Notes:\n"
        "- Pick-and-pop threat with quick release.\n"
        "- Average athlete vertically, relies more on positioning.\n"
        "- Competes on the glass but not a dominant rim protector.\n"
        "- Can be targeted in space by quicker guards.\n"
    ),

    # Slashing wing
    (
        "Player: Malik Johnson\n"
        "School: Houston\n"
        "Conference: Big 12\n"
        "Position: 6'5 wing\n"
        "Year: Sophomore\n"
        "Birthdate: 2005-02-18\n\n"
        "Recent games:\n"
        "- vs Baylor: 16 pts, 6 reb, 3 ast\n"
        "- vs Kansas: 14 pts, 7 reb, 2 stl\n"
        "- vs Texas: 18 pts, 5 reb, 4 ast\n\n"
        "Notes:\n"
        "- Physical downhill driver who lives in the paint.\n"
        "- Jumper is streaky from three but improving off the catch.\n"
        "- Defends multiple positions with toughness.\n"
        "- Plays with high motor on both ends, active on the glass.\n"
    ),
]

len(synthetic_inputs)

3

In [19]:
synthetic_examples = []

for i, raw_text in enumerate(synthetic_inputs, start=1):
    print(f"Generating teacher output for synthetic example {i}...")
    parsed = call_teacher_and_parse(raw_text)

    ex = ScoutingExample(
        query=raw_text,
        response=json.dumps(parsed, ensure_ascii=False),
    )
    synthetic_examples.append(ex)

print("Synthetic examples generated:", len(synthetic_examples))

# Merge with existing gold examples
raw_examples.extend(synthetic_examples)
print("Total examples (gold + synthetic):", len(raw_examples))

Generating teacher output for synthetic example 1...
Generating teacher output for synthetic example 2...
Generating teacher output for synthetic example 3...
Synthetic examples generated: 3
Total examples (gold + synthetic): 6


In [20]:
def save_examples_to_jsonl(examples, path: Path):
    with path.open("w", encoding="utf-8") as f:
        for ex in examples:
            f.write(to_jsonl_line(ex) + "\n")

save_examples_to_jsonl(raw_examples, DATA_FILE)

print("Saved examples to:", DATA_FILE)
print("Total examples saved:", len(raw_examples))

Saved examples to: C:\Users\user\Desktop\tinker-hello-world\data\basketball_distillation\scouting_examples.jsonl
Total examples saved: 6


## 7. Load the distilled scouting dataset

Now that we've generated and saved our 6 scouting examples, we'll:

1. Load `scouting_examples.jsonl` back from disk  
2. Convert it into `{"input": ..., "output": ...}` pairs  
3. Prepare it for Tinker supervised learning / prompt distillation  
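
The loader called below, `load_examples_from_jsonl`, is not defined in any cell shown in this export. A minimal sketch of what it could look like, assuming it returns one dict per non-empty line (consistent with the `dict_keys(['query', 'response'])` output further down):

```python
import json
import tempfile
from pathlib import Path

def load_examples_from_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Round-trip check on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.jsonl"
    demo_file.write_text(
        json.dumps({"query": "notes", "response": "{}"}) + "\n", encoding="utf-8"
    )
    loaded = load_examples_from_jsonl(demo_file)

print(loaded)  # [{'query': 'notes', 'response': '{}'}]
```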

In [22]:
from pathlib import Path

print("CWD:", Path.cwd())
print("Relative path exists?:", Path("data/basketball_distillation/scouting_examples.jsonl").exists())

print("\nSearching for scouting_examples file:")
for p in Path(".").rglob("scouting_examples*"):
    print(" -", p.resolve())

CWD: C:\Users\user\Desktop\tinker-hello-world\notebooks
Relative path exists?: False

Searching for scouting_examples file:


In [23]:
from pathlib import Path

DATA_FILE = Path(
    r"C:\Users\user\Desktop\tinker-hello-world\data\basketball_distillation\scouting_examples.jsonl"
)

In [24]:
raw_loaded = load_examples_from_jsonl(DATA_FILE)
print(f"Loaded examples: {len(raw_loaded)}")
print("First raw example keys:", raw_loaded[0].keys())

Loaded examples: 6
First raw example keys: dict_keys(['query', 'response'])


In [25]:
from pathlib import Path

cwd = Path.cwd()

# Case 1: running from repo root
root_candidate = cwd
# Case 2: running from inside notebooks/ or another child folder
if not (root_candidate / "data").exists() and (cwd.parent / "data").exists():
    root_candidate = cwd.parent

PROJECT_ROOT = root_candidate
DATA_FILE = PROJECT_ROOT / "data" / "basketball_distillation" / "scouting_examples.jsonl"

print("PROJECT_ROOT:", PROJECT_ROOT)
print("DATA_FILE:", DATA_FILE, "exists:", DATA_FILE.exists())

PROJECT_ROOT: C:\Users\user\Desktop\tinker-hello-world
DATA_FILE: C:\Users\user\Desktop\tinker-hello-world\data\basketball_distillation\scouting_examples.jsonl exists: True


In [26]:
from pprint import pprint

def to_tinker_dataset(examples):
    """
    Convert our stored examples into intermediate input/output pairs,
    which we later turn into Tinker `Datum` objects for supervised
    learning / distillation:

        {"input": <query>, "output": <response>}
    """
    dataset = []
    for ex in examples:
        dataset.append(
            {
                "input": ex["query"],
                "output": ex["response"],
            }
        )
    return dataset

tinker_dataset = to_tinker_dataset(raw_loaded)
print(f"Tinker dataset size: {len(tinker_dataset)}")
pprint(tinker_dataset[0])

Tinker dataset size: 6
{'input': 'Player: Jamal Rivers\n'
          'School: NC State\n'
          'Conference: ACC\n'
          "Position: 6'4 guard (combo)\n"
          'Year: Sophomore\n'
          'Birthdate: 2005-03-14\n'
          '\n'
          'Recent games:\n'
          '- vs Duke: 24 pts, 5 ast, 3 reb, 3 TO, 9/17 FG, 3/7 3PT\n'
          '- vs UNC: 18 pts, 7 ast, 2 reb, 2 TO, 6/13 FG, 2/5 3PT\n'
          '- vs Wake: 16 pts, 8 ast, 4 reb, 1 TO, 5/12 FG\n'
          '\n'
          'Notes:\n'
          '- Primary on-ball creator for long stretches.\n'
          '- Good burst, consistently gets two feet in the paint.\n'
          '- Pull-up three is improving but still streaky.\n'
          '- Can get loose with the ball vs pressure (careless TOs).\n'
          '- Vocal, emotional leader for this group.\n',
 'output': '{"player_name": "Jamal Rivers", "team": "NC State", "birthdate": '
           '"2005-03-14", "position": "Guard (combo)", "year": "Sophomore", '
           '"conf

## 8. Create the Tinker TrainingClient (student model) 🏀

We'll fine-tune a small open-weight model with LoRA using Tinker.

- Base model: `meta-llama/Llama-3.2-1B` (cheap + good enough for demo)
- Objective: learn to map free-form scouting notes → structured JSON
- Method: supervised learning / prompt distillation on our 6 examples

In [27]:
import tinker
from tinker import types

service_client = tinker.ServiceClient()

# You can uncomment this to inspect available models:
# for item in service_client.get_server_capabilities().supported_models:
#     print("- " + item.model_name)

BASE_MODEL = "meta-llama/Llama-3.2-1B"

training_client = service_client.create_lora_training_client(
    base_model=BASE_MODEL,
    rank=32,  # low-rank size; bump if you ever want more capacity
)

print("Created TrainingClient with base model:", BASE_MODEL)
tokenizer = training_client.get_tokenizer()

Created TrainingClient with base model: meta-llama/Llama-3.2-1B


## 9. Build supervised learning data (tokens + weights)

We format each example as:

**Prompt (context, weight = 0):**

> You are an expert basketball scout...  
> Scouting notes:  
> \<free-form notes\>  
> Scouting JSON:

**Completion (target, weight = 1):**

> \<the JSON string we want the model to learn to emit\>

Then we convert to `types.Datum` objects for Tinker.
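
To make the weighting and the next-token shift concrete, here is a minimal sketch that uses made-up token IDs in place of real tokenizer output. The IDs are placeholders and no Tinker calls are involved; it only illustrates how the prompt/completion weights line up after shifting.

```python
# Toy token IDs stand in for real tokenizer output (hypothetical values).
prompt_tokens = [101, 7, 8, 9]      # context: contributes no loss
completion_tokens = [20, 21, 22]    # target JSON: contributes loss

tokens = prompt_tokens + completion_tokens
weights = [0] * len(prompt_tokens) + [1] * len(completion_tokens)

# Next-token prediction: position i of the input predicts token i + 1,
# so inputs drop the last token while targets and weights drop the first.
input_tokens = tokens[:-1]
target_tokens = tokens[1:]
shifted_weights = weights[1:]

for inp, tgt, w in zip(input_tokens, target_tokens, shifted_weights):
    print(f"input={inp:<4} target={tgt:<4} weight={w}")
```

Note that the last prompt token is the input position that predicts the first completion token, so that position already carries weight 1 after the shift.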

In [28]:
def build_prompt(example):
    """Template for our student model."""
    return (
        "You are an expert basketball scout. "
        "Given the free-form scouting notes below, return a structured JSON "
        "scouting report matching the requested schema.\n\n"
        "Scouting notes:\n"
        f"{example['input']}\n\n"
        "Scouting JSON:"
    )

def process_example_for_tinker(example, tokenizer) -> types.Datum:
    """
    Convert a single input/output pair into a Tinker Datum with:
      - tokens
      - target_tokens
      - per-token weights
    """
    prompt = build_prompt(example)
    # Prompt (context)
    prompt_tokens = tokenizer.encode(prompt, add_special_tokens=True)
    prompt_weights = [0] * len(prompt_tokens)

    # Completion (the JSON string)
    # Leading space + two newlines at end is a simple, stable pattern
    completion_text = " " + example["output"] + "\n\n"
    completion_tokens = tokenizer.encode(completion_text, add_special_tokens=False)
    completion_weights = [1] * len(completion_tokens)

    tokens = prompt_tokens + completion_tokens
    weights = prompt_weights + completion_weights

    # Shift for next-token prediction
    input_tokens = tokens[:-1]
    target_tokens = tokens[1:]
    weights = weights[1:]

    return types.Datum(
        model_input=types.ModelInput.from_ints(tokens=input_tokens),
        loss_fn_inputs=dict(
            weights=weights,
            target_tokens=target_tokens,
        ),
    )

processed_examples = [process_example_for_tinker(ex, tokenizer) for ex in tinker_dataset]
print(f"Processed examples: {len(processed_examples)}")

Processed examples: 6


In [29]:
datum0 = processed_examples[0]
print(f"{'Input':<20} {'Target':<20} {'Weight':<10}")
print("-" * 60)

for inp, tgt, w in zip(
    datum0.model_input.to_ints(),
    datum0.loss_fn_inputs["target_tokens"].tolist(),
    datum0.loss_fn_inputs["weights"].tolist(),
):
    print(f"{repr(tokenizer.decode([inp])):<20} {repr(tokenizer.decode([tgt])):<20} {w:<10}")

Input                Target               Weight    
------------------------------------------------------------
'<|begin_of_text|>'  'You'                0.0       
'You'                ' are'               0.0       
' are'               ' an'                0.0       
' an'                ' expert'            0.0       
' expert'            ' basketball'        0.0       
' basketball'        ' scout'             0.0       
' scout'             '.'                  0.0       
'.'                  ' Given'             0.0       
' Given'             ' the'               0.0       
' the'               ' free'              0.0       
' free'              '-form'              0.0       
'-form'              ' scouting'          0.0       
' scouting'          ' notes'             0.0       
' notes'             ' below'             0.0       
' below'             ','                  0.0       
','                  ' return'            0.0       
' return'            ' a'             

## 10. Train the student model (prompt distillation)

We'll:
1. Run a simple supervised loop on our 6 examples  
2. Track loss per token  
3. Keep it small (e.g., 50 steps) to stay cheap and fast  
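
As a small worked example of the "loss per token" metric, the sketch below averages the negative log-probabilities over weighted positions only. The log-probability values are invented for illustration, and `weighted_loss_per_token` is a hypothetical helper name, not part of Tinker.

```python
def weighted_loss_per_token(logprobs, weights):
    """Average negative log-probability over positions with non-zero weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("No weighted tokens; nothing to average.")
    return -sum(lp * w for lp, w in zip(logprobs, weights)) / total_weight

# Hypothetical per-token log-probabilities for one short sequence.
logprobs = [-3.0, -2.0, -0.5, -0.25, -0.25]
weights = [0, 0, 1, 1, 1]  # only the last three positions are completion tokens

loss = weighted_loss_per_token(logprobs, weights)
print(f"loss per token: {loss:.4f}")
```

Because the prompt positions have weight 0, their (often large) losses do not affect the reported number; only the JSON completion tokens do.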

In [31]:
import numpy as np

NUM_STEPS = 50         # you can start smaller (e.g., 20) if you want
LEARNING_RATE = 5e-4   # safe, modest LR for a tiny LoRA

for step in range(1, NUM_STEPS + 1):
    # 1) forward + backward pass using cross-entropy
    fwdbwd_future = training_client.forward_backward(
        processed_examples,
        loss_fn="cross_entropy",
    )

    # 2) optimizer step
    optim_future = training_client.optim_step(
        types.AdamParams(learning_rate=LEARNING_RATE)
    )

    # 3) wait for results
    fwdbwd_result = fwdbwd_future.result()
    optim_result = optim_future.result()

    # 4) compute weighted average loss per token (same as docs)
    logprobs = np.concatenate(
        [out["logprobs"].tolist() for out in fwdbwd_result.loss_fn_outputs]
    )
    weights = np.concatenate(
        [ex.loss_fn_inputs["weights"].tolist() for ex in processed_examples]
    )

    loss = -np.dot(logprobs, weights) / weights.sum()
    print(f"Step {step:03d}/{NUM_STEPS} - loss per token: {loss:.4f}")

Step 001/50 - loss per token: 2.1335
Step 002/50 - loss per token: 1.7779
Step 003/50 - loss per token: 1.3076
Step 004/50 - loss per token: 0.9218
Step 005/50 - loss per token: 0.6229
Step 006/50 - loss per token: 0.3534
Step 007/50 - loss per token: 0.1740
Step 008/50 - loss per token: 0.0745
Step 009/50 - loss per token: 0.0359
Step 010/50 - loss per token: 0.0193
Step 011/50 - loss per token: 0.0129
Step 012/50 - loss per token: 0.0143
Step 013/50 - loss per token: 0.0045
Step 014/50 - loss per token: 0.0033
Step 015/50 - loss per token: 0.0028
Step 016/50 - loss per token: 0.0029
Step 017/50 - loss per token: 0.0014
Step 018/50 - loss per token: 0.0012
Step 019/50 - loss per token: 0.0011
Step 020/50 - loss per token: 0.0010
Step 021/50 - loss per token: 0.0008
Step 022/50 - loss per token: 0.0007
Step 023/50 - loss per token: 0.0007
Step 024/50 - loss per token: 0.0005
Step 025/50 - loss per token: 0.0004
Step 026/50 - loss per token: 0.0003
Step 027/50 - loss per token: 0.0003
S

In [32]:
# Turn trained LoRA weights into a sampling client
sampling_client = training_client.save_weights_and_get_sampling_client(
    name="basketball-scouting-v1"
)

print("Sampling client ready.")

Sampling client ready.


In [37]:
from tinker import types as tinker_types
import json

def build_scouting_prompt_from_notes(notes: str) -> str:
    """Reuse the same prompt template we trained on."""
    return (
        "You are an expert basketball scout. "
        "Given the free-form scouting notes below, return a structured JSON "
        "scouting report matching the requested schema.\n\n"
        "Scouting notes:\n"
        f"{notes.strip()}\n\n"
        "Scouting JSON:"
    )

def extract_json_block(text: str) -> str:
    """
    Extract the FIRST JSON object from the text.

    We:
    - Find the first '{'
    - Track brace depth until it returns to 0
    - Return that substring as our JSON candidate
    """
    start = text.find("{")
    if start == -1:
        raise ValueError("No '{' found in text.")

    depth = 0
    end = None

    for i, ch in enumerate(text[start:], start=start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                end = i
                break

    if end is None:
        raise ValueError("Could not find matching '}' for JSON object.")

    return text[start : end + 1]

def run_scouting_inference(notes: str, client=sampling_client, show_raw: bool = True):
    """Send new scouting notes to the model and print/parse the JSON it returns."""
    prompt = build_scouting_prompt_from_notes(notes)
    model_input = tinker_types.ModelInput.from_ints(
        tokenizer.encode(prompt, add_special_tokens=True)
    )

    params = tinker_types.SamplingParams(
        max_tokens=512,
        temperature=0.0,
    )

    result = client.sample(
        prompt=model_input,
        sampling_params=params,
        num_samples=1,
    ).result()

    decoded = tokenizer.decode(result.sequences[0].tokens)

    if show_raw:
        print("=== Full decoded output ===")
        print(decoded)

    # Keep only the tail after "Scouting JSON:"
    parts = decoded.split("Scouting JSON:", 1)
    tail = parts[1] if len(parts) == 2 else decoded

    try:
        json_text = extract_json_block(tail)
    except ValueError:
        json_text = tail.strip()

    print("\n=== Extracted JSON candidate ===")
    print(json_text)

    # Try to parse JSON for sanity
    try:
        parsed = json.loads(json_text)
        print("\n✅ Parsed JSON keys:", list(parsed.keys()))
    except Exception as e:
        print("\n⚠️ Could not parse JSON:", e)

    return json_text
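
One limitation of the brace-counting approach is that it also counts braces that appear inside string values, so a `}` within a quoted field can end the extraction early. A possible alternative, sketched here under the assumption that the model output contains at most some leading or trailing text around one object, is to let the standard library's `json.JSONDecoder.raw_decode` find where the object ends. The function name `extract_first_json_object` is hypothetical.

```python
import json

def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object that parses, trying each '{' in turn."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it in the string.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    raise ValueError("No parseable JSON object found in text.")

sample = ' {"note": "uses a } inside a string", "ok": true}\n\nextra text'
print(extract_first_json_object(sample))
```

Unlike the substring helper above, this sketch returns the parsed dict directly, so the separate `json.loads` step would not be needed.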

In [38]:
test_notes = """
Long 6'6\" wing who can guard 2–4. Competes on the glass and has good positional size.
Comfortable hitting open corner threes and attacking closeouts in straight lines.
Will push the ball in transition but not a primary creator.
Still learning weak-side help rotations; can be late tagging the roller.
High-motor, vocal, coaches trust him with toughest perimeter assignments.
"""

_ = run_scouting_inference(test_notes)

=== Full decoded output ===
 {"team_name": "Team A", "birthdate": "1998-01-01", "team_type": "Wing", "archive_link": "https://archive.org/details/wing_team_A", "pros": "Low-level rotation wing with experience in 2-4 leagues. Competes on the boards and has good positional size.", "cons": "Comfortable hitting open corner threes and attacking closeouts in straight lines.", "analysis": "Will push the ball in transition but not a primary creator.", "projection": "Projects as a rotation wing at the high-major level with a path to professional opportunities if shooting consistency and decision-making continue trending positively."}

{"team_name": "Team A", "birthdate": "1998-01-01", "team_type": "Wing", "archive_link": "https://archive.org/details/wing_team_A", "pros": "Low-level rotation wing with experience in 2-4 leagues. Competes on the boards and has good positional size.", "cons": "Comfortable hitting open corner threes and attacking closeouts in straight lines.", "analysis": "Will push

In [39]:
# Sampling client for the *base* model (no LoRA)
base_sampling_client = service_client.create_sampling_client(
    base_model=BASE_MODEL  # same string you used for create_lora_training_client
)

print("Base sampling client ready:", BASE_MODEL)

Base sampling client ready: meta-llama/Llama-3.2-1B


In [40]:
def compare_base_vs_finetuned(notes: str):
    print("====== BASE MODEL (no training) ======")
    _ = run_scouting_inference(
        notes,
        client=base_sampling_client,
        show_raw=False,  # keep output shorter, we'll still see parse status
    )

    print("\n\n====== FINE-TUNED MODEL (basketball-scouting-v1) ======")
    _ = run_scouting_inference(
        notes,
        client=sampling_client,
        show_raw=False,
    )

In [41]:
compare_base_vs_finetuned(test_notes)


=== Extracted JSON candidate ===
{
    "suit" : "Small-for-Position",  
    "position" : "Shooting Guard",  
    "defenson" : false,
    "competency" :"2",
    "bones" : ["Locks", "Tickes", "Hooks", "Joints"],
    "length" : "6'6",
    "depth" : "a scoring  guard",
    "balance" : "Left side of court",
    "overall" : "long and even," 
    "activity" : "Hard to finish in transition," 
    "efficient" : "1.8", 
    "magnum" : "quite immobile.",
    "friction" : "Weight transfer is good," 
    "hoop play" : "will hit outside shots"," 
    "turnover" : "too anxioius,"
    "strength" : "creates contact",
    "athleticism" : "Can составе a rapid punch",
    "T-fall" : "routine",
    "agility" : "inexpensive locks and slips",
    " explosiveness" : "Top tier",
    "general" : "Average save",
    "leadership" :  "doesn't lookada leader",
    "thinking" : "has football talent",
    "eloquence" : "has dbl|drioscdr talent",
    "deep ball" : "makes the jump shot",
    "Authority" : "shyn

## 13. Wrap-Up: What We Built in This Notebook 🏀

In this notebook we went from **scratch** to a working, domain-tuned scouting model using **Tinker + LoRA**:

1. **Defined a structured scouting schema**  
   We decided what information matters for our use case (role, strengths, weaknesses, projection, etc.) and expressed it as a JSON schema.

2. **Created a few high-quality "gold" examples**  
   We hand-wrote scouting reports (free-form text + JSON).  
   These examples set the tone, vocabulary, and level of detail we want the model to learn.

3. **Used a teacher model to generate synthetic data**  
   With the OpenAI Responses API, we converted a small set of seed notes into additional
   input/output pairs and saved them to:

   `data/basketball_distillation/scouting_examples.jsonl`

4. **Converted the dataset into Tinker's supervised format**  
   Each example became an `{"input": ..., "output": ...}` pair and was wrapped as a
   `types.Datum` with:
   - `model_input` tokens  
   - `target_tokens`  
   - per-token `weights` to say which tokens should contribute to loss.

5. **Ran a LoRA training loop on a small open model**  
   Using `training_client.forward_backward(..., loss_fn="cross_entropy")` and
   `training_client.optim_step(...)`, we fine-tuned  
   `meta-llama/Llama-3.2-1B` on our scouting dataset and watched the loss per token drop.

6. **Saved the trained weights and created a sampler**  
   We called:

   ```python
   sampling_client = training_client.save_weights_and_get_sampling_client(
       name="basketball-scouting-v1"
   )
   ```

## 14. Optional: Mini Evaluation Harness – Base vs Fine-Tuned

This section is **optional / advanced** and is here to show how we might
evaluate our model in a quick, scrappy way.

We'll:

1. Define a small set of **test scouting blurbs** (different player types).
2. Build a helper that:
   - runs both the **base model** and the **fine-tuned model**,
   - tries to extract and parse a JSON object from each output.
3. Print **only the cases where at least one model produced valid JSON**, so we
   can compare before/after behavior without a lot of error noise.

Because our dataset and training run are tiny, we don't expect perfect results.
The goal here is to:

- See how the **base model** behaves "out of the box".
- See how the **fine-tuned model** behaves on the same inputs.
- Get a more honest sense of what we gained from prompt distillation.
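
Valid JSON is only part of the picture; it also helps to check whether the keys match the schema. The sketch below compares a parsed output against an expected key set. The key set here is an assumption, assembled from keys visible in this notebook's printed outputs, and should be replaced with the schema defined earlier in the notebook; `schema_report` is a hypothetical helper.

```python
# Assumed key set, assembled from keys visible in this notebook's outputs;
# replace it with the schema defined earlier in the notebook.
EXPECTED_KEYS = {
    "player_name", "team", "birthdate", "position", "year",
    "archetype", "offense_summary", "defense_summary",
    "intangibles", "projection",
}

def schema_report(parsed, expected_keys=EXPECTED_KEYS):
    """Compare a parsed model output against the expected top-level keys."""
    expected = set(expected_keys)
    if not isinstance(parsed, dict):
        return {"valid": False, "missing": sorted(expected), "unexpected": []}
    keys = set(parsed)
    return {
        "valid": keys == expected,
        "missing": sorted(expected - keys),
        "unexpected": sorted(keys - expected),
    }

# Hypothetical outputs: one close to the schema, one arbitrary JSON.
close = {k: "..." for k in EXPECTED_KEYS if k not in {"position", "year"}}
close["team_name"] = "..."
arbitrary = {"personality": {"focus": 1}}

print(schema_report(close))
print(schema_report(arbitrary))
```

A report like this separates "parses as JSON" from "matches the schema", which the parse-only check in the harness cannot do.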

In [44]:
import json

def infer_json_dict(notes: str, client) -> tuple[dict | None, str]:
    """
    Run a model on free-form scouting notes and try to return a parsed JSON dict.
    
    Returns:
      (parsed_dict_or_None, raw_json_text)
    
    This function is silent (no printing) so we can control output from
    the evaluation harness.
    """
    prompt = build_scouting_prompt_from_notes(notes)
    model_input = tinker_types.ModelInput.from_ints(
        tokenizer.encode(prompt, add_special_tokens=True)
    )

    params = tinker_types.SamplingParams(
        max_tokens=512,
        temperature=0.0,  # deterministic, easier to debug
    )

    result = client.sample(
        prompt=model_input,
        sampling_params=params,
        num_samples=1,
    ).result()

    decoded = tokenizer.decode(result.sequences[0].tokens)

    # Keep tail after "Scouting JSON:", if present
    parts = decoded.split("Scouting JSON:", 1)
    tail = parts[1] if len(parts) == 2 else decoded

    try:
        json_text = extract_json_block(tail)
    except ValueError:
        json_text = tail.strip()

    try:
        parsed = json.loads(json_text)
    except Exception:
        parsed = None

    return parsed, json_text


# ---- Evaluation cases ----

eval_cases = [
    {
        "name": "Switchy 6'4 on-ball guard",
        "notes": """
        6'4 guard who handles primary creation duties. Good feel in pick-and-roll,
        hits pocket passes, and can pull up from three off the dribble.
        Defense is inconsistent; tends to relax off-ball and can die on screens.
        Competitive in big moments, vocal with teammates.
        """,
    },
    {
        "name": "Small-ball 5 with motor",
        "notes": """
        Undersized big who plays mostly as a small-ball 5. High motor, sprints the floor,
        sets solid screens and dives hard. Finishes well on short rolls and dump-offs.
        Struggles with length at the rim and can foul when late in rotation.
        Great energy, bench celebrates his minutes.
        """,
    },
    {
        "name": "Spot-up 3-and-D wing",
        "notes": """
        6'7 wing who thrives as a spot-up shooter and secondary defender.
        Reliable from the corners, willing ball mover, rarely dribbles more than twice.
        Defensively takes the best opposing wing, good at staying in stance and contesting.
        Needs to add strength and improve handle to attack closeouts.
        """,
    },
]

# Optionally, seed a few eval cases directly from the training-style inputs.
# This increases the chance that the fine-tuned model will produce valid JSON,
# which is useful when you want a couple of clean comparison examples.

for i, ex in enumerate(tinker_dataset[:3]):
    eval_cases.append(
        {
            "name": f"Training-style example {i+1}",
            "notes": ex["input"],
        }
    )


def compare_base_vs_finetuned_successes(cases, max_examples: int | None = None):
    """
    For each test case:
      - run both the base model and the fine-tuned model
      - try to parse JSON from each
      - print ONLY the cases where at least one model produced valid JSON
    
    If max_examples is set, stop after printing that many successful comparisons.
    """
    print("===== COMPARISON: ONLY JSON SUCCESSES SHOWN =====")
    successes = 0

    for case in cases:
        base_parsed, _ = infer_json_dict(case["notes"], base_sampling_client)
        ft_parsed, _ = infer_json_dict(case["notes"], sampling_client)

        # Skip cases where neither model produced valid JSON
        if base_parsed is None and ft_parsed is None:
            continue

        successes += 1
        print(f"\n=== {case['name']} ===")

        if base_parsed is not None:
            print("\n[BASE MODEL JSON]")
            print(json.dumps(base_parsed, indent=2))

        if ft_parsed is not None:
            print("\n[FINE-TUNED MODEL JSON]")
            print(json.dumps(ft_parsed, indent=2))

        if max_examples is not None and successes >= max_examples:
            break

    if successes == 0:
        print("\n(no cases produced valid JSON – try adding more eval_cases or using training-style inputs)")
    else:
        print(f"\n(done – printed {successes} successful comparison(s))")
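
Because the harness above prints only the successes, it does not show how often each model fails to produce parseable JSON. A minimal sketch of a tally is given below; `tally_parse_rates` and the two stand-in runners are hypothetical and exist only so the example runs without a model.

```python
def tally_parse_rates(cases, runners):
    """
    Count how many cases each runner turns into a parsed dict.

    `runners` maps a label to a callable that takes the notes string and
    returns (parsed_dict_or_None, raw_text).
    """
    counts = {label: 0 for label in runners}
    for case in cases:
        for label, run in runners.items():
            parsed, _raw = run(case["notes"])
            if parsed is not None:
                counts[label] += 1
    total = len(cases)
    return {label: (hits, total) for label, hits in counts.items()}

# Stand-in runners so the sketch runs without a model (hypothetical behavior).
def fake_base(notes):
    return (None, "not json")

def fake_finetuned(notes):
    return ({"archetype": "wing"}, "{}") if "wing" in notes else (None, "")

demo_cases = [{"notes": "6'7 wing, spot-up shooter"}, {"notes": "undersized big"}]
rates = tally_parse_rates(demo_cases, {"base": fake_base, "fine-tuned": fake_finetuned})
for label, (hits, total) in rates.items():
    print(f"{label}: {hits}/{total} parsed")
```

With the notebook's own objects, the runners could be wrappers such as `lambda n: infer_json_dict(n, base_sampling_client)` and `lambda n: infer_json_dict(n, sampling_client)`.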

In [45]:
# Show up to 3 successful comparisons where at least one model produced valid JSON.
compare_base_vs_finetuned_successes(eval_cases, max_examples=3)

===== COMPARISON: ONLY JSON SUCCESSES SHOWN =====

=== Spot-up 3-and-D wing ===

[FINE-TUNED MODEL JSON]
{
  "team_name": "Team A",
  "player_name": "Wing",
  "birthdate": "1994-01-01",
  "team": "Team A",
  "archetype": "3&D wing",
  "offense_summary": "High-level wing who consistently attacks closeouts and Makes correct reads within the flow of the offense.",
  "defense_summary": "Competes defensively and takes the best opposing wing, good at staying in stance and contesting.",
  "intangibles": "Willing ball mover and rarely dribbles more than twice, changes pace of the flow of the offense with shooting.",
  "projection": "Projects as a rotation wing at the high-major level with a professional future if shooting consistency and defensive reliability continue trending positively."
}

=== Training-style example 1 ===

[BASE MODEL JSON]
{
  "personality": {
    "on-the-court": 1,
    "skills": {
      "awareness": 0.69
    },
    "seasoning": 1,
    "loyalty": 0.5,
    "focus": 1,
    "

## Final Comparison Summary (Base Model vs Fine-Tuned Model)

In this advanced section, we compared the **base model** and our **fine-tuned basketball-scouting model**. We only showed the examples where at least one model produced valid JSON.

Here's what the results show:

---

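### **1. Spot-up 3-and-D Wing**

* **Fine-tuned model** produced clean, structured scouting JSON.
* **Base model** did not produce valid JSON for this example.
* The fine-tuned output stayed on-topic and used schema fields such as `archetype`, `offense_summary`, and `projection`, although it filled `team`, `player_name`, and `birthdate` with placeholder values ("Team A", "Wing", "1994-01-01") that do not appear in the notes.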

---

### **2. Training-Style Example #1**

* **Base model** produced valid JSON, but the structure was random and unrelated to scouting.
* **Fine-tuned model** produced a detailed, accurate scouting report with the correct fields.
* This shows the difference between "generic JSON" and "task-aligned JSON."

---

### **3. Training-Style Example #2**

* **Base model** again returned JSON, but with inconsistent and irrelevant keys.
* **Fine-tuned model** produced a structured report that fit the player type perfectly (small-ball 5 / rim-running big).
* The fine-tuned model consistently used fields like `offense_summary`, `defense_summary`, `intangibles`, and `projection`.

---

## **What This Means**

* The **base model** sometimes outputs valid JSON, but the content is unpredictable and not usable for scouting.
* The **fine-tuned model** is trained on a tiny dataset, so it isn't perfect, but when it succeeds, the output is:

  * Structured correctly
  * Domain-specific
  * Immediately useful for analysis, apps, or downstream tools

This optional evaluation highlights the real win from our prompt-distillation process:

> **We aren't just getting JSON; we're getting the right JSON, in the right format, for the job we care about.**

This reinforces the value of fine-tuning even with small datasets, especially for domain-specific workflows like basketball scouting.