# Google Code Golf 2025 — ARC-AGI Minimal Solutions (Starter)

This notebook is a **clean starter** for the Kaggle competition:
**Google Code Golf Championship (NeurIPS 2025)**.  
Goal: for **each of the 400 ARC-AGI public tasks**, write a **Python 3 program** that
produces the correct transformation using the **fewest bytes**; each byte saved is worth one point.

**What you get here:**
- Minimal, competition-compliant scaffolding (no external imports beyond stdlib)
- Utilities to **load tasks**, **evaluate solutions**, and **count byte length**
- A **solution registry** so you can assign different tiny functions to different tasks
- An automated **submission packer** that writes `task001.py ... task400.py` into `submission.zip`
- A few tiny **baseline patterns** (identity, constant fill, color map) to get started quickly



## 1. Imports & Paths



In [None]:
# Competition-safe imports (stdlib only)
from pathlib import Path
import json, re, os, sys, itertools, functools, math, random, zipfile


In [None]:

CANDIDATE_INPUT_DIRS = [
    Path("/kaggle/input/google-code-golf-2025"),
    Path("/kaggle/working/google-code-golf-2025"),  # if you unzip there
]

DATA_DIR = None
for p in CANDIDATE_INPUT_DIRS:
    if p.exists():
        DATA_DIR = p
        break

print("DATA_DIR:", DATA_DIR)


## 2. Data Loader (ARC-AGI JSON)
Each task is a JSON file with `train` and `test` lists. We’ll use the **train** pairs to validate functions.

- Each grid is a 2D list of ints (colors 0–9).
- Your solver must have the signature `def p(g): ...` and return a **grid** equal to the expected output (whose shape may differ from the input's).
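
To make the layout concrete, here is a minimal sketch using a made-up toy task (not one of the competition tasks) in the same `train`/`test` structure. The task content and the mirror rule are assumptions chosen only for illustration.

```python
import json

# Hypothetical toy task in the ARC JSON layout (not a real competition task):
# every output is the input mirrored left-to-right.
toy_task = json.loads("""
{
  "train": [
    {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    {"input": [[3, 4, 5]],      "output": [[5, 4, 3]]}
  ],
  "test": [
    {"input": [[0, 7]], "output": [[7, 0]]}
  ]
}
""")

def p(g):
    # Reverse each row; a new grid is built rather than mutating the input.
    return [row[::-1] for row in g]

# A solver passes when it reproduces every train output exactly.
all_ok = all(p(ex["input"]) == ex["output"] for ex in toy_task["train"])
print(all_ok)
```

Grids are compared with plain `==`, so the returned value has to be a list of lists of ints that matches the expected output element by element.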


In [None]:
def load_all_tasks(base: Path):
    """
    Load all task JSONs under `base`. Returns a dict: {task_id: {'train': [...], 'test': [...]}}.
    """
    tasks = {}
    if base is None or not base.exists():
        return tasks
    # rglob is recursive, so subfolders such as tasks/ or json/ are covered too.
    for fp in sorted(base.rglob("*.json")):
        try:
            obj = json.loads(fp.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skip it
        # Keep only files that look like ARC tasks.
        if isinstance(obj, dict) and "train" in obj:
            tasks[fp.stem] = obj
    return tasks

TASKS = load_all_tasks(DATA_DIR)
print("Loaded tasks:", len(TASKS))
if TASKS:
    sample_id = sorted(TASKS)[0]
    print("Example task id:", sample_id)


## 3. Evaluator & Byte-Count
We evaluate a candidate `p(g)` against all train pairs.  
We also count **source bytes** (UTF‑8) for leaderboard scoring: `max(1, 2500 - bytes)`.
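
As a worked example of the formula above, the sketch below scores a source string directly. The helper name `score_for_source` is hypothetical and only used here for illustration.

```python
def score_for_source(src: str) -> int:
    # Bytes, not characters: a non-ASCII character costs more than one byte.
    nbytes = len(src.encode("utf-8"))
    return max(1, 2500 - nbytes)

identity_src = "def p(g):return g"
print(len(identity_src.encode("utf-8")))  # 17
print(score_for_source(identity_src))     # 2483

# The floor of 1 applies once the source reaches 2499 bytes or more.
print(score_for_source("#" * 3000))       # 1
```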


In [None]:
def grids_equal(a, b):
    return a == b

def run_fn_on_grid(fn, g):
    return fn([row[:] for row in g])  # defensive copy

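def evaluate_on_task(fn, task):
    for ex in task.get("train", []):
        try:
            pred = run_fn_on_grid(fn, ex["input"])
        except Exception:
            return False  # a crash counts as a failed pair
        if not grids_equal(pred, ex["output"]):
            return False
    return True

def byte_len_of_function(fn):
    # Counts the function as written in the notebook (original name, comments,
    # indentation), so it can differ from the size of the file that is submitted.
    import inspect
    src = inspect.getsource(fn)
    return len(src.encode("utf-8"))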

def score_for_bytes(nbytes):
    return max(1, 2500 - nbytes)


## 4. Tiny Baseline Patterns
These are **starting points only**. Shorten aggressively.

> Tips for code golf on ARC:
> - Prefer single-letter names; avoid spaces/newlines where legal (but readable here for clarity).
> - Collapse loops and conditions; use truthy math and tuple-indexing tricks.
> - Avoid imports. Reuse Python builtins cleverly.
> - Consider **task-specific minimal solvers** rather than a universal one.


In [None]:
# Identity (returns grid unchanged) — tiny but rarely correct.
def id0(g):
    return g

# Constant fill with the most frequent color in the input.
def fill_mode(g):
    from collections import Counter
    flat = sum(g, [])
    m = Counter(flat).most_common(1)[0][0]
    return [[m]*len(g[0]) for _ in g]

# Remap one color to another if it appears (simple color-map example).
def map51(g):
    # replace color 5 with 1
    return [[(1 if v==5 else v) for v in row] for row in g]
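
The golfing tips above can be illustrated on the color-map baseline. The sketch below keeps two golfed candidates as source strings, one using tuple indexing and one using boolean arithmetic, and checks that each behaves like the readable version on a sample grid. The sample grid and the candidate strings are illustrative assumptions, not known-optimal solutions.

```python
# Readable baseline: replace color 5 with 1.
def map51(g):
    return [[(1 if v == 5 else v) for v in row] for row in g]

# Golfed candidates kept as strings so their byte length can be measured.
candidates = {
    "tuple index": "def p(g):return[[(v,1)[v==5]for v in r]for r in g]",
    "arithmetic": "def p(g):return[[v-4*(v==5)for v in r]for r in g]",
}

sample = [[5, 0, 5], [1, 5, 9]]
results = {}
for name, src in candidates.items():
    ns = {}
    exec(src, ns)  # defines p inside the scratch namespace ns
    results[name] = ns["p"](sample) == map51(sample)
    print(name, len(src.encode("utf-8")), results[name])
```

Both tricks rely on `v==5` evaluating to `True` or `False`, which Python treats as `1` or `0` when indexing or multiplying.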


## 5. Solution Registry
Map **task IDs → functions**.  
Start with defaults, then override specific tasks as you discover short solutions.


In [None]:
# Global default (fall back)
DEFAULT_FN = id0

# Override examples; replace 'task001' with real ids (e.g., hashes) in the dataset.
SOLUTION_REGISTRY = {
    # "task001": fill_mode,
    # "task002": map51,
}

def get_fn_for_task(tid):
    return SOLUTION_REGISTRY.get(tid, DEFAULT_FN)


## 6. Quick Validation on a Few Tasks
Pick some IDs and check if your current functions pass training pairs.  
(They likely won’t; iterate to improve!)


In [None]:
def validate_some(n=5):
    tids = list(sorted(TASKS))[:n]
    rows = []
    for tid in tids:
        fn = get_fn_for_task(tid)
        ok = evaluate_on_task(fn, TASKS[tid])
        rows.append((tid, ok, byte_len_of_function(fn)))
    return rows

try:
    rows = validate_some(5)
    print(rows if rows else "Validation skipped (no JSONs found yet).")
except Exception as e:
    print("Validation failed with error:", e)
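
When a function fails, a plain pass/fail flag says little about why. As a possible debugging aid, the sketch below returns the first failing train pair together with the expected and actual grids. The helper `first_failure` and the toy task are hypothetical names introduced for this example.

```python
def first_failure(fn, task):
    """Return (index, expected, got) for the first failing train pair, else None."""
    for i, ex in enumerate(task.get("train", [])):
        try:
            got = fn([row[:] for row in ex["input"]])  # defensive copy
        except Exception as e:
            got = repr(e)  # report the crash instead of raising
        if got != ex["output"]:
            return i, ex["output"], got
    return None

# Toy task: the identity function passes pair 0 and fails pair 1.
toy = {"train": [{"input": [[1]], "output": [[1]]},
                 {"input": [[5]], "output": [[1]]}]}
print(first_failure(lambda g: g, toy))  # (1, [[1]], [[5]])
```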


## 7. Submission Writer
Creates `submission.zip` with files `task001.py ... task400.py`, numbered by the sorted order of the loaded task ids.

**Rules to follow** (the writer itself does not check them):
- Each file defines exactly one function: `def p(g): ...`
- Only stdlib used, no cross-file imports
- The file content is **your tiny solution** for that task
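
One way to check the first rule locally is to parse each file's source before zipping it. The sketch below is an assumption-laden sanity check, not the official validator: `check_solution_source` is a hypothetical helper that only inspects top-level `def` statements, following the rule as stated in this notebook.

```python
import ast

def check_solution_source(src: str):
    """Return a list of problems found in one task file's source (empty if none)."""
    try:
        tree = ast.parse(src)
    except SyntaxError as e:
        return [f"syntax error: {e.msg}"]
    problems = []
    # Collect top-level function definitions only.
    funcs = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    if [f.name for f in funcs] != ["p"]:
        problems.append("expected exactly one top-level function named p")
    elif len(funcs[0].args.args) != 1:
        problems.append("p should take exactly one argument")
    return problems

print(check_solution_source("def p(g):\n return g\n"))  # []
print(check_solution_source("def q(g):return g"))
```

A solution written as `p=lambda g:...` would be flagged by this check, so adjust it if the competition accepts that form.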


In [None]:
def to_minified_source(fn):
    """
    Convert a Python function object to a single-file source defining `p(g)`.
    This only renames the function; it does not shorten the body, so
    hand-golfing will get much smaller files.
    """
    import inspect, textwrap, re
    src = textwrap.dedent(inspect.getsource(fn))
    # Rename only the first `def name(g):` header; leave the body untouched.
    src = re.sub(r"def\s+\w+\(g\):", "def p(g):", src, count=1)
    return src

def write_submission_zip(tasks: dict, out_zip="submission.zip"):
    out_path = Path(out_zip)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        for i, tid in enumerate(sorted(tasks)):
            fn = get_fn_for_task(tid)
            code = to_minified_source(fn)
            fname = f"task{(i+1):03d}.py"
            z.writestr(fname, code)
    print("Wrote:", out_path, "size=", out_path.stat().st_size, "bytes")

def write_blank_400(out_zip="submission.zip"):
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as z:
        for i in range(1,401):
            z.writestr(f"task{i:03d}.py", "def p(g):\n return g\n")
    print("Wrote:", out_zip, "size=", Path(out_zip).stat().st_size, "bytes")


## 8. Build Submission
- If you have the dataset loaded in `TASKS`, call `write_submission_zip(TASKS)`.
- If not, create a **blank 400-file scaffold** and edit it later.


In [None]:
# Try to build from loaded tasks; otherwise write a blank scaffold
if TASKS:
    write_submission_zip(TASKS, "submission.zip")
else:
    write_blank_400("submission.zip")


## 9. Next Steps — How to Improve
- Replace baselines with **task-specific** minimal functions.
- Use a **notebook cell per cluster of tasks** (e.g., symmetry, border, color fill).
- Once a function works for a task’s train pairs, **golf the bytes**:
  - Shorten names, remove spaces/semicolons judiciously.
  - Replace `if/else` with arithmetic or tuple indexing.
  - Inline constants and compress loops/list comps.
- Keep a tiny **fuzzer** for robustness (random shapes/colors).
- Rebuild `submission.zip` whenever you update the registry.
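
The fuzzer mentioned above could look roughly like the sketch below. It feeds random grids to a solver and reports the first input that makes it crash or return something that is not a rectangular grid of colors 0–9. The helper names, the maximum side length of 6, and the trial count are all assumptions for illustration.

```python
import random

def random_grid(rng, max_side=6):
    # Random height and width, random colors 0-9.
    h, w = rng.randint(1, max_side), rng.randint(1, max_side)
    return [[rng.randint(0, 9) for _ in range(w)] for _ in range(h)]

def is_valid_grid(out):
    # Non-empty rectangular list of lists holding ints in 0..9.
    return (isinstance(out, list) and len(out) > 0
            and all(isinstance(r, list) and len(r) == len(out[0]) > 0 for r in out)
            and all(isinstance(v, int) and 0 <= v <= 9 for r in out for v in r))

def fuzz(fn, trials=200, seed=0):
    """Return (grid, reason) for the first problem found, or None if all trials pass."""
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(trials):
        g = random_grid(rng)
        try:
            out = fn([row[:] for row in g])
        except Exception as e:
            return g, repr(e)
        if not is_valid_grid(out):
            return g, "malformed output"
    return None

print(fuzz(lambda g: g))  # None
```

This only checks that a solver is well-behaved on arbitrary input; correctness still has to be verified against the task's own pairs.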
