# Sanitized reproduction (no Kaggle account)
This notebook is a clean, self‑contained reproduction that **does not require Kaggle credentials**. It downloads the official ARC datasets from GitHub, constructs a training/eval dataset that excludes eval solutions, trains the model from scratch, and runs inference on eval inputs only.

## Why this proves there is no leakage
- The dataset is built from ARC‑1 training pairs (inputs + outputs), ARC‑1 eval **inputs only**, and optionally ConceptARC. Eval solutions are never constructed.
- After building `assets/challenges_dihedral_both.json`, every file (other than core model files) is deleted, so there’s nowhere for solutions to hide.
- Training starts from scratch (`checkpoint_path=None`) and uses only the cleaned dataset file.
- Inference runs with `SOLUTIONS_PRESENT=False` and produces `submission.json` without reading any solutions.
- If you want to be extra strict, delete the cells that follow submission generation (solution download, scoring, and comparison) and score the submission yourself.

FYI:
- The core files make no network calls, use no extra libraries (they are written in pure PyTorch), and load no checkpoints.
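
One way to spot-check the claim about libraries is to list the top-level modules that each core file imports. The sketch below uses only the standard library. The `src` directory name is taken from the setup cell later in this notebook, while `imported_modules` and `audit_directory` are hypothetical helpers that are not part of the repository. It only sees static `import` statements, so treat it as a first pass rather than a full audit.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return modules


def audit_directory(src_dir: str = "src") -> dict[str, set[str]]:
    """Map each .py file under src_dir (assumed location) to the modules it imports."""
    return {
        str(path): imported_modules(path.read_text(encoding="utf-8"))
        for path in sorted(Path(src_dir).rglob("*.py"))
    }
```

Calling `audit_directory()` from the repository root after the clone should make any networking module (for example `requests`, `urllib`, or `socket`) easy to spot in the result.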

## What to change
- Pick a runconfig (with or without ConceptARC) in the config cell below.
- Reduce `EVAL_BATCH_SIZE` if your GPU has less memory than an A100.
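
As a rough starting point for a smaller GPU, the eval batch size can be scaled down in proportion to available memory. This is a heuristic sketch, not something the repository provides. The helper name is hypothetical, and the 40 GiB reference is an assumption, since this notebook does not say which A100 variant was used. The reference batch size of 1300 is the `EVAL_BATCH_SIZE` set in the inference cell.

```python
def scaled_eval_batch_size(
    gpu_mem_gib: float,
    reference_mem_gib: float = 40.0,   # assumed A100 memory, adjust if yours differs
    reference_batch_size: int = 1300,  # EVAL_BATCH_SIZE used in this notebook
) -> int:
    """Scale the eval batch size linearly with GPU memory (heuristic only)."""
    if gpu_mem_gib <= 0 or reference_mem_gib <= 0:
        raise ValueError("memory sizes must be positive")
    scaled = int(reference_batch_size * gpu_mem_gib / reference_mem_gib)
    # Never exceed the reference value and never go below one sample
    return max(1, min(reference_batch_size, scaled))
```

On a CUDA machine, `torch.cuda.get_device_properties(0).total_memory / 2**30` gives the memory in GiB. Linear scaling tends to be optimistic because model weights take a fixed share of memory, so lower the value further if inference still runs out of memory.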


Steps:
- Upload this notebook to Colab or Modal.
    - There is no need to mount your Google Drive or Modal volume.
- Decide which experiment you want to reproduce (list below) and modify the config accordingly.
- Choose an A100 GPU.
- Hit "Run all".

The original run used an extra dataset called ConceptARC. This dataset is clean, but to reduce the burden of verification I added configs that do not use it (performance drops slightly).

Configs (approximate expected score on the eval set):
- Run with ConceptARC for 11 epochs (10 color augments) - about 8-9%
- Run with ConceptARC for 101 epochs (100 color augments) - 25-28%
- Run without ConceptARC for 11 epochs (10 color augments) - about 7%
- Run without ConceptARC for 101 epochs (100 color augments) - about 23%
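
Each config is a pair of a dataset choice and an epoch count. In the training cell below, the epoch count is used directly as `epochs`, and the number of color augments is one less than that. A minimal sketch of this mapping, using a hypothetical helper name and a hypothetical `use_conceptarc` key:

```python
def training_settings(runconfig):
    """Translate a runconfig pair into the dataset flag, epochs, and color augments."""
    dataset_choice, epochs = runconfig
    if dataset_choice not in ("concept", "no_concept"):
        raise ValueError(f"unknown dataset choice: {dataset_choice!r}")
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return {
        "use_conceptarc": dataset_choice == "concept",  # hypothetical key name
        "epochs": epochs,
        "max_color_augments_train": epochs - 1,
    }
```

For example, `training_settings(["concept", 11])` yields 11 epochs with 10 color augments, matching the first entry in the list above.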

In [None]:
# Choose runconfig
runconfig = ["concept", 11]  # expect 8-9%
# runconfig = ["concept", 21]  # expect 16%
# runconfig = ["concept", 101] # expect 25-28%

# These runconfigs lower the verification burden by dropping the extra dataset; perf drops slightly
# runconfig = ["no_concept", 11] # expect 7%
# runconfig = ["no_concept", 21] # expect 14%
# runconfig = ["no_concept", 101] # expect 23%

In [None]:
root_folder = "root"
# root_folder = "content" # for colab

%cd /$root_folder/
!git clone https://github.com/mvakde/mdlARC.git # add `-b <branch_name> --single-branch` to clone a specific branch
%cd /$root_folder/mdlARC

In [None]:
if runconfig[0] == "concept":
    !python dataset_building_scripts/build_datasets.py --datasets arc1 conceptarc  --splits train eval --cleanup none
else:
    !python dataset_building_scripts/build_datasets.py --datasets arc1  --splits train eval --cleanup none
!python dataset_building_scripts/augment_dataset_dihedral.py

# Delete everything except the cleaned dataset and the core model files, so no solutions remain on disk
!find "assets" -mindepth 1 ! -path "assets/challenges_dihedral_both.json" -exec rm -rf -- {} +
!rm -rf /$root_folder/mdlARC/run-script.ipynb
!rm -rf /$root_folder/mdlARC/sanitised-env-run-script.ipynb
!rm -rf /$root_folder/mdlARC/dataset_building_scripts
!rm -rf /$root_folder/mdlARC/readme.md
!rm -rf /$root_folder/mdlARC/img

## Data is now “solution‑free”
At this point, the only dataset file we keep is `assets/challenges_dihedral_both.json`.
- It contains **train inputs/outputs** and **eval inputs only**.
- All other files and folders (including anything that could contain solutions) are deleted.
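
A quick way to confirm the cleanup is to list whatever is still on disk under `assets`. The helper below is a hypothetical sketch using only the standard library:

```python
from pathlib import Path


def remaining_files(assets_dir: str = "assets") -> list[str]:
    """List every file left under assets_dir, relative to it, in sorted order."""
    root = Path(assets_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

If the cleanup cell ran as intended, `remaining_files()` should return only `["challenges_dihedral_both.json"]`.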

You can inspect this file before continuing if you want to verify it manually.
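
A manual inspection could start from something like the sketch below. It assumes the file follows the usual ARC challenge layout, that is, a dictionary mapping each task id to `"train"` and `"test"` lists whose entries are dictionaries with an `"input"` grid and, when a solution is included, an `"output"` grid. That layout is an assumption and the actual file may differ, so check a few entries by hand first. Both helper names are hypothetical.

```python
import json
from pathlib import Path


def load_challenges(path: str = "assets/challenges_dihedral_both.json") -> dict:
    """Load the dataset file as a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def tasks_with_test_outputs(challenges: dict) -> list[str]:
    """Return ids of tasks whose "test" entries include an "output" grid.

    Assumes the standard ARC layout described above.
    """
    flagged = []
    for task_id, task in challenges.items():
        test_pairs = task.get("test", []) if isinstance(task, dict) else []
        if any(isinstance(pair, dict) and "output" in pair for pair in test_pairs):
            flagged.append(task_id)
    return sorted(flagged)
```

Under that assumption, any id returned by `tasks_with_test_outputs(load_challenges())` should belong to the training data and never to an ARC‑1 eval task.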


In [None]:
from pathlib import Path
import argparse
import importlib
import sys

PROJECT_ROOT = Path.cwd()
SRC_DIR = PROJECT_ROOT / "src"
if SRC_DIR.exists() and str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

import utils, tinytransformer, train

importlib.reload(utils)  # pick up code changes during iteration
importlib.reload(tinytransformer)
importlib.reload(train)

args = {
    # run config
    "num_workers": 0,
    "device": "cuda",  # 'cuda' | 'mps' | 'cpu'
    "do_validate": False,
    "name": "arc1-cleanenv-30M-vvwide-bs32-101ep-100color-ccdb-18dec0430",  # download file name
    "GPU": "A100-noaugreg",  # just for logging purposes
    # paths - must pass as Path("<path_to_dir>")
    "train_log_file": Path("runs/training_log.txt"),
    "save_path": Path("runs/tiny.pt"),
    "checkpoint_path": None,  # Path("runs/tiny.pt"),  # or None to start from scratch
    "data_path": Path("assets/challenges_dihedral_both.json"),
    # hyperparameters
    "epochs": runconfig[1],
    "batch_size": 32,
    "val_batch_size": 300,
    "enable_color_aug_train": True,
    "max_color_augments_train": (runconfig[1] - 1),
    "color_aug_seed": 42,
    "lr": 3e-4,
    "weight_decay": 0.01,
    "grad_clip": 1.0,
    "dropout": 0.1,
    "seed": 42,
    # Model Architecture
    "d_model": 768,  # 128, 256, 512, 768 | 128, 384, 640
    "n_heads": 12,  # 4, 8, 8/16, 12 | 4, 12, 10
    "d_ff": 3072,  # 512, 1024, 2048, 3072 | 512, 1536, 2560
    "n_layers": 4,  # 4, 6, 16, 16 | 24, 28, 24
    # Visibility toggles
    "log_train_strings": False,
    "log_train_limit": 10,
    "log_inference_prompt": False,
    "inference_temperature": None,
    "inference_top_k": None,
}
cfg = argparse.Namespace(**args)

runs_dir = Path("runs")
runs_dir.mkdir(parents=True, exist_ok=True)
with (runs_dir / "config.txt").open("w") as f:
    for k, v in args.items():
        f.write(f"{k}: {v}\n")

model, dataset, dataloader, device, data_path = train.build_model_and_data(cfg)

In [None]:
# Training only

from time import perf_counter

t_start = perf_counter()

# ---
# direct
train.train_model(
    cfg,
    model=model,
    dataloader=dataloader,
    dataset=dataset,
    device=device,
    data_path=data_path,
)


# # periodic checkpointing
# cfg.save_path = Path(f"runs/tiny-{cfg.epochs}.pt")
# for i in range(3):
#   if i != 0:
#     cfg.checkpoint_path = cfg.save_path
#     cfg.save_path = Path(f"runs/tiny-{cfg.epochs*(i+1)}.pt")
#   train.train_model(cfg, model=model, dataloader=dataloader, dataset=dataset, device=device, data_path=data_path)
# ---

t_duration = perf_counter() - t_start
print(f"Training took {t_duration:.2f}s")

with open(Path("runs/timing.txt"), "w") as f:
    f.write(f"Training: {t_duration:.4f} s\n")

In [None]:
# cleaning up memory to run inference
utils.cleanup_memory(globals())


## Submission is generated without solutions
The next cells run inference and generate `runs/<run_name>/submission.json` with **no access to eval solutions**.  
Everything stays solution‑free unless you choose to add your own scoring later.

In [None]:
from pathlib import Path
import importlib
import evaluations
import utils

importlib.reload(evaluations)
importlib.reload(utils)

PATH_BOTH = Path("assets/challenges_dihedral_both.json")

EVAL_CONFIGS = [("eval", runconfig[1] - 1, PATH_BOTH)]

EVAL_BATCH_SIZE = 1300
SPLITS = ["test"]
CHECKPOINT_PATH = Path("runs/tiny.pt")
SOLUTIONS_PRESENT = False
EVAL_TASK_IDS = None  # Set to None to evaluate full dataset, or ["00576224", ...] for specific tasks
LOG_CORRECT_GRIDS = False  # Print the actual grid, IDs, and augmentation indices for fully correct grids

evaluations.run_evaluation_configs(
    cfg,
    EVAL_CONFIGS,
    eval_batch_size=EVAL_BATCH_SIZE,
    splits=SPLITS,
    checkpoint_path=CHECKPOINT_PATH,
    include_targets=SOLUTIONS_PRESENT,
    task_ids=EVAL_TASK_IDS,
    log_correct_grids=LOG_CORRECT_GRIDS,
)


In [None]:
# visualisation
EVAL_SUB_FOLDER = EVAL_CONFIGS[0][0]
VIS_MODE = "submission"  # "compare" = compare vs solutions, "submission" = attempts-only
utils.visualize_eval_submissions(EVAL_SUB_FOLDER, mode=VIS_MODE)


## Stop here for a strict no‑solutions run
At this point, the model has already produced `runs/<run_name>/submission.json` without any access to solutions. **This means the run is clean!**

**The remaining cells download the solutions, score the submission, and visualise differences with the ground truth.**

If you want a strict no‑solutions audit, stop here and score the submission yourself.
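
For independent scoring, a minimal sketch is shown below. It assumes the ARC Prize style layout: the submission maps each task id to a list of `{"attempt_1": grid, "attempt_2": grid}` entries, one per test input, and the solutions file maps each task id to a list of expected output grids. Both layouts are assumptions. The function is a hypothetical stand-in and may not match what `utils.score_arc_submission` computes.

```python
def score_submission(solutions: dict, submission: dict) -> float:
    """Average per-task accuracy; a test output counts as solved if any attempt matches exactly."""
    if not solutions:
        return 0.0
    total = 0.0
    for task_id, expected_outputs in solutions.items():
        if not expected_outputs:
            continue  # a task without expected outputs contributes nothing
        attempts = submission.get(task_id, [])
        solved = 0
        for index, expected in enumerate(expected_outputs):
            tried = attempts[index] if index < len(attempts) else {}
            # Grids are nested lists, so == performs an exact cell-by-cell comparison
            if expected in (tried.get("attempt_1"), tried.get("attempt_2")):
                solved += 1
        total += solved / len(expected_outputs)
    return total / len(solutions)
```

Load both files with `json.load` and pass the resulting dictionaries in. Missing tasks or missing attempts simply count as unsolved.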

In [None]:
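%cd /$root_folder/mdlARC
!rm -rf /$root_folder/mdlARC/assets/
# dataset_building_scripts was deleted in the cleanup cell; restore it from the clone before rebuilding
!git checkout -- dataset_building_scripts
!python dataset_building_scripts/build_datasets.py --datasets arc1  --splits eval --cleanup none --with-solutions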

In [None]:
from pathlib import Path
import utils

SOLUTIONS_FILE = Path("assets/solutions.json")
SUBMISSION_FILE = Path(f"runs/{EVAL_SUB_FOLDER}/submission.json")

utils.score_arc_submission(SOLUTIONS_FILE, SUBMISSION_FILE)

In [None]:
# Visualise and compare the model's attempts against the ground truth solutions
EVAL_SUB_FOLDER = EVAL_CONFIGS[0][0]
utils.visualize_eval_submissions(
    EVAL_SUB_FOLDER,
    submission_base="mdlARC/runs",
    solutions_file="assets/solutions.json",
    mode="compare",
)