# Foxtrot-Core · Interactive *Bitstream Learning* Tutorial

A comprehensive, hands-on walkthrough of the **BitLearn** workflow.

> **Setup (do this first)**
>
> **Required:** install **foxtrot-core with the _analysis_ extra**, plus a TensorFlow backend  
> • **CPU (portable, recommended default):**  
>   `pip install -U "foxtrot-core[analysis,cpu]"`
>
> • **GPU (Linux/Windows with CUDA):**  
>   `pip install -U "foxtrot-core[analysis,gpu]"`
>
> This brings in the data-science stack used by `bitlearn.*` (NumPy, pandas, scikit-learn, ipywidgets, ipympl, matplotlib, etc.) **and** installs TensorFlow (via `cpu` or `gpu`) for training and inference.  

The tutorial covers four practical steps:

1. **Configure** an experiment (titles, feature dims, labels)  
2. **Pre-process** raw datasets into `.npz`  
3. **Train** single episodes and parameter **sweeps**  
4. **Evaluate** models (single label, run, or whole sweep) and inspect detailed metrics



## How datasets are provided

The raw training datasets (≈ 22 GB total) live in a **submodule** backed by the public Hugging Face Hub. They are **not** downloaded on a plain `git clone`. Fetch them only when you need to train or evaluate BitLearn models.

You can obtain the data in **either** of two ways:

- **Via the `prjfoxtrot` repo submodule** (recommended for the VS Code extension workflow)
- **Directly from Hugging Face**, then copy the files into your workspace

**Expected location (relative to this notebook):** `data/raw/`



## 0 Environment & Paths

This cell defines common paths and prints your environment so you can quickly verify things are in place.


In [None]:
# │ codecell 0 │
from pathlib import Path
import sys

ROOT = Path.cwd()
RAW_DIR = ROOT / "data" / "raw"
PRE_DIR = ROOT / "data" / "preprocessed"
EXP_DIR = ROOT / "experiments"
for d in (RAW_DIR, PRE_DIR, EXP_DIR):
    d.mkdir(parents=True, exist_ok=True)

print("Python:", sys.version.split()[0])
try:
    import tensorflow as tf
    print("TensorFlow:", tf.__version__)
except Exception as e:
    print("TensorFlow not importable:", e)

print("ROOT:", ROOT)
print("RAW_DIR exists:", RAW_DIR.exists(), "→", RAW_DIR)
print("PRE_DIR exists:", PRE_DIR.exists(), "→", PRE_DIR)
print("EXP_DIR exists:", EXP_DIR.exists(), "→", EXP_DIR)
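
If you installed the `gpu` extra, it can help to confirm that TensorFlow actually sees a GPU before you start training. The following is a minimal sketch using `tf.config.list_physical_devices`; the helper name `visible_gpus` is illustrative and not part of foxtrot-core.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if none (or no TensorFlow)."""
    try:
        import tensorflow as tf
    except Exception:
        # TensorFlow missing or broken: treat this as "no GPU available".
        return []
    return [dev.name for dev in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none (CPU only)")
```

An empty list with the `gpu` extra installed usually points at a driver or CUDA mismatch rather than a problem in the notebook itself.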



## 1 Get the datasets

### Option A — Use the `prjfoxtrot` repo submodule (VS Code extension workflow)

From a shell, in the directory where you want the clone:

```bash
# 1) Clone code only (fast)
git clone https://github.com/prjfoxtrot/prjfoxtrot.git
cd prjfoxtrot

# 2) Bring the dataset submodule into your working tree (pointer files only)
git submodule update --init \
  default-workspace/projects/bitlearn/0-template/data/raw

# 3) Download the large blobs via Git LFS (~22 GB total)
cd default-workspace/projects/bitlearn/0-template/data/raw
git lfs pull
```

The VS Code extension copies `default-workspace/` into each new workspace you create. The raw data will then be available under your workspace at:

```
projects/bitlearn/0-template/data/raw/
```
Copy the four files into this tutorial’s `data/raw/` folder (relative to this notebook) so the UI panels can find them.
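
If `git lfs pull` was skipped or interrupted, the files in `data/raw/` may still be small Git LFS pointer files instead of the real data. A pointer file is a short text file whose first line starts with `version https://git-lfs.github.com/spec/v1`. The sketch below checks for that prefix; the helper names are hypothetical and not part of foxtrot-core.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file, per the LFS pointer format.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is still a Git LFS pointer rather than the real data."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_MAGIC)) == LFS_MAGIC


def report_lfs_pointers(raw_dir="data/raw"):
    """Return the names of files under raw_dir that are still LFS pointers."""
    raw_dir = Path(raw_dir)
    if not raw_dir.is_dir():
        return []
    return sorted(
        p.name for p in raw_dir.iterdir() if p.is_file() and is_lfs_pointer(p)
    )


pointers = report_lfs_pointers()
print("Still LFS pointers:", pointers or "none")
```

Any file listed here needs another `git lfs pull` in the submodule before you copy it again.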



### Option B — Download directly from Hugging Face

Dataset page: **`prjfoxtrot/prjfoxtrot-datasets`**

If you prefer Python, you can use `huggingface_hub` to fetch the files and place them under `data/raw/`:

> ⚠️ Internet access may be disabled in some environments. If this fails, use Option A or download the files manually and place them in `data/raw/`.


In [None]:
# │ codecell 1 │
from pathlib import Path
import shutil
from huggingface_hub import hf_hub_download

REPO = "prjfoxtrot/prjfoxtrot-datasets"
FILES = {
    "train": [
        "lut_dataset_SLICEM.json",
        "lut_dataset_SLICEL.json",
    ],
    "synthetic": [
        "synthetic_lut_dataset_SLICEM.7z",
        "synthetic_lut_dataset_SLICEL.7z",
    ],
}
FILES["all"] = FILES["train"] + FILES["synthetic"]

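def fetch_bitlearn_raw(which="train", dest="data/raw"):
    """Download one group ("train", "synthetic", "all") or a single file name into dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # An unknown key is treated as a single file name in the dataset repo.
    names = FILES.get(which, [which])
    for name in names:
        # hf_hub_download caches the file and returns the cached path;
        # interrupted downloads resume automatically in current releases.
        src = hf_hub_download(REPO, filename=name, repo_type="dataset")
        shutil.copy2(src, dest / name)
        print(dest / name)
    return dest

# fetch_bitlearn_raw("all")  # uncomment to download everything (~22 GB)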


### Verify the four raw files

You should see these under `data/raw/`:

- `synthetic_lut_dataset_SLICEM.7z`
- `lut_dataset_SLICEL.json`
- `lut_dataset_SLICEM.json`
- `synthetic_lut_dataset_SLICEL.7z`


In [None]:
# │ codecell 2 │
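from pathlib import Path

# The four raw files this tutorial expects under data/raw/.
EXPECTED = [
    "lut_dataset_SLICEL.json",
    "lut_dataset_SLICEM.json",
    "synthetic_lut_dataset_SLICEL.7z",
    "synthetic_lut_dataset_SLICEM.7z",
]
raw_dir = Path("data/raw")
for name in EXPECTED:
    p = raw_dir / name
    if p.is_file():
        print(f"{name:40s}  {p.stat().st_size / 1024 / 1024:8.1f} MB")
    else:
        print(f"{name:40s}  (missing)")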



## 2 Configuration

Use the **Config** panel to set project‑wide defaults:

- **Experiment title** (used in run/sweep folder names)
- **Feature dimension**
- **Labels** (name → number of bits)

Click **Save** to write a `config.yaml` next to this notebook.

> The UI writes only the `dataset` section; other keys are preserved.


In [None]:
# │ codecell 3 │
from foxtrot_core.bitlearn.ui.panels import launch_config_panel

launch_config_panel(
    experiment_title="ARTIX7_SLICEM",
    feature_dim=256,
    labels={"LC1": 64, "LC2": 64, "LC3": 64, "LC4": 64},
)
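
The note above says that saving replaces only the `dataset` section and leaves other keys alone. As a rough illustration of that behaviour (not the panel's actual code), the sketch below performs a section-level update on plain dictionaries. The key names inside `dataset` and the `training` section are assumptions made for the example; check your own `config.yaml` for the real layout.

```python
import copy


def update_dataset_section(config, dataset):
    """Return a copy of config with only the 'dataset' section replaced."""
    merged = copy.deepcopy(config)
    merged["dataset"] = dict(dataset)
    return merged


# Hypothetical existing config; all key names are illustrative only.
existing = {
    "dataset": {"experiment_title": "OLD", "feature_dim": 128},
    "training": {"epochs": 10},
}
updated = update_dataset_section(
    existing,
    {
        "experiment_title": "ARTIX7_SLICEM",
        "feature_dim": 256,
        "labels": {"LC1": 64, "LC2": 64, "LC3": 64, "LC4": 64},
    },
)
print(updated["training"])           # the untouched section survives
print(updated["dataset"]["labels"])  # the dataset section is replaced
```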



## 3 Pre‑processing

Transform raw files under `data/raw/` into compact `.npz` files under `data/preprocessed/`.

- **Tag** becomes the suffix in `preprocessed_<tag>.npz` (e.g., `train`, `test`).
- `sample_n=0` means **use all rows**.

### 3.a Training split


In [None]:
# │ codecell 4 │
from foxtrot_core.bitlearn.ui.panels import launch_preprocess_panel

launch_preprocess_panel(
    tag="train",
    raw_file="data/raw/lut_dataset_SLICEM.json",
    sample_n=3000,   # 0 → all rows
)



### 3.b Test split


In [None]:
# │ codecell 5 │
from foxtrot_core.bitlearn.ui.panels import launch_preprocess_panel

launch_preprocess_panel(
    tag="test",
    raw_file="data/raw/synthetic_lut_dataset_SLICEM.7z",
    sample_n=100000,   # 0 → all rows
)
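
Once a split has been written, you can check what ended up in the `.npz` file with plain NumPy. This sketch lists every stored array with its shape and dtype. It makes no assumption about the array names BitLearn uses, and `describe_npz` is a hypothetical helper, not part of foxtrot-core.

```python
from pathlib import Path

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    summary = {}
    # np.load defaults to allow_pickle=False, so object arrays would raise here.
    with np.load(path) as data:
        for name in data.files:
            arr = data[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


npz_path = Path("data/preprocessed/preprocessed_train.npz")
if npz_path.exists():
    for name, (shape, dtype) in describe_npz(npz_path).items():
        print(f"{name:20s} shape={shape} dtype={dtype}")
else:
    print("Not found yet:", npz_path)
```

With `sample_n=3000` in the training cell, you would expect the leading dimension of the arrays to reflect that row count, assuming rows are stored along the first axis.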



## 4 Training — episodes & sweeps

Open the **Train / Sweep** panel to launch single runs or Cartesian sweeps.

**Highlights**

- Dataset dropdown lists **all** `.npz` under `data/preprocessed/` (use **Refresh** to pick up new files)
- Enter comma‑separated values (e.g., `1e-3,5e-4`) or set‑literals (e.g., `{512,256}`) to sweep
- Per‑run snapshot: a full `config.yaml` is saved to each episode folder *before* training
- Folder naming:
  - Sweep root: `experiments/sweep_<experiment_title>_<STAMP>/`
  - Episode run: `experiments[/sweep_...]/run_<experiment_title>_<STAMP>_<TAG>/`

During training, the progress bar updates once per epoch, and log output wraps instead of producing horizontal scrollbars.
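
To get a feel for how many runs a sweep will launch, the sketch below expands comma-separated values and set-literals into their Cartesian product with `itertools.product`. The parsing rules and the field names (`learning_rate`, `units`, `batch_size`) are assumptions for illustration; the panel's own parser may differ.

```python
from itertools import product


def parse_sweep_field(text):
    """Split '1e-3,5e-4' or '{512,256}' into a list of stripped string tokens."""
    text = text.strip()
    if text.startswith("{") and text.endswith("}"):
        text = text[1:-1]
    return [tok.strip() for tok in text.split(",") if tok.strip()]


def expand_sweep(fields):
    """Return one dict per combination in the Cartesian product of all fields."""
    names = list(fields)
    value_lists = [parse_sweep_field(fields[n]) for n in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_sweep(
    {"learning_rate": "1e-3,5e-4", "units": "{512,256}", "batch_size": "64"}
)
for run in runs:
    print(run)
print(len(runs), "runs")  # 2 x 2 x 1 = 4
```

The run count is the product of the number of values per field, so adding a third learning rate here would raise it from 4 to 6.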


In [None]:
# │ codecell 6 │
from foxtrot_core.bitlearn.ui.panels import launch_train_panel

launch_train_panel()   # UI defaults come from config.yaml



## 5 Evaluation

Open the **Evaluate** panel to compute metrics on a test set.

**Scopes**

- **Label** — evaluate a single trained label model
- **Run** — evaluate **all labels** in an episode
- **Sweep** — evaluate **all runs × all labels**

**Outputs per label** (under `<label_dir>/test/`):

- `metrics.json` — `accuracy`, `precision`, `recall`, `f1`, `tp`, `tn`, `fp`, `fn`, plus label‑error rate
- `errors.csv` — per‑sample mispredictions with `y_true` vs `y_pred` and bit deltas
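
For reference, the four summary metrics follow from the confusion counts by the standard definitions: accuracy is `(tp + tn) / total`, precision is `tp / (tp + fp)`, recall is `tp / (tp + fn)`, and F1 is the harmonic mean of precision and recall. The sketch below applies them to made-up counts. Whether BitLearn counts per bit or per sample, and how it handles empty denominators, is not stated here, so treat the zero fallbacks as an assumption.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only, not results from any real run.
m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for key, value in m.items():
    print(f"{key:10s} {value:.4f}")
```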


In [None]:
# │ codecell 7 │
from foxtrot_core.bitlearn.ui.panels import launch_test_panel

launch_test_panel(
    test_npz="data/preprocessed/preprocessed_test.npz"
)



## 6 Programmatic API (optional)

You can script training and evaluation without the UI.

Below are minimal examples; see the source in `foxtrot_core.bitlearn.core.*` for full details.


In [None]:
# │ codecell 8 │
# Evaluate one trained label programmatically
from pathlib import Path
from foxtrot_core.bitlearn.core.evaluate import evaluate_episode

# Point to a label folder (contains model.keras) or its parent episode folder
# label_dir = Path("experiments/run_ARTIX7_SLICEM_.../LC1")
# result = evaluate_episode(label_dir, Path("data/preprocessed/preprocessed_test.npz"))
# print(result.metrics)


In [None]:
# │ codecell 9 │
# Train a single episode programmatically
# (UI does this for you; included here for completeness.)
from pathlib import Path
from foxtrot_core.bitlearn.core.config import TrainingConfig
from foxtrot_core.bitlearn.core.train import train_episode

# cfg = TrainingConfig(
#     dataset_npz=Path("data/preprocessed/preprocessed_train.npz"),
#     output_dir=Path("experiments/run_manual_example"),
#     epochs=10,
#     batch_size=64,
#     learning_rate=1e-3,
# )
# results = train_episode(cfg)
# for r in results:
#     print(r.label_name, r.metrics.get("val_accuracy", None))



## 7 Directory structure reference

After a run, you’ll see something like:

```
experiments/
└─ sweep_ARTIX7_SLICEM_20250810-123456/        # only if sweeping
   ├─ run_ARTIX7_SLICEM_20250810-123457_arch512x256_lr1e-3_bs64_ep50/
   │  ├─ LC1/
   │  │  ├─ model.keras
   │  │  ├─ history.npy
   │  │  ├─ history.png
   │  │  └─ test/
   │  │     ├─ metrics.json
   │  │     └─ errors.csv
   │  ├─ LC2/
   │  └─ ...
   └─ run_...
```
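
Given this layout, you can gather every `metrics.json` under `experiments/` into one list for comparison across runs and labels. The sketch below relies only on the `<run>/<label>/test/metrics.json` nesting shown above and on the `accuracy` key listed in the Evaluation section; `collect_metrics` is a hypothetical helper, not part of foxtrot-core.

```python
import json
from pathlib import Path


def collect_metrics(root="experiments"):
    """Return (run_name, label_name, metrics_dict) for every test/metrics.json under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    rows = []
    for path in sorted(root.rglob("test/metrics.json")):
        label_dir = path.parent.parent  # <run>/<label>/test/metrics.json
        with open(path, encoding="utf-8") as fh:
            rows.append((label_dir.parent.name, label_dir.name, json.load(fh)))
    return rows


for run, label, metrics in collect_metrics():
    print(f"{run}  {label}  accuracy={metrics.get('accuracy')}")
```

Because the search is recursive, the same call works for standalone runs directly under `experiments/` and for runs nested inside a sweep folder.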



## 8 Troubleshooting & Tips

- **No datasets listed** in the panels  
  → Ensure the four raw files are under `data/raw/`. For evaluation, ensure the test `.npz` file is under `data/preprocessed/`.

- **Dropdowns don’t show new files**  
  → Use the **Refresh** button (Train panel) or re‑run the panel cell.
