Phase 0: Establish canonical paper-faithful baselines and lock reference metrics

## Purpose
Create a reproducible paper-faithful baseline package that all downstream tickets must compare against.

## Mandatory Reading (blocking)
Before writing any code, the agent **must** read the following and summarize them in the first issue comment:
- `reports/NL_IMPLEMENTATION_ORACLE.md` (sections: 3, 4.1-4.7, 5, 6.1-6.4)
- `reports/paper/NL-print.extracted.clean.txt` (Eq. 21-24, 28-31 and Table 1 text)
- `docs/PAPER_COMPLIANCE.md`

The first comment must include:
1. A 5-10 bullet summary of the current implementation state.
2. The exact list of code paths used for baseline runs.
3. The confirmed gaps that this ticket does **not** change.

## Required Code Anchors
- `src/nested_learning/training.py`
- `src/nested_learning/model.py`
- `configs/pilot_paper_faithful.yaml`
- `configs/pilot_selfmod_paper_faithful.yaml`
- `scripts/eval/zeroshot.py`
- `scripts/eval/niah.py`
- `scripts/eval/continual.py`
- `scripts/eval/passkey.py`
- `scripts/eval/pg19_perplexity.py`

## Scope
- Define canonical baseline run commands for:
  - `hope_selfmod` paper-faithful
  - `hope_attention` comparison
- Produce a frozen baseline artifact bundle with checksums and eval outputs.
- Add a short baseline table under `reports/` with explicit command provenance.
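
The checksum step in the scope above could be sketched as follows. This is a minimal illustration, not the repository's tooling: the function names (`sha256_of`, `build_manifest`, `write_manifest`), the choice of SHA-256, and the JSON manifest layout are all assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path) -> dict[str, str]:
    """Map each file's bundle-relative POSIX path to its SHA-256 hex digest."""
    return {
        p.relative_to(bundle_dir).as_posix(): sha256_of(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file()
    }


def write_manifest(bundle_dir: Path, manifest_path: Path) -> None:
    """Write the manifest as sorted JSON (hypothetical layout).

    Keep manifest_path outside bundle_dir; otherwise a later run would
    hash the manifest itself and the bundle would no longer be frozen.
    """
    manifest = build_manifest(bundle_dir)
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

Sorting both the file walk and the JSON keys keeps the manifest byte-identical across runs on an unchanged bundle, so the manifest can itself be diffed or hashed in the report.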

## Runbook
- `uv run python train.py --config-name pilot_selfmod_paper_faithful train.steps=2000`
- `uv run python train.py --config-name pilot_paper_faithful model.block_variant=hope_attention train.steps=2000`
- Run the full eval suite (the `scripts/eval/` scripts listed under Required Code Anchors) against both resulting checkpoints.
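
One way to keep command provenance explicit is to hold the two training commands above as data and derive both the executed process and the report line from the same source. The sketch below assumes the first command corresponds to the `hope_selfmod` baseline and the second to `hope_attention`; the dictionary and helper names are hypothetical and not part of the repository.

```python
import shlex
import subprocess

# Argument lists copied from the runbook; the baseline names used as keys
# are an assumed mapping for this sketch.
BASELINE_COMMANDS = {
    "hope_selfmod": [
        "uv", "run", "python", "train.py",
        "--config-name", "pilot_selfmod_paper_faithful",
        "train.steps=2000",
    ],
    "hope_attention": [
        "uv", "run", "python", "train.py",
        "--config-name", "pilot_paper_faithful",
        "model.block_variant=hope_attention",
        "train.steps=2000",
    ],
}


def provenance_line(name: str) -> str:
    """Render the exact shell command for the report's provenance column."""
    return f"{name}: {shlex.join(BASELINE_COMMANDS[name])}"


def run_baseline(name: str) -> subprocess.CompletedProcess:
    """Run one baseline; check=True raises if training exits non-zero."""
    return subprocess.run(BASELINE_COMMANDS[name], check=True)
```

Because the report line is generated from the same argument list that is executed, the documented command cannot drift from the one that produced the checkpoint.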

## Deliverables
- Baseline artifact bundle under `artifacts/`.
- Eval JSON set under `eval/` for both baselines.
- Report file documenting the exact commands, hashes, and metrics.

## Acceptance Criteria
- No NaN/Inf in training logs.
- All eval outputs exist for both baselines.
- `scripts/checkpoint/verify.py` passes on all published checkpoints.
- First issue comment includes mandatory reading summary.
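
The first two criteria above lend themselves to an automated check. The following is a rough sketch only: it assumes training logs are plain text in which non-finite values appear as standalone `nan`/`inf` tokens, and the eval file names passed to `missing_outputs` are placeholders, since this ticket does not specify either format.

```python
import re
from pathlib import Path

# Matches nan / inf / infinity as standalone tokens (optionally signed),
# so words such as "INFO" or "inference" are not flagged.
NON_FINITE = re.compile(
    r"(?<![A-Za-z0-9_])[-+]?(?:nan|inf|infinity)(?![A-Za-z0-9_])",
    re.IGNORECASE,
)


def find_non_finite(lines):
    """Return (line_number, text) for every log line with a non-finite token."""
    return [
        (i, line.rstrip("\n"))
        for i, line in enumerate(lines, start=1)
        if NON_FINITE.search(line)
    ]


def missing_outputs(eval_dir: Path, expected_names):
    """Return the expected eval file names that are absent from eval_dir."""
    return [name for name in expected_names if not (eval_dir / name).is_file()]
```

For example, `find_non_finite(Path("train.log").read_text().splitlines())` should return an empty list for a passing run, where `train.log` stands in for whatever log path the training script actually writes.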

