Hassana Labs — Leon Chlon (lc574@cantab.ac.uk)
LoRA-free forward-pass fine-tuning for Hugging Face causal language models.
ntkmirror learns a small signed controller on top of a frozen Transformer. It
adds no LoRA modules and makes no permanent weight edits. The controller is a
sparse set of log-gates on decoder-layer output channels, shared across all token positions:

$$h'_{\ell,t,c} = \exp(s_{\ell,c})\, h_{\ell,t,c}$$

where $\ell$ indexes the decoder layer, $t$ the token, and $c$ the channel. The gates are learned from teacher-forced examples and then attached to the same Hugging Face model during evaluation or generation.
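As a minimal sketch of the gate equation, assume activations are held as nested lists indexed `[layer][token][channel]` and the sparse controller is a dict mapping `(layer, channel)` to a signed log-gate. Both representations are illustrative assumptions, not the package's internal format. The point is that one gain per layer-channel pair is applied at every token:

```python
import math

def apply_log_gates(hidden, log_gates):
    """Return gated activations h'[l][t][c] = exp(s[l, c]) * h[l][t][c].

    hidden:    nested lists indexed [layer][token][channel] (illustrative layout).
    log_gates: sparse dict {(layer, channel): s}; absent pairs use s = 0 (gain 1).
    """
    gated = []
    for layer, tokens in enumerate(hidden):
        # The gain depends only on (layer, channel), so every token shares it.
        n_channels = len(tokens[0]) if tokens else 0
        gains = [math.exp(log_gates.get((layer, c), 0.0)) for c in range(n_channels)]
        gated.append([[g * v for g, v in zip(gains, vec)] for vec in tokens])
    return gated

hidden = [[[1.0, 2.0], [3.0, 4.0]]]      # 1 layer, 2 tokens, 2 channels
gates = {(0, 1): math.log(2.0)}          # roughly doubles channel 1 of layer 0
print(apply_log_gates(hidden, gates))
```

In a PyTorch model the same multiplication would typically sit in a forward hook on each decoder layer, leaving the frozen weights untouched. That wiring is omitted here.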
```bash
git clone https://github.com/leochlon/ntkmirror.git
cd ntkmirror
pip install -e .
```

By default the CLI loads Hugging Face models with `trust_remote_code=False`. Use `--trust-remote-code` only for repositories you trust, and pin reproducible experiments with `--revision <commit-sha-or-tag>` and, when needed, `--tokenizer-revision <commit-sha-or-tag>`.
Create `train.jsonl`:

```json
{"prompt":"Question: 14 + 27 = ?\nAnswer:","completion":" 41"}
{"prompt":"Question: 36 + 18 = ?\nAnswer:","completion":" 54"}
```

Fit a controller:
```bash
ntkmirror fit \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --train train.jsonl \
  --out controller.pt
```

Evaluate it on a held-out `eval.jsonl` in the same format:
```bash
ntkmirror eval \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --controller controller.pt \
  --eval eval.jsonl
```

Generate with it:
```bash
ntkmirror generate \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --controller controller.pt \
  --prompt "Question: 47 + 36 = ?\nAnswer:"
```

Run the bundled demo:

```bash
pip install -e .
bash examples/run_demo.sh
```

For a smaller run:
```bash
GATES=512 STEPS=40 bash examples/run_demo.sh
```

Or use the Python API directly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from ntkmirror import ForwardFineTuner, load_jsonl_examples

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The base weights stay frozen; only the log-gate controller is trained.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").cuda()

tuner = ForwardFineTuner(model, tokenizer, gates=5000)   # budget of layer-channel log-gates
tuner.fit(load_jsonl_examples("train.jsonl"), steps=240)  # AdamW on gate parameters only
tuner.save("controller.pt")                               # saved in signed log-gate coordinates
print(tuner.generate("Question: 47 + 36 = ?\nAnswer:"))   # generate with the controller attached
```

Preferred JSONL schema:
```json
{"prompt":"...context...","completion":"...teacher-forced target..."}
```

Also accepted:

```json
{"instruction":"...","response":"..."}
{"question":"...","answer":"..."}
{"text":"..."}
```

ntkmirror trains on explicit supervised targets. Raw `prompt`/`completion` rows are used as written. Chat rows with `messages` are also accepted; by default only the final assistant message is supervised, and `--chat-template auto` uses the tokenizer chat template when one is available. Use `--chat-template none` for the transparent role-prefixed fallback serializer.
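The accepted row shapes can be pictured as a normalization step to a single `(prompt, completion)` pair. This is a minimal sketch, not the package's loader: the handling of `{"text": ...}` rows (empty prompt, whole text supervised) and the `Role: content` prefixes used for chat rows are assumptions made for illustration.

```python
def serialize_chat(messages):
    """Hypothetical role-prefixed fallback: turns before the final assistant
    message become context; only that final assistant message is the target.
    The 'Role: content' prefix format is an assumption."""
    assistant_idx = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    if not assistant_idx:
        raise ValueError("chat row has no assistant message to supervise")
    last = assistant_idx[-1]
    context = "".join(f"{m['role'].capitalize()}: {m['content']}\n" for m in messages[:last])
    return context + "Assistant:", " " + messages[last]["content"]

def normalize_row(row):
    """Map one JSONL row to (prompt, completion). Illustrative sketch only."""
    if "prompt" in row and "completion" in row:
        return row["prompt"], row["completion"]      # used as written
    if "instruction" in row and "response" in row:
        return row["instruction"], row["response"]
    if "question" in row and "answer" in row:
        return row["question"], row["answer"]
    if "messages" in row:
        return serialize_chat(row["messages"])
    if "text" in row:
        return "", row["text"]                       # assumption: supervise the whole text
    raise ValueError(f"unrecognized row keys: {sorted(row)}")
```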
| Option | Default | Meaning |
|---|---|---|
| `--gates` | 5000 | number of layer-channel log-gates |
| `--steps` | 240 | AdamW steps on gate parameters only |
| `--lr` | 5e-3 | controller learning rate |
| `--max-log-gate` | 0.05 | bound on each signed log-gate |
| `--layers` | all | decoder layers to score and gate |
| `--score-batches` | 16 | batches used to select gates |
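The `--gates` and `--score-batches` options imply a select-then-train pattern: score layer-channel pairs over a few batches, keep the top `gates` pairs, and train only those within ±`--max-log-gate`. The sketch below assumes the per-pair scores are already accumulated (for example, summed absolute loss gradients with respect to each log-gate at s = 0). The scoring rule ntkmirror actually uses is not specified here, and both function names are hypothetical.

```python
import heapq

def select_gates(scores, gates, layers=None):
    """Keep the `gates` highest-scoring (layer, channel) pairs.

    scores: dict {(layer, channel): nonnegative score} accumulated over the
            score batches (the scoring rule is an assumption).
    layers: optional set of eligible decoder layers; None means all layers.
    """
    eligible = {k: v for k, v in scores.items() if layers is None or k[0] in layers}
    return heapq.nlargest(gates, eligible, key=eligible.get)

def clip_log_gate(s, max_log_gate=0.05):
    """Bound a signed log-gate to [-max_log_gate, max_log_gate]."""
    return max(-max_log_gate, min(max_log_gate, s))
```

With the default bound of 0.05, each gated channel's gain stays within roughly [0.951, 1.051], that is, exp(±0.05).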
Controllers are saved in signed log-gate coordinates, so composition is simple: add the signed log-gates, clip to a safe budget, and attach the resulting controller. This is the activation-space analogue of adding task directions, except the addition happens in log-mask/mirror coordinates rather than LoRA weight space.
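A minimal sketch of that composition rule follows. It assumes each controller is a sparse dict `{(layer, channel): signed log-gate}` and that the "safe budget" is a per-gate bound like `--max-log-gate`. Both are illustrative assumptions; the real `compose` command may also plan or report on the merge.

```python
def compose_controllers(controllers, max_log_gate=0.05):
    """Add signed log-gates elementwise, then clip each to the budget."""
    total = {}
    for ctrl in controllers:
        for key, s in ctrl.items():
            total[key] = total.get(key, 0.0) + s
    # Gains are exp(s), so summing log-gates multiplies per-channel gains;
    # clipping keeps every composed gain inside [exp(-b), exp(b)].
    return {k: max(-max_log_gate, min(max_log_gate, s)) for k, s in total.items()}

gsm8k = {(3, 17): 0.03, (5, 2): -0.02}
mbpp = {(3, 17): 0.04, (7, 9): 0.01}
print(compose_controllers([gsm8k, mbpp]))
# {(3, 17): 0.05, (5, 2): -0.02, (7, 9): 0.01}
```

Because addition is commutative, the composed controller does not depend on the order of the inputs. Clipping is the only non-additive step, and it only matters when two controllers push the same channel past the budget.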
```bash
ntkmirror compose \
  --controllers runs/gsm8k_controller.pt runs/mbpp_controller.pt \
  --out runs/gsm8k_plus_mbpp.pt \
  --report runs/composition_report.json

ntkmirror inspect \
  --controllers runs/gsm8k_controller.pt runs/mbpp_controller.pt runs/gsm8k_plus_mbpp.pt
```

A disjoint-task runner is included:
```bash
pip install -e '.[datasets]'
bash scripts/run_disjoint_composition.sh
```

It builds GSM8K and MBPP JSONL subsets, fits one controller per task, composes them, and evaluates the base model and the task-A, task-B, and composed controllers on both eval sets. See `docs/composability.md`.
The v2 operating layer adds request-scoped controller runtime isolation, controller linting and cards, model-compatibility `doctor` reports, validation and retain-data training hygiene, safer composition planning, and memory namespaces with versioning, soft delete, rollback, and audit. See `docs/v2_operating_layer.md`.
Useful commands:
```bash
ntkmirror doctor --model Qwen/Qwen2.5-0.5B-Instruct --out doctor.json
ntkmirror lint --controller controller.pt --require-revision --out lint.json
ntkmirror card --controller controller.pt --out controller.card.md
ntkmirror compose-plan --controllers a.pt b.pt --out plan.json
ntkmirror memory audit --store runs/memory --out memory_audit.json
```

V2 adds a hallucination detector: an evidence-support verifier that scores
whether a claim is grounded in its evidence. It reports canonical verifier
probability, order-marginalized
probability, an ISR dispersion-penalized score, and, when the optional legacy
kv_delta_bayes_ntk backend is available, a closed-form NTK KV order-debias
score. See docs/isr_kv_verifier.md.
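Assume the verifier yields P(supported) once per evidence ordering (`--num-orderings`). Under that assumption, the order-marginalized and dispersion-penalized scores can be pictured as a mean, and as a mean minus a spread penalty. This is an illustrative sketch only: the exact ISR definition and weighting are in `docs/isr_kv_verifier.md`, and `dispersion_weight` is a hypothetical parameter.

```python
import statistics

def isr_scores(probs_by_ordering, dispersion_weight=1.0):
    """Summarize verifier probabilities across evidence orderings (sketch).

    probs_by_ordering: P(supported) under each ordering of the evidence spans;
                       index 0 is taken to be the canonical (original) order.
    dispersion_weight: penalty strength (hypothetical parameter).
    """
    canonical = probs_by_ordering[0]
    marginal = statistics.fmean(probs_by_ordering)      # average over orderings
    spread = statistics.pstdev(probs_by_ordering)       # sensitivity to ordering
    penalized = marginal - dispersion_weight * spread   # assumed penalty form
    return {"canonical": canonical, "order_marginal": marginal, "isr_penalized": penalized}

print(isr_scores([0.92, 0.88, 0.35, 0.90]))
```

A claim whose support flips when the evidence is reordered keeps a moderate average but loses most of its penalized score. Order stability is what this kind of score rewards.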
```bash
ntkmirror isr-auc \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --dataset vitaminc \
  --n 200 \
  --num-orderings 6 \
  --out runs/isr_vitaminc.json
```

Score your own claims with `--data-jsonl`. Each line is one JSON object: a claim, its evidence (a `spans` list or a single `evidence` string), and the gold label (`supported`: true/false, or `label`: `supported`/`refuted`).
```json
{"claim": "The answer is Paris.", "spans": ["Paris is the capital of France."], "supported": true}
{"claim": "The answer is Rome.", "evidence": "Paris is the capital of France.", "label": "refuted"}
```

```bash
ntkmirror isr-auc \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --data-jsonl claims.jsonl \
  --out runs/isr_custom.json
```

A memory item can be stored as a controller: one controller per conversation,
document, user preference, task style, or procedure. At inference time,
ntkmirror retrieves relevant items, composes their signed log-gates, and
attaches the composed controller before generation. This biases the forward pass
without appending memory text to the prompt. Treat it as a behavioral/style or
procedure-control mechanism, not as a substitute for factual retrieval, source
provenance, or RAG when factual grounding matters. By default zero-score memory
retrievals are treated as no-hit rather than attaching arbitrary controllers.
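Here is a minimal sketch of that retrieve, compose, attach path with the no-hit rule. It assumes a store mapping item ids to a description plus a sparse controller dict, and a pluggable relevance scorer. The store layout, `top_k`, and the toy word-overlap scorer are illustrative, not the package's API.

```python
def retrieve_and_compose(query, store, score_fn, top_k=2, max_log_gate=0.05):
    """Hypothetical memory path: score items, drop zero-score hits, compose.

    store:    dict {item_id: (description_text, controller_dict)}
    score_fn: relevance scorer, e.g. lexical TF-IDF (assumed interface).
    Returns (hit_ids, composed_controller). An empty controller means no-hit,
    so generation proceeds with the unmodified base model.
    """
    scored = [(score_fn(query, text), item_id) for item_id, (text, _) in store.items()]
    hits = sorted((pair for pair in scored if pair[0] > 0), reverse=True)[:top_k]
    composed = {}
    for _, item_id in hits:
        for key, s in store[item_id][1].items():
            composed[key] = composed.get(key, 0.0) + s
    composed = {k: max(-max_log_gate, min(max_log_gate, s)) for k, s in composed.items()}
    return [item_id for _, item_id in hits], composed

def overlap(query, text):
    """Toy scorer: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

store = {
    "arithmetic-carrying": ("two-digit addition with carrying", {(4, 11): 0.03}),
    "poetry": ("rhyming verse style", {(2, 5): -0.02}),
}
print(retrieve_and_compose("addition with carrying", store, overlap))
print(retrieve_and_compose("haiku", store, overlap))   # no-hit: nothing attached
```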
Fit-and-store a memory controller:
```bash
ntkmirror memory add \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --store runs/memory \
  --id arithmetic-carrying \
  --train examples/math_train.jsonl \
  --text "worked addition arithmetic with carrying" \
  --tags math,arithmetic
```

Or register an existing controller:
```bash
ntkmirror memory add \
  --store runs/memory \
  --id arithmetic-carrying \
  --controller runs/arithmetic.pt \
  --text "two-digit addition with carrying: add ones, carry, then tens"
```

Retrieve, compose, and generate:
```bash
ntkmirror memory search \
  --store runs/memory \
  --query "solve an addition problem with carrying"

ntkmirror memory generate \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --store runs/memory \
  --query "addition with carrying" \
  --prompt "Problem: 47 + 36 = ?\nSolution:"
```

Try the demo:
```bash
bash examples/run_memory_demo.sh
```

The default retriever is a dependency-free lexical TF-IDF scorer. That is intentional for first-run UX: the main bottleneck in controller memory is retrieval quality, not controller storage. For production, replace the retriever with an embedding or hybrid vector-store layer, enforce provenance policies, and keep the same `compose_states` interface. Controller stores are a trust boundary: load controllers only from trusted stores, because stale, poisoned, or checkpoint-incompatible controllers can silently degrade model behavior. See `docs/persistent_memory.md`.
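For reference, here is a dependency-free lexical TF-IDF scorer of the kind described, in a few lines. It is a generic smoothed-IDF cosine ranker written for illustration, not ntkmirror's retriever. A query that shares no terms with any item scores zero everywhere, which is the case the no-hit rule guards against.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build smoothed TF-IDF vectors (dict term -> weight) for a list of strings."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, docs):
    """Rank docs by TF-IDF cosine; query terms unseen in the docs are ignored."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf[t] for t, tf in Counter(query.lower().split()).items() if t in idf}
    return sorted(((cosine(q, v), i) for i, v in enumerate(vecs)), reverse=True)

docs = ["two-digit addition with carrying", "rhyming verse style"]
print(search("addition with carrying", docs))
```

Swapping in an embedding model only changes how `q` and the document vectors are built; the ranking and zero-score handling stay the same.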
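The `fit` command trains signed log-gates by minimizing support NLL and remains the deployable path. A separate research path adds diagnostics and a field-locked fitting harness for the stricter NTK-dual claim: the local activation-control tangent

$$B_C(s) = \frac{\partial\,\big(P_C\, z(s)\big)}{\partial s}$$

should realise the full frozen-model weight-SGD projected-logit field

$$d_C^{\theta} = -\eta\, J_{\theta,C}\, J_{\theta,S}^{\top}\, g_S ,$$

where $S$ is the support set, $C$ the calibration set, $J_{\theta,S}$ and $J_{\theta,C}$ the Jacobians of the support and calibration logits with respect to the frozen weights $\theta$, $g_S$ the support-loss gradient with respect to the support logits, and $\eta$ the weight-SGD step size.

$Bv$ is an exact autograd JVP, $B^\top y$ is an exact VJP, and the CG operator is $B M^{-1} B^\top + \text{ridge}\cdot I$. Reports include `adjoint_error`, `symmetry_error`, `range_residual`, the unconstrained forward `realized_residual`, and the box-constrained `clipped_realized_residual`. `field_residual` is the safety-facing clipped forward residual, not the same local matvec used inside the solve.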
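To make the operator concrete, here is a toy version. $B$ is a small explicit matrix standing in for the autograd JVP/VJP pair, and $M$ is a diagonal metric. One natural reading, which is an assumption rather than necessarily ntkmirror's exact update, is to solve $(B M^{-1} B^\top + \lambda I)\,y = d$ by conjugate gradients and set $s = M^{-1} B^\top y$. That is the minimum-$M$-norm gate move realising $d$, up to the ridge. Here `range_residual` is $\lVert Bs - d\rVert / \lVert d\rVert$, and all function names are illustrative.

```python
import math

def matvec(A, v):                      # stands in for the JVP: B v
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rmatvec(A, y):                     # stands in for the VJP: B^T y
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adjoint_error(B, v, y):
    """|<Bv, y> - <v, B^T y>|; near zero when the JVP and VJP agree."""
    return abs(dot(matvec(B, v), y) - dot(v, rmatvec(B, y)))

def solve_field(B, m_diag, d, ridge=1e-8, iters=50, tol=1e-10):
    """CG on (B M^{-1} B^T + ridge I) y = d, then s = M^{-1} B^T y."""
    def op(y):
        z = [g / m for g, m in zip(rmatvec(B, y), m_diag)]      # M^{-1} B^T y
        return [bz + ridge * yi for bz, yi in zip(matvec(B, z), y)]
    y = [0.0] * len(d)
    r = list(d)                                                  # residual at y = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if math.sqrt(rs) < tol:
            break
        Ap = op(p)
        alpha = rs / dot(p, Ap)
        y = [yi + alpha * pi for yi, pi in zip(y, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    s = [g / m for g, m in zip(rmatvec(B, y), m_diag)]
    range_residual = math.dist(matvec(B, s), d) / math.sqrt(dot(d, d))
    return s, range_residual

B = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # 2 projected logits x 3 gates (toy)
s, res = solve_field(B, m_diag=[1.0, 2.0, 1.0], d=[1.0, -1.0])
print(s, res)
```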
Audit whether a selected gate basis can realise the full-weight field:
```bash
ntkmirror dual-diagnose \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --support train.jsonl \
  --calibration eval.jsonl \
  --controller controller.pt \
  --projection topk --top-k 32 \
  --target-step-size 1e-5 \
  --jvp-mode exact \
  --metric activation
```

Fit pathwise by matching the full-weight NTK field instead of using support-Adam:
```bash
ntkmirror fit-dual \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --train train.jsonl \
  --out controller_dual.pt \
  --steps 8 \
  --projection topk --top-k 32 \
  --jvp-mode exact \
  --metric activation
```

Check whether a finite controller has left the initial gate tangent:
```bash
ntkmirror secant \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --controller controller.pt \
  --eval eval.jsonl
```

The important numbers are `range_residual` and `realized_residual`, not the raw gate norm. A large secant error only says that the initial gate chart is no longer a global linear model; it does not by itself refute pathwise NTK duality. See `docs/activation_control_ntk.md` for the theory, command details, and failure-mode checklist.
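The secant check can be sketched on a toy where the "output" is just the gated activation $z(s) = \exp(s) \odot h$, so the initial tangent is $B(0)s = h \odot s$. This is an assumption for illustration; the real command works on projected logits of the frozen model. The relative secant error grows with gate scale even though the map is perfectly smooth. That growth is why a large secant error alone says little about duality along the path.

```python
import math

H = [1.0, -2.0, 0.5]                     # toy frozen activations

def toy_z(s):
    """Gated output z(s) = exp(s) * h, channelwise."""
    return [math.exp(si) * hi for si, hi in zip(s, H)]

def toy_tangent(s):
    """Initial-gate tangent applied to s: B(0) s = h * s."""
    return [hi * si for si, hi in zip(s, H)]

def secant_error(z, jvp_at_zero, s):
    """Relative error of the initial-tangent prediction for a finite move s:
    ||z(s) - z(0) - B(0) s|| / ||z(s) - z(0)||."""
    z0, zs = z([0.0] * len(s)), z(s)
    actual = [a - b for a, b in zip(zs, z0)]
    predicted = jvp_at_zero(s)
    num = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)))
    den = math.sqrt(sum(a * a for a in actual))
    return num / den if den else 0.0

for scale in (0.01, 0.05, 0.5):
    print(scale, round(secant_error(toy_z, toy_tangent, [scale] * len(H)), 4))
```

For small uniform moves of size x, the toy error is about x/2, so it stays small inside the default ±0.05 gate budget and becomes large only for bigger moves.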
A safe diffusion scale-gate runner is also included:
```bash
python scripts/diffusion/train_scale_gate_adam_m.py \
  --image-dir images \
  --prompts "a photo of sks dog" \
  --out runs/diffusion_scale_gates.pt \
  --steps 1500
```

It uses Adam with a step-adaptive activation metric and cosh self-damping, and represents channel pruning with finite `q_prune` hard-dead masks, separate q/shift caps, and non-finite guards.
The default UX remains the simple, deployable support-Adam package. The diagnostic and field-locked commands (`dual-diagnose`, `fit-dual`, `secant`) expose a research harness for NTK-vector diagnostics and field-locked local updates; they are slower than `fit` and are not the default first-run path.
Always report the base model, controller, LoRA/SFT baseline, random-gate
control, wrong-memory control, and no-retrieval fallback on the same train/eval
manifest. For exact-answer tasks, report exact accuracy and teacher-forced NLL.
For multiple-choice tasks, prefer length-normalized choice NLL and also disclose
summed-loss accuracy. For system claims, report adaptation time, peak memory,
model revision, tokenizer revision, dataset hashes, and retrieval settings. See
docs/method.md for failure modes.
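Length-normalized and summed-loss scoring can pick different answers, which is why both should be disclosed. Here is a minimal sketch, assuming per-token log-probabilities for each choice's continuation are already computed. It is a generic scoring recipe, not ntkmirror's evaluator.

```python
def choice_predictions(choice_token_logprobs):
    """Pick a multiple-choice answer two ways from per-token log-probs.

    choice_token_logprobs: one list per choice holding log p(token | prefix)
    for that choice's continuation tokens.
    Returns (length_normalized_pick, summed_loss_pick) as choice indices.
    """
    summed_nll = [-sum(lps) for lps in choice_token_logprobs]
    mean_nll = [s / len(lps) for s, lps in zip(summed_nll, choice_token_logprobs)]
    pick_norm = min(range(len(mean_nll)), key=mean_nll.__getitem__)
    pick_sum = min(range(len(summed_nll)), key=summed_nll.__getitem__)
    return pick_norm, pick_sum

def accuracy(picks, gold):
    return sum(p == g for p, g in zip(picks, gold)) / len(gold)

# A short, moderately likely choice vs. a long choice with likelier tokens:
print(choice_predictions([[-1.0], [-0.5, -0.5, -0.5, -0.5]]))
```

Summed NLL favors the one-token choice (1.0 vs. 2.0), while mean NLL favors the four-token choice (0.5 vs. 1.0). Reporting only one of them can hide a length bias.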
```bibtex
@software{chlon2026ntkmirror,
  author       = {Leon Chlon},
  title        = {{NTK-Mirror: LoRA-free forward-pass fine-tuning via signed log-mask controllers}},
  year         = {2026},
  organization = {Hassana Labs},
  url          = {https://github.com/leochlon/ntkmirror}
}
```

MIT © 2026 Hassana Labs — Leon Chlon.