Atlas2x2

Code and analysis for the paper "What, Where, and How: Disentangling the Roles of Task, Language, and Model in Code Model Representations".

A cross-model, cross-language mechanistic-interpretability study. We extract neural circuits for 58 testable Python and 57 testable Rust concepts in Qwen2.5-Coder-7B (28 layers) and DeepSeek-Coder-V1-6.7B (32 layers). Two languages by two models is the minimum 2×2 design that lets us ask whether what earns dedicated circuitry, where it lives, and how it is processed are properties of the task, the language, or the model. The answer separates along three axes: what is task-determined and transfers across models; where and how are model-determined; and how strongly a construct is represented is language-determined.

Quickstart

git clone https://github.com/piotrwilam/Atlas2x2.git
cd Atlas2x2
uv sync                                    # or: python -m venv .venv && . .venv/bin/activate && pip install -e .
export ATLAS_DATA_ROOT=~/Data/Atlas2x2     # cache dir; missing files auto-fetch from HF

Verify the paper's locked numbers against the released artifacts:

pytest tests/test_paper_numbers.py -v

The first run downloads the Tier-1 files (~4 MB) on demand into $ATLAS_DATA_ROOT. Subsequent runs use the local cache. Set ATLAS_HF_OFFLINE=1 to disable the HuggingFace download fallback, so that a missing file is an error instead of a fetch.
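
The cache-then-fetch behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in atlas/io/: the function names `data_root`, `resolve_artifact` and `_fetch_from_hub` are hypothetical, the default cache path is simply the one from the Quickstart example, and the exact semantics of ATLAS_HF_OFFLINE in the real loaders may differ.

```python
import os
from pathlib import Path
from typing import Callable


def data_root() -> Path:
    """Cache directory from ATLAS_DATA_ROOT (default here is an assumption)."""
    return Path(os.environ.get("ATLAS_DATA_ROOT", "~/Data/Atlas2x2")).expanduser()


def _fetch_from_hub(filename: str, dest_dir: Path) -> Path:
    # Imported lazily so that cached/offline use does not need huggingface_hub.
    from huggingface_hub import hf_hub_download

    return Path(
        hf_hub_download(
            repo_id="piotrwilam/Atlas2x2",
            repo_type="dataset",
            filename=filename,
            local_dir=dest_dir,
        )
    )


def resolve_artifact(
    filename: str,
    fetch: Callable[[str, Path], Path] = _fetch_from_hub,
) -> Path:
    """Return a local path for `filename`, fetching it only if it is missing."""
    local = data_root() / filename
    if local.exists():
        return local  # cache hit: no network access
    if os.environ.get("ATLAS_HF_OFFLINE") == "1":
        raise FileNotFoundError(f"{local} not cached and ATLAS_HF_OFFLINE=1")
    return fetch(filename, data_root())
```

Passing `fetch` as a parameter keeps the sketch testable without network access; a real loader would more likely hard-wire the Hub call.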

Regenerate any paper figure from its config:

python experiments/fig2_concept_scatter.py        --config-name paper/figure2_concept_scatter
python experiments/fig3_concept_fraction_profile.py --config-name paper/figure3_concept_fraction
python experiments/fig4_temporal_dynamics.py      --config-name paper/figure4_temporal_dynamics
python experiments/fig5_cross_language_sharing.py --config-name paper/figure5_cross_language_sharing
python experiments/fig6_cluster_test.py           --config-name paper/figure6_cluster_test
python experiments/fig7_dissociation.py           --config-name paper/figure7_dissociation

Each figure script writes a fresh timestamped output directory under results/, containing the figure as PDF and PNG plus a run_info.json that records the exact inputs and derived statistics.
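
As a rough illustration of that output convention, a figure script could create its run directory and provenance file along these lines. The helper names, the directory naming pattern, and the JSON keys (`created_utc`, `inputs`, `derived_statistics`) are assumptions made for this sketch; the actual layout of run_info.json is defined by the scripts in experiments/.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def new_run_dir(results_root: Path, figure_name: str) -> Path:
    """Create results/<figure>_<UTC timestamp>/ and fail if it already exists."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = results_root / f"{figure_name}_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)  # "fresh": never reuse a dir
    return run_dir


def write_run_info(
    run_dir: Path, inputs: Dict[str, Any], stats: Dict[str, Any]
) -> Path:
    """Record which inputs produced the figure and what was computed from them."""
    info = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,
        "derived_statistics": stats,
    }
    path = run_dir / "run_info.json"
    path.write_text(json.dumps(info, indent=2, sort_keys=True))
    return path
```

Writing the statistics next to the figure means a number quoted in the paper can be traced back to the run that produced it.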

Repository structure

The codebase is organised in three layers that mirror the paper's pipeline.

circuits/        # Layer 1 — artifact generation (GPU; frozen for v0.1.0)
    extraction/      forward-pass extraction of MLP outputs at last token
    binarisation/    ε-threshold; per-concept masks per (lang, model, ε)
    decomposition/   concept-only / shared / token-only against checker masks

atlas/           # Layer 2 — analysis library (CPU, fast, importable)
    io/              loaders for masks / aggregates / probe / dissociation /
                     flow-type / cross-language-sharing artifacts
    analysis/        Spearman, Jaccard, Ward linkage, permutation tests,
                     flow-type classifier, early-bias, cross-language sharing
    plotting/        paper / paper-wide / poster / slides style;
                     dendrogram, temporal-dynamics, group-coherence helpers
    paths.py         DATA_ROOT resolution

experiments/     # Layer 3 — one script per paper figure
    fig2_concept_scatter.py            (Figure 1)
    fig3_concept_fraction_profile.py   (Figure 2)
    fig4_temporal_dynamics.py          (Figure 3)
    fig5_cross_language_sharing.py     (Figure 4)
    fig6_cluster_test.py               (Figure 5 — dendrogram + perm test)
    fig7_dissociation.py               (Figure 7)
    figA1_circuit_size_by_flow_type.py (Figure 8 — 4-panel appendix)
    figA2_python_dendrogram.py         (Figures 9 + 10 — P×QW and P×DS)
    figA4_probe_accuracy.py            (Figure 11 left)
    figA4_jaccard_cosine.py            (Figure 11 right)

configs/         # Hydra configs — one per figure variant
tests/           # pytest; test_paper_numbers.py locks every paper claim
scripts/         # ops: promote figures to paper repo
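
To make the Layer 1 vocabulary concrete, here is a toy sketch of ε-threshold binarisation followed by a three-way decomposition. It assumes that a mask is the set of neurons whose absolute activation exceeds ε, and that the decomposition is plain set arithmetic between a concept mask and a checker mask; both are illustrative guesses, and the actual rules live in circuits/binarisation/ and circuits/decomposition/. The function names and the toy activations are hypothetical.

```python
from typing import Dict, Set


def binarise(activations: Dict[int, float], eps: float) -> Set[int]:
    """Neuron indices whose |activation| is strictly above eps (assumed rule)."""
    return {idx for idx, value in activations.items() if abs(value) > eps}


def decompose(concept_mask: Set[int], checker_mask: Set[int]) -> Dict[str, Set[int]]:
    """Split neurons into concept-only / shared / token-only by set arithmetic."""
    return {
        "concept_only": concept_mask - checker_mask,
        "shared": concept_mask & checker_mask,
        "token_only": checker_mask - concept_mask,
    }


# Toy activations for four neurons (made-up values, not paper data).
concept_acts = {0: 0.9, 1: 0.2, 2: 0.0, 3: 0.6}
checker_acts = {0: 0.7, 1: 0.0, 2: 0.8, 3: 0.05}

parts = decompose(binarise(concept_acts, 0.1), binarise(checker_acts, 0.1))
print(parts)
```

With ε = 0.1, neurons 1 and 3 are active only for the concept, neuron 0 is shared, and neuron 2 is active only for the checker.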

Data

The frozen experimental artifacts — neuron-list masks at ε ∈ {0.001, 0.1, 0.5}, decomposition tables (concept-only / shared / token-only) per (lang, model) cell, flow-type assignments, cross-language sharing per-layer values, dissociation deltas, probe weights, Jaccard–cosine pairs — are released as a HuggingFace dataset. The Python loaders in atlas/io/ auto-fetch missing files from the Hub.

Direct download:

from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id="piotrwilam/Atlas2x2",
    repo_type="dataset",
    filename="7_E6_flow_type_assignments.xlsx",
)

See the dataset README for the full file schema.

Reproducing the paper

Every numerical claim is locked in tests/test_paper_numbers.py. With the dataset materialised at $ATLAS_DATA_ROOT:

pytest tests/test_paper_numbers.py -v   # 39 tests, all should pass

Locked claims include:

| Section | Claim |
|---|---|
| §3 | 58 testable Python + 57 testable Rust concepts |
| §4.2 | Cross-model concept-fraction correlation: ρ = 0.638 (Py), 0.673 (Ru), p < 10⁻⁷ |
| §4.4 / §6.1 | Six atomicity concepts (Assert, Break, Continue, Import, ImportFrom, Pass) classified two_phase in Python × Qwen; flow-type agreement 88.6% (Py) / 85.1% (Ru) |
| §5.1 | Rust/Python strength ratio: 2.91× (Qwen), 2.07× (DeepSeek) |
| §5.3 | Cross-language sharing ratio: 1.94× (DS/QW); 7/7 pass DS, 6/7 pass QW |
| §6.2 | Type-trait cluster cohesion: Jaccard 0.535, null 0.112, p < 0.001 |
| §7.1 | Double dissociation: 4 pass (Import, Try, While, Assert); Break fails |
| §7.2 | Probe accuracy band 97.6–99.7%, Jaccard–cosine peak r = 0.645 at L20 |
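
For readers unfamiliar with the kind of statistic behind the §6.2 row (an observed Jaccard cohesion, a null value, and a p-value), the following is a generic sketch of a cluster-cohesion permutation test on neuron masks. It assumes cohesion is the mean pairwise Jaccard similarity within a cluster and that the null is built from random concept subsets of the same size; the paper's exact procedure in atlas/analysis/ may differ. All names and the toy masks are hypothetical and are not paper data.

```python
import random
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(masks: List[Set[int]]) -> float:
    pairs = list(combinations(masks, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def cohesion_permutation_test(
    masks: Dict[str, Set[int]], cluster: List[str], n_perm: int = 1000, seed: int = 0
) -> Tuple[float, float, float]:
    """Return (observed cohesion, mean null cohesion, one-sided p-value)."""
    rng = random.Random(seed)
    names = sorted(masks)
    observed = mean_pairwise_jaccard([masks[n] for n in cluster])
    null = []
    for _ in range(n_perm):
        sample = rng.sample(names, len(cluster))  # random same-size "cluster"
        null.append(mean_pairwise_jaccard([masks[n] for n in sample]))
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_perm)
    return observed, sum(null) / n_perm, p_value


# Toy masks: a, b, c overlap heavily; the rest are disjoint.
toy_masks = {
    "a": {1, 2, 3}, "b": {1, 2, 4}, "c": {1, 2, 5},
    "d": {10, 11}, "e": {20, 21}, "f": {30, 31}, "g": {40}, "h": {50},
}
observed, null_mean, p_value = cohesion_permutation_test(toy_masks, ["a", "b", "c"])
print(observed, null_mean, p_value)
```

A locked-number test would then compare such a statistic, computed from the released artifacts, against the value quoted in the paper within a small tolerance.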

Citation

@article{Wilam2026Atlas2x2,
  title  = {What, Where, and How: Disentangling the Roles of Task, Language, and Model in Code Model Representations},
  author = {Wilam, Piotr},
  year   = {2026}
}

About

Code for the Atlas2x2 paper — a cross-model, cross-language circuit atlas for code models. Companion to huggingface.co/datasets/piotrwilam/Atlas2x2
