causal mediation analysis for transformer attention mechanisms in in-context learning and theory of mind reasoning.
# basic causal analysis
python causal_analysis.py --model_type "8B" --prompt_num 100 --base_rule "ABA"
# theory of mind analysis
python codebase/tasks/identity_rules/cma.py \
--use_tom_prompts \
--context_type "abstract" \
--base_rule "ABA" \
--prompt_num 50 \
--model_type "Llama-3.2-1B"
# behavioral evaluation (no mechanistic analysis)
python behavioral/behavioral_eval.py --config_type full_comparison
- causal_analysis.py: main activation patching for rule learning (ABA vs ABB patterns)
- codebase/tasks/identity_rules/cma.py: refactored modular cma pipeline
- behavioral/: pure behavioral evaluation with wandb tracking
- codebase/: integrated LLMSymbMech framework for symbolic processing analysis
activation patching: systematically patches activations between prompt conditions to identify which components are causally responsible for rule following
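a minimal sketch of the patching sweep, assuming transformer_lens hooks; the prompts, answer tokens, and scoring metric below are illustrative placeholders, not the repo's actual setup:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# illustrative prompt pair; the repo's scripts generate the real ABA/ABB prompts
clean_prompt   = "cat dog cat red blue red sun moon"   # ABA rule -> next token " sun"
corrupt_prompt = "cat dog dog red blue blue sun moon"  # ABB rule -> next token " moon"
clean_answer   = model.to_single_token(" sun")
corrupt_answer = model.to_single_token(" moon")

# cache activations from the clean run once
_, clean_cache = model.run_with_cache(clean_prompt)

def patch_head(z, hook, head):
    # overwrite a single head's output with its value from the clean run
    z[:, :, head, :] = clean_cache[hook.name][:, :, head, :]
    return z

scores = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    for head in range(model.cfg.n_heads):
        logits = model.run_with_hooks(
            corrupt_prompt,
            fwd_hooks=[(f"blocks.{layer}.attn.hook_z",
                        lambda z, hook, h=head: patch_head(z, hook, h))],
        )
        # causal effect: how far this head pushes the prediction back toward the clean rule
        scores[layer, head] = (logits[0, -1, clean_answer]
                               - logits[0, -1, corrupt_answer]).item()
```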
theory of mind: adapts the cma framework from token patterns to false-belief reasoning, separating belief-tracking heads from location-retrieval heads
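an illustrative false-belief / true-belief prompt pair (not the repo's templates) to show what gets contrasted when the patching sweep above is rerun for theory of mind:

```python
# illustrative false-belief / true-belief conditions; in practice the two prompts are
# written so they tokenize to the same length, so positions align for patching
false_belief = (
    "Sally puts the ball in the basket and leaves the room. "
    "Anne moves the ball to the box. "
    "Sally comes back and looks for the ball in the"
)
true_belief = (
    "Sally puts the ball in the basket and stays in the room. "
    "Anne moves the ball to the box while Sally watches. "
    "Sally then looks for the ball in the"
)
belief_answer  = " basket"  # where Sally believes the ball is (correct only in the false-belief case)
reality_answer = " box"     # where the ball actually is

# patching activations between these two conditions, exactly as in the rule-learning sweep,
# separates heads that track Sally's belief state from heads that merely retrieve the
# object's current location.
```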
behavioral evaluation: temperature sweeps across models measure pure tom performance without the overhead of mechanistic analysis
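a hedged sketch of what such a sweep looks like with plain transformers generation and wandb logging; the model id, wandb project name, and prompt are placeholders, not the repo's config:

```python
import torch
import wandb
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

wandb.init(project="tom-behavioral")  # hypothetical project name
prompt = ("Sally puts the ball in the basket and leaves. "
          "Anne moves the ball to the box. Sally will look for the ball in the")
inputs = tok(prompt, return_tensors="pt").to(model.device)

for temperature in (0.0, 0.3, 0.7, 1.0):
    gen_kwargs = {"max_new_tokens": 5}
    if temperature > 0:
        gen_kwargs.update(do_sample=True, temperature=temperature)
    # temperature 0 -> greedy decoding (do_sample stays False)
    out = model.generate(**inputs, **gen_kwargs)
    completion = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    wandb.log({"temperature": temperature, "completion": completion})
```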
supported models: llama-3.1 (8B, 70B, instruct variants), qwen2.5 (7B-72B), gpt-2, gemma-2. an automatic embedding resize handles mismatches between tokenizer and model vocabulary sizes.
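the resize is the standard transformers call; a minimal sketch, with the model id as a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"  # placeholder; any of the families listed above
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# grow the embedding matrix when the tokenizer vocabulary is larger than the
# model's (e.g. after special tokens are added), so lookups don't go out of range
if len(tok) > model.get_input_embeddings().weight.shape[0]:
    model.resize_token_embeddings(len(tok))
```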
- get hf token: https://huggingface.co/settings/tokens
export HF_TOKEN="your_token"
pip install transformers transformer_lens torch wandb
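gated checkpoints (e.g. the llama variants) need the token at load time; either log in once via huggingface_hub or pass it to from_pretrained (model id below is a placeholder):

```python
import os
from huggingface_hub import login
from transformers import AutoModelForCausalLM

# log in once with the exported token...
login(token=os.environ["HF_TOKEN"])
# ...or pass it directly when loading a gated checkpoint
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B", token=os.environ["HF_TOKEN"])
```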
results saved as heatmaps showing causal importance by layer×head. raw tensors and prompt files included for replication.
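a minimal plotting sketch for turning a saved layer×head score tensor into a heatmap; the file paths are illustrative, not the repo's actual output layout:

```python
import matplotlib.pyplot as plt
import torch

# `scores` is a layer x head tensor like the one produced by the patching sweep above
scores = torch.load("results/causal_scores.pt").cpu().numpy()

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(scores, aspect="auto", cmap="viridis")
ax.set_xlabel("head")
ax.set_ylabel("layer")
fig.colorbar(im, ax=ax, label="causal effect")
fig.savefig("results/causal_heatmap.png", dpi=200)
```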