A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

This is the official PyTorch implementation of A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task (ACL 2024).

Figure 1: Given an input prompt, the model concatenates the tokens of each edge into a single token position (A) and copies the goal node into the final token position (B). It then identifies the next step with an iterative algorithm that climbs the tree one level per layer (C).
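
To make the mechanism in (C) concrete, here is a minimal Python sketch (not code from this repository; the function name and edge-list format are illustrative): invert the edge list, then walk upward from the goal one edge at a time until reaching a child of the current node, which is the next step on the path.

def next_step(edges, current, goal):
    """Return the child of `current` on the path toward `goal`.
    `edges` is a list of (parent, child) pairs describing a tree."""
    parent = {child: par for par, child in edges}  # invert the edge list
    node = goal
    while node in parent:                # climb one level per iteration,
        if parent[node] == current:      # much like one hop per layer
            return node
        node = parent[node]
    raise ValueError("no path from the current node to the goal")

# Example: tree 0 -> {1, 2}, 2 -> 3; from node 0 toward goal 3 the next step is 2.
assert next_step([(0, 1), (0, 2), (2, 3)], current=0, goal=3) == 2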

Usage

1. Dependencies

To install dependencies:

conda env update --file environment.yml

2. Training and Evaluation Code

To train a model from scratch or to continue training, use training.py. The functions used for our analysis are provided in src/utils.py.
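
For example, assuming the script takes no required command-line arguments (its configuration is set inside training.py):

python training.py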

3. Pre-trained Model

The model checkpoint we studied in our work is provided in model.pt.
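A minimal loading sketch, assuming model.pt stores a standard PyTorch state dict (the model class and its configuration are defined in this repository's source; Model below is a placeholder):

import torch

state_dict = torch.load("model.pt", map_location="cpu")  # load the weights onto CPU
# model = Model(...)                 # placeholder: build the model from the repo's code
# model.load_state_dict(state_dict)  # restore the trained weights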

4. Replication of Results

The notebook figures.ipynb replicates all figures we report in our paper.
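
To run it locally, assuming Jupyter is installed through the environment above:

jupyter notebook figures.ipynb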

Citation Information

BibTeX citation:

@misc{brinkmann2024mechanistic,
      title={A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task}, 
      author={Jannik Brinkmann and Abhay Sheshadri and Victor Levoso and Paul Swoboda and Christian Bartelt},
      year={2024},
      eprint={2402.11917},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
