This repository contains the code to reproduce the experiments performed in the Dynamical Mean-Field Theory of Self-Attention Neural Networks article.
Transformer-based models have demonstrated exceptional performance across diverse domains, becoming the state-of-the-art solution for addressing sequential machine learning problems. Even though we have a general understanding of the fundamental components in the transformer architecture, little is known about how they operate or what are their expected dynamics. Recently, there has been an increasing interest in exploring the relationship between attention mechanisms and Hopfield networks, promising to shed light on the statistical physics of transformer networks. However, to date, the dynamical regimes of transformer-like models have not been studied in depth. In this paper, we address this gap by using methods for the study of asymmetric Hopfield networks in nonequilibrium regimes --namely path integral methods over generating functionals, yielding dynamics governed by concurrent mean-field variables. Assuming 1-bit tokens and weights, we derive analytical approximations for the behavior of large self-attention neural networks coupled to a softmax output, which become exact in the large limit size. Our findings reveal nontrivial dynamical phenomena, including nonequilibrium phase transitions associated with chaotic bifurcations, even for very simple configurations with a few encoded features and a very short context window. Finally, we discuss the potential of our analytic approach to improve our understanding of the inner workings of transformer models, potentially reducing computational training costs and enhancing model interpretability.
In order to install to run the scripts, you must first create and install the requirements in an environment. In order to do so follow the following steps:
- Create environment if you do not have one:
python -m venv venv
. - Load the environment:
source venv/bin/activate
. - Install the requirements:
python -m pip install -r requirements.txt
.
To reproduce the bifurcation diagrams in a consumer set-up. Follow the following steps.
- First, create or use a config file. You can use templates located in
cfgs
folder. Usingcfgs/bif_diagram_inf_0_zoom-in.yaml
you can replicate the zoomed-in version of the diagram in the paper. Usingcfgs/bif_diagram_inf_0.yaml
you can replicate the other diagram. Then, follow one of steps 2 or 3. - Configure the bottom part
bifurcation_diagrams.py
and dopython bifurcation_diagrams.py
. - Configure
slurm_parallel/launcher_bifurcation_diagrams_inf_ctx_run_localtest.sh
and doslurm_parallel/source launcher_bifurcation_diagrams_inf_ctx_run_localtest.sh
If you want to make use of HPC capabilities and compute experiments in parallel using Slurm, first follow step one and then configure and run launcher_bifurcation_diagrams_inf_ctx_run_zoom-in.sh
or launcher_bifurcation_diagrams_inf_ctx_run.sh
located in slurm_parallel
.
Then, to collect the results configure and run slurm_parallel/launcher_bifurcation_diagrams_inf_ctx_plot.sh
.
The calculation of the bifurcation diagrams is very costly computationally. If you want to examine the trajectories presented in the paper, do python simulate_mf_trajectory_inf_paper_3pats.py
.
Scripts under the pending_to_update
folder are older scripts that are planned to be updated and refactorized in the future.