
Superposition in Non-linear Neural Networks

This repository contains reproductions of some of the graphs presented in Toy Models of Superposition, showing how smaller non-linear neural networks (networks with non-linear activation functions) can use superposition to represent, and thereby compress, larger neural networks.

I was interested to see how the representations form, especially after seeing Extracting Interpretable Features from Claude 3 Sonnet, so I have extended these graphs to show how they change during training.

I have only tested the code with Python 3.11.

Non-linear neural network layers can represent more features than they have neurons through a process called superposition, provided those features are sparse. The following graphs show how 5-dimensional features are mapped into a 2-dimensional neural network layer for different sparsities.
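
To make the setup behind these graphs concrete, here is a minimal sketch of the toy model from the paper, written in PyTorch (an assumption; the repository's own code may differ, and the batch size, learning rate, and uniform feature importance are placeholder choices): 5 sparse features are compressed into a 2-dimensional hidden layer by a weight matrix W and reconstructed through a tied-weight ReLU decoder.

```python
# Minimal sketch of the toy model, assuming PyTorch; hyperparameters are placeholders.
import torch

n_features, n_hidden, sparsity = 5, 2, 0.90
batch_size, steps = 1024, 10_000

# W maps the 5 features into the 2-dimensional hidden layer; each row of W is
# the 2-D direction that one feature occupies after training.
W = torch.nn.Parameter(0.1 * torch.randn(n_features, n_hidden))
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(steps):
    # Synthetic sparse features: each feature is 0 with probability `sparsity`,
    # otherwise uniform on [0, 1].
    x = torch.rand(batch_size, n_features)
    x = x * (torch.rand(batch_size, n_features) > sparsity).float()

    h = x @ W                        # compress: 5 features -> 2 hidden dims
    x_hat = torch.relu(h @ W.T + b)  # reconstruct with a tied-weight ReLU decoder

    loss = ((x - x_hat) ** 2).mean() # uniform feature importance for simplicity
    opt.zero_grad()
    loss.backward()
    opt.step()

# Plotting the rows of W over training gives animations like the ones below.
```

Dropping the ReLU from the decoder gives the corresponding linear network shown alongside each non-linear one.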

For more explanation of the following graphs, see here.

Sparsity 0.00: linear_network_sparsity_0.00_plots.gif / non_linear_network_sparsity_0.00_plots.gif

Sparsity 0.25: linear_network_sparsity_0.25_plots.gif / non_linear_network_sparsity_0.25_plots.gif

Sparsity 0.50: linear_network_sparsity_0.50_plots.gif / non_linear_network_sparsity_0.50_plots.gif

Sparsity 0.80: linear_network_sparsity_0.80_plots.gif / non_linear_network_sparsity_0.80_plots.gif

Sparsity 0.90: linear_network_sparsity_0.90_plots.gif / non_linear_network_sparsity_0.90_plots.gif

Sparsity 0.95: linear_network_sparsity_0.95_plots.gif / non_linear_network_sparsity_0.95_plots.gif
