DevInterp

A Python Library for Developmental Interpretability Research

DevInterp is a Python library for conducting research on developmental interpretability, a novel AI safety research agenda rooted in Singular Learning Theory (SLT). DevInterp proposes tools for detecting, locating, and ultimately controlling the development of structure over training.

Read more about developmental interpretability.

⚠️ This library is still in early development. Don't expect things to work on a first attempt. We are actively working on improving the library and adding new features. If you have any questions or suggestions, please feel free to open an issue or submit a pull request.

Installation

To install devinterp, simply run:

pip install devinterp

Requirements: Python 3.8 or higher.
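
If installation misbehaves, a quick environment check can help narrow things down. The snippet below is a minimal sketch using only the Python standard library: it checks the interpreter against the 3.8 requirement stated above and reports whether a package named devinterp can be found on the import path. The variable names are illustrative and not part of the library.

```python
import importlib.util
import sys

# Requirement stated in this README: Python 3.8 or higher.
python_ok = sys.version_info >= (3, 8)

# find_spec returns None when the package cannot be found on the import path.
devinterp_installed = importlib.util.find_spec("devinterp") is not None

print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_ok}")
print(f"devinterp importable: {devinterp_installed}")
```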

Getting Started

To see DevInterp in action, check out the example notebooks in the examples directory.

Minimal Example

from devinterp.slt import estimate_learning_coeff, estimate_learning_coeff_with_summary
from devinterp.optim import SGLD

# Assuming you have a PyTorch Module and DataLoader
learning_coeff = estimate_learning_coeff(model, trainloader, ...)

# If you want to see mean, std, and learning coeff estimate per chain
learning_coeff_summary = estimate_learning_coeff_with_summary(model, trainloader, ...)

Features

Estimate the learning coefficient.
- Supported optimizers:
  - SGLD
  - SGNHT
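
For intuition about what estimating a learning coefficient with SGLD involves, here is a self-contained toy sketch in plain Python. It is not DevInterp's implementation and does not use its API. It assumes one common form of the estimator, n·β·(E[L(w)] − L(w*)) with β = 1/log(n), where the expectation is taken over samples drawn near w* by Langevin dynamics with a localization term. The function name `estimate_llc` and all parameter names are hypothetical. Because the toy losses have exact gradients, this is plain Langevin dynamics rather than minibatch SGLD.

```python
import math
import random


def estimate_llc(loss, grad, w_star, n=1000, step_size=5e-4,
                 localization=1.0, num_steps=200_000, burn_in=20_000, seed=0):
    """Toy local learning coefficient estimate for a 1-D loss (illustrative only)."""
    rng = random.Random(seed)
    beta = 1.0 / math.log(n)          # inverse temperature, assumed 1 / log(n)
    noise_scale = math.sqrt(step_size)
    w = w_star                        # start the chain at the point of interest
    total_loss = 0.0
    kept = 0
    for t in range(num_steps):
        # Gradient of the tempered loss plus a pull back toward w_star.
        drift = n * beta * grad(w) + localization * (w - w_star)
        # Langevin update: half-step along the negative drift plus Gaussian noise.
        w = w - 0.5 * step_size * drift + noise_scale * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            total_loss += loss(w)
            kept += 1
    mean_loss = total_loss / kept
    return n * beta * (mean_loss - loss(w_star))


# A regular minimum (w^2) and a degenerate, singular minimum (w^4).
llc_quadratic = estimate_llc(lambda w: w ** 2, lambda w: 2 * w, w_star=0.0)
llc_quartic = estimate_llc(lambda w: w ** 4, lambda w: 4 * w ** 3, w_star=0.0)

print(f"quadratic loss: {llc_quadratic:.3f}")
print(f"quartic loss:   {llc_quartic:.3f}")
```

In theory the learning coefficient is 1/2 for w^2 and 1/4 for w^4, so the quartic estimate should come out lower than the quadratic one, up to sampling noise and step-size bias. The flatter, more degenerate minimum gets the smaller value.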

Contributing

See CONTRIBUTING.md for guidelines on how to contribute.
