Official implementation of FIND (NeurIPS '23) Function Interpretation Benchmark and Automated Interpretability Agents

multimodal-interpretability/FIND

FIND: Function Interpretation Dataset

A Function Interpretation Benchmark for Evaluating Interpretability Methods (official implementation)

Sarah Schwettmann*, Tamar Rott Shaham*, Joanna Materzynska, Neil Chowdhury, Shuang Li, Jacob Andreas, David Bau, Antonio Torralba.
* equal contribution

FIND overview

This repository is under active development; expect updates.

Setup

Clone this repository:

git clone https://github.com/multimodal-interpretability/FIND
cd FIND

Install dependencies:

pip install -r requirements.txt

Then download the FIND dataset into ./src/find_dataset and unzip it:

wget -P ./src/find_dataset/ https://zenodo.org/record/8039658/files/FIND-dataset.zip
unzip ./src/find_dataset/FIND-dataset.zip -d ./src/
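
For environments without wget or unzip, the two commands above can be approximated in Python with the standard library alone. This is a minimal sketch, not part of the repository: the helper names download, extract, and fetch_dataset are hypothetical, and the code assumes it is called from the repository root. Only the URL and the target directories are taken from the commands above.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the wget command above.
DATASET_URL = "https://zenodo.org/record/8039658/files/FIND-dataset.zip"


def download(url: str, dest_dir: Path) -> Path:
    """Download url into dest_dir (like `wget -P`) and return the local path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


def extract(archive: Path, out_dir: Path) -> list[str]:
    """Extract archive into out_dir (like `unzip -d`) and return member names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


def fetch_dataset(src_dir: Path = Path("./src")) -> list[str]:
    """Hypothetical convenience wrapper mirroring the two shell commands."""
    archive = download(DATASET_URL, src_dir / "find_dataset")
    return extract(archive, src_dir)
```

Calling fetch_dataset() from the repository root should leave the archive in ./src/find_dataset/ and its contents under ./src/, as the shell commands do.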

The repository already includes the dataset structure and examples of 5 functions per category under ./src/find_dataset/.
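
To check what is present under ./src/find_dataset/ after setup, a short script can count the entries in each subdirectory. This is only a sketch: the function name summarize is hypothetical, and it assumes that each immediate subdirectory corresponds to one function category, which this README does not state explicitly.

```python
from pathlib import Path


def summarize(dataset_dir: Path) -> dict[str, int]:
    """Map each immediate subdirectory name to the number of entries it holds.

    Assumption: one subdirectory per function category.
    """
    return {
        sub.name: sum(1 for _ in sub.iterdir())
        for sub in sorted(dataset_dir.iterdir())
        if sub.is_dir()
    }
```

For example, summarize(Path("./src/find_dataset")) returns a dictionary that can be printed to compare the bundled examples with the full download.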

Run interpretations

To run interpretations, run cd ./src/run_interpretations/ and follow the instructions in the README file there. The code also lets you add your own interpreter model.

You can also download the full FIND interpretations benchmark and unzip it to ./src/run_interpretations/:

wget -P ./src/run_interpretations https://data.csail.mit.edu/FIND/FIND-interpretations.zip
unzip ./src/run_interpretations/FIND-interpretations.zip -d ./src/run_interpretations/

See interpretation examples at ./src/notebooks/example_interpretations.ipynb

Evaluate interpretations

To evaluate the interpretations, run cd ./src/evaluate_interpretations/ and follow the instructions in the README file there.

Generate new functions

To generate a new set of numeric and/or string functions, run cd ./src/make_functions/ and follow the instructions in the README file there.
