
Monosemantic Search

We take the visualization interface from Anthropic's Towards Monosemanticity: Decomposing Language Models With Dictionary Learning and make it 80x+ faster.

Check it out here!


How does this work?

We first scrape all of the data from Anthropic's visualization. Then we index every token we want to search in Redis, which allows for extremely fast retrieval.
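The idea behind the token index can be sketched in plain Python. In this sketch a dict stands in for Redis, and the key scheme (`token:<token>`) and record fields are illustrative assumptions, not the repository's actual layout:

```python
# Sketch of the token index. A plain dict stands in for Redis here;
# the "token:<token>" key scheme and the record fields are assumptions
# for illustration, not the repository's actual layout.
import json

def build_index(activations):
    """Map each token to a JSON-encoded list of (feature, strength) hits."""
    index = {}
    for feature_id, token, strength in activations:
        key = f"token:{token}"
        index.setdefault(key, []).append({"feature": feature_id, "strength": strength})
    # Redis stores strings, so serialize each posting list once at index time.
    return {key: json.dumps(hits) for key, hits in index.items()}

def lookup(index, token):
    """A search is then a single O(1) key fetch plus one JSON decode."""
    return json.loads(index.get(f"token:{token}", "[]"))
```

With this layout, serving a query never touches the raw scraped files; the expensive work all happens once, at indexing time.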

We then built a backend in Python with Flask that retrieves data very quickly using the Redis indexes.
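A minimal version of such an endpoint might look like the following. The route name, query parameter, and in-memory dict standing in for Redis are all assumptions; the repository's actual app.py may be organized differently:

```python
# Minimal sketch of a Flask search endpoint backed by the token index.
# The /search route, the "q" parameter, and the dict standing in for a
# redis-py client are assumptions; the real app.py may differ.
from flask import Flask, jsonify, request

app = Flask(__name__)

# In production this would be a Redis client; a dict keeps the sketch runnable.
TOKEN_INDEX = {
    "token:cat": [{"feature": 1, "strength": 0.9}],
}

@app.route("/search")
def search():
    token = request.args.get("q", "")
    return jsonify(TOKEN_INDEX.get(f"token:{token}", []))
```

Because each query resolves to a single key fetch, the endpoint's latency is dominated by network round-trips rather than any scan over the data.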

Finally, all of this is presented with our Next.js frontend.

The biggest optimization here is the Redis indexing, which makes our search many times faster than the current search on Anthropic's visualization page.
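Why an index wins over a linear scan can be illustrated with sorted tokens and binary search; this is a generic sketch of the principle, not the repository's actual query path:

```python
# Why an index beats a linear scan: with tokens kept sorted, a prefix
# query is two binary searches (O(log n)) instead of a full O(n) pass.
# This illustrates the principle only; it is not the repo's query path.
import bisect

def prefix_search(sorted_tokens, prefix):
    """Return every token starting with `prefix`, via two bisects."""
    lo = bisect.bisect_left(sorted_tokens, prefix)
    # "\uffff" sorts after any character that can follow the prefix.
    hi = bisect.bisect_left(sorted_tokens, prefix + "\uffff")
    return sorted_tokens[lo:hi]
```

Redis applies the same idea with its own data structures, so lookup cost stays flat even as the token set grows.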

The only downside is that holding all of the data in memory is expensive: the Redis DB is about 3.6 GB in size.

Dev & Production

First, get all the data. This is done from the scraper folder. The data has already been scraped, so you can use it directly as well.

To scrape all of the data, run:

```
python3 main.py
```

Then we must index all of the data in Redis. Make sure you have a Redis instance that can handle 4 GB of data.

From the server folder:

```
pip3 install -r requirements.txt
```

then index the data:

```
python3 indexing.py
```

then run the Flask server, app.py.

Finally, to run the frontend, go to the frontend folder:

```
npm install
```

then build the web app:

```
npm run build
```

and start it:

```
npm run start
```

Acknowledgement

This work was created by Mustafa Aljadery & Siddharth Sharma.
