Latent Drafting

This repository attempts to simplify a workflow for exploring latent spaces. We draft maps of embeddings via a combination of node scripts, python notebooks and Observable notebooks.

Embed any csv

If you have a csv file where you'd like to embed the text in a given column, you can do so with this command:

node embed-csv.mjs <filename> <text column> <id column>

If there is no id column it will use the row # in the input csv as the id. It expects the csv file to be in the data/ directory.

Reddit embeddings

You can download up to 1000 top posts for any subreddit via the reddit API. We then embed the title and selftext for all the posts.

# download the top 1000 posts for DataIsBeautiful
node reddit-posts.js dataisbeautiful fetch 1000

#embed the posts
node reddit-embeddings.js dataisbeautiful

Then we can dimensionality reduce it with UMAP and cluster with HDBSCAN

see jupyter/Reddit umap.ipynb

Then we can also generate labels for our clusters:

node cluster-labels.mjs dataisbeautiful

The results of this process aren't committed to the repo but can be explored here: https://observablehq.com/@codingwithfire/dataisbeautiful-top-posts

Simonw Blog entries

Simon Willison publishes a popular blog on technology, and he also published embeddings of his almost 3000 blog posts.

The process is done in jupyter/simonw umap.ipynb

The map can be seen here: https://observablehq.com/@codingwithfire/simonw-blog-entry-map

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
jupyter		jupyter
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
cluster-labels.mjs		cluster-labels.mjs
embed-csv.mjs		embed-csv.mjs
embed-parquet.mjs		embed-parquet.mjs
embeddings.mjs		embeddings.mjs
package-lock.json		package-lock.json
package.json		package.json
reddit-embed.mjs		reddit-embed.mjs
reddit-posts.js		reddit-posts.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

jupyter

jupyter

.env.example

.env.example

.gitignore

.gitignore

README.md

README.md

cluster-labels.mjs

cluster-labels.mjs

embed-csv.mjs

embed-csv.mjs

embed-parquet.mjs

embed-parquet.mjs

embeddings.mjs

embeddings.mjs

package-lock.json

package-lock.json

package.json

package.json

reddit-embed.mjs

reddit-embed.mjs

reddit-posts.js

reddit-posts.js

Repository files navigation

Latent Drafting

Embed any csv

Reddit embeddings

Simonw Blog entries

About

Releases

Packages

Contributors 2

Languages

enjalot/latent-drafting

Folders and files

Latest commit

History

Repository files navigation

Latent Drafting

Embed any csv

Reddit embeddings

Simonw Blog entries

About

Resources

Stars

Watchers

Forks

Languages