johnowhitaker/tulipmap

Tulipmap

This is a quick afternoon data exploration project by me, Johno. All code is by GPT 5.4. If you choose to replicate this, please don't hammer iNaturalist's API too hard. Below is an (AI-generated) overview of the project and how to get it running. Feel free to target a different taxon rather than repeating this for tulips; ask an agent of your choice to make the change.

What This Does

This repo:

  • fetches tulip observation metadata and image URLs from iNaturalist
  • downloads image files locally if you want them
  • embeds image URLs with a CLIP model on Replicate
  • lets you search the embedding space in a small review app
  • builds a filtered subset from a CLIP query and threshold
  • projects that subset with UMAP and shows it in a local viewer

The current working filtered set is based on:

  • query: closeup photo of a tulip flower, filling the frame
  • threshold: 0.22
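
As a rough illustration of what a query-plus-threshold filter can look like, here is a minimal sketch. It assumes the threshold is applied to the cosine similarity between the text query embedding and each image embedding, which this README does not spell out. The function names and the dict-of-vectors input are hypothetical, not taken from the repo's scripts.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_query(image_embeddings, query_embedding, threshold=0.22):
    """Return (item_id, score) pairs scoring at or above threshold, best first.

    image_embeddings: hypothetical dict mapping an item id to its embedding vector.
    """
    scored = [
        (item_id, cosine_similarity(vector, query_embedding))
        for item_id, vector in image_embeddings.items()
    ]
    kept = [(item_id, score) for item_id, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept
```

Raising the threshold keeps fewer, more on-query images; lowering it keeps more but lets in looser matches.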

Requirements

  • Python 3.12-ish
  • a Replicate API key for the CLIP embedding steps
  • gh (the GitHub CLI), only if you want to publish the repo somewhere

Put your Replicate key in a .env file in the repo root like this:

REPLICATE_API_TOKEN=...
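
For reference, a script can pick the key up from that file with a few lines of standard-library Python. This is a minimal sketch, not the repo's actual loading code (which may well use a helper library instead); `load_env_file` and `get_replicate_token` are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_replicate_token(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("REPLICATE_API_TOKEN") or load_env_file(path).get(
        "REPLICATE_API_TOKEN"
    )
    if not token:
        raise RuntimeError("REPLICATE_API_TOKEN is not set")
    return token
```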

Install dependencies:

uv pip install -r requirements.txt

Fastest Way To Reproduce

1. Fetch metadata

python3 scripts/fetch_inat_metadata.py \
  --limit 10000 \
  --output data/metadata/tulips_medium_10k.csv
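
To give a feel for what a fetch step like this involves, here is a minimal sketch of paging through iNaturalist's public `/v1/observations` endpoint with a pause between requests. It is not the repo's script: the row field names, the one-second delay, and the trick of swapping `square` for `medium` in the photo URL are assumptions made for illustration. Changing `taxon_name` is also where a different taxon would plug in.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://api.inaturalist.org/v1/observations"


def build_page_url(taxon_name, page, per_page=200):
    """Build the URL for one page of observations that have photos."""
    params = {
        "taxon_name": taxon_name,
        "photos": "true",
        "per_page": per_page,
        "page": page,
        "order_by": "id",
        "order": "asc",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def rows_from_response(payload):
    """Flatten one API response into one row per photo (hypothetical row layout)."""
    rows = []
    for obs in payload.get("results", []):
        for photo in obs.get("photos", []):
            url = photo.get("url")
            if not url:
                continue
            rows.append(
                {
                    "observation_id": obs["id"],
                    "photo_id": photo["id"],
                    # Assumption: the API returns a square thumbnail URL and the
                    # medium-size image lives at the same path.
                    "image_url": url.replace("/square.", "/medium."),
                }
            )
    return rows


def fetch_rows(taxon_name="Tulipa", limit=10000, delay_seconds=1.0):
    """Page through results politely until limit rows are collected."""
    rows, page = [], 1
    while len(rows) < limit:
        with urllib.request.urlopen(build_page_url(taxon_name, page), timeout=30) as resp:
            payload = json.load(resp)
        if not payload.get("results"):
            break
        rows.extend(rows_from_response(payload))
        page += 1
        time.sleep(delay_seconds)  # be gentle with the API
    return rows[:limit]
```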

2. Download images

python3 scripts/download_inat_images.py \
  --input data/metadata/tulips_medium_10k.csv \
  --output-dir data/images/medium_10k \
  --limit 10000
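
A download step along these lines mostly needs to skip files it already has and pause between requests. The sketch below shows that shape under stated assumptions: the `photo_id` and `image_url` row keys, the `.jpg` extension, and the half-second delay are all hypothetical rather than read from the repo's script.

```python
import time
import urllib.request
from pathlib import Path


def fetch_bytes(url):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def download_images(rows, output_dir, fetch=fetch_bytes, delay_seconds=0.5):
    """Save each row's image once, skipping files that already exist.

    rows: iterable of dicts with hypothetical 'photo_id' and 'image_url' keys.
    Returns the list of newly written paths.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for row in rows:
        target = out / f"{row['photo_id']}.jpg"
        if target.exists():
            continue  # makes reruns cheap and avoids repeat requests
        target.write_bytes(fetch(row["image_url"]))
        saved.append(target)
        time.sleep(delay_seconds)
    return saved
```

Because existing files are skipped, an interrupted run can be restarted without fetching the same images again.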

3. Build CLIP embeddings

This step uses Replicate and requires REPLICATE_API_TOKEN.

python3 scripts/embed_inat_clip.py --dataset main

4. Review/search the embeddings

python3 review_app/app.py

Open http://127.0.0.1:5000.

5. Build the filtered subset

python3 scripts/build_filtered_dataset.py

6. Build the UMAP/clustering/color bundle

python3 scripts/build_projection_bundle.py

7. View the projection

python3 projection_app/app.py

Open http://127.0.0.1:5001.

Main Files

  • scripts/fetch_inat_metadata.py: query iNaturalist and write a CSV manifest
  • scripts/download_inat_images.py: download images locally
  • scripts/embed_inat_clip.py: build CLIP embeddings through Replicate
  • scripts/build_filtered_dataset.py: create a saved filtered subset from a CLIP query
  • scripts/build_projection_bundle.py: compute KMeans clusters, UMAP coordinates, and center-crop average colors
  • review_app/: text-query CLIP review/search UI
  • projection_app/: UMAP projection viewer
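
Of the three things the projection bundle computes, the center-crop average color is the simplest to illustrate. The sketch below averages the RGB values inside a centered rectangle of an image given as nested lists of pixels. The 0.5 crop fraction and the function name are assumptions; the repo's script may crop differently, and in practice an image library would supply the pixel data.

```python
def center_crop_average_color(pixels, crop_fraction=0.5):
    """Average RGB over a centered crop.

    pixels: list of rows, each a list of (r, g, b) tuples.
    crop_fraction: hypothetical share of each dimension kept around the center.
    """
    height = len(pixels)
    width = len(pixels[0])
    crop_h = max(1, int(height * crop_fraction))
    crop_w = max(1, int(width * crop_fraction))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2

    totals = [0, 0, 0]
    for row in pixels[top:top + crop_h]:
        for r, g, b in row[left:left + crop_w]:
            totals[0] += r
            totals[1] += g
            totals[2] += b

    count = crop_h * crop_w
    return tuple(round(total / count) for total in totals)
```

Cropping to the center before averaging is a way to bias the color toward the flower rather than the background, on the assumption that the subject sits mid-frame.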

Notes

  • Generated data, downloaded images, embeddings, filtered subsets, projection bundles, and .env are all ignored by git.
  • If you want to adapt this to a different taxon, the main place to start is the iNaturalist fetch step and the CLIP filter query.
