Skip to content

stephenpadgett1/data-centers

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

US Data Centers Map

An interactive dark map of US data centers — operational, under construction, and planned — with editorial judgment calls on each facility (purpose-built vs speculative, AI vs general compute, operator type).

Live data is pulled from OpenStreetMap and a curated layer of headline announced AI megacampuses, classified by Claude, and published as a static JSON the site reads at runtime. The whole thing is refreshed daily.

How it works

OSM Overpass API ──fetch.py──▶ data/facilities.raw.json ─┐
news + trade RSS ──discover.py─▶ candidates ─(Claude)─┐  │
data/curated.json (curated + discovered campuses) ────┼──┤
data/classifications.json (editorial cache) ──────────┼──┼─build.py─▶ site/public/data/data-centers.json
pipeline/operators.json (operator lookup) ────────────┘  ┘                + build-meta.json
                                                                          + data/unclassified.json (worklist)

Data sources for planned facilities (the hardest part — OSM barely maps them):

  • pipeline/discover.py harvests Google News search feeds + trade press (DataCenterDynamics, DataCenterKnowledge, Bisnow, Data Center POST), filters to likely new US announcements, and writes a candidate worklist.
  • During the daily refresh, Claude reads the candidates, keeps the genuine new projects, geocodes them (pipeline/geocode.py, OSM Nominatim), and adds them to data/curated.json — typically status: planned, purpose: speculative.
  • Feed URLs live in pipeline/sources.json (brittle by nature — re-verify if a feed goes quiet).

Power-generation layer (EIA Form 860M): a toggleable second layer of ~6,200 US power plants (all planned/under-construction + operating ≥25 MW, ~96% of capacity), colored by fuel and sized by megawatts, to show the grid behind the buildout. pipeline/fetch_power.py ingests it → site/public/data/power-plants.json. This is a separate monthly step (EIA updates monthly; the workbook is ~14 MB) and is the only part that needs a dependency (openpyxl) — the daily refresh stays stdlib-only.

Insights & geographic aggregation: an "Insights" dashboard ranks states by data-center count vs. generation GW (supply vs. demand), with type/fuel/workload breakdowns, and an optional state choropleth shades the map by either metric. build.py backfills each facility's state via point-in-polygon against pipeline/us-states.geojson (OSM only populates ~half) and emits site/public/data/us-states.geojson for the choropleth.

  • No backend, no database. The "data store" is site/public/data/data-centers.json, committed to the repo.
  • Front-end: Vite + TypeScript + MapLibre GL JS, OpenFreeMap dark basemap (no API key).
  • Hosting: GitHub Pages, served from the gh-pages branch. scripts/deploy.sh builds the site and publishes it there — fully self-contained, no CI required. (docs/deploy-actions.yml.example is an alternative GitHub Actions workflow if you prefer CI deploys; it needs a token with workflow scope.)

Local development

# 1. Pull + build the data (Python stdlib only, no pip install)
python3 pipeline/fetch.py        # pull OSM Overpass -> data/facilities.raw.json
python3 pipeline/build.py        # merge -> site/public/data/data-centers.json
python3 pipeline/validate.py     # sanity-check the output

# 2. (Optional, monthly) refresh the power-generation layer
pip install -r pipeline/requirements.txt
python3 pipeline/fetch_power.py  # EIA-860M -> site/public/data/power-plants.json

# 3. Run the site
cd site
npm install
npm run dev                      # http://localhost:5173

Daily refresh

The refresh runs through Claude Code (see scripts/refresh.md for the exact checklist). It pulls fresh OSM data, classifies any newly-seen facilities, re-checks the curated megacampuses, rebuilds the JSON, commits the data, and republishes the site.

A launchd job (scripts/refresh-cron.sh) runs this daily on the local machine:

./scripts/refresh-cron.sh        # one full refresh + deploy cycle

Manual deploy any time:

./scripts/deploy.sh              # build + publish to gh-pages

Data & attribution

  • Facility locations: © OpenStreetMap contributors, ODbL.
  • Basemap: OpenFreeMap © OpenMapTiles.
  • Status, capacity, and classifications are editorial estimates generated from public sources; treat them as informed approximations, not authoritative records.

About

Interactive dark map of US data centers — operational, under construction, and planned — with editorial judgment calls. Daily-refreshed from OpenStreetMap + curated AI megacampuses.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors