Interactive web map of Facebook's Social Connectedness Index (SCI) — the relative likelihood that a Facebook user in one place has a friendship link to a user in another place. Click any country, EU region (NUTS2), US state, or worldwide admin-2 region; the map recolors to show how strongly that location connects to everywhere else.
Canonical deploy: sci-map.michaelbailey.org (also embedded at michaelbailey.org/sci-map/).
Data source: Humanitarian Data Exchange — Social Connectedness Index (Johnston, Kuchler, Kulkarni, Stroebel — 2026 vintage). The SCI was introduced by Bailey, Cao, Kuchler, Stroebel, Wong (JEP 2018).
Map design adapted from DeepMoiré.
- Pick how you want to use it — wayfinder
- What the user sees
- Architecture & data flow
- Run it locally — no accounts, no DNS
- Embed the canonical deploy on your own site
- Fork & deploy your own production instance — full setup walkthrough
- Rebuild the GADM2 dataset (ETL)
- Costs & limits
- Contributing
- License
Choose the section that matches what you want to do — each gets more involved than the last:
| Goal | Sections to read | What you need beforehand |
|---|---|---|
| Just experiment / read the code / contribute a PR | Run it locally | Git + Python. No accounts. |
| Embed the map on a site you already have | Embed the canonical deploy | A page where you can put one <iframe> tag. |
| Host your own copy on a custom domain | Fork & deploy | A Mapbox account, a Cloudflare account, and a domain (~30 min). |
Four tabs (top-left), one map (full viewport), one info panel (right):
| Tab | Geography | Source data | Where it lives |
|---|---|---|---|
| Countries | World, country-level | HDX country.csv |
Bundled in repo, ~565 KB CSV + 2.2 MB GeoJSON |
| EU regions | Eurostat NUTS-2024 admin-2 (~300 polys) | HDX nuts_2024.zip |
Bundled, ~1.8 MB CSV + 240 KB GeoJSON |
| US states | GADM 4.1 USA admin-1 (50 + DC + PR) | HDX gadm1.csv (filtered) |
Bundled, ~680 KB CSV + 270 KB GeoJSON |
| World regions | GADM 4.1 worldwide admin-2 (47k polys) | HDX gadm2.zip (12 shards, ~44.7 GB raw) |
Cloudflare R2 — too big to ship in the repo |
Per click the page repaints the choropleth from a single source's row, surfaces a top-10 most-connected list, and renders a likelihood legend.
Static HTML/CSS/JS. No build step, no backend, no framework. Everything is browser-side.
┌─────────────────────────────┐
User opens sci-map.example.org │ index.html │
──────────────────────────────▶ │ loads css/style.css │
│ + mapbox-gl-js (CDN) │
│ + js/config.js │ ← edit me to fork
│ + js/main.js │
└──────────────┬──────────────┘
│
┌──────────────────────────┼──────────────────────────┐
▼ ▼ ▼
Mapbox tiles Bundled data/*.csv Cloudflare R2
(basemap) (levels 0-2, ~5 MB) sci-data bucket:
• gadm2_world_v5.geojson (17 MB,
loaded once when level3 active)
• gadm2_v2/_meta.json (legend
thresholds, loaded once)
• gadm2_v2/<GID_2>.json.gz
(~190 KB, fetched per click
on World regions)
The full GADM2 SCI pairs table is ~900 million rows and ~40 GB raw. Even the bin-encoded per-source files together come to ~6.3 GB across 37,160 files — too big for GitHub Pages, and we want a CDN with free egress (R2's killer feature relative to S3) since each click is a fresh fetch.
Each gadm2_v2/<GID_2>.json.gz decodes to:
{
"ref": 4359,
"a": { "FRIEND_GID_2": 0..7, ... }, // bin index under "Custom" thresholds
"b": { "FRIEND_GID_2": 0..7, ... }, // bin index under "Even" thresholds
"top": [
{ "g": "USA.45.13_1", "s": 1040824, "a": 7, "b": 7 },
...
]
}ref— this source's 20th-percentilescaled_sci. Used as the "1×" reference for bin labels.a— bin assigned under fixed multipliers ofref:[1, 5, 10, 25, 50, 100, 250]×.b— bin assigned under global-quantile cuts on theratio > 1distribution:[1, 5, 12, 23, 47, 119, 528]×. Currently shipped but the UI toggle is hidden — Method A is the visible default.top— top 20 friends ranked by RAWscaled_sci. The page renders the first 10; names + countries are joined client-side from the boundary GeoJSON viaGID_2.
R2 serves objects exactly as uploaded. If you upload a gzipped file but only set Content-Type: application/json, the browser receives raw gzip bytes — .json() fails with SyntaxError: Unexpected token. The fix is to upload with both Content-Type: application/json and Content-Encoding: gzip. Then Cloudflare advertises gzip transfer encoding and every browser auto-decompresses transparently. The ETL section below uses these flags on every rclone invocation; if you skip them, the page will silently fail to repaint on World-region clicks.
The fastest way to look at the code, click around, or develop a PR. Nothing to sign up for.
git clone https://github.com/mikebailey/sci-world-map.git
cd sci-world-map
python3 -m http.server 4000
# open http://localhost:4000That's it. The page fetches the basemap tiles from Mapbox using the bundled token (which is URL-restricted to a small allowlist that includes http://localhost:4000, so it works) and the GADM2 data from the canonical Cloudflare R2 bucket. Both are public read-only resources — no auth needed on your side.
What you can do without further setup:
- Browse + click any country, EU region, US state, or world admin-2 region.
- Edit
index.html,css/style.css,js/main.js, etc. and hit refresh. - Open
js/config.jsand flipDISABLE_BASEMAP: trueto preview the no-basemap fallback look. - Submit PRs.
What you can't do without further setup:
- Modify the GADM2 dataset on R2 (read-only access — see ETL if you want to rebuild from scratch on your own bucket).
- Deploy your changes to a public site (you'd need your own Mapbox token + R2 — see Fork & deploy).
If you want to use your own Mapbox token and R2 bucket while still serving from localhost, do steps 1 and 2 of the Fork section below (Mapbox token + R2 bucket), edit js/config.js, and run the same http.server command. No DNS, no GitHub Pages.
The cheapest way to ship this on a site you control — no accounts, no servers, no quotas to manage:
<iframe
src="https://sci-map.michaelbailey.org"
style="border:0; width:100%; height:100vh;"
loading="lazy"
allowfullscreen
title="Social Connectedness Index">
</iframe>The iframe makes its own resource fetches as sci-map.michaelbailey.org, so the embedding host's domain doesn't need to be on either the Mapbox URL allowlist or the R2 CORS allowlist. Tile and R2 usage count against the canonical deploy's quotas, not yours.
If you'd rather copy the static files into your own site instead of iframing (so you can style around them, or so your visitors don't hit a third-party domain), see Fork & deploy — that is the same procedure, just minus the GitHub Pages part.
This walkthrough is for hosting your own copy on its own domain, with its own Mapbox + R2 quotas. Roughly 30 minutes end to end. Written so an LLM coding agent can follow each step verbatim without further prompts.
- A GitHub account with
ghCLI authenticated (gh auth statusreturns OK). - A Mapbox account (free tier; no credit card needed for the free 50k map loads/month).
- A Cloudflare account (free tier; no credit card needed unless you exceed R2's free 10 GB).
rcloneinstalled locally (brew install rclone,apt install rclone, or rclone.org/install).gitand a recent Python 3 (3.9+).
- Go to account.mapbox.com and create an account if you don't have one.
- Click Create a token.
- Name:
sci-map-public(anything — this name only appears in your dashboard). - Public scopes: check ONLY these three. Uncheck everything else.
styles:readstyles:tilesfonts:read
- Secret scopes: leave everything unchecked.
- URL restrictions: add the URLs your deploy will be served from. Examples:
http://localhost:4000(for local dev)http://127.0.0.1:4000(some browsers prefer this form)https://your-deploy-domain.example.com(your canonical deploy)- Any other site that will embed this as a script tag, not an iframe. (Iframed deploys only need the iframe URL allowlisted, not the parent page's URL.)
- Click Create token.
- Copy the
pk....string. Paste it intojs/config.js:MAPBOX_TOKEN: "pk.YOURTOKENHERE",
The token is committed to git and shipped to every visiting browser. That is OK because (a) the scopes are read-only and (b) the URL restrictions mean anyone who copies it can't use it from a domain you don't control.
- Sign in at dash.cloudflare.com. If you don't have an account, create one (no card needed).
- In the left nav, click R2. If this is your first time, click Enable R2 and follow the prompts (it'll ask for a payment method to verify your identity but won't charge anything under the 10 GB / 10M ops/month free tier).
- Click Create bucket.
- Name:
sci-data(or anything; remember it). - Location: leave on Automatic.
- Name:
- Open the bucket → Settings tab.
- Under Public access, click Allow access → confirm. This exposes the bucket at a URL like
https://pub-XXXXXXXXXXXX.r2.dev/. Copy that URL. - Paste the URL into
js/config.js:R2_BASE: "https://pub-XXXXXXXXXXXX.r2.dev",
R2's CORS allowlist gates browser fetches from JavaScript. Without CORS, the page can fetch the boundary file and meta but per-click region fetches will be blocked.
-
In the bucket Settings → CORS Policy, click Add CORS policy.
-
Paste this (edit the
AllowedOriginslist to your own domains):[ { "AllowedOrigins": [ "http://localhost:4000", "http://127.0.0.1:4000", "https://your-deploy-domain.example.com" ], "AllowedMethods": ["GET", "HEAD"], "AllowedHeaders": ["*"], "ExposeHeaders": ["ETag", "Content-Length"], "MaxAgeSeconds": 3600 } ] -
Save.
-
In the R2 dashboard → Manage R2 API Tokens (top right).
-
Click Create Account API token (NOT a User-level token — Account tokens are what work with
rclone). -
Permissions: Object Read & Write.
-
Specify bucket: select your bucket (e.g.
sci-data) — single-bucket scoping is fine. -
TTL: "Forever" is OK; pick a calendar date if you prefer to rotate.
-
Click Create.
-
The dashboard now shows: an Access Key ID, a Secret Access Key, and an Endpoint URL. Copy all three.
-
Configure
rclone:mkdir -p ~/.config/rclone cat > ~/.config/rclone/rclone.conf <<EOF [r2] type = s3 provider = Cloudflare access_key_id = <PASTE ACCESS KEY ID> secret_access_key = <PASTE SECRET ACCESS KEY> endpoint = <PASTE ENDPOINT URL> region = auto acl = private EOF
-
Sanity check:
rclone ls r2:sci-data --s3-no-check-bucket # (empty bucket → no output; non-empty → list of objects. No error = good.)The
--s3-no-check-bucketflag is required with Cloudflare's single-bucket-scoped tokens — without it, rclone tries toListBucketsfirst, which the token isn't allowed to do, and the command fails 403.
You have two choices.
Option A (easiest, no ETL): mirror Mike's canonical files. Roughly 6.5 GB of one-time download + 6.5 GB of upload to your R2.
# These URLs are public, no auth needed.
PUB=https://pub-5433ddd592ff4ca4829ed8c8b77d58d6.r2.dev
# Boundary
curl -fLo gadm2_world_v5.geojson "$PUB/gadm2_world_v5.geojson"
rclone copyto gadm2_world_v5.geojson r2:sci-data/gadm2_world_v5.geojson \
--s3-no-check-bucket --header-upload "Content-Type: application/json"
# Meta + 37,160 per-source SCI JSONs (you'll want to script this with rclone
# sync from a temporary mirror; see etl/README.md for the upload patterns)This is fine if you trust the canonical version; it gives you an identical instance under your own quotas. The downside is no auditability — you have no idea how the bins were chosen without re-running the ETL.
Option B (auditable): rebuild from scratch. See Rebuild the GADM2 dataset below. ~1 hour wall clock on a recent laptop. End-to-end reproducible from HDX.
python3 -m http.server 4000
# open http://localhost:4000 — the World regions tab should now hit YOUR R2If the page loads but clicks on World regions silently do nothing, open DevTools → Network and look for failed R2 fetches. The most common causes:
- Per-source files served without
Content-Encoding: gzip→SyntaxError: Unexpected token. Re-upload with both headers. - CORS denied → your deploy origin isn't in the R2 bucket's CORS allowlist.
- 404 → file isn't actually there (or the path in
R2_GADM2_PATHdoesn't match where you uploaded).
The page is a static site. Any static host works (GitHub Pages, Cloudflare Pages, Netlify, S3+CloudFront, ...).
GitHub Pages with custom domain (the canonical setup):
gh repo create your-org/sci-map --public --source=. --remote=origin --push
# Repo Settings → Pages → Source = "Deploy from a branch", Branch = main / root
# Custom domain: enter sci-map.yourdomain.com → "Save"
# Add a CNAME DNS record at your registrar:
# sci-map.yourdomain.com → your-org.github.io
# Wait ~1 min for DNS + Pages to settle, then visit https://sci-map.yourdomain.comThe repo already includes a .nojekyll (so Pages doesn't try to Jekyll-process the JSON files) and a CNAME (which you'll edit to your domain). Don't forget to:
- Add your final
https://sci-map.yourdomain.comto both the Mapbox URL allowlist and the R2 CORS allowlist before going live.
End-to-end pipeline that takes you from HDX downloads to the 37,160 R2 objects that drive the World-regions tab. See etl/README.md for the line-by-line script. Summary:
-
Download the SCI shards. HDX dataset
social-connectedness-index, resourcegadm2.zip→ redirects to Google Drive file id1M3XTjZG_bgzGkEZ1tJgZ6qLcuPJU5Ck4. 4.98 GB zip → 12 country-sharded CSVs, ~44.7 GB raw.pip3 install --user gdown gdown 1M3XTjZG_bgzGkEZ1tJgZ6qLcuPJU5Ck4 -O gadm2.zip unzip gadm2.zip -d shards/ -x "__MACOSX/*" -
Download the GADM 4.1 worldwide GeoPackage (for the boundary file).
curl -fLo gadm_410-gpkg.zip https://geodata.ucdavis.edu/gadm/gadm4.1/gadm_410-gpkg.zip unzip gadm_410-gpkg.zip
-
Build the boundary (
gadm2_world_v5.geojson). Streams L2 features viaogr2ogr, dissolves duplicates onGID_2, simplifies and topology-cleans with mapshaper. Seeetl/README.mdfor the exact incantation. Output: ~17 MB GeoJSON, 47,211 polygons. -
Run the SCI ETL.
pip3 install --user duckdb orjson # Drop the canonical bin thresholds in place so the script reuses them # (re-deriving Method B's global quantiles from a 2M-row sample is # deterministic but takes ~1 extra minute): mkdir -p out_v2 cp etl/_meta.json out_v2/_meta.json python3 -u etl/etl_v3.py 2>&1 | tee etl.log
~20 min wall clock. Peak RAM ~6 GB. Disk spill peaks ~35 GB. Output:
out_v2/with 37,160 gzipped per-source JSONs. -
Upload to R2. Both headers are required:
# Per-source JSONs + meta rclone sync out_v2/ r2:sci-data/gadm2_v2/ \ --s3-no-check-bucket \ --header-upload "Content-Type: application/json" \ --header-upload "Content-Encoding: gzip" \ --transfers 32 --checkers 32 # Boundary file (no Content-Encoding because the .geojson is plain text) rclone copyto gadm2_world_v5.geojson r2:sci-data/gadm2_world_v5.geojson \ --s3-no-check-bucket \ --header-upload "Content-Type: application/json"
After this, the World regions tab on your deploy will paint from your own R2 bucket.
Both Mapbox and Cloudflare R2 charge per usage with generous free tiers. R2 is essentially free at any plausible scale because R2 charges zero for egress (S3's biggest pain point). Mapbox is the only material cost.
| Resource | Free per month | Above-tier price |
|---|---|---|
| Mapbox map loads | 50,000 | $5/k (next 50k), $4/k (next 100k), $3/k (next 800k), $2.50/k (1M–5M) |
| Mapbox vector tiles | 200,000 | $0.25/k (next 1.8M), $0.20/k (next 2M), $0.15/k (4M+) |
| R2 storage | 10 GB | $0.015/GB/mo |
| R2 Class B (reads) | 10M ops | $0.36/M ops |
| R2 Class A (writes) | 1M ops | $4.50/M ops |
| R2 egress | unlimited | (no charge ever) |
Per-user-per-month assumptions used in the table below:
- 1 page-load → 1 Mapbox map load
- ~40 Mapbox vector-tile requests (modest pan/zoom on the light-v11 basemap)
- ~7 R2 GET requests (boundary + meta + ~5 region clicks)
| Users/mo | Mapbox cost | R2 cost | Total/mo |
|---|---|---|---|
| 1,000 | $0 | $0 | $0 |
| 10,000 | $50 | $0 | $50 |
| 50,000 | ~$450 | $0 | ~$450 |
| 100,000 | ~$1,100 | $0 | ~$1,100 |
| 500,000 | ~$4,800 | $0 | ~$4,800 |
| 1,000,000 | ~$9,300 | $0 | ~$9,300 |
Tile requests dominate the Mapbox bill — at 100k users/mo, ~77% of the Mapbox cost is tiles, not loads. The 40-tiles-per-user assumption is the squishiest number; lighter usage cuts it in half, heavier (lots of pan/zoom across continents) roughly doubles it.
Mapbox does not expose configurable spend alerts or a usage API. The only programmatic signal you get is an automatic email when any service first crosses its free tier in a billing period (50k map loads or 200k vector tiles). The email is on by default — you don't set it up — but the warning lands at 100% of free tier, not at a buffer you've chosen. If you want earlier warning, check the Statistics page in the Mapbox dashboard manually (it lags ~24 h). The "Killing the basemap" section below explains what to do when the email arrives.
The escape hatch is to drop the Mapbox basemap and render just the boundary polygons on a plain background. Cosmetically worse but Mapbox cost goes to ~$0 at any scale (no tile requests = no tile bill; no map load = no map-load bill). Three independent triggers can flip the page into no-basemap mode:
- Manual config kill-switch. Set
DISABLE_BASEMAP: trueinjs/config.jsand redeploy. Slow path (push + Pages build) but it's the most durable — useful for permanent toggles or for testing the fallback look locally. - Reactive client fallback (already wired). If a user's browser hits a Mapbox 401/403/429 mid-session,
main.jsflips the no-basemap session flag and reloads the tab into the fallback view. Covers individual users immediately, no infrastructure needed. - Runtime feature flag on R2.
index.htmlfetches<R2_BASE>/feature-flags.jsonon every page load (beforemain.jsruns). If that file saysdisable_basemap: true, the page initializes in no-basemap mode without ever asking Mapbox for tiles. Updates propagate to all users within ~60s (the R2 cache TTL).
The runtime flag is written by you, not by an automated poller. Mapbox does not expose a usage API and does not offer configurable spend alerts. The only programmatic signal they emit is the automatic "free tier exceeded" email sent to the account owner the first time any service crosses its free tier in a billing period (50k map loads or 200k vector tiles). That email is your trigger.
Workflow:
-
You get the automatic "free tier exceeded" email from Mapbox. Nothing to set up — it's already on. The email is the latest possible warning; from this point every additional load is billable, so you have to decide quickly whether to pay through the rest of the month or kill the basemap.
-
If you want to stop the meter, run from this repo:
bin/disable-basemap.sh
That uploads a
feature-flags.jsonto your R2 bucket withdisable_basemap: true. Every page load from then on serves the no-basemap fallback. Optional argument is a reason string that gets stored in the flag for auditability:bin/disable-basemap.sh "manual:free_tier_hit_2026-06". -
On the 1st of next month, run
bin/enable-basemap.shto clear the flag.
Both scripts use rclone against your [r2] remote and overwrite the flag in place. R2's cache header is set to 60 s, so the change reaches users within a minute.
If bin/ isn't handy, you can also edit feature-flags.json by hand in the Cloudflare R2 dashboard → bucket → click the file → Edit metadata + content. The page reads it identically.
A few things that would give you earlier warning if you want more headroom than Mapbox's free-tier email provides:
- Check the Mapbox Statistics dashboard manually every few days; it lags ~24h but shows the trend before you hit free tier.
- The in-browser reactive fallback (layer #1, above) still catches the first user who actually hits a 429 — they're flipped into no-basemap mode within their own session. So even with no monitoring at all, the failure mode degrades to "one user sees a brief blank screen during the reload" rather than a broken canvas for everyone.
Contributions welcome, especially:
- New region layers (e.g. UK NUTS3, Canadian census divisions, world admin-3).
- Performance improvements on the World-regions tab (currently the bin maps are dense — there's an obvious 50% size cut by packing both bins into a single byte).
- Better top-10 disambiguation (we already drop empty/
"?"names and append country, but heuristics for very-common toponyms could be sharper). - A "deep link" mechanism so a click can be captured in the URL and shared.
PRs should:
- Keep the page a single static HTML file with no build step. (Bundling is fine to add if it's a clear win, but the current zero-dependency setup is a feature.)
- Not bake the
MAPBOX_TOKENorR2_BASEinto anything other thanjs/config.js. - Update
etl/and this README if you change the on-the-wire payload shape.
MIT. See LICENSE.
The SCI data is published by Meta's Data for Good program on HDX under Meta's Open Data Commons Attribution License — review their terms separately if you redistribute the underlying numbers.