Skip to content

nwyin/billwatch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

billwatch

A cached, token-efficient CLI for querying US state legislative data. Designed for LLM consumption following AXI principles — the CLI is the data layer, the LLM is the intelligence layer.

What it does

Given a state and a keyword, billwatch returns legislative bills in a compact token-efficient format (TOON) with a help[] block pointing at the next useful command. Everything runs against a local SQLite cache with an FTS5 full-text index, so every query is offline and instant.

Data source

Bills come from OpenStates monthly Postgres dumps. OpenStates publishes a free dump covering every state + DC at data.openstates.org/postgres/. You restore it to a local Postgres, run billwatch import openstates once, and every state's bills, sponsors, abstracts, and full text land in the local SQLite cache. After that, all queries run offline — no API keys, no rate limits, no HTTP calls.

Refreshing is a monthly job: grab the next dump, re-run billwatch import openstates, and the cache is current. A live-refresh layer on top of LegiScan for sub-monthly freshness is planned but not yet implemented — tracked in the project's GitHub issues.

Install

git clone <this repo> billwatch
cd billwatch
uv sync --extra openstates --extra dev

No API keys to configure. The only prerequisite is a local Postgres to restore the dump into (see the import openstates workflow below).

Quick start

# what's cached, last sync
uv run billwatch

# one-time bulk import — every state, offline
uv run billwatch import openstates \
    --pg-url postgresql://openstates:test@localhost:54329/openstates \
    --with-text

# search (FTS5 against the local cache — 0 network calls)
uv run billwatch search NY "artificial intelligence"
uv run billwatch search IL "ai"

# single bill details
uv run billwatch bill NY S6953
uv run billwatch bill IL HB1806

# full bill text
uv run billwatch text NY S6953
uv run billwatch text IL HB1806

# browse cached bills
uv run billwatch ls            # all cached states
uv run billwatch ls NY         # all cached NY bills
uv run billwatch ls NY --status "signed by governor" --limit 20

Commands

billwatch (no args) — Ambient context

Prints version, cached state summary, and help suggestions. Run this first in any new conversation to see what's already available locally.

billwatch v0.1.0

cached[2]{state,bills,synced}:
  IL,108,2h ago
  NY,198,2h ago

help[]:
  Run 'billwatch search IL "tax"' to search cached bills
  Run 'billwatch ls' to list cached states
  Run 'billwatch --help' for all commands

billwatch search STATE "KEYWORD"

Runs an FTS5 search over the cached bills for the given state. Exits with cache_miss if the state has no cached bills yet — populate with billwatch import openstates --state XX first.

Options:

  • --since N — only show bills with last_action_date in the last N days
  • --status S — filter by status substring (case-insensitive), e.g. --status "signed"
  • --fields f1,f2 — add fields to the output (sponsors, url, description, session, bill_type)
  • --full — show all fields
  • --json — emit valid JSON instead of TOON (useful for piping into jq)

Query syntax notes:

  • Multi-word queries are automatically phrase-quoted for FTS5, so billwatch search NY "artificial intelligence" matches the exact phrase.
  • Single-word queries match any bill whose title, description, or bill number contains the word.
  • Malformed FTS5 queries (e.g. bare operators, unmatched quotes) return zero results rather than crashing.

billwatch bill STATE BILL_NUMBER

Shows full metadata for a single bill: sponsor, status, last action date, description, URL, etc. Description is truncated to 500 chars unless --full is passed. Returns not_found if the bill isn't in the cache.

uv run billwatch bill NY S6953
uv run billwatch bill IL HB1806 --full --json

billwatch text STATE BILL_NUMBER

Returns the complete bill text as plaintext from the cache. Default truncates to 4000 chars with a size hint; --full returns the complete text. Requires the import to have been run with --with-text.

uv run billwatch text NY S6953
uv run billwatch text IL HB1806 --full

Pipe through grep / head / tail to narrow in on specific sections:

uv run billwatch text NY S6953 --full | grep -A 3 "DEFINITIONS"

billwatch ls [STATE]

Without a state: lists all cached states with bill counts and freshness (relative time). With a state: lists the cached bills for that state, sorted by last_action_date desc.

Options (when a state is given):

  • --limit N — show N bills (default 20, 0 = all)
  • --status S — filter by status substring
  • --since N — bills with action in last N days
  • --fields, --full, --json — same as search

billwatch import openstates

Bulk-imports bills from an OpenStates monthly Postgres dump into the local SQLite cache. Run this once per month and you have every state's legislature indexed locally with zero ongoing API calls.

Workflow:

# 1. Download the monthly dump (~10 GB)
curl -O https://data.openstates.org/postgres/monthly/2026-04-public.pgdump

# 2. Start a local Postgres and restore
docker run -d --name openstates -e POSTGRES_PASSWORD=test \
    -e POSTGRES_USER=openstates -e POSTGRES_DB=openstates \
    -p 54329:5432 postgres:17
docker cp 2026-04-public.pgdump openstates:/tmp/dump.pgdump
docker exec openstates pg_restore --no-owner --no-privileges \
    -d openstates -U openstates /tmp/dump.pgdump

# 3. Install the optional psycopg dependency (already done if you ran `uv sync --extra openstates`)
uv sync --extra openstates --extra dev

# 4. Import one state (fast, ~30s for most states)
uv run billwatch import openstates \
    --pg-url postgresql://openstates:test@localhost:54329/openstates \
    --state NY \
    --with-text

# or: import every state at once (much slower, larger cache)
uv run billwatch import openstates \
    --pg-url postgresql://openstates:test@localhost:54329/openstates \
    --with-text

Options:

  • --pg-url URL — Postgres connection string. Alternatively set OPENSTATES_PG_URL env var.
  • --state XX — filter to one state (2-letter code). Default imports all states.
  • --limit N — max bills to import (useful for dry runs).
  • --with-text — also import full bill text from opencivicdata_searchablebill.raw_text. Slower but worth it.

What gets imported: bill identifier, title, status (derived from latest_passage_date + latest_action_description), sponsors, abstracts (as description), source URLs, session, and optionally raw text. Classification and related metadata are preserved in the cached data blob.

Output format — TOON

billwatch defaults to TOON (Token-Optimized Object Notation), a CSV-flavored format that's roughly 40% fewer tokens than JSON. A list looks like:

bill[3]{id,title,status,last_action}:
  S6953,Relates to the training and use of artificial intelligence frontier models,Signed by Governor,2025-12-19
  S7599,Relates to automated decision-making by government agencies,Signed by Governor,2025-10-07
  S822,Relates to the disclosure of automated employment decision-making tools and maintaining an artificial intelligence inventory,Signed by Governor,2025-12-19

Values containing commas are double-quoted (CSV-style with inner quotes doubled). Newlines are escaped as \n.

A single object:

bill{bill_number,title,status,last_action_date}:
  S6953
  Relates to the training and use of artificial intelligence frontier models
  Signed by Governor
  2025-12-19

If you need real JSON, pass --json.

Errors

Errors go to stdout (not stderr) as structured TOON with an exit code of 1:

error{code,message}:
  cache_miss
  No cached bills for CA. Run 'billwatch import openstates --state CA' to populate the cache.

Error codes you'll see:

  • cache_miss — the requested state has no cached bills; run import openstates first
  • not_found — bill lookup didn't match anything in the cache
  • invalid_state — unknown 2-letter state code
  • invalid_sourceimport <source> used an unknown source (only openstates is supported)
  • missing_pg_urlimport openstates without --pg-url or OPENSTATES_PG_URL
  • missing_dependencypsycopg isn't installed; run uv sync --extra openstates

No prompts, no stderr noise, clean exit codes — safe to pipe and script.

Cache

Stored at ~/.billwatch/cache.db. SQLite with two tables:

  • bills — normalized bill records keyed on (state, bill_number, session)
  • bill_texts — plaintext bill bodies, joined by internal rowid

Plus a bills_fts FTS5 virtual table for local full-text search over cached bills (title, description, bill_number).

The cache is append/replace-only: import openstates upserts whatever's in the current dump. Delete the file to reset.

What billwatch is NOT

  • Not a summarizer. It doesn't categorize, rank, or generate text. If you ask "which IL bills have teeth?" billwatch returns the raw bill texts — an LLM (or a human) does the reasoning.
  • Not a live data source. The cache is only as fresh as the last OpenStates dump you imported (monthly cadence). For sub-monthly freshness, a LegiScan live-refresh layer is tracked in the GitHub issues.
  • Not magic about false positives. Keyword search hits anything containing the word. Many bills tagged "artificial intelligence" are budget line items or unrelated statutes that mention AI in passing. Expect to filter.

Running tests

uv run pytest tests/ -x -q
uvx ruff check src/ tests/
uvx ruff format --check src/ tests/ --line-length 144

All tests are offline — no network access required. The OpenStates importer tests mock psycopg cursors directly, so the suite runs even without psycopg installed.

Architecture sketch

              ┌────────────────────────┐
              │ OpenStates monthly dump│  user restores to local Postgres
              └───────────┬────────────┘
                          │ billwatch import openstates
                          ▼
              ┌────────────────────────┐
              │    SQLite cache        │  bills, bill_texts, bills_fts (FTS5)
              │    ~/.billwatch        │
              └───────────┬────────────┘
                          │ every query hits the cache
                          ▼
              ┌────────────────────────┐
              │          CLI           │  search, bill, text, ls, import
              └────────────────────────┘

Only two Python modules matter at runtime: openstates.py (one-shot bulk importer) and cache.py (SQLite + FTS5 reads). The CLI is a thin Click wrapper on top.

Design notes

Things that are deliberate and worth understanding before changing:

  • No ORM anywhere. SQLite uses raw sqlite3 with parameterized queries; OpenStates Postgres access uses raw psycopg with parameterized queries. Keeps the schema visible, debugging tractable, and dependencies minimal.
  • Unified bill dict shape. Bills normalize to {bill_id, state, bill_number, title, description, status, last_action_date, url, sponsors, data, fetched_at, session, source}. The cache, formatter, and CLI treat this as the canonical shape — any new source (e.g., a future LegiScan adapter) should produce dicts that match.
  • bill_id = None is a feature. The cache's upsert_bill dispatches on whether the caller supplied an integer bill_id or passed None. OpenStates passes None so SQLite auto-assigns a rowid, and the semantic identity lives in (state, bill_number, session).
  • FTS5 update ordering matters. bills_fts is an external-content FTS5 table pointing at bills. Updating a row means: DELETE FROM bills_fts WHERE rowid = ? first (while the old content is still in bills), then UPDATE bills, then INSERT INTO bills_fts with the new values. Deleting after the update would leave stale terms indexed. See upsert_bill in cache.py.
  • Batched enrichment in the importer. The OpenStates importer fetches bills in one query, then batches sponsors/abstracts/sources/texts in 500-row chunks using WHERE bill_id = ANY(%s). Limits round-trips and memory for a 10 GB dump.
  • Optional dependencies load lazily. psycopg is only installed via the [openstates] extra and is imported under try/except ImportError. Normal users don't pay for a dep they won't use, and the test suite runs without it because the tests mock psycopg cursors.
  • Errors on stdout, not stderr. All command failures emit structured TOON error{code,message} on stdout with exit code 1. This makes billwatch safe to pipe and script — consumers can distinguish success from failure by exit code without losing the error payload.
  • Cache-only at query time. search, bill, text, and ls never hit the network. If the cache is stale or missing data, the answer is to re-run import openstates, not to fall back to a live scrape.

Extending billwatch

The current architecture supports exactly one data source (OpenStates bulk import). The planned extension is a LegiScan live-refresh layer that sits alongside the importer and freshens individual bills between monthly dumps — see the LegiScan issue in the GitHub tracker for scope.

When that's built, the seams are:

  1. A new legiscan.py module that returns the unified bill dict shape for a (state, bill_number) lookup.
  2. A refresh hook in bill and text commands that calls LegiScan when a cached entry is older than some TTL.
  3. A quota counter for the LegiScan API (the prior api_usage table was removed when the old scrapers were ripped out — reintroduce it if needed).

Until that lands, any query that needs fresh data should be answered by re-running the monthly import.

See also

  • PROMPT.md — usage guidance for LLM sessions using billwatch
  • AXI principleshttps://axi.md

About

Cached, token-efficient CLI for US state legislative data — an example of AXI-inspired (agent-ergonomic) CLI design

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages