OpenMPSC Data

Open dataset of regulatory proceedings from the Michigan Public Service Commission (MPSC), spanning November 1987 through April 2026.

This repository holds the static data snapshot that accompanies the paper OpenMPSC: An Open Dataset of Michigan Public Service Commission Regulatory Proceedings. The same snapshot is bundled in the paper repository under data/; this repo exists as a stand-alone, citable distribution point.

For continuously updated, daily-refreshed access (including full PDFs and extracted text), use the public REST API and web interface at https://openmpsc.com.

Snapshot

  • Snapshot date: 2026-04-30
  • Coverage: 1987-11-12 to 2026-04-29
  • License: CC BY 4.0

Headline numbers

| Metric | Value |
|---|---|
| Cases | 6,136 |
| Filings | 164,865 |
| Orders | 12,685 |
| Public comments | 12,309 |
| Hearings | 2,635 |
| Party records | 12,308 |
| Commission meetings | 672 |
| Total PDF pages | 3,812,492 |
| Total PDF storage (live) | ~149 GB |
| Extracted plain text (live) | ~5.4 GB |

(PDFs and full text are not redistributed in this snapshot; they are retrievable from the API.)

Files

All CSVs are compressed with zstd at level 19 (zstd -19). Decompress a file with zstd -d <file>.csv.zst, or stream it into another command with zstdcat <file>.csv.zst | <command>.

| File | Rows | Description |
|---|---|---|
| data/summary_stats.csv.zst | 30 | Headline metrics; every count cited in the paper. |
| data/taxonomy_categories.csv.zst | 14 | Filing taxonomy top-level categories with counts. |
| data/taxonomy_subcategories.csv.zst | 205 | Subcategory codes with counts. |
| data/case_types.csv.zst | 41 | All MPSC case types with case counts, filing counts, and filings-per-case. |
| data/cases.csv.zst | 6,136 | One row per case: number, subject, industry, type, status flag, open date, lead company, child counts. |
| data/filings.csv.zst | 164,865 | One row per filing: filing number, case type, filing type, description, file date, filer, classification (category / subcategory / party type), num_pages, file size. |
| data/orders.csv.zst | 12,685 | One row per order: order number, title, order date, file size. |
| data/comments.csv.zst | 12,309 | Public comments: commenter name, anonymity flag, comment text, submission timestamp, has-attachment flag. |
| data/hearings.csv.zst | 2,635 | Hearings: hearing type, hearing date, start time, virtual flag, cancellation flag. |
| data/parties.csv.zst | 12,308 | Party records: party name, role, attorney firm. |
| data/meetings.csv.zst | 672 | Commission meeting documents: id, filename, dates, type, page count, YouTube URL. Extracted PDF text excluded for size. |
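The row counts in the Files table can serve as a quick integrity check after decompression. The following is a minimal sketch, assuming that each CSV starts with a header row and that the Rows column counts data rows only; the helper names count_data_rows and check_table are hypothetical and not part of this repository.

```python
import csv
import io

# Row counts copied from the Files table (assumed to exclude the header row).
EXPECTED_ROWS = {
    "summary_stats": 30,
    "taxonomy_categories": 14,
    "taxonomy_subcategories": 205,
    "case_types": 41,
    "cases": 6136,
    "filings": 164865,
    "orders": 12685,
    "comments": 12309,
    "hearings": 2635,
    "parties": 12308,
    "meetings": 672,
}


def count_data_rows(text_stream):
    """Count CSV records after the header, honoring quoted multi-line fields."""
    reader = csv.reader(text_stream)
    next(reader, None)  # skip the header row (assumed present)
    return sum(1 for _ in reader)


def check_table(name, text_stream):
    """Return (actual, expected, matches) for one decompressed table."""
    actual = count_data_rows(text_stream)
    expected = EXPECTED_ROWS[name]
    return actual, expected, actual == expected
```

A possible usage, after running zstd -d on a table, is check_table("cases", open("data/cases.csv", newline="", encoding="utf-8")). Counting records with the csv module rather than counting lines matters for tables such as comments, where free text may contain embedded newlines; very long fields may also require raising the limit with csv.field_size_limit.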

Known limitations of this snapshot

This snapshot was generated by an export pipeline that flattens the database into per-table CSVs and drops the join keys and several optional columns. Specifically:

  • case_number is empty in filings, orders, comments, hearings, and parties. The snapshot is sufficient for per-table aggregate analysis (counts, distributions, classifications) but does not support case-level joins.
  • orders.num_pages is empty in this export (live data has it).
  • comments omits case_number, organization_name, commenter_city, commenter_state, and subject; only commenter_name, is_anonymous, comment_text, submitted_at, and has_attachment are populated.
  • hearings omits case_number, title, end_time, location, and alj_name; aggregate counts and the type / date / cancelled-flag analyses still hold.
  • parties omits case_number and attorney_name; role distributions and unique-party counts still hold.
  • cases.is_closed is 0 for all rows in this export; the live API exposes the correct status flag (96.6% of cases are closed in the live system).
  • cases.close_date and cases.parent_case_number are empty.

For analyses that require case-level joins, the fields listed above as empty or omitted, or the closed-case flag, use the public REST API at https://openmpsc.com/api/v1/. A future revision of this snapshot will restore the join keys.
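Before building an analysis on a column, it can help to confirm whether that column is actually populated in this export. The sketch below scans a decompressed table and reports the columns whose every value is blank; empty_columns is a hypothetical helper, and the approach assumes a header row is present.

```python
import csv
import io


def empty_columns(text_stream):
    """Return the sorted names of columns whose every value is blank."""
    reader = csv.DictReader(text_stream)
    candidates = set(reader.fieldnames or [])
    for row in reader:
        # Drop a column from the candidates once it shows a non-blank value.
        candidates = {c for c in candidates if not (row.get(c) or "").strip()}
        if not candidates:
            break  # every column has at least one value; stop early
    return sorted(candidates)
```

Run against a decompressed table, this should list the columns that the limitations above describe as empty. It does not catch columns that are populated with a constant placeholder, such as cases.is_closed being 0 for all rows.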

Schema notes

  • All dates are ISO 8601 (YYYY-MM-DD or full timestamp).
  • Empty strings in date columns mean "not set" (treated as NULL).
  • is_anonymous / is_virtual / is_cancelled / has_attachment are 0 / 1 integer flags.
  • case_number follows the MPSC convention (e.g., U-21990), but in this snapshot it is empty in every table other than cases.csv.zst and case_types.csv.zst (see Known limitations above).
  • filing_category is one of 14 codes (the 12-category taxonomy plus UNK and XXX placeholders for low-confidence classifications); filing_subcategory mostly follows the CAT-SUB pattern (e.g., TES-DIR for direct testimony), with a small tail of malformed LLM outputs that are retained rather than silently re-mapped.
  • party_type on filings is one of Company, Intervenor, Staff, AG (Attorney General), Public, or Unknown.
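The conventions above can be applied when loading rows with the standard library. This is a minimal sketch, not the project's own loader: the function names are hypothetical, and the subcategory pattern is an assumption generalized from the single example TES-DIR, so codes that fail it are candidates for inspection rather than known errors.

```python
import re
from datetime import datetime

# Assumed shape of a well-formed subcategory code, generalized from the single
# example TES-DIR; the real taxonomy may use other lengths or characters.
SUBCATEGORY_PATTERN = re.compile(r"^[A-Z]{3}-[A-Z0-9]+$")


def parse_date(value):
    """Map an ISO 8601 string to a datetime; empty strings become None."""
    text = "" if value is None else str(value).strip()
    if not text:
        return None
    return datetime.fromisoformat(text)


def parse_flag(value):
    """Map the 0 / 1 integer flags to bool; blank values become None."""
    text = "" if value is None else str(value).strip()
    if not text:
        return None
    if text not in ("0", "1"):
        raise ValueError(f"unexpected flag value: {text!r}")
    return text == "1"


def is_well_formed_subcategory(code):
    """True when a filing_subcategory matches the assumed CAT-SUB shape."""
    return bool(SUBCATEGORY_PATTERN.match(code or ""))
```

Note that datetime.fromisoformat accepts a trailing Z or other less common ISO 8601 forms only on Python 3.11 and later; if the timestamps in a table use such forms, a more permissive parser may be needed.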

Quick start

git clone https://github.com/mjbommar/openmpsc-data
cd openmpsc-data

# decompress one table
zstd -d data/cases.csv.zst

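# or stream into pandas without loading the whole compressed file first
python -c "
import io
import pandas as pd
import zstandard as zstd

# stream_reader decompresses incrementally, so it does not depend on the
# frame header recording the decompressed size
with open('data/filings.csv.zst', 'rb') as f:
    reader = zstd.ZstdDecompressor().stream_reader(f)
    df = pd.read_csv(io.TextIOWrapper(reader, encoding='utf-8'))
print(df.shape, df.columns.tolist())
"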

Citation

If you use this dataset, please cite the accompanying paper:

@misc{bommarito2026openmpsc,
  title  = {{OpenMPSC}: An Open Dataset of {M}ichigan Public Service Commission Regulatory Proceedings},
  author = {Bommarito, Michael J.},
  year   = {2026},
  url    = {https://openmpsc.com}
}

License

The derived dataset in this repository (the structure of these CSVs, the LLM-generated classification labels, and the snapshot organization) is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

The underlying filings, orders, and public comments were collected from publicly accessible MPSC sources. Michigan's Freedom of Information Act establishes a right of public inspection and copying for non-exempt agency records, but does not by itself convey a redistribution license; individual documents may be subject to authorial copyright or other downstream restrictions. We redistribute these records in good faith for non-commercial research and public-interest use, and downstream users should evaluate their own use independently.

LLM-generated classifications in filings.csv.zst are research aids and do not constitute legal analysis or official MPSC categorization.

Issues and updates

For bugs in the data, schema questions, or requests, open an issue on this repo. For the live, daily-updated archive (and full PDF / text retrieval), see https://openmpsc.com.
