gov-doc-parser

Parse and extract structured data from UK government documents: GOV.UK, Hansard, ICO, FCA, BAILII, and ATRS (Algorithmic Transparency Recording Standard) records. A toolkit for research and governance analysis.

Install

pip install gov-doc-parser

No external dependencies: the package uses only the Python standard library.

Quick start

from gov_doc_parser import GovDocParser

parser = GovDocParser()

# Parse any UK gov source (html is the page's HTML as a string)
doc = parser.parse(html, source="ico")
print(doc.title, doc.date, doc.metadata)

# Auto-detect from URL
doc = parser.parse(html, source="auto", url="https://ico.org.uk/...")

# Extract AI references with sentiment
result = parser.parse_full(html, source="govuk")
for ref in result.ai_references:
    print(f"[{ref.sentiment}] {ref.term}: {ref.context[:100]}")
# [regulatory] algorithm: ...automated decision-making must comply with UK GDPR...
# [negative] artificial intelligence: ...AI found to be unlawful under Equality Act...

# Parse ATRS record
doc, atrs = parser.parse_atrs(atrs_html)
print(atrs.system_name, atrs.governance_score)  # 0-100 transparency score
print(atrs.dpia_completed, atrs.human_review, atrs.legal_basis)

# Batch
results = parser.batch_parse([
    {"html": govuk_html, "source": "govuk"},
    {"html": ico_html, "source": "ico"},
])
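
The batch call above takes a list of dicts with `html` and `source` keys. Below is a minimal sketch of building that list from files on disk. The helper name `build_batch` and the one-folder-per-source layout are assumptions for illustration and are not part of gov-doc-parser.

```python
from pathlib import Path


def build_batch(root):
    """Collect HTML files laid out as <root>/<source>/<name>.html.

    Hypothetical helper: the directory layout is an assumption, chosen so
    the parent folder name doubles as the `source` key.
    """
    items = []
    for path in sorted(Path(root).glob("*/*.html")):
        items.append({
            "html": path.read_text(encoding="utf-8"),
            "source": path.parent.name,  # e.g. "govuk" or "ico"
        })
    return items
```

The result could then be passed straight through, for example `parser.batch_parse(build_batch("downloads"))`.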

Supported sources

Source	Extracts
GOV.UK	Title, date, department, document type, sections
Hansard	Date, house, speaker, debate text, AI mentions
ICO	Enforcement type, penalty amount, decision text
FCA	Document type (Dear CEO/PS/CP), FCA reference
BAILII	Citation, court, judge, judgment text
ATRS	System name, risk tier, DPIA status, governance score
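
For readers curious how `source="auto"` might pick a source from a URL, here is a rough sketch based on hostname matching. It is not the library's actual logic: the `detect_source` function, the hostname table, and the `hansard`, `fca`, and `bailii` keys are assumptions (only `govuk` and `ico` appear in the examples above). ATRS is left out because a hostname alone may not be enough to identify those records.

```python
from urllib.parse import urlparse

# Assumed hostname-to-source mapping; the library's own rules may differ.
HOST_TO_SOURCE = {
    "gov.uk": "govuk",
    "hansard.parliament.uk": "hansard",
    "ico.org.uk": "ico",
    "fca.org.uk": "fca",
    "bailii.org": "bailii",
}


def detect_source(url):
    """Return a source key for url, or None if the host is not recognised."""
    host = (urlparse(url).hostname or "").lower()
    for domain, source in HOST_TO_SOURCE.items():
        # Match the domain itself or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return source
    return None
```

Returning `None` for unknown hosts lets the caller decide whether to fall back to an explicit `source` value or skip the document.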

CLI

gov-doc-parser document.html --source ico
gov-doc-parser document.html --source govuk --ai-refs
gov-doc-parser atrs_record.html --atrs
gov-doc-parser document.html --json
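
The same commands can be driven from a script. The sketch below wraps the flags shown above in a small helper; `build_command` and `run_parser` are hypothetical names, and combining flags other than `--source` with `--ai-refs` is an assumption, since only that pairing appears in the examples.

```python
import subprocess


def build_command(path, source=None, ai_refs=False, atrs=False, as_json=False):
    """Assemble an argv list for the gov-doc-parser CLI (hypothetical helper)."""
    cmd = ["gov-doc-parser", str(path)]
    if source:
        cmd += ["--source", source]
    if ai_refs:
        cmd.append("--ai-refs")
    if atrs:
        cmd.append("--atrs")
    if as_json:
        cmd.append("--json")
    return cmd


def run_parser(path, **options):
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_command(path, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.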

Linda Oraegbunam | LinkedIn | GitHub
