PollutionData — Canadian GHG & Pollutant Releases (Sample)
Config: default (split: train, file: pollutiondata_sample.jsonl)
Sample of 5,000 Canadian facility-level environmental records across 3,005 facilities.
It mixes pollutant releases (NPRI) and greenhouse gas emissions (GHGRP), covering
the last 2 years of data from the full PollutionData API.
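The card names a single file, pollutiondata_sample.jsonl, served as the train split of the default config. Assuming it is standard JSON Lines (one UTF-8 JSON object per line, as the extension suggests), a minimal stdlib sketch for streaming it might look like this. parse_lines and iter_records are hypothetical helper names, not part of any PollutionData tooling.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Iterator


def parse_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Decode JSON Lines text: one JSON object per non-blank line.

    Assumption: the sample file follows plain JSON Lines conventions.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed instead of a bare decode error
            raise ValueError(f"line {line_no}: invalid JSON") from exc


def iter_records(path: str | Path) -> Iterator[dict]:
    """Stream records from a local copy of the sample file."""
    with open(path, encoding="utf-8") as fh:
        yield from parse_lines(fh)
```

For example, records = list(iter_records("pollutiondata_sample.jsonl")) loads the whole sample once the file has been downloaded locally. Splitting the decoder from the file handling keeps parse_lines usable on any iterable of strings.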
Want the full dataset? The PollutionData API has 1.08M+ records covering every
NPRI/GHGRP-reporting facility in Canada since 1993, queryable by facility, pollutant, industry, or region.
Get an API key →
What's in here
| Field | Type | Description |
|---|---|---|
| source_id | string | Originating system (npri or ghgrp) |
| entity_type | string | release (NPRI pollutant release) or emission (GHGRP greenhouse gas report) |
| record_id | string | Stable per-record identifier from the source |
| municipality | string | Facility location |
| event_date | string (ISO date, year precision) | Reporting year, stored as YYYY-01-01 |
| category | string | Pollutant or GHG type |
| facility_id | string | Stable facility identifier (NPRI/GHGRP) |
| total_co2e | float | Total CO2-equivalent emissions in tonnes (where applicable) |
| data | object | Full raw record from the source (JSON): quantities, NAICS, NPRI codes |
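To work with the fields in the table above as typed values, one option is a small dataclass. This is a sketch under assumptions the card does not state: total_co2e is JSON null or absent where not applicable, source_id values are the lowercase strings npri and ghgrp, and the raw data object is kept as an untyped dict because its keys are not documented here. PollutionRecord and to_record are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class PollutionRecord:
    """Typed view of one row, mirroring the documented fields."""

    source_id: str               # "npri" or "ghgrp"
    entity_type: str             # "release" (NPRI) or "emission" (GHGRP)
    record_id: str
    municipality: str
    event_date: date             # year precision: always January 1
    category: str                # pollutant or GHG type
    facility_id: str
    total_co2e: Optional[float]  # tonnes CO2e; None where not applicable
    data: dict[str, Any] = field(default_factory=dict)  # raw source record

    @property
    def year(self) -> int:
        return self.event_date.year


def to_record(raw: dict[str, Any]) -> PollutionRecord:
    """Convert one decoded JSON object, checking the documented invariants."""
    # Assumption: source_id is stored exactly as "npri" or "ghgrp".
    if raw["source_id"] not in ("npri", "ghgrp"):
        raise ValueError(f"unexpected source_id: {raw['source_id']!r}")
    event_date = date.fromisoformat(raw["event_date"])
    if (event_date.month, event_date.day) != (1, 1):
        raise ValueError(f"expected YYYY-01-01, got {raw['event_date']!r}")
    # Assumption: "not applicable" is encoded as null or a missing key.
    co2e = raw.get("total_co2e")
    return PollutionRecord(
        source_id=raw["source_id"],
        entity_type=raw["entity_type"],
        record_id=str(raw["record_id"]),
        municipality=raw.get("municipality") or "",
        event_date=event_date,
        category=raw["category"],
        facility_id=str(raw["facility_id"]),
        total_co2e=float(co2e) if co2e is not None else None,
        data=raw.get("data") or {},
    )
```

Validating the YYYY-01-01 shape up front makes a mis-parsed date fail loudly rather than silently shifting a record into the wrong reporting year.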
Composition (sample)
| entity_type | Rows |
|---|---|
| release (NPRI pollutants) | 4,000 |
| emission (GHGRP GHGs) | 1,000 |
3,005 unique facilities represented.
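The composition figures can be recomputed from the file as a sanity check. The sketch below counts rows per entity_type and distinct facility_id values. It assumes facility IDs are compared as plain strings across both programs; the card does not say whether NPRI and GHGRP identifiers share one namespace, so the unique-facility count may not match exactly how the 3,005 figure was derived. composition is a hypothetical helper name.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def composition(records: Iterable[Mapping]) -> tuple[Counter, int]:
    """Return (rows per entity_type, number of distinct facility_id values)."""
    by_type: Counter = Counter()
    facilities: set = set()
    for rec in records:
        by_type[rec["entity_type"]] += 1
        # Assumption: facility_id strings are compared as-is across NPRI and GHGRP.
        facilities.add(rec["facility_id"])
    return by_type, len(facilities)
```

On the full sample, a result of 4,000 release rows, 1,000 emission rows, and 3,005 facilities would confirm the local copy matches this card.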
Use cases
ESG due diligence — counterparty and supply-chain emissions screening
Carbon accounting — Scope 3 input data for upstream supplier emissions
Climate research — emissions trends, polluter benchmarking, regional analysis
RAG / agents — ground LLM responses in actual Canadian industrial emissions data
Compliance & benchmarking — compare facility performance against industry peers
Journalism / NGO reporting — identify largest polluters in a region or sector
Schema exploration before committing to API integration
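As a sketch of the "largest polluters in a region" use case, the function below ranks facilities in one municipality by summed total_co2e, optionally for a single reporting year. It assumes rows where total_co2e is not applicable should be skipped, that municipality names match case-insensitively, and that total_co2e is additive across a facility's rows; if the raw records instead repeat a facility-level total on every row, taking the maximum per facility-year would avoid double counting. top_emitters is a hypothetical helper name.

```python
from __future__ import annotations

import heapq
from collections import defaultdict
from typing import Iterable, Mapping, Optional


def top_emitters(
    records: Iterable[Mapping],
    municipality: str,
    n: int = 10,
    year: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Rank facilities in one municipality by summed total_co2e (tonnes)."""
    target = municipality.casefold()
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        co2e = rec.get("total_co2e")
        if co2e is None:
            continue  # not applicable for this row: skip, don't count as 0
        if (rec.get("municipality") or "").casefold() != target:
            continue  # assumption: names match case-insensitively
        if year is not None and int(rec["event_date"][:4]) != year:
            continue  # event_date is YYYY-01-01, so the first 4 chars are the year
        # Assumption: total_co2e is additive across a facility's rows.
        totals[rec["facility_id"]] += float(co2e)
    # nlargest returns the n (facility_id, tonnes) pairs with the most tonnes
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])
```

Passing a year keeps the two reporting years in the sample from being blended into one total.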
Limitations of this sample
Snapshot only: frozen 2026-05-25. New reports are added annually as facilities file.
5K rows: the full dataset has 1.08M+. The sample is enough to test the schema, prompts, and prototypes, but not enough for production analytics.
No query interface: the sample is a static file, so filtering the full dataset by pollutant, NAICS, facility, or year requires the API.
Year-precision dates: NPRI and GHGRP are annual reports, not transactional records.
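Because event_date is a reporting year stored as YYYY-01-01, it is safer to group and filter on the year than to treat the date as the day something happened. Within the 5,000-row sample you can still filter locally; the sketch below does so on year, category, and facility_id. NAICS filtering is left out because NAICS lives inside the raw data object under keys the card does not document. reporting_year and filter_sample are hypothetical names.

```python
from __future__ import annotations

from datetime import date
from typing import Iterable, Iterator, Mapping, Optional


def reporting_year(event_date: str) -> int:
    """Return the reporting year from a year-precision ISO date (YYYY-01-01)."""
    return date.fromisoformat(event_date).year


def filter_sample(
    records: Iterable[Mapping],
    year: Optional[int] = None,
    category: Optional[str] = None,
    facility_id: Optional[str] = None,
) -> Iterator[Mapping]:
    """Filter the in-memory sample; None means "don't filter on this field"."""
    for rec in records:
        if year is not None and reporting_year(rec["event_date"]) != year:
            continue
        if category is not None and rec["category"] != category:
            continue
        if facility_id is not None and rec["facility_id"] != facility_id:
            continue
        yield rec
```

This only covers the sample's two reporting years; multi-decade trends still need the full dataset.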
Data sourced from Canadian federal environmental disclosure programs: NPRI
(National Pollutant Release Inventory) and GHGRP (Greenhouse Gas Reporting
Program). All source data is under Open Government Licence — Canada. Sample
redistribution is permitted under CC-BY-4.0.