nimbusdata/pollutiondata-sample


license cc-by-4.0
language en, fr
size_categories 1K<n<10K
tags canada, environment, climate, emissions, ghg, npri, esg, sustainability, rag, llm-training
pretty_name PollutionData — Canadian GHG & Pollutant Releases (Sample)
configs
config_name default
data_files split train, path pollutiondata_sample.jsonl

PollutionData — Canadian GHG & Pollutant Releases (Sample)

A sample of 5,000 Canadian facility-level environmental records covering 3,005 facilities. It mixes pollutant releases (NPRI) with greenhouse gas emissions (GHGRP) and covers the last 2 years of data available from the full PollutionData API.

Want the full dataset? 1.08M+ records, every NPRI/GHGRP-reporting facility in Canada since 1993, queryable by facility / pollutant / industry / region. Get an API key →

What's in here

| Field | Type | Description |
| --- | --- | --- |
| source_id | string | Originating system (npri or ghgrp) |
| entity_type | string | release (NPRI pollutant release) or emission (GHGRP greenhouse gas report) |
| record_id | string | Stable per-record identifier from the source |
| municipality | string | Facility location |
| event_date | string (ISO date, year-precision) | Reporting year (YYYY-01-01) |
| category | string | Pollutant / GHG type |
| facility_id | string | Stable facility identifier (NPRI/GHGRP) |
| total_co2e | float | Total CO2-equivalent emissions in tonnes (where applicable) |
| data | object | Full raw record from the source (JSON): quantities, NAICS, NPRI codes |
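
As a starting point for schema exploration, here is a minimal sketch of reading the sample and checking each record against the fields listed above. It assumes the file is plain JSON Lines (one object per line) and that `total_co2e` may be null or absent where it does not apply; neither detail is confirmed by this README. The names `read_jsonl` and `schema_problems`, and the demo records, are illustrative and made up.

```python
import io
import json

# Field names and JSON types taken from the field list above.
REQUIRED_FIELDS = {
    "source_id": str,
    "entity_type": str,
    "record_id": str,
    "municipality": str,
    "event_date": str,
    "category": str,
    "facility_id": str,
    "data": dict,
}
# Assumption: total_co2e can be missing or null where it does not apply.
OPTIONAL_FIELDS = {"total_co2e": (int, float)}


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def schema_problems(record):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"unexpected type for {name}: {type(record[name]).__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        value = record.get(name)
        if value is not None and not isinstance(value, expected):
            problems.append(f"unexpected type for {name}: {type(value).__name__}")
    return problems


# Made-up records for illustration only; real values will differ.
demo_records = [
    {"source_id": "ghgrp", "entity_type": "emission", "record_id": "demo-1",
     "municipality": "Exampleville", "event_date": "2023-01-01", "category": "CO2",
     "facility_id": "F-001", "total_co2e": 1234.5, "data": {}},
    {"source_id": "npri", "entity_type": "release", "record_id": "demo-2",
     "municipality": "Exampleville", "event_date": "2023-01-01", "category": "Ammonia",
     "facility_id": "F-002", "data": "not-an-object"},
]
demo_stream = io.StringIO("\n".join(json.dumps(r) for r in demo_records))

records = list(read_jsonl(demo_stream))
report = [schema_problems(r) for r in records]
print(report)
```

To run this against the real sample, replace `demo_stream` with `open("pollutiondata_sample.jsonl", encoding="utf-8")`.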

Composition (sample)

| entity_type | rows |
| --- | --- |
| release (NPRI pollutants) | 4,000 |
| emission (GHGRP GHGs) | 1,000 |

3,005 unique facilities represented.
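
The composition figures can be re-derived from loaded records with a few lines. The sketch below assumes each record is a dict with `entity_type` and `facility_id` keys, as in the field list; the helper name `composition` and the demo records are hypothetical.

```python
from collections import Counter


def composition(records):
    """Return (rows per entity_type, number of distinct facility_id values)."""
    counts = Counter(r["entity_type"] for r in records)
    facilities = {r["facility_id"] for r in records}
    return counts, len(facilities)


# Made-up records for illustration only.
demo_records = [
    {"entity_type": "release", "facility_id": "F-001"},
    {"entity_type": "release", "facility_id": "F-001"},
    {"entity_type": "release", "facility_id": "F-002"},
    {"entity_type": "emission", "facility_id": "F-002"},
]

counts, facility_count = composition(demo_records)
print(dict(counts), facility_count)
```

Run on the full sample file, the result should match the figures stated above (4,000 release rows, 1,000 emission rows, 3,005 facilities) if the file matches this description.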

Use cases

  • ESG due diligence — counterparty and supply-chain emissions screening
  • Carbon accounting — Scope 3 input data for upstream supplier emissions
  • Climate research — emissions trends, polluter benchmarking, regional analysis
  • RAG / agents — ground LLM responses in actual Canadian industrial emissions data
  • Compliance & benchmarking — compare facility performance against industry peers
  • Journalism / NGO reporting — identify largest polluters in a region or sector
  • Schema exploration before committing to API integration
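
For the benchmarking and reporting use cases, a simple first pass is to rank records by `total_co2e`. This is a sketch under stated assumptions: records are dicts as in the field list, and `total_co2e` may be missing or null on some rows, which are skipped. It ranks individual records and does not sum across rows, since this README does not say whether values for one facility are additive. The function name `top_emitters` and the demo records are hypothetical.

```python
import heapq


def top_emitters(records, n=5):
    """Return the n records with the highest total_co2e, largest first."""
    with_value = [
        r for r in records
        if isinstance(r.get("total_co2e"), (int, float))
    ]
    return heapq.nlargest(n, with_value, key=lambda r: r["total_co2e"])


# Made-up records for illustration only.
demo_records = [
    {"facility_id": "F-001", "total_co2e": 120.0},
    {"facility_id": "F-002", "total_co2e": 9800.5},
    {"facility_id": "F-003", "total_co2e": None},   # skipped: no value
    {"facility_id": "F-004"},                       # skipped: field absent
    {"facility_id": "F-005", "total_co2e": 450.0},
]

ranked = top_emitters(demo_records, n=2)
print([r["facility_id"] for r in ranked])
```

With only 5,000 rows in the sample, any ranking is indicative at best.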

Limitations of this sample

  • Snapshot only — frozen 2026-05-25. New reports added annually as facilities file.
  • 5K rows — full dataset is 1.08M+. Sample is enough to test schema, prompts, prototypes; not enough for production analytics.
  • No querying — can't filter by pollutant, NAICS, facility, or year. Use the API for that.
  • Year-precision dates — NPRI/GHGRP are annual reports, not transactional records.
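
The sample itself has no query interface, but a file of this size can be filtered in memory once loaded. The sketch below assumes `event_date` always has the `YYYY-01-01` form shown in the field list and that records are dicts; `reporting_year`, `filter_records`, and the demo records are illustrative names and values, not part of the dataset or the API.

```python
from datetime import date


def reporting_year(record):
    """Extract the reporting year from a year-precision ISO date string."""
    return date.fromisoformat(record["event_date"]).year


def filter_records(records, year=None, category=None, facility_id=None):
    """Keep records matching every criterion that is not None."""
    matches = []
    for r in records:
        if year is not None and reporting_year(r) != year:
            continue
        if category is not None and r["category"] != category:
            continue
        if facility_id is not None and r["facility_id"] != facility_id:
            continue
        matches.append(r)
    return matches


# Made-up records for illustration only.
demo_records = [
    {"record_id": "demo-1", "event_date": "2022-01-01", "category": "CO2", "facility_id": "F-001"},
    {"record_id": "demo-2", "event_date": "2023-01-01", "category": "CO2", "facility_id": "F-001"},
    {"record_id": "demo-3", "event_date": "2023-01-01", "category": "Ammonia", "facility_id": "F-002"},
]

selected = filter_records(demo_records, year=2023, category="CO2")
print([r["record_id"] for r in selected])
```

Because the dates are year-precision, only the year component carries information; the month and day are placeholders.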

Production access

License & attribution

Data sourced from Canadian federal environmental disclosure programs: NPRI (National Pollutant Release Inventory) and GHGRP (Greenhouse Gas Reporting Program). All source data is under Open Government Licence — Canada. Sample redistribution is permitted under CC-BY-4.0.

Contact

Mirror locations

The same sample is published in three places — pick whichever you prefer:

About

Sample of Canadian pollutiondata data. Full dataset via API at pollutiondata.ca
