v0.3.0

@grgmiller grgmiller released this 29 Dec 19:56
· 195 commits to main since this release
939dd3b

Updates PUDL dependency (#318)

  • Updates pudl dependency from v2022.11.30 to v2023.12.01, which includes a number of updates to the database structure and naming conventions (see pudl release notes)
  • Changes source of PUDL database download to AWS rather than Zenodo, providing faster access to PUDL data releases
  • PUDL’s CEMS database now includes data from AK, HI, and PR, which should improve hourly emissions data coverage for plants in AK and HI
  • A cleaned and standardized version of the EPA-EIA power sector data crosswalk is now included in the pudl database, meaning we no longer have to manually load and standardize this data
  • Emissions control equipment data from EIA-860 is now included in the pudl database, meaning we no longer need to manually load and standardize this data
  • Leading zeros removed from boiler_ids, which should improve mapping between boiler tables
  • The EIA-923 generation and fuel allocation process is now fully integrated into PUDL
  • Fixes an issue where certain plants in NY state were being assigned the wrong BA code.
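
As a rough illustration of why removing leading zeros helps boiler tables line up, here is a minimal sketch. The function name and the sample IDs are hypothetical and are not taken from PUDL or OGE; the sketch only shows the general idea of normalizing IDs before matching them.

```python
def strip_leading_zeros(boiler_id: str) -> str:
    """Remove leading zeros from an ID string (hypothetical helper)."""
    stripped = boiler_id.lstrip("0")
    # Assumption: an all-zero ID collapses to "0" rather than an empty string
    if boiler_id and not stripped:
        return "0"
    return stripped


# Hypothetical IDs for the same three boilers as reported in two tables
table_a_ids = ["01", "2", "0B3"]
table_b_ids = ["1", "002", "B3"]

# Without normalization, none of the raw IDs match across the two tables
raw_matches = set(table_a_ids) & set(table_b_ids)

# After normalization, all three boilers match
normalized_a = {strip_leading_zeros(i) for i in table_a_ids}
normalized_b = {strip_leading_zeros(i) for i in table_b_ids}
normalized_matches = normalized_a & normalized_b
```

In this toy case the raw ID sets share no members, while the normalized sets are identical.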

Adds 2022 data (#322)

  • Integrates Final release input data from the EIA and EPA for 2022
  • Adds 2022 OGE outputs

Manual reference table update (#322)

  • Most reference tables did not require updating
  • NOX and SO2 emissions factors: Added new factors for boiler configurations that had not previously been included in the table.
  • Balancing Areas: Added retirement dates for the CFE (July 2018), GLHB (September 2022), and GRIF (November 2023) balancing areas
  • Added new EPA-EIA plant and unit crosswalks based on 2022 data
  • Added several new mappings between utilities and balancing areas

Infrastructure Updates

  • Updates Python dependency from 3.10 to 3.11
  • Refactors and packages the OGE codebase so that functions, reference tables, and data from OGE can be imported into other projects. This package will go live on PyPI soon. (#323)
  • Re-organizes location of data files. The data/manual files have been renamed to reference_tables and moved to src/oge, while all downloads, output files, and result files will now be saved in the user’s home directory in a folder called open_grid_emissions_data (#324)
  • Adds support for pipenv environment management in addition to conda (#313)
  • Changes PUDL and gridemissions dependencies to forks within the singularity-energy organization, rather than forked versions that lived in individual authors’ GitHub accounts.
  • Moves documentation from separately-maintained repo into the OGE repo (#303)
  • Changes code formatting from black to ruff and adds formatting checks that must pass before merging code (#317)
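
For readers who want to locate the new data folder programmatically, the following is a minimal sketch. Only the folder name open_grid_emissions_data and its location in the user’s home directory come from the notes above; the function names and the "outputs" subfolder are hypothetical and may not match the helpers or layout that OGE actually uses.

```python
from pathlib import Path


def default_data_folder() -> Path:
    """Return the expected data folder inside the user's home directory."""
    return Path.home() / "open_grid_emissions_data"


def data_subfolder(name: str, create: bool = False) -> Path:
    """Return a subfolder of the data folder (hypothetical helper).

    The subfolder is only created on disk when create=True.
    """
    folder = default_data_folder() / name
    if create:
        folder.mkdir(parents=True, exist_ok=True)
    return folder


# "outputs" is an assumed subfolder name used only for illustration
outputs_path = data_subfolder("outputs")
```

Because `create` defaults to `False`, building the path does not touch the file system.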

Other bug/data quality fixes

  • Ensures the EPA-EIA power sector data crosswalk (PSDC) is as complete as possible by combining the pudl-standardized PSDC, plant code mappings from eGRID, and our own manual crosswalking.
  • Adds handling for negative fuel consumption reported in EIA-923
  • Stops dropping missing and zero values to help ensure complete timeseries
  • Previously, we had dropped data from CEMS that reflected units that only reported steam generation but no electricity generation. Based on an updated understanding of this data, we no longer drop this data from OGE.
  • Fixes bug in EIA-923 generation and fuel allocation process that was resulting in certain reported fuel consumption data being dropped for plants that retire mid-year
  • Updates manual timestamp corrections to EIA-930 data: CAISO data for 2022 onward (#300) and TEPC data for 2021 onward (#322)

Adds new data validation checks

  • Flags when different plant primary fuel identification methods result in different primary fuel assignments: Exports the primary_fuel_table with all intermediate columns to outputs to help with validation. Adds a new validation check to flag when the plant primary fuel assigned by the pipeline does not match the capacity-based primary fuel assignment. (#296)
  • Flags when subplants only contain a single combined cycle component: Combined cycle generators contain a steam part (CA) and a turbine part (CT) that are linked together. Thus, a subplant group that contains one part of a combined cycle plant should, in theory, always contain the other part as well. This PR adds a test that checks that both parts exist in a subplant if one exists. Besides CT and CA prime movers, there are also CS prime movers, which represent a "single shaft" combined cycle unit where the steam and turbine parts share a single generator. These prime movers are allowed to be by themselves in a subplant, as are CC prime movers, which represent a "total unit." This PR adds a prime_mover_code column to the subplant crosswalk table to help validate this. (#297)
  • Checks for complete monthly data within a single year: Checks that 12 monthly “report_date”s exist for each plant/subplant, and also checks that the number of missing monthly datapoints matches the number of missing datapoints in the input data from CEMS and EIA-923.
  • Checks for complete hourly timestamps within a single year or single month: If the period is a 'year', checks that the length of the timeseries is 8760 (for a non-leap year) or 8784 (for a leap year). If the period is a 'month', checks that the length of the timeseries is equal to the length of the complete date_range between the earliest and latest timestamp in a month. (#299)
  • Exports a new output table that identifies whether input data (and non-zero input data) exists for each plant in EIA-923 and/or CEMS.
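
The hourly completeness check described above can be sketched roughly as follows. This is an illustrative version using only the Python standard library, not the OGE implementation. The function names are hypothetical, and the sketch assumes timestamps are naive datetimes in UTC or another fixed offset, so daylight saving shifts are not handled.

```python
import calendar
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def expected_hours_in_year(year: int) -> int:
    """8784 hours in a leap year, 8760 otherwise."""
    return 8784 if calendar.isleap(year) else 8760


def is_complete_hourly_year(timestamps: list, year: int) -> bool:
    """Check that the series has one entry per hour of the year."""
    return len(timestamps) == expected_hours_in_year(year)


def is_complete_hourly_span(timestamps: list) -> bool:
    """Check that the series length equals the number of hours between
    the earliest and latest timestamp, inclusive."""
    start, end = min(timestamps), max(timestamps)
    expected = (end - start) // ONE_HOUR + 1
    return len(timestamps) == expected


# Build a complete hourly series for 2022 (a non-leap year)
start_2022 = datetime(2022, 1, 1)
full_2022 = [start_2022 + h * ONE_HOUR for h in range(8760)]

# January has 31 days, so its first 744 hours form a complete month
january = full_2022[:744]
# Drop one hour from the middle to simulate a gap
january_with_gap = january[:100] + january[101:]
```

With this sample data, the full 2022 series and the January slice pass their checks, while the January series with one hour removed fails the span check. Note that a length comparison alone would not catch duplicated timestamps that happen to offset a gap.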