Ensure complete `subplant_id` mapping #49

grgmiller · 2022-06-07T17:32:26Z

Currently, subplant IDs are only created for units that exist both in CEMS and EIA-923, meaning that there are certain generators/units that have a subplant ID of NaN.

Ensure that all merge and groupby functions that use subplant_id as one of the keys are not dropping observations with missing subplant values.
Although the primary purpose of the subplant ID is to group CEMS units with EIA generators and boilers, it could also be useful for grouping EIA boilers and generators that do not exist in CEMS. We should update the pudl.analysis.epa_crosswalk code to generate subplant IDs for all boilers/generators that exist in the EIA data, regardless of whether data exists in CEMS.
If there are any remaining missing subplant values, we should perhaps fill these missing values with a code of 99 so that there is a non-missing code that would not overlap with any subplant ids already assigned during the crosswalk process.
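As a rough illustration of the first and third points, here is a minimal pandas sketch. The dataframe, its column values, and the `MISSING_SUBPLANT_CODE` name are made up for the example; only the `subplant_id` key and the fill value of 99 come from the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical generator-level data: plant 2 has no subplant assignment.
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2, 2],
        "subplant_id": [0.0, 1.0, np.nan, np.nan],
        "net_generation_mwh": [10.0, 20.0, 5.0, 7.0],
    }
)
keys = ["plant_id_eia", "subplant_id"]

# By default, groupby silently drops rows where any key is NaN.
dropped = gens.groupby(keys)["net_generation_mwh"].sum()

# dropna=False keeps the NaN key as its own group (pandas >= 1.1).
kept = gens.groupby(keys, dropna=False)["net_generation_mwh"].sum()

# Alternative: fill remaining missing IDs with a sentinel code first.
MISSING_SUBPLANT_CODE = 99  # assumed not to collide with assigned IDs
filled = gens.assign(
    subplant_id=gens["subplant_id"].fillna(MISSING_SUBPLANT_CODE).astype(int)
)
filled_total = filled.groupby(keys)["net_generation_mwh"].sum()

print(dropped.sum(), kept.sum(), filled_total.sum())
```

The default groupby loses plant 2's generation entirely, while the other two approaches keep it.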


grgmiller · 2022-06-14T19:42:45Z

I should check whether subplant ids are used at all in the clean_eia923() function, but adding subplant ids for EIA-only data might become irrelevant (at least for the initial public release) if we are grouping this data by BA-fuel anyway.

grgmiller · 2022-07-07T17:13:30Z

It appears that certain plants/generators that exist in both CEMS and EIA-860 are missing from the crosswalk.

One reason for this might be that we currently inner join the CEMS ids with EIA ids from EIA-923 and not EIA-860, but it is possible that EIA-860 is more complete.
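One way to test this hypothesis is a left merge with `indicator=True`, which lists the CEMS IDs that find no match in each EIA table. The three small tables below are invented for illustration, and the key columns are a simplification (the real CEMS data is keyed by unit rather than generator).

```python
import pandas as pd

keys = ["plant_id_eia", "generator_id"]

# Hypothetical ID tables; real data would come from CEMS, EIA-923, and EIA-860.
cems_ids = pd.DataFrame({"plant_id_eia": [1, 1, 2], "generator_id": ["G1", "G2", "G1"]})
eia923_ids = pd.DataFrame({"plant_id_eia": [1], "generator_id": ["G1"]})
eia860_ids = pd.DataFrame({"plant_id_eia": [1, 1, 2], "generator_id": ["G1", "G2", "G1"]})


def unmatched(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `left` whose keys have no match in `right`."""
    merged = left.merge(right, on=keys, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", keys].reset_index(drop=True)


# An inner join would drop these rows without any warning.
missing_from_923 = unmatched(cems_ids, eia923_ids)
missing_from_860 = unmatched(cems_ids, eia860_ids)
print(len(missing_from_923), len(missing_from_860))
```

If the EIA-860 count is consistently smaller on the real data, that would support joining against EIA-860 instead.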

gailin-p · 2022-07-07T17:14:18Z

One example of a missing plant is plant_id_eia=2379, which has two generators according to EIA-860 (CA1 and CA2).

grgmiller · 2022-07-07T18:58:54Z

At least part of the issue was that filtering the CEMS data with the EPA crosswalk dropped certain units because of a mismatch in unitid: we had stripped leading zeros from the ID in the CEMS data but not in the crosswalk, so the affected plants failed to match. I've now fixed that issue.
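For reference, the mismatch can be reproduced and avoided with a sketch like the one below. The `strip_leading_zeros` helper and the column names other than `unitid` are hypothetical, not the actual code from the fix; the idea is simply to apply the same normalization to both tables before merging.

```python
import pandas as pd


def strip_leading_zeros(ids: pd.Series) -> pd.Series:
    """Strip leading zeros from string IDs, keeping one "0" for all-zero IDs.

    Note: missing values are not handled here; astype(str) turns NaN into "nan".
    """
    ids = ids.astype(str).str.strip()
    return ids.str.replace(r"^0+(?=.)", "", regex=True)


# Hypothetical CEMS data where leading zeros were already stripped.
cems = pd.DataFrame({"plant_id_epa": [3, 3], "unitid": ["1", "2"]})
# Hypothetical crosswalk where the IDs still carry leading zeros.
crosswalk = pd.DataFrame(
    {"plant_id_epa": [3, 3], "unitid": ["001", "002"], "generator_id": ["G1", "G2"]}
)

keys = ["plant_id_epa", "unitid"]

# Merging on the raw IDs drops every unit.
raw_match = cems.merge(crosswalk, on=keys, how="inner")

# Normalizing both sides the same way restores the matches.
cems_norm = cems.assign(unitid=strip_leading_zeros(cems["unitid"]))
crosswalk_norm = crosswalk.assign(unitid=strip_leading_zeros(crosswalk["unitid"]))
fixed_match = cems_norm.merge(crosswalk_norm, on=keys, how="inner")

print(len(raw_match), len(fixed_match))
```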

grgmiller · 2022-07-20T22:34:57Z

Maybe we can get this fixed in PUDL: catalyst-cooperative/pudl#1769
It also looks like EPA is getting ready to release a new version of the crosswalk, which may improve the coverage for subplant mapping: USEPA/camd-eia-crosswalk#25 (comment)

gailin-p · 2022-09-27T17:51:19Z

Fuel category differences within subplants with `subplant_id=NaN`

In some cases, generators in a single plant missing from subplant_crosswalk have a mix of renewable and fossil fuel types. This occurs in 74 subplant-months in plants 141, 621, 1943, 2240, 10025, 10823, and 58236. In these cases, all generators in the plant which are not in subplant_crosswalk are assigned the same subplant, subplant_id=NaN.

In #230, we proposed that subplants within a plant should not share the same CEMS profile (hourly shaping method partial_cems_plant) when they have different primary fuel types, since this resulted in one case where all nuclear generation from a large nuclear power plant (plant_id_eia=2410) was being assigned to the 3 hours where a backup diesel generator was on and reporting to CEMS. However, because renewable and fossil generators are combined in each of the subplants listed above, the renewable and fossil generators cannot be assigned different profiles.
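To make the failure mode concrete, here is a simplified numeric sketch. It assumes shaping is a proportional allocation of a monthly total over an hourly profile, which may not match the actual partial_cems_plant implementation in detail, and all of the numbers are made up.

```python
import numpy as np

hours_in_month = 720

# Made-up CEMS profile: the backup diesel unit reports in only 3 hours.
cems_profile = np.zeros(hours_in_month)
cems_profile[[100, 101, 102]] = 1.0

# Made-up monthly nuclear generation to be shaped (MWh).
monthly_nuclear_mwh = 720_000.0

# Proportional shaping: each hour gets its share of the profile total.
shaped = monthly_nuclear_mwh * cems_profile / cems_profile.sum()

hours_with_generation = int(np.count_nonzero(shaped))
print(hours_with_generation, shaped.max())
```

The whole month of nuclear output lands in the three diesel hours, which is why the fuel types need separate profiles.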

If the renewable and fossil generators were assigned different subplants, we could safely use partial_cems_plant to shape the subplant with the fossil generators and a residual profile method to shape the subplant with the renewable generators. This would be conceptually more correct than choosing one method to apply to a subplant with mixed fossil and renewable generation.

To fix this, we would need to update the subplant crosswalk to assign different subplant IDs to generators within a plant whose fuel types differ (see @grgmiller's comments above; we could potentially do this in PUDL).
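A rough sketch of what that assignment could look like in pandas follows. This is only one possible approach under stated assumptions, not the crosswalk code itself: the `fuel_category` column, the helper name, and the sample rows are hypothetical. New IDs start one past the largest existing `subplant_id` in the plant so they cannot collide with IDs from the crosswalk.

```python
import numpy as np
import pandas as pd

# Hypothetical plant: generator A is in the crosswalk, B-D are not.
gens = pd.DataFrame(
    {
        "plant_id_eia": [10, 10, 10, 10],
        "generator_id": ["A", "B", "C", "D"],
        "fuel_category": ["natural_gas", "solar", "solar", "natural_gas"],
        "subplant_id": [0.0, np.nan, np.nan, np.nan],
    }
)


def fill_missing_subplants_by_fuel(df: pd.DataFrame) -> pd.DataFrame:
    """Give unmapped generators one new subplant_id per plant and fuel category."""
    out = df.copy()
    missing = out["subplant_id"].isna()
    # First free ID per plant: one past the largest existing ID, or 0 if none.
    next_id = (
        out.groupby("plant_id_eia")["subplant_id"].transform("max").add(1).fillna(0)
    )
    unmapped = out.loc[missing]
    # Number each (plant, fuel) group, then restart the count at 0 within each plant.
    group_no = unmapped.groupby(["plant_id_eia", "fuel_category"]).ngroup()
    fuel_rank = group_no - group_no.groupby(unmapped["plant_id_eia"]).transform("min")
    out.loc[missing, "subplant_id"] = next_id[missing] + fuel_rank
    out["subplant_id"] = out["subplant_id"].astype(int)
    return out


result = fill_missing_subplants_by_fuel(gens).set_index("generator_id")["subplant_id"]
print(result.to_dict())
```

In this example the two solar generators share one new subplant and the unmapped gas generator gets another, so each can be given its own shaping method.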

gailin-p · 2022-09-27T17:52:05Z

> adding subplant ids for EIA-only data might become irrelevant (at least for the initial public release) if we are grouping this data by BA-fuel anyway.

Since hourly data is shaped at the subplant level, I think this does end up affecting currently released data.

grgmiller · 2022-09-28T16:06:04Z

I think that one way to fix this issue would be to take advantage of the existing unit_id_pudl identifiers created by the pudl data pipeline (see the "Unit mapping through network analysis" section of this blog post for more information). These unit_id_pudl are created using the same network analysis that is used for the subplant_id mapping, but only based on EIA data. However, in order to use these unit_id_pudl alongside the subplant_id, the two would likely need to be harmonized (or potentially just used as two separate keys). See catalyst-cooperative/pudl#1769 for more background on this harmonization issue.
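Before harmonizing, it may help to check where the two identifiers disagree. The sketch below uses an invented generator table to flag any `unit_id_pudl` that spans more than one `subplant_id`; the column layout is an assumption.

```python
import pandas as pd

# Hypothetical generator table carrying both identifiers.
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1],
        "generator_id": ["G1", "G2", "G3", "G4"],
        "subplant_id": [0, 0, 1, 2],
        "unit_id_pudl": [1, 1, 2, 2],
    }
)

# Count the distinct subplant IDs inside each PUDL unit.
n_subplants = gens.groupby(["plant_id_eia", "unit_id_pudl"])["subplant_id"].nunique()

# Units that map to more than one subplant need harmonizing.
conflicts = n_subplants[n_subplants > 1]
print(conflicts)
```

Here PUDL unit 2 at plant 1 is split across subplants 1 and 2, so the two keys could not be used interchangeably for that plant.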

grgmiller · 2022-09-29T00:09:09Z

As noted in catalyst-cooperative/pudl#1769 (comment), I've actually noticed that the current subplant id mapping is not behaving as expected (mapping units to generators and boilers) because it ignores all of the boiler-generator associations.
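To illustrate why the boiler-generator associations matter, here is a toy version of the grouping step. It treats units, generators, and boilers as graph nodes and finds connected components with a plain union-find, which is a stand-in for the network analysis in the real code. All IDs and edges are invented.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical unit-to-generator links from the crosswalk.
crosswalk_edges = [(("unit", "1"), ("gen", "G1")), (("unit", "2"), ("gen", "G2"))]
# Hypothetical boiler-generator associations: one boiler feeds both generators.
boiler_gen_edges = [(("boiler", "B1"), ("gen", "G1")), (("boiler", "B1"), ("gen", "G2"))]

without_bga = connected_components(crosswalk_edges)
with_bga = connected_components(crosswalk_edges + boiler_gen_edges)
print(len(without_bga), len(with_bga))
```

Ignoring the boiler edges yields two separate groups, while including them merges everything into one, since the shared boiler ties both generators together.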

grgmiller added the methodology (Improve methodology) and data cleaning (Cleaning and standardizing data) labels Jun 7, 2022

grgmiller added this to the Initial Public Release milestone Jun 7, 2022

grgmiller self-assigned this Jun 7, 2022

grgmiller mentioned this issue Jul 8, 2022

fix missing subplant ids #136

Merged

grgmiller mentioned this issue Jul 17, 2022

hourly profile updates #142

Merged

grgmiller modified the milestones: Initial Public Release, Version 2 Release Jul 24, 2022

gailin-p mentioned this issue Sep 28, 2022

Update partial CEMS shaping for mixed fuel plants #238

Merged

This was referenced Oct 1, 2022

Update subplant mapping #239

Merged

Update pudl subplant crosswalk code #242

Open

grgmiller added the crosswalk (improve crosswalking between data sources) label Jan 7, 2023
