Spikes in emission rate data #230

gailin-p · 2022-09-13T17:00:16Z

Some BAs have negative spikes in their emission rate data (negative meaning that the rate deviates downward from the trend).

For example, PJM:

WACM:

Both figures above compare real-time data (red) to OGE data (blue), and both show the adjusted emission rate (CO2/MWh). In the OGE data, the relevant variable is `generated_co2_rate_lb_per_mwh_for_electricity_adjusted` in `power_sector_data`.

In some small BAs, rate spikes are expected (e.g., DEAA, where there is only one natural gas plant, so a plant startup leads to enormous positive BA-level emission rate spikes). However, we don't expect spikes that are abrupt decreases in rate (e.g., WACM), and we don't expect spikes in large BAs (e.g., PJM).


grgmiller · 2022-09-13T18:26:46Z

Ok so I went digging specifically into the PJM data and this is what I found:

There are some pretty big spikes in the nuclear net generation data in PJM:

So how is this getting introduced? Through the residual profile shaping. The CEMS profile for nuclear in PJM includes some large spikes in generation that are subtracted from the nuclear generation in EIA-930 (although I thought that we had implemented a check that prevents such shifted profiles from being used).

So how is there nuclear data in CEMS? Nuclear plants don't report data to CEMS! What's happening is that the PSEG Salem Generation Station in New Jersey (plant ID 2410) is a nuclear plant that has an oil-fired generator (likely some sort of backup generator) that does report data to CEMS, and that seems to have been briefly fired up three times in 2020. When we aggregate the CEMS data for use in the residual calculation, we aggregate by the plant's primary fuel rather than the generator-specific fuel, so this intermittent petroleum generation at a nuclear plant is treated as nuclear generation.
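
A minimal sketch of the difference between the two aggregation keys, in plain Python. The record fields (`plant_id`, `generator_id`, `fuel`, `hour`, `generation_mwh`), the `PLANT_PRIMARY_FUEL` lookup, and plant 3000 are hypothetical illustrations rather than the pipeline's actual schema, and the values are toy numbers, not real CEMS data.

```python
from collections import defaultdict

# Hypothetical plant-level metadata: one primary fuel per plant.
PLANT_PRIMARY_FUEL = {2410: "nuclear", 3000: "natural_gas"}

# Hypothetical hourly CEMS-style records (toy values, not real data).
cems_records = [
    {"plant_id": 2410, "generator_id": "GT1", "fuel": "petroleum", "hour": 0, "generation_mwh": 5.0},
    {"plant_id": 3000, "generator_id": "CT1", "fuel": "natural_gas", "hour": 0, "generation_mwh": 200.0},
]


def aggregate_by_primary_fuel(records):
    """Sum generation per (plant primary fuel, hour); backup units inherit the plant's fuel."""
    totals = defaultdict(float)
    for r in records:
        fuel = PLANT_PRIMARY_FUEL[r["plant_id"]]
        totals[(fuel, r["hour"])] += r["generation_mwh"]
    return dict(totals)


def aggregate_by_generator_fuel(records):
    """Sum generation per (generator fuel, hour); backup units keep their own fuel."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["fuel"], r["hour"])] += r["generation_mwh"]
    return dict(totals)


print(aggregate_by_primary_fuel(cems_records))    # petroleum MWh lands under "nuclear"
print(aggregate_by_generator_fuel(cems_records))  # petroleum MWh stays under "petroleum"
```

Keying on the generator's own fuel keeps the backup unit's petroleum generation out of the nuclear bucket.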

It seems like this issue might also affect the Turkey Point nuclear plant (ID 621) in FPL. It may also affect other fuel types that don't report to CEMS but may have a backup generator that does (solar, wind, hydro, nuclear).

So how do we fix this? The option that first comes to mind is that for fuel types that don't report to CEMS (solar, wind, hydro, nuclear), we set the CEMS profiles to zero before calculating the residual profile. This would prevent any backup generators from influencing the residual profile. However, we'd want to think carefully about this approach to ensure that we are not double-counting or undercounting any data.
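
A rough sketch of the proposed fix, assuming the residual is simply EIA-930 generation minus CEMS generation for the same fuel and hour. `NON_CEMS_FUELS` and `residual_profile` are hypothetical names for illustration, not the pipeline's actual residual code.

```python
# Fuel categories that, per the discussion above, do not report to CEMS.
NON_CEMS_FUELS = {"solar", "wind", "hydro", "nuclear"}


def residual_profile(eia930, cems, fuel):
    """Hourly residual = EIA-930 generation minus CEMS generation for one fuel.

    For fuels that should not appear in CEMS, the CEMS profile is treated as
    zero so stray backup-generator data cannot distort the residual. The input
    CEMS list itself is left untouched; only the copy used here is zeroed.
    """
    cems_used = [0.0] * len(cems) if fuel in NON_CEMS_FUELS else cems
    return [total - partial for total, partial in zip(eia930, cems_used)]


eia930 = [3300.0, 3300.0, 3300.0]
cems = [0.0, 9.0, 0.0]
print(residual_profile(eia930, cems, "nuclear"))      # backup hour ignored
print(residual_profile(eia930, cems, "natural_gas"))  # CEMS subtracted as usual
```

Zeroing a copy rather than the source data mirrors the point that the actual CEMS table should be left intact.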

It actually looks like there are a number of plants with a primary clean fuel, but which have at least one generator that runs on a non-clean fuel:

gailin-p · 2022-09-13T20:09:37Z

> I thought that we had implemented a check that prevents such shifted profiles from being used

In `impute_hourly_profiles.select_best_available_profile`, it looks like we 1) check residual profiles for negative values and 2) check the shifted profile and skip it if it's greater than EIA-930. It seems like that check isn't working as expected here.
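
For reference, the two checks described above can be sketched as a simple fallback chain. `select_profile` is a hypothetical stand-in for `impute_hourly_profiles.select_best_available_profile`; the candidate ordering (raw residual, then shifted residual, then the EIA-930 profile) and the way the shift is computed are assumptions made for illustration.

```python
def select_profile(eia930, residual):
    """Pick an hourly profile using the two checks described above.

    Hypothetical fallback order: raw residual, then shifted residual, then EIA-930.
    Returns a (profile_name, profile) tuple.
    """
    # Check 1: a residual profile with negative hours is not used as-is.
    if all(v >= 0 for v in residual):
        return "residual_profile", residual
    # Assumed shift: raise the residual so that its minimum becomes zero.
    offset = -min(residual)
    shifted = [v + offset for v in residual]
    # Check 2: the shifted profile is rejected if it ever exceeds EIA-930.
    if all(s <= e for s, e in zip(shifted, eia930)):
        return "shifted_residual_profile", shifted
    return "eia930_profile", eia930


# A huge negative spike in the residual pushes the shifted profile above
# EIA-930, so the selection falls back to the EIA-930 profile.
print(select_profile([3300.0] * 3, [3300.0, -50000.0, 3300.0])[0])
```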

> we set the CEMS profiles to zero before calculating the residual profile.

I think this would work, and it shouldn't introduce double counting or undercounting, since the residual profile is not used directly as data, only to shape data from EIA-923. We would just have to be sure not to delete the data from the actual CEMS table, only from the version passed to the residual profile calculation.
However, I think we should only do this if the problem persists after fixing the bug above.

grgmiller · 2022-09-13T20:45:42Z

I think another root issue here that I'm just realizing is that even if the CEMS data included some small amount of backup generation from a diesel generator, the magnitude of that should be pretty small and not affect the residual profile that much. Looking at the data here though, the diesel generation spikes are many orders of magnitude higher than the total reported nuclear generation.

This raises the possibility that the source of this issue is actually outlier data in CEMS (#50).

Zooming in on the eia930 profile for nuclear, I'm also noticing something else funny, which is that the nuclear profile seems to be pretty variable:

However, looking at the reported EIA-930 data for nuclear in January 2020 reveals what we would expect: that nuclear should be pretty flat:

I think this is likely a result of the physics-based reconciliation code. It makes me think that we need to either adjust the weighting in the reconciliation so that certain fuel types, like nuclear, can't be adjusted in this way, or go back to using the raw generation data for the residual calculation (#104).

gailin-p · 2022-09-26T15:20:33Z

It turns out that this is not an issue with shaping at all: our filters work correctly. Although the cems_profile, shifted_residual_profile, and scaled_residual_profile are all broken by the extreme CEMS values in the three problem hours in PJM, the selected profile in those hours is eia930_profile. While this profile still has the issues @grgmiller noted above (gridemissions physics-based cleaning has introduced unrealistic variability into the 930 profile for nuclear), it does not have the extreme spikes. The profile column in outputs/2020/hourly_profiles_2020.csv supports this conclusion, with no extreme spikes for PJM nuclear.

~~The spikes are instead introduced directly by the CEMS data from plant 2410.~~ The current outputs and results show the following:

* the spikes do not show up in shaped EIA-923 data (`outputs/2020/shaped_eia923_data_2020.csv`)
* the spikes do show up in plant-level outputs (`results/2020/plant_data/hourly/us_units/individual_plant_data.csv`)

~~There are two possible solutions to this:~~
* CEMS data cleaning. A simple outlier filter would discard these three days. However, we would need to be careful not to discard correct fossil generation from backup generators at renewable plants, which may still look like outliers (if most hours have zero CEMS generation) but may represent real backup generation.
* Discard CEMS data from renewable generators. I think this is the incorrect choice, since the backup generation is real (see the plants listed by @grgmiller above). That said, I do think this backup generation requires more thought, e.g.: do our gross-to-net conversion methods work for backup generation from mostly renewable plants? Should net generation ever be positive from a backup generator?
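
To illustrate the concern in the first bullet, here is a toy comparison of a generic statistical outlier filter and a physical bound. Both functions are hypothetical, the 20 MW capacity is an assumed value, and the run hours are toy numbers over the 8784 hours of 2020.

```python
import statistics


def mad_outliers(values, threshold=5.0):
    """Flag values far from the median, measured in median absolute deviations."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: any value that differs from the median is flagged.
        return [v != med for v in values]
    return [abs(v - med) / mad > threshold for v in values]


def capacity_outliers(values, capacity_mw):
    """Flag hourly MWh values larger than the unit could physically produce."""
    return [v > capacity_mw for v in values]


# Mostly-zero backup-generator series with 3 real run hours (toy values).
series = [0.0] * 8784
for hour, mwh in [(100, 1.0), (2000, 9.0), (5000, 4.0)]:
    series[hour] = mwh

print(sum(mad_outliers(series)))             # 3: all real backup hours flagged
print(sum(capacity_outliers(series, 20.0)))  # 0: kept under an assumed 20 MW capacity
```

Because most hours are zero, the median absolute deviation is zero and every hour with real backup generation is flagged; a capacity-based bound only removes values that are physically impossible.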

gailin-p · 2022-09-26T18:34:06Z

It turns out that the spikes are actually introduced by scaling partial CEMS data; specifically, the partial_cems_plant category.

The data

The original CEMS data for plant 2410 is fine, showing between 1 and 9 MWh of generation for each of the three hours in 2020 where the backup generator is on:

Fig 1: CEMS generation (blue) for 2020 for plant 2410. A backup generator appears to be on and reporting to CEMS for a total of 3 hours. This data is from our output/cems_2020.csv intermediate output file.

However, in our final output, we've scaled all April, May, and June generation from 2410 into those 4 hours, resulting in the following profile:

Fig 2: Plant 2410 generation from our results/plant_data/individual_plant_data.csv result file.

The issue

Plant 2410 is shaped using the partial_cems_plant methodology. This category of data includes plants where one or more subplants in a plant report to CEMS and one or more remaining subplants do not. In this case, we shape the non-reporters using the reporter shape. For plant 2410, all of the 2020 nuclear generation appears to be allocated to the three hours with non-zero CEMS generation for the one reporting subplant in 2410.
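
The concentration effect follows directly from proportional shaping. Assuming the non-reporting generation for a month is allocated to hours in proportion to the reporting subplant's CEMS profile (the function name and the monthly total below are hypothetical toy values), a reporter that runs for a single hour receives the entire monthly total in that hour.

```python
def shape_with_reporter_profile(monthly_total_mwh, reporter_hourly_mwh):
    """Distribute a monthly total across hours in proportion to a reporter's profile."""
    reporter_total = sum(reporter_hourly_mwh)
    if reporter_total == 0:
        raise ValueError("reporter profile has no generation to shape with")
    return [monthly_total_mwh * h / reporter_total for h in reporter_hourly_mwh]


# Toy 30-day month: the oil-fired backup unit reports 9 MWh in one hour only.
reporter = [0.0] * 720
reporter[300] = 9.0
shaped = shape_with_reporter_profile(1_800_000.0, reporter)
print(max(shaped), shaped.count(0.0))  # entire month in one hour, 719 empty hours
```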

The fix

I think the fix here is to make partial_cems_plant sensitive to fuel type: subplants of one fuel type should not be shaped if the available CEMS data is from a different fuel type. This change should be made in data_cleaning.identify_partial_cems_plants.
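
A minimal sketch of the fuel-sensitive rule, assuming each subplant carries a single fuel label and a flag for whether it reports to CEMS. The `Subplant` type and `shapeable_partial_cems_subplants` are hypothetical; the real `data_cleaning.identify_partial_cems_plants` likely operates on dataframes with different columns. The subplant IDs and plant 9999 are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subplant:
    plant_id: int
    subplant_id: int
    fuel: str
    reports_to_cems: bool


def shapeable_partial_cems_subplants(subplants):
    """Return (plant_id, subplant_id) for non-reporting subplants that have a
    CEMS-reporting subplant of the same fuel at the same plant."""
    reporting_fuels = {}
    for s in subplants:
        if s.reports_to_cems:
            reporting_fuels.setdefault(s.plant_id, set()).add(s.fuel)
    return [
        (s.plant_id, s.subplant_id)
        for s in subplants
        if not s.reports_to_cems and s.fuel in reporting_fuels.get(s.plant_id, set())
    ]


subplants = [
    Subplant(2410, 1, "nuclear", False),
    Subplant(2410, 2, "petroleum", True),
    Subplant(9999, 1, "natural_gas", True),
    Subplant(9999, 2, "natural_gas", False),
]
print(shapeable_partial_cems_subplants(subplants))  # nuclear subplant at 2410 excluded
```

Under this rule the nuclear subplant at 2410 would fall back to another shaping method instead of borrowing the petroleum unit's profile.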

Side note: I confirmed that this issue does not appear to result in double-counting. After checking our nuclear generation totals against eGRID nuclear generation totals for PJM, it looks like generation from 2410 is not being double-counted.
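
A check like this could be scripted as a tolerance comparison between summed plant-level totals and an independent BA-level total (here standing in for eGRID). The function name, the 0.5% tolerance, plant 1000, and all values are hypothetical toy numbers.

```python
import math


def totals_consistent(plant_records, reference_total_mwh, rel_tol=0.005):
    """Return True if summed plant-level generation is within rel_tol of the reference.

    plant_records is a list of (plant_id, annual_generation_mwh) tuples; a plant
    that was double-counted would appear twice and inflate the sum.
    """
    pipeline_total = sum(mwh for _, mwh in plant_records)
    return math.isclose(pipeline_total, reference_total_mwh, rel_tol=rel_tol)


records = [(2410, 18_000_000.0), (1000, 20_000_000.0)]  # toy values
print(totals_consistent(records, 38_000_000.0))                            # True
print(totals_consistent(records + [(2410, 18_000_000.0)], 38_000_000.0))  # False
```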

grgmiller · 2022-09-27T03:38:03Z

> I think the fix here is to make partial_cems_plant sensitive to fuel type: subplants of one fuel type should not be shaped if the available CEMS data is from a different fuel type. This change should be made in data_cleaning.identify_partial_cems_plants.

Agreed! Let's definitely implement this.

grgmiller · 2022-09-29T17:51:17Z

Is there a validation check that we could add to the pipeline to automatically screen for spikes like this and alert us?
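
One possible screen, sketched under assumptions: compare each hour with the median of its neighbours in a centered window and flag large relative deviations in either direction, which would catch both positive spikes and abrupt decreases like the one in WACM. `flag_spikes`, the 24-hour window, and the 50% threshold are hypothetical and would need tuning per BA, since small BAs such as DEAA are expected to spike.

```python
import statistics


def flag_spikes(series, window=24, rel_threshold=0.5):
    """Flag hours that deviate from the median of their neighbours.

    The baseline for hour i is the median of up to `window` surrounding hours
    (hour i itself excluded). An hour is flagged when its relative deviation
    from that baseline exceeds `rel_threshold`, in either direction.
    """
    half = window // 2
    flags = []
    for i, value in enumerate(series):
        neighbours = series[max(0, i - half):i] + series[i + 1:i + half + 1]
        if not neighbours:
            flags.append(False)
            continue
        baseline = statistics.median(neighbours)
        if baseline == 0:
            flags.append(value != 0)
        else:
            flags.append(abs(value - baseline) / abs(baseline) > rel_threshold)
    return flags


# Toy hourly CO2 rates (lb/MWh): flat, with one negative and one positive spike.
rates = [1000.0] * 48
rates[10] = 200.0
rates[30] = 5000.0
print([i for i, flagged in enumerate(flag_spikes(rates)) if flagged])  # [10, 30]
```

A median baseline is used rather than a mean so that a single extreme hour does not drag the baseline toward itself.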

gailin-p added the `bug` (Something isn't working) and `data repair` (Interpolating or extrapolating data that we don't actually have) labels Sep 13, 2022

grgmiller added this to the v0.2.0 milestone Sep 13, 2022

gailin-p self-assigned this Sep 26, 2022

gailin-p mentioned this issue Sep 27, 2022 in Ensure complete subplant_id mapping #49 (Open)

gailin-p mentioned this issue Sep 28, 2022 in Update partial CEMS shaping for mixed fuel plants #238 (Merged)

This was referenced Oct 1, 2022:

* Update subplant mapping #239 (Merged)
* Adjust physics-based reconciliation of EIA-930 data #240 (Open)

grgmiller mentioned this issue Oct 26, 2022 in v0.1.2 #251 (Merged)

gailin-p closed this as completed in 283cce2 Oct 26, 2022
