Spikes in emission rate data #230

Closed
gailin-p opened this issue Sep 13, 2022 · 7 comments

Comments

@gailin-p
Collaborator

gailin-p commented Sep 13, 2022

Some BAs have negative spikes in emission rate data (negative meaning they drop abruptly below the trend).

Eg, PJM:
Screen Shot 2022-09-13 at 12 48 40 PM

WACM:
Screen Shot 2022-09-13 at 12 50 11 PM

Both the above figures compare real-time (red) to OGE (blue). Both show adjusted emission rate (CO2/MWh). In the OGE data, the relevant variable is generated_co2_rate_lb_per_mwh_for_electricity_adjusted in power_sector_data.

In some small BAs, rate spikes are expected (eg, DEAA, where there is only one natural_gas plant, so plant startup leads to enormous positive BA-level emission rate spikes). However, we don't expect spikes that are abrupt decreases in rate (eg, WACM), and we don't expect spikes in large BAs (eg, PJM).

@gailin-p gailin-p added bug Something isn't working data repair Interpolating or extrapolating data that we don't actually have labels Sep 13, 2022
@grgmiller
Collaborator

Ok so I went digging specifically into the PJM data and this is what I found:

There are some pretty big spikes in the nuclear net generation data in PJM:
image

So how is this getting introduced? Through the residual profile shaping. The CEMS profile for nuclear in PJM includes some large spikes in generation, and these get subtracted from the nuclear generation in EIA-930 (although I thought that we had implemented a check that prevents such shifted profiles from being used).
image

So how is there nuclear data in CEMS? Nuclear plants don't report data to CEMS! What's happening is that the PSEG Salem Generation Station in New Jersey (plant ID 2410) is a nuclear plant that has an oil-fired generator (likely some sort of backup generator) that does report data to CEMS, and seems to have been briefly fired up three times in 2020. When we aggregate the CEMS data for use in the residual calculation, we do this by the plant primary fuel, and not the generator-specific fuel, which means this intermittent petroleum generation at this nuclear plant is treated as nuclear generation.
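To illustrate the aggregation problem described above, here is a minimal sketch with made-up numbers. The record layout, the `aggregate` helper, the MWh values, and plant 9999 are hypothetical and are not the pipeline's actual code; only plant 2410 and its fuels come from this thread.

```python
from collections import defaultdict

# Hypothetical hourly CEMS records:
# (plant_id, generator_fuel, plant_primary_fuel, generation_mwh)
cems_records = [
    (2410, "petroleum", "nuclear", 5.0),          # backup generator at a nuclear plant
    (9999, "natural_gas", "natural_gas", 300.0),  # made-up gas plant for contrast
]


def aggregate(records, fuel_index):
    """Sum generation by the fuel label found at position fuel_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[fuel_index]] += record[3]
    return dict(totals)


# Grouping by plant primary fuel labels the petroleum MWh as nuclear.
by_plant_primary_fuel = aggregate(cems_records, fuel_index=2)
# Grouping by generator fuel keeps it under petroleum.
by_generator_fuel = aggregate(cems_records, fuel_index=1)

print(by_plant_primary_fuel)
print(by_generator_fuel)
```

With plant-level grouping the backup generation lands in the nuclear bucket, which is how a "nuclear" CEMS profile can exist at all.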

It seems like this issue might also be relevant to the Turkey Point nuclear plant (id 621) in FPL. This may also affect other fuel types that don't report to CEMS, but may have a backup generator that does (solar, wind, hydro, nuclear).

So how do we fix this? The option that first comes to mind is that for fuel types that don't report to CEMS (solar, wind, hydro, nuclear), we set the CEMS profiles to zero before calculating the residual profile. This would prevent any backup generators from influencing the residual profile. However, we'd want to think carefully about this approach to ensure that we are not double-counting or undercounting any data.

It actually looks like there are a number of plants with a primary clean fuel, but which have at least one generator that runs on a non-clean fuel:
image
image

@gailin-p
Collaborator Author

> I thought that we had implemented a check that prevents such shifted profiles from being used

In impute_hourly_profiles.select_best_available_profile, it looks like we 1) check residual profiles for negative values and 2) check the shifted profile and skip it if it's greater than EIA-930. It seems like that check isn't working as expected here.
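For reference, the two checks could look roughly like the sketch below. This is not the actual `select_best_available_profile` code: the `choose_profile` helper, the way the shifted profile is built (shifting the residual up so its minimum is zero), and all numbers are assumptions for illustration.

```python
def choose_profile(eia930, cems):
    """Pick a profile for one BA/fuel, falling back when a check fails.

    Hypothetical helper: residual = EIA-930 minus CEMS, hour by hour.
    """
    residual = [e - c for e, c in zip(eia930, cems)]

    # Check 1: a residual with no negative values can be used directly.
    if min(residual) >= 0:
        return "residual_profile", residual

    # Assumed definition: shift the residual up so its minimum is zero.
    shift = -min(residual)
    shifted = [r + shift for r in residual]

    # Check 2: reject the shifted profile if it ever exceeds EIA-930.
    if all(s <= e for s, e in zip(shifted, eia930)):
        return "shifted_residual_profile", shifted

    # Otherwise fall back to the EIA-930 profile itself.
    return "eia930_profile", list(eia930)


# Made-up flat nuclear profile with one extreme CEMS hour.
name, profile = choose_profile([1000.0, 1000.0, 1000.0], [0.0, 5000.0, 0.0])
print(name, profile)
```

In this toy case the residual goes negative and the shifted profile exceeds EIA-930, so both checks reject and the EIA-930 profile is selected.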

> we set the CEMS profiles to zero before calculating the residual profile.

I think this would work, and it shouldn't introduce issues with double counting or not counting data, since the residual profile is not used directly as data, only to shape data from EIA-923. We would just have to be sure not to delete the data from the actual CEMS table, only from the version passed to the residual profile calculation
While this would work, I think we should only do it if the problem persists after fixing the above bug.
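A minimal sketch of zeroing the profiles only in the copy handed to the residual calculation, leaving the real CEMS data untouched. The `cems_profiles_for_residual` helper, the dict-of-lists layout, and the numbers are all hypothetical; the fuel list is the one proposed above.

```python
import copy

# Fuel types that are not expected to report to CEMS (from the proposal above).
NON_CEMS_FUELS = {"solar", "wind", "hydro", "nuclear"}


def cems_profiles_for_residual(cems_profiles):
    """Return a copy of the profiles with non-CEMS fuels set to zero.

    The input mapping (fuel -> hourly MWh list) is never modified.
    """
    cleaned = copy.deepcopy(cems_profiles)
    for fuel, profile in cleaned.items():
        if fuel in NON_CEMS_FUELS:
            cleaned[fuel] = [0.0] * len(profile)
    return cleaned


# Made-up profiles: the "nuclear" entry is really backup generation.
original = {"nuclear": [0.0, 5.0, 0.0], "natural_gas": [300.0, 310.0, 305.0]}
for_residual = cems_profiles_for_residual(original)

print(for_residual["nuclear"])  # zeroed copy
print(original["nuclear"])      # source data unchanged
```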

@grgmiller
Collaborator

I think another root issue here that I'm just realizing is that even if the CEMS data included some small amount of backup generation from a diesel generator, the magnitude of that should be pretty small and not affect the residual profile that much. Looking at the data here though, the diesel generation spikes are many orders of magnitude higher than the total reported nuclear generation.
image

This raises the possibility that the source of this issue is actually outlier data in CEMS (#50)

Zooming in on the eia930 profile for nuclear, I'm also noticing something else funny, which is that the nuclear profile seems to be pretty variable:
image

However, looking at the reported EIA-930 data for nuclear in January 2020 reveals what we would expect: that nuclear should be pretty flat:
image

I'm thinking that this is likely a result of the physics reconciliation code, and makes me think that we need to adjust the weighting on the reconciliation so that certain fuel types like nuclear can't be adjusted in this way, or go back to using the raw generation data for the residual calculation (#104).

@grgmiller grgmiller added this to the v0.2.0 milestone Sep 13, 2022
@gailin-p gailin-p self-assigned this Sep 26, 2022
@gailin-p
Collaborator Author

gailin-p commented Sep 26, 2022

It turns out that this is not an issue with shaping at all -- our filters work correctly. Although the cems_profile, shifted_residual_profile, and scaled_residual_profile are all broken by the extreme CEMS values in the three problem hours in PJM, the selected profile in those hours is eia930_profile. While this profile still has the issues @grgmiller noted above (gridemissions physics-based cleaning has introduced unrealistic variability into the 930 profile for nuclear), it does not have the extreme spikes. The profile column in outputs/2020/hourly_profiles_2020.csv supports this conclusion, with no extreme spikes for PJM nuclear.

The spikes are instead introduced directly by the CEMS data from plant 2410. This is supported by the following state of current output and result data:

  • the spikes do not show up in shaped 923 data (outputs/2020/shaped_eia923_data_2020.csv)
  • the spikes do show up in plant-level outputs (results/2020/plant_data/hourly/us_units/individual_plant_data.csv)

There are two possible solutions to this:
* CEMS data cleaning. A simple outlier filter would discard these three days. However, we would need to be careful to not discard correct fossil generation from renewable generators, which may still be outliers (if most hours have zero CEMS generation) but may represent real backup generation.
* Discard CEMS data from renewable generators. I think this is the incorrect choice, since the backup generation is real (see plants listed by @grgmiller above). Although I do think this backup generation requires more thought. eg: do our gross-to-net conversion methods work for backup generation from mostly renewable plants? Should net generation ever be positive from a backup generator?

@gailin-p
Collaborator Author

gailin-p commented Sep 26, 2022

It turns out that the spikes are actually introduced by scaling partial CEMS data; specifically, the partial_cems_plant category.

The data

The original CEMS data for plant 2410 is fine, showing between 1 and 9 MWh of generation for each of the three hours in 2020 where the backup generator is on:

Screen Shot 2022-09-26 at 2 29 04 PM
Fig 1: CEMS generation (blue) for 2020 for plant 2410. A backup generator appears to be on and reporting to CEMS for a total of 3 hours. This data is from our output/cems_2020.csv intermediate output file.

However, in our final output, we've scaled all April, May, and June generation from 2410 into those 3 hours, resulting in the following profile:

Screen Shot 2022-09-26 at 2 33 00 PM
Fig 2: Plant 2410 generation from our results/plant_data/individual_plant_data.csv result file.

The issue

Plant 2410 is shaped using the partial_cems_plant methodology. This category of data includes plants where one or more subplants in a plant report to CEMS and one or more remaining subplants do not. In this case, we shape the non-reporters using the reporter shape. For plant 2410, all of the 2020 nuclear generation appears to be allocated to the three hours with non-zero CEMS generation for the one reporting subplant in 2410.
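To make the failure mode concrete, here is a small worked sketch of shaping a total with a reporter profile. The `shape_total` helper and every number are made up for illustration and are not taken from the pipeline or from plant 2410's actual data.

```python
def shape_total(total_mwh, reporter_profile):
    """Distribute a total across hours in proportion to the reporter profile."""
    profile_sum = sum(reporter_profile)
    if profile_sum == 0:
        raise ValueError("reporter profile is all zeros; nothing to shape with")
    return [total_mwh * value / profile_sum for value in reporter_profile]


# Hypothetical backup generator: on for only two of six hours.
reporter_profile = [0.0, 0.0, 2.0, 0.0, 6.0, 0.0]
# Hypothetical total for the non-reporting (nuclear) subplants.
nuclear_total_mwh = 800_000.0

shaped = shape_total(nuclear_total_mwh, reporter_profile)
print(shaped)
```

The total is conserved, but all of it is packed into the hours where the backup generator happened to run, which produces exactly the kind of plant-level spike seen in Fig 2.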

The fix

I think the fix here is to make partial_cems_plant sensitive to fuel type: subplants of one fuel type should not be shaped if the available CEMS data is from a different fuel type. This change should be made in data_cleaning.identify_partial_cems_plants.
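A rough sketch of what a fuel-sensitive rule could look like. This is not the existing `identify_partial_cems_plants` code: the `shapeable_with_cems` helper, the record fields, the subplant IDs, and plant 1234 are hypothetical.

```python
def shapeable_with_cems(subplants):
    """Return (plant_id, subplant_id) pairs that may be shaped with CEMS data.

    A non-reporting subplant qualifies only if a reporting subplant
    at the same plant burns the same fuel.
    """
    reporting_fuels = {
        (s["plant_id"], s["fuel"]) for s in subplants if s["reports_to_cems"]
    }
    return [
        (s["plant_id"], s["subplant_id"])
        for s in subplants
        if not s["reports_to_cems"]
        and (s["plant_id"], s["fuel"]) in reporting_fuels
    ]


subplants = [
    # Plant 2410: nuclear subplant plus a petroleum backup that reports to CEMS.
    {"plant_id": 2410, "subplant_id": 1, "fuel": "nuclear", "reports_to_cems": False},
    {"plant_id": 2410, "subplant_id": 2, "fuel": "petroleum", "reports_to_cems": True},
    # Made-up gas plant where reporter and non-reporter share a fuel.
    {"plant_id": 1234, "subplant_id": 1, "fuel": "natural_gas", "reports_to_cems": True},
    {"plant_id": 1234, "subplant_id": 2, "fuel": "natural_gas", "reports_to_cems": False},
]

result = shapeable_with_cems(subplants)
print(result)
```

Under this rule the nuclear subplant at 2410 would fall out of partial_cems_plant and need to be shaped some other way, while same-fuel cases are unaffected.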

Side note: I confirmed that this issue does not appear to result in double-counting. After checking our nuclear generation totals against eGRID nuclear generation totals for PJM, it looks like generation from 2410 is not being double-counted.

@grgmiller
Collaborator

> I think the fix here is to make partial_cems_plant sensitive to fuel type: subplants of one fuel type should not be shaped if the available CEMS data is from a different fuel type. This change should be made in data_cleaning.identify_partial_cems_plants.

Agreed! Let's definitely implement this

@grgmiller
Collaborator

Is there a validation check that we could add to the pipeline to automatically screen for spikes like this and alert us?
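One possible shape for such a check, as a sketch only: flag hours that sit far from the series median, measured in median absolute deviations (MAD). The `find_spikes` helper, the threshold of 10, the fallback scale for flat series, and the sample rates are all assumptions, not anything that exists in the pipeline.

```python
from statistics import median


def find_spikes(values, threshold=10.0):
    """Return indices of hours that deviate strongly from the median.

    Deviation is measured in MADs; for a perfectly flat series (MAD of 0)
    the scale falls back to 1% of the median (at least 0.01).
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    scale = mad if mad > 0 else max(abs(center), 1.0) * 0.01
    return [i for i, v in enumerate(values) if abs(v - center) > threshold * scale]


# Made-up hourly emission rates with one negative and one positive spike.
rates = [900.0, 910.0, 905.0, 100.0, 895.0, 5000.0, 902.0]
spike_hours = find_spikes(rates)
print(spike_hours)
```

Because spikes are expected in some small BAs, a check like this would be better suited to logging a warning for review than to discarding data automatically.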
