Update partial CEMS shaping for mixed fuel plants #238
(hourly_validation and map_visualization)
* update 930 time lag notebook for new dir structure. The notebook was used during the issue 230 investigation and comes from the gailin/clean_cems branch, which can now be deleted
* A renewable generator (hydro, wind, solar, nuclear, geothermal) won't use the `partial_cems_plant` hourly shaping methodology
* A subplant that contains generators of mixed fuel types will choose the hourly shaping method of the generator with the largest generation
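The two rules above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function names, the dict keys (`fuel`, `net_generation_mwh`), and the `plant_has_partial_cems` flag are hypothetical, and plain Python is used instead of pandas to keep the sketch self-contained. Only the `partial_cems_plant` and `eia` labels come from this PR.

```python
# Assumption: the fuel categories listed in the bullet above are the ones excluded
# from partial CEMS shaping.
CLEAN_FUELS = {"hydro", "wind", "solar", "nuclear", "geothermal"}


def generator_method(fuel, plant_has_partial_cems):
    """Pick an hourly shaping method for a single generator."""
    # Rule 1: clean generators never use the partial_cems_plant methodology.
    if plant_has_partial_cems and fuel not in CLEAN_FUELS:
        return "partial_cems_plant"
    return "eia"


def subplant_method(generators, plant_has_partial_cems):
    """Pick an hourly shaping method for a subplant.

    `generators` is a list of dicts with hypothetical keys
    `fuel` and `net_generation_mwh`.
    """
    # Rule 2: a mixed-fuel subplant follows its largest generator.
    largest = max(generators, key=lambda g: g["net_generation_mwh"])
    return generator_method(largest["fuel"], plant_has_partial_cems)


# Made-up numbers: a mixed subplant dominated by nuclear generation.
mixed = [
    {"fuel": "nuclear", "net_generation_mwh": 1000.0},
    {"fuel": "diesel", "net_generation_mwh": 10.0},
]
print(subplant_method(mixed, plant_has_partial_cems=True))
```

With these inputs the nuclear generator dominates, so the subplant falls back to the `eia` method even though the plant has partial CEMS data.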
Other changes: There are some notebook changes included here that are not directly related to #230:
So the original partial plant methodology was designed such that different parts of a plant couldn't end up in separate files: If all subplants had

However, with this current issue, we've discovered that we don't always want to shape an entire plant with partial CEMS data because of the spike issue. Currently, this PR splits up the data so that (in the case of 2410, for example) the nuclear portion ends up in the shaped fleet data and the diesel portion ends up in the individual data.

If we went with option 1, do you know how much CEMS data we'd be ignoring (i.e., is it a handful of backup generators with negligible generation, or would this affect a wider set of generation)? I'm kind of leaning toward 2 as the simpler fix for now.

There are also a couple of other options that we could consider:
The new methodology proposed in this PR results in 106 plant-months (across 12 unique plants) with split EIA and CEMS/partial_CEMS hourly data sources. The generation from these plants is split pretty evenly over CEMS and EIA hourly data sources, with about 21,000,000 MWh of generation in CEMS subplants and 29,000,000 MWh in EIA subplants. The affected plants are: [557, 621, 645, 1355, 2410, 2707, 2953, 6074, 8223, 10029, 10823, 58236]. (Clarification note: these are plants with split methodologies between subplants, which is a different set of plants than the plants with NaN-id subplants with split methodologies within a subplant.)
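For anyone wanting to reproduce a count like this, here is a rough sketch of how split plant-months could be identified. It is an assumption about the approach, not the code used to produce the numbers above: the function name, the record keys, and the `cems` label are hypothetical (`eia` is the value referenced elsewhere in this PR), and it uses plain Python rather than pandas.

```python
from collections import defaultdict


def split_plant_months(rows):
    """Find plant-months whose subplants use different hourly data sources.

    `rows` is a list of dicts with hypothetical keys:
    plant_id, month, subplant_id, hourly_data_source, net_generation_mwh.
    Returns the set of split (plant_id, month) keys and the generation in
    those plant-months summed by hourly data source.
    """
    rows = list(rows)

    # Collect the distinct data sources seen in each plant-month.
    sources = defaultdict(set)
    for r in rows:
        sources[(r["plant_id"], r["month"])].add(r["hourly_data_source"])

    # A plant-month is split when EIA-shaped subplants sit alongside others.
    split = {key for key, s in sources.items() if "eia" in s and len(s) > 1}

    # Total the generation in split plant-months by data source.
    generation = defaultdict(float)
    for r in rows:
        if (r["plant_id"], r["month"]) in split:
            generation[r["hourly_data_source"]] += r["net_generation_mwh"]
    return split, dict(generation)


# Made-up records: plant 1 is split in month 1, plant 2 is CEMS-only.
example = [
    {"plant_id": 1, "month": 1, "subplant_id": 1, "hourly_data_source": "cems", "net_generation_mwh": 10.0},
    {"plant_id": 1, "month": 1, "subplant_id": 2, "hourly_data_source": "eia", "net_generation_mwh": 1000.0},
    {"plant_id": 2, "month": 1, "subplant_id": 1, "hourly_data_source": "cems", "net_generation_mwh": 50.0},
]
split, generation = split_plant_months(example)
print(len(split), generation)
```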
725e39a added a new hourly plant-level output file, `partial_plant_data.csv`, which contains the CEMS and partial CEMS plant data for plants with one or more subplants without hourly data.
The more I think about it, I'm thinking that perhaps we should revert 725e39a and just keep the outputs split among the two files with a disclaimer in the documentation (and maybe even on the download page) that data might be split between two files. Here's my thought process:
Most of the reorganization is actually unrelated -- I just moved the writing of eia923 to just after the data is finished because I found it confusing for it to be exported later in the pipeline when it hasn't actually been modified since line 172. The only other pipeline change is to not delete it so it can be passed to the plant writing function.
I see your point here, though it doesn't feel too problematic to me (it'll still get zipped up with the rest of the hourly plant data). I think if we want to bundle the partial hourly plant data with the regular plant data, we should write a new `data_quality_metrics` file that reports the plant-months for which hourly plant data is partial. There's no other way to get that info without the
Oh sorry about that - I realize now that in a previous iteration of the code the
Doesn't |
Ok, I like the idea of re-combining the hourly plant-level output files and directing users to plant_metadata to identify which plants have hourly data split between the plant file and the synthetic plant file.
Great point! This seems like a good spot for this data. However, Two options for fixing:
…grid-emissions into gailin/issue230
I think either option 1 or 2 would work, and we should do whatever is easiest to implement at this point (based on my educated guess, 2 might be easier since it involves less segmenting of the data, but I could be mistaken). The |
Rename col to in plant_metadata
07853b7 |
Looks good! I think we're ready to merge
Summary of changes
Close #230 by changing hourly data source identification methods:
`partial_cems_plant` hourly data source methodology
`subplant_id=NaN`, which can have generators of mixed fuel types; see comments on Ensure complete `subplant_id` mapping #49)

The post-fix hourly emission rate for PJM is below. The abrupt drops to zero (seen in #230) are gone.
![Screen Shot 2022-09-28 at 9 22 30 AM](https://user-images.githubusercontent.com/12755256/192795423-39cafb93-8da7-42f0-bf97-054f7392018a.png)
The PJM nuclear emission rate, below, shows the three hours where the generator is on as higher-than-zero emission rates:
![Screen Shot 2022-09-28 at 9 22 47 AM](https://user-images.githubusercontent.com/12755256/192797317-8c3a0774-2f33-4bcf-9f84-8ae649c4b17e.png)
Open question: subplants in hourly `plant_data` results

Plant 2410, the nuclear plant originally responsible for the PJM data issues, has one subplant made up of a CEMS-reporting diesel generator and one subplant with its nuclear generation. Generation from the CEMS-reporting subplant is in `plant_data/hourly/individual_plant_data.csv`, while generation from the nuclear subplant is in `shaped_fleet_data.csv`. This is confusing, since a user looking only at `individual_plant_data.csv` would think that plant 2410 had only ~10 MWh of generation in 2020, but if they looked at the annual data, they'd see annual generation of 16,000,000 MWh.

Proposed solutions:
1. `individual_plant_data.csv` if one or more of their subplants has `hourly_data_source=eia`
2. `individual_plant_data.csv` may not contain all subplants, and indicate how users can identify whether a plant has complete hourly data

I think 1 is the better solution, but it does mean removing some hourly data that we're actually pretty confident in (e.g., for 2410, we do know that the diesel generator ran for those 3 hours).
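To make the trade-off concrete, here is a minimal sketch of what the two solutions could look like. Everything below is an assumption for illustration: the function names and the `(plant_id, subplant_id) -> hourly_data_source` mapping are hypothetical and are not taken from the pipeline. Only the `hourly_data_source` field and its `eia` value appear in this PR.

```python
def incomplete_plants(subplant_sources):
    """Return IDs of plants with at least one subplant shaped from EIA data.

    `subplant_sources` maps (plant_id, subplant_id) -> hourly_data_source.
    Solution 2 could publish this set so users can check for completeness.
    """
    return {pid for (pid, _), src in subplant_sources.items() if src == "eia"}


def drop_incomplete_plants(hourly_rows, subplant_sources):
    """Solution 1: keep only plants whose hourly data covers every subplant.

    `hourly_rows` is a list of dicts that each carry a `plant_id` key.
    """
    incomplete = incomplete_plants(subplant_sources)
    return [r for r in hourly_rows if r["plant_id"] not in incomplete]


# Made-up example: plant 2410 has one CEMS subplant and one EIA subplant.
sources = {(2410, 1): "cems", (2410, 2): "eia", (3, 1): "cems"}
hourly = [
    {"plant_id": 2410, "net_generation_mwh": 3.0},
    {"plant_id": 3, "net_generation_mwh": 120.0},
]
print(drop_incomplete_plants(hourly, sources))
```

Under solution 1, the hourly rows for plant 2410 are dropped even though the diesel generation itself is trusted, which is the cost noted above.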