Add operating and retirement dates to plant static attributes #367

rouille · 2024-05-22T22:01:03Z

Purpose

Add operating and retirement dates to plant static attributes.

It fixes a bug in the calculation of plant-level nameplate capacity.

This PR also fixes an issue for years < 2013 where missing BA codes were being assigned, resulting in inaccurate BA-level results.

What the code is doing

Create a new function that adds the operating and retirement dates of each plant to the plant static attributes data frame. A plant's operating date is taken as the earliest operating date among all of its generators across all report dates. Likewise, a plant's retirement date is taken as the latest retirement date among all of its generators across all report dates.

Testing

Successfully ran the 2013 pipeline.

Where to look

The new add_plant_operating_and_retirement_dates function in the oge.helpers module.
The oge.column_checks module, where the new fields were added. Note that I added some missing datetime columns to the list of columns defined in the apply_dtypes function; these columns won't be converted.

Usage Example/Visuals

Review estimate

10min

Future work

N/A

Checklist

Update the documentation to reflect changes made in this PR
Format all updated python files using black
Clear outputs from all notebooks modified
Add docstrings and type hints to any new functions created

grgmiller

See requested changes.

Before we merge this, it may be helpful to do a couple quick validations:

Are there any missing operating dates? If so, we will just need to understand how to deal with those in MISO.
Load up some EIA-923 data for this year and quickly check whether any plants that we marked as retired prior to 2022 reported 923 data in 2022. Sometimes plants do continue to report, so this may be okay, but we should at least manually double-check that our algorithm didn't mistakenly mark a plant as retired while at least one of its generators is still running.

grgmiller · 2024-05-23T03:41:10Z

src/oge/column_checks.py

+        "generator_operating_date",
+        "generator_retirement_date",
+        "current_planned_generator_operating_date",
+        "operating_date",


To follow the PUDL naming conventions, and for clarity, let's call these "plant_operating_date" and "plant_retirement_date".

grgmiller · 2024-05-23T03:45:12Z

src/oge/helpers.py

+        generators_dates.groupby("plant_id_eia")[
+            ["generator_operating_date", "generator_retirement_date"]
+        ]
+        .agg({"generator_operating_date": "min", "generator_retirement_date": "max"})


While this will work for the operating date, this will not work for the retirement date. For example, what if a plant has 10 generators and only one retires? This would currently say the entire plant is retired.

For the retirement date, one way to do this would be to check for plants where there are no NA retirement dates across all generators, and then take the max of that.

Looking at the sample outputs you posted, it currently shows that plant 3 "Barry" retired in 2015, but this plant is still operational.

One easy way to do this would be to load the "operational_status" column and just identify where all generators are retired as of the latest_validated_year.
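A minimal sketch of the "no NA retirement dates across all generators, then take the max" idea, assuming the input has exactly one row per generator (for example, after the fill-and-deduplicate step discussed further down). The function name and sample data are hypothetical, and the `operational_status`-based alternative is not shown.

```python
import pandas as pd


def plant_dates(gens: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: plant-level operating and retirement dates.

    Expects one row per (plant_id_eia, generator_id).
    """
    by_plant = gens.groupby("plant_id_eia")
    # earliest generator operating date at each plant
    operating = by_plant["generator_operating_date"].min()
    # True only when every generator at the plant has a retirement date
    all_retired = (
        gens["generator_retirement_date"].notna().groupby(gens["plant_id_eia"]).all()
    )
    # latest retirement date, set to NaT if any generator is still operating
    retirement = by_plant["generator_retirement_date"].max().where(all_retired)
    result = operating.to_frame("plant_operating_date")
    result["plant_retirement_date"] = retirement
    return result.reset_index()


# made-up example: plant 1 still has one generator running, plant 2 is fully retired
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2, 2],
        "generator_id": ["1", "2", "A", "B"],
        "generator_operating_date": pd.to_datetime(
            ["1954-01-01", "1960-01-01", "1990-01-01", "1995-01-01"]
        ),
        "generator_retirement_date": pd.to_datetime(
            ["2015-01-01", None, "2018-06-01", "2020-06-01"]
        ),
    }
)
dates = plant_dates(gens).set_index("plant_id_eia")
```

With this rule, a plant with ten generators where only one has retired keeps a missing retirement date instead of being reported as retired.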

grgmiller · 2024-05-23T03:45:25Z

src/oge/helpers.py

+        pd.DataFrame: original data frame with additional 'operating_date' and
+            'retirement_date' column.
+    """
+    generators_dates = load_data.load_pudl_table(


"generator_dates"

grgmiller · 2024-05-23T03:50:15Z

src/oge/helpers.py

+            'retirement_date' column.
+    """
+    generators_dates = load_data.load_pudl_table(
+        "denorm_generators_eia",


This table will contain values for each year reported, so before we run our min and max operations, we need to drop duplicates. Before we drop duplicates, though, we may need to do a groupby([plant_id, generator_id]).ffill() and .bfill() to make sure that we have complete values for all years.

We may also want to filter to only include data up to the latest_validated_year.

This is something I was working on in GRETA, but a similar pattern may work here:

    min_operating = oge.load_data.load_pudl_table(
        "generators_eia860",
        year=earliest_data_year,
        end_year=latest_validated_year,
        columns=[
            "report_date",
            "plant_id_eia",
            "generator_id",
            "minimum_load_mw",
            "capacity_mw",
            "summer_capacity_mw",
            "winter_capacity_mw",
        ],
    ).sort_values(by=["plant_id_eia", "generator_id", "report_date"], ascending=True)

    # fill missing capacity values
    capacity_columns = ["minimum_load_mw", "capacity_mw", "summer_capacity_mw", "winter_capacity_mw"]
    for col in capacity_columns:
        min_operating[col] = min_operating.groupby(["plant_id_eia", "generator_id"])[col].bfill()
        min_operating[col] = min_operating.groupby(["plant_id_eia", "generator_id"])[col].ffill()

    # keep only the most recent year of data
    min_operating = min_operating.drop_duplicates(subset=["plant_id_eia", "generator_id"], keep="last")
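Applied to the generator date columns, the same pattern (filter to validated years, fill within each generator, then keep the latest row) might look like the sketch below. It runs on an in-memory frame rather than calling `load_pudl_table`; the function name, constants, and sample data are hypothetical.

```python
import pandas as pd

KEYS = ["plant_id_eia", "generator_id"]
DATE_COLUMNS = ["generator_operating_date", "generator_retirement_date"]


def latest_generator_dates(gens: pd.DataFrame, latest_validated_year: int) -> pd.DataFrame:
    """Hypothetical helper: one row per generator with gap-filled date columns."""
    # drop report years after the latest validated year
    gens = gens[gens["report_date"].dt.year <= latest_validated_year]
    # sort so the fills run chronologically within each generator
    gens = gens.sort_values(KEYS + ["report_date"]).copy()
    for col in DATE_COLUMNS:
        gens[col] = gens.groupby(KEYS)[col].bfill()
        gens[col] = gens.groupby(KEYS)[col].ffill()
    # keep the most recent report year for each generator
    return gens.drop_duplicates(subset=KEYS, keep="last")


# made-up example: the 2023 row falls after the validated year and is ignored
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "generator_id": ["A", "A", "A"],
        "report_date": pd.to_datetime(["2019-01-01", "2020-01-01", "2023-01-01"]),
        "generator_operating_date": pd.to_datetime(["1990-01-01", None, None]),
        "generator_retirement_date": pd.to_datetime([None, None, "2023-06-01"]),
    }
)
latest = latest_generator_dates(gens, latest_validated_year=2022)
```

The 2020 row is kept, with its operating date forward-filled from 2019; the 2023 retirement is not visible because it falls after the validated year.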

Thanks. Implemented.

rouille · 2024-05-23T16:31:32Z

Screenshot with the new implementation

grgmiller

Looks good, thanks.
One small request would be to change the order of the columns so that we are grouping data together and make it easier to read. My suggestion for column order would be:

    "plant_id_eia",  # identification columns
    "plant_name_eia",
    "capacity_mw",  # what type of plant is this
    "plant_primary_fuel",
    "fuel_category",
    "fuel_category_eia930",
    "state",  # where is it located
    "county",
    "city",
    "ba_code",
    "ba_code_physical",
    "latitude",
    "longitude",
    "plant_operating_date",  # operational status columns
    "plant_retirement_date",
    "distribution_flag",  # other random metadata
    "timezone",
    "data_availability",
    "shaped_plant_id",

rouille · 2024-05-23T18:42:38Z

The nameplate capacity calculation was buggy and is fixed in the latest commit.

rouille · 2024-05-23T18:42:58Z

Done

grgmiller

See comment about the nameplate capacity fix.

grgmiller · 2024-05-24T16:32:55Z

src/oge/helpers.py

+    )["capacity_mw"].ffill()
+
+    # keep only the most recent year of data
+    generator_capacity = generator_capacity.drop_duplicates(


In the case of nameplate capacity, I think that we only want to keep the specific data year, not the latest validated year. Nameplate capacity can change over time if the generator is repowered, so this value may vary from year to year.

It's still good that we load all years and do the fill in case there is missing capacity data in a specific year.
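A sketch of that behavior: fill capacity gaps using all loaded years, then keep only the rows for the requested data year before summing to the plant level. The function name and sample data are hypothetical, and the actual code in `oge.helpers` may be structured differently.

```python
import pandas as pd

KEYS = ["plant_id_eia", "generator_id"]


def plant_nameplate_capacity(gens: pd.DataFrame, year: int) -> pd.Series:
    """Hypothetical helper: plant-level nameplate capacity for one data year."""
    gens = gens.sort_values(KEYS + ["report_date"]).copy()
    # fill missing capacity from the same generator's other report years
    gens["capacity_mw"] = gens.groupby(KEYS)["capacity_mw"].bfill()
    gens["capacity_mw"] = gens.groupby(KEYS)["capacity_mw"].ffill()
    # keep only the requested data year, one row per generator
    in_year = gens[gens["report_date"].dt.year == year].drop_duplicates(subset=KEYS)
    return in_year.groupby("plant_id_eia")["capacity_mw"].sum()


# made-up example: generator A is missing 2019 capacity, B is repowered in 2020
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1],
        "generator_id": ["A", "A", "B", "B"],
        "report_date": pd.to_datetime(
            ["2019-01-01", "2020-01-01", "2019-01-01", "2020-01-01"]
        ),
        "capacity_mw": [None, 100.0, 50.0, 80.0],
    }
)
```

Keeping the data year rather than the latest year means the 2019 total uses generator B's pre-repowering 50 MW, while the fill still supplies generator A's missing 2019 value.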

Sounds good. Implemented.

grgmiller

Capacity changes look good

rouille requested a review from grgmiller May 22, 2024 22:01

rouille self-assigned this May 22, 2024

rouille marked this pull request as ready for review May 23, 2024 02:30

grgmiller requested changes May 23, 2024

feat: add operating and retirement dates to plant static attributes

32a0bf0

rouille force-pushed the ben/dates branch from 5b2f99e to 32a0bf0 May 23, 2024 05:21

grgmiller approved these changes May 23, 2024

refactor: change order of columns in plant attributes

afa1147

rouille force-pushed the ben/dates branch from a763976 to 3526b60 May 23, 2024 19:35

grgmiller reviewed May 24, 2024

fix: prevent multi counting of capacity

b26bf17

rouille force-pushed the ben/dates branch from 3526b60 to b26bf17 May 24, 2024 17:32

grgmiller approved these changes May 24, 2024

grgmiller added 3 commits May 24, 2024 15:22

fix missing BA codes pre 2013

5bcaf23

fix ruff formatting

6022958

address missing ba codes

c225312

grgmiller merged commit 6f4c9e3 into historical_coverage_feature May 24, 2024
1 check passed

grgmiller deleted the ben/dates branch May 24, 2024 23:59

grgmiller mentioned this pull request Jul 31, 2024

Historical coverage feature / v0.5.0 #386

Merged

rouille commented May 22, 2024 (edited by grgmiller)
