Create notebook looking at subplant id across years #351

Merged
rouille merged 1 commit into development from ben/crosswalk on Mar 15, 2024

rouille (Collaborator) commented Mar 12, 2024

Purpose

Compare the mapping of (plant_id_eia, generator_id) to subplant_id across years. Closes CAR-3518.

What the code is doing

The notebook loads the subplant crosswalk files for 2019 through 2022. For each pair of years, it takes the intersection of the (plant_id_eia, generator_id) keys and saves the rows where subplant_id differs between the two years as a pandas DataFrame.
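The pairwise comparison described above can be sketched with an inner merge, which keeps only the (plant_id_eia, generator_id) pairs present in both years. This is a minimal sketch under assumptions, not the notebook's actual code: the function names are hypothetical, the crosswalks are assumed to already be loaded as DataFrames with `plant_id_eia`, `generator_id`, and `subplant_id` columns, and the commented-out file path is made up for illustration.

```python
from itertools import combinations

import pandas as pd

KEYS = ["plant_id_eia", "generator_id"]


def subplant_differences(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Return key pairs present in both crosswalks whose subplant_id differs (hypothetical helper)."""
    # An inner merge on the keys keeps only the intersection of (plant_id_eia, generator_id)
    merged = df_a[KEYS + ["subplant_id"]].merge(
        df_b[KEYS + ["subplant_id"]], on=KEYS, how="inner", suffixes=("_a", "_b")
    )
    # Keep only the rows where the two years disagree on the subplant assignment
    mismatch = merged["subplant_id_a"] != merged["subplant_id_b"]
    return merged[mismatch].reset_index(drop=True)


def compare_all_years(
    crosswalks: dict[int, pd.DataFrame],
) -> dict[tuple[int, int], pd.DataFrame]:
    """Compare every pair of years, e.g. (2019, 2020), (2019, 2021), ... (hypothetical helper)."""
    return {
        (y1, y2): subplant_differences(crosswalks[y1], crosswalks[y2])
        for y1, y2 in combinations(sorted(crosswalks), 2)
    }


# Hypothetical loading step; the real file locations are not given in this PR:
# crosswalks = {y: pd.read_csv(f"subplant_crosswalk_{y}.csv") for y in range(2019, 2023)}

# Toy data (made up): generator "1.5" is added in 2020, shifting generator "2" to subplant 3
toy = {
    2019: pd.DataFrame(
        {"plant_id_eia": [1, 1, 2], "generator_id": ["1", "2", "A"], "subplant_id": [1, 2, 1]}
    ),
    2020: pd.DataFrame(
        {
            "plant_id_eia": [1, 1, 1, 2],
            "generator_id": ["1", "1.5", "2", "A"],
            "subplant_id": [1, 2, 3, 1],
        }
    ),
}
diffs = compare_all_years(toy)
print(len(diffs[(2019, 2020)]))  # 1
```

With four years loaded, `compare_all_years` yields the six year pairs reported in the usage example, and `len()` of each DataFrame gives the per-pair count.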

Testing

N/A

Where to look

A new notebook, validate_subplant_crosswalk, contains this short analysis.

Usage Example/Visuals

Example notebook output:

Number of differences in subplant_id for the same (plant_id_eia, generator_id) combination:
2019-2020: 448
2019-2021: 895
2019-2022: 1193
2020-2021: 488
2020-2022: 831
2021-2022: 435

Review estimate

10min

Future work

N/A

Checklist

  • Update the documentation to reflect changes made in this PR
  • Format all updated python files using black
  • Clear outputs from all notebooks modified
  • Add docstrings and type hints to any new functions created

rouille self-assigned this Mar 12, 2024

grgmiller (Collaborator) left a comment


Thanks, Ben, this is really helpful. It looks like our subplants are not staying static across years, which will be an issue for using them in GRETA. This will likely be one of the issues we'll want to prioritize fixing once we start on that. It looks like we may currently assign subplant IDs by sorting the generator IDs at each plant, since we had previously been assuming that new generators would have a higher number than existing ones. However, at least in the first few cases, a new generator is often added out of order (e.g., 5.1, 8B), which then shifts the subplant numbers of every generator that sorts after it. There might also be an issue with subplant grouping (i.e., I'd assume that generators 8A and 8B would be part of the same subplant). Let's find some time later this week to discuss this in the context of other potential data quality issues.
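The renumbering effect suspected in the comment above can be reproduced with a toy example. This is a hypothetical sketch, not the repository's logic: it assumes subplant IDs are assigned by sorting generator IDs as strings (which the comment only suspects), treats each generator as its own subplant (ignoring the 8A/8B grouping question), and uses made-up generator lists. It also shows one possible stable alternative, carrying forward the previous year's IDs and numbering new generators after them, which this PR does not implement.

```python
def assign_by_sort(generator_ids: list[str]) -> dict[str, int]:
    """Hypothetical scheme: number generators 1..n in sorted string order."""
    return {gen: i for i, gen in enumerate(sorted(generator_ids), start=1)}


def assign_stable(generator_ids: list[str], previous: dict[str, int]) -> dict[str, int]:
    """Hypothetical alternative: keep prior-year IDs and give new generators the next unused numbers."""
    result = {gen: previous[gen] for gen in generator_ids if gen in previous}
    next_id = max(previous.values(), default=0) + 1
    for gen in sorted(g for g in generator_ids if g not in previous):
        result[gen] = next_id
        next_id += 1
    return result


def changed(a: dict[str, int], b: dict[str, int]) -> list[str]:
    """Generators present in both years whose ID differs."""
    return sorted(gen for gen in a.keys() & b.keys() if a[gen] != b[gen])


# Made-up plant: generator "5.1" appears in the second year and sorts between "5" and "6"
gens_year1 = ["1", "2", "5", "6", "8A"]
gens_year2 = gens_year1 + ["5.1"]

sort_year1 = assign_by_sort(gens_year1)
sort_year2 = assign_by_sort(gens_year2)
print(changed(sort_year1, sort_year2))  # ['6', '8A']

stable_year2 = assign_stable(gens_year2, sort_year1)
print(changed(sort_year1, stable_year2))  # []
```

Under the sort-based scheme, every existing generator that sorts after the new one is renumbered; under the carry-forward scheme, only the new generator receives a fresh ID.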

rouille (Collaborator, Author) commented Mar 12, 2024

I have cleared the notebook outputs to comply with the guidelines.

rouille merged commit 8b5d1c5 into development on Mar 15, 2024 (2 checks passed)
rouille deleted the ben/crosswalk branch on Mar 15, 2024 16:39
rouille mentioned this pull request on Apr 4, 2024