Update data quality metric outputs #217

Merged (2 commits) on Sep 7, 2022

Conversation

@grgmiller (Collaborator) commented Sep 4, 2022

As part of the research paper for OGEI, I wanted to update some of the output data quality metrics and figured these changes would be useful to merge for our next release.

This PR makes the following changes:

  • Combines diba_imputation_performance.csv and national_imputation_performance.csv into a single file, wind_solar_profile_imputation_performance.csv, which contains one column for the DIBA results and another column for the national results.
  • Adds a new output, cems_pollutant_measurement_quality.csv, which summarizes what percentage of the CO2, SO2, and NOx mass reported in CEMS was directly measured versus imputed. This metric is based on the "mass_measurement_code" reported for each observation in the CEMS data.
  • Adds additional summary columns to input_data_source.csv and hourly_profile_method.csv for CO2e, SO2, and NOx (both totals and _for_electricity values)
  • Breaks out the EIA column in input_data_source.csv to distinguish between monthly-reported and annually-reported EIA data.
  • Re-orders certain rows in the outputs so that they are generally arranged with "better" sources/methods at the top and "worse" ones at the bottom.
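
For readers unfamiliar with the measurement-quality metric, here is a minimal sketch of how a "percent measured versus imputed" summary could be computed. It is not the code from this PR: the record layout, the `pollutant` and `mass` field names, and the assumption that the code value "Measured" marks directly measured data are all hypothetical and used only for illustration.

```python
from collections import defaultdict


def measurement_quality(records, measured_codes=("Measured",)):
    """Return, per pollutant, the percent of reported mass measured vs. imputed.

    `records` is an iterable of dicts with hypothetical keys:
    "pollutant", "mass", and "mass_measurement_code".
    """
    totals = defaultdict(float)
    measured = defaultdict(float)
    for rec in records:
        totals[rec["pollutant"]] += rec["mass"]
        # Assumption: any code not listed in measured_codes counts as imputed.
        if rec["mass_measurement_code"] in measured_codes:
            measured[rec["pollutant"]] += rec["mass"]

    result = {}
    for pollutant, total in totals.items():
        if total == 0:
            continue  # avoid dividing by zero when no mass was reported
        pct = 100.0 * measured[pollutant] / total
        result[pollutant] = {"measured_pct": pct, "imputed_pct": 100.0 - pct}
    return result


# Made-up sample observations, for illustration only.
sample = [
    {"pollutant": "co2", "mass": 75.0, "mass_measurement_code": "Measured"},
    {"pollutant": "co2", "mass": 25.0, "mass_measurement_code": "Substitute"},
    {"pollutant": "nox", "mass": 2.0, "mass_measurement_code": "Measured"},
]
quality = measurement_quality(sample)
```

Weighting by mass rather than by observation count means a few large imputed hours affect the metric more than many small ones, which seems to match the "percentage of mass" wording above.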

NOTE: I am requesting that this be merged into a new development branch, where we can stage cumulative updates before releasing a new version on main.

@grgmiller grgmiller marked this pull request as draft September 4, 2022 05:10
@grgmiller grgmiller marked this pull request as ready for review September 6, 2022 03:50
@gailin-p (Collaborator) left a comment

Can you confirm that this works with data_pipeline.py --small True? Some previous per-method data quality metrics failed because the --small run doesn't use all of the methods, so I just want to confirm that this avoids those issues.

@grgmiller (Collaborator, Author) commented

I have confirmed that this works with --small for 2020.
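
The failure mode discussed in this review (a per-method summary breaking when a reduced run never uses some methods) can be guarded against by starting from a fixed list of methods instead of from whatever appears in the data. The sketch below illustrates that idea only; it is not the code from this PR, and the method names and the `summarize_by_method` helper are hypothetical.

```python
# Hypothetical method names, listed from "better" to "worse".
ALL_METHODS = ("cems_reported", "partial_cems", "eia_only", "residual_profile")


def summarize_by_method(rows, methods=ALL_METHODS):
    """Sum values per method, keeping every expected method in a fixed order.

    `rows` is an iterable of (method, value) pairs. Methods that never
    appear in `rows` are reported as 0.0 instead of raising a KeyError
    when they are looked up later.
    """
    totals = {method: 0.0 for method in methods}
    for method, value in rows:
        # Unexpected methods are appended after the known ones.
        totals[method] = totals.get(method, 0.0) + value
    return totals


# A reduced run that exercises only two of the four methods.
small_run = [("cems_reported", 10.0), ("eia_only", 5.0)]
summary = summarize_by_method(small_run)
```

Because Python dicts preserve insertion order, the same fixed list also keeps the output rows arranged from "better" to "worse" methods.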

@gailin-p gailin-p merged commit 1811766 into development Sep 7, 2022
@gailin-p gailin-p deleted the update_quality_metrics branch September 7, 2022 16:17