This notebook is meant to start exploring issue 240 (https://github.com/singularity-energy/open-grid-emissions/issues/240).

We want to investigate how much the physics-based reconciliation is modifying the original net generation profiles, especially in ways that seem inconsistent with the original data (e.g. turning a flat nuclear profile into a load-following one).  

To do this, we are loading the raw EIA-930 data and the reconciled data and comparing them side by side.  

We first calculate the correlation between the raw and reconciled versions of each timeseries in each month to identify particularly egregious examples where the shape of the modified profile does not resemble the shape of the raw profile (e.g. correlation near zero or negative).
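
To make the metric concrete, here is a minimal sketch of a per-group Pearson correlation on synthetic data. The column names (`fuel`, `month`, `raw`, `cleaned`) and the helper `raw_vs_cleaned_corr` are hypothetical and exist only for this illustration; they are not part of the pipeline.

```python
import numpy as np
import pandas as pd

# Synthetic 48-hour profile with a daily cycle (illustrative values only).
raw = np.sin(np.arange(48) * 2 * np.pi / 24)

toy = pd.concat(
    [
        # Group "a": the cleaned series is a rescaled copy of raw, so the shape is preserved.
        pd.DataFrame({"fuel": "a", "month": "2020-01", "raw": raw, "cleaned": 2 * raw + 1}),
        # Group "b": the cleaned series is inverted, so the shape is not preserved.
        pd.DataFrame({"fuel": "b", "month": "2020-01", "raw": raw, "cleaned": -raw}),
    ],
    ignore_index=True,
)


def raw_vs_cleaned_corr(df, keys):
    """Return one Pearson correlation between `raw` and `cleaned` per group."""
    return (
        df.groupby(keys)[["raw", "cleaned"]]
        .apply(lambda g: g["raw"].corr(g["cleaned"]))
        .rename("correlation_with_raw")
        .reset_index()
    )


toy_correlations = raw_vs_cleaned_corr(toy, ["fuel", "month"])

# Flag groups whose cleaned shape no longer resembles the raw shape.
suspicious = toy_correlations[toy_correlations["correlation_with_raw"] < 0.1]
```

Because Pearson correlation ignores scale and offset, group "a" scores close to 1 even though every value changed, while the inverted group "b" scores close to -1 and is the only one flagged.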

We then visualize these individual timeseries to see what's going on. In some cases, the low correlations result from spikes being cleaned, but in others, the reconciliation process is modifying the shape of the profile in an unacceptable way.
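
The following sketch, on synthetic data, illustrates why a cleaned spike alone can drive the correlation toward zero even when the rest of the profile is untouched. The spike size and the median/MAD threshold used to mask the spike are assumptions chosen for illustration, not the rule used by the reconciliation code.

```python
import numpy as np
import pandas as pd

# Synthetic month of hourly data: a daily cycle of amplitude 10 around 100 MWh.
n_hours = 720
base = 100 + 10 * np.sin(np.arange(n_hours) * 2 * np.pi / 24)

# The raw series contains a single erroneous spike; the cleaned series has it removed.
raw = pd.Series(base.copy())
raw.iloc[240] = 10_000
cleaned = pd.Series(base)

# Correlation over the full month is dominated by the one spike.
corr_all = raw.corr(cleaned)

# Hypothetical spike mask: hours where raw deviates from its median by more
# than 10 times the median absolute deviation (MAD).
median = raw.median()
mad = (raw - median).abs().median()
is_spike = (raw - median).abs() > 10 * mad

# Correlation over the remaining hours reflects the actual shape agreement.
corr_without_spike = raw[~is_spike].corr(cleaned[~is_spike])
```

Here `corr_all` falls well below the 0.1 screening threshold while `corr_without_spike` is essentially 1, so a low monthly correlation is only a prompt to plot the series, not proof that the reconciliation distorted it.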

The next step is to determine whether we can adjust the reconciliation parameters to prevent this issue.

In [None]:
# import packages
import pandas as pd
import os
import plotly.express as px

%reload_ext autoreload
%autoreload 2

# Tell Python where to look for modules.
import sys
sys.path.append('../../../open-grid-emissions/src/')

import download_data
import load_data
from column_checks import get_dtypes
from filepaths import *
import impute_hourly_profiles
import data_cleaning
import output_data
import emissions
import validation
import gross_to_net_generation
import eia930

year = 2020
path_prefix = f"{year}/"

In [None]:
# load the raw and cleaned eia930 data to compare
# (path_prefix already ends with "/", so no extra separator is needed)
raw_930_file = outputs_folder(f"{path_prefix}eia930/eia930_raw.csv")
clean_930_file = outputs_folder(f"{path_prefix}eia930/eia930_elec.csv")

eia930_raw = eia930.load_chalendar_for_pipeline(raw_930_file, year=year)
eia930_data = eia930.load_chalendar_for_pipeline(clean_930_file, year=year)

# join raw and cleaned values side by side for each BA, fuel, and hour
eia930_merged = eia930_raw.merge(
    eia930_data,
    how="left",
    on=["ba_code", "fuel_category_eia930", "datetime_utc", "datetime_local", "report_date"],
    suffixes=("_raw", "_cleaned"),
)

In [None]:
# calculate the monthly correlation between the raw and cleaned data for each BA and fuel
group_keys = ["ba_code", "fuel_category_eia930", "report_date"]
value_cols = ["net_generation_mwh_930_raw", "net_generation_mwh_930_cleaned"]
correlations = eia930_merged.groupby(group_keys, dropna=False)[value_cols].corr().reset_index()
# keep only the raw-vs-cleaned entry of each 2x2 correlation matrix
correlations = correlations[correlations["level_3"] == "net_generation_mwh_930_raw"]
correlations = correlations.drop(columns=["level_3", "net_generation_mwh_930_raw"])
correlations = correlations.rename(columns={"net_generation_mwh_930_cleaned": "correlation_with_raw"})
# drop any months that fall outside the data year
correlations = correlations[correlations["report_date"].dt.year == year]
correlations

In [None]:
ba = "PJM"
fuel = "coal"

correlations[(correlations["ba_code"] == ba) & (correlations["fuel_category_eia930"] == fuel)]

In [None]:
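# list the months where the cleaned profile is nearly uncorrelated or negatively correlated with the raw profile
correlations[correlations["correlation_with_raw"] < 0.1]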

In [None]:
ba = "BPAT"
fuel = "nuclear"

# plot the raw and cleaned profiles for a single BA and fuel to inspect the differences
data_to_plot = eia930_merged[(eia930_merged["ba_code"] == ba) & (eia930_merged["fuel_category_eia930"] == fuel)]

px.line(data_to_plot, x="datetime_local", y=["net_generation_mwh_930_raw","net_generation_mwh_930_cleaned"])