## About this notebook

When imputing missing hourly profiles for wind and solar data in a region, one of our methods averages national-level data. However, when aggregating wind and solar data across time zones, it is unclear whether wind and solar patterns on a particular day are more closely related to local time (i.e., driven by diurnal cycles) or to UTC time (i.e., driven by macro/national weather patterns).

For solar, it seems intuitive that solar energy would be related to local time, since solar irradiance depends on the position of the sun locally. 

However, for wind, it is unclear whether wind patterns are more related to macro trends or day/night swings in temperature.
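
To make the local-versus-UTC question concrete, here is a minimal sketch using synthetic data rather than EIA-930 data. The balancing authority names, UTC offsets, and the `generation` column are hypothetical and chosen only for illustration. Each synthetic profile is purely diurnal (it depends only on the local hour), so the profiles should line up perfectly in local time and less well in UTC time. A profile driven by a shared national weather system would show the opposite pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical balancing authorities and UTC offsets (illustrative only).
utc_offsets = {"BA_EAST": -5, "BA_CENTRAL": -6, "BA_WEST": -8}

# Three days of hourly timestamps, interpreted as UTC.
hours_utc = pd.date_range("2020-11-01", periods=72, freq=pd.Timedelta(hours=1))


def diurnal_profile(local_hour: int) -> float:
    """Synthetic output that peaks at local noon and is zero at night."""
    return float(max(np.cos((local_hour - 12) / 24 * 2 * np.pi), 0.0))


records = []
for ba, offset in utc_offsets.items():
    for t_utc in hours_utc:
        t_local = t_utc + pd.Timedelta(hours=offset)
        records.append(
            {
                "ba_code": ba,
                "datetime_utc": t_utc,
                "datetime_local": t_local,
                "generation": diurnal_profile(t_local.hour),
            }
        )
synthetic = pd.DataFrame(records)

# Same metric as used later in this notebook: mean of the correlation matrix.
corr_utc = (
    synthetic.pivot(index="datetime_utc", columns="ba_code", values="generation")
    .corr()
    .mean()
    .mean()
)
corr_local = (
    synthetic.pivot(index="datetime_local", columns="ba_code", values="generation")
    .corr()
    .mean()
    .mean()
)

print(f"UTC alignment:   {corr_utc:.3f}")
print(f"local alignment: {corr_local:.3f}")
```

For this purely diurnal toy data, the local-time correlation is higher than the UTC-time correlation. The same comparison is applied to the real wind data in the cells that follow.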

In [None]:
# import packages
import pandas as pd

%reload_ext autoreload
%autoreload 2

# Tell Python where to look for modules.
import sys

sys.path.append("../../../open-grid-emissions/src/")


import eia930

from filepaths import outputs_folder

year = 2020
path_prefix = f"{year}/"

In [None]:
# load eia930 data

# If running a small test run, we didn't clean the whole year, so we need to use the chalendar file to build residual profiles.
clean_930_file = f"{outputs_folder()}{path_prefix}/eia930/eia930_elec.csv"
eia930_data = eia930.load_chalendar_for_pipeline(clean_930_file, year=year)
# until we can fix the physics reconciliation, we need to apply some post-processing steps
eia930_data = eia930.remove_imputed_ones(eia930_data)
eia930_data = eia930.remove_months_with_zero_data(eia930_data)

## Explore Wind data

In [None]:
fuel = "wind"
report_date = "2020-11-01"

# filter first, then copy, so only the selected rows are duplicated
df_temporary = eia930_data[
    (eia930_data["fuel_category_eia930"] == fuel)
    & (eia930_data["report_date"] == report_date)
].copy()

# strip the time zone information (the trailing UTC offset, e.g. "-05:00")
# so that the same local clock time matches across time zones
df_temporary["datetime_local"] = df_temporary["datetime_local"].astype(str).str[:-6]
df_temporary["datetime_utc"] = df_temporary["datetime_utc"].astype(str).str[:-6]

df_temporary

In [None]:
# how well correlated are profiles across BAs when aligned on UTC time?
df_temporary.pivot(
    index="datetime_utc", columns="ba_code", values="net_generation_mwh_930"
).corr().mean().mean()

In [None]:
# how well correlated are profiles across BAs when aligned on local time?
df_temporary.pivot(
    index="datetime_local", columns="ba_code", values="net_generation_mwh_930"
).corr().mean().mean()
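
One caveat with `.corr().mean().mean()` is that the correlation matrix includes a diagonal of 1s (each BA correlated with itself), which pulls the average upward, especially when only a few BAs report a given fuel. The sketch below shows one possible way to average only the off-diagonal entries. The helper name `mean_offdiag_corr` and the toy data are hypothetical and are not part of the `eia930` module.

```python
import numpy as np
import pandas as pd


def mean_offdiag_corr(wide: pd.DataFrame) -> float:
    """Average pairwise correlation between columns, excluding self-correlations."""
    corr = wide.corr().to_numpy()
    off_diagonal = ~np.eye(len(corr), dtype=bool)
    # nanmean skips pairs whose correlation is undefined (e.g. constant columns)
    return float(np.nanmean(corr[off_diagonal]))


# Toy wide-format data: one column per (hypothetical) BA, one row per hour.
toy = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with A
        "C": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with A and B
    }
)

with_diagonal = toy.corr().mean().mean()
without_diagonal = mean_offdiag_corr(toy)

print(f"including diagonal: {with_diagonal:.3f}")
print(f"excluding diagonal: {without_diagonal:.3f}")
```

In this notebook the helper could be applied to the output of either `pivot` call above. Because the UTC and local pivots contain the same set of BAs, the diagonal shifts both averages by the same amount, so excluding it mainly matters when reading the absolute values.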