### What is the causal effect of heat waves on electricity demand in regions with high penetration of solar power?

Solar is California's largest source of renewable energy, and the state is the nation's top solar energy producer. That makes it a good setting for examining how extreme weather events, which are increasing as the climate changes, affect energy distribution and usage.

In [20]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

In [22]:
def eda(data):
    """Select the columns of interest, add calendar fields, and clean the demand column."""
    cols = ['Region', 'UTC time', 'Local date', 'Hour', 'Local time', 'Time zone',
            'DF', 'D', 'Sum (NG)', 'NG: COL', 'NG: NG', 'NG: NUC',
            'NG: OIL', 'NG: WAT', 'NG: SUN', 'NG: WND', 'NG: OTH', 'CO2 Emissions Generated']
    # .copy() avoids SettingWithCopyWarning on the column assignments below
    df = data[cols].copy()
    # 'Local date' is stored as text such as '01Jun2024'
    df['Local date'] = pd.to_datetime(df['Local date'], format='%d%b%Y')
    df['day'] = df['Local date'].dt.day
    df['month'] = df['Local date'].dt.month
    df['year'] = df['Local date'].dt.year
    df['dotw'] = df['Local date'].dt.dayofweek  # Monday=0 ... Sunday=6
    # Keep 2019 onward
    df = df[df['year'] >= 2019].copy()
    # Shift Hour down by one so that it is zero-based
    df['Hour'] = df['Hour'] - 1
    # Drop 17 and 18 November 2024
    df = df[~((df['year'] == 2024) & (df['month'] == 11) & df['day'].isin([17, 18]))].copy()
    # Demand 'D' may be read as text with thousands separators; convert it to numeric
    df['D'] = pd.to_numeric(df['D'].astype(str).str.replace(',', '', regex=False), errors='coerce')
    return df

In [24]:
california = eda(pd.read_csv('california.csv', low_memory=False))

In [26]:
# Load one daily temperature file per summer (2021 to 2024)
summer_2024 = pd.read_csv('2024_Summer.csv')
summer_2023 = pd.read_csv('2023_Summer.csv')
summer_2022 = pd.read_csv('2022_Summer.csv')
summer_2021 = pd.read_csv('2021_Summer.csv')
# Keep only the date and the average, maximum, and minimum temperature columns
summer_2024_cleaned = summer_2024[['DATE', 'TAVG', 'TMAX', 'TMIN']]
summer_2023_cleaned = summer_2023[['DATE', 'TAVG', 'TMAX', 'TMIN']]
summer_2022_cleaned = summer_2022[['DATE', 'TAVG', 'TMAX', 'TMIN']]
summer_2021_cleaned = summer_2021[['DATE', 'TAVG', 'TMAX', 'TMIN']]

In [27]:
all_summer_data = pd.concat([summer_2024_cleaned, 
                             summer_2023_cleaned, 
                             summer_2022_cleaned, 
                             summer_2021_cleaned], 
                            ignore_index=True)

In [30]:
all_summer_data

Unnamed: 0,DATE,TAVG,TMAX,TMIN
0,2024-06-01,72.0,85.0,59.0
1,2024-06-02,69.0,84.0,56.0
2,2024-06-03,65.0,74.0,58.0
3,2024-06-04,77.0,94.0,58.0
4,2024-06-05,83.0,99.0,69.0
...,...,...,...,...
446747,2021-09-11,68.0,73.0,62.0
446748,2021-09-12,64.0,71.0,58.0
446749,2021-09-13,59.0,64.0,55.0
446750,2021-09-14,61.0,68.0,53.0
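
The index above runs to 446,750 for four summers of daily readings, which suggests that many rows share the same `DATE` (possibly one per weather station, although the station column is not kept here). A quick way to confirm this is to count rows per date. The sketch below uses a small made-up frame; `rows_per_date` and the toy values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def rows_per_date(temps: pd.DataFrame, date_col: str = 'DATE') -> pd.Series:
    """Return the number of temperature rows recorded for each date."""
    return temps.groupby(date_col).size().sort_index()

# Toy stand-in for all_summer_data: two readings on 1 June, one on 2 June
toy_temps = pd.DataFrame({
    'DATE': ['2024-06-01', '2024-06-01', '2024-06-02'],
    'TAVG': [72.0, 74.0, 69.0],
    'TMAX': [85.0, 87.0, 84.0],
    'TMIN': [59.0, 61.0, 56.0],
})

counts = rows_per_date(toy_temps)
print(counts)
```

Running `rows_per_date(all_summer_data)` on the real data would show whether each date carries one reading or many, which matters for any later merge on the date.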


In [36]:
# Convert 'Local date' in the energy data to plain dates (drop any time component)
california['Local date'] = pd.to_datetime(california['Local date']).dt.date

# Convert 'DATE' in the temperature data to plain dates
all_summer_data['DATE'] = pd.to_datetime(all_summer_data['DATE']).dt.date

# Merge the datasets on the date columns.
# how='right' keeps every temperature row and attaches the matching hourly energy rows;
# a temperature date with no energy data would appear with missing energy columns.
# Note: all_summer_data has many rows per date, so each of them pairs with all 24
# hourly energy rows for that date, which is why the result has about 10.7 million rows.
combined_data = pd.merge(
    california,
    all_summer_data,
    left_on='Local date',
    right_on='DATE',
    how='right'
)

# Drop the 'DATE' key, which duplicates 'Local date' for matched rows
combined_data.drop(columns=['DATE'], inplace=True)

# Display the combined dataset
combined_data

Unnamed: 0,Region,UTC time,Local date,Hour,Local time,Time zone,DF,D,Sum (NG),NG: COL,...,NG: WND,NG: OTH,CO2 Emissions Generated,day,month,year,dotw,TAVG,TMAX,TMIN
0,CAL,01Jun2024 8:00:00,2024-06-01,0,01Jun2024 1:00:00,Pacific,28610,30166,22918,277,...,4422,773,3966,1,6,2024,5,72.0,85.0,59.0
1,CAL,01Jun2024 9:00:00,2024-06-01,1,01Jun2024 2:00:00,Pacific,27237,28869,22251,353,...,4597,1064,3638,1,6,2024,5,72.0,85.0,59.0
2,CAL,01Jun2024 10:00:00,2024-06-01,2,01Jun2024 3:00:00,Pacific,26090,27778,21138,290,...,4664,464,3532,1,6,2024,5,72.0,85.0,59.0
3,CAL,01Jun2024 11:00:00,2024-06-01,3,01Jun2024 4:00:00,Pacific,25390,26822,20367,277,...,4682,277,3410,1,6,2024,5,72.0,85.0,59.0
4,CAL,01Jun2024 12:00:00,2024-06-01,4,01Jun2024 5:00:00,Pacific,25109,26051,19174,301,...,4423,496,2990,1,6,2024,5,72.0,85.0,59.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10722043,CAL,16Sep2021 3:00:00,2021-09-15,19,15Sep2021 20:00:00,Pacific,41981,41497,30099,1803,...,4060,1705,7794,15,9,2021,2,61.0,67.0,57.0
10722044,CAL,16Sep2021 4:00:00,2021-09-15,20,15Sep2021 21:00:00,Pacific,40055,39643,28193,1803,...,4059,1342,7664,15,9,2021,2,61.0,67.0,57.0
10722045,CAL,16Sep2021 5:00:00,2021-09-15,21,15Sep2021 22:00:00,Pacific,37468,37327,25969,1804,...,4217,1060,7319,15,9,2021,2,61.0,67.0,57.0
10722046,CAL,16Sep2021 6:00:00,2021-09-15,22,15Sep2021 23:00:00,Pacific,34322,34385,23023,1805,...,3990,772,6683,15,9,2021,2,61.0,67.0,57.0
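
The merged frame has roughly 24 rows for every temperature row, because each temperature reading on a date is paired with every hourly energy row on that date. One way to keep a single row per hour is to collapse the temperature data to one row per date before merging. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, an unweighted mean across readings is assumed to be an acceptable daily summary, and an inner join is assumed so that only hours with a temperature reading are kept.

```python
import pandas as pd

TEMP_COLS = ['TAVG', 'TMAX', 'TMIN']

def daily_temperature(temps: pd.DataFrame) -> pd.DataFrame:
    """Average each temperature column over all rows that share a DATE."""
    return temps.groupby('DATE', as_index=False)[TEMP_COLS].mean()

def merge_energy_with_daily_temps(energy: pd.DataFrame, temps: pd.DataFrame) -> pd.DataFrame:
    """Attach one daily temperature row to every hourly energy row."""
    daily = daily_temperature(temps)
    merged = energy.merge(
        daily,
        left_on='Local date',
        right_on='DATE',
        how='inner',              # assumption: keep only dates present in both frames
        validate='many_to_one',   # raises if a date still appears more than once in daily
    )
    return merged.drop(columns='DATE')

# Toy stand-ins for the energy and temperature frames
toy_energy = pd.DataFrame({
    'Local date': ['2024-06-01', '2024-06-01', '2024-06-02'],
    'Hour': [0, 1, 0],
    'D': [30166, 28869, 27000],
})
toy_temps = pd.DataFrame({
    'DATE': ['2024-06-01', '2024-06-01', '2024-06-02'],
    'TAVG': [72.0, 74.0, 69.0],
    'TMAX': [85.0, 87.0, 84.0],
    'TMIN': [59.0, 61.0, 56.0],
})

merged = merge_energy_with_daily_temps(toy_energy, toy_temps)
print(merged)
```

With this approach the merged frame has the same number of rows as the hourly energy rows that fall on dates with temperature data, and the `validate` argument guards against the row multiplication seen above.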
