# 2024 Weather Big Data Contest

## Electric Power Track - Improving Weather-Based Electricity Demand Forecasting for Apartment Complexes

In [2]:
import pandas as pd
import os 

import matplotlib.pyplot as plt

In [3]:
df = pd.read_csv('./data/electric_train_cp949.csv', encoding='cp949', index_col=0)
df['electric_train.tm'] = pd.to_datetime(df['electric_train.tm'])

print(df.shape)

(7593355, 16)


In [4]:
elec_cols = ['electric_train.'+ a for a in ['tm', 'hh24', 'weekday', 'week_name', 'sum_qctr', 'n', 'sum_load', 'n_mean_load', 'elec']]
df_elec = df[elec_cols]
df_elec.head()

Unnamed: 0,electric_train.tm,electric_train.hh24,electric_train.weekday,electric_train.week_name,electric_train.sum_qctr,electric_train.n,electric_train.sum_load,electric_train.n_mean_load,electric_train.elec
1,2021-01-01 01:00:00,1,4,0,6950,11,751.32,68.606449,99.56
2,2021-01-01 02:00:00,2,4,0,6950,11,692.6,68.606449,91.78
3,2021-01-01 03:00:00,3,4,0,6950,11,597.48,68.606449,79.17
4,2021-01-01 04:00:00,4,4,0,6950,11,553.48,68.606449,73.34
5,2021-01-01 05:00:00,5,4,0,6950,11,526.24,68.606449,69.73


In [5]:
weat_cols = ['electric_train.'+ a for a in ['num', 'stn', 'nph_ta', 'nph_hm', 'nph_ws_10m', 'nph_rn_60m', 'nph_ta_chi']]
df_weat = df[weat_cols]
df_weat.head()

Unnamed: 0,electric_train.num,electric_train.stn,electric_train.nph_ta,electric_train.nph_hm,electric_train.nph_ws_10m,electric_train.nph_rn_60m,electric_train.nph_ta_chi
1,4821,884,2.2,62.7,1.8,0.0,-1.0
2,4821,884,2.3,63.1,2.1,0.0,-0.6
3,4821,884,2.2,62.4,2.5,0.0,-1.3
4,4821,884,1.7,63.5,1.7,0.0,-0.2
5,4821,884,1.7,63.0,1.6,0.0,-0.8


In [6]:
reset_order_cols = elec_cols + weat_cols

df_new = df[reset_order_cols]

df_new.head()

Unnamed: 0,electric_train.tm,electric_train.hh24,electric_train.weekday,electric_train.week_name,electric_train.sum_qctr,electric_train.n,electric_train.sum_load,electric_train.n_mean_load,electric_train.elec,electric_train.num,electric_train.stn,electric_train.nph_ta,electric_train.nph_hm,electric_train.nph_ws_10m,electric_train.nph_rn_60m,electric_train.nph_ta_chi
1,2021-01-01 01:00:00,1,4,0,6950,11,751.32,68.606449,99.56,4821,884,2.2,62.7,1.8,0.0,-1.0
2,2021-01-01 02:00:00,2,4,0,6950,11,692.6,68.606449,91.78,4821,884,2.3,63.1,2.1,0.0,-0.6
3,2021-01-01 03:00:00,3,4,0,6950,11,597.48,68.606449,79.17,4821,884,2.2,62.4,2.5,0.0,-1.3
4,2021-01-01 04:00:00,4,4,0,6950,11,553.48,68.606449,73.34,4821,884,1.7,63.5,1.7,0.0,-0.2
5,2021-01-01 05:00:00,5,4,0,6950,11,526.24,68.606449,69.73,4821,884,1.7,63.0,1.6,0.0,-0.8


In [7]:
df_train = df[df['electric_train.tm'] < '2023-01-01']

targets = ['electric_train.sum_qctr', 'electric_train.n', 'electric_train.sum_load', 'electric_train.n_mean_load']

df_val = df[df['electric_train.tm'] >= '2023-01-01'].drop(targets, axis=1)
# The dropped columns in df_val probably need to be computed from other features and filled back in, since the training data uses them (just my guess).

In [8]:
target = 'electric_train.elec'

In [9]:
targets.append(target)
targets = ['electric_train.tm'] + targets


In [10]:
df_targets = df_train[targets]

df_targets

Unnamed: 0,electric_train.tm,electric_train.sum_qctr,electric_train.n,electric_train.sum_load,electric_train.n_mean_load,electric_train.elec
1,2021-01-01 01:00:00,6950,11,751.32,68.606449,99.56
2,2021-01-01 02:00:00,6950,11,692.60,68.606449,91.78
3,2021-01-01 03:00:00,6950,11,597.48,68.606449,79.17
4,2021-01-01 04:00:00,6950,11,553.48,68.606449,73.34
5,2021-01-01 05:00:00,6950,11,526.24,68.606449,69.73
...,...,...,...,...,...,...
7593350,2022-12-31 19:00:00,34200,23,6851.72,225.461986,132.13
7593351,2022-12-31 20:00:00,34200,23,6779.84,225.461986,130.74
7593352,2022-12-31 21:00:00,34200,23,6802.40,225.461986,131.18
7593353,2022-12-31 22:00:00,34200,23,6706.68,225.461986,129.33
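
The rows shown above are consistent with `elec` being `sum_load / n` expressed as a percentage of `n_mean_load`. This is inferred only from the displayed values, not from the contest's data dictionary, so treat it as an assumption. A minimal check on four rows copied from the output (the `electric_train.` prefix is dropped for brevity, and `elec_check` is an illustrative name):

```python
import pandas as pd

# Rows copied from the df_targets output above.
sample = pd.DataFrame(
    {
        "n": [11, 11, 11, 23],
        "sum_load": [751.32, 692.60, 597.48, 6851.72],
        "n_mean_load": [68.606449, 68.606449, 68.606449, 225.461986],
        "elec": [99.56, 91.78, 79.17, 132.13],
    }
)

# Assumed relationship: (sum_load / n) as a percentage of n_mean_load.
sample["elec_check"] = (
    sample["sum_load"] / sample["n"] / sample["n_mean_load"] * 100
).round(2)

# Largest gap between the recomputed value and the reported elec.
max_abs_diff = (sample["elec_check"] - sample["elec"]).abs().max()
print(sample)
print(max_abs_diff)
```

If the relationship also holds on the full `df_train`, the four columns dropped from `df_val` are not independent of `elec`, which is worth verifying before using them as features.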


In [11]:
df_3am = df_new[df_new['electric_train.tm'].dt.hour == 3]
df_3am.head()

Unnamed: 0,electric_train.tm,electric_train.hh24,electric_train.weekday,electric_train.week_name,electric_train.sum_qctr,electric_train.n,electric_train.sum_load,electric_train.n_mean_load,electric_train.elec,electric_train.num,electric_train.stn,electric_train.nph_ta,electric_train.nph_hm,electric_train.nph_ws_10m,electric_train.nph_rn_60m,electric_train.nph_ta_chi
3,2021-01-01 03:00:00,3,4,0,6950,11,597.48,68.606449,79.17,4821,884,2.2,62.4,2.5,0.0,-1.3
27,2021-01-02 03:00:00,3,5,1,6950,11,565.64,68.606449,74.95,4821,884,4.6,58.8,1.5,0.0,2.5
51,2021-01-03 03:00:00,3,6,1,6950,11,561.52,68.606449,74.41,4821,884,2.2,77.6,1.5,0.0,0.0
75,2021-01-04 03:00:00,3,0,0,6950,11,559.44,68.606449,74.13,4821,884,4.0,61.2,1.3,0.0,2.8
99,2021-01-05 03:00:00,3,1,0,6950,11,540.8,68.606449,71.66,4821,884,4.6,70.0,1.6,0.0,3.3


In [12]:
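# tm.dt.hour == 0 selects midnight timestamps; in the output these rows carry hh24 == 24.
df_midnight = df_new[df_new['electric_train.tm'].dt.hour == 0]
a = df_midnight[df_midnight['electric_train.tm'].between('2020-01-01', '2020-01-02')]

a.head()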

Unnamed: 0,electric_train.tm,electric_train.hh24,electric_train.weekday,electric_train.week_name,electric_train.sum_qctr,electric_train.n,electric_train.sum_load,electric_train.n_mean_load,electric_train.elec,electric_train.num,electric_train.stn,electric_train.nph_ta,electric_train.nph_hm,electric_train.nph_ws_10m,electric_train.nph_rn_60m,electric_train.nph_ta_chi
8784,2020-01-02,24,3,0,42250,58,4924.76,79.776051,106.44,5565,184,6.3,66.3,1.3,0.0,5.3
35088,2020-01-02,24,3,0,16750,19,2017.52,97.138634,109.31,5566,184,6.3,66.3,1.3,0.0,5.3
61392,2020-01-02,24,3,0,14600,11,2837.8,241.745562,106.72,5567,330,4.6,64.8,1.7,0.0,3.0
105216,2020-01-02,24,3,0,16100,14,2947.68,186.873616,112.67,9735,165,5.8,76.4,1.0,0.0,-0.4
114000,2020-01-02,24,3,0,7750,10,1790.8,156.399551,114.5,9736,774,-1.9,84.2,0.5,0.0,-1.9
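
In the output above, every midnight timestamp carries `hh24 = 24`, which suggests that `tm` marks the end of each hourly interval. Under that reading, a row stamped `2020-01-02 00:00` closes out 1 January. The sketch below is one way to recover the interval's own date by shifting `tm` back one hour. The end-of-interval reading is an assumption, and the column names `hh24_check` and `day` are illustrative.

```python
import pandas as pd

# Timestamps taken from the outputs in this notebook.
tm = pd.to_datetime(
    pd.Series(["2021-01-01 01:00:00", "2020-01-02 00:00:00", "2023-01-01 00:00:00"])
)

# Assumption: tm is the END of the hourly interval, so the interval starts one hour earlier.
start = tm - pd.Timedelta(hours=1)

frame = pd.DataFrame(
    {
        "tm": tm,
        "hh24_check": start.dt.hour + 1,  # 1..24, matching the hh24 column
        "day": start.dt.normalize(),      # calendar day the interval belongs to
    }
)
print(frame)
```

Note that `weekday` in the output (3 for 2020-01-02, a Thursday with Monday as 0) appears to follow the calendar date of `tm` itself, so a day-level aggregation may want to choose one convention explicitly.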


In [13]:
df_val.head()

Unnamed: 0,electric_train.num,electric_train.tm,electric_train.hh24,electric_train.stn,electric_train.nph_ta,electric_train.nph_hm,electric_train.nph_ws_10m,electric_train.nph_rn_60m,electric_train.nph_ta_chi,electric_train.weekday,electric_train.week_name,electric_train.elec
35064,5565,2023-01-01,24,184,4.8,66.9,2.9,0.0,2.1,6,1,99.64
61368,5566,2023-01-01,24,184,4.8,66.9,2.9,0.0,2.1,6,1,104.26
96432,8994,2023-01-01,24,261,-3.1,91.7,0.2,0.0,-3.1,6,1,107.96
140280,9736,2023-01-01,24,774,-1.4,86.8,0.9,0.0,-1.4,6,1,111.51
166584,9758,2023-01-01,24,168,3.9,68.7,1.6,0.0,-2.2,6,1,106.43


In [14]:
# Midnight rows of the validation split (hour 0, shown as hh24 == 24).
df_val_midnight = df_val[df_val['electric_train.tm'].dt.hour == 0]

df_val_midnight.head()

Unnamed: 0,electric_train.num,electric_train.tm,electric_train.hh24,electric_train.stn,electric_train.nph_ta,electric_train.nph_hm,electric_train.nph_ws_10m,electric_train.nph_rn_60m,electric_train.nph_ta_chi,electric_train.weekday,electric_train.week_name,electric_train.elec
35064,5565,2023-01-01,24,184,4.8,66.9,2.9,0.0,2.1,6,1,99.64
61368,5566,2023-01-01,24,184,4.8,66.9,2.9,0.0,2.1,6,1,104.26
96432,8994,2023-01-01,24,261,-3.1,91.7,0.2,0.0,-3.1,6,1,107.96
140280,9736,2023-01-01,24,774,-1.4,86.8,0.9,0.0,-1.4,6,1,111.51
166584,9758,2023-01-01,24,168,3.9,68.7,1.6,0.0,-2.2,6,1,106.43


In [15]:
df_1am = df_val[df_val['electric_train.tm'].dt.hour == 1]

df_1am.head()

Unnamed: 0,electric_train.num,electric_train.tm,electric_train.hh24,electric_train.stn,electric_train.nph_ta,electric_train.nph_hm,electric_train.nph_ws_10m,electric_train.nph_rn_60m,electric_train.nph_ta_chi,electric_train.weekday,electric_train.week_name,electric_train.elec


In [16]:
# Same check for 02:00 (the df_1am name is reused); this result is empty too.
df_1am = df_val[df_val['electric_train.tm'].dt.hour == 2]

df_1am.head()

Unnamed: 0,electric_train.num,electric_train.tm,electric_train.hh24,electric_train.stn,electric_train.nph_ta,electric_train.nph_hm,electric_train.nph_ws_10m,electric_train.nph_rn_60m,electric_train.nph_ta_chi,electric_train.weekday,electric_train.week_name,electric_train.elec
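
Cells 13 to 16 show that `df_val` has rows at hour 0 but none at hours 1 or 2, which suggests the `>= '2023-01-01'` cutoff captures only rows stamped exactly `2023-01-01 00:00`. The toy example below reproduces that behavior under this assumption. The `elec` values and the alternative cutoff are placeholders, not taken from the data.

```python
import pandas as pd

# Toy frame: the last three hourly timestamps of a series ending at 2023-01-01 00:00.
toy = pd.DataFrame(
    {
        "tm": pd.to_datetime(
            ["2022-12-31 22:00:00", "2022-12-31 23:00:00", "2023-01-01 00:00:00"]
        ),
        "elec": [1.0, 2.0, 3.0],  # placeholder values
    }
)

cutoff = pd.Timestamp("2023-01-01")
train = toy[toy["tm"] < cutoff]
holdout = toy[toy["tm"] >= cutoff]

# Only the midnight row lands in the holdout, so filtering it by hour 1 or 2 is empty.
hours_in_holdout = sorted(holdout["tm"].dt.hour.unique().tolist())
print(hours_in_holdout)  # [0]

# Hypothetical alternative: move the cutoff to a point inside the data range.
alt_cutoff = pd.Timestamp("2022-12-31 23:00:00")
alt_holdout = toy[toy["tm"] >= alt_cutoff]
print(len(alt_holdout))  # 2
```

On the real frame, `df_val['electric_train.tm'].agg(['min', 'max'])` would confirm whether the split holds more than a single timestamp. If it does not, an earlier cutoff gives a validation period that spans all hours.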
