# 2024 Weather Big Data Contest

## Power Sector - Improving Weather-Based Electricity Demand Forecasts for Apartment Housing

In [2]:
import pandas as pd
import os 

import matplotlib.pyplot as plt
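# Use a font with Hangul glyphs so Korean labels render (Malgun Gothic ships with Windows)
plt.rcParams['font.family'] = 'Malgun Gothic'
# Keep minus signs rendering correctly on axes with the custom font
plt.rcParams['axes.unicode_minus'] = False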

import seaborn as sns
import numpy as np

import warnings
warnings.filterwarnings(action='ignore')

In [3]:
df = pd.read_csv('./data/electric_test_cp949.csv', encoding='cp949', index_col=0)
df['electric_test.tm'] = pd.to_datetime(df['electric_test.tm'])

print(df.shape)

(2838239, 11)


In [4]:
df.head()

Unnamed: 0,electric_test.num,electric_test.tm,electric_test.hh24,electric_test.stn,electric_test.nph_ta,electric_test.nph_hm,electric_test.nph_ws_10m,electric_test.nph_rn_60m,electric_test.nph_ta_chi,electric_test.weekday,electric_test.week_name
1,2385,2023-01-01 01:00:00,1,303,7.8,61.5,6.7,0.0,4.2,6,1
2,2385,2023-01-01 02:00:00,2,303,7.9,60.6,7.6,0.0,4.0,6,1
3,2385,2023-01-01 03:00:00,3,303,8.2,61.9,8.7,0.0,4.1,6,1
4,2385,2023-01-01 04:00:00,4,303,8.4,60.9,9.2,0.0,4.3,6,1
5,2385,2023-01-01 05:00:00,5,303,8.5,60.9,9.5,0.0,4.3,6,1


In [5]:
print(len(df['electric_test.num'].unique()))

324
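
As a quick sanity check on these two numbers: a complete hourly series for a non-leap year has 8,760 rows, so 324 complexes would give 324 × 8,760 = 2,838,240 rows, one more than the 2,838,239 reported by `df.shape`. If each `num` is meant to be a full-year series, at least one of them is probably short by an hour. The sketch below is one way to find series that deviate from the expected length. The helper name `find_incomplete_series` and the default of 8,760 expected rows are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd


def find_incomplete_series(frame, key, expected_rows=8760):
    """Return row counts for groups whose size differs from expected_rows."""
    counts = frame.groupby(key).size()
    return counts[counts != expected_rows]


# Tiny synthetic example (not contest data): series 2 is missing one hour.
demo = pd.DataFrame({
    "num": [1, 1, 1, 2, 2],
    "tm": pd.to_datetime([
        "2023-01-01 01:00", "2023-01-01 02:00", "2023-01-01 03:00",
        "2023-01-01 01:00", "2023-01-01 03:00",
    ]),
})
short = find_incomplete_series(demo, "num", expected_rows=3)
print(short)
```

On the notebook's data the equivalent call would be `find_incomplete_series(df, 'electric_test.num')`.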


In [6]:
df['electric_test.num'].unique()

array([ 2385,  4816, 16140,  9735,  9884,  9736,  9885, 16143, 16292,
       18527,  5565, 11376, 18528,  4821,  5566, 18529,  5567,  8994,
       14805, 16593, 18232, 18381, 18530, 18679, 18828, 18977, 13614,
       18233, 18382, 18531, 18680, 18829, 18978, 19127, 19276, 13615,
       18085, 18234, 18383, 18532, 18681, 18830, 18979, 19128, 19277,
       19426, 10487, 10934, 11083, 17639, 17937, 18086, 18235, 18384,
       18533, 18682, 18831, 18980, 19129, 19725, 10935, 11084, 11233,
       12276, 16001, 16895, 17938, 18087, 18236, 18385, 18534, 18683,
       18832, 18981, 19279, 10787, 10936, 11085, 11234, 16896, 17343,
       17790, 17939, 18088, 18237, 18386, 18535, 18684, 18833, 18982,
       10937, 11086, 13470, 13619, 16301, 17642, 17791, 17940, 18089,
       18238, 18387, 18536, 18685, 18834, 18983, 16153, 16302, 17196,
       17494, 17643, 17792, 17941, 18090, 18239, 18388, 18537, 18686,
       18835, 18984, 19133, 19282, 19431, 19580, 19878, 13174, 13323,
       14366, 16303,

In [7]:
df.columns

Index(['electric_test.num', 'electric_test.tm', 'electric_test.hh24',
       'electric_test.stn', 'electric_test.nph_ta', 'electric_test.nph_hm',
       'electric_test.nph_ws_10m', 'electric_test.nph_rn_60m',
       'electric_test.nph_ta_chi', 'electric_test.weekday',
       'electric_test.week_name'],
      dtype='object')

In [8]:
elec_cols = ['electric_test.'+ a for a in ['tm', 'hh24', 'weekday', 'week_name']]

weat_cols = ['electric_test.'+ a for a in ['num', 'stn', 'nph_ta', 'nph_hm', 'nph_ws_10m', 'nph_rn_60m', 'nph_ta_chi']]

reset_order_cols = elec_cols + weat_cols

df_new = df[reset_order_cols]
# Strip the 'electric_test.' prefix from every column name
rename_map = {col: col.split('.', 1)[1] for col in reset_order_cols}

df_new = df_new.rename(columns=rename_map)

df_new['year'] = df_new['tm'].dt.year
df_new['month'] = df_new['tm'].dt.month
df_new['day'] = df_new['tm'].dt.day
df_new = df_new.sort_values(by='tm')

# Map a month number to its meteorological season
def get_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    elif month in [9, 10, 11]:
        return 'Fall'

# Add the 'season' column
df_new['season'] = df_new['tm'].dt.month.apply(get_season)

df_new.head()

Unnamed: 0,tm,hh24,weekday,week_name,num,stn,nph_ta,nph_hm,nph_ws_10m,nph_rn_60m,nph_ta_chi,year,month,day,season
1,2023-01-01 01:00:00,1,6,1,2385,303,7.8,61.5,6.7,0.0,4.2,2023,1,1,Winter
884761,2023-01-01 01:00:00,1,6,1,18536,889,-0.6,53.2,0.7,0.0,1.5,2023,1,1,Winter
893521,2023-01-01 01:00:00,1,6,1,18685,415,-3.2,62.2,0.6,0.0,2.1,2023,1,1,Winter
902281,2023-01-01 01:00:00,1,6,1,18834,108,-1.3,49.9,0.9,0.0,1.2,2023,1,1,Winter
2803200,2023-01-01 01:00:00,1,6,1,12619,151,6.7,61.7,4.2,0.0,-2.3,2023,1,1,Winter


In [13]:
cols_for_test = [
    'tm', 'year', 'season', 'month', 'day', 'weekday', 'hh24', 'week_name',
    'num',
    'stn', 'nph_ta', 'nph_hm', 'nph_ws_10m',
    'nph_rn_60m', 'nph_ta_chi',
]

df_test = df_new[cols_for_test]

df_test.head()

Unnamed: 0,tm,year,season,month,day,weekday,hh24,week_name,num,stn,nph_ta,nph_hm,nph_ws_10m,nph_rn_60m,nph_ta_chi
1,2023-01-01 01:00:00,2023,Winter,1,1,6,1,1,2385,303,7.8,61.5,6.7,0.0,4.2
884761,2023-01-01 01:00:00,2023,Winter,1,1,6,1,1,18536,889,-0.6,53.2,0.7,0.0,1.5
893521,2023-01-01 01:00:00,2023,Winter,1,1,6,1,1,18685,415,-3.2,62.2,0.6,0.0,2.1
902281,2023-01-01 01:00:00,2023,Winter,1,1,6,1,1,18834,108,-1.3,49.9,0.9,0.0,1.2
2803200,2023-01-01 01:00:00,2023,Winter,1,1,6,1,1,12619,151,6.7,61.7,4.2,0.0,-2.3


In [15]:
df_test['tm'].min(), df_test['tm'].max()

(Timestamp('2023-01-01 01:00:00'), Timestamp('2024-01-01 00:00:00'))
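
The range runs from 01:00 on 2023-01-01 to 00:00 on 2024-01-01, and the first rows pair `tm` 01:00 with `hh24` = 1. That suggests each timestamp marks the end of its hour, in which case the final midnight row gets `year` = 2024, `month` = 1, `day` = 1 when those fields are taken directly from `tm`. The sketch below derives the calendar fields from the start of the hour instead. It assumes the end-of-hour convention holds (the source does not state it), and `add_calendar_fields` is a hypothetical helper name. Whether to apply the shift depends on how the matching training data was labeled.

```python
import pandas as pd


def add_calendar_fields(frame, time_col="tm"):
    """Derive year/month/day from the hour that ends at each timestamp."""
    out = frame.copy()
    # Assumption: each timestamp marks the END of its hour, so a 00:00 row
    # belongs to the previous calendar day (hh24 == 24).
    start = out[time_col] - pd.Timedelta(hours=1)
    out["year"] = start.dt.year
    out["month"] = start.dt.month
    out["day"] = start.dt.day
    return out


# Synthetic example (not contest data): the last two hours of 2023.
demo = pd.DataFrame(
    {"tm": pd.to_datetime(["2023-12-31 23:00", "2024-01-01 00:00"])}
)
fixed = add_calendar_fields(demo)
print(fixed)
```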

In [16]:
df_test.isnull().sum()

tm            0
year          0
season        0
month         0
day           0
weekday       0
hh24          0
week_name     0
num           0
stn           0
nph_ta        0
nph_hm        0
nph_ws_10m    0
nph_rn_60m    0
nph_ta_chi    0
dtype: int64
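
A zero count from `isnull()` only rules out NaN values. Weather feeds sometimes encode missing observations as a numeric sentinel instead, which `isnull()` does not catch. The sketch below counts such values per column. The sentinel `-99` and the helper name `count_sentinels` are assumptions for illustration; the source does not say how this dataset encodes missing readings.

```python
import pandas as pd


def count_sentinels(frame, columns, sentinel=-99):
    """Count, per column, how many values equal the sentinel."""
    return frame[columns].eq(sentinel).sum()


# Synthetic example (not contest data): one temperature reading is a sentinel.
demo = pd.DataFrame({
    "nph_ta": [7.8, -99.0, 8.2],
    "nph_hm": [61.5, 60.6, 61.9],
})
sentinel_counts = count_sentinels(demo, ["nph_ta", "nph_hm"])
print(sentinel_counts)
```

On the notebook's data this could be run over the weather columns, for example `count_sentinels(df_test, ['nph_ta', 'nph_hm', 'nph_ws_10m', 'nph_rn_60m', 'nph_ta_chi'])`.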