# Category 5

Forecasting with the `Individual House Hold Electric Power Consumption Dataset`

ABOUT THE DATASET

Original Source:
https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption

The original 'Individual House Hold Electric Power Consumption Dataset'
contains measurements of electric power consumption in one household at
a one-minute sampling rate over a period of almost 4 years.

Different electrical quantities and some sub-metering values are available.

For the purpose of the examination we have provided a subset containing
the data for the first 60 days in the dataset. We have also cleaned the
dataset beforehand to remove missing values. The dataset is provided as a
csv file in the project.

The dataset has a total of 7 features ordered by time.
==============================================================================

INSTRUCTIONS

Complete the code in the following functions:
1. windowed_dataset()
2. solution_model()

The model input and output shapes must match the following
specifications.

1. Model input_shape must be (BATCH_SIZE, N_PAST = 24, N_FEATURES = 7),
   since the testing infrastructure expects a window of past N_PAST = 24
   observations of the 7 features to predict the next 24 observations of
   the same features.

2. Model output_shape must be (BATCH_SIZE, N_FUTURE = 24, N_FEATURES = 7)

3. DON'T change the values of the following constants
   N_PAST, N_FUTURE, SHIFT in the windowed_dataset()
   BATCH_SIZE in solution_model() (See code for additional note on
   BATCH_SIZE).
4. Code for normalizing the data is provided - DON'T change it.
   Changing the normalizing code will affect your score.

HINT: Your neural network must have a validation MAE of approximately 0.055 or
less on the normalized validation dataset for top marks.

WARNING: Do not use lambda layers in your model, they are not supported
on the grading infrastructure.

WARNING: If you are using the GRU layer, it is advised not to use the
'recurrent_dropout' argument (you can alternatively set it to 0),
since it has not been implemented in the cuDNN kernel and may
result in much longer training times.

# 1. Import

In [1]:
import urllib.request  # urlretrieve lives in the urllib.request submodule
import os
import zipfile
import pandas as pd

import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv1D, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint

# 2. Load dataset

In [2]:
def download_and_extract_data():
    url = 'https://storage.googleapis.com/download.tensorflow.org/data/certificate/household_power.zip'
    urllib.request.urlretrieve(url, 'household_power.zip')
    with zipfile.ZipFile('household_power.zip', 'r') as zip_ref:
        zip_ref.extractall()

In [3]:
download_and_extract_data()

In [5]:
df = pd.read_csv('household_power_consumption.csv',
                 sep=',',
                 parse_dates=True,  # parse the 'datetime' index column as timestamps
                 index_col='datetime', header=0)
df.head()

| datetime | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2006-12-16 17:24:00 | 4.216 | 0.418 | 234.84 | 18.4 | 0.0 | 1.0 | 17.0 |
| 2006-12-16 17:25:00 | 5.36 | 0.436 | 233.63 | 23.0 | 0.0 | 1.0 | 16.0 |
| 2006-12-16 17:26:00 | 5.374 | 0.498 | 233.29 | 23.0 | 0.0 | 2.0 | 17.0 |
| 2006-12-16 17:27:00 | 5.388 | 0.502 | 233.74 | 23.0 | 0.0 | 1.0 | 17.0 |
| 2006-12-16 17:28:00 | 3.666 | 0.528 | 235.68 | 15.8 | 0.0 | 1.0 | 17.0 |


# 3. Preprocess

## 3-1. Normalization

In [6]:
def normalize_series(data, min, max):
    data = data - min
    data = data / max
    return data
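
As a quick illustration of what `normalize_series` computes, the sketch below applies the same two steps to a made-up three-value column (the values are hypothetical, not taken from the dataset). Because the function divides by the maximum rather than by the range, the largest value maps to `(max - min) / max`, which stays below 1 whenever the minimum is positive. `min_max_scale` is a hypothetical helper shown only for comparison and is not part of the provided code.

```python
def normalize_series(data, min, max):
    # Same two steps as the provided function: shift by min, then divide by max.
    data = data - min
    data = data / max
    return data


def min_max_scale(x, lo, hi):
    # Hypothetical helper, for comparison only: classic min-max scaling.
    return (x - lo) / (hi - lo)


column = [2.0, 5.0, 8.0]  # made-up values, not from the dataset
lo, hi = min(column), max(column)

provided = [normalize_series(x, lo, hi) for x in column]
classic = [min_max_scale(x, lo, hi) for x in column]

print(provided)  # [0.0, 0.375, 0.75]
print(classic)   # [0.0, 0.5, 1.0]
```

The comparison is only meant to explain the value range seen in the normalized array; the provided normalization itself should stay unchanged, as the instructions require.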

In [7]:
N_FEATURES = len(df.columns)

# Get the DataFrame's values as a NumPy array and assign them to data
data = df.values

# Normalize the data
data = normalize_series(data, data.min(axis=0), data.max(axis=0))
data

array([[0.43377912, 0.47826087, 0.04036551, ..., 0.        , 0.01282051,
        0.85      ],
       [0.55716135, 0.49885584, 0.0355582 , ..., 0.        , 0.01282051,
        0.8       ],
       [0.55867127, 0.56979405, 0.03420739, ..., 0.        , 0.02564103,
        0.85      ],
       ...,
       [0.03710095, 0.        , 0.05983313, ..., 0.        , 0.        ,
        0.        ],
       [0.03559103, 0.        , 0.06515693, ..., 0.        , 0.        ,
        0.        ],
       [0.03774806, 0.        , 0.06730234, ..., 0.        , 0.01282051,
        0.        ]])

In [9]:
pd.DataFrame(data).head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.433779 | 0.478261 | 0.040366 | 0.435644 | 0.0 | 0.012821 | 0.85 |
| 1 | 0.557161 | 0.498856 | 0.035558 | 0.549505 | 0.0 | 0.012821 | 0.8 |
| 2 | 0.558671 | 0.569794 | 0.034207 | 0.549505 | 0.0 | 0.025641 | 0.85 |
| 3 | 0.560181 | 0.574371 | 0.035995 | 0.549505 | 0.0 | 0.012821 | 0.85 |
| 4 | 0.374461 | 0.604119 | 0.043703 | 0.371287 | 0.0 | 0.012821 | 0.85 |


## 3-2. Train/validation split

In [10]:
split_time = int(len(data)*0.8)

In [11]:
x_train = data[:split_time]
x_valid = data[split_time:]

## 3-3. Create the windowed dataset

In [12]:
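def windowed_dataset(series, batch_size, n_past=24, n_future=24, shift=1):
    ds = tf.data.Dataset.from_tensor_slices(series)
    # Slide a window of n_past + n_future rows over the series; drop incomplete windows
    ds = ds.window(size=(n_past + n_future), shift=shift, drop_remainder=True)
    # Each window is a nested dataset; batch it into a single tensor
    ds = ds.flat_map(lambda w: w.batch(n_past + n_future))
    ds = ds.shuffle(len(series))
    # Split each window into (past inputs, future targets)
    ds = ds.map(
        lambda w: (w[:n_past], w[n_past:])
    )
    return ds.batch(batch_size).prefetch(1)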
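
To make the windowing step concrete, here is a minimal pure-Python sketch of the same slide-and-split idea, without shuffling or batching. `sliding_windows` and the 100-step toy series are hypothetical stand-ins for illustration; they are not part of the exam code and do not use `tf.data`.

```python
def sliding_windows(series, n_past=24, n_future=24, shift=1):
    """Return (past, future) pairs from full windows of n_past + n_future rows."""
    size = n_past + n_future
    pairs = []
    # Only full windows are kept, mirroring drop_remainder=True.
    for start in range(0, len(series) - size + 1, shift):
        window = series[start:start + size]
        pairs.append((window[:n_past], window[n_past:]))
    return pairs


# Hypothetical toy series: 100 time steps with 7 features each.
series = [[float(t)] * 7 for t in range(100)]
pairs = sliding_windows(series)
past, future = pairs[0]

print(len(pairs))                   # 53 windows: 100 - 48 + 1
print(len(past), len(past[0]))      # 24 7
print(len(future), len(future[0]))  # 24 7
```

Each pair matches the required shapes once batched: inputs of `(N_PAST, N_FEATURES)` and targets of `(N_FUTURE, N_FEATURES)` per example.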

In [13]:
BATCH_SIZE = 32 
N_PAST = 24 
N_FUTURE = 24 
SHIFT = 1 

Create train_set and valid_set

In [14]:
train_set = windowed_dataset(series=x_train, 
                             batch_size=BATCH_SIZE,
                             n_past=N_PAST, 
                             n_future=N_FUTURE,
                             shift=SHIFT)

valid_set = windowed_dataset(series=x_valid, 
                             batch_size=BATCH_SIZE,
                             n_past=N_PAST, 
                             n_future=N_FUTURE,
                             shift=SHIFT)
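
For a rough sense of dataset size, the sketch below estimates how many batches each set yields per epoch. It assumes the file holds exactly 86,400 rows (60 days × 1,440 minutes); since missing values were removed beforehand, the real row count may be lower. `batches_per_epoch` is a hypothetical helper, not part of the exam code.

```python
import math


def batches_per_epoch(n_rows, batch_size=32, n_past=24, n_future=24, shift=1):
    """Count batches produced by full sliding windows, keeping the last partial batch."""
    size = n_past + n_future
    if n_rows < size:
        return 0
    n_windows = (n_rows - size) // shift + 1
    return math.ceil(n_windows / batch_size)


n_rows = 60 * 24 * 60  # assumption: one row per minute for 60 days
split_time = int(n_rows * 0.8)

train_batches = batches_per_epoch(split_time)
valid_batches = batches_per_epoch(n_rows - split_time)

print(split_time, train_batches, valid_batches)  # 69120 2159 539
```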

# 4. Define the model

In [17]:
model = Sequential([
    Conv1D(filters=32,
           kernel_size=3,
           padding='causal',
           activation='relu',
           input_shape=[N_PAST, N_FEATURES]),
    LSTM(32, return_sequences=True),
    Dense(32, activation='relu'),
    Dense(16, activation='relu'),
    Dense(N_FEATURES)
])
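
The output has 24 time steps because no layer shortens the sequence: `padding='causal'` keeps the Conv1D length equal to its input, and the LSTM (with `return_sequences=True`) and Dense layers act on every time step. The sketch below reproduces the standard output-length rule for a 1D convolution without dilation; `conv1d_output_length` is a hypothetical helper written for illustration, not a Keras function.

```python
import math


def conv1d_output_length(length, kernel_size, padding, strides=1):
    """Output length of a 1D convolution with no dilation."""
    if padding in ("same", "causal"):
        # The input is padded so that only the stride shortens the sequence.
        return math.ceil(length / strides)
    if padding == "valid":
        return (length - kernel_size) // strides + 1
    raise ValueError(f"unknown padding: {padding}")


N_PAST = 24
causal_steps = conv1d_output_length(N_PAST, kernel_size=3, padding="causal")
valid_steps = conv1d_output_length(N_PAST, kernel_size=3, padding="valid")

print(causal_steps)  # 24, matches N_FUTURE
print(valid_steps)   # 22, would not match N_FUTURE
```

With `padding='valid'` the model would emit 22 steps and fail the `(BATCH_SIZE, 24, 7)` output requirement.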

# 5. ModelCheckpoint

In [18]:
checkpoint_path = 'my_checkpoint.ckpt'
checkpoint = ModelCheckpoint(filepath=checkpoint_path,
                             save_weights_only=True,
                             save_best_only=True,
                             monitor='val_loss',
                             verbose=1)
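
With `save_best_only=True` and `monitor='val_loss'`, weights are written only when the monitored value drops below the best seen so far. The sketch below mimics that bookkeeping in plain Python; `epochs_that_save` and the loss values are hypothetical and are not taken from the training log.

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a best-only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strict improvement, lower is better
            best = loss
            saved.append(epoch)
    return saved


# Made-up validation losses for four epochs.
saved = epochs_that_save([0.0537, 0.0529, 0.0531, 0.0524])
print(saved)  # [1, 2, 4]
```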

# 6. Compile


In [20]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0005)

model.compile(optimizer=optimizer,
              loss='mae',
              metrics=['mae'])
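
Since the loss and the metric are both MAE, the two numbers reported per epoch should agree up to floating-point noise. As a minimal sketch of what is being averaged, the function below computes the mean absolute error over every example, time step, and feature of nested lists; the tiny `(1, 2, 2)` arrays are made-up values, and `mean_absolute_error` is a hypothetical helper rather than the Keras implementation.

```python
def mean_absolute_error(y_true, y_pred):
    """Average |true - pred| over all examples, time steps, and features."""
    total, count = 0.0, 0
    for true_seq, pred_seq in zip(y_true, y_pred):
        for true_step, pred_step in zip(true_seq, pred_seq):
            for t, p in zip(true_step, pred_step):
                total += abs(t - p)
                count += 1
    return total / count


# Made-up targets and predictions: 1 example, 2 time steps, 2 features.
y_true = [[[0.5, 0.25], [1.0, 0.0]]]
y_pred = [[[0.25, 0.25], [0.5, 0.5]]]

mae = mean_absolute_error(y_true, y_pred)
print(mae)  # 0.3125
```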

# 7. Fit

In [21]:
model.fit(train_set,
          validation_data=valid_set,
          epochs=20,
          callbacks=[checkpoint])

Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.05373, saving model to my_checkpoint.ckpt
Epoch 2/20

Epoch 00002: val_loss improved from 0.05373 to 0.05286, saving model to my_checkpoint.ckpt
Epoch 3/20

Epoch 00003: val_loss improved from 0.05286 to 0.05238, saving model to my_checkpoint.ckpt
Epoch 4/20

Epoch 00004: val_loss improved from 0.05238 to 0.05217, saving model to my_checkpoint.ckpt
Epoch 5/20

Epoch 00005: val_loss improved from 0.05217 to 0.05164, saving model to my_checkpoint.ckpt
Epoch 6/20

Epoch 00006: val_loss improved from 0.05164 to 0.05148, saving model to my_checkpoint.ckpt
Epoch 7/20

Epoch 00007: val_loss improved from 0.05148 to 0.05139, saving model to my_checkpoint.ckpt
Epoch 8/20

Epoch 00008: val_loss did not improve from 0.05139
Epoch 9/20

Epoch 00009: val_loss improved from 0.05139 to 0.05126, saving model to my_checkpoint.ckpt
Epoch 10/20

Epoch 00010: val_loss did not improve from 0.05126
Epoch 11/20

Epoch 00011: val_loss improved from 0.05

<keras.callbacks.History at 0x7fa38014ed50>

# 8. Load weights

In [22]:
model.load_weights(checkpoint_path)

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7fa37a3e5b90>

# Evaluate the model

In [24]:
model.evaluate(valid_set)



[0.05083582177758217, 0.05083581060171127]