# Keras - 03
---

- Let's work through a Keras exercise using the house-price prediction dataset (`boston_housing`).

In [6]:
from keras.datasets import boston_housing

(train_data, train_labels),(test_data, test_labels) = boston_housing.load_data()

In [7]:
print(train_data.shape)
print(test_data.shape)

(404, 13)
(102, 13)


In [8]:
print(train_data[0])

[  1.23247   0.        8.14      0.        0.538     6.142    91.7
   3.9769    4.      307.       21.      396.9      18.72   ]


- Let's preprocess the data by scaling each feature to zero mean and unit standard deviation.

In [9]:
mean = train_data.mean(axis=0)
train_data -= mean   # same as train_data = train_data - mean, but done in place
std = train_data.std(axis=0)
train_data /= std

train_data[0]

array([-0.27224633, -0.48361547, -0.43576161, -0.25683275, -0.1652266 ,
       -0.1764426 ,  0.81306188,  0.1166983 , -0.62624905, -0.59517003,
        1.14850044,  0.44807713,  0.8252202 ])
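
- The cell above scales only `train_data`. If `test_data` is used later, it would normally be scaled with the mean and standard deviation computed on the training set, not its own. Below is a minimal sketch of that idea. The helper name `standardize` and the synthetic arrays are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def standardize(train, test):
    """Scale both arrays with the mean and std computed on `train` only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return (train - mean) / std, (test - mean) / std

# Synthetic stand-in data with the same shapes as the Boston housing arrays.
rng = np.random.default_rng(0)
fake_train = rng.normal(loc=50.0, scale=10.0, size=(404, 13))
fake_test = rng.normal(loc=50.0, scale=10.0, size=(102, 13))

scaled_train, scaled_test = standardize(fake_train, fake_test)

# Training features end up with mean ~0 and std ~1; test features only
# approximately so, because they reuse the training statistics.
print(scaled_train.mean(axis=0).round(6))
print(scaled_train.std(axis=0).round(6))
```

- Reusing the training statistics keeps information from the test set out of the preprocessing step.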

- Let's build the model.

In [10]:
from keras import models
from keras import layers

def build_model():
  model = models.Sequential()
  model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1], )))
  model.add(layers.Dense(64, activation='relu'))
  model.add(layers.Dense(1))
  model.compile(optimizer='rmsprop',
                loss='mse',
                metrics=['mse'])
  return model
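
- The parameter counts that `model.summary()` reports for this architecture can be checked by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit. The sketch below does that arithmetic for the 13 input features and the three layers defined above. The helper `dense_params` is a hypothetical name introduced for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 13 input features, followed by the three Dense layers (64, 64, 1 units).
layer_sizes = [13, 64, 64, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [896, 4160, 65]
print(sum(counts))  # 5121
```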

- Let's validate the training with K-fold cross-validation.

In [11]:
import numpy as np 

k = 4                                       # number of folds
num_val_samples = len(train_data) // k      # number of samples per fold
all_scores = []

for i in range(k):
  print('처리중인 폴드 #', i)   # prints "Processing fold #" followed by the fold index

  # Prepare the validation data
  val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
  val_labels = train_labels[i * num_val_samples: (i + 1) * num_val_samples]

  # Prepare the training data (everything outside the validation fold)
  data1 = train_data[:i * num_val_samples]
  data2 = train_data[(i + 1) * num_val_samples :]
  data1_labels = train_labels[:i * num_val_samples]
  data2_labels = train_labels[(i + 1) * num_val_samples :]

  partial_train_data = np.concatenate([data1, data2], axis=0)
  partial_train_labels = np.concatenate([data1_labels, data2_labels], axis=0)

  # Train the model
  model = build_model()
  model.summary()

  model.fit(partial_train_data, partial_train_labels, epochs=500, batch_size=128, verbose=0)

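  # Validate the model. evaluate() returns [loss, metric]; both values are MSE
  # here because the model is compiled with loss='mse' and metrics=['mse'].
  val_loss, val_mse = model.evaluate(val_data, val_labels)
  print(val_loss, val_mse)
  all_scores.append(val_mse)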

처리중인 폴드 # 0
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 64)                896       
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 1)                 65        
                                                                 
Total params: 5,121
Trainable params: 5,121
Non-trainable params: 0
_________________________________________________________________
6.92387056350708 6.92387056350708
처리중인 폴드 # 1
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 64)                896       
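
- The slicing in the loop above can be sketched on plain indices to see how the 404 training samples are split. This is an illustrative helper only: `kfold_indices` is a hypothetical name, and it mirrors the notebook's contiguous, unshuffled folds rather than any library implementation.

```python
import numpy as np

def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds of equal size.

    Each fold holds n_samples // k samples, so any remainder at the end
    of the data is never used for validation.
    """
    fold_size = n_samples // k
    indices = np.arange(n_samples)
    for i in range(k):
        val_idx = indices[i * fold_size:(i + 1) * fold_size]
        train_idx = np.concatenate([indices[:i * fold_size],
                                    indices[(i + 1) * fold_size:]])
        yield train_idx, val_idx

folds = list(kfold_indices(404, 4))
for train_idx, val_idx in folds:
    # Sizes of the training and validation parts, then the validation range.
    print(len(train_idx), len(val_idx), val_idx[0], val_idx[-1])
```

- With 404 samples and `k = 4`, each fold validates on 101 samples and trains on the remaining 303.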
                                        

In [12]:
np.mean(all_scores)

10.594899296760559
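
- Because `build_model()` compiles with `loss='mse'` and `metrics=['mse']`, both values returned by `model.evaluate` are mean squared errors, so the average above is an MSE and not a mean absolute error. The sketch below shows the difference between the two metrics on a few made-up numbers, and takes the square root of the reported average to get an RMSE in the units of the target. The arrays `y_true` and `y_pred` are hypothetical values used only for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([20.0, 30.0, 25.0, 15.0])
y_pred = np.array([22.0, 27.0, 25.0, 19.0])

print(mse(y_true, y_pred))  # 7.25
print(mae(y_true, y_pred))  # 2.25

# RMSE implied by the average MSE reported above (roughly 3.25).
rmse = float(np.sqrt(10.594899296760559))
print(rmse)
```

- To have Keras report MAE directly, one option would be to pass `metrics=['mae']` to `model.compile`. The recorded outputs in this notebook were produced with `metrics=['mse']`.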