# 高度なディープラーニングのベストプラクティス

- Keras Functional API    
- Keras のコールバックの使用    
- TensorBoardの操作    
- 最先端のモデルを開発するための重要なベストプラクティス    

今までは1対1のモデルだった。

- 多対１モデル    

例えば古着の市場価格を予測するモデルを考える。古着画像と、ユーザのメタデータ、ユーザーの書いたテキストデータを入力にして訓練したい。単純な方法は3つのモデルを別々に訓練して、それらの予測値の荷重平均を取ることだが、最適ではない。モデルによって抽出された情報は重複している可能性があるから。
もっと良いのは1つのモデルで同時に学習すること。

- １対多モデル    

小説のデータがあってジャンル分類し執筆した時期も推定したい場合。2つのモデルを訓練してもいいが、これら二つの属性は統計的に独立しているわけではないので、ジャンルと執筆時期を同時に予測するための学習を行えば、より高性能になるはず。ジャンルと執筆時期には相関関係があるため、小説の執筆時期がわかれば、ジャンル空間のより性格で豊かな表現をモデルが学習するのに役立つはず。
なんと！

- その他モデル

inceptionやresnetなど

- Keras Functional API


https://qiita.com/miyamotok0105/items/ccf6d0b52622d0697b0f


In [11]:
!pip list | grep Keras

Keras                    2.1.6      


# 多入力モデル

layers.addやlayers.concatenateで加算や連結といった、複数テンソルを結合できる層が使用できる。

今回は質問応答モデルで、自然言語での質問と、その質問に答えるための情報を提供するテキスト（ニュース記事など）という2つの入力。入力を元に答えを生成。

In [1]:
from keras.models import Model
from keras import layers
from keras import Input

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

text_input = Input(shape=(None,), dtype='int32', name='text')

embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)

question_input = Input(shape=(None,), dtype='int32', name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)

answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)

model = Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop',
             loss='categorical_crossentropy',
             metrics=['acc'])

Using TensorFlow backend.


In [2]:
#多入力モデルへのデータの供給
import numpy as np
num_samples = 1000
max_length = 100

text = np.random.randint(1, text_vocabulary_size,
                        size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size,
                            size=(num_samples, max_length))

answers = np.zeros(shape=(num_samples, answer_vocabulary_size))
indices = np.random.randint(0, answer_vocabulary_size, size=num_samples)
for i, x in enumerate(answers):
    x[indices[i]] = 1

#入力リストを使った適応
model.fit([text, question], answers, epochs=10, batch_size=128)



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fd1a19f09e8>

In [3]:
#入力ディクショナリを使用
model.fit({'text': text, 'question': question}, answers, epochs=10, batch_size=128)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fd1a19dae10>

# 多出力モデル

ソーシャルメディアへの投稿からユーザの年齢、性別、所得水準といった複数の属性を予測。

In [0]:
from keras import layers
from keras import Input
from keras.models import Model

vocabulary_size = 500
num_income_groups = 10

posts_input = Input(shape=(None,) ,dtype='int32', name='posts')

embedded_posts = layers.Embedding(256, vocabulary_size)(posts_input)
x = layers.Conv1D(128, 5, activation='relu', name='conv1')(embedded_posts)
x = layers.MaxPooling1D(5, name='pool1')(x)
x = layers.Conv1D(256, 5, activation='relu', name='conv2')(x)
x = layers.Conv1D(256, 5, activation='relu', name='conv3')(x)
#入力サイズ的にここまで畳み込めない
#各層に名前をつけて、どの層でエラーになってるか見ること
# x = layers.MaxPooling1D(5, name='pool2')(x)
# x = layers.Conv1D(256, 5, activation='relu', name='conv4')(x)
# x = layers.Conv1D(256, 5, activation='relu', name='conv5')(x)
x = layers.GlobalMaxPooling1D(name='gcp1')(x)
x = layers.Dense(128, activation='relu', name='fc1')(x)

age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups,
                                activation='softmax',
                                name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)
model = Model(posts_input,
             [age_prediction, income_prediction, gender_prediction])

リストで渡すと損失値を総和を取る。

In [0]:
model.compile(optimizer='rmsprop',
             loss=['mse',
                  'categorical_crossentropy',
                  'binary_crossentropy'])

In [0]:
#多入力モデルへのデータの供給
import numpy as np
num_samples = 1000

posts = np.random.randint(1, size=(num_samples, 200))
age_targets = np.random.randint(1, size=(num_samples, 1))
income_targets = np.random.randint(1, size=(num_samples, 10))
gender_targets = np.random.randint(1, size=(num_samples, 1))


In [35]:
model.fit(posts, [age_targets, income_targets, gender_targets], epochs=1, batch_size=128)


Epoch 1/1


<keras.callbacks.History at 0x7fd122195b00>

In [0]:
model.compile(optimizer='rmsprop',
             loss={'age': 'mse',
                   'income': 'categorical_crossentropy',
                   'gender': 'binary_crossentropy'
             })

In [39]:
model.fit(posts, {'age': age_targets, 'income': income_targets, 'gender': gender_targets}, epochs=1, batch_size=128)


Epoch 1/1


<keras.callbacks.History at 0x7fd11eb61780>

損失関数の貢献度が不均衝(ふきんこう)である場合はモデルの表現は最も大きい損失値を持つタスクを優先する形で最適化される。    

この問題を解決するには損失に重要度を割り当てる。

損失値の尺度が異なる場合に特に役立つ。年齢回帰で平均二乗誤差MSEは一般的に3〜5。それに対し、交差エントロピーは最低0.1になる。損失値の重要度のバランスを取るために、交差エントロピーに10の重み、平均二乗誤差に0.25の重みを割り当てる。

In [0]:
model.compile(optimizer='rmsprop',
             loss=['mse',
                  'categorical_crossentropy',
                  'binary_crossentropy'],
             loss_weights=[0.25, 1., 10.])

In [41]:
#入力リストを使った適応
model.fit(posts, [age_targets, income_targets, gender_targets], epochs=1, batch_size=128)


Epoch 1/1


<keras.callbacks.History at 0x7fd11edc7e48>

In [0]:
model.compile(optimizer='rmsprop',
             loss={'age': 'mse',
                   'income': 'categorical_crossentropy',
                   'gender': 'binary_crossentropy'},
             loss_weights={'age':0.25 ,'income': 1.,'gender': 10.})

In [43]:
model.fit(posts, {'age': age_targets, 'income': income_targets, 'gender': gender_targets}, epochs=1, batch_size=128)


Epoch 1/1


<keras.callbacks.History at 0x7fd11e488b38>

# 層の有向非巡回グラフ


###Inceptionモジュール

In [49]:

from keras.datasets import mnist
from keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [75]:
print(train_images.shape)
print(test_labels.shape)

(60000, 28, 28, 1)
(10000, 10)


In [79]:
from keras.models import Model
from keras import Input
from keras import layers
from keras import models
num_classes = 10
# img_input = Input(shape=train_images.shape[1:])
img_input = Input(shape=(28, 28, 1))
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv_input')(img_input)
# model = models.Sequential()
# model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))

#縦横出力フィルター計算式
#OH: (H+2P-FH/S) + 1
#OW: (W+2P-FW/S) + 1
#H:入力縦幅#P:パディング#FH:フィルター縦幅#S:ストライド

#OH:(28+2・0-1/2)+1=14.5
branch_a = layers.Conv2D(128, 1, activation='relu', padding='same', strides=2, name="conv_a1")(x)

#OH:(28+2・0-1/1)+1=28
branch_b = layers.Conv2D(128, 1, activation='relu', padding='same', name="conv_b1")(x)
#OH:(28+2・0-3/2)+1=13.5
branch_b = layers.Conv2D(128, 3, activation='relu', padding='same', strides=2, name="conv_b2")(branch_b)

#OH:(28+2・0-3/2)+1=13.5
branch_c = layers.AveragePooling2D(3, strides=2, padding='same', name="apool_c1")(x)
#OH:(13.5+2・0-3/1)+1=11.5
branch_c = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_c1")(branch_c)

branch_d = layers.Conv2D(128, 1, activation='relu', padding='same', name="conv_d1")(x)
branch_d = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_d2")(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', padding='same', strides=2, name="conv_d3")(branch_d)

#shapeが違うとエラー
#(None, 14, 14, 32), (None, 13, 13, 32), (None, 11, 11, 32), (None, 12, 12, 32)
output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)
print(output)
x = layers.Conv2D(32, (3, 3), activation='relu', name='conv_output')(output)
print(x)
x = layers.Flatten()(x)
x = layers.Dense(num_classes)(x)
x = layers.Dense(num_classes, activation='softmax', name='predictions')(x)
model = Model(img_input, [x], name='')

Tensor("concatenate_15/concat:0", shape=(?, 14, 14, 512), dtype=float32)
Tensor("conv_output_6/Relu:0", shape=(?, 12, 12, 32), dtype=float32)


In [80]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
 3776/60000 [>.............................] - ETA: 1:02 - loss: 0.0506 - acc: 0.9831

Epoch 3/5
 9024/60000 [===>..........................] - ETA: 56s - loss: 0.0314 - acc: 0.9891

Epoch 4/5
11456/60000 [====>.........................] - ETA: 54s - loss: 0.0227 - acc: 0.9924

Epoch 5/5
12608/60000 [=====>........................] - ETA: 53s - loss: 0.0168 - acc: 0.9944



<keras.callbacks.History at 0x7fd111170dd8>

# 残差接続

次はスキップ構造も

https://medium.com/@mikeliao/deep-layer-aggregation-combining-layers-in-nn-architectures-2744d29cab8

In [1]:

from keras.datasets import mnist
from keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [0]:
from keras.models import Model
from keras import Input
from keras import layers
from keras import models
num_classes = 10

img_input = Input(shape=(28, 28, 1))
x = layers.Conv2D(128, 3, activation='relu', padding='same', name='conv_input')(img_input)
y = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_1")(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_2")(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_3")(y)
#残差
y = layers.add([y, x])

x = layers.Flatten()(y)
x = layers.Dense(num_classes)(x)
x = layers.Dense(num_classes, activation='softmax', name='predictions')(x)
model = Model(img_input, [x], name='')

In [3]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=1, batch_size=64)

Epoch 1/1


<keras.callbacks.History at 0x7f6f7e62ecc0>

importの仕方を変えた

In [0]:
from keras.models import Model
from keras import Input
from keras.layers import Conv2D, Add, Flatten, Dense
from keras import models
num_classes = 10

img_input = Input(shape=(28, 28, 1))
x = Conv2D(128, 3, activation='relu', padding='same', name='conv_input')(img_input)
y = Conv2D(128, 3, activation='relu', padding='same', name="conv_1")(x)
y = Conv2D(128, 3, activation='relu', padding='same', name="conv_2")(y)
y = Conv2D(128, 3, activation='relu', padding='same', name="conv_3")(y)
#残差
y = Add()([y, x])

x = Flatten()(y)
x = Dense(num_classes)(x)
x = Dense(num_classes, activation='softmax', name='predictions')(x)
model = Model(img_input, [x], name='')

In [7]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=1, batch_size=64)

Epoch 1/1


<keras.callbacks.History at 0x7f6f4fa28320>

In [8]:
from keras.models import Model
from keras import Input
from keras import layers
from keras import models
num_classes = 10

img_input = Input(shape=(28, 28, 1))
x = layers.Conv2D(128, 3, activation='relu', padding='same', name='conv_input')(img_input)
y = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_1")(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_2")(y)
y = layers.MaxPooling2D(2, strides=2)(y)
print(y)
#14,14,128と同じにする1*1convを使った線形ダウンサンプリング
residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)
print(residual)
#残差
y = layers.add([y, residual])

x = layers.Flatten()(y)
x = layers.Dense(num_classes)(x)
x = layers.Dense(num_classes, activation='softmax', name='predictions')(x)
model = Model(img_input, [x], name='')

Tensor("max_pooling2d_1/MaxPool:0", shape=(?, 14, 14, 128), dtype=float32)
Tensor("conv2d_1/BiasAdd:0", shape=(?, 14, 14, 128), dtype=float32)


In [9]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=1, batch_size=64)

Epoch 1/1


<keras.callbacks.History at 0x7f077e009ef0>

# 訓練中にコールバックを使ってモデル制御

In [24]:
from keras.callbacks import EarlyStopping, TensorBoard, ModelCheckpoint
from keras.callbacks import LearningRateScheduler
from keras.callbacks import CSVLogger
from keras.callbacks import ReduceLROnPlateau

In [25]:
import keras

Callbacks_list = [
    #２エポック以上正解率が改善されないと訓練を中止
    keras.callbacks.EarlyStopping(
    monitor='val_acc',
    patience=1,
    ),
    #エポックごとに保存。lossが改善した場合を除いて保存しない。
    keras.callbacks.ModelCheckpoint(
    filepath="my_model.h5",
        monitor="val_loss",
        save_best_only=True,
    )
]

# model.compile(optimizer='rmsprop',
#               loss='categorical_crossentropy',
#               metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=1, batch_size=64, callbacks=Callbacks_list)

In [0]:
#損失率下げる
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.001)
model.fit(X_train, Y_train, callbacks=[reduce_lr])

In [0]:
#mkdir my_log_dir
# tb_cb = keras.callbacks.TensorBoard(log_dir="~/tflog/", histogram_freq=1)
# cbks = [tb_cb]
# model.fit(x_train, y_train,
#                     batch_size=batch_size,
#                     epochs=epochs,
#                     verbose=1,
#                     ## add 1 line
#                     callbacks=cbks,
#                     validation_data=(x_test, y_test))

In [0]:
#tensorboard --logdir=~/tflog/

# 層の重みの共有


Functional APIでは層を再利用でき、重みは共有されることになる。
2つの文章の類似性を評価するモデルがあるとする。このモデルは、入力として2つの文章を受け取り、出力に０〜１のスコアを返す。AのBに対する類似性とBのAに対する類似性は同じなので共有するべき。    

Siamese LSTMまたは共有LSTM (shared LSTM)と呼んでる。


In [22]:
from keras.models import Model
from keras import Input
from keras import layers
from keras import models
num_classes = 10

lstm = layers.LSTM(32)

left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

right_input = Input(shape=(None, 128))
right_output = lstm(right_input)

merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

model = Model([left_input, right_input], predictions)
# model.fit([left_data, right_data], targets)


#バッチ正規化

バッチごとのデータの平均と分散の指数移動平均を内部で維持する。主な役割は残差接続と同じく、勾配の伝播を助けること。

In [21]:
from keras.models import Model
from keras import Input
from keras import layers
from keras import models
num_classes = 10

img_input = Input(shape=(28, 28, 1))
x = layers.Conv2D(128, 3, activation='relu', padding='same', name='conv_input')(img_input)
y = layers.BatchNormalization()(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_1")(x)
y = layers.BatchNormalization()(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same', name="conv_2")(y)
y = layers.MaxPooling2D(2, strides=2)(y)

x = layers.Flatten()(y)
x = layers.Dense(num_classes)(x)
x = layers.Dense(num_classes, activation='softmax', name='predictions')(x)
model = Model(img_input, [x], name='')

# dw畳み込み

dw畳み込みはXceptionで使われてる。
conv2dより軽量でパラメータ数が減る。入力の各チャネルで空間畳み込み演算を別々に実行した後に、pw畳み込み(1*1畳み込み)演算を通じて出力チャネルを連結する。
空間特徴量の学習とチャネルごとの特徴量の学習を切り離す。「入力の空間的な位置同士は高い相関にあるものの、異なるチャネル同士はほぼ独立している」と想定される場合に有効。
データが少ない時に有効な傾向。

In [18]:
import keras
from keras import layers
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import GlobalAveragePooling2D
from keras import backend as K
from keras.optimizers import SGD, Adadelta, Adagrad, Adam, Adamax, RMSprop, Nadam

height = 64
width = 64
channels = 3
num_classes = 10

model = Sequential()
model.add(layers.SeparableConv2D(32, 3, activation='relu', input_shape=(height, width, channels, )))
model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.MaxPooling2D(2))

model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.SeparableConv2D(128, 3, activation='relu'))
model.add(layers.MaxPooling2D(2))

model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.SeparableConv2D(128, 3, activation='relu'))
model.add(layers.GlobalAveragePooling2D())

model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(num_classes, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# ハイパーパラメータの最適化

https://medium.com/machine-learning-world/neural-networks-for-algorithmic-trading-hyperparameters-optimization-cb2b4a29b8ee

http://neupy.com/2016/12/17/hyperparameter_optimization_for_neural_networks.html#hyperparameter-optimization


In [11]:
!pip install hyperopt==0.1
!pip install networkx==1.11

Collecting hyperopt==0.1
[?25l  Downloading https://files.pythonhosted.org/packages/39/51/16e9edb51ffbf64bd80f41b7d30bc037aa8b157d430c276464c9b8768b67/hyperopt-0.1.tar.gz (98kB)
[K    100% |████████████████████████████████| 102kB 2.9MB/s 
Collecting nose (from hyperopt==0.1)
[?25l  Downloading https://files.pythonhosted.org/packages/15/d8/dd071918c040f50fa1cf80da16423af51ff8ce4a0f2399b7bf8de45ac3d9/nose-1.3.7-py3-none-any.whl (154kB)
[K    100% |████████████████████████████████| 163kB 4.0MB/s 
[?25hCollecting pymongo (from hyperopt==0.1)
[?25l  Downloading https://files.pythonhosted.org/packages/91/0f/03409241acedb5e27c5ddfef86e1e6b88c1a3864be83d59d796cc06a52b2/pymongo-3.7.0-cp36-cp36m-manylinux1_x86_64.whl (408kB)
[K    100% |████████████████████████████████| 409kB 6.6MB/s 
Building wheels for collected packages: hyperopt
  Running setup.py bdist_wheel for hyperopt ... [?25l- \ | done
[?25h  Stored in directory: /content/.cache/pip/wheels/32/69/f5/3267146c22e76dbf8c5e13

In [12]:
from __future__ import print_function
import random
import numpy as np
from PIL import Image

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import GlobalAveragePooling2D
from keras import backend as K
from keras.optimizers import SGD, Adadelta, Adagrad, Adam, Adamax, RMSprop, Nadam

class ValiableInputModel():
    def __init__(self, params):
        self.params = params
        
        self.num_classes = 10
        self.input_width, self.input_height = None, None #入力層を可変
        self.input_shape = ()
        self.model = Sequential()
        pass

    def init_data(self, img_rows=28, img_cols=28):
            (x_train, y_train), (x_test, y_test) = mnist.load_data()
            if K.image_data_format() == 'channels_first':
                x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
                x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
                self.input_shape = (1, self.input_width, self.input_height)
            else:
                x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
                x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
                self.input_shape = (self.input_width, self.input_height, 1)
            
            #後処理
            x_train = x_train.astype('float32')
            x_test = x_test.astype('float32')
            x_train /= 255
            x_test /= 255
                        
            y_train = keras.utils.to_categorical(y_train, self.num_classes)
            y_test = keras.utils.to_categorical(y_test, self.num_classes)
            return (x_train, y_train), (x_test, y_test)

    def resize_image(self, img, image_width_size, image_height_size):
            img = img.reshape(img.shape[0], img.shape[1])
            img = Image.fromarray(np.uint8(img))
            img = img.resize((image_width_size, image_height_size))
            img = np.asarray(img)
            return img.reshape(image_width_size,image_height_size,1)
            
    def resize_data(self, x_train, x_test, image_width_size=28, image_height_size=28):
            #28 28の画像から任意サイズに変換
            x_train = np.array([self.resize_image(img, image_width_size, image_height_size) for img in x_train])
            x_test = np.array([self.resize_image(img, image_width_size, image_height_size) for img in x_test])
            return x_train, x_test            

#     def build_model(self):
#           self.model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=self.input_shape)) #input shape
#           self.model.add(Conv2D(64, (3, 3), activation='relu'))
#           self.model.add(MaxPooling2D(pool_size=(2, 2)))
#           self.model.add(Dropout(0.25))
#           self.model.add(GlobalAveragePooling2D())
#           self.model.add(Dense(128, activation='relu'))
#           self.model.add(Dropout(0.5))
#           self.model.add(Dense(self.num_classes, activation='softmax'))
#           self.model.compile(loss=keras.losses.categorical_crossentropy,
#                         optimizer=keras.optimizers.Adadelta(),
#                         metrics=['accuracy'])

    def build_model(self):
          #test1：最終レイヤーはgcp-fcに固定。これに合う前のレイヤーを探索する。
          #仮説：gcpは最終出力サイズが7 7くらいの小さいサイズになるまで縮小されてるとうまくいくのでは。
          #conv1-func1-conv2-func2-maxpool-gcp-fc
          #conv1-func1-maxpool-gcp-fc
          #conv1-func1-maxpool-conv2-func2-maxpool
          #test2：最終レイヤーをfcにしちゃえば精度出るのでは。
          c1_c = self.params['n_conv1_chanel']
          c1_k = self.params['n_conv1_kernel']
          func1 = self.params['func1']
          b_p1 = self.params['b_max_pooling1']
          m_p1 = self.params['n_max_pooling1']
          c2_c = self.params['n_conv2_chanel']
          c2_k = self.params['n_conv2_kernel']
          func2 = self.params['func2']
          b_p2 = self.params['b_max_pooling2']
          m_p2 = self.params['n_max_pooling2']
          c1n= self.params['conv1_layer_num']
          c2n = self.params['conv2_layer_num']
          opt = self.params['optimizer_name']
          lr = self.params['lr']
          
          if c1n == 1:
              self.model.add(Conv2D(c1_c, kernel_size=(c1_k, c1_k), activation=func1, input_shape=self.input_shape)) #input shape
          elif c1n == 2:
              self.model.add(Conv2D(c1_c, kernel_size=(c1_k, c1_k), activation=func1, input_shape=self.input_shape)) #input shape
              self.model.add(Conv2D(c1_c, (c1_k, c1_k), activation=func1))
          else:
              self.model.add(Conv2D(c1_c, kernel_size=(c1_k, c1_k), activation=func1, input_shape=self.input_shape)) #input shape
              self.model.add(Conv2D(c1_c, (c1_k, c1_k), activation=func1))
              self.model.add(Conv2D(c1_c, (c1_k, c1_k), activation=func1))
          if b_p1 == True:
              self.model.add(MaxPooling2D(pool_size=(m_p1, m_p1), padding='same'))
          if c2n == 1:
              self.model.add(Conv2D(c2_c, (c2_k, c2_k), activation=func2))
          elif c2n == 2:
              self.model.add(Conv2D(c2_c, (c2_k, c2_k), activation=func2))
              self.model.add(Conv2D(c2_c, (c2_k, c2_k), activation=func2))
          else:
              self.model.add(Conv2D(c2_c, (c2_k, c2_k), activation=func2))
              self.model.add(Conv2D(c2_c, (c2_k, c2_k), activation=func2))
              self.model.add(Conv2D(c2_c, (c2_k, c2_k), activation=func2))
          if b_p2 == True:
              self.model.add(MaxPooling2D(pool_size=(m_p2, m_p2), padding='same'))
          self.model.add(Dropout(0.25))
          self.model.add(GlobalAveragePooling2D())
          self.model.add(Dense(128, activation='relu'))
          self.model.add(Dropout(0.5))
          self.model.add(Dense(self.num_classes, activation='softmax'))
          #"Adam', 'AdaDelta', "SGD", "Adamax", "Adagrad",  "RMSprop", "Nadam"
          if opt == "SGD":
              self.model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.SGD(lr=lr), metrics=['accuracy'])
          elif opt == "Adam":
              self.model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(lr=lr), metrics=['accuracy'])
          elif opt == "AdaDelta":
              self.model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(lr=lr), metrics=['accuracy'])
          elif opt == "RMSprop":
              self.model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.RMSprop(lr=lr), metrics=['accuracy'])
          elif opt == "Adamax":
              self.model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adamax(lr=lr), metrics=['accuracy'])
          elif opt == "Nadam":
              self.model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Nadam(lr=lr), metrics=['accuracy'])
          elif opt == "Adagrad":
              self.model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adagrad(lr=lr), metrics=['accuracy'])

    def run_model(self, x_train, y_train, x_test, y_test, epochs = 100, batch_size = 128):  

          self.model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
         
    def evaluate_model(self, x_test, y_test):
          score = self.model.evaluate(x_test, y_test, verbose=0)
          print('Test loss:', score[0])
          print('Test accuracy:', score[1])
          return score

In [13]:
from hyperopt import fmin, tpe, hp
import hyperopt.pyll
from hyperopt.pyll import scope

def main(params):
    epoch = 3
    n_out = 10
    batchsize = 100
    print("params:", params)
    
    model = ValiableInputModel(params)
    (x_train, y_train), (x_test, y_test) = model.init_data()
    x_train, x_test = model.resize_data(x_train, x_test, 30, 30)
    model.build_model()
    model.run_model(x_train, y_train, x_test, y_test, epochs=epoch)
    score = model.evaluate_model(x_test, y_test)
    return score[0]
    


In [14]:
if __name__ == '__main__':
    space = {
       'n_conv1_chanel': scope.int(hp.quniform('n_conv1_chanel', 32, 42, 52)),
       'n_conv1_kernel': scope.int(hp.quniform('n_conv1_kernel', 3, 4, 5)),
       'func1': hp.choice('func1', ('relu', 'sigmoid')),
       'b_max_pooling1': hp.choice('b_max_pooling1', ('True', 'False')),
       'n_max_pooling1': scope.int(hp.quniform('n_max_pooling1', 2, 3, 4)),
       'n_conv2_chanel': scope.int(hp.quniform('n_conv2_chanel', 32, 42, 52)),
       'n_conv2_kernel': scope.int(hp.quniform('n_conv2_kernel', 3, 4, 5)),
       'func2': hp.choice('func2', ('relu', 'sigmoid')),
       'b_max_pooling2': hp.choice('b_max_pooling2', ('True', 'False')),
       'n_max_pooling2': scope.int(hp.quniform('n_max_pooling2', 2, 3, 4)),
       'conv1_layer_num': scope.int(hp.quniform('conv1_layer_num', 1, 2, 3)),
       'conv2_layer_num': scope.int(hp.quniform('conv2_layer_num', 1, 2, 3)),
       'optimizer_name': hp.choice('optimizer_name',
                                   ('Adam', 'AdaDelta', "SGD", "Adamax", "Adagrad",  "RMSprop", "Nadam")),
       'lr': hp.uniform('lr', 0.005, 0.02),
             }
    best = fmin(main, space, algo=tpe.suggest, max_evals=50)
    print("best parameters", best)

params: {'b_max_pooling1': 'False', 'b_max_pooling2': 'False', 'conv1_layer_num': 0, 'conv2_layer_num': 3, 'func1': 'relu', 'func2': 'relu', 'lr': 0.008694985354686258, 'n_conv1_chanel': 52, 'n_conv1_kernel': 5, 'n_conv2_chanel': 52, 'n_conv2_kernel': 5, 'n_max_pooling1': 4, 'n_max_pooling2': 4, 'optimizer_name': 'Adamax'}
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
Train on 60000 samples, validate on 10000 samples
Epoch 1/3

Epoch 2/3
Epoch 3/3

Test loss: 2.301231113433838
Test accuracy: 0.1135
params: {'b_max_pooling1': 'True', 'b_max_pooling2': 'True', 'conv1_layer_num': 3, 'conv2_layer_num': 0, 'func1': 'relu', 'func2': 'relu', 'lr': 0.017776002890461117, 'n_conv1_chanel': 52, 'n_conv1_kernel': 5, 'n_conv2_chanel': 52, 'n_conv2_kernel': 5, 'n_max_pooling1': 4, 'n_max_pooling2': 4, 'optimizer_name': 'Adagrad'}
Train on 60000 samples, validate on 10000 samples
Epoch 1/3

Epoch 2/3
Epoch 3/3

Test loss: 14.680361024475097
Test accuracy: 0.0892
params: {'b_max_pooling1': 'False', 'b_max_pooling2': 'True', 'conv1_layer_num': 0, 'conv2_layer_num': 0, 'func1': 'sigmoid', 'func2': 'relu', 'lr': 0.014200335234618545, 'n_conv1_chanel': 52, 'n_conv1_kernel': 5, 'n_conv2_chanel': 52, 'n_conv2_kernel': 5, 'n_max_pooling1': 4, 'n_max_pooling2': 4, 'optimizer_name': 'RMSprop'}
Train on 60000 samples, validate on 10000 samples
Epoch 1/3

Epoch 2/3
Epoch 3/3



KeyboardInterrupt: ignored

# モデルのアンサンブル

- スタッキングとアンサンブルのkaggle titanic例    

https://github.com/miyamotok0105/kaggle_sample/blob/master/titanic/sample1.ipynb

- 犬猫のkaggle 例    

https://qiita.com/miyamotok0105/items/8d84eecc21a4d6fd77f9

- 予備知識    

https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/05_Ensemble_Learning.ipynb
https://www.kaggle.com/jananesekaran/99-45-cnn-batchnorm-ensembling
https://www.kaggle.com/daisukelab/example-of-weighted-ensemble
https://github.com/alno/kaggle-allstate-claims-severity/blob/master/keras_util.py






CIFAR10でやってみようと思う。一旦普通に回す。

In [1]:
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os
import numpy as np

batch_size = 32
num_classes = 10
epochs = 1
data_augmentation = True
num_predictions = 20
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'keras_cifar10_trained_model.h5'

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255


Using TensorFlow backend.


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples


In [2]:


class Runner():
     
    def init_model(self):
        model = Sequential()
        model.add(Conv2D(32, (3, 3), padding='same',
                         input_shape=x_train.shape[1:]))
        model.add(Activation('relu'))
        model.add(Conv2D(32, (3, 3)))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Conv2D(64, (3, 3), padding='same'))
        model.add(Activation('relu'))
        model.add(Conv2D(64, (3, 3)))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Flatten())
        model.add(Dense(512))
        model.add(Activation('relu'))
        model.add(Dropout(0.5))
        model.add(Dense(num_classes))
        model.add(Activation('softmax'))

        opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
        model.compile(loss='categorical_crossentropy',
                      optimizer=opt,
                      metrics=['accuracy'])
        return model
      
    def fit_model(self, model, data_augmentation, x_train, y_train, x_test, y_test):
        if not data_augmentation:
            print('Not using data augmentation.')
            model.fit(x_train, y_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      validation_data=(x_test, y_test),
                      shuffle=True)
        else:
            print('Using real-time data augmentation.')
            datagen = ImageDataGenerator(
                featurewise_center=False,  # set input mean to 0 over the dataset
                samplewise_center=False,  # set each sample mean to 0
                featurewise_std_normalization=False,  # divide inputs by std of the dataset
                samplewise_std_normalization=False,  # divide each input by its std
                zca_whitening=False,  # apply ZCA whitening
                zca_epsilon=1e-06,  # epsilon for ZCA whitening
                rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
                width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
                height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
                shear_range=0.,  # set range for random shear
                zoom_range=0.,  # set range for random zoom
                channel_shift_range=0.,  # set range for random channel shifts
                fill_mode='nearest',  # set mode for filling points outside the input boundaries
                cval=0.,  # value used for fill_mode = "constant"
                horizontal_flip=True,  # randomly flip images
                vertical_flip=False,  # randomly flip images
                rescale=None,  # set rescaling factor (applied before any other transformation)
                preprocessing_function=None,  # set function that will be applied on each input
                data_format=None,  # image data format, either "channels_first" or "channels_last"
                validation_split=0.0)  # fraction of images reserved for validation (strictly between 0 and 1)
            datagen.fit(x_train)
            model.fit_generator(datagen.flow(x_train, y_train,
                                             batch_size=batch_size),
                                epochs=epochs,
                                validation_data=(x_test, y_test),
                                workers=4)
        return model
            
    def evaluate_model(self, model):
        scores = model.evaluate(x_test, y_test, verbose=1)
        print('Test loss:', scores[0])
        print('Test accuracy:', scores[1])
        

    def evaluate_error(self, model):
        pred = model.predict(x_test, batch_size = 32)
        pred = np.argmax(pred, axis=1)
        pred = np.expand_dims(pred, axis=1) # make same shape as y_test
        error = np.sum(np.not_equal(pred, y_test)) / y_test.shape[0]  
        return error
      

In [3]:
runner = Runner()
model = runner.init_model()
model = runner.fit_model(model, False, x_train, y_train, x_test, y_test)
# runner.evaluate_model(model)
runner.evaluate_error(model) #todo:この数字は何？

Not using data augmentation.
Train on 50000 samples, validate on 10000 samples
Epoch 1/1



8.9865

In [4]:
#error 
pred = model.predict(x_test, batch_size = 32)
pred = np.argmax(pred, axis=1)
pred = np.expand_dims(pred, axis=1) # make same shape as y_test
error = np.sum(np.not_equal(pred, y_test)) / y_test.shape[0]  
print(error)

8.9865


In [5]:
#acc 
pred = model.predict(x_test, batch_size = 32)
pred = np.argmax(pred, axis=1)
pred = np.expand_dims(pred, axis=1) # make same shape as y_test

y_test_index = []
for y in y_test:
    index = np.where(np.array(y) == 1)
    y_test_index.append(list(index[0]))

print(pred.shape)
print(np.array(y_test_index).shape)
acc = np.sum(np.equal(pred, y_test_index)) / y_test.shape[0]  
print(acc)

(10000, 1)
(10000, 1)
0.4607


In [6]:
runner.evaluate_model(model)

Test loss: 1.4890676055908203
Test accuracy: 0.4607


Test accuracy: 0.6098ね


In [7]:
model1 = Sequential()
model1.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model1.add(Activation('relu'))
model1.add(Conv2D(32, (3, 3)))
model1.add(Activation('relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Dropout(0.25))

model1.add(Flatten())
model1.add(Dense(100))
model1.add(Activation('relu'))
model1.add(Dropout(0.5))
model1.add(Dense(num_classes))
model1.add(Activation('softmax'))

opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
model1.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

In [8]:
runner = Runner()
model1 = runner.fit_model(model1, False, x_train, y_train, x_test, y_test)
runner.evaluate_error(model1)

Not using data augmentation.
Train on 50000 samples, validate on 10000 samples
Epoch 1/1


9.0909

In [10]:
runner.evaluate_model(model1)

Test loss: 1.5545792999267578
Test accuracy: 0.459


In [9]:
#acc 
pred = model.predict(x_test, batch_size = 32)
pred1 = model1.predict(x_test, batch_size = 32)
final_pred = 0.25 * (pred + pred1)

final_pred = np.argmax(final_pred, axis=1)
final_pred = np.expand_dims(final_pred, axis=1) # make same shape as y_test

y_test_index = []
for y in y_test:
    index = np.where(np.array(y) == 1)
    y_test_index.append(list(index[0]))

print(final_pred.shape)
print(np.array(y_test_index).shape)
acc = np.sum(np.equal(final_pred, y_test_index)) / y_test.shape[0]  
print(acc)

(10000, 1)
(10000, 1)
0.4772



model：0.4607    
model1：0.459    
アンサンブル：0.4772    

微増。この方法がうまくいくのは分類機の性能が似通っている場合。加重平均を求めた方がいい。    

アンサンブルをうまく作成するのにランダムサーチかNelder-Meadといった単純な最適化アルゴリズムを使用すると良い

- kerasの良い例

kerasでこんないい感じに書けるのか。

https://towardsdatascience.com/ensembling-convnets-using-keras-237d429157eb

https://medium.com/randomai/ensemble-and-store-models-in-keras-2-x-b881a6d7693f