## AIF_sprint17-rnn

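This notebook tackles negative/positive (sentiment) classification on the IMDB dataset.

The model is an **RNN (Recurrent Neural Network)**.

To handle time-series data, an RNN has, as its name suggests, a recurrent part that feeds the output of the previous time step back in as input. This lets it extract features that depend on the order of the data.

This makes it applicable to language processing, speech recognition, stock price prediction, video data, and other sequential data.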
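The recurrence described above can be written in a few lines of NumPy. This is only a minimal sketch, assuming a tanh activation and no bias term; the names `rnn_forward`, `U`, and `W` are illustrative and do not come from any library.

```python
import numpy as np

def rnn_forward(x, U, W):
    """Run a bias-free tanh RNN over one batch.

    x : [N, T, D] input sequences
    U : [D, H] input-to-hidden weights
    W : [H, H] hidden-to-hidden (recurrent) weights
    Returns the hidden state of every time step, shape [T, N, H].
    """
    N, T, D = x.shape
    H = W.shape[0]
    h = np.zeros((N, H))            # state before the first time step
    states = np.zeros((T, N, H))
    for t in range(T):
        # the previous output h is fed back in together with the current input
        h = np.tanh(x[:, t, :] @ U + h @ W)
        states[t] = h
    return states

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 3))
states = rnn_forward(x, rng.normal(size=(3, 4)), rng.normal(size=(4, 4)))
print(states.shape)  # (5, 2, 4)
```

Because `h` is carried from one step to the next, feeding the same words in a different order generally produces a different final state, which is how the order of the data enters the features.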

In [2]:
import numpy as np
import pandas as pd

from keras.preprocessing import sequence
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Embedding, SimpleRNN, LSTM, GRU

from keras.datasets import imdb

Using TensorFlow backend.


In [3]:
import gc

max_features = 10000
maxlen = 40
batch_size = 32

print('load dataset')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

train_size = 5000
test_size = 5000

x_train, y_train, x_test, y_test = x_train[:train_size], y_train[:train_size], x_test[:test_size], y_test[:test_size]

print('padding')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

load dataset
padding
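When only `maxlen` is given, `sequence.pad_sequences` pads and truncates at the front of each sequence (`padding='pre'`, `truncating='pre'`), so every review is reduced to its last 40 tokens. The helper `pad_pre` below is a hypothetical NumPy stand-in that mimics this behaviour for illustration; it is not Keras code.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Keep the LAST `maxlen` tokens of each sequence and
    left-pad shorter sequences with `value`."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for i, s in enumerate(seqs):
        if len(s) == 0:
            continue                      # nothing to copy, row stays all padding
        trunc = s[-maxlen:]               # 'pre' truncation drops the beginning
        out[i, -len(trunc):] = trunc      # 'pre' padding puts the fill value in front
    return out

padded = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=4)
print(padded.tolist())  # [[0, 1, 2, 3], [6, 7, 8, 9]]
```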


### SimpleRNN

In [9]:
print('build model')
inp = Input(shape=(maxlen,), dtype='int32', name='main_input')
x = Embedding(max_features, 128)(inp)
simple_rnn_out = SimpleRNN(32, use_bias=False)(x)
predictions = Dense(1, use_bias=False, activation='sigmoid')(simple_rnn_out)
# predictions = Dense(1)(simple_rnn_out)
model = Model(inputs=inp, outputs=predictions)


print(model.summary())

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('train')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=5,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)


build model
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
main_input (InputLayer)      (None, 40)                0         
_________________________________________________________________
embedding_6 (Embedding)      (None, 40, 128)           1280000   
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 32)                5120      
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 32        
Total params: 1,285,152
Trainable params: 1,285,152
Non-trainable params: 0
_________________________________________________________________
None
train
Train on 5000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test score: 0.8005634890913963
Test accuracy: 0.7404
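The parameter counts in the summary above can be reproduced by hand. The sketch below uses two hypothetical helper functions (`embedding_params`, `recurrent_params`) and assumes that each recurrent weight block consists of an input kernel, a recurrent kernel, and an optional bias.

```python
def embedding_params(vocab_size, dim):
    # one vector of length `dim` per vocabulary entry
    return vocab_size * dim

def recurrent_params(input_dim, units, n_blocks=1, use_bias=True):
    """n_blocks weight blocks, each with an input kernel [input_dim, units],
    a recurrent kernel [units, units] and optionally a bias [units]."""
    per_block = input_dim * units + units * units + (units if use_bias else 0)
    return n_blocks * per_block

print(embedding_params(10000, 128))                # 1280000
print(recurrent_params(128, 32, use_bias=False))   # 5120
print(recurrent_params(128, 32, use_bias=True))    # 5152
print(recurrent_params(128, 32, n_blocks=4))       # 20608
```

With `use_bias=False` the SimpleRNN layer has 128 × 32 + 32 × 32 = 5120 weights and the Dense layer has 32, giving 1,280,000 + 5,120 + 32 = 1,285,152 in total. The last line shows the count for an LSTM-style layer with four such blocks; a GRU in its classic form uses three, although some implementations add an extra bias.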


### LSTM

In [10]:
# keras.layers.LSTM(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=1, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False)

print('build model')
inp = Input(shape=(maxlen,), dtype='int32', name='main_input')
x = Embedding(max_features, 128)(inp)
lstm_out = LSTM(32)(x)
predictions = Dense(1, activation='sigmoid')(lstm_out)
model = Model(inputs=inp, outputs=predictions)

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('train')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=5,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

build model
train
Train on 5000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test score: 0.9504947330951691
Test accuracy: 0.7396


### GRU

In [11]:
print('build model')
inp = Input(shape=(maxlen,), dtype='int32', name='main_input')
x = Embedding(max_features, 128)(inp)
gru_out = GRU(32)(x)
predictions = Dense(1, activation='sigmoid')(gru_out)
model = Model(inputs=inp, outputs=predictions)

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('train')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=5,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

build model
train
Train on 5000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test score: 0.9379061033248901
Test accuracy: 0.7508


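All three models reached similar test accuracy (about 0.74 to 0.75). However, SimpleRNN already reaches 100% accuracy on the training data, whereas LSTM and GRU have not yet reached 100%.

### Retrieving intermediate-layer outputs in Keras

For the scratch implementation, the dataset is the output of the Embedding layer, i.e. the data in which each word has been converted into a distributed vector representation.

To get it, we write code that extracts the output of an intermediate layer in Keras.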

In [58]:
import gc

max_features = 10000
maxlen = 40
batch_size = 32

print('load dataset')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

train_size =10000
test_size = 10000

x_train, y_train, x_test, y_test = x_train[:train_size], y_train[:train_size], x_test[:test_size], y_test[:test_size]

print('padding')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('finished')

load dataset
padding
finished


### Extracting the distributed-representation data from the intermediate layer

In [59]:
print('build model')
inp = Input(shape=(maxlen,), dtype='int32', name='main_input')
x = Embedding(max_features, 128, name='EMB')(inp)
simple_rnn_out = SimpleRNN(32)(x)

predictions = Dense(1, activation='sigmoid')(simple_rnn_out)
model = Model(inputs=inp, outputs=predictions)

print(model.summary())

layer_name = 'EMB'
intermediate_layer_model = Model(inputs=model.input,
                                  outputs=model.get_layer(layer_name).output)
x_train_vector = intermediate_layer_model.predict(x_train)
x_test_vector = intermediate_layer_model.predict(x_test)

build model
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
main_input (InputLayer)      (None, 40)                0         
_________________________________________________________________
EMB (Embedding)              (None, 40, 128)           1280000   
_________________________________________________________________
simple_rnn_4 (SimpleRNN)     (None, 32)                5152      
_________________________________________________________________
dense_10 (Dense)             (None, 1)                 33        
Total params: 1,285,185
Trainable params: 1,285,185
Non-trainable params: 0
_________________________________________________________________
None


In [60]:
x_train_vector.shape
x_train = x_train_vector

In [61]:
x_test_vector.shape
x_test = x_test_vector

In [62]:
x_train.shape

(10000, 40, 128)

Now that the data has been extracted, the scratch implementation uses it as its input.
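What the Embedding layer does to produce the `(10000, 40, 128)` array is a plain table lookup: each token id selects one row of the weight matrix. The sketch below uses toy sizes and a random matrix `E` as a stand-in for the layer weights (both are assumptions for illustration). Note that in cell `In [59]` the model is neither compiled nor fitted before `predict` is called, so, as far as the cells shown here go, the extracted vectors come from the initial weights.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10, 4                  # toy sizes; the notebook uses 10000 and 128
E = rng.uniform(-0.05, 0.05, size=(vocab_size, dim))  # stand-in for the Embedding weights

token_ids = np.array([[0, 3, 3],
                      [7, 1, 0]])        # shape [N, T]
vectors = E[token_ids]                   # row lookup by integer indexing, shape [N, T, D]
print(vectors.shape)  # (2, 3, 4)
```

The same token id always maps to the same vector, regardless of its position in the sequence.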

### Implementation in Chainer

In [50]:
from chainer.datasets import tuple_dataset
import text_datasets
import chainer
from chainer import training
from chainer.training import extensions

import nets
from nlp_utils import convert_seq


def main():
    
    args={
        'gpu':-1,
        'dataset': 'imdb.binary',
        'model': 'rnn',
        'batchsize': 64,
        'epoch': 3,
        'out': 'result',
        'unit': 100,
        'layer':1,
        'dropout':0.4,
        'char_based': False
    }

    # Load a dataset
    if args['dataset'] == 'dbpedia':
        train, test, vocab = text_datasets.get_dbpedia(
            char_based=args['char_based'])
    elif args['dataset'].startswith('imdb.'):
        print("IMDB datasets")
        train, test, vocab = text_datasets.get_imdb(
            fine_grained=args['dataset'].endswith('.fine'),
            char_based=args['char_based'])
    elif args['dataset'] in ['TREC', 'stsa.binary', 'stsa.fine',
                          'custrev', 'mpqa', 'rt-polarity', 'subj']:
        train, test, vocab = text_datasets.get_other_text_dataset(
            args['dataset'], char_based=args['char_based'])

    print('# train data: {}'.format(len(train)))
    print('# test  data: {}'.format(len(test)))
    print('# vocab: {}'.format(len(vocab)))
    n_class = len(set([int(d[1]) for d in train]))
    print('# class: {}'.format(n_class))
    

    train_iter = chainer.iterators.SerialIterator(train[:1000], args['batchsize'])
    test_iter = chainer.iterators.SerialIterator(test[:1000], args['batchsize'],
                                                 repeat=False, shuffle=False)

    # return train_iter, test_iter
    # Setup a model
    if args['model'] == 'rnn':
        Encoder = nets.RNNEncoder
        print(type(Encoder))
    elif args['model'] == 'cnn':
        Encoder = nets.CNNEncoder
    elif args['model'] == 'bow':
        Encoder = nets.BOWMLPEncoder

    encoder = Encoder(n_layers=args['layer'], n_vocab=len(vocab),
                      n_units=args['unit'], dropout=args['dropout'])
    model = nets.TextClassifier(encoder, n_class)
    if args['gpu'] >= 0:
        # Make a specified GPU current
        chainer.backends.cuda.get_device_from_id(args['gpu']).use()
        model.to_gpu()  # Copy the model to the GPU
    

    # Setup an optimizer
    optimizer = chainer.optimizers.Adam()
    optimizer.setup(model)
    optimizer.add_hook(chainer.optimizer.WeightDecay(1e-4))


    # Set up a trainer
    updater = training.updaters.StandardUpdater(
        train_iter, optimizer,
        converter=convert_seq, device=args['gpu'])
    trainer = training.Trainer(updater, (args['epoch'], 'epoch'), out=args['out'])

    # Evaluate the model with the test dataset for each epoch
    trainer.extend(extensions.Evaluator(
        test_iter, model,
        converter=convert_seq, device=args['gpu']))

    # Take a best snapshot
    record_trigger = training.triggers.MaxValueTrigger(
        'validation/main/accuracy', (1, 'epoch'))
    trainer.extend(extensions.snapshot_object(
        model, 'best_model.npz'),
        trigger=record_trigger)

    # Write a log of evaluation statistics for each epoch
    trainer.extend(extensions.LogReport())
    trainer.extend(extensions.PrintReport(
        ['epoch', 'main/loss', 'validation/main/loss',
         'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))

    # Print a progress bar to stdout
    trainer.extend(extensions.ProgressBar())

    print("STRAT Training!")
    # Run the training
    trainer.run()
    print("Finished!")


if __name__ == '__main__':
    main()

IMDB datasets
read imdb
constract vocabulary based on frequency
# train data: 25000
# test  data: 25000
# vocab: 20000
# class: 2
<class 'type'>
STRAT Training!
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.491309    0.118182              0.902344       1                         124.664
2           0.02038     0.00029076            1              1                         259.623
3           0.000178314  9.32843e-05           1              1                         385.878
Finished!


### Scratch implementation of SimpleRNN

In [None]:
class Recurrent:
    """
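    Vanilla RNN layer
    (no bias term)
    
    The code uses the following variable names:
    N : number of documents (samples)
    T : time steps (here, the position of the word in the sequence)
    D : dimensionality of each word vector
    H : number of hidden units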
    """
    
    def __init__(self,W, U):
        
        self.W, self.U = W, U
        self.dW, self.dU = np.zeros_like(self.W), np.zeros_like(self.U)
        self.data_size, self.len_word, self.output_dim = None, None, None
        
        self.x, self.out = None, None

        
    def step_forward(self, x_current, prev_state):
        """
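        input:
        x_current  : shape [N D] word vector at the current position, one per document
        prev_state : shape [N H] state (output) up to the previous word
        
        return:
        current_state : shape [N H] state (output) up to the current word; used at the next time step
        add_affines : [N H] pre-activation values, used in the backward pass of the activation
        """
        # multiply each input by its weights and add the results
        add_affines = np.dot(x_current, self.U) + np.dot(prev_state, self.W)
        # apply the activation function (tanh)
        current_state = np.tanh(add_affines)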
        
        return current_state, add_affines

        
    def forward(self, x):
        """
        x  : shape [N T D] input data (number of samples, number of words, vector dimension)
        """
        N, T, D = x.shape
        H = self.W.shape[0]
        
        # stores the forward state of every time step
        self.prev_state = np.zeros([T, N,  H])
        # keep the pre-activation values; they are needed for the tanh backward
        self.add_affines = np.zeros(([T, N,  H]))
        
        # the state before the first word is all zeros
        state = np.zeros([N, H])
        
        # repeat for each word, carrying the state forward
        for t in range(T):
            state, self.add_affines[t] = self.step_forward(x[:,t,:], state)
            self.prev_state[t] = state
        
        self.x = x
        
        # many-to-one: only the last state is returned as the output
        return self.prev_state[-1]
    

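    def step_backward(self, t, dout=None):
        '''
        One step of backpropagation.
        Steps are processed from the most recent (last) one backwards.
        dout : gradient arriving at the state of step t
               (from the following layer at the last step, otherwise from step t+1)
        Returns dx (gradient w.r.t. the input at step t) and
        ds (gradient w.r.t. the state of the previous step).
        '''    
        # backward of tanh
        delta = (1 - np.tanh(self.add_affines[t])**2) * dout
        
        # state that entered step t; at t == 0 this is the all-zero initial state
        # (indexing prev_state[t-1] there would wrap around to the last step)
        prev_s = self.prev_state[t-1] if t > 0 else np.zeros_like(self.prev_state[0])
        
        # W / prev_s side
        self.dW += np.dot(prev_s.T, delta)
        ds  = np.dot(delta, self.W.T)
        
        # U / x side
        self.dU += np.dot(self.x[:, t,:].T, delta)
        dx = np.dot(delta, self.U.T) 
        
        return dx, ds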
    
    
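    def backward(self, dout=None):
        """
        back propagation through time
        """

        # use the input saved by forward()
        N, T, D = self.x.shape
        H = self.W.shape[0]
        
        if dout is None:
            dout = np.ones([N, H])
        
        dx = np.zeros_like(self.x)
        
        self.dW = np.zeros_like(self.W)
        self.dU = np.zeros_like(self.U)

        ds = dout
        for t in reversed(range(T)):
#         for t in reversed(range(T-5, T)):  # truncated BPTT over the last 5 steps only
            # store the input gradient of this step and pass ds further back in time
            dx[:, t, :], ds = self.step_backward(t, ds)
        
        # This layer is currently the first layer, so the caller discards dx.
        # With a truncated loop, dx stays 0 for the steps that were skipped.
        return dx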
                

In [27]:
# hidden_unit_num = 32

# test_U = np.random.randn(x_train[:10].shape[2], hidden_unit_num)
# test_W = np.random.randn(hidden_unit_num, hidden_unit_num)

# rnn_layer = Recurrent(test_W, test_U)
# rnn_layer.forward(x_train[:10]).shape

In [28]:
# dx = rnn_layer.backward(rnn_layer.forward(x_train_vector[:10]))
# dx.shape  # returned all zeros when this cell was written
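For reference, the quantities computed by `step_forward` and `step_backward` can be summarized as follows. This is a sketch of the standard BPTT equations for a bias-free tanh RNN in the many-to-one setting, written with the notebook's shapes (`x_t` is `[N, D]`, `h_t` is `[N, H]`); the symbols `a_t`, `g_t`, and `\delta_t` are introduced here for illustration, and the state before the first step is taken to be `h_{-1} = 0`.

$$
\begin{aligned}
a_t &= x_t U + h_{t-1} W, \qquad h_t = \tanh(a_t), \qquad h_{-1} = 0 \\
\delta_t &= \left(1 - \tanh^2(a_t)\right) \odot g_t, \qquad g_t = \frac{\partial L}{\partial h_t}, \qquad g_{T-1} = \text{dout} \\
\frac{\partial L}{\partial W} &= \sum_{t=0}^{T-1} h_{t-1}^{\top} \delta_t, \qquad
\frac{\partial L}{\partial U} = \sum_{t=0}^{T-1} x_t^{\top} \delta_t \\
g_{t-1} &= \delta_t W^{\top}, \qquad
\frac{\partial L}{\partial x_t} = \delta_t U^{\top}
\end{aligned}
$$

Because only the last state feeds the next layer, each `g_{t-1}` receives a contribution from step `t` alone.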

In [51]:
class Affine_mod:
    """
    Modified fully connected layer (only the bias term has been removed)
    """
    def __init__(self, W):
        self.W =W
        #self.b = b
        self.x = None
        self.dW = None
        #self.db = None

        
    def forward(self, x):
        self.x = x
        out = np.dot(self.x, self.W)

        return out

    
    def backward(self, dout):
        
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        #self.db = np.sum(dout, axis=0)

        return dx

In [63]:
# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow importing files from the parent directory
import numpy as np
from collections import OrderedDict
from common.layers import *
from common.gradient import numerical_gradient


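class MultiLayerNet:
    
    """Network made of one RNN layer followed by a fully connected layer
    Parameters
    ----------
    input_size : input size (dimension of each word vector, 128 here)
    hidden_size_list : currently unused (the hidden size is fixed inside __init__)
    output_size : output size (number of classes, 2 here)
    """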
    def __init__(self, input_size, hidden_size_list, output_size):
        self.input_size = input_size
        self.output_size = output_size
        #self.hidden_size_list = hidden_size_list

        self.params = {}

        # initialize the weights (scaled by 1/sqrt(fan_in))
        hidden_unit_num = 128
        self.params['U'] = np.random.randn(self.input_size, hidden_unit_num)/np.sqrt(self.input_size)
        self.params['W1'] = np.random.randn(hidden_unit_num, hidden_unit_num)  /np.sqrt(hidden_unit_num)
        self.params['W2'] = np.random.randn(hidden_unit_num, 2)/ np.sqrt(hidden_unit_num)

        # build the layers
        self.layers = OrderedDict()
        self.layers['RNN'] = Recurrent(self.params['W1'], self.params['U'])
        self.layers['Affine'] = Affine_mod(self.params['W2'])
        self.last_layer = SoftmaxWithLoss()

    
    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    
    def loss(self, x, t):
        y_pred = self.predict(x)
        
        return self.last_layer.forward(y_pred, t)
    

    def accuracy(self, x, t):
        pred = self.predict(x)
        pred = np.argmax(pred, axis=1)
        if t.ndim != 1 : t = np.argmax(t, axis=1)

        accuracy = np.sum(pred == t) / float(x.shape[0])
        return accuracy
    
    
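    def numerical_gradient(self, x, t):
        """Compute the gradients by numerical differentiation
        Parameters
        ----------
        x : input data
        t : target labels

        Returns
        -------
        Dictionary holding the gradient of each parameter
        """
        # the argument W is ignored: the module-level numerical_gradient
        # perturbs the parameter arrays in place before calling this function
        loss_W = lambda W: self.loss(x, t)

        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['U'] = numerical_gradient(loss_W, self.params['U'])

        return grads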
        
    
    def gradient(self, x, t):
        """Compute the gradients by backpropagation
        Parameters
        ----------
        x : input data
        t : target labels
        Returns
        -------
        Dictionary holding the gradient of each parameter
        """
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # collect the gradients
        grads = {}
        grads['W1'] = self.layers['RNN'].dW
        grads['W2'] = self.layers['Affine'].dW
        grads['U'] = self.layers['RNN'].dU

        return grads

In [53]:
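def convert_onehot(y):
    """Turn 0/1 labels into two columns: column 0 is y itself, column 1 is 1 - y
    (label 1 becomes [1, 0] and label 0 becomes [0, 1])."""
    y1 = y.copy()
    y1[y==0]=1
    y1[y==1]=0
    return np.array([y,y1]).T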
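A small example makes the column order produced by `convert_onehot` explicit. The function body is copied from the cell above; the `np.eye` variant is shown only as a common alternative in which the column index equals the label, not as what the notebook uses.

```python
import numpy as np

def convert_onehot(y):
    # same logic as the notebook cell above
    y1 = y.copy()
    y1[y == 0] = 1
    y1[y == 1] = 0
    return np.array([y, y1]).T

y = np.array([1, 0, 1])
t = convert_onehot(y)
print(t.tolist())                      # [[1, 0], [0, 1], [1, 0]]
print(np.argmax(t, axis=1).tolist())   # [0, 1, 0]  (column index is 1 - y)

# alternative: column index equals the label
t_std = np.eye(2, dtype=int)[y]
print(t_std.tolist())                  # [[0, 1], [1, 0], [0, 1]]
```

Either encoding works for training and for `accuracy`, as long as predictions and targets use the same one; only the meaning of column 0 and column 1 is swapped.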

In [65]:
from common.optimizer import Adam, SGD

t_train = convert_onehot(y_train) #onehot
t_test = convert_onehot(y_test) #onehot

network = MultiLayerNet(input_size=128, hidden_size_list=[32], output_size=2)
optimizer=Adam(lr=0.005)


train_size = x_train.shape[0]
test_size = x_test.shape[0]

batch_size = 1000

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

iteration = 100
for i in range(iteration):

    batch_mask_train = np.random.choice(train_size, batch_size)
    x_batch_train = x_train[batch_mask_train]
    t_batch_train = t_train[batch_mask_train]
    
    # compute the gradients and update the parameters
    grads = network.gradient(x_batch_train, t_batch_train)
    optimizer.update(network.params, grads)
    
    
#     # manual update (plain SGD)
#     for key in ('W1', 'W2', 'U'):
#         network.params[key] -= learning_rate * grads[key]

    
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        loss_train = network.loss(x_train, t_train)
        loss_test = network.loss(x_test, t_test)
        #train_acc_list.append(train_acc)
        #test_acc_list.append(test_acc)
        print("train_loss：",loss_train,"train_acc：",train_acc,"test_loss：",loss_test,"test_acc：",test_acc)

train_loss： 0.8139920403510761 train_acc： 0.5065 test_loss： 0.8168852435195034 test_acc： 0.5046
train_loss： 0.6722510805251091 train_acc： 0.5863 test_loss： 0.6792146453038588 test_acc： 0.5618
train_loss： 0.6780843111785009 train_acc： 0.5699 test_loss： 0.6869311588152331 test_acc： 0.5391
train_loss： 0.6713839766901157 train_acc： 0.5795 test_loss： 0.6909670273799128 test_acc： 0.5382
train_loss： 0.6506039664298588 train_acc： 0.6116 test_loss： 0.6799258460800051 test_acc： 0.5659
train_loss： 0.6455638532339809 train_acc： 0.6225 test_loss： 0.6772402524121597 test_acc： 0.5735
train_loss： 0.6497818934386594 train_acc： 0.6159 test_loss： 0.6910614726001247 test_acc： 0.5565
train_loss： 0.6366044452765692 train_acc： 0.633 test_loss： 0.6775841554086771 test_acc： 0.5758
train_loss： 0.6467760469764591 train_acc： 0.6164 test_loss： 0.6826727444833941 test_acc： 0.569
train_loss： 0.6188319703277056 train_acc： 0.6543 test_loss： 0.6608198261412277 test_acc： 0.6132
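In the loop above, `np.random.choice(train_size, batch_size)` samples indices with replacement, so the 10 iterations counted as one "epoch" do not necessarily visit every sample exactly once. If a strict epoch is wanted, one option is to shuffle once per epoch and slice. The generator `iterate_minibatches` below is a hypothetical helper sketched for illustration.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that cover every sample exactly once per epoch."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(n_samples=10, batch_size=4, rng=rng))
print([len(b) for b in batches])  # [4, 4, 2]
```

Each yielded array can be used the same way as `batch_mask_train`, for example `x_train[idx]` and `t_train[idx]`.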


## Gradient check

In [66]:
def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)
    
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x) # f(x+h)
        
        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val # restore the original value
        it.iternext()   
        
    return grad

In [70]:
network = MultiLayerNet(input_size=128, hidden_size_list=[32], output_size=2)

x_batch = x_train[:3]
t_batch = t_train[:3]

# compare the numerical gradient with the backprop gradient on a tiny batch;
# no optimizer step is needed for the check itself
grad_numerical = network.numerical_gradient(x_batch, t_batch)
grad_backprop = network.gradient(x_batch, t_batch)

for key in grad_numerical.keys():
    diff = np.average( np.abs(grad_backprop[key] - grad_numerical[key]) )
    print(key + ":" + str(diff))

W1:0.010221332142711263
W2:4.0218106116323866e-09
U:0.002551621590229474
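A gradient check is easier to reason about on a tiny, self-contained problem. The sketch below is not the notebook's network: it is a minimal bias-free tanh RNN with a toy loss (the sum of the last state), float64 inputs, and hypothetical helper names. It compares full BPTT, using the zero initial state at the first step, against central differences.

```python
import numpy as np

def forward(x, U, W):
    N, T, _ = x.shape
    H = W.shape[0]
    hs = np.zeros((T + 1, N, H))             # hs[0] is the zero initial state
    for t in range(T):
        hs[t + 1] = np.tanh(x[:, t] @ U + hs[t] @ W)
    return hs

def loss(x, U, W):
    # toy scalar loss: sum of the last hidden state
    return forward(x, U, W)[-1].sum()

def bptt(x, U, W):
    hs = forward(x, U, W)
    T = x.shape[1]
    dU, dW = np.zeros_like(U), np.zeros_like(W)
    dh = np.ones_like(hs[-1])                # d(loss)/d(last state)
    for t in reversed(range(T)):
        delta = (1 - hs[t + 1] ** 2) * dh    # tanh backward
        dU += x[:, t].T @ delta
        dW += hs[t].T @ delta                # hs[0] (zeros) is used at t == 0
        dh = delta @ W.T
    return dU, dW

def numerical_grad(f, p, h=1e-5):
    # central differences, perturbing p in place
    g = np.zeros_like(p)
    for idx in np.ndindex(*p.shape):
        old = p[idx]
        p[idx] = old + h
        fp = f()
        p[idx] = old - h
        fm = f()
        p[idx] = old
        g[idx] = (fp - fm) / (2 * h)
    return g

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 6, 4))
U = rng.normal(size=(4, 5)) * 0.5
W = rng.normal(size=(5, 5)) * 0.5
dU, dW = bptt(x, U, W)
num_dU = numerical_grad(lambda: loss(x, U, W), U)
num_dW = numerical_grad(lambda: loss(x, U, W), W)
print(np.max(np.abs(dU - num_dU)), np.max(np.abs(dW - num_dW)))
```

When the backward pass matches the forward pass, both differences should be many orders of magnitude smaller than the gradients themselves, so a large gap for one parameter is a hint about where to look.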


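### Running BPTT over all time steps

We backpropagated all the way to the first time step (word) to check whether exploding or vanishing gradients would occur, but saw no change in the loss or the accuracy.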
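One way to look at vanishing or exploding gradients directly, rather than through the loss, is to record the norm of the state gradient at every step of the backward pass. The function `state_grad_norms` below is a hypothetical helper that follows the same recursion as `step_backward`; the recurrent matrix `0.5 * I` is chosen only so that the shrinking effect is guaranteed in this toy example.

```python
import numpy as np

def state_grad_norms(add_affines, W, dout):
    """Norm of the gradient w.r.t. the hidden state at each step, last step first.

    add_affines : [T, N, H] pre-activation values saved during forward
    W           : [H, H] recurrent weights
    dout        : [N, H] gradient arriving at the last state
    """
    norms = []
    ds = dout
    for t in reversed(range(add_affines.shape[0])):
        norms.append(np.linalg.norm(ds))
        delta = (1 - np.tanh(add_affines[t]) ** 2) * ds   # tanh backward
        ds = delta @ W.T                                  # pass back to step t-1
    return norms

rng = np.random.default_rng(0)
T, N, H = 40, 8, 16
a = rng.normal(size=(T, N, H))
norms = state_grad_norms(a, 0.5 * np.eye(H), np.ones((N, H)))
print(norms[0], norms[-1])   # the norm shrinks as we go back in time
```

Applied to the saved `add_affines` and `W` of the scratch layer, the same loop would show how much of the gradient from the last word still reaches the first words.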

In [68]:
from common.optimizer import Adam, SGD

t_train = convert_onehot(y_train) #onehot
t_test = convert_onehot(y_test) #onehot

network = MultiLayerNet(input_size=128, hidden_size_list=[32], output_size=2)
optimizer=Adam(lr=0.002)

iters_num = 100
train_size = x_train.shape[0]
test_size = x_test.shape[0]

batch_size = 1000

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):

    batch_mask_train = np.random.choice(train_size, batch_size)
    x_batch_train = x_train[batch_mask_train]
    t_batch_train = t_train[batch_mask_train]
    
    # compute the gradients and update the parameters
    grads = network.gradient(x_batch_train, t_batch_train)
    optimizer.update(network.params, grads)

    
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        loss_train = network.loss(x_train, t_train)
        loss_test = network.loss(x_test, t_test)
        print("train_loss：",loss_train,"train_acc：",train_acc,"test_loss：",loss_test,"test_acc：",test_acc)

train_loss： 0.6984967825351835 train_acc： 0.505 test_loss： 0.6989442399108108 test_acc： 0.4971
train_loss： 0.6766165253785991 train_acc： 0.5699 test_loss： 0.6790924066496276 test_acc： 0.5703
train_loss： 0.6832224798699371 train_acc： 0.5748 test_loss： 0.6895291441466357 test_acc： 0.5767
train_loss： 0.6733013636880936 train_acc： 0.5845 test_loss： 0.6883546607438253 test_acc： 0.5347
train_loss： 0.6479155156152586 train_acc： 0.6278 test_loss： 0.6673016237179252 test_acc： 0.5832
train_loss： 0.644868876445985 train_acc： 0.6261 test_loss： 0.6716900961771798 test_acc： 0.5792
train_loss： 0.6279058093227331 train_acc： 0.647 test_loss： 0.6597144767566026 test_acc： 0.6033
train_loss： 0.6238428352705152 train_acc： 0.6498 test_loss： 0.6665348683067512 test_acc： 0.5934
train_loss： 0.6183937558690431 train_acc： 0.6587 test_loss： 0.6616180374014161 test_acc： 0.6136
train_loss： 0.6213556691457708 train_acc： 0.6533 test_loss： 0.6650325964289392 test_acc： 0.5974
