## 第8章: ニューラルネット  
第6章で取り組んだニュース記事のカテゴリ分類を題材として，ニューラルネットワークでカテゴリ分類モデルを実装する．なお，この章ではPyTorch, TensorFlow, Chainerなどの機械学習プラットフォームを活用せよ．

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from functools import reduce

In [3]:
# 2. 事例の抽出
news_corpora = pd.read_csv('./Section_6/NewsAggregatorDataset/newsCorpora.csv',sep='\t',header=None)
news_corpora.columns = ['ID','TITLE','URL','PUBLISHER','CATEGORY','STORY','HOSTNAME','TIMESTAMP']
publisher = ['Reuters', 'Huffington Post', 'Businessweek', 'Contactmusic.com', 'Daily Mail']
ls_is_specified = [news_corpora.PUBLISHER == p for p in publisher]
is_specified =reduce(lambda a, b: a | b, ls_is_specified)
df = news_corpora[is_specified]
#  3. 並び替え
df = df.sample(frac=1) # 全てをサンプリングするので、並び替えと等価
# 4.保存
train_df, valid_test_df = train_test_split(df, test_size=0.2) # 8:2
valid_df, test_df = train_test_split(valid_test_df, test_size=0.5) # 8:1:1
# train_df.to_csv('./Section_8/train.txt', columns = ['CATEGORY','TITLE'], sep='\t',header=False, index=False)
# valid_df.to_csv('./Section_8/valid.txt', columns = ['CATEGORY','TITLE'], sep='\t',header=False, index=False)
# test_df.to_csv('./Section_8/test.txt', columns = ['CATEGORY','TITLE'], sep='\t',header=False, index=False)
#  事例数の確認
df['CATEGORY'].value_counts()

b    5627
e    5279
t    1524
m     910
Name: CATEGORY, dtype: int64

70. 単語ベクトルの和による特徴量  
問題50で構築した学習データ，検証データ，評価データを行列・ベクトルに変換したい．例えば，学習データについて，すべての事例xiの特徴ベクトルxiを並べた行列Xと，正解ラベルを並べた行列（ベクトル）Yを作成したい．  

In [4]:
import gensim
import numpy as np

In [5]:
train=train_df.loc[:,['CATEGORY','TITLE']].reset_index()
valid=valid_df.loc[:,['CATEGORY','TITLE']].reset_index()
test=test_df.loc[:,['CATEGORY','TITLE']].reset_index()

model = gensim.models.KeyedVectors.load_word2vec_format('./Section_7/GoogleNews-vectors-negative300.bin', binary=True)

d = {'b':0, 't':1, 'e':2, 'm':3}
y_train = train.loc[:,"CATEGORY"].replace(d)
y_train.to_csv('./Section_8/y_train.txt',header=False, index=False)
y_valid = valid.loc[:,"CATEGORY"].replace(d)
y_valid.to_csv('./Section_8/y_valid.txt',header=False, index=False)
y_test = test.loc[:,"CATEGORY"].replace(d)
y_test.to_csv('./Section_8/y_test.txt',header=False, index=False)

In [6]:
num_class = len(d)

In [7]:
y_train.head()

0    1
1    2
2    2
3    1
4    2
Name: CATEGORY, dtype: int64

In [8]:
def write_X(file_name, df):
    with open(file_name,'w') as f:
        for text in df.loc[:,"TITLE"]:
            vectors = []
            for word in text.split():
                if word in model.vocab:
                    vectors.append(model[word])
            if (len(vectors)==0):
                vector = np.zeros(300)
            else:
                vectors = np.array(vectors)
                vector = vectors.mean(axis=0)
            vector = vector.astype(np.str).tolist()
            output = ' '.join(vector)+'\n'
            f.write(output)
write_X('./Section_8/X_train.txt', train)
write_X('./Section_8/X_valid.txt', valid)
write_X('./Section_8/X_test.txt', test)

71. 単層ニューラルネットワークによる予測  
問題70で保存した行列を読み込み，学習データについて以下の計算を実行せよ．

In [9]:
import tensorflow as tf

In [10]:
X_train = np.loadtxt("./Section_8/X_train.txt", delimiter=" ",dtype=np.float32)
X_train_tensor = tf.data.Dataset.from_tensor_slices(X_train)

X_valid = np.loadtxt("./Section_8/X_valid.txt", delimiter=" ",dtype=np.float32)
X_valid_tensor = tf.data.Dataset.from_tensor_slices(X_valid)

In [11]:
X_train.shape

(10672, 300)

In [12]:
len(list(X_train_tensor))

10672

In [13]:
batch_size = 4
w_shape=[X_train.shape[1],batch_size]

#ReLUではなくsoftmaxだが思い出すためにHeの初期値を利用
he_init = tf.cast(tf.sqrt(2./(w_shape[0]*w_shape[1])), dtype=tf.float32)
W = tf.Variable(tf.random.truncated_normal(w_shape, stddev=he_init), dtype=tf.float32)
b = tf.Variable(tf.zeros([batch_size]), dtype=tf.float32)

@tf.function
def softmax(X):
    return tf.nn.softmax(tf.matmul(X,W) + b)


In [14]:
# for X in X_train_tensor.batch(1).take(1):
#     tf.print(softmax(X))
# for X in X_train_tensor.batch(4).take(1):
#     tf.print(softmax(X))

72. 損失と勾配の計算  
学習データの事例x1と事例集合x1,x2,x3,x4に対して，クロスエントロピー損失と，行列Wに対する勾配を計算せよ．なお，ある事例xiに対して損失は次式で計算される．  
li=−log[事例xiがyiに分類される確率]  
ただし，事例集合に対するクロスエントロピー損失は，その集合に含まれる各事例の損失の平均とする．

In [15]:
y_train = np.loadtxt("./Section_8/y_train.txt",dtype=int)
y_train_onehot = np.identity(num_class)[y_train]
y_train_tensor = tf.data.Dataset.from_tensor_slices(y_train_onehot)

y_valid = np.loadtxt("./Section_8/y_valid.txt",dtype=int)
y_valid_onehot = np.identity(num_class)[y_valid]
y_valid_tensor = tf.data.Dataset.from_tensor_slices(y_valid_onehot)

In [16]:
y_train.shape

(10672,)

In [17]:
y_train_onehot.shape

(10672, 4)

In [18]:
y_train_tensor

<TensorSliceDataset shapes: (4,), types: tf.float64>

In [19]:
len(list(y_train_tensor))

10672

In [20]:
@tf.function
def crossentropyLoss(X,y):
    return tf.nn.softmax_cross_entropy_with_logits(logits=tf.matmul(X,W)+b, labels=y, name=None)
#     >>print (crossentropyLoss(X_train[:1],y_train_onehot[:1]))

#     return tf.nn.sparse_softmax_cross_entropy_with_logits(logits=tf.matmul(X,W)+b, labels=y, name=None)
#     >> print (crossentropyLoss(X_train[:1],y_train[:1]))

In [21]:
print (crossentropyLoss(X_train[:1],y_train_onehot[:1]))
print (crossentropyLoss(X_train[:4],y_train_onehot[:4]))

tf.Tensor([1.3660189], shape=(1,), dtype=float32)
tf.Tensor([1.366019  1.3712292 1.3590018 1.3420788], shape=(4,), dtype=float32)


In [22]:
# 確認
ans=[]
for s,i in zip(softmax(X_train[:4]),y_train[:4]):
  ans.append(-np.log(s[i]))
print (ans)

[1.3660189, 1.3712293, 1.3590018, 1.3420787]


73. 確率的勾配降下法による学習  
確率的勾配降下法（SGD: Stochastic Gradient Descent）を用いて，行列Wを学習せよ．なお，学習は適当な基準で終了させればよい（例えば「100エポックで終了」など）

In [23]:
# https://note.nkmk.me/python-tensorflow-keras-basics/

In [24]:
# Layer を継承するパターン
# class LogisticRegression(tf.keras.layers.Layer):
#     def __init__(self):
#         super(LogisticRegression, self).__init__()
#     def build(self, list_size):#[input_size, layer1_size, layer2_size(output_size)]
#         self.list_layer=[]
#         # prev_size = list_size[0]
#         for next_size in list_size[1:]:
#             self.list_layer.append(tf.keras.layers.Dense(next_size))
            
# #         # layer，parameterを定義
# #         self.list_W = []
# #         self.list_b = []
# #         prev_size = list_size[0]
# #         for next_size in list_size[1:]:
# #             he_init = tf.cast(tf.sqrt(2./(prev_size*next_size)), dtype=tf.float32)
# #             W = tf.Variable(tf.random.truncated_normal([prev_size,next_size], stddev=he_init), dtype=tf.float32)
# #             b = tf.Variable(tf.zeros([next_size]), dtype=tf.float32)
# #             self.list_W.append(W)
# #             self.list_b.append(b)

#     def __call__(self, X):
#         # layerを接続
#         next_input = X
#         for layer in self.list_layer[:-1]:
#             next_input = tf.nn.relu(
#                 layer(next_input)
#             )
#         next_input = self.list_layer[:-1](next_input)
#         return self.list_layer[:-1](next_input)

In [25]:
class LogisticRegression(tf.keras.Model):
    def __init__(self, list_size):
        super(LogisticRegression, self).__init__()
        self.list_layer=[]
        # prev_size = list_size[0]
        for next_size in list_size[1:]:
            self.list_layer.append(tf.keras.layers.Dense(next_size))

    def __call__(self, X, training=False):# trainingはdroppout用
        # layerを接続
        next_input = X
        for layer in self.list_layer[:-1]:
            next_input = tf.nn.relu(
                layer(next_input)
            )
        return self.list_layer[-1](next_input)

In [45]:
class ModelInterface(object):
    def __init__(self,model,optimizer,loss_func,acc_metric,acc_func):
        self.model = model
        self.optimizer = optimizer
        self.loss_func = loss_func
        self.acc_metric = acc_metric
        self.acc_func = acc_func
#         self.model.compile() #必要ならコンパイルしとく
#         model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

        self.train_loss_list = []
        self.train_acc_list = []
        self.valid_loss_list = []
        self.valid_acc_list = []

    def train(self, epoch, batch_size, interval=2):
        for _ in range(epoch):
            for X, y in zip(
                X_train_tensor.batch(batch_size),
                y_train_tensor.batch(batch_size)
            ):
                loss=self.train_step(X, y)
            if _ % interval==0:
                print(self.model.list_layer[0].weights[0])
                self.train_acc_list.append(self.acc_metric.result())
                self.acc_metric_result.reset_states()

                pred = self.model(X_valid_tensor.batch(100).take(1), training=False)
                self.valid_loss_list.append(
                    self.loss_func(
                        y_valid_tensor.batch(100).take(1),
                        pred
                    )
                )
                self.acc_metric.apdate_state(y_valid_tensor.batch(100).take(1), pred)
                self.valid_acc_list.append(self.acc_metric.result())
                self.acc_metric_result.reset_states()
                    
        
    @tf.function
    def train_step(self, X, y):
        with tf.GradientTape() as tape:
            pred = self.model(X, training=True)
            loss = self.loss_func(y, pred)
        grd = tape.gradient(loss, self.model.trainable_weights)
        self.optimizer.apply_gradients(zip(grd,self.model.trainable_weights))
        self.acc_metric.update_state(y, pred)# グラフに追加
        return loss

    @tf.function
    def model_predict(self,X):
        return self.model.predict(X)
                
    def calc_accuracy(self, y, pred, from_logits=False):
        return self.acc_func(y, pred, from_logits)


In [46]:
list_size = [300, 4]
model_interface = ModelInterface(
    model = LogisticRegression(list_size),
    optimizer = tf.keras.optimizers.Adadelta(),
    loss_func = tf.keras.losses.categorical_crossentropy,
    acc_metric = tf.keras.metrics.CategoricalCrossentropy(from_logits=False),
    acc_func = tf.keras.losses.categorical_crossentropy
)

In [47]:
EPOCHS = 5
BATCH_SIZE = 4
model_interface.train(EPOCHS, BATCH_SIZE, interval=2)

<tf.Variable 'dense_5/kernel:0' shape=(300, 4) dtype=float32, numpy=
array([[-0.05689063,  0.0796574 , -0.11627828,  0.04311698],
       [-0.00302254,  0.10474101,  0.05542445, -0.0359542 ],
       [-0.05717737,  0.06750635,  0.00318012,  0.0460308 ],
       ...,
       [-0.08387233,  0.00767977, -0.09202517, -0.07229059],
       [-0.09180865, -0.00207216,  0.08086109,  0.06985685],
       [ 0.03778009,  0.07263648,  0.12457413,  0.12222157]],
      dtype=float32)>


AttributeError: 'ModelInterface' object has no attribute 'acc_metric_result'

74. 正解率の計測  
問題73で求めた行列を用いて学習データおよび評価データの事例を分類したとき，その正解率をそれぞれ求めよ．

In [30]:
# 正解率はmodel.evaluate を行うために必要なcompileの作法を後から知ったので飛ばす

In [None]:
model_interface.valid_loss_list

In [34]:
model_interface.valid_acc_list

[]

75. 損失と正解率のプロット  
問題73のコードを改変し，各エポックのパラメータ更新が完了するたびに，訓練データでの損失，正解率，検証データでの損失，正解率をグラフにプロットし，学習の進捗状況を確認できるようにせよ．

In [None]:
# Tensorboardで確認できる…はず

76. チェックポイント  
問題75のコードを改変し，各エポックのパラメータ更新が完了するたびに，チェックポイント（学習途中のパラメータ（重み行列など）の値や最適化アルゴリズムの内部状態）をファイルに書き出せ．

77. ミニバッチ化  
問題76のコードを改変し，B事例ごとに損失・勾配を計算し，行列Wの値を更新せよ（ミニバッチ化）．Bの値を1,2,4,8,…と変化させながら，1エポックの学習に要する時間を比較せよ．

78. GPU上での学習  
問題77のコードを改変し，GPU上で学習を実行せよ．

79. 多層ニューラルネットワーク  
問題78のコードを改変し，バイアス項の導入や多層化など，ニューラルネットワークの形状を変更しながら，高性能なカテゴリ分類器を構築せよ．