# Seq2Seq
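Implement a basic Seq2Seq model: given an English word as input, the model outputs the same word with its letters sorted alphabetically.
<br>Input : hello
<br>Output : ehllo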
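The ground truth for this task is just the input's letters in alphabetical order. Here is a minimal sketch of how one target could be derived from a source word. The helper name `make_target` is hypothetical; the notebook itself reads prepared pairs from `data/letters_source.txt` and `data/letters_target.txt`.

```python
def make_target(word: str) -> str:
    """Return the letters of `word` sorted alphabetically (hypothetical helper)."""
    return "".join(sorted(word))


print(make_target("hello"))   # ehllo
print(make_target("common"))  # cmmnoo
```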

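During training, teacher forcing feeds the ground-truth value from the target as the input at time t. In other words, the input at time t is the ground truth "with probability 1".<br>
During inference, the output at time t-1 (its highest-scoring token) is passed through the embedding layer and used as the input at time t. In other words, the input at time t is the embedded output of time t-1 "with probability 1".<br>
Using teacher forcing during training is therefore a special case of Scheduled Sampling.<br>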
<br>
This notebook has the same structure as "3_scheduled_sampling_不使用指令_version_2.ipynb". The only difference is how the coin tossing is set.<br>
During training, coin tossing is [True , True , True , True , True , True , True , True]<br>
During inference, coin tossing is [False , False , False , False , False , False , False , False]<br>
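The three regimes differ only in the probability of drawing True at each decoder step. The sketch below is a minimal pure-Python version. The function `coin_tossing` and its `p_ground_truth` parameter are illustrative names, not part of the notebook, which uses `np.random.choice` with `p = [0 , 1]` (training) or `p = [1 , 0]` (inference). The decoder length of 8 is assumed from the lists above.

```python
import random


def coin_tossing(length: int, p_ground_truth: float, rng=random) -> list:
    """One boolean per decoder step: True = feed the ground truth from the target,
    False = feed the model's own prediction from the previous step."""
    # rng.random() lies in [0, 1), so p=1.0 always yields True and p=0.0 always False
    return [rng.random() < p_ground_truth for _ in range(length)]


target_max_length = 8  # assumption: 8 decoder steps, matching the lists above

train_mask = coin_tossing(target_max_length, 1.0)   # teacher forcing
infer_mask = coin_tossing(target_max_length, 0.0)   # free-running inference
mixed_mask = coin_tossing(target_max_length, 0.75)  # scheduled sampling lies in between
print(train_mask)
print(infer_mask)
```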

<img src="4_teacher_forcing_不使用指令.jpg" style="width:1240px;height:600px;float:middle">
The figure above shows the computation flow on the Decoder side.

In [1]:
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import time
import copy
import tensorflow as tf
from tensorflow.python.layers.core import Dense

# Read the data

In [2]:
with open('data/letters_source.txt', 'r', encoding = 'utf-8') as f:
    source_data = f.read()

with open('data/letters_target.txt', 'r', encoding = 'utf-8') as f:
    target_data = f.read()

# Data preprocessing

In [3]:
def extract_character_vocab(data):
    
    special_words = ['<PAD>' , '<UNK>' , '<GO>' , '<EOS>']
    
    words = []
    for line in data.split('\n'):
        for character in line:
            if character not in words:
                words.append(character)

    # Put the four special tokens first (indices 0-3), followed by the letters
    int_to_vocab = {idx: word for idx , word in enumerate(special_words + words)}
    vocab_to_int = dict(zip(int_to_vocab.values() , int_to_vocab.keys()))

    return int_to_vocab, vocab_to_int
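A compact equivalent of `extract_character_vocab`, run on a toy string, makes the index layout concrete: the four special tokens always occupy indices 0 to 3, and letters follow in first-seen order. This sketch (`build_vocab` is a hypothetical name) uses `dict.fromkeys` for order-preserving de-duplication instead of the linear `not in words` check on a list. Because first-seen order depends on the data, the source and target vocabularies generally assign different indices to the same letter, which is why the notebook keeps two separate mappings.

```python
SPECIAL_WORDS = ['<PAD>', '<UNK>', '<GO>', '<EOS>']


def build_vocab(data: str):
    # dict.fromkeys keeps the first occurrence of each character, in order
    letters = list(dict.fromkeys(ch for line in data.split('\n') for ch in line))
    int_to_vocab = dict(enumerate(SPECIAL_WORDS + letters))
    vocab_to_int = {word: idx for idx, word in int_to_vocab.items()}
    return int_to_vocab, vocab_to_int


int_to_vocab, vocab_to_int = build_vocab("bca\nab")
print(int_to_vocab)
# {0: '<PAD>', 1: '<UNK>', 2: '<GO>', 3: '<EOS>', 4: 'b', 5: 'c', 6: 'a'}
```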

In [4]:
source_int_to_letter , source_letter_to_int = extract_character_vocab(source_data)
target_int_to_letter , target_letter_to_int = extract_character_vocab(target_data)

# Convert every letter to its index
source_int = []
for line in source_data.split('\n'):
    temp = []
    for letter in line:
        temp.append(source_letter_to_int[letter])
    source_int.append(temp)    
        
target_int = []
for line in target_data.split('\n'):
    temp = []
    for letter in line:
        temp.append(target_letter_to_int[letter])
    temp = temp + [target_letter_to_int['<EOS>']]
    target_int.append(temp)   
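Only target sequences receive a trailing `<EOS>`. It tells the decoder where to stop, and the encoder needs no such marker. The sketch below uses a hypothetical `encode` helper and a made-up toy mapping (not the notebook's real indices).

```python
def encode(word, letter_to_int, eos_id=None):
    """Map each letter to its index; optionally append <EOS> (targets only)."""
    ids = [letter_to_int[letter] for letter in word]
    if eos_id is not None:
        ids.append(eos_id)
    return ids


# Toy mapping, assumed for illustration (indices 0-3 are the special tokens)
letter_to_int = {'<PAD>': 0, '<UNK>': 1, '<GO>': 2, '<EOS>': 3,
                 'e': 4, 'h': 5, 'l': 6, 'o': 7}

source_ids = encode('hello', letter_to_int)
target_ids = encode('ehllo', letter_to_int, eos_id=letter_to_int['<EOS>'])
print(source_ids)  # [5, 4, 6, 6, 7]
print(target_ids)  # [4, 5, 6, 6, 7, 3]
```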

In [5]:
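# Find the maximum lengths in source_int and target_int.
# The decoder RNN below is unrolled by hand instead of using tf.nn.dynamic_rnn, so it cannot handle dynamic lengths; every sequence must therefore be padded to a fixed length here.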
source_max_length , target_max_length = 0 , 0  
for vob_source , vob_target in zip(source_int , target_int):
    if len(vob_source) > source_max_length:
        source_max_length = len(vob_source)    
    if len(vob_target) > target_max_length:
        target_max_length = len(vob_target)  

# Pad source_int and target_int to their maximum lengths with source_letter_to_int['<PAD>'] and target_letter_to_int['<PAD>'] respectively
source_int_pad , target_int_pad = [] , []
for i_source , j_target in zip(range(len(source_int)) , range(len(target_int))):
    temp_source = source_int[i_source].copy()
    while len(temp_source) < source_max_length:
        temp_source.append(source_letter_to_int['<PAD>']) 
    source_int_pad.append(temp_source)
    
    temp_target = target_int[j_target].copy()
    while len(temp_target) < target_max_length:
        temp_target.append(target_letter_to_int['<PAD>']) 
    target_int_pad.append(temp_target)       

source_int_pad = np.array(source_int_pad)
target_int_pad = np.array(target_int_pad)     
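The loops above compute a maximum length and then right-pad with `<PAD>`. In the training log, source rows have 7 entries and target rows have 8 (up to 7 letters plus `<EOS>`). The same idea fits in a few lines of pure Python (`pad_to_max` is a hypothetical name). Padding never truncates, so the longest sequence comes back unchanged.

```python
def pad_to_max(sequences, pad_id):
    """Right-pad every sequence with pad_id to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences], max_len


padded, max_len = pad_to_max([[5, 4, 6], [7], [4, 5, 6, 6, 7, 3]], pad_id=0)
print(max_len)  # 6
print(padded)   # [[5, 4, 6, 0, 0, 0], [7, 0, 0, 0, 0, 0], [4, 5, 6, 6, 7, 3]]
```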

In [6]:
# Hyperparameters
# Number of Epochs
epochs = 100
# Batch Size
batch_size = 128
# RNN Size
rnn_hidden_unit = 50
# Number of Layers
num_layers = 1
# Embedding Size
encoding_embedding_size = 15
decoding_embedding_size = rnn_hidden_unit
# Learning Rate
learning_rate = 0.001
source_vocab_size = len(source_int_to_letter)
target_vocab_size = len(target_int_to_letter)

# Build Model

## Input layer

In [7]:
input_data = tf.placeholder(tf.int32, [None , source_max_length] , name = 'inputs')
targets = tf.placeholder(tf.int32, [None , target_max_length] , name = 'targets')
targets_onehot = tf.one_hot(tf.reshape(targets , [-1]) , depth = target_vocab_size)
lr = tf.placeholder(tf.float32 , name = 'learning_rate')

# Decides whether "the output at step t-1" or "the ground truth in the target" is used as the input at step t (one boolean per decoder step)
from_model_or_target = tf.placeholder(tf.bool , [target_max_length , ] , name = 'from_model_or_target')

## Encoder

The source data is embedded first and then passed to the Encoder's RNN.
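An embedding is a table lookup. Each token index selects one row of a `[vocab_size, embedding_size]` matrix, so a `[batch, time]` index tensor becomes `[batch, time, embedding_size]`. Here, `encoder_embed_input` therefore has shape `[batch, source_max_length, encoding_embedding_size]`. The following pure-Python sketch shows that gather with toy values, not trained weights.

```python
def embedding_lookup(embeddings, ids):
    """Gather one row of `embeddings` per id, like tf.nn.embedding_lookup on a 1-D id list."""
    return [embeddings[i] for i in ids]


# Toy 5 x 2 embedding matrix (vocab_size=5, embedding_size=2), values assumed
embeddings = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
batch_ids = [[4, 2, 0], [1, 1, 3]]  # shape [batch=2, time=3]

embedded = [embedding_lookup(embeddings, row) for row in batch_ids]  # [2, 3, 2]
print(embedded[0])  # [[4.0, 4.1], [2.0, 2.1], [0.0, 0.1]]
```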

In [8]:
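# input_data: input tensor
# rnn_hidden_unit: number of hidden units in the RNN
# num_layers: number of stacked RNN cells
# source_max_length: (fixed) sequence length of the source data
# source_vocab_size: vocabulary size of the source data
# encoding_embedding_size: dimension of the embedding vectors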

# Encoder embedding
'''
encoder_embed_input = tf.contrib.layers.embed_sequence(input_data , source_vocab_size , encoding_embedding_size) 
                                                  ⇕ roughly equivalent to (apart from the weight initializer)
encoder_embeddings = tf.Variable(tf.random_uniform([source_vocab_size , encoding_embedding_size]))
encoder_embed_input = tf.nn.embedding_lookup(encoder_embeddings , input_data)

If you would rather not write two lines, tf.contrib.layers.embed_sequence does the same in one call.
Reference: https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence
'''
encoder_embeddings = tf.Variable(tf.random_uniform([source_vocab_size , encoding_embedding_size]))
encoder_embed_input = tf.nn.embedding_lookup(encoder_embeddings , input_data)


def get_lstm_cell(rnn_hidden_unit):
    lstm_cell = tf.contrib.rnn.LSTMCell(rnn_hidden_unit, 
                                        initializer = tf.random_uniform_initializer(-0.1 , 0.1))
    return lstm_cell

with tf.variable_scope('encoder'):   
    encoder_cell = tf.contrib.rnn.MultiRNNCell([get_lstm_cell(rnn_hidden_unit) for _ in range(num_layers)])
    
    encoder_output, encoder_state = tf.nn.dynamic_rnn(encoder_cell , 
                                                      encoder_embed_input, 
                                                      dtype = tf.float32)

## Decoder

In [9]:
# Preprocessed decoder input:
# prepend <GO> to every sequence in the batch and drop its last token, so every sequence keeps the same length

# Drop the last token
# ending = tf.strided_slice(targets , [0, 0] , [batch_size, -1] , [1, 1]) # equivalent to ending = tf.identity(targets[: , 0:-1])
ending = tf.identity(targets[: , 0:-1])
decoder_input = tf.concat([tf.fill([batch_size, 1] , target_letter_to_int['<GO>']) , ending] , axis = 1)
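At step t, the decoder receives the target token from step t-1, so its input sequence is the target shifted right by one position. The sketch below does this on plain lists. `make_decoder_input` is a hypothetical name, and the special-token indices are assumed to match `extract_character_vocab`. In the notebook, the same shift uses `tf.concat` and `tf.fill`. Because the tensor version fills a fixed `batch_size` rows, every batch fed to the graph must contain exactly `batch_size` rows. This is why the validation set and the test input are both sized to one full batch.

```python
GO_ID = 2  # assumed: same special-token index as extract_character_vocab


def make_decoder_input(target_batch, go_id=GO_ID):
    """Prepend <GO> and drop the last token, keeping every row's length."""
    return [[go_id] + row[:-1] for row in target_batch]


targets = [[4, 5, 6, 3, 0], [7, 3, 0, 0, 0]]  # toy ids: 3 = <EOS>, 0 = <PAD>
decoder_input = make_decoder_input(targets)
print(decoder_input)  # [[2, 4, 5, 6, 3], [2, 7, 3, 0, 0]]
```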

In [10]:
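# decoding_embedding_size: dimension of the embedding vectors
# num_layers: number of stacked RNN cells
# rnn_hidden_unit: number of hidden units in each RNN cell
# encoder_state: state vector produced by the encoder
# decoder_input: input on the decoder side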

# 1. Embedding
decoder_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
decoder_embed_input = tf.nn.embedding_lookup(decoder_embeddings , decoder_input)

with tf.variable_scope('decoder'):
    # 2. Build the RNN cell(s) of the Decoder
    decoder_cell = tf.contrib.rnn.MultiRNNCell([get_lstm_cell(rnn_hidden_unit) for _ in range(num_layers)])
    state = encoder_state 
    outputs  = []
    for time_step in range(0 , target_max_length):    
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        weights = tf.get_variable(initializer = tf.truncated_normal([rnn_hidden_unit , target_vocab_size] , mean = 0.01 , stddev = 0.1) , name = 'weights_decoder')
        biases = tf.get_variable(initializer = tf.zeros([1 , target_vocab_size]) + 0.0001 , name = 'biases_decoder')

        if time_step == 0: 
            input_to_decoder = decoder_embed_input[: , 0 , :]
            mlstm_cell_output , state = decoder_cell(input_to_decoder , state)
            output = tf.matmul(mlstm_cell_output , weights) + biases
            outputs.append(output)
            
        # from_target()
        # During training, teacher forcing is used, so coin tossing is [True , True , True , True , True , True , True , True]
        # The input at step t is always the expected output of step t-1, i.e. decoder_embed_input[: , time_step , :] (the ground truth from the target) is fed to the current decoder_cell
        # Hence every mlstm_cell_output comes from from_target()

        # from_model()
        # During inference, coin tossing becomes [False , False , False , False , False , False , False , False]
        # The output of step t-1 (the previous decoder_cell output) is used as the input at step t:
        # a greedy argmax picks the highest-scoring token, embedding_lookup fetches its embedding vector, and that vector is fed to the current decoder_cell
        elif time_step > 0:
            def from_target():
                input_to_decoder = decoder_embed_input[: , time_step , :]
                h_output , h_state = decoder_cell(input_to_decoder , state)
                return h_output , h_state

            def from_model():
                input_to_decoder = tf.argmax(output , axis = 1)
                input_to_decoder = tf.nn.embedding_lookup(decoder_embeddings , input_to_decoder)
                h_output , h_state = decoder_cell(input_to_decoder , state)
                return h_output , h_state

            mlstm_cell_output , state = tf.cond(from_model_or_target[time_step] ,
                                                from_target ,
                                                from_model)

            output = tf.matmul(mlstm_cell_output , weights) + biases
            outputs.append(output)
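Stripped of TensorFlow, the loop above does the following at each step. Step 0 always consumes `decoder_embed_input[: , 0 , :]`, i.e. `<GO>`. After that, `tf.cond` selects either the ground-truth token or the greedy `argmax` of the previous step's output. The framework-free sketch below shows this. `toy_step` is a hypothetical stand-in for `decoder_cell` plus the output projection, and the token ids are made up.

```python
def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


def unroll_decoder(step, decoder_input, coin, state):
    """Unrolled decoder loop mirroring the notebook's structure.

    step(token, state) -> (scores, new_state); decoder_input[t] is the
    ground-truth input for step t (decoder_input[0] is <GO>); coin[t] picks
    the ground truth (True) or the previous step's argmax prediction (False).
    """
    outputs, fed = [], []
    for t in range(len(decoder_input)):
        if t == 0 or coin[t]:
            token = decoder_input[t]       # from_target()
        else:
            token = argmax(outputs[-1])    # from_model(): greedy choice from step t-1
        fed.append(token)
        scores, state = step(token, state)
        outputs.append(scores)
    return outputs, fed


VOCAB = 6


def toy_step(token, state):
    # hypothetical cell: always gives token + 1 (mod VOCAB) the highest score
    scores = [0.0] * VOCAB
    scores[(token + 1) % VOCAB] = 1.0
    return scores, state


decoder_input = [2, 5, 5, 5]  # <GO> followed by made-up ground-truth tokens
_, fed_tf = unroll_decoder(toy_step, decoder_input, [True] * 4, None)
_, fed_inf = unroll_decoder(toy_step, decoder_input, [False] * 4, None)
print(fed_tf)   # [2, 5, 5, 5]  teacher forcing: inputs come from the target
print(fed_inf)  # [2, 3, 4, 5]  inference: inputs come from the previous prediction
```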

In [11]:
outputs_ = tf.transpose(tf.convert_to_tensor(outputs) , [1 , 0 , 2])    
logits = tf.reshape(outputs_ , [-1 , target_vocab_size])

# predicting_logits is not used for training; it only serves to inspect the results
predicting_logits = tf.nn.softmax(logits)   
predicting_logits = tf.argmax(predicting_logits , axis = 1)
predicting_logits = tf.reshape(predicting_logits , [batch_size , -1] , name = 'predictions')
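The transpose matters. `outputs` is stacked time-major (`[target_max_length, batch, vocab]`), while `tf.reshape(targets , [-1])` flattens the labels batch-major. Transposing to `[batch, time, vocab]` before reshaping makes row `b * target_max_length + t` of `logits` line up with label `(b, t)` in `targets_onehot`. The pure-Python illustration below uses string labels in place of logit rows; the sizes are arbitrary.

```python
# outputs is a list of T tensors of shape [B, V] -> stacked shape [T, B, V]
T, B, V = 3, 2, 4
outputs = [[[f"t{t}b{b}v{v}" for v in range(V)] for b in range(B)] for t in range(T)]

# transpose [T, B, V] -> [B, T, V], like tf.transpose(..., [1, 0, 2])
outputs_bt = [[outputs[t][b] for t in range(T)] for b in range(B)]

# reshape [B, T, V] -> [B*T, V]: row-major flattening gives batch-major order
logits_rows = [row for seq in outputs_bt for row in seq]
order = [row[0][:4] for row in logits_rows]
print(order)  # ['t0b0', 't1b0', 't2b0', 't0b1', 't1b1', 't2b1']
```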

In [12]:
# Loss function
loss = tf.nn.softmax_cross_entropy_with_logits(labels = targets_onehot , logits = logits)
total_loss = tf.reduce_mean(loss)

# Optimizer
optimizer = tf.train.AdamOptimizer(lr)

# Gradient Clipping
gradients = optimizer.compute_gradients(total_loss)
capped_gradients = [(tf.clip_by_value(grad, -5. , 5.), var) for grad, var in gradients if grad is not None]
train_op = optimizer.apply_gradients(capped_gradients)

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.



  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
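With one-hot labels, `softmax_cross_entropy_with_logits` reduces to `-log softmax(logits)[label]` per position, and `total_loss` averages it over all `batch_size * target_max_length` positions. No mask is applied, so `<PAD>` positions after `<EOS>` also contribute to the loss. The model is therefore trained to emit `<PAD>` there, which matches the predictions in the training log. The pure-Python sketch below shows the per-position loss and the element-wise clipping to [-5, 5]. For uniform logits the loss is log(V). If the target vocabulary holds the 26 lowercase letters plus the 4 special tokens (an assumption, since the data files are not shown), log(30) ≈ 3.40, close to the first logged training loss of 3.422.

```python
import math


def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def clip_by_value(grad, lo=-5.0, hi=5.0):
    """Element-wise clipping, as tf.clip_by_value does."""
    return [min(max(g, lo), hi) for g in grad]


loss_uniform = softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], label=1)
print(round(loss_uniform, 4))            # 1.3863, i.e. log(4)
print(clip_by_value([-12.0, 0.5, 7.0]))  # [-5.0, 0.5, 5.0]
```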


In [13]:
# Split the dataset into training and validation sets
train_source = source_int_pad[batch_size:]
train_target = target_int_pad[batch_size:]
# Hold out one batch for validation
valid_source = source_int_pad[:batch_size]
valid_target = target_int_pad[:batch_size]
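The training cell below builds its batches from consecutive indices and wraps back to 0 if it runs past the end of `train_source`. The order is the same every epoch (no shuffling), and the 77 batches that are actually used never reach the wrap-around. The standalone sketch below shows the scheme on a toy size (`make_batch_indices` is a hypothetical name). Shuffling the indices once per epoch, e.g. with `random.shuffle`, is a common alternative, but the notebook does not do it.

```python
def make_batch_indices(n_samples, batch_size, num_batches):
    """Consecutive (unshuffled) index batches that wrap around to 0 at the end."""
    batches, current, count = [], [], 0
    while len(batches) < num_batches:
        current.append(count)
        count += 1
        if len(current) == batch_size:
            batches.append(current)
            current = []
        if count == n_samples:
            count = 0
    return batches


batches = make_batch_indices(n_samples=10, batch_size=4, num_batches=3)
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
```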

## Training

In [14]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Teacher forcing is used during training, so coin tossing is [True , True , True , True , True , True , True , True]
# (p = [0 , 1] means the value 1, i.e. True, is drawn with probability 1)
coin_tossing = np.random.choice(a = 2 , size = target_max_length , replace = True , p = [0 , 1])
coin_tossing = coin_tossing.astype(bool)
num_batches = len(train_source) // batch_size  # 77 full batches per epoch
for epoch_i in range(0 , epochs):
    
    # Before each epoch, determine the indices of every batch
    batch_index = []
    temp = []
    count = 0 # index of the first sample (batches are consecutive, not shuffled)
    while len(batch_index) < num_batches:
        temp.append(count)
        count += 1
        if len(temp) == batch_size:
            batch_index.append(temp)
            temp = []
        if count == len(train_source):
            count = 0

    for batch_i in range(0 , num_batches):
        train_source_batch , train_target_batch =\
        train_source[batch_index[batch_i] , :] , train_target[batch_index[batch_i] , :] 
        
        _ , training_loss , predicting_logits_result =\
        sess.run([train_op, total_loss , predicting_logits] , 
                 feed_dict = {input_data : train_source_batch ,
                              targets : train_target_batch ,
                              from_model_or_target : coin_tossing ,
                              lr: learning_rate})
   
        if batch_i % 30 == 0: # inspect the results every 30 batches
            validation_loss = sess.run(total_loss, 
                                       feed_dict = {input_data : valid_source ,
                                                    targets : valid_target ,
                                                    from_model_or_target : coin_tossing}) 

            print('Epoch : {}/{} \nBatch : {}/{} \nTraining Loss : {:.3f} \nValidation loss: {:.3f}'
                  .format(epoch_i , epochs , 
                          batch_i , len(train_source) // batch_size , 
                          training_loss , validation_loss))
            
            index = np.random.randint(batch_size)
            print('Source : {}'.format([source_int_to_letter[i] for i in train_source_batch[index]] ))
            print('Target : {}'.format([target_int_to_letter[i] for i in train_target_batch[index]] ))
            print('Predict : {}\n'.format([target_int_to_letter[i] for i in predicting_logits_result[index]] ))     
    
    
# Save the model
saver = tf.train.Saver()
saver.save(sess , 'trained_model/save_net')
print('Model Trained and Saved')

Epoch : 0/100 
Batch : 0/77 
Training Loss : 3.422 
Validation loss: 3.366
Source : ['e', 'z', 'x', 'r', '<PAD>', '<PAD>', '<PAD>']
Target : ['e', 'r', 'x', 'z', '<EOS>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['n', 'n', 'n', 'n', 'n', 'n', 'y', 'y']

Epoch : 0/100 
Batch : 30/77 
Training Loss : 2.387 
Validation loss: 2.349
Source : ['e', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['e', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 0/100 
Batch : 60/77 
Training Loss : 1.944 
Validation loss: 1.985
Source : ['b', 'h', 'g', 'k', '<PAD>', '<PAD>', '<PAD>']
Target : ['b', 'g', 'h', 'k', '<EOS>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['c', 'c', '<EOS>', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 1/100 
Batch : 0/77 
Training Loss : 1.919 
Validation loss: 1.812
Source : ['f', 'r', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['f', 'r', '<EOS>',

Epoch : 10/100 
Batch : 30/77 
Training Loss : 0.589 
Validation loss: 0.580
Source : ['l', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['l', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['l', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 10/100 
Batch : 60/77 
Training Loss : 0.513 
Validation loss: 0.544
Source : ['u', 'v', 'y', 'd', 'r', 'z', '<PAD>']
Target : ['d', 'r', 'u', 'v', 'y', 'z', '<EOS>', '<PAD>']
Predict : ['f', 'l', 'v', 'v', 'y', 'z', '<EOS>', '<PAD>']

Epoch : 11/100 
Batch : 0/77 
Training Loss : 0.585 
Validation loss: 0.531
Source : ['w', 'r', 'q', 'd', '<PAD>', '<PAD>', '<PAD>']
Target : ['d', 'q', 'r', 'w', '<EOS>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['c', 'q', 'r', 'w', '<EOS>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 11/100 
Batch : 30/77 
Training Loss : 0.515 
Validation loss: 0.508
Source : ['i', 'h', 'f', 'o', 'o', '<PAD>', '<PAD>']
Target : ['f', 'h', 'i', 'o', 'o', '<EOS>', '<PAD>', '<

Epoch : 20/100 
Batch : 60/77 
Training Loss : 0.172 
Validation loss: 0.190
Source : ['e', 'c', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['c', 'e', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['c', 'e', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 21/100 
Batch : 0/77 
Training Loss : 0.199 
Validation loss: 0.185
Source : ['z', 's', 'u', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['s', 'u', 'z', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['s', 'u', 'z', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 21/100 
Batch : 30/77 
Training Loss : 0.179 
Validation loss: 0.179
Source : ['t', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['t', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['t', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 21/100 
Batch : 60/77 
Training Loss : 0.154 
Validation loss: 0.175
Source : ['b', 'z', 'e', 'p', 'j', 'v', '<PAD>']

Epoch : 31/100 
Batch : 0/77 
Training Loss : 0.095 
Validation loss: 0.095
Source : ['p', 'h', 'w', 'y', 'c', 'r', 'c']
Target : ['c', 'c', 'h', 'p', 'r', 'w', 'y', '<EOS>']
Predict : ['c', 'c', 'h', 'p', 'r', 'w', 'y', '<EOS>']

Epoch : 31/100 
Batch : 30/77 
Training Loss : 0.090 
Validation loss: 0.098
Source : ['p', 'p', 'k', 'v', '<PAD>', '<PAD>', '<PAD>']
Target : ['k', 'p', 'p', 'v', '<EOS>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['k', 'p', 'p', 'v', '<EOS>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 31/100 
Batch : 60/77 
Training Loss : 0.080 
Validation loss: 0.094
Source : ['l', 'o', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['l', 'o', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['l', 'o', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 32/100 
Batch : 0/77 
Training Loss : 0.089 
Validation loss: 0.089
Source : ['y', 'z', 'n', 's', '<PAD>', '<PAD>', '<PAD>']
Target : ['n', 's', 'y', 'z', '<EOS>', '<PAD>', '<PAD>', '<PAD>']
Predict : 

Epoch : 41/100 
Batch : 30/77 
Training Loss : 0.048 
Validation loss: 0.061
Source : ['q', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['q', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['q', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 41/100 
Batch : 60/77 
Training Loss : 0.046 
Validation loss: 0.061
Source : ['z', 'b', 'q', 'n', 'o', 'z', '<PAD>']
Target : ['b', 'n', 'o', 'q', 'z', 'z', '<EOS>', '<PAD>']
Predict : ['b', 'n', 'o', 'p', 'z', 'z', '<EOS>', '<PAD>']

Epoch : 42/100 
Batch : 0/77 
Training Loss : 0.051 
Validation loss: 0.060
Source : ['y', 'y', 'h', 'q', 'b', 'v', '<PAD>']
Target : ['b', 'h', 'q', 'v', 'y', 'y', '<EOS>', '<PAD>']
Predict : ['b', 'h', 'p', 'v', 'y', 'y', '<EOS>', '<PAD>']

Epoch : 42/100 
Batch : 30/77 
Training Loss : 0.045 
Validation loss: 0.058
Source : ['d', 'p', 'i', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['d', 'i', 'p', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
P

Epoch : 51/100 
Batch : 60/77 
Training Loss : 0.027 
Validation loss: 0.043
Source : ['f', 'x', 'm', 'p', '<PAD>', '<PAD>', '<PAD>']
Target : ['f', 'm', 'p', 'x', '<EOS>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['f', 'm', 'p', 'x', '<EOS>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 52/100 
Batch : 0/77 
Training Loss : 0.031 
Validation loss: 0.042
Source : ['j', 'c', 's', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['c', 'j', 's', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['c', 'j', 's', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 52/100 
Batch : 30/77 
Training Loss : 0.027 
Validation loss: 0.042
Source : ['h', 'd', 'n', 'z', 'f', 'x', '<PAD>']
Target : ['d', 'f', 'h', 'n', 'x', 'z', '<EOS>', '<PAD>']
Predict : ['d', 'f', 'h', 'n', 'x', 'z', '<EOS>', '<PAD>']

Epoch : 52/100 
Batch : 60/77 
Training Loss : 0.026 
Validation loss: 0.041
Source : ['k', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['k', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<P

Epoch : 62/100 
Batch : 0/77 
Training Loss : 0.020 
Validation loss: 0.033
Source : ['q', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['q', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['q', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 62/100 
Batch : 30/77 
Training Loss : 0.017 
Validation loss: 0.032
Source : ['o', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['o', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['o', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 62/100 
Batch : 60/77 
Training Loss : 0.017 
Validation loss: 0.032
Source : ['b', 'd', 'h', 'p', '<PAD>', '<PAD>', '<PAD>']
Target : ['b', 'd', 'h', 'p', '<EOS>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['b', 'd', 'h', 'p', '<EOS>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 63/100 
Batch : 0/77 
Training Loss : 0.019 
Validation loss: 0.032
Source : ['h', 's', 't', 'm', 'q', 'q', 'c']
Targ

Epoch : 72/100 
Batch : 30/77 
Training Loss : 0.010 
Validation loss: 0.026
Source : ['m', 'q', 's', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['m', 'q', 's', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['m', 'q', 's', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 72/100 
Batch : 60/77 
Training Loss : 0.011 
Validation loss: 0.028
Source : ['j', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['j', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['j', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 73/100 
Batch : 0/77 
Training Loss : 0.013 
Validation loss: 0.027
Source : ['p', 'r', 'u', 'h', 'c', 'j', '<PAD>']
Target : ['c', 'h', 'j', 'p', 'r', 'u', '<EOS>', '<PAD>']
Predict : ['c', 'h', 'j', 'p', 'r', 'u', '<EOS>', '<PAD>']

Epoch : 73/100 
Batch : 30/77 
Training Loss : 0.010 
Validation loss: 0.025
Source : ['w', 'r', 'o', 'n', 'q', 'l', 'a']
Target : ['a', 'l', 'n', 'o', 'q', 'r', 'w', '<EOS>

Epoch : 82/100 
Batch : 60/77 
Training Loss : 0.008 
Validation loss: 0.023
Source : ['y', 's', 'f', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['f', 's', 'y', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['f', 's', 'y', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 83/100 
Batch : 0/77 
Training Loss : 0.009 
Validation loss: 0.021
Source : ['w', 'x', 'x', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['w', 'x', 'x', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['w', 'x', 'x', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 83/100 
Batch : 30/77 
Training Loss : 0.007 
Validation loss: 0.021
Source : ['q', 'e', 't', 'm', 'u', 's', '<PAD>']
Target : ['e', 'm', 'q', 's', 't', 'u', '<EOS>', '<PAD>']
Predict : ['e', 'm', 'q', 's', 't', 'u', '<EOS>', '<PAD>']

Epoch : 83/100 
Batch : 60/77 
Training Loss : 0.007 
Validation loss: 0.022
Source : ['p', 'q', 's', 'r', 't', 'y', '<PAD>']
Target : ['p', 'q', 'r', 's', 't', 'y', '<EOS>', '<PAD>']
Predict : ['p

Epoch : 92/100 
Batch : 60/77 
Training Loss : 0.005 
Validation loss: 0.019
Source : ['p', 'x', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['p', 'x', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['p', 'x', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']

Epoch : 93/100 
Batch : 0/77 
Training Loss : 0.006 
Validation loss: 0.017
Source : ['d', 'a', 'v', 'o', 'k', 'o', 'c']
Target : ['a', 'c', 'd', 'k', 'o', 'o', 'v', '<EOS>']
Predict : ['a', 'c', 'd', 'k', 'o', 'o', 'v', '<EOS>']

Epoch : 93/100 
Batch : 30/77 
Training Loss : 0.005 
Validation loss: 0.018
Source : ['a', 'b', 'q', 'k', 'q', 'l', 'p']
Target : ['a', 'b', 'k', 'l', 'p', 'q', 'q', '<EOS>']
Predict : ['a', 'b', 'k', 'l', 'p', 'q', 'q', '<EOS>']

Epoch : 93/100 
Batch : 60/77 
Training Loss : 0.005 
Validation loss: 0.019
Source : ['f', 'i', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Target : ['f', 'i', '<EOS>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
Predict : ['f', 'i', '<EOS>',

## Testing

In [16]:
import os    
tf.reset_default_graph()  # start from an empty graph so the imported nodes keep their original names
sess = tf.Session()
new_saver = tf.train.import_meta_graph(os.path.join('trained_model/save_net.meta'))
new_saver.restore(sess, tf.train.latest_checkpoint(os.path.join('trained_model')))

graph = tf.get_default_graph()
input_data = graph.get_tensor_by_name('inputs:0')
targets = graph.get_tensor_by_name('targets:0')
predicting_logits = graph.get_tensor_by_name('predictions:0')
from_model_or_target = graph.get_tensor_by_name('from_model_or_target:0')

input_word = 'common'

test_source = [] 
for letter in input_word:
    if letter not in source_letter_to_int.keys():
        test_source.append(source_letter_to_int['<UNK>'])
    else:
        test_source.append(source_letter_to_int[letter])
        
# The input sequence has a fixed length of source_max_length, so pad it with source_letter_to_int['<PAD>'] up to source_max_length
while len(test_source) < source_max_length:
    test_source.append(source_letter_to_int['<PAD>'])
test_source = [test_source] * batch_size  # decoder_input is built with tf.fill([batch_size, 1], ...), so a full batch must be fed
        
# The values of test_target are arbitrary; only its length target_max_length matters
test_target = [0 for _ in range(0 , target_max_length)] 
test_target = [test_target] * batch_size

test_source = np.array(test_source)
test_target = np.array(test_target)
# During inference, coin tossing becomes [False , False , False , False , False , False , False , False]
coin_tossing = np.random.choice(a = 2 , size = target_max_length , replace = True , p = [1 , 0])
coin_tossing = coin_tossing.astype(bool)
answer = sess.run(predicting_logits , feed_dict = {input_data : test_source ,
                                                   targets : test_target ,
                                                   from_model_or_target : coin_tossing})

answer = answer[0 , :]
answer_to_letter = []
for num in answer:
    answer_to_letter.append(target_int_to_letter[num])
print(answer_to_letter)     

INFO:tensorflow:Restoring parameters from trained_model\save_net
['c', 'm', 'm', 'n', 'o', 'o', '<EOS>', '<PAD>']
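Two details of the testing cell are worth isolating. The decoder's `<GO>` column is created with `tf.fill([batch_size, 1], ...)`, so the single test word has to be replicated to a full batch of `batch_size` rows. The graph also only accepts words of at most `source_max_length` letters. The raw prediction still contains `<EOS>` and `<PAD>`, which a reader usually strips. The pure-Python sketch below covers the pre- and post-processing; the helper names and the toy vocabulary are hypothetical, and the notebook's real indices differ.

```python
def prepare_source(word, letter_to_int, max_len, batch_size):
    """Letters -> ids (unknown letters -> <UNK>), right-pad, repeat for a full batch."""
    ids = [letter_to_int.get(letter, letter_to_int['<UNK>']) for letter in word]
    if len(ids) > max_len:
        raise ValueError(f"word longer than {max_len} letters")
    ids += [letter_to_int['<PAD>']] * (max_len - len(ids))
    return [list(ids) for _ in range(batch_size)]


def decode_prediction(ids, int_to_letter):
    """Ids -> string, stopping at the first <EOS> and skipping <PAD>."""
    letters = []
    for i in ids:
        token = int_to_letter[i]
        if token == '<EOS>':
            break
        if token != '<PAD>':
            letters.append(token)
    return ''.join(letters)


# Toy vocabulary for illustration only
int_to_letter = {0: '<PAD>', 1: '<UNK>', 2: '<GO>', 3: '<EOS>', 4: 'c', 5: 'm', 6: 'n', 7: 'o'}
letter_to_int = {v: k for k, v in int_to_letter.items()}

batch = prepare_source('common', letter_to_int, max_len=7, batch_size=2)
print(batch[0])  # [4, 7, 5, 5, 7, 6, 0]
print(decode_prediction([4, 5, 5, 6, 7, 7, 3, 0], int_to_letter))  # cmmnoo
```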
