# BERT Abstractive Summarizer on CNN/Daily Mail

## Text Summarization

![Text Summarization](https://blog.fpt-software.com/hs-fs/hubfs/image-8.png?width=376&name=image-8.png)

Text summarization is the process of condensing a long text into a short, easy-to-understand one while preserving the message it is meant to convey.

In general, there are two kinds of summarization: extractive summary and abstractive summary.

An extractive summary selects the sentences that carry the most salient content and uses them as the summary,
whereas an abstractive summary writes new text that is short and concise.
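To make the extractive side concrete, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top ones. The function name `extractive_summary` and the frequency-based scoring are illustrative assumptions, not the method used later in this notebook (which is abstractive).

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


doc = "The cat sat on the mat. The cat ate the fish. Dogs bark."
print(extractive_summary(doc, num_sentences=1))
```

An abstractive model, by contrast, generates new tokens one at a time instead of copying whole sentences, which is why it needs a trained decoder.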

## BERT Abstractive Summarizer

Bidirectional Encoder Representations from Transformers (BERT) is a pre-trained language model developed by Google. It is not tied to any single language and can be applied to a range of natural language processing tasks, such as question answering or automatic summarization. The pre-trained models are available for reuse.

This example downloads a pre-trained BERT model that runs on the TensorFlow framework and walks through the following steps:

1. Install and setup tools
2. Load BERT Model
3. Load CNN-Daily Mail data
4. Setup tokenization pipeline
5. Config Tensorflow session and Test prediction using BERT
6. Visualize output

![BERT high-level architecture](https://deeplearn.org/arxiv_files/1907.06226v2/MLM_LS.png)


## 1.Install tools

[Attention Visualizer](https://github.com/abisee/attn_vis) is a tool for displaying highlighted output.

[BERT Abstractive Summary](https://github.com/raufer/bert-summarization) is an abstractive summarization implementation that uses BERT as its pre-trained model.

In [1]:
!git clone https://github.com/raufer/bert-summarization bert_sum 2>/dev/null
!git clone https://github.com/abisee/attn_vis 2>/dev/null
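The imports used later in this notebook (`data.load`, `ops.tokenization`, `models.abstractive_summarizer`, `config`) only resolve if the cloned `bert_sum` directory is importable. The notebook does not show how this was arranged, so the following is just one possible approach; the helper name `add_repo_to_path` is hypothetical.

```python
import os
import sys


def add_repo_to_path(repo_dir):
    """Put repo_dir at the front of sys.path (once) and return its absolute path."""
    repo_dir = os.path.abspath(repo_dir)
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    return repo_dir


# Assumption: the repository was cloned into ./bert_sum as in the cell above.
add_repo_to_path("bert_sum")
```

Starting the notebook from inside the cloned directory would have the same effect without touching `sys.path`.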

Set up the environment:

In [2]:
import json
import os
import logging

In [3]:
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf

# print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

In [4]:
import tensorflow_datasets as tfds
import tensorflow_hub as hub
from rouge import Rouge
import numpy as np
from IPython.display import IFrame
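The `Rouge` class imported above scores a generated summary against a reference by n-gram overlap. As a rough illustration of the idea only (this is not the library's implementation, and the function name `rouge_1` is hypothetical), unigram precision, recall, and F1 can be sketched as:

```python
from collections import Counter


def rouge_1(hypothesis, reference):
    """Unigram-overlap precision, recall and F1 between two whitespace-tokenized strings."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((hyp & ref).values())
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"p": precision, "r": recall, "f": f1}


print(rouge_1("the cat sat", "the cat sat on the mat"))
```

A short hypothesis that only uses reference words gets perfect precision but low recall, which is why the F1 value is usually the one reported.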

In [5]:
from data.load import pipeline
from ops.tokenization import tokenizer
from ops.session import initialize_vars
from ops.session import save_variable_specs
from models.abstractive_summarizer import AbstractiveSummarization
from models.abstractive_summarizer import eval
from models.abstractive_summarizer import train
from config import config

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


In [6]:
logger = logging.getLogger()
logger.setLevel(logging.INFO)

tf.logging.set_verbosity(tf.logging.INFO) 
tf.enable_resource_variables()

## 2.Load BERT Model

In [7]:
model = AbstractiveSummarization(
    num_layers=config.NUM_LAYERS,
    d_model=config.D_MODEL,
    num_heads=config.NUM_HEADS,
    dff=config.D_FF,
    vocab_size=config.VOCAB_SIZE,
    input_seq_len=config.INPUT_SEQ_LEN,
    output_seq_len=config.OUTPUT_SEQ_LEN,
    rate=config.DROPOUT_RATE
)

INFO:root:Extracting pretrained word embeddings weights from BERT
INFO:root:Embedding matrix shape '(30522, 768)'
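The log reports an embedding matrix with 30522 rows (one per vocabulary entry) and 768 columns (the model dimension). The decoder source echoed in the warnings later in this notebook scales its input by `sqrt(d_model)` and adds `pos_encoding`. The sketch below reproduces that step in plain Python under assumptions: the interleaved sine/cosine layout follows the original Transformer formulation, and the repository's exact layout is not shown here. The names `positional_encoding` and `embed` are hypothetical.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal position table with sine on even and cosine on odd dimensions."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def embed(token_ids, embedding_matrix):
    """Look up each token's row, scale by sqrt(d_model), add the position vector."""
    d_model = len(embedding_matrix[0])
    pe = positional_encoding(len(token_ids), d_model)
    scale = math.sqrt(d_model)
    return [
        [embedding_matrix[tok][j] * scale + pe[pos][j] for j in range(d_model)]
        for pos, tok in enumerate(token_ids)
    ]


# Toy 2-word vocabulary with d_model = 4 instead of 30522 x 768.
toy_matrix = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
print(embed([1, 0], toy_matrix)[0])
```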


## 3.Load CNN-DailyMail dataset using Tensorflow Datasets

Load the dataset and split it into train / validation / test.

In [8]:
examples, metadata = tfds.load('cnn_dailymail', with_info=True, as_supervised=True)

train, val, test = examples['train'], examples['validation'], examples['test']

INFO:absl:No config specified, defaulting to first: cnn_dailymail/plain_text
INFO:absl:Overwrite dataset info from restored data version.
INFO:absl:Reusing dataset cnn_dailymail (/home/rattaphon/tensorflow_datasets/cnn_dailymail/plain_text/0.0.2)


In [9]:
# grab information regarding the number of examples
metadata = json.loads(metadata.as_json)

n_test_examples = int(metadata['splits'][0]['statistics']['numExamples'])
n_train_examples = int(metadata['splits'][1]['statistics']['numExamples'])
n_val_examples = int(metadata['splits'][2]['statistics']['numExamples'])
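The cell above picks splits by list position, which silently breaks if the order of `metadata['splits']` ever changes. A sketch of a lookup by split name follows; it assumes each entry carries a `name` key next to `statistics`, which the truncated output in this notebook does not confirm, and the helper `split_sizes` and the toy counts are made up for illustration.

```python
def split_sizes(metadata):
    """Map split name -> number of examples, independent of list order."""
    return {
        split["name"]: int(split["statistics"]["numExamples"])
        for split in metadata["splits"]
    }


# Toy dictionary mimicking the assumed layout; the counts are not real.
toy_metadata = {
    "splits": [
        {"name": "test", "statistics": {"numExamples": "3"}},
        {"name": "train", "statistics": {"numExamples": "10"}},
        {"name": "validation", "statistics": {"numExamples": "2"}},
    ]
}

sizes = split_sizes(toy_metadata)
print(sizes["train"], sizes["validation"], sizes["test"])
```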

In [16]:
metadata

{'citation': '@article{DBLP:journals/corr/SeeLM17,\n  author    = {Abigail See and\n               Peter J. Liu and\n               Christopher D. Manning},\n  title     = {Get To The Point: Summarization with Pointer-Generator Networks},\n  journal   = {CoRR},\n  volume    = {abs/1704.04368},\n  year      = {2017},\n  url       = {http://arxiv.org/abs/1704.04368},\n  archivePrefix = {arXiv},\n  eprint    = {1704.04368},\n  timestamp = {Mon, 13 Aug 2018 16:46:08 +0200},\n  biburl    = {https://dblp.org/rec/bib/journals/corr/SeeLM17},\n  bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n\n@inproceedings{hermann2015teaching,\n  title={Teaching machines to read and comprehend},\n  author={Hermann, Karl Moritz and Kocisky, Tomas and Grefenstette, Edward and Espeholt, Lasse and Kay, Will and Suleyman, Mustafa and Blunsom, Phil},\n  booktitle={Advances in neural information processing systems},\n  pages={1693--1701},\n  year={2015}\n}\n',
 'description': 'CNN/DailyMail n

## 4.Setup tokenization pipeline

This pipeline handles tokenization (splitting text into tokens).

In [10]:
train_dataset = pipeline(train, tokenizer)
val_dataset = pipeline(val, tokenizer)
test_dataset = pipeline(test, tokenizer)
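BERT's own tokenizer is WordPiece, which splits unknown words into known sub-word pieces by greedy longest match. Assuming the imported `tokenizer` wraps that scheme (the notebook does not show its source), the core idea can be sketched with a toy vocabulary; the function names and the vocabulary below are illustrative only.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces are prefixed
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


toy_vocab = {"summar", "##ization", "##ize", "is", "fun"}
print(tokenize("Summarization is fun", toy_vocab))
```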

In [11]:
val_iterator = val_dataset.make_initializable_iterator()
train_iterator = train_dataset.make_initializable_iterator()

Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_initializable_iterator(dataset)`.



## 5.Config Tensorflow session and Test prediction using BERT

Set up the TensorFlow graph.

In [12]:
# warm up tensorflow graph
val_stream = val_iterator.get_next()
xs, ys = val_stream[:3], val_stream[3:] 
y, y_hat, eval_loss, eval_summaries = eval(model, xs, ys)

INFO:root:Building Evaluation Graph


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:root:Building: 'Greedy Draft Summary'

  0%|          | 0/4 [00:00<?, ?it/s]

Instructions for updating:
If using Keras pass *_constraint arguments to layers.


    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        
        seq_len = tf.shape(x)[1]
        attention_weights = {}

#         if not input_alreay_embedded:
#             x = self.embedding(x)  # (batch_size, target_seq_len, d_model)
            
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            
#             dv = f"/device:GPU:{str(next(selector))}"
#             print(f"With device )
#             with tf.device():
                
            x, block1, block2 = self.dec_layers[i](x, enc_output, training, look_ahead_mask, padding_mask)

            attention_weights['decoder_layer{}_block1'.format(i+1)] = block1
            attention_weights['decoder_layer{}_block2'.format(i+1)] = block2

        # x.shape == (batch_size, target_seq_len, d_model)
        return x, attention_weights

This may 

Instructions for updating:
Use `tf.cast` instead.
 25%|██▌       | 1/4 [00:04<00:13,  4.57s/it]

 50%|█████     | 2/4 [00:06<00:07,  3.63s/it]

 75%|███████▌  | 3/4 [00:07<00:03,  3.03s/it]

100%|██████████| 4/4 [00:09<00:00,  2.55s/it]




INFO:root:Building: 'Greedy Refined Summary'

  0%|          | 0/4 [00:00<?, ?it/s]

 25%|██▌       | 1/4 [00:04<00:14,  4.75s/it]

 50%|█████     | 2/4 [00:09<00:09,  4.88s/it]

 75%|███████▌  | 3/4 [00:14<00:04,  4.90s/it]

100%|██████████| 4/4 [00:19<00:00,  4.93s/it]

Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    


Run prediction with eval, which also decodes the result (converting token indices back into text tokens) so that it is ready to use.

In [13]:
# Setting TF graph
saver = tf.train.Saver(max_to_keep=config.NUM_EPOCHS)
config_tf = tf.ConfigProto(allow_soft_placement=True)

# run the session with soft device placement enabled
with tf.Session(config=config_tf) as sess:

    ckpt = tf.train.latest_checkpoint(config.CHECKPOINTDIR)

    rouge = Rouge()

    if ckpt is None:
        logging.info("Initializing from scratch")
        sess.run(tf.global_variables_initializer())
        save_variable_specs(os.path.join(config.LOGDIR, "specs"))
    else:
        saver.restore(sess, ckpt)   
    
    initialize_vars(sess)
        
    # fetch data
    val_stream = val_iterator.get_next()
    xs, ys = val_stream[:3], val_stream[3:] 

    # evaluate
    y, y_hat, eval_loss, eval_summaries = eval(model, xs, ys)
    
    sess.run(val_iterator.initializer)
    
    _y, _y_hat = sess.run([y, y_hat])

    # decode id to tokens and article
    y_str = ' '.join(tokenizer.convert_ids_to_tokens(list(_y[0])))
    y_hat_str = ' '.join(tokenizer.convert_ids_to_tokens(list(_y_hat[0])))


INFO:tensorflow:Restoring parameters from checkpoint2/abstractive_summarization_2019_final-0


INFO:root:Building Evaluation Graph


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:root:Building: 'Greedy Draft Summary'












  0%|          | 0/4 [00:00<?, ?it/s]

 25%|██▌       | 1/4 [00:01<00:04,  1.45s/it]

 50%|█████     | 2/4 [00:02<00:02,  1.45s/it]

 75%|███████▌  | 3/4 [00:04<00:01,  1.44s/it]

100%|██████████| 4/4 [00:05<00:00,  1.45s/it]




INFO:root:Building: 'Greedy Refined Summary'












  0%|          | 0/4 [00:00<?, ?it/s]

 25%|██▌       | 1/4 [00:05<00:17,  6.00s/it]

 50%|█████     | 2/4 [00:11<00:11,  5.97s/it]

 75%|███████▌  | 3/4 [00:18<00:06,  6.13s/it]

100%|██████████| 4/4 [00:24<00:00,  6.24s/it]
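
The decoded strings y_str and y_hat_str above are WordPiece tokens joined by spaces, so subword pieces still carry the "##" continuation prefix. The following is a minimal sketch of how such a token list could be merged back into readable text. The helper name wordpiece_to_text and the list of special tokens are assumptions for illustration (they follow the standard BERT vocabulary) and are not part of the bert-summarization repository.

```python
def wordpiece_to_text(tokens, special_tokens=("[PAD]", "[CLS]", "[SEP]")):
    """Merge WordPiece tokens into words, dropping BERT special tokens.

    Hypothetical helper: the special token names are assumed to match
    the standard BERT vocabulary.
    """
    words = []
    for token in tokens:
        if token in special_tokens:
            continue
        if token.startswith("##") and words:
            # continuation piece: glue it onto the previous word
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


example = ["[CLS]", "the", "summar", "##izer", "works", "[SEP]", "[PAD]"]
print(wordpiece_to_text(example))  # the summarizer works
```

With the variables from the cell above, a call such as wordpiece_to_text(tokenizer.convert_ids_to_tokens(list(_y_hat[0]))) would be the corresponding usage, assuming the tokenizer returns a plain list of strings.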


Use model.predict to extract the attention distribution for use in the following steps.

In [14]:
# fetch data
val_stream = val_iterator.get_next()
xs, ys = val_stream[:3], val_stream[3:] 

# logits_draft_summary, preds_draft_summary, draft_attention_dist, logits_refined_summary, preds_refined_summary, refined_attention_dist = model.predict(xs)

In [15]:
# warm up tensorflow graph
saver = tf.train.Saver(max_to_keep=config.NUM_EPOCHS)
config_tf = tf.ConfigProto(allow_soft_placement=True)

# run the session with soft device placement enabled
with tf.Session(config=config_tf) as sess:
    sess.run(tf.global_variables_initializer())
    
    initialize_vars(sess)
        
    # fetch data
    val_stream = val_iterator.get_next()
    xs, ys = val_stream[:3], val_stream[3:] 

    # evaluate
    logits_draft_summary, preds_draft_summary, draft_attention_dist, logits_refined_summary, preds_refined_summary, refined_attention_dist = model.predict(xs)
    
    sess.run(val_iterator.initializer)
    
    _logits_draft_summary, _logits_refined_summary, _refined_attention_dist = sess.run(
        [logits_draft_summary, logits_refined_summary, refined_attention_dist])
    # keep the block2 attention weights of decoder layer 8 from the last entry
    attention_dist = _refined_attention_dist[-1]["decoder_layer8_block2"]
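
Once attention_dist has been fetched, it can be reduced to something easier to inspect. The sketch below is an illustration only: it assumes the attention for a single example is laid out as (num_heads, target_len, source_len), which should be checked against the actual array shape, and the helper names average_heads and top_source_tokens are hypothetical. It averages the weights over heads and lists the source tokens each output position attends to most.

```python
def average_heads(attn):
    """Average attention over heads: (heads, tgt, src) -> (tgt, src).

    Assumes attn is a nested list with the layout (heads, tgt, src).
    """
    num_heads = len(attn)
    tgt_len = len(attn[0])
    src_len = len(attn[0][0])
    return [
        [sum(attn[h][t][s] for h in range(num_heads)) / num_heads
         for s in range(src_len)]
        for t in range(tgt_len)
    ]


def top_source_tokens(avg_attn, source_tokens, k=2):
    """For each target position, return the k source tokens with the highest weight."""
    result = []
    for row in avg_attn:
        ranked = sorted(range(len(row)), key=lambda s: row[s], reverse=True)[:k]
        result.append([source_tokens[s] for s in ranked])
    return result


# Toy data: 2 heads, 2 target positions, 3 source tokens
toy_attn = [
    [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]],
]
source = ["police", "arrested", "suspect"]
avg = average_heads(toy_attn)
print(top_source_tokens(avg, source, k=1))  # [['police'], ['suspect']]
```

For a NumPy array, converting one example with something like attention_dist[0].tolist() before calling average_heads would fit this sketch, provided the first axis is the batch dimension.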

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:root:Building: 'Greedy Draft Summary'
  0%|          | 0/4 [00:00<?, ?it/s]

 25%|██▌       | 1/4 [00:01<00:04,  1.52s/it]

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        
        seq_len = tf.shape(x)[1]
        attention_weights = {}

#         if not input_alreay_embedded:
#             x = self.embedding(x)  # (batch_size, target_seq_len, d_model)
            
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            
#             dv = f"/device:GPU:{str(next(selector))}"
#             print(f"With device )
#             with tf.device():
                
            x, block1, block2 = self.dec_layers[i](x, enc_output, training, look_ahead_mask, padding_mask)

            attention_weights['decoder_layer{}_block1'.format(i+1)] = block1
            attention_weights['decoder_layer{}_block2'.format(i+1)] = block2

        # x.shape == (batch_size, target_seq_len, d_model)
        return x, attention_weights

This may 

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        
        seq_len = tf.shape(x)[1]
        attention_weights = {}

#         if not input_alreay_embedded:
#             x = self.embedding(x)  # (batch_size, target_seq_len, d_model)
            
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            
#             dv = f"/device:GPU:{str(next(selector))}"
#             print(f"With device )
#             with tf.device():
                
            x, block1, block2 = self.dec_layers[i](x, enc_output, training, look_ahead_mask, padding_mask)

            attention_weights['decoder_layer{}_block1'.format(i+1)] = block1
            attention_weights['decoder_layer{}_block2'.format(i+1)] = block2

        # x.shape == (batch_size, target_seq_len, d_model)
        return x, attention_weights

This may 

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        
        seq_len = tf.shape(x)[1]
        attention_weights = {}

#         if not input_alreay_embedded:
#             x = self.embedding(x)  # (batch_size, target_seq_len, d_model)
            
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            
#             dv = f"/device:GPU:{str(next(selector))}"
#             print(f"With device )
#             with tf.device():
                
            x, block1, block2 = self.dec_layers[i](x, enc_output, training, look_ahead_mask, padding_mask)

            attention_weights['decoder_layer{}_block1'.format(i+1)] = block1
            attention_weights['decoder_layer{}_block2'.format(i+1)] = block2

        # x.shape == (batch_size, target_seq_len, d_model)
        return x, attention_weights

This may 

 50%|█████     | 2/4 [00:03<00:03,  1.59s/it]

100%|██████████| 4/4 [00:06<00:00,  1.65s/it]
INFO:root:Building: 'Greedy Refined Summary'
  0%|          | 0/4 [00:00<?, ?it/s]

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


100%|██████████| 4/4 [00:29<00:00,  7.44s/it]


## Expected Output 

In [17]:
y_hat_lst = tokenizer.convert_ids_to_tokens(list(_y_hat[0]))

In [18]:
attention_dist = _refined_attention_dist[-1]["decoder_layer8_block2"].squeeze(axis=0).squeeze(axis=0)
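
The cell above takes the last entry of `_refined_attention_dist`, picks the `decoder_layer8_block2` weights, and drops two leading axes. The following is a minimal sketch of that reshaping on dummy data. It assumes the weights arrive with shape `(1, 1, target_len, source_len)`; that shape and the sizes are assumptions for illustration, not values read from the model.

```python
import numpy as np

# Assumed shape: (batch=1, heads=1, target_len, source_len). Dummy values only.
target_len, source_len = 5, 7
dummy_weights = np.random.rand(1, 1, target_len, source_len)

# Two successive squeezes on axis 0 remove the two leading size-1 axes.
attn = dummy_weights.squeeze(axis=0).squeeze(axis=0)
print(attn.shape)  # (5, 7)

# tolist() yields nested Python lists, which json.dump can serialize.
rows = attn.tolist()

# squeeze(axis=0) raises ValueError when that axis is larger than 1,
# for example if several attention heads were kept.
try:
    np.random.rand(1, 8, target_len, source_len).squeeze(axis=0).squeeze(axis=0)
    squeeze_failed = False
except ValueError:
    squeeze_failed = True
```

If the second squeeze fails on real data, the weights likely carry more than one head, and one would have to be selected or the heads averaged before visualizing.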

In [23]:
att_viz_json = {
    "article_lst": "",  # source-article tokens (Xs); left empty here
    "p_gens": [[0.0] for _ in y_hat_lst],  # BERT does not emit generation probabilities, so use zeros
    "decoded_lst": y_hat_lst,
    "abstract_str": y_str,
    "attn_dists": attention_dist.tolist(),
}
with open('./attn_vis/attn_vis_data.json', 'w') as outfile:
    json.dump(att_viz_json, outfile)
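
Before opening the visualizer, it can help to check that the pieces of the payload line up. The sketch below uses a hypothetical helper, `check_attn_vis_payload`, which is not part of attn_vis. It assumes the visualizer expects one attention row and one `p_gens` entry per decoded token, and one attention weight per article token.

```python
def check_attn_vis_payload(payload):
    """Return a list of problems found in an attn_vis-style payload.

    Hypothetical helper for illustration; the expected layout is an assumption.
    """
    problems = []
    n_decoded = len(payload["decoded_lst"])
    if len(payload["attn_dists"]) != n_decoded:
        problems.append("attn_dists needs one row per decoded token")
    if len(payload["p_gens"]) != n_decoded:
        problems.append("p_gens needs one entry per decoded token")
    n_article = len(payload["article_lst"])
    for i, row in enumerate(payload["attn_dists"]):
        if len(row) != n_article:
            problems.append(
                f"attn_dists[{i}] has {len(row)} weights for {n_article} article tokens"
            )
            break  # one report is enough to flag the mismatch
    return problems


# Toy payload with two article tokens and two decoded tokens.
toy_payload = {
    "article_lst": ["hello", "world"],
    "decoded_lst": ["[CLS]", "hello"],
    "p_gens": [[0.0], [0.0]],
    "abstract_str": "hello",
    "attn_dists": [[0.9, 0.1], [0.2, 0.8]],
}
print(check_attn_vis_payload(toy_payload))  # []
```

Under these assumptions, a payload whose `article_lst` is an empty string, as in the cell above, would be reported as mismatched against non-empty attention rows.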

In [24]:
y_hat_lst

['[CLS]', 'prevalent', 'grinned', 'prevalent', 'prevalent']
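
The `IFrame` cell below points at `http://localhost:8000`, so something has to serve the `attn_vis` directory on that port first. One possible way to do this from inside the notebook is sketched here. It assumes Python 3.7 or later; `serve_directory` is a hypothetical helper name, and the port is taken from the `IFrame` URL.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on localhost from a daemon thread.

    Hypothetical helper; returns the server so it can be shut down later.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("localhost", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `server = serve_directory("./attn_vis")` before the `IFrame` cell keeps the notebook responsive, because the server runs in a background thread. `server.shutdown()` followed by `server.server_close()` stops it and frees the port.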

In [None]:
from IPython.display import IFrame
IFrame(src='http://localhost:8000', width=900, height=400)

In [None]:
x_rnd = convert_idx_to_token_tensor(xs[0][:])