## TensorFlow
- `TensorFlow` used to be a hard framework to use (in practice, only people comfortable with coding used it)
- As `PyTorch` became widely used, `TensorFlow` also adopted `Keras` as an easier high-level API
- In `TensorFlow 2.0`, the Keras-integrated `tf.keras.Model` and `Sequential` are what people mostly use
- In `TensorFlow`, a class structure built around a train step and a test step resembles `PyTorch Lightning`
  - `PyTorch Lightning` : a library that makes `PyTorch` easier to use
- Optimizer : TensorFlow Addons
  - https://www.tensorflow.org/addons/overview?hl=ko
  - https://github.com/tensorflow/addons
    - Many TensorFlow developers upload implementations of various optimizers here

## Hydra
- A general-purpose configuration management tool provided by Facebook
- Search for 'hydra config' : https://hydra.cc/docs/intro/
- Built on top of the OmegaConf library
  - Documentation : https://omegaconf.readthedocs.io/en/2.1_branch/
  - https://github.com/omry/omegaconf : this developer went on to build Hydra at Facebook
  - For a small model such as an MLP, writing `__init__(self, input_dim: int, h1_dim: int, h2_dim: int, out_dim: int)` is fine, but as the model grows, the number of arguments `__init__` needs grows too and they become hard to manage
    - So managing them with a configuration tool is recommended
    - TensorFlow's hyperparameter tooling plays the same role; it has the advantage of integrating with TensorFlow but the drawback of not integrating well with other tools such as wandb
    - OmegaConf is also good for managing the config structure
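
The point about long `__init__` argument lists can be illustrated with a minimal sketch that uses only the standard library's `dataclasses`. `MLPConfig` and `MLP` are hypothetical names for illustration; OmegaConf and Hydra layer YAML loading, overrides, and composition on top of the same idea.

```python
from dataclasses import dataclass, asdict


@dataclass
class MLPConfig:
    # hypothetical hyperparameters, mirroring the __init__ arguments above
    input_dim: int = 784
    h1_dim: int = 256
    h2_dim: int = 128
    out_dim: int = 10


class MLP:
    # the model takes one config object instead of a growing argument list
    def __init__(self, cfg: MLPConfig):
        self.cfg = cfg
        self.layer_dims = [cfg.input_dim, cfg.h1_dim, cfg.h2_dim, cfg.out_dim]


cfg = MLPConfig(h1_dim=512)  # override a single value, keep the other defaults
model = MLP(cfg)
print(model.layer_dims)
print(asdict(cfg))  # a plain dict is easy to log or save next to a run
```

Adding a new hyperparameter then means adding one field to the config class, not touching every call site.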

## EfficientNet
- https://www.tensorflow.org/api_docs/python/tf/keras/applications/efficientnet

## TensorFlow Text Generation
- https://www.tensorflow.org/text/tutorials/nmt_with_attention

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os
import sys
sys.path.append('/content/drive/MyDrive/#fastcampus')
drive_project_root = '/content/drive/MyDrive/#fastcampus'
!pip install -r '/content/drive/MyDrive/#fastcampus/requirements.txt'

In [None]:
# pip install tensorflow_addons

In [None]:
# pip install wandb

In [None]:
# pip install omegaconf

In [None]:
from typing import Optional, List, Dict, Tuple

import io
import re
import unicodedata
import time
from datetime import datetime
import random

import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from omegaconf import OmegaConf, DictConfig    # DictConfig is for type checking

import hydra
from hydra.core.config_store import ConfigStore

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

import wandb

In [None]:
from config_utils_tf import flatten_dict, register_config, get_optimizer_element, get_callbacks

Check the GPU

In [None]:
tf.config.list_physical_devices()

In [None]:
!nvidia-smi

https://www.tensorflow.org/
- https://www.tensorflow.org/overview/?hl=ko
- Tutorials : https://www.tensorflow.org/tutorials?hl=ko
- API > TensorFlow : descriptions of each function
  - Searching Google for 'Tensorflow API 한글' also turns up Korean translations

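Beginner style vs. expert style
- This class uses the expert style
- The Sequential version used in the beginner style (building the model layer by layer, in order) has limitations
- Sequential is rarely used in real industry or research work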

## define gpu
- https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy
- This strategy is typically used for training on one machine with multiple GPUs.
- The output of the code below shows that it picks up GPU 0

In [None]:
# mirrored_strategy = tf.distribute.MirroredStrategy()

## Data download

- code is from : https://www.tensorflow.org/text/tutorials/nmt_with_attention

In [None]:
data_root = os.path.join(drive_project_root, "data", "anki_spa_eng")
# data_root = os.path.join("/drive/MyDrive/#fastcampus", "data", "anki_spa_eng")

if not os.path.exists(data_root):
    # os.mkdir(data_root)
    os.makedirs(data_root)
    
data_path = os.path.join(data_root, "spa-eng.zip")

In [None]:
path_to_zip = tf.keras.utils.get_file(
    data_path,
    # in the downloaded file, the left column is English and the right column is Spanish
    origin='http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip',
    extract=True,
    cache_dir=data_root  # once downloaded, the cached file is reused instead of downloading again
)

path_to_file = os.path.join(
    os.path.dirname(path_to_zip),
    "datasets",
    "spa-eng",
    "spa.txt",
)

print(path_to_file)

## Preprocessing
- `Optional[int]` : the argument is either an int or None
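
As a quick illustration of what the preprocessing in the next cell does to one sentence, here is a self-contained sketch that repeats the same steps with only the standard library. The example sentences are arbitrary choices for illustration.

```python
import re
import unicodedata


def unicode_to_ascii(s):
    # drop combining marks (category "Mn") after NFD decomposition, e.g. "á" -> "a"
    return "".join(c for c in unicodedata.normalize("NFD", s) if unicodedata.category(c) != "Mn")


def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())
    w = re.sub(r"([?.!,¿])", r" \1 ", w)      # pad punctuation with spaces
    w = re.sub(r'[" "]+', " ", w)             # collapse runs of spaces (and double quotes)
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)    # keep only letters and the listed punctuation
    return "<start> " + w.strip() + " <end>"


print(preprocess_sentence("¿Puedo tomar prestado este libro?"))
# <start> ¿ puedo tomar prestado este libro ? <end>
print(preprocess_sentence("Está bien."))
# <start> esta bien . <end>
```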

In [None]:
def unicode_to_ascii(s):
    # unicode normalize with NFD mode -> return list -> join : convert list to string
    # mn : https://www.compart.com/en/unicode/category/Mn
    return "".join(c for c in unicodedata.normalize("NFD", s) if unicodedata.category(c) != "Mn")

def preprocess_sentence(w):
    # convert it to ascii 
    w = unicode_to_ascii(w.lower().strip())

    # make space between words and punctuation
    w = re.sub(r"([?.!,¿])", r" \1 ", w)
    w = re.sub(r'[" "]+', " ", w)
    
    # replace everything except "a-z, A-Z, [.?!,¿]" with a space
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)
    w = w.strip()

    # add the "<start>" and "<end>" tokens at the front and back of the sentence
    w = "<start> " + w + " <end>"  # warning! SPACE!!
    return w

def create_dataset(path: str, num_examples: Optional[int]=None):
    with io.open(path, encoding="UTF-8") as f:
        lines = f.read().strip().split("\n")

    # each line of the data holds an English-Spanish pair separated by a tab (\t)
    word_pairs = [[preprocess_sentence(w) for w in l.split("\t")] for l in lines[:num_examples]]

    return zip(*word_pairs)

en, sp = create_dataset(path_to_file)
print(en[-1])
print(sp[-1])

In [None]:
# Define Tokenizer & Final use data

def tokenize(lang):
    lang_tokenizer = Tokenizer(filters="")
    lang_tokenizer.fit_on_texts(lang)

    # we cannot put string into model (neither pytorch nor tensorflow)
    # we should convert it to int or float type
    tensor = lang_tokenizer.texts_to_sequences(lang)
    tensor = pad_sequences(tensor, padding="post")

    return tensor, lang_tokenizer

def load_dataset(path, num_examples=None):
    tar_lang, src_lang = create_dataset(path, num_examples)  # en, sp

    src_tensor, src_tokenizer = tokenize(src_lang)
    tar_tensor, tar_tokenizer = tokenize(tar_lang)

    return src_tensor, tar_tensor, src_tokenizer, tar_tokenizer

# call language dataset
num_examples = 30000
src_tensor, tar_tensor, src_tokenizer, tar_tokenizer = load_dataset(
    path_to_file, num_examples
)

max_tar_len, max_src_len = tar_tensor.shape[1], src_tensor.shape[1]

src_vocab_size = len(src_tokenizer.word_index) + 1
tar_vocab_size = len(tar_tokenizer.word_index) + 1

print(src_vocab_size, tar_vocab_size)

check
- 1 : start
- 2 : end
- 0 : padding
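
Why these ids come out this way can be sketched without Keras. The snippet below is a simplified stand-in for `Tokenizer` and `pad_sequences`, assuming ids are assigned from 1 in order of descending frequency and 0 is reserved for padding; `build_word_index` and `pad_post` are hypothetical helper names.

```python
from collections import Counter


def build_word_index(sentences):
    # count whitespace-separated tokens; the most frequent token gets id 1
    counts = Counter(tok for s in sentences for tok in s.split(" "))
    return {tok: i + 1 for i, (tok, _) in enumerate(counts.most_common())}


def pad_post(sequences, pad_id=0):
    # right-pad every sequence with pad_id up to the longest length
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


sentences = ["<start> hola . <end>", "<start> como estas ? <end>"]
word_index = build_word_index(sentences)
sequences = [[word_index[tok] for tok in s.split(" ")] for s in sentences]
print(word_index["<start>"], word_index["<end>"])
print(pad_post(sequences))
```

`<start>` and `<end>` appear in every sentence, so they are the most frequent tokens and take ids 1 and 2.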

In [None]:
print(tar_tensor[-1])
print(tar_tokenizer.word_index)

In [None]:
for i in tar_tensor[-1]:
    if i == 0:
        break
    print(tar_tokenizer.index_word[i])

## Define Model

In [None]:
class GRUEncoder(tf.keras.Model):
    def __init__(self, cfg: DictConfig):
        super().__init__()
        self.cfg = cfg
        self.enc_emb = tf.keras.layers.Embedding(
            cfg.data.src.vocab_size,
            cfg.model.enc.embed_size
        )
        self.enc_gru = tf.keras.layers.GRU(
            cfg.model.enc.rnn.units,
            return_state=True,
            return_sequences=True,
            recurrent_initializer="glorot_uniform"
        )
    
    # state : RNN's state
    def call(self, src_tokens, state=None, training=False):
        embed_enc = self.enc_emb(src_tokens)
        enc_outputs, enc_states = self.enc_gru(
            embed_enc, initial_state=state
        )
        return enc_outputs, enc_states

class GRUDecoder(tf.keras.Model):
    def __init__(self, cfg: DictConfig):
        super().__init__()
        self.cfg = cfg
        self.dec_emb = tf.keras.layers.Embedding(
            cfg.data.tar.vocab_size,
            cfg.model.dec.embed_size
        )
        self.dec_gru = tf.keras.layers.GRU(
            cfg.model.dec.rnn.units,
            return_state=True,
            return_sequences=True,
            recurrent_initializer="glorot_uniform"
        )
        self.fc = tf.keras.layers.Dense(cfg.data.tar.vocab_size)
    
    # state : RNN's state
    def call(self, tar_tokens, state=None, training=False):
        embed_dec = self.dec_emb(tar_tokens)
        dec_outputs, dec_states = self.dec_gru(
            embed_dec, initial_state=state
        )
        final_outputs = self.fc(dec_outputs)
        return final_outputs, dec_states, None   # None keeps the interface consistent with decoders that add attention later

## Define Attention Model
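
Before the Keras layer, the shape bookkeeping of additive (Bahdanau) attention can be traced with a small NumPy sketch. The weight matrices here are random placeholders and the sizes are arbitrary, so this illustrates the computation only, not the trained layer.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, length, hidden, latent = 2, 5, 8, 4

query = rng.normal(size=(batch, hidden))          # decoder hidden state
value = rng.normal(size=(batch, length, hidden))  # encoder outputs

# random placeholder weights standing in for the three Dense layers (biases omitted)
w_query = rng.normal(size=(hidden, latent))
w_value = rng.normal(size=(hidden, latent))
w_score = rng.normal(size=(latent, 1))

# additive score: the query is broadcast over the length axis before tanh
score = np.tanh(query[:, None, :] @ w_query + value @ w_value) @ w_score  # [batch, length, 1]

# softmax over the source positions (axis=1)
weights = np.exp(score - score.max(axis=1, keepdims=True))
weights = weights / weights.sum(axis=1, keepdims=True)                    # [batch, length, 1]

# weighted sum of encoder outputs
context = (weights * value).sum(axis=1)                                   # [batch, hidden]

print(score.shape, weights.shape, context.shape)
```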

In [None]:
# attention is attached to the decoder, so it is defined as a layer
class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, cfg:DictConfig):
        super().__init__()
        self.cfg = cfg

        # needs 3 different fully connected layers
        self.fc1 = tf.keras.layers.Dense(cfg.model.attention.latent_dim)   # fully connected for query (decoder hidden)
        self.fc2 = tf.keras.layers.Dense(cfg.model.attention.latent_dim)   # fully connected for value (encoder output)
        self.fc_score = tf.keras.layers.Dense(1)      # fully connected for the attention score

    def call(self, query, value):

        # query = decoder hidden, value = encoder output
        # query is a hidden state, so use expand_dims to match the dimensions of value
        query_with_time_axis = tf.expand_dims(query, 1)   # [batch, 1(length), hidden_dim] -> the decoder usually has length 1

        score = self.fc_score(
            tf.nn.tanh(
                self.fc1(query_with_time_axis) + self.fc2(value)  # [batch, length, latent_dim]
            )
        )  # [batch, length, latent_dim] -> [batch, length, 1]

        attention_weights = tf.nn.softmax(score, axis=1)  # [batch_size, length, 1]

        context_vector = attention_weights * value  # [batch_size, length, hidden]
        context_vector = tf.reduce_sum(context_vector, axis=1)  # [batch_size, hidden]

        return context_vector, attention_weights

class AttentionalGRUDecoder(tf.keras.Model):
    def __init__(self, cfg: DictConfig):
        super().__init__()
        self.cfg = cfg
        self.dec_emb = tf.keras.layers.Embedding(
            cfg.data.tar.vocab_size,
            cfg.model.dec.embed_size
        )
        self.dec_gru = tf.keras.layers.GRU(
            cfg.model.dec.rnn.units,
            return_state=True,
            return_sequences=True,
            recurrent_initializer="glorot_uniform"
        )
        self.attention = BahdanauAttention(cfg)
        self.fc = tf.keras.layers.Dense(cfg.data.tar.vocab_size)

    def call(self, tar_tokens, hidden, enc_output):
        # enc_output: [batch, length, hidden_dim]
        context_vector, attention_weights = self.attention(hidden, enc_output)
        
        x = self.dec_emb(tar_tokens)

        # concatenate the embedded target token with the context vector
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # the previous state is treated as already reflected through attention (the context vector), so no initial state is passed
        dec_outputs, dec_states = self.dec_gru(x)

        # [batch_size * 1, units]
        dec_outputs = tf.reshape(dec_outputs, (-1, dec_outputs.shape[2]))

        # [batch_size, vocab_size]
        final_outputs = self.fc(dec_outputs)

        return final_outputs, dec_states, attention_weights

## Define Configuration

### data configuration

In [None]:
data_anki_spa_eng_cfg ={
    "name": "anki_spa_eng_cfg",
    "src":{
        "vocab_size": src_vocab_size,
        "max_len": max_src_len,
    },
    "tar":{
        "vocab_size": tar_vocab_size,
        "max_len": max_tar_len,
    },
    "train_val_test_split_ratio": [0.8, 0.1, 0.1],
    "train_val_shuffle": True,
}

### model configuration

In [None]:
model_translate_rnn_seq2seq_cfg = {
    "name": "RNNSeq2Seq",
    "enc": {
        "embed_size": 256,
        "rnn": {
            "units": 1024,
        }
    },
    "dec": {
        "embed_size": 256,
        "rnn": {
            "units": 1024,
        }
    }
}

model_translate_attention_based_seq2seq_cfg = {
    "name": "AttentionBasedSeq2Seq",
    "enc": {
        "embed_size": 256,
        "rnn": {
            "units": 1024,
        }
    },
    "dec": {
        "embed_size": 256,
        "rnn": {
            "units": 1024,
        }
    },
    "attention": {
        "latent_dim" : 1024,
    }
}

### optimizer configuration

In [None]:
adam_warmup_lr_sch_opt_cfg = {
    "optimizer": {
        "name": "Adam",
        "other_kwargs": {},
    },
    "lr_scheduler": {
        "name": "LinearWarmupLRSchedule",
        "kwargs": {
            "lr_peak": 1e-3,
            "warmup_end_steps": 1500,
        }
    }
}

# RAdam did not need a scheduler
radam_no_lr_sch_opt_cfg = {
    "optimizer": {
        "name": "RectifiedAdam",
        "learning_rate": 1e-3,
        "other_kwargs": {},
    },
    "lr_scheduler": None
}

# train_cfg
train_cfg: dict = {
    "train_batch_size": 128,
    "val_batch_size": 32,
    "test_batch_size": 32,
    "max_epochs": 50,
    "distribute_strategy": "MirroredStrategy",   # outside Colab (notebook), e.g. on another server, a different strategy is needed
    # teacher_forcing : applying it at first and then phasing it out is said to work best
    "teacher_forcing_ratio": 0.5,
}

_merged_cfg_presets = {
    "rnn_translate_spa_eng_radam": {
        "data": data_anki_spa_eng_cfg,
        "model": model_translate_rnn_seq2seq_cfg,
        "opt": radam_no_lr_sch_opt_cfg,
        "train": train_cfg,
    },
    "attention_based_translate_spa_eng_radam": {
        "data": data_anki_spa_eng_cfg,
        "model": model_translate_attention_based_seq2seq_cfg,
        "opt": radam_no_lr_sch_opt_cfg,
        "train": train_cfg,      
    }
}

### hydra composition ###
# clear hydra instance -> in a Jupyter environment, clear the instance first
hydra.core.global_hydra.GlobalHydra.instance().clear()

# register preset configs
register_config(_merged_cfg_presets)

# initialization
hydra.initialize(config_path=None)    # when using yaml files outside the notebook, config_path must be set

# using_config_key = "cnn_fashion_mnist_radam"
using_config_key = "attention_based_translate_spa_eng_radam"
cfg = hydra.compose(using_config_key)

# define & override log _cfg
model_name = cfg.model.name
run_dirname = "dnn-tutorial-spa_eng-translate-runs-tf"
run_name = f"{datetime.now().isoformat(timespec='seconds')}-{using_config_key}-{model_name}"
log_dir = os.path.join(drive_project_root, "runs", run_dirname, run_name)

log_cfg = {
    "run_name": run_name,
    # callbacks cannot be used with the custom loop, so filepaths etc. must be set explicitly
    "checkpoint_filepath": os.path.join(log_dir, "model"),
    "tensorboard_log_dir": log_dir,
    "callbacks": {
        "TensorBoard": {
            "log_dir": log_dir,
            "update_freq": 50,
        },
        "EarlyStopping": {
            "patience": 30,
            "verbose": True,
        }
    },
    "wandb": {
        "project": "dnn-tutorial-spa_eng-translate-runs-tf",
        "name": run_name,
        "tags": ["dnn-tutorial-spa_eng-translate-runs-tf"],
        "reinit": True,
        "sync_tensorboard": True,
    },
}

# unlock struct of config & set log config
OmegaConf.set_struct(cfg, False)
cfg.log = log_cfg

# relock config
OmegaConf.set_struct(cfg, True)
print(OmegaConf.to_yaml(cfg))

# save yaml
# with open(os.path.join(log_dir, "config.yaml")) as f:
# with open("config.yaml", "w") as f:
#     OmegaConf.save(cfg, f)

# This would open the file we saved above
# and tell you the result of the model and its configs (weights, ...)
# You can check it whenever you want
# OmegaConf.load()

In [None]:
def get_distribute_strategy(strategy_name: str, **kwargs):
    return getattr(tf.distribute, strategy_name)(**kwargs)

distribute_strategy = get_distribute_strategy(cfg.train.distribute_strategy)

In [None]:
# batchify the dataset and make train/val/test splits
# from_tensor_slices : works much like a numpy array (sliced along the first axis)
dataset = tf.data.Dataset.from_tensor_slices((src_tensor, tar_tensor))
total_n = len(src_tensor)

print(cfg.data.train_val_test_split_ratio)
train_size = int(total_n * cfg.data.train_val_test_split_ratio[0])
val_size = int(total_n * cfg.data.train_val_test_split_ratio[1])
test_size = total_n - (train_size + val_size)

# split (train, val), (test) dataset
test_dataset = dataset.skip(train_size + val_size)
train_val_dataset = dataset.take(train_size + val_size)

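if cfg.data.train_val_shuffle:  # True
    # reshuffle_each_iteration=False keeps the take/skip split below fixed;
    # with the default (True), examples would move between train and val every epoch
    train_val_dataset = train_val_dataset.shuffle(
        buffer_size=1024, reshuffle_each_iteration=False
    )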

train_dataset = train_val_dataset.take(train_size)
val_dataset = train_val_dataset.skip(train_size)

train_n, val_n, test_n = len(train_dataset), len(val_dataset), len(test_dataset)
print(train_n, val_n, test_n)
assert train_n + val_n + test_n == total_n   # sanity check that the splits add up

# batchify (dataloader)
train_batch_size = cfg.train.train_batch_size
val_batch_size = cfg.train.val_batch_size 
test_batch_size = cfg.train.test_batch_size

train_dataloader = train_dataset.batch(train_batch_size, drop_remainder=True)
val_dataloader = val_dataset.batch(val_batch_size, drop_remainder=True)
test_dataloader = test_dataset.batch(test_batch_size, drop_remainder=True)
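
The split and batch arithmetic above can be checked by hand. This sketch assumes the values used in this notebook (`num_examples = 30000`, ratios `[0.8, 0.1, 0.1]`, batch sizes 128/32/32); `split_sizes` is a hypothetical helper.

```python
def split_sizes(total_n, ratios):
    # same arithmetic as the cell above: test takes whatever is left over
    train_size = int(total_n * ratios[0])
    val_size = int(total_n * ratios[1])
    return train_size, val_size, total_n - (train_size + val_size)


train_n, val_n, test_n = split_sizes(30000, [0.8, 0.1, 0.1])
print(train_n, val_n, test_n)

# with drop_remainder=True, only full batches are kept
print(train_n // 128, val_n // 32, test_n // 32)
```

With these numbers, 64 training examples per epoch fall into the dropped partial batch.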

## define model

Why use LinearWarmupLRSchedule
- SGD is far more sensitive to the learning rate than other optimizers
  - With a well-set learning rate it performs well (sometimes better than Adam)
- So ideally the learning rate is tuned together with the optimizer
- The downside is that training becomes much slower

What if warmup is hard to apply?
- Test with Rectified Adam first; since changing the optimizer barely changes the results, work on the modeling side instead
- Rectified Adam also has many tunable settings
  - https://www.tensorflow.org/addons/api_docs/python/tfa/optimizers/RectifiedAdam
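
As a rough picture of what a linear warmup does, here is a minimal sketch. It assumes the learning rate rises linearly from 0 to `lr_peak` over `warmup_end_steps` and then stays constant; the actual `LinearWarmupLRSchedule` in `config_utils_tf` is not shown in this notebook and may behave differently after warmup. `linear_warmup_lr` is a hypothetical function.

```python
def linear_warmup_lr(step, lr_peak=1e-3, warmup_end_steps=1500):
    # assumption: linear ramp during warmup, constant lr_peak afterwards
    if step < warmup_end_steps:
        return lr_peak * step / warmup_end_steps
    return lr_peak


for step in (0, 750, 1500, 3000):
    print(step, linear_warmup_lr(step))
```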

In [None]:
# define the model
def get_seq2seq_model(cfg: DictConfig):
    if cfg.model.name == "RNNSeq2Seq":
        encoder = GRUEncoder(cfg)
        decoder = GRUDecoder(cfg)
        return encoder, decoder
    elif cfg.model.name == "AttentionBasedSeq2Seq":
        encoder = GRUEncoder(cfg)
        decoder = AttentionalGRUDecoder(cfg)
        return encoder, decoder
    else:
        raise NotImplementedError()

In [None]:
# define the loss
def loss_function(
    real,
    pred,
    loss_object=tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction="none"
    )
):
    # delete [pad] loss part with masks. 
    mask = tf.math.logical_not(
        tf.math.equal(real, 0)
    )
    _loss = loss_object(real, pred)

    mask = tf.cast(mask, dtype=_loss.dtype)
    _loss *= mask

    return tf.reduce_mean(_loss)
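
To make the masking concrete, here is a plain-Python sketch of the same idea for a handful of token positions: the cross-entropy is zeroed wherever the label is the padding id 0, and the mean is still taken over all positions, as `tf.reduce_mean` does above. `sparse_ce_from_logits` and `masked_mean_loss` are hypothetical helper names.

```python
import math


def sparse_ce_from_logits(logits, label):
    # numerically stable -log softmax(logits)[label]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def masked_mean_loss(labels, logits_batch, pad_id=0):
    # padded positions contribute 0 but still count in the denominator
    losses = [
        0.0 if y == pad_id else sparse_ce_from_logits(logits, y)
        for y, logits in zip(labels, logits_batch)
    ]
    return sum(losses) / len(losses)


uniform = [0.0, 0.0, 0.0, 0.0]  # uniform logits over a 4-token vocabulary
print(masked_mean_loss([1, 0], [uniform, uniform]))  # one real token, one pad
```

Because padded positions stay in the denominator, batches with more padding report a smaller loss; dividing by the number of non-pad tokens is a common alternative.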


In [None]:
# get model
encoder, decoder = get_seq2seq_model(cfg)

# get optimizer
optimizer, scheduler = get_optimizer_element(
    cfg.opt.optimizer, cfg.opt.lr_scheduler
)

# checkpoints
checkpoint_prefix = cfg.log.checkpoint_filepath
checkpoint = tf.train.Checkpoint(
    optimizer=optimizer,
    encoder=encoder,
    decoder=decoder,
)

In [None]:
# callbacks = get_callbacks(cfg.log)

## Define Custom Train/Eval Steps
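
The step functions below mix teacher forcing with autoregressive decoding. As a framework-free sketch of that choice, the loop here picks each next decoder input either from the ground truth or from the model's own previous prediction. `toy_predict` is a hypothetical stand-in for the decoder, and the token ids are made up.

```python
import random


def toy_predict(dec_input):
    # hypothetical stand-in for the decoder: always predicts input id + 100
    return dec_input + 100


def decode_inputs(target, teacher_forcing_ratio, rng):
    # target[0] is the <start> id; returns the inputs fed to the decoder at each step
    dec_input = target[0]
    fed = []
    for t in range(1, len(target)):
        fed.append(dec_input)
        prediction = toy_predict(dec_input)
        if rng.random() < teacher_forcing_ratio:
            dec_input = target[t]      # teacher forcing: feed the ground truth
        else:
            dec_input = prediction     # autoregressive: feed the model's own output
    return fed


target = [1, 7, 8, 2]
print(decode_inputs(target, 1.0, random.Random(0)))  # always ground truth
print(decode_inputs(target, 0.0, random.Random(0)))  # always own predictions
```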

In [None]:
@tf.function
def _step(src, tar, enc_hidden, teacher_forcing_ratio=0.5):

    if cfg.model.name == "RNNSeq2Seq":
        return _rnn_step(src, tar, enc_hidden, teacher_forcing_ratio)
    elif cfg.model.name == "AttentionBasedSeq2Seq":
        return _attentional_rnn_step(src, tar, enc_hidden, teacher_forcing_ratio)
    else:
        raise NotImplementedError()

@tf.function
def _attentional_rnn_step(src, tar, enc_hidden, teacher_forcing_ratio=0.5):
    enc_output, enc_hidden = encoder(src, enc_hidden)

    dec_hidden = enc_hidden

    # add start token
    dec_input = tf.expand_dims(
        [tar_tokenizer.word_index["<start>"]] * src.shape[0],   # multiply with batch_size
        1
    )  # [Batch, 1]

    outputs = []
    loss = 0

    # loop over the sequence length (autoregressive or teacher-forcing)
    for t in range(1, tar.shape[1]):
        predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
        
        outputs.append(predictions)            # [batch_size, vocab_size]
        final_outs = tf.argmax(predictions, 1) # [batch_size]
        ground_truth = tar[:, t]               # [batch_size]

        loss += loss_function(ground_truth, predictions)

        # sample with tf.random so the choice is re-drawn on every call;
        # random.random() would run only once, when @tf.function traces the graph
        use_teacher_forcing = tf.random.uniform([]) < teacher_forcing_ratio
        next_ids = tf.where(use_teacher_forcing, ground_truth, tf.cast(final_outs, ground_truth.dtype))
        dec_input = tf.expand_dims(next_ids, 1)
    
    return loss, outputs

# when performance matters, TensorFlow uses the @tf.function decorator to run the function as a graph
@tf.function
def _rnn_step(src, tar, enc_hidden, teacher_forcing_ratio=0.5):

    enc_output, enc_hidden = encoder(src, enc_hidden)

    dec_hidden = enc_hidden

    # add start token
    dec_input = tf.expand_dims(
        [tar_tokenizer.word_index["<start>"]] * src.shape[0], # multiply with batch_size
        1
    )  # [Batch, 1]

    outputs = []
    loss = 0

    # loop over the sequence length (autoregressive or teacher-forcing)
    for t in range(1, tar.shape[1]):
        predictions, dec_hidden, _ = decoder(dec_input, dec_hidden) # prediction : [Batch, 1, target voca size]

        outputs.append(predictions[:, 0])   # [Batch, voca size]
        final_outs = tf.argmax(predictions, 2) # [Batch, 1]

        ground_truth = tf.expand_dims(tar[:, t], 1)  # for teacher-forcing : [B] -> [B, 1]

        loss += loss_function(ground_truth, predictions)

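        # sample with tf.random so the choice is re-drawn on every call;
        # random.random() would run only once, when @tf.function traces the graph
        use_teacher_forcing = tf.random.uniform([]) < teacher_forcing_ratio
        dec_input = tf.where(use_teacher_forcing, ground_truth, tf.cast(final_outs, ground_truth.dtype))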
    
    return loss, outputs

@tf.function
def train_step(src, tar, enc_hidden, teacher_forcing_ratio=0.5):
    with tf.GradientTape() as tape:
        loss, outputs = _step(src, tar, enc_hidden, teacher_forcing_ratio)
    
    batch_loss = (loss / int(tar.shape[1])) # divide with seq_len

    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))

    return batch_loss, outputs


# no teacher forcing at test time - the model has to predict step by step on its own (order matters for the test set)
@tf.function
def eval_step(src, tar, enc_hidden):
    loss, outputs = _step(src, tar, enc_hidden, 0.0)
    batch_loss = (loss / int(tar.shape[1])) # divide with seq_len
    return batch_loss, outputs

## wandb setup

- https://docs.wandb.ai/guides/integrations/tensorflow
- sync_tensorboard=True : uploads what is written to TensorBoard to wandb
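
`flatten_dict` comes from `config_utils_tf`, which is not shown here. Below is a minimal sketch of what such a helper might look like, assuming nested keys are joined with underscores. `flatten_dict_sketch` is a hypothetical name, and the real helper may differ, for example in how it handles a `DictConfig`.

```python
def flatten_dict_sketch(d, parent_key="", sep="_"):
    # recursively turn {"a": {"b": 1}} into {"a_b": 1}
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            items.update(flatten_dict_sketch(value, new_key, sep))
        else:
            items[new_key] = value
    return items


nested = {"model": {"enc": {"embed_size": 256}, "name": "RNNSeq2Seq"}, "train": {"max_epochs": 50}}
print(flatten_dict_sketch(nested))
```

A flat dict like this shows up in wandb as one column per hyperparameter, which makes runs easier to filter and compare.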

In [None]:
# flatten_dict(cfg)   # flattens everything -> converts the nested structure into underscore-joined keys

In [None]:
wandb.init(
    config= flatten_dict(cfg),
    **cfg.log.wandb
)

Check that the path is found correctly

In [None]:
! ls /content/drive/MyDrive/\#fastcampus/runs/

## Training/Eval Loop

In [None]:
# load tensorboard : load extension
%load_ext tensorboard

# set the path : this is shell syntax, so '#' must be escaped with a leading '\' to be read literally
%tensorboard --logdir /content/drive/MyDrive/\#fastcampus/runs/

val_dataloader = iter(val_dataloader)
steps_per_epoch = train_n // cfg.train.train_batch_size   # number of steps per epoch

# tensorboard summary writer
tb_writer = tf.summary.create_file_writer(
    cfg.log.tensorboard_log_dir
)

# custom loop
# callbacks cannot be used here, so the bare minimum is handled manually
step = 0
for epoch in range(cfg.train.max_epochs):
    start = time.time()

    total_epoch_loss = 0

    # batch iteration
    for (batch, (cur_src, cur_tar)) in enumerate(train_dataloader.take(steps_per_epoch)):

        enc_hidden = tf.zeros((
            cfg.train.train_batch_size,
            cfg.model.enc.rnn.units   # unit = hidden
        ))
        batch_loss, outputs = train_step(cur_src, cur_tar, enc_hidden)
        total_epoch_loss += batch_loss

        if batch % 100 == 0 or batch == steps_per_epoch - 1:
            print("Epoch {} Batch {} Train Loss {:.4f}".format(
                epoch+1,
                batch,
                batch_loss.numpy()
            ))
        
        step += 1

    # save the model every 2 epochs
    if (epoch + 1) % 2 == 0:
        checkpoint.save(file_prefix=checkpoint_prefix)

    train_loss = total_epoch_loss / steps_per_epoch
    print("Epoch {} Train Loss {:.4f}".format(epoch+1, train_loss))

    with tb_writer.as_default():
        tf.summary.scalar("train_loss", train_loss, step=step)

    # validation step
    enc_hidden = tf.zeros((
        cfg.train.val_batch_size,
        cfg.model.enc.rnn.units   # unit = hidden
        ))
    cur_src, cur_tar = next(val_dataloader)
    val_loss, outputs = eval_step(cur_src, cur_tar, enc_hidden)
    print("Epoch {} Val Loss {:.4f}".format(epoch+1, val_loss))

    # token -> text & logging
    preds = tf.stack(outputs, axis=1)
    preds = tf.argmax(preds, axis=2)   
    preds = [p.numpy() for p in preds]

    src_texts = src_tokenizer.sequences_to_texts(cur_src.numpy())
    tar_texts = tar_tokenizer.sequences_to_texts(cur_tar.numpy())
    pred_texts = tar_tokenizer.sequences_to_texts(preds)

    with tb_writer.as_default():
        tf.summary.scalar("val_loss", val_loss, step=step)
        tf.summary.text("val_src_text", src_texts[0], step=step)
        tf.summary.text("val_tar_text", tar_texts[0], step=step)
        tf.summary.text("val_pred_text", pred_texts[0], step=step)

    print(f"Time taken for 1 epoch {time.time() - start} sec\n")

## Evaluation Code Examples (Attentional RNN)
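
The `evaluate` function below decodes greedily: it feeds each predicted token back in until `<end>` appears or `max_tar_len` is reached. Here is a toy sketch of that loop, with a hypothetical lookup table `next_token` in place of the decoder.

```python
def greedy_decode(next_token, start="<start>", end="<end>", max_len=10):
    # feed each prediction back in until <end> or the length limit
    result = []
    token = start
    for _ in range(max_len):
        token = next_token[token]   # stand-in for argmax over decoder logits
        result.append(token)
        if token == end:
            break
    return result


# hypothetical deterministic "model": each token maps to its successor
next_token = {"<start>": "this", "this": "is", "is": "my", "my": "life", "life": ".", ".": "<end>"}
print(greedy_decode(next_token))
print(greedy_decode(next_token, max_len=3))
```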

In [None]:
def evaluate(
    sentence,  # input
    encoder,
    decoder,
    src_tokenizer,
    tar_tokenizer,
    max_src_len,
    max_tar_len,
):
    # preprocessing sentence
    sentence = preprocess_sentence(sentence)

    inputs = [src_tokenizer.word_index[i] for i in sentence.split(" ")]
    inputs = pad_sequences([inputs], maxlen=max_src_len, padding="post")

    inputs = tf.convert_to_tensor(inputs)

    result = ""

    # encoder forward
    hidden = [tf.zeros((1, encoder.cfg.model.enc.rnn.units))]  # 1 : batch size
    enc_out, enc_hidden = encoder(inputs, hidden)

    # autoregressive inference of decoder
    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([tar_tokenizer.word_index["<start>"]], 0)   # start token

    # plot used to see the correlation between source and target
    attention_plot = np.zeros((max_tar_len, max_src_len))

    for t in range(max_tar_len):
        predictions, dec_hidden, attention_weights = decoder(
            dec_input, dec_hidden, enc_out
        )

        # for plotting of attention weights
        attention_weights = tf.reshape(attention_weights, (-1, ))
        attention_plot[t] = attention_weights.numpy()

        predicted_id = tf.argmax(predictions[0]).numpy()

        result += tar_tokenizer.index_word[predicted_id] + " "

        if tar_tokenizer.index_word[predicted_id] == "<end>":
            break

        # feed the predicted id back into the model (autoregressive)
        dec_input = tf.expand_dims([predicted_id], 0)

    return result, sentence, attention_plot

In [None]:
def plot_attention(attention, sentence: List, predicted_sentence: List):
    fig = plt.figure(figsize=(15, 15))
    ax = fig.add_subplot(1, 1, 1)

    ax.matshow(attention, cmap="viridis")

    fontdict = {"fontsize": 16}

    # extra display settings
    ax.set_xticklabels([""] + sentence, fontdict=fontdict, rotation=90)
    ax.set_yticklabels([""] + predicted_sentence, fontdict=fontdict)

    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

    plt.show()

## checkpoint restore

In [None]:
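# the best checkpoint could be loaded instead, but for now load the most recent one : latest_checkpoint
# latest_checkpoint expects the directory that holds the checkpoint files, not the file prefix

checkpoint.restore(tf.train.latest_checkpoint(os.path.dirname(cfg.log.checkpoint_filepath)))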

In [None]:
result, sentence, attention_plot = evaluate(
    u"Esta es mi vida.", encoder, decoder, src_tokenizer, tar_tokenizer, max_src_len, max_tar_len
)

attention_plot = attention_plot[:len(result.split(" ")), :len(sentence.split(" "))]  # keep only the part that was filled in
plot_attention(attention_plot, sentence.split(" "), result.split(" "))