# SeqGAN (Sequence Generative Adversarial Nets with Policy Gradient)

- https://arxiv.org/abs/1609.05473
- Code Reference : https://github.com/suragnair/seqGAN

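### Key features
- The first attempt to apply reinforcement learning to a GAN model
- **Monte Carlo search (roll-outs)**: heuristic search used to estimate the reward of partially generated sequences
- **GAN**: a generator and a discriminator trained adversarially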
---
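### Problems
Applying GANs to **sequence generation** requires overcoming two problems:
1. The GAN discriminator can only compute a score (loss) for a complete sequence. For a partially generated sequence, it is hard to balance its current score against the score the completed sequence will eventually receive.

2. The GAN generator was designed to produce real-valued, continuous data, so it is hard to apply to sequences made of discrete tokens.
   - When the generated data are discrete tokens, a slight change (a gradient step) is unlikely to correspond to any token in the limited dictionary space, so the gradient cannot be passed directly from the discriminator to the generator.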
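
A minimal pure-Python sketch of problem 2, assuming a hypothetical three-word dictionary with 2-d embeddings: a tiny step in the continuous output maps back to the same token, so the discrete output is piecewise constant and carries no useful gradient.

```python
import math

# Hypothetical toy dictionary: each token is tied to a 2-d embedding vector.
VOCAB = {"the": (0.0, 0.0), "cat": (1.0, 0.0), "dog": (0.0, 1.0)}


def nearest_token(vec):
    """Map a continuous generator output to the closest token in the dictionary."""
    return min(VOCAB, key=lambda tok: math.dist(vec, VOCAB[tok]))


generated = (0.9, 0.1)              # continuous output of a (hypothetical) generator
nudged = (0.9 + 1e-3, 0.1 - 1e-3)   # the same output after a tiny gradient step

# Both map to the same token: the small change has no corresponding new token,
# so the discrete output is piecewise constant (zero gradient almost everywhere).
print(nearest_token(generated), nearest_token(nudged))
```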
 
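### Proposed method
1. SeqGAN models the generator as a stochastic policy and updates it directly with policy gradient (REINFORCE), which bypasses the generator differentiation problem.
2. The discriminator's score for a complete sequence is used as the RL reward, and Monte Carlo search passes this reward back to the intermediate state-action steps.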

## Approach

![](./source/source_01.png)

![](./source/source_02.png)

## Training algorithm
![](./source/pseudo_code.png)
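
The control flow of the training algorithm in the figure can be sketched as below. This is a minimal outline assuming hypothetical hook functions for each step (none of these names exist in the notebook); `sync_rollout_policy` stands for the roll-out update β ← θ. The notebook's own loop further down follows the same schedule (one policy-gradient batch, then 5 d-steps of 3 epochs each), but it skips roll-outs and therefore has no separate G_β to sync.

```python
def seqgan_train(pretrain_generator_mle, pretrain_discriminator,
                 generator_pg_step, discriminator_step, sync_rollout_policy,
                 num_rounds, g_steps=1, d_steps=5):
    """Control flow of SeqGAN training; every callable is a hypothetical hook."""
    pretrain_generator_mle()       # pre-train G_theta with MLE on the real data S
    sync_rollout_policy()          # beta <- theta
    pretrain_discriminator()       # pre-train D_phi on real (label 1) vs. generated (label 0)
    for _ in range(num_rounds):    # "repeat ... until SeqGAN converges"
        for _ in range(g_steps):
            generator_pg_step()    # sample Y_{1:T}, estimate Q per token via roll-outs, policy-gradient update
        for _ in range(d_steps):
            discriminator_step()   # regenerate negatives with the current G_theta, train D_phi for k epochs
        sync_rollout_policy()      # beta <- theta


# Usage with stub hooks that only record the order of the calls.
calls = []
seqgan_train(lambda: calls.append("mle"), lambda: calls.append("pre-d"),
             lambda: calls.append("g"), lambda: calls.append("d"),
             lambda: calls.append("sync"), num_rounds=2, g_steps=1, d_steps=2)
print(calls)
```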

## 1. Import Libs

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import math
import torch.nn.init as init
import torch.autograd as autograd
import sys
import warnings
warnings.filterwarnings('ignore')

## 2. Utils

In [3]:
def prepare_generator_batch(samples, start_letter=0, gpu=False):
    """
    Takes a batch of samples and builds teacher-forcing (input, target) pairs for the generator.
    Inputs: samples, start_letter, gpu
        - samples: batch_size x seq_len (Tensor with a sample in each row)
    Returns: inp, target
        - inp: batch_size x seq_len (target shifted right by one, with start_letter prepended)
        - target: batch_size x seq_len (same as samples)
    """

    batch_size, seq_len = samples.size()

    target = samples.cpu().long()                     # build on CPU, move to GPU at the end if needed
    inp = torch.zeros(batch_size, seq_len, dtype=torch.long)
    inp[:, 0] = start_letter
    inp[:, 1:] = target[:, :seq_len-1]

    if gpu:
        inp = inp.cuda()
        target = target.cuda()

    return inp, target


def prepare_discriminator_data(pos_samples, neg_samples, gpu=False):
    """
    Takes positive (target) samples, negative (generator) samples and prepares inp and target data for discriminator.
    Inputs: pos_samples, neg_samples
        - pos_samples: pos_size x seq_len
        - neg_samples: neg_size x seq_len
    Returns: inp, target
        - inp: (pos_size + neg_size) x seq_len
        - target: pos_size + neg_size (boolean 1/0)
    """

    inp = torch.cat((pos_samples, neg_samples), 0).type(torch.LongTensor)
    target = torch.ones(pos_samples.size()[0] + neg_samples.size()[0])
    target[pos_samples.size()[0]:] = 0

    # shuffle
    perm = torch.randperm(target.size()[0])
    target = target[perm]
    inp = inp[perm]

    inp = autograd.Variable(inp)
    target = autograd.Variable(target)

    if gpu:
        inp = inp.cuda()
        target = target.cuda()

    return inp, target


def batchwise_sample(gen, num_samples, batch_size):
    """
    Sample num_samples samples batch_size samples at a time from gen.
    Does not require gpu since gen.sample() takes care of that.
    """

    samples = []
    for i in range(int(math.ceil(num_samples/float(batch_size)))):
        samples.append(gen.sample(batch_size))

    return torch.cat(samples, 0)[:num_samples]

def batchwise_oracle_nll(gen, oracle, num_samples, batch_size, max_seq_len, start_letter=0, gpu=False):
    """
    Samples num_samples sequences from gen and returns their average per-token NLL under the oracle.
    """
    s = batchwise_sample(gen, num_samples, batch_size)
    oracle_nll = 0
    for i in range(0, num_samples, batch_size):
        inp, target = prepare_generator_batch(s[i:i+batch_size], start_letter, gpu)
        oracle_loss = oracle.batchNLLLoss(inp, target) / max_seq_len
        oracle_nll += oracle_loss.item()    # .item() replaces the removed .data[0] on 0-dim tensors

    return oracle_nll/(num_samples/batch_size)
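
The `oracle_sample_NLL` printed in the training logs is this metric: sequences sampled from the generator are scored by the fixed oracle network, and dividing by `max_seq_len` turns the summed loss into a per-token value, so lower means the generator's samples are more likely under the oracle. The sketch below illustrates the same quantity under a simplifying assumption: a hypothetical bigram table replaces the GRU oracle.

```python
import math

# Hypothetical oracle: a fixed bigram model P(y_t | y_{t-1}) over the tokens {0, 1};
# token 0 also serves as the start letter (START_LETTER = 0 in this notebook).
ORACLE = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}


def oracle_nll_per_token(samples, start_letter=0):
    """Average -log P_oracle(y_t | y_{t-1}) over every token of every generated sample."""
    total, count = 0.0, 0
    for seq in samples:
        prev = start_letter
        for tok in seq:
            total -= math.log(ORACLE[prev][tok])
            prev = tok
            count += 1
    return total / count


generated = [[0, 0], [1, 1]]   # pretend these were sampled from the generator
print(round(oracle_nll_per_token(generated), 4))
```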

## 3. Sequence Generator
### Objective: maximize the expected reward

$$J(\theta) = E\big[R_T | s_0, \theta \big] = \sum_{y_{1} \in Y} G_{\theta}(y_1 | s_0) \cdot Q_{D_{\phi}}^{G_{\theta}}(s_0, y_1) $$
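
A small worked instance of $J(\theta)$, assuming made-up probabilities and action values for a two-token vocabulary: the objective is the policy-weighted average of $Q$, so moving probability mass toward high-$Q$ tokens raises it.

```python
# Hypothetical values for the first step from s_0 (two-token vocabulary {"a", "b"}).
policy = {"a": 0.7, "b": 0.3}    # G_theta(y_1 | s_0)
q_values = {"a": 0.2, "b": 0.9}  # Q(s_0, y_1), e.g. averaged discriminator scores


def expected_reward(policy, q_values):
    """J = sum over y_1 of G_theta(y_1 | s_0) * Q(s_0, y_1)."""
    return sum(policy[y] * q_values[y] for y in policy)


J = expected_reward(policy, q_values)                        # 0.7*0.2 + 0.3*0.9
J_shifted = expected_reward({"a": 0.4, "b": 0.6}, q_values)  # more mass on the high-Q token
print(J, J_shifted)
```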


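### Monte Carlo search (roll-out)

$$\Big\{ Y_{1:T}^{1}, ..., Y_{1:T}^N\Big\} = MC^{G_{\beta}} \big(Y_{1:t}; N \big)$$

Starting from the prefix $Y_{1:t}$, the roll-out policy $G_{\beta}$ samples the remaining $T - t$ tokens $N$ times (in the paper, $G_{\beta}$ is the same model as the generator).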

### Action-value function

$$Q_{D_{\phi}}^{G_{\theta}}\big(s = Y_{1:t-1}, a = y_t\big) = \begin{cases} 
    \frac{1}{N} \sum_{n=1}^N  D_{\phi}(Y_{1:T}^n), \ \ Y_{1:T}^n \in MC^{G_{\beta}}(Y_{1:t}; N) & \text{for} \ \ t < T\\
    D_{\phi}(Y_{1:T}) & \text{for} \ \ t = T 
\end{cases}$$
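
A minimal sketch of the roll-out estimate above, assuming a hypothetical uniform roll-out policy over the tokens {0, 1} and a toy "discriminator" that scores a sequence by its fraction of 1s. Note that the notebook's `batchPGLoss` does not do this: it applies one sentence-level discriminator score to every token, which amounts to skipping the roll-outs.

```python
import random


def mc_action_value(prefix, T, rollout_policy, discriminator, n_rollouts, rng):
    """Estimate Q(s = Y_{1:t-1}, a = y_t), where prefix = Y_{1:t} already ends with y_t."""
    if len(prefix) == T:                        # t = T: score the finished sequence directly
        return discriminator(prefix)
    total = 0.0
    for _ in range(n_rollouts):                 # t < T: average over N Monte Carlo completions
        seq = list(prefix)
        while len(seq) < T:                     # sample Y_{t+1:T} with the roll-out policy G_beta
            seq.append(rollout_policy(seq, rng))
        total += discriminator(seq)
    return total / n_rollouts


def uniform_rollout(seq, rng):
    """Hypothetical roll-out policy: ignores the prefix, picks 0 or 1 uniformly."""
    return rng.randint(0, 1)


def toy_discriminator(seq):
    """Hypothetical D_phi: fraction of 1s in the complete sequence."""
    return sum(seq) / len(seq)


rng = random.Random(0)
q_partial = mc_action_value([1], T=4, rollout_policy=uniform_rollout,
                            discriminator=toy_discriminator, n_rollouts=2000, rng=rng)
q_full = mc_action_value([1, 1, 0, 1], T=4, rollout_policy=uniform_rollout,
                         discriminator=toy_discriminator, n_rollouts=2000, rng=rng)
print(q_partial, q_full)   # q_partial should be near the exact expectation (1 + 1.5) / 4 = 0.625
```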

### Generator network (LSTM in the paper; the code below uses a GRU)
![](./source/Sequence_Generative_Model.png)

### Policy Gradient
- Proof omitted

![](./source/REINFORCE.png)
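
For a softmax policy, the REINFORCE gradient has a simple closed form: $\nabla_z \log \pi(a) = \text{onehot}(a) - \text{softmax}(z)$. The pure-Python sketch below (toy logits, not the notebook's model) takes one ascent step scaled by a reward. Minimizing the pseudo-loss $-R \log p(y_t)$ in `batchPGLoss` with gradient descent produces the same update direction.

```python
import math


def softmax(z):
    m = max(z)                                  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def grad_log_pi(logits, action):
    """Gradient of log softmax(logits)[action] w.r.t. the logits: onehot(action) - softmax(logits)."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def reinforce_step(logits, action, reward, lr):
    """One REINFORCE ascent step: z <- z + lr * R * grad log pi(action)."""
    return [z + lr * reward * g for z, g in zip(logits, grad_log_pi(logits, action))]


logits = [0.0, 0.0, 0.0]
new_logits = reinforce_step(logits, action=2, reward=0.8, lr=1.0)
print(softmax(logits)[2], softmax(new_logits)[2])   # the rewarded token's probability rises
```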

In [4]:
class Generator(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, max_seq_len, gpu=False, oracle_init=False):
        super(Generator, self).__init__()
        self.hidden_dim = hidden_dim
        self.embedding_dim = embedding_dim
        self.max_seq_len = max_seq_len
        self.vocab_size = vocab_size
        self.gpu = gpu

        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim)
        self.gru2out = nn.Linear(hidden_dim, vocab_size)

        # initialise oracle network with N(0,1)
        # otherwise variance of initialisation is very small => high NLL for data sampled from the same model
        if oracle_init:
            for p in self.parameters():
                init.normal_(p, 0, 1)

    def init_hidden(self, batch_size=1):
        h = torch.zeros(1, batch_size, self.hidden_dim)    # num_layers x batch_size x hidden_dim
        return h.cuda() if self.gpu else h

    def forward(self, inp, hidden):
        """
        Embeds input and applies GRU one token at a time (seq_len = 1)
        """
        # inp: batch_size (one token id per sequence)
        emb = self.embeddings(inp)                          # batch_size x embedding_dim
        emb = emb.view(1, -1, self.embedding_dim)           # 1 x batch_size x embedding_dim  (T, B, D)
        out, hidden = self.gru(emb, hidden)                 # 1 x batch_size x hidden_dim     (T, B, H)
        out = self.gru2out(out.view(-1, self.hidden_dim))   # batch_size x vocab_size
        out = F.log_softmax(out, dim=1)                     # log-probabilities over the vocabulary
        return out, hidden

    def sample(self, num_samples, start_letter=0):
        """
        Samples the network and returns num_samples samples of length max_seq_len.
        Outputs: samples
            - samples: num_samples x max_seq_len (a sampled sequence in each row)
        """
        samples = torch.zeros(num_samples, self.max_seq_len, dtype=torch.long)

        h = self.init_hidden(num_samples)
        inp = torch.LongTensor([start_letter] * num_samples)

        if self.gpu:
            samples = samples.cuda()
            inp = inp.cuda()

        for i in range(self.max_seq_len):
            out, h = self.forward(inp, h)               # out: num_samples x vocab_size
            out = torch.multinomial(torch.exp(out), 1)  # num_samples x 1 (one token sampled per row)
            samples[:, i] = out.view(-1).data           # flatten to num_samples to fit the column

            inp = out.view(-1)

        return samples

    def batchNLLLoss(self, inp, target):
        """
        Returns the NLL Loss for predicting target sequence.
        Inputs: inp, target
            - inp: batch_size x seq_len
            - target: batch_size x seq_len
            inp should be target with <s> (start letter) prepended
        """
        loss_fn = nn.NLLLoss()
        batch_size, seq_len = inp.size()
        inp = inp.permute(1, 0)           # seq_len x batch_size
        target = target.permute(1, 0)     # seq_len x batch_size
        h = self.init_hidden(batch_size)

        loss = 0
        for i in range(seq_len):
            out, h = self.forward(inp[i], h)
            loss += loss_fn(out, target[i])   # mean NLL over the batch at step i

        return loss   # summed over time steps, averaged over the batch

    # Policy Gradient : REINFORCE
    def batchPGLoss(self, inp, target, reward):
        """
        Returns a pseudo-loss that gives corresponding policy gradients (on calling .backward()).
        Inspired by the example in http://karpathy.github.io/2016/05/31/rl/
        Inputs: inp, target, reward
            - inp: batch_size x seq_len
            - target: batch_size x seq_len
            - reward: batch_size (discriminator reward for each sentence, applied to each token of the corresponding
                      sentence)
            inp should be target with <s> (start letter) prepended
        """
        batch_size, seq_len = inp.size()
        inp = inp.permute(1, 0)           # seq_len x batch_size
        target = target.permute(1, 0)     # seq_len x batch_size
        reward = reward.detach()          # treat D's score as a constant reward; no gradient flows into D
        h = self.init_hidden(batch_size)

        loss = 0
        for i in range(seq_len):
            out, h = self.forward(inp[i], h)                          # batch_size x vocab_size
            log_p = out.gather(1, target[i].unsqueeze(1)).squeeze(1)  # log-prob of the sampled token per sequence
            loss += -(log_p * reward).sum()                           # -sum_j log p_j * R_j

        return loss / batch_size

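## 4. Sequence Discriminator Model
- $D_{\phi}$ is trained with the same objective as in a standard GAN.
- It is treated as binary classification: real data = 1, generated data = 0.
- Architecture in the paper: word embedding, convolution (CNN), MLP, ...; the code below uses an embedding layer, a 2-layer bidirectional GRU and an MLP instead.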

### Objective function

$$\min_{\phi} - E_{Y \sim P_{data}} \big[ \log D_{\phi} ( Y ) \big] - E_{Y \sim G_{\theta}} \big[ \log \big( 1 - D_{\phi} ( Y ) \big) \big]$$
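
Why the code can use `nn.BCELoss` with labels 1/0 for this objective: with equally many real and generated samples (the notebook uses `POS_NEG_SAMPLES` of each), the two expectations become sample means, and their sum equals twice the mean binary cross-entropy over the pooled set. A pure-Python check with made-up discriminator outputs:

```python
import math


def discriminator_loss(d_real, d_fake):
    """Sample estimate of -E_data[log D(Y)] - E_G[log(1 - D(Y))]."""
    real_term = -sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


def bce(preds, labels):
    """Mean binary cross-entropy (what nn.BCELoss computes with its default 'mean' reduction)."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, labels)) / len(preds)


d_real = [0.9, 0.8]   # hypothetical D outputs on real sequences (label 1)
d_fake = [0.3, 0.1]   # hypothetical D outputs on generated sequences (label 0)
print(discriminator_loss(d_real, d_fake), 2 * bce(d_real + d_fake, [1, 1, 0, 0]))
```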


![](./source/Sequence_Discriminator_Model.png)

In [5]:
class Discriminator(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, max_seq_len, gpu=False, dropout=0.2):
        super(Discriminator, self).__init__()
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.max_seq_len = max_seq_len
        self.gpu = gpu

        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, num_layers=2, bidirectional=True, dropout=dropout)
        self.gru2hidden = nn.Linear(2*2*hidden_dim, hidden_dim)  # (num_directions * num_layers * hidden_dim, hidden_dim)
        self.dropout_linear = nn.Dropout(p=dropout)
        self.hidden2out = nn.Linear(hidden_dim, 1)

    def init_hidden(self, batch_size):
        h = torch.zeros(2*2, batch_size, self.hidden_dim)   # (num_layers * num_directions) x batch_size x hidden_dim
        return h.cuda() if self.gpu else h

    def forward(self, inp, hidden):
        # inp: batch_size x seq_len
        emb = self.embeddings(inp)                                  # batch_size x seq_len x embedding_dim
        emb = emb.permute(1, 0, 2)                                  # seq_len x batch_size x embedding_dim
        _, hidden = self.gru(emb, hidden)                           # 4 x batch_size x hidden_dim
        hidden = hidden.permute(1, 0, 2).contiguous()               # batch_size x 4 x hidden_dim
        out = self.gru2hidden(hidden.view(-1, 4*self.hidden_dim))   # batch_size x hidden_dim
        out = torch.tanh(out)
        out = self.dropout_linear(out)
        out = self.hidden2out(out)                                  # batch_size x 1
        out = torch.sigmoid(out)                                    # probability that the sequence is real
        return out

    def batchClassify(self, inp):
        """
        Classifies a batch of sequences.
        Inputs: inp
            - inp: batch_size x seq_len
        Returns: out
            - out: batch_size ([0,1] score)
        """
        h = self.init_hidden(inp.size()[0])
        out = self.forward(inp, h)
        return out.view(-1)

    def batchBCELoss(self, inp, target):
        """
        Returns Binary Cross Entropy Loss for discriminator.
        Inputs: inp, target
            - inp: batch_size x seq_len
            - target: batch_size (binary 1/0)
        """
        loss_fn = nn.BCELoss()
        h = self.init_hidden(inp.size()[0])
        out = self.forward(inp, h)
        return loss_fn(out.view(-1), target)   # flatten batch_size x 1 to match target's shape

## 5. Hyperparameters

In [6]:
CUDA = False
VOCAB_SIZE = 5000
MAX_SEQ_LEN = 20
START_LETTER = 0
BATCH_SIZE = 32
MLE_TRAIN_EPOCHS = 100
ADV_TRAIN_EPOCHS = 50
POS_NEG_SAMPLES = 10000

GEN_EMBEDDING_DIM = 32
GEN_HIDDEN_DIM = 32
DIS_EMBEDDING_DIM = 64
DIS_HIDDEN_DIM = 64

oracle_samples_path = './oracle_samples.trc'
oracle_state_dict_path = './oracle_EMBDIM32_HIDDENDIM32_VOCAB5000_MAXSEQLEN20.trc'
pretrained_gen_path = './gen_MLEtrain_EMBDIM32_HIDDENDIM32_VOCAB5000_MAXSEQLEN20.trc'
pretrained_dis_path = './dis_pretrain_EMBDIM_64_HIDDENDIM64_VOCAB5000_MAXSEQLEN20.trc'

## 6. Train

In [7]:
def train_generator_MLE(gen, gen_opt, oracle, real_data_samples, epochs):
    """
    Max Likelihood Pretraining for the generator
    """
    for epoch in range(epochs):
        print('epoch %d : ' % (epoch + 1), end='')
        sys.stdout.flush()
        total_loss = 0

        for i in range(0, POS_NEG_SAMPLES, BATCH_SIZE):
            inp, target = prepare_generator_batch(real_data_samples[i:i + BATCH_SIZE], start_letter=START_LETTER, gpu=CUDA)
            gen_opt.zero_grad()
            loss = gen.batchNLLLoss(inp, target)
            loss.backward()
            gen_opt.step()

            total_loss += loss.item()

            if (i / BATCH_SIZE) % math.ceil(
                            math.ceil(POS_NEG_SAMPLES / float(BATCH_SIZE)) / 10.) == 0:  # roughly every 10% of an epoch
                print('.', end='')
                sys.stdout.flush()

        # each loss in a batch is loss per sample
        total_loss = total_loss / math.ceil(POS_NEG_SAMPLES / float(BATCH_SIZE)) / MAX_SEQ_LEN

        # sample from generator and compute oracle NLL
        oracle_loss = batchwise_oracle_nll(gen, oracle, POS_NEG_SAMPLES, BATCH_SIZE, MAX_SEQ_LEN,
                                                   start_letter=START_LETTER, gpu=CUDA)

        print(' average_train_NLL = %.4f, oracle_sample_NLL = %.4f' % (total_loss, oracle_loss))


def train_generator_PG(gen, gen_opt, oracle, dis, num_batches):
    """
    The generator is trained using policy gradients, using the reward from the discriminator.
    Training is done for num_batches batches.
    """

    for batch in range(num_batches):
        s = gen.sample(BATCH_SIZE*2)        # 64 works best
        inp, target = prepare_generator_batch(s, start_letter=START_LETTER, gpu=CUDA)
        rewards = dis.batchClassify(target)

        gen_opt.zero_grad()
        pg_loss = gen.batchPGLoss(inp, target, rewards)
        pg_loss.backward()
        gen_opt.step()

    # sample from generator and compute oracle NLL
    oracle_loss = batchwise_oracle_nll(gen, oracle, POS_NEG_SAMPLES, BATCH_SIZE, MAX_SEQ_LEN,
                                                   start_letter=START_LETTER, gpu=CUDA)

    print(' oracle_sample_NLL = %.4f' % oracle_loss)


def train_discriminator(discriminator, dis_opt, real_data_samples, generator, oracle, d_steps, epochs):
    """
    Training the discriminator on real_data_samples (positive) and generated samples from generator (negative).
    Samples are drawn d_steps times, and the discriminator is trained for epochs epochs.
    """

    # generating a small validation set before training (using oracle and generator)
    pos_val = oracle.sample(100)
    neg_val = generator.sample(100)
    val_inp, val_target = prepare_discriminator_data(pos_val, neg_val, gpu=CUDA)

    for d_step in range(d_steps):
        s = batchwise_sample(generator, POS_NEG_SAMPLES, BATCH_SIZE)
        dis_inp, dis_target = prepare_discriminator_data(real_data_samples, s, gpu=CUDA)
        for epoch in range(epochs):
            print('d-step %d epoch %d : ' % (d_step + 1, epoch + 1), end='')
            sys.stdout.flush()
            total_loss = 0
            total_acc = 0

            for i in range(0, 2 * POS_NEG_SAMPLES, BATCH_SIZE):
                inp, target = dis_inp[i:i + BATCH_SIZE], dis_target[i:i + BATCH_SIZE]
                dis_opt.zero_grad()
                out = discriminator.batchClassify(inp)
                loss_fn = nn.BCELoss()
                loss = loss_fn(out, target)
                loss.backward()
                dis_opt.step()

                total_loss += loss.item()
                total_acc += torch.sum((out > 0.5) == (target > 0.5)).item()

                if (i / BATCH_SIZE) % math.ceil(math.ceil(2 * POS_NEG_SAMPLES / float(
                        BATCH_SIZE)) / 10.) == 0:  # roughly every 10% of an epoch
                    print('.', end='')
                    sys.stdout.flush()

            total_loss /= math.ceil(2 * POS_NEG_SAMPLES / float(BATCH_SIZE))
            total_acc /= float(2 * POS_NEG_SAMPLES)

            val_pred = discriminator.batchClassify(val_inp)
            print(' average_loss = %.4f, train_acc = %.4f, val_acc = %.4f' % (total_loss, total_acc, torch.sum((val_pred>0.5)==(val_target>0.5)).item()/200.))

In [8]:
oracle = Generator(GEN_EMBEDDING_DIM, GEN_HIDDEN_DIM, VOCAB_SIZE, MAX_SEQ_LEN, gpu=CUDA)
oracle.load_state_dict(torch.load(oracle_state_dict_path))
oracle_samples = torch.load(oracle_samples_path).type(torch.LongTensor)
# a new oracle can be generated by passing oracle_init=True in the generator constructor
# samples for the new oracle can be generated using batchwise_sample() defined above

gen = Generator(GEN_EMBEDDING_DIM, GEN_HIDDEN_DIM, VOCAB_SIZE, MAX_SEQ_LEN, gpu=CUDA)
dis = Discriminator(DIS_EMBEDDING_DIM, DIS_HIDDEN_DIM, VOCAB_SIZE, MAX_SEQ_LEN, gpu=CUDA)

if CUDA:
    oracle = oracle.cuda()
    gen = gen.cuda()
    dis = dis.cuda()
    oracle_samples = oracle_samples.cuda()

# GENERATOR MLE TRAINING
print('Starting Generator MLE Training...')
gen_optimizer = optim.Adam(gen.parameters(), lr=1e-2)
train_generator_MLE(gen, gen_optimizer, oracle, oracle_samples, MLE_TRAIN_EPOCHS)

torch.save(gen.state_dict(), pretrained_gen_path)
gen.load_state_dict(torch.load(pretrained_gen_path))

# PRETRAIN DISCRIMINATOR
print('\nStarting Discriminator Training...')
dis_optimizer = optim.Adagrad(dis.parameters())
train_discriminator(dis, dis_optimizer, oracle_samples, gen, oracle, 50, 3)

torch.save(dis.state_dict(), pretrained_dis_path)
dis.load_state_dict(torch.load(pretrained_dis_path))

# ADVERSARIAL TRAINING
print('\nStarting Adversarial Training...')
oracle_loss = batchwise_oracle_nll(gen, oracle, POS_NEG_SAMPLES, BATCH_SIZE, MAX_SEQ_LEN, start_letter=START_LETTER, gpu=CUDA)
print('\nInitial Oracle Sample Loss : %.4f' % oracle_loss)

for epoch in range(ADV_TRAIN_EPOCHS):
    print('\n--------\nEPOCH %d\n--------' % (epoch+1))
    # TRAIN GENERATOR
    print('\nAdversarial Training Generator : ', end='')
    sys.stdout.flush()
    train_generator_PG(gen, gen_optimizer, oracle, dis, 1)

    # TRAIN DISCRIMINATOR
    print('\nAdversarial Training Discriminator : ')
    train_discriminator(dis, dis_optimizer, oracle_samples, gen, oracle, 5, 3)

Starting Generator MLE Training...
epoch 1 : .......... average_train_NLL = 6.8153, oracle_sample_NLL = 14.5579
epoch 2 : .......... average_train_NLL = 6.1321, oracle_sample_NLL = 13.5007
epoch 3 : .......... average_train_NLL = 5.7977, oracle_sample_NLL = 12.8886
epoch 4 : .......... average_train_NLL = 5.5862, oracle_sample_NLL = 12.5322
epoch 5 : .......... average_train_NLL = 5.4373, oracle_sample_NLL = 12.2580
epoch 6 : .......... average_train_NLL = 5.3256, oracle_sample_NLL = 12.0543
epoch 7 : .......... average_train_NLL = 5.2375, oracle_sample_NLL = 11.9010
epoch 8 : .......... average_train_NLL = 5.1659, oracle_sample_NLL = 11.7473
epoch 9 : .......... average_train_NLL = 5.1053, oracle_sample_NLL = 11.6427
epoch 10 : .......... average_train_NLL = 5.0544, oracle_sample_NLL = 11.6157
epoch 11 : .......... average_train_NLL = 5.0113, oracle_sample_NLL = 11.5355
epoch 12 : .......... average_train_NLL = 4.9741, oracle_sample_NLL = 11.5011
...

d-step 2 epoch 2 : .......... average_loss = 0.6019, train_acc = 0.6770, val_acc = 0.5700
d-step 2 epoch 3 : .......... average_loss = 0.5728, train_acc = 0.7067, val_acc = 0.6100
d-step 3 epoch 1 : .......... average_loss = 0.5847, train_acc = 0.6935, val_acc = 0.6100
d-step 3 epoch 2 : .......... average_loss = 0.5578, train_acc = 0.7179, val_acc = 0.6200
d-step 3 epoch 3 : .......... average_loss = 0.5294, train_acc = 0.7415, val_acc = 0.6250
d-step 4 epoch 1 : .......... average_loss = 0.5425, train_acc = 0.7320, val_acc = 0.6250
d-step 4 epoch 2 : .......... average_loss = 0.5146, train_acc = 0.7515, val_acc = 0.6350
d-step 4 epoch 3 : .......... average_loss = 0.4894, train_acc = 0.7705, val_acc = 0.6550
d-step 5 epoch 1 : .......... average_loss = 0.5103, train_acc = 0.7563, val_acc = 0.6250
d-step 5 epoch 2 : .......... average_loss = 0.4828, train_acc = 0.7761, val_acc = 0.6550
d-step 5 epoch 3 : .......... average_loss = 0.4553, train_acc = 0.7921, val_acc = 0.6350
...

d-step 32 epoch 2 : .......... average_loss = 0.1331, train_acc = 0.9648, val_acc = 0.5950
d-step 32 epoch 3 : .......... average_loss = 0.1171, train_acc = 0.9699, val_acc = 0.5950
d-step 33 epoch 1 : .......... average_loss = 0.1443, train_acc = 0.9625, val_acc = 0.6000
d-step 33 epoch 2 : .......... average_loss = 0.1211, train_acc = 0.9689, val_acc = 0.6000
d-step 33 epoch 3 : .......... average_loss = 0.1062, train_acc = 0.9728, val_acc = 0.6200
d-step 34 epoch 1 : .......... average_loss = 0.1518, train_acc = 0.9574, val_acc = 0.5950
d-step 34 epoch 2 : .......... average_loss = 0.1263, train_acc = 0.9669, val_acc = 0.5950
d-step 34 epoch 3 : .......... average_loss = 0.1106, train_acc = 0.9709, val_acc = 0.5750
d-step 35 epoch 1 : .......... average_loss = 0.1417, train_acc = 0.9600, val_acc = 0.5950
d-step 35 epoch 2 : .......... average_loss = 0.1173, train_acc = 0.9689, val_acc = 0.6050
d-step 35 epoch 3 : .......... average_loss = 0.1044, train_acc = 0.9721, val_acc = 0.6100

d-step 1 epoch 1 : .......... average_loss = 0.0925, train_acc = 0.9781, val_acc = 0.6050
d-step 1 epoch 2 : .......... average_loss = 0.0775, train_acc = 0.9809, val_acc = 0.6150
d-step 1 epoch 3 : .......... average_loss = 0.0633, train_acc = 0.9852, val_acc = 0.6050
d-step 2 epoch 1 : .......... average_loss = 0.0936, train_acc = 0.9767, val_acc = 0.6050
d-step 2 epoch 2 : .......... average_loss = 0.0754, train_acc = 0.9809, val_acc = 0.6050
d-step 2 epoch 3 : .......... average_loss = 0.0645, train_acc = 0.9839, val_acc = 0.6000
d-step 3 epoch 1 : .......... average_loss = 0.0922, train_acc = 0.9766, val_acc = 0.6050
d-step 3 epoch 2 : .......... average_loss = 0.0743, train_acc = 0.9818, val_acc = 0.6150
d-step 3 epoch 3 : .......... average_loss = 0.0627, train_acc = 0.9846, val_acc = 0.6200
d-step 4 epoch 1 : .......... average_loss = 0.0963, train_acc = 0.9783, val_acc = 0.6100
d-step 4 epoch 2 : .......... average_loss = 0.0815, train_acc = 0.9815, val_acc = 0.6000
...

d-step 4 epoch 1 : .......... average_loss = 0.0621, train_acc = 0.9852, val_acc = 0.5750
d-step 4 epoch 2 : .......... average_loss = 0.0486, train_acc = 0.9887, val_acc = 0.5750
d-step 4 epoch 3 : .......... average_loss = 0.0411, train_acc = 0.9908, val_acc = 0.5800
d-step 5 epoch 1 : .......... average_loss = 0.0605, train_acc = 0.9856, val_acc = 0.5600
d-step 5 epoch 2 : .......... average_loss = 0.0475, train_acc = 0.9893, val_acc = 0.5850
d-step 5 epoch 3 : .......... average_loss = 0.0405, train_acc = 0.9900, val_acc = 0.5900

--------
EPOCH 9
--------

Adversarial Training Generator :  oracle_sample_NLL = 10.3903

Adversarial Training Discriminator : 
d-step 1 epoch 1 : .......... average_loss = 0.0674, train_acc = 0.9851, val_acc = 0.6100
d-step 1 epoch 2 : .......... average_loss = 0.0545, train_acc = 0.9883, val_acc = 0.6150
d-step 1 epoch 3 : .......... average_loss = 0.0463, train_acc = 0.9889, val_acc = 0.6300
...

d-step 1 epoch 3 : .......... average_loss = 0.0311, train_acc = 0.9927, val_acc = 0.5800
d-step 2 epoch 1 : .......... average_loss = 0.0487, train_acc = 0.9900, val_acc = 0.5700
d-step 2 epoch 2 : .......... average_loss = 0.0385, train_acc = 0.9919, val_acc = 0.5750
d-step 2 epoch 3 : .......... average_loss = 0.0326, train_acc = 0.9927, val_acc = 0.5750
d-step 3 epoch 1 : .......... average_loss = 0.0511, train_acc = 0.9889, val_acc = 0.5800
d-step 3 epoch 2 : .......... average_loss = 0.0384, train_acc = 0.9908, val_acc = 0.5750
d-step 3 epoch 3 : .......... average_loss = 0.0322, train_acc = 0.9928, val_acc = 0.5750
d-step 4 epoch 1 : .......... average_loss = 0.0519, train_acc = 0.9882, val_acc = 0.5700
d-step 4 epoch 2 : .......... average_loss = 0.0413, train_acc = 0.9906, val_acc = 0.5800
d-step 4 epoch 3 : .......... average_loss = 0.0330, train_acc = 0.9921, val_acc = 0.5700
d-step 5 epoch 1 : .......... average_loss = 0.0522, train_acc = 0.9886, val_acc = 0.5800
...

d-step 4 epoch 3 : .......... average_loss = 0.0334, train_acc = 0.9931, val_acc = 0.5350
d-step 5 epoch 1 : .......... average_loss = 0.0428, train_acc = 0.9912, val_acc = 0.5400
d-step 5 epoch 2 : .......... average_loss = 0.0345, train_acc = 0.9930, val_acc = 0.5600
d-step 5 epoch 3 : .......... average_loss = 0.0282, train_acc = 0.9934, val_acc = 0.5600

--------
EPOCH 20
--------

Adversarial Training Generator :  oracle_sample_NLL = 10.0086

Adversarial Training Discriminator : 
d-step 1 epoch 1 : .......... average_loss = 0.0415, train_acc = 0.9912, val_acc = 0.5750
d-step 1 epoch 2 : .......... average_loss = 0.0337, train_acc = 0.9922, val_acc = 0.5800
d-step 1 epoch 3 : .......... average_loss = 0.0270, train_acc = 0.9943, val_acc = 0.5950
d-step 2 epoch 1 : .......... average_loss = 0.0406, train_acc = 0.9912, val_acc = 0.5750
d-step 2 epoch 2 : .......... average_loss = 0.0315, train_acc = 0.9932, val_acc = 0.5900
...

d-step 2 epoch 2 : .......... average_loss = 0.0284, train_acc = 0.9940, val_acc = 0.5750
d-step 2 epoch 3 : .......... average_loss = 0.0246, train_acc = 0.9943, val_acc = 0.5750
d-step 3 epoch 1 : .......... average_loss = 0.0451, train_acc = 0.9910, val_acc = 0.5850
d-step 3 epoch 2 : .......... average_loss = 0.0348, train_acc = 0.9925, val_acc = 0.5750
d-step 3 epoch 3 : .......... average_loss = 0.0283, train_acc = 0.9940, val_acc = 0.5800
d-step 4 epoch 1 : .......... average_loss = 0.0323, train_acc = 0.9929, val_acc = 0.5800
d-step 4 epoch 2 : .......... average_loss = 0.0250, train_acc = 0.9943, val_acc = 0.5900
d-step 4 epoch 3 : .......... average_loss = 0.0195, train_acc = 0.9958, val_acc = 0.6000
d-step 5 epoch 1 : .......... average_loss = 0.0336, train_acc = 0.9931, val_acc = 0.5800
d-step 5 epoch 2 : .......... average_loss = 0.0280, train_acc = 0.9938, val_acc = 0.5750
d-step 5 epoch 3 : .......... average_loss = 0.0221, train_acc = 0.9955, val_acc = 0.5800

...


d-step 5 epoch 2 : .......... average_loss = 0.0210, train_acc = 0.9950, val_acc = 0.5500
d-step 5 epoch 3 : .......... average_loss = 0.0172, train_acc = 0.9959, val_acc = 0.5600

--------
EPOCH 31
--------

Adversarial Training Generator :  oracle_sample_NLL = 9.9169

Adversarial Training Discriminator : 
d-step 1 epoch 1 : .......... average_loss = 0.0334, train_acc = 0.9931, val_acc = 0.5900
d-step 1 epoch 2 : .......... average_loss = 0.0263, train_acc = 0.9945, val_acc = 0.5900
d-step 1 epoch 3 : .......... average_loss = 0.0214, train_acc = 0.9952, val_acc = 0.5800
d-step 2 epoch 1 : .......... average_loss = 0.0329, train_acc = 0.9929, val_acc = 0.5800
d-step 2 epoch 2 : .......... average_loss = 0.0255, train_acc = 0.9946, val_acc = 0.5800
d-step 2 epoch 3 : .......... average_loss = 0.0200, train_acc = 0.9959, val_acc = 0.5800
d-step 3 epoch 1 : .......... average_loss = 0.0311, train_acc = 0.9931, val_acc = 0.5850
...

d-step 3 epoch 1 : .......... average_loss = 0.0325, train_acc = 0.9933, val_acc = 0.5650
d-step 3 epoch 2 : .......... average_loss = 0.0241, train_acc = 0.9953, val_acc = 0.5700
d-step 3 epoch 3 : .......... average_loss = 0.0214, train_acc = 0.9957, val_acc = 0.5600
d-step 4 epoch 1 : .......... average_loss = 0.0276, train_acc = 0.9940, val_acc = 0.5600
d-step 4 epoch 2 : .......... average_loss = 0.0217, train_acc = 0.9956, val_acc = 0.5650
d-step 4 epoch 3 : .......... average_loss = 0.0163, train_acc = 0.9966, val_acc = 0.5650
d-step 5 epoch 1 : .......... average_loss = 0.0294, train_acc = 0.9938, val_acc = 0.5650
d-step 5 epoch 2 : .......... average_loss = 0.0211, train_acc = 0.9954, val_acc = 0.5700
d-step 5 epoch 3 : .......... average_loss = 0.0177, train_acc = 0.9962, val_acc = 0.5600

--------
EPOCH 37
--------

Adversarial Training Generator :  oracle_sample_NLL = 9.8889

Adversarial Training Discriminator : 
...

Adversarial Training Generator :  oracle_sample_NLL = 9.9700

Adversarial Training Discriminator : 
d-step 1 epoch 1 : .......... average_loss = 0.0249, train_acc = 0.9949, val_acc = 0.5750
d-step 1 epoch 2 : .......... average_loss = 0.0197, train_acc = 0.9959, val_acc = 0.5750
d-step 1 epoch 3 : .......... average_loss = 0.0161, train_acc = 0.9970, val_acc = 0.5700
d-step 2 epoch 1 : .......... average_loss = 0.0277, train_acc = 0.9944, val_acc = 0.5650
d-step 2 epoch 2 : .......... average_loss = 0.0215, train_acc = 0.9952, val_acc = 0.5750
d-step 2 epoch 3 : .......... average_loss = 0.0159, train_acc = 0.9966, val_acc = 0.5650
d-step 3 epoch 1 : .......... average_loss = 0.0275, train_acc = 0.9942, val_acc = 0.5650
d-step 3 epoch 2 : .......... average_loss = 0.0211, train_acc = 0.9953, val_acc = 0.5700
d-step 3 epoch 3 : .......... average_loss = 0.0164, train_acc = 0.9966, val_acc = 0.5650
d-step 4 epoch 1 : .......... average_loss = 0.0298, train_acc = 0.9941, val_acc = 0.5850


d-step 3 epoch 3 : .......... average_loss = 0.0137, train_acc = 0.9970, val_acc = 0.5800
d-step 4 epoch 1 : .......... average_loss = 0.0305, train_acc = 0.9944, val_acc = 0.5700
d-step 4 epoch 2 : .......... average_loss = 0.0231, train_acc = 0.9953, val_acc = 0.5700
d-step 4 epoch 3 : .......... average_loss = 0.0195, train_acc = 0.9964, val_acc = 0.5650
d-step 5 epoch 1 : .......... average_loss = 0.0262, train_acc = 0.9946, val_acc = 0.5650
d-step 5 epoch 2 : .......... average_loss = 0.0199, train_acc = 0.9958, val_acc = 0.5700
d-step 5 epoch 3 : .......... average_loss = 0.0169, train_acc = 0.9963, val_acc = 0.5700

--------
EPOCH 48
--------

Adversarial Training Generator :  oracle_sample_NLL = 9.9191

Adversarial Training Discriminator : 
d-step 1 epoch 1 : .......... average_loss = 0.0254, train_acc = 0.9950, val_acc = 0.5650
d-step 1 epoch 2 : .......... average_loss = 0.0190, train_acc = 0.9964, val_acc = 0.5600
...